VolGAN

Deep Learning

March 2025

Abstract

(Paper Notes)

Notes on 'VolGAN: a generative model for arbitrage-free implied volatility surfaces'

Quick TLDR.

VolGAN is a GAN whose generator is trained to optimize an objective function that includes regularization terms along the moneyness ($m$) dimension and the time-to-maturity ($\tau$) dimension, to avoid arbitrage when generating the implied volatility surface.

Its discriminator, as in a conditional GAN, takes in the output of the generator together with a conditioning set of past data (the previous implied volatility surface), and the two networks play the usual minimax game in which the generator is trained to output realistic synthetic scenarios relative to the past implied volatility surface.

The generator's outputs are then reweighted as a means to reduce arbitrage, pushing the sampled scenarios toward arbitrage-free surfaces.

Abstract

VolGAN is a generative adversarial network for generating synthetic arbitrage-free implied volatility surfaces (IVS).

Here, arbitrage refers to situations where traders can exploit inconsistencies in the IVS; arbitrage-free indicates a smooth IVS with no such illogical pricing errors.

VolGAN is trained on a time series of implied volatility surfaces and underlying prices to generate a realistic IVS together with the return of the underlying asset.

It accounts not only for scenarios based on Gaussian dynamics, but also for abrupt, non-Gaussian changes in the IVS, in a logical, arbitrage-free manner.

A generative model for implied volatility surfaces.

As input, VolGAN receives (1) the implied volatility surface at the previous date, (2) the two previous underlying returns, and (3) the realized volatility over the previous period; it outputs (1) the return of the underlying asset and (2) the implied volatility surface.

The architecture is a conditional GAN, composed of a generator and a discriminator (as is typical of GANs).

The inputs to the GAN are,

$$
g_t(m, \tau) = \log \sigma_t(m, \tau) \\[3mm]
r_t = \log\left(\frac{S_{t+1}}{S_t}\right) \\[3mm]
\gamma_t = \sqrt{\frac{21}{252} \sum_{i = 0}^{20} r^2_{t - i}}
$$

where $g_t(\cdot)$ is the implied volatility at $t$ in log-space, $r_t$ is the log return at $t$, and $\gamma_t$ is the historical volatility over a one-month trading window (21 out of the 252 trading days in a year).

These are aggregated into a condition vector $a_t$,

$$
a_t = (r_{t-1}, r_{t-2}, \gamma_{t-1}, g_t(m, \tau))
$$
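A minimal numpy sketch of how these inputs might be assembled from a daily price series and a stored IV-surface grid; the array names, shapes, and the helper `build_inputs` are illustrative assumptions, not from the paper.

```python
import numpy as np

def build_inputs(prices, iv_surface, t):
    """Assemble (a_t, g_t) for day t.

    prices:     1-D array of daily underlying closes S_0, ..., S_T
    iv_surface: array of shape (T, N_m, N_tau) of implied vols on a fixed grid
    """
    # log returns, log_ret[s] = r_s = log(S_{s+1} / S_s)
    log_ret = np.diff(np.log(prices))
    r_prev1, r_prev2 = log_ret[t - 1], log_ret[t - 2]

    # one-month (21-day) historical volatility gamma_{t-1}
    window = log_ret[t - 21:t]                    # r_{t-21}, ..., r_{t-1}
    gamma_prev = np.sqrt(21.0 / 252.0 * np.sum(window ** 2))

    # log implied volatility surface g_t(m, tau)
    g_t = np.log(iv_surface[t])                   # shape (N_m, N_tau)

    # condition vector a_t = (r_{t-1}, r_{t-2}, gamma_{t-1}, g_t(m, tau))
    a_t = np.concatenate([[r_prev1, r_prev2, gamma_prev], g_t.ravel()])
    return a_t, g_t
```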

The generator takes in $a_t$ together with i.i.d. noise $z_t \sim \mathcal{N}(0, I_d)$, where $I_d$ is the identity matrix (serving as the covariance), and outputs a synthetic return and a synthetic log-space volatility increment,

$$
G(a_t, z_t) = \big(\hat{r}_t(z_t),\; \Delta \hat{g}_t(m, \tau)(z_t)\big)
$$

where $\hat{r}_t(z_t)$ is the synthetic predicted return and $\Delta \hat{g}_t(m, \tau)(z_t)$ is the synthetic increment of the IVS, both in log-space.
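Since both outputs live in log-space, the simulated surface for the next date can be recovered by adding the increment to the current log-surface and exponentiating (this is the same combination $g_t + \Delta \hat{g}_t$ that appears in the training objective below):

$$
\hat{\sigma}(m, \tau) = \exp\big(g_t(m, \tau) + \Delta \hat{g}_t(m, \tau)(z_t)\big)
$$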

The discriminator $D(\cdot)$ is a classifier which takes an input pair $(r, \Delta g)$, either the output of the generator or the ground-truth realization from the data, together with the condition vector $a_t$ defined previously.

$D(a_t, (r, \Delta g))$ outputs the probability that the input $(r, \Delta g)$ is drawn from the distribution of $(r_t, \Delta g_t)$ given $a_t$.

In essence, the discriminator checks whether the generator's output could plausibly have evolved from the prior volatility surface.

Both $G$ and $D$ are defined as feed-forward neural networks with their own respective parameters ($\theta_g$ and $\theta_d$).

$G$ is a two-layer neural network with a first hidden layer in $\mathbb{R}^{H}$, a second hidden layer in $\mathbb{R}^{2H}$, and a final dense layer of size $\mathbb{R}^{N_t \times N_k}$ outputting the log-implied volatility surface increment together with the simulated return of the underlying asset.

The discriminator takes the simulated return and surface increment together with $a_t$, passes them through a hidden layer of size $H$, and outputs a single probability value, denoting the probability that the input was sampled from the data rather than generated.
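A minimal PyTorch sketch of the two networks with the layer sizes described above (hidden width $H$, a surface grid flattened to $N_m \cdot N_\tau$ values plus one return); the class names, activations, and exact dimensions are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps (a_t, z_t) to a simulated return and a log-IV surface increment."""
    def __init__(self, cond_dim, noise_dim, H, n_m, n_tau):
        super().__init__()
        self.n_m, self.n_tau = n_m, n_tau
        self.net = nn.Sequential(
            nn.Linear(cond_dim + noise_dim, H), nn.ReLU(),
            nn.Linear(H, 2 * H), nn.ReLU(),
            nn.Linear(2 * H, 1 + n_m * n_tau),    # [r_hat, delta_g_hat]
        )

    def forward(self, a, z):
        out = self.net(torch.cat([a, z], dim=-1))
        r_hat = out[..., :1]
        delta_g_hat = out[..., 1:].reshape(*out.shape[:-1], self.n_m, self.n_tau)
        return r_hat, delta_g_hat

class Discriminator(nn.Module):
    """Scores how plausible the pair (r, delta_g) is given the condition a_t."""
    def __init__(self, cond_dim, H, n_m, n_tau):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cond_dim + 1 + n_m * n_tau, H), nn.ReLU(),
            nn.Linear(H, 1), nn.Sigmoid(),        # probability in (0, 1)
        )

    def forward(self, a, r, delta_g):
        x = torch.cat([a, r, delta_g.flatten(start_dim=-2)], dim=-1)
        return self.net(x)
```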

Training Objective

The generator's loss function combines the adversarial term with smoothness penalties along both dimensions of the surface:

$$
L_m(g) = \sum_{i,j} \frac{\big(g(m_{i+1}, \tau_j) - g(m_i, \tau_j)\big)^2}{|m_{i+1} - m_i|^2} \approx \|\partial_m g\|_{L^2}^2
$$

$$
L_\tau(g) = \sum_{i,j} \frac{\big(g(m_i, \tau_{j+1}) - g(m_i, \tau_j)\big)^2}{|\tau_{j+1} - \tau_j|^2} \approx \|\partial_\tau g\|_{L^2}^2
$$

$$
J^{(G)}(\theta_d, \theta_g) = \mathbb{E}\Big[ -\tfrac{1}{2}\log D\big(a_t, G(a_t, z_t; \theta_g); \theta_d\big) + \alpha_m L_m\big(g_t(m, \tau) + G(a_t, z_t; \theta_g)|_{2:}\big) + \beta_\tau L_\tau\big(g_t(m, \tau) + G(a_t, z_t; \theta_g)|_{2:}\big) \Big]
$$

where $L_m(g)$ penalizes roughness along the moneyness dimension, $L_\tau(g)$ penalizes roughness along the time-to-maturity dimension, and $G(\cdot)|_{2:}$ denotes the surface-increment components of the generator's output (i.e. everything except the simulated return).
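A sketch of the finite-difference smoothness penalties and the resulting generator loss, assuming the moneyness and maturity grid points are given as 1-D tensors `m` and `tau` and that `D` is the discriminator module from the sketch above; all names and the exact reduction over the batch are illustrative assumptions.

```python
import torch

def smoothness_penalties(g, m, tau):
    """Discrete L_m and L_tau for a batch of log-IV surfaces g of shape
    (batch, N_m, N_tau); m and tau are the 1-D grid coordinates."""
    dm = (m[1:] - m[:-1]).view(1, -1, 1)           # moneyness grid spacing
    dtau = (tau[1:] - tau[:-1]).view(1, 1, -1)     # maturity grid spacing
    L_m = (((g[:, 1:, :] - g[:, :-1, :]) / dm) ** 2).sum(dim=(1, 2))
    L_tau = (((g[:, :, 1:] - g[:, :, :-1]) / dtau) ** 2).sum(dim=(1, 2))
    return L_m, L_tau

def generator_loss(D, a, g_t, r_hat, delta_g_hat, m, tau, alpha_m, beta_tau):
    """J^(G): adversarial term plus smoothness penalties on the simulated surface."""
    g_next = g_t + delta_g_hat                     # simulated log-IV surface
    L_m, L_tau = smoothness_penalties(g_next, m, tau)
    log_d = torch.log(D(a, r_hat, delta_g_hat) + 1e-12).squeeze(-1)
    return (-0.5 * log_d + alpha_m * L_m + beta_tau * L_tau).mean()
```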

The discriminator simply minimizes the binary cross-entropy loss.
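A short sketch of that discriminator objective, labelling real pairs 1 and generated pairs 0 (assuming the `Discriminator` module above; this is the standard conditional-GAN setup, not code from the paper).

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, a, r_real, dg_real, r_fake, dg_fake):
    """Binary cross-entropy: real pairs labelled 1, generated pairs labelled 0."""
    p_real = D(a, r_real, dg_real)
    p_fake = D(a, r_fake.detach(), dg_fake.detach())   # don't backprop into G
    loss_real = F.binary_cross_entropy(p_real, torch.ones_like(p_real))
    loss_fake = F.binary_cross_entropy(p_fake, torch.zeros_like(p_fake))
    return 0.5 * (loss_real + loss_fake)
```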

Scenario re-weighting

Assume $\mathbb{P}_0$ is the joint probability distribution of the generator's output $(r_t, \sigma(m, \tau))$.

To remove opportunities for arbitrage, the authors reweight $\mathbb{P}_0$ by applying a softmax weighting, driven by an arbitrage-penalty function $\Phi$, to the generated samples:

$$
w_i = \frac{\exp(-\beta \Phi(\hat{\sigma}_i))}{\sum_{j = 1}^N \exp(-\beta \Phi(\hat{\sigma}_j))}
$$

Each weighted scenario is then used to compute expectations of various quantities of interest under the reweighted measure $\mathbb{P}_{\beta}$.

For example, if $X$ is the return of the option, the expected return over $N$ sampled scenarios is

$$
\mathbb{E}_{\beta}[X] = \sum_{i = 1}^N w_i x_i
$$

and similarly for any other definition of $X$.
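A sketch of the reweighting step and the reweighted expectation: it assumes an arbitrage-penalty value $\Phi(\hat{\sigma}_i)$ has already been computed for each generated surface (how $\Phi$ is defined is specific to the paper and not reproduced here), and the arrays below are hypothetical.

```python
import numpy as np

def reweight(phi, beta):
    """Softmax weights w_i = exp(-beta * Phi_i) / sum_j exp(-beta * Phi_j)."""
    logits = -beta * phi
    logits -= logits.max()                      # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# usage: expectation of a scenario-dependent quantity X under P_beta
phi = np.array([0.02, 0.50, 0.10, 0.00])        # penalties Phi(sigma_i) per scenario
x = np.array([0.01, -0.03, 0.02, 0.015])        # scenario values of X
w = reweight(phi, beta=10.0)
expected_x = float(w @ x)                       # E_beta[X] = sum_i w_i x_i
```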