Why standard time series models fall short:
What is volatility in finance?
Why does volatility matter?
library(tidyquant)
library(dplyr)   # for mutate() and the pipe

# Get S&P 500 data
sp500 <- tq_get("^GSPC", from = "2018-01-01", to = "2022-12-31")

# Calculate daily log returns
sp500_returns <- sp500 %>%
  mutate(returns = log(adjusted / lag(adjusted))) %>%
  na.omit()

# Preview the returns
head(sp500_returns)
# A tibble: 6 × 9
symbol date open high low close volume adjusted returns
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ^GSPC 2018-01-03 2698. 2714. 2698. 2713. 3544030000 2713. 0.00638
2 ^GSPC 2018-01-04 2719. 2729. 2719. 2724. 3697340000 2724. 0.00402
3 ^GSPC 2018-01-05 2731. 2743. 2728. 2743. 3239280000 2743. 0.00701
4 ^GSPC 2018-01-08 2743. 2749. 2738. 2748. 3246160000 2748. 0.00166
5 ^GSPC 2018-01-09 2751. 2759. 2748. 2751. 3467460000 2751. 0.00130
6 ^GSPC 2018-01-10 2746. 2751. 2736. 2748. 3579900000 2748. -0.00111
Why do financial markets show volatility clustering?
Information arrival: New information doesn’t come evenly
Market psychology: Fear leads to panic selling
Leverage effects: Falling prices increase debt-to-equity ratios
Trading strategies: Trend-following can amplify market moves
This clustering pattern cannot be captured by standard ARIMA models!
# Compute daily log returns from the adjusted S&P 500 prices
sp500_returns <- diff(log(sp500$adjusted))

# 1. Visual inspection of returns and squared returns
par(mfrow = c(2, 1))
plot(sp500_returns, type = "l", main = "S&P 500 Returns")
plot(sp500_returns^2, type = "l", main = "Squared Returns (Proxy for Volatility)")
Visual approach:
# 2. Test for ARCH effects
library(FinTS)
arch_test <- ArchTest(sp500_returns, lags=10)
print(arch_test)
ARCH LM-test; Null hypothesis: no ARCH effects
data: sp500_returns
Chi-squared = 491.27, df = 10, p-value < 2.2e-16
Statistical confirmation:
The core insight: Today’s volatility depends on yesterday’s market surprise (squared return)
Think of it like weather:
The simplest ARCH(1) model in everyday terms:
When to use ARCH models:
\[\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + ... + \alpha_q \varepsilon_{t-q}^2\]
# Load the fGarch package and the built-in European stock index data
library(fGarch)
data(EuStockMarkets)
dax_returns <- diff(log(EuStockMarkets[, "DAX"]))

# Plot the DAX returns
plot(dax_returns, main = "DAX Daily Returns")

# Fit an ARCH(1) model (written as GARCH(1,0) in fGarch)
arch_fit <- garchFit(~ garch(1, 0), data = dax_returns, trace = FALSE)
summary(arch_fit)
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 0), data = dax_returns, trace = FALSE)
Mean and Variance Equation:
data ~ garch(1, 0)
<environment: 0x1325ebc30>
[data = dax_returns]
Conditional Distribution:
norm
Coefficient(s):
mu omega alpha1
7.1817e-04 9.5278e-05 1.0153e-01
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu 7.182e-04 2.347e-04 3.059 0.002218 **
omega 9.528e-05 3.727e-06 25.567 < 2e-16 ***
alpha1 1.015e-01 2.629e-02 3.862 0.000113 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Likelihood:
5884.652 normalized: 3.165493
Description:
Fri Mar 14 10:08:57 2025 by user:
Standardised Residuals Tests:
Statistic p-Value
Jarque-Bera Test R Chi^2 3718.1392407 0.000000e+00
Shapiro-Wilk Test R W 0.9548541 0.000000e+00
Ljung-Box Test R Q(10) 6.6915749 7.542059e-01
Ljung-Box Test R Q(15) 15.7054310 4.018944e-01
Ljung-Box Test R Q(20) 20.7801474 4.101730e-01
Ljung-Box Test R^2 Q(10) 62.9102787 1.015548e-09
Ljung-Box Test R^2 Q(15) 73.1777572 1.205996e-09
Ljung-Box Test R^2 Q(20) 78.1375073 8.111391e-09
LM Arch Test R TR^2 52.7654566 4.534564e-07
Information Criterion Statistics:
AIC BIC SIC HQIC
-6.327759 -6.318838 -6.327764 -6.324471
Key parameters to examine:
Key diagnostic checks:
The visualization demonstrates “volatility clustering”
A single market shock (blue bar at time 25)
Red line shows how one shock creates a “ripple effect”
Volatility rises immediately after the shock and then gradually decays back toward its baseline level (the simulation sketch after this list reproduces this pattern)
Mimics real financial markets:
New information: Market shocks lead to uncertainty
Market microstructure: Trading behavior adapts to new information
Risk premium: Investors require higher returns during volatile periods
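To make the ripple effect concrete, here is a minimal simulation of how the expected conditional variance of an ARCH(1) process responds to a single large shock (illustrative only; the parameter values are assumptions, not estimates from the data above):

# Expected conditional variance after one large shock in an ARCH(1) model
alpha0 <- 0.0001; alpha1 <- 0.4            # assumed ARCH(1) parameters (illustrative)
uncond <- alpha0 / (1 - alpha1)            # long-run (unconditional) variance
shock  <- 0.05                             # one large return surprise
horizon <- 0:20
exp_var <- uncond + alpha1^horizon * (shock^2 - uncond)   # geometric decay back to baseline

plot(horizon, sqrt(exp_var), type = "b", col = "red",
     xlab = "Periods after the shock", ylab = "Expected conditional SD",
     main = "Ripple effect of one shock in an ARCH(1) model")
abline(h = sqrt(uncond), lty = 2)          # baseline (long-run) volatility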
Can you think of a major financial event that might have caused volatility clustering?
The limitation of ARCH models:
The GARCH innovation:
\[\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2\]
# Specify a GARCH(1,1) model
library(rugarch)
spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                   mean.model = list(armaOrder = c(0, 0)),
                   distribution.model = "norm")

# Fit the model to the DAX returns
fit <- ugarchfit(spec, data = dax_returns)
print(fit)
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 0.000654 0.000215 3.0377 0.002384
omega 0.000005 0.000000 12.7383 0.000000
alpha1 0.067909 0.005386 12.6085 0.000000
beta1 0.888757 0.008294 107.1583 0.000000
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.000654 0.000248 2.6380 0.008341
omega 0.000005 0.000001 3.2058 0.001347
alpha1 0.067909 0.009442 7.1921 0.000000
beta1 0.888757 0.015864 56.0252 0.000000
LogLikelihood : 5966.213
Information Criteria
------------------------------------
Akaike -6.4144
Bayes -6.4025
Shibata -6.4144
Hannan-Quinn -6.4100
Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
statistic p-value
Lag[1] 0.1979 0.6564
Lag[2*(p+q)+(p+q)-1][2] 0.3455 0.7709
Lag[4*(p+q)+(p+q)-1][5] 0.7956 0.9039
d.o.f=0
H0 : No serial correlation
Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
statistic p-value
Lag[1] 0.1218 0.7271
Lag[2*(p+q)+(p+q)-1][5] 0.3371 0.9797
Lag[4*(p+q)+(p+q)-1][9] 0.5000 0.9985
d.o.f=2
Weighted ARCH LM Tests
------------------------------------
Statistic Shape Scale P-Value
ARCH Lag[3] 0.01531 0.500 2.000 0.9015
ARCH Lag[5] 0.32107 1.440 1.667 0.9346
ARCH Lag[7] 0.37551 2.315 1.543 0.9882
Nyblom stability test
------------------------------------
Joint Statistic: 30.7017
Individual Statistics:
mu 0.7028
omega 0.7205
alpha1 0.2937
beta1 0.1076
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.07 1.24 1.6
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
------------------------------------
t-value prob sig
Sign Bias 1.4171 0.1566
Negative Sign Bias 0.7997 0.4240
Positive Sign Bias 0.4247 0.6711
Joint Effect 4.2246 0.2382
Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)
1 20 97.57 1.473e-12
2 30 131.18 5.970e-15
3 40 131.77 5.384e-12
4 50 193.85 4.032e-19
Elapsed time : 0.04685998
# Generate a 10-step-ahead forecast from the fitted GARCH model
forecast_garch <- ugarchforecast(fit, n.ahead = 10)

# Extract forecasted values
sigma_values <- sigma(forecast_garch)
mean_values <- fitted(forecast_garch)
# Plot with confidence intervals
plot(1:10, mean_values, type = "l", col = "blue",
ylim = c(min(mean_values - 2*sigma_values), max(mean_values + 2*sigma_values)),
main = "GARCH Forecast with 95% Confidence Interval",
xlab = "Days Ahead", ylab = "Forecasted Returns")
# Add volatility bands (95% confidence interval)
lines(1:10, mean_values + 1.96*sigma_values, lty = 2, col = "red")
lines(1:10, mean_values - 1.96*sigma_values, lty = 2, col = "red")
legend("topleft", legend = c("Mean Forecast", "95% Confidence Interval"),
col = c("blue", "red"), lty = c(1, 2))
Check coefficient significance
Evaluate persistence (α₁ + β₁)
Assess volatility response
Calculate half-life of volatility shocks
Determine unconditional variance
Validate model
For our model:
Persistence = 0.9567
Half-life = 15.6 days
Unconditional variance = 0.000108
Let’s interpret the GARCH(1,1) model:
Persistence = α₁ + β₁
If close to 1: volatility is highly persistent
If greater than 1: volatility is explosive (non-stationary)
Unconditional variance = ω / (1 − α₁ − β₁), where ω is the variance intercept (written α₀ in the general formula above)
This is the long-run volatility level the process reverts to
Parameter | Value |
---|---|
ω (omega) | 0.000005 |
α₁ (alpha1) | 0.0679 |
β₁ (beta1) | 0.8888 |
Persistence (α₁ + β₁) | 0.9567 |
Unconditional variance | 0.000108 |
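These quantities can be computed directly from the fitted rugarch object; a short sketch using the fit object from above (coefficient names follow rugarch's labels):

# Extract coefficients from the fitted GARCH(1,1) model
params <- coef(fit)
alpha1 <- params["alpha1"]
beta1  <- params["beta1"]
omega  <- params["omega"]

# Persistence of volatility shocks
pers <- alpha1 + beta1

# Half-life of a volatility shock (in trading days)
half_life <- log(0.5) / log(pers)

# Long-run (unconditional) variance
uncond_var <- omega / (1 - pers)

c(persistence = unname(pers), half_life = unname(half_life),
  unconditional_variance = unname(uncond_var))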
Step 1: Examine Standardized Residuals
# Get standardized residuals
std_resid <- residuals(fit, standardize=TRUE)
# Test for autocorrelation
Box.test(std_resid, lag = 10, type = "Ljung-Box")
Box-Ljung test
data: std_resid
X-squared = 3.1952, df = 10, p-value = 0.9764
# Test squared standardized residuals for remaining ARCH effects
Box.test(std_resid^2, lag = 10, type = "Ljung-Box")
Box-Ljung test
data: std_resid^2
X-squared = 0.8958, df = 10, p-value = 0.9999
Step 2: Check Distribution Assumptions
Step 3: Assess information criteria
Step 4: Out-of-sample validation
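A common way to implement this with rugarch is a rolling re-estimation with one-step-ahead forecasts; a minimal sketch reusing spec and dax_returns from above (the window lengths are arbitrary choices, not values from the source):

# Rolling out-of-sample evaluation: keep the last 200 observations for
# one-step-ahead forecasts and refit the model every 50 observations
roll <- ugarchroll(spec, data = dax_returns, n.ahead = 1,
                   forecast.length = 200, refit.every = 50,
                   refit.window = "moving")

# Compare realized returns with the forecast mean and sigma day by day
head(as.data.frame(roll))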
Decision Process:
\(r_t = c + \delta\sigma_t^2 + \varepsilon_t\)
# Specify GARCH-M model
spec_garchm <- ugarchspec(
variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(0, 0), include.mean = TRUE, archm = TRUE),
distribution.model = "norm"
)
# Fit model
fit_garchm <- ugarchfit(spec_garchm, data = dax_returns)
# Extract risk premium parameter
coef(fit_garchm)["archm"]
archm
0.2464345
Interpretation:
A GARCH(1,1) model has estimated parameters: ω = 0.00001, α₁ = 0.15, β₁ = 0.83.
Questions:
Take a moment to work through these problems.
Solutions:
a) Persistence = α₁ + β₁ = 0.15 + 0.83 = 0.98
b) The model is stationary because persistence < 1
c) Unconditional variance = ω / (1 − α₁ − β₁) = 0.00001 / (1 − 0.98) = 0.0005
d) Half-life = log(0.5) / log(persistence) = log(0.5) / log(0.98) ≈ 34.3 days
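A quick sanity check of these answers in R (plain arithmetic, no model fitting needed):

# Exercise parameters
omega <- 0.00001; alpha1 <- 0.15; beta1 <- 0.83

persistence <- alpha1 + beta1              # 0.98
uncond_var  <- omega / (1 - persistence)   # 0.0005
half_life   <- log(0.5) / log(persistence) # about 34.3 days
c(persistence, uncond_var, half_life)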
Each variant addresses specific features in financial data:
Volatility and Asset Pricing Theory
Volatility and Market Efficiency
Volatility, Leverage and Firm Value
Institutional Factors
Markets don’t exist in isolation:
What we miss with univariate models:
Examples of financial interconnections:
The basic idea:
Everyday analogy:
Practical example: Two-variable VAR(1)
\(y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + \varepsilon_t\)
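For the two-variable VAR(1) case flagged above, writing the matrix equation out component by component makes the cross-dependencies explicit:

\[
\begin{aligned}
y_{1,t} &= c_1 + a_{11}\, y_{1,t-1} + a_{12}\, y_{2,t-1} + \varepsilon_{1,t} \\
y_{2,t} &= c_2 + a_{21}\, y_{1,t-1} + a_{22}\, y_{2,t-1} + \varepsilon_{2,t}
\end{aligned}
\]

Each variable depends on its own lag and on the other variable's lag, which is exactly what the stock-return example below estimates.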
# Get some stock data (tidyquant loaded earlier)
aapl <- tq_get("AAPL", from = "2020-01-01", to = "2022-12-31")
msft <- tq_get("MSFT", from = "2020-01-01", to = "2022-12-31")

# Calculate log returns
aapl_ret <- diff(log(aapl$adjusted))
msft_ret <- diff(log(msft$adjusted))

# Combine into a matrix
stock_returns <- cbind(aapl_ret, msft_ret)
colnames(stock_returns) <- c("AAPL", "MSFT")

# Fit a VAR(2) model
library(vars)
var_fit <- VAR(stock_returns, p = 2)
summary(var_fit)
VAR Estimation Results:
=========================
Endogenous variables: AAPL, MSFT
Deterministic variables: const
Sample size: 753
Log Likelihood: 4004.984
Roots of the characteristic polynomial:
0.3236 0.1499 0.1397 0.05499
Call:
VAR(y = stock_returns, p = 2)
Estimation results for equation AAPL:
=====================================
AAPL = AAPL.l1 + MSFT.l1 + AAPL.l2 + MSFT.l2 + const
Estimate Std. Error t value Pr(>|t|)
AAPL.l1 -0.1009665 0.0624010 -1.618 0.106
MSFT.l1 -0.0620421 0.0672548 -0.922 0.357
AAPL.l2 -0.0167360 0.0622984 -0.269 0.788
MSFT.l2 0.0286580 0.0664777 0.431 0.667
const 0.0008684 0.0008414 1.032 0.302
Residual standard error: 0.02306 on 748 degrees of freedom
Multiple R-Squared: 0.02408, Adjusted R-squared: 0.01887
F-statistic: 4.615 on 4 and 748 DF, p-value: 0.001098
Estimation results for equation MSFT:
=====================================
MSFT = AAPL.l1 + MSFT.l1 + AAPL.l2 + MSFT.l2 + const
Estimate Std. Error t value Pr(>|t|)
AAPL.l1 -0.0779151 0.0578557 -1.347 0.1785
MSFT.l1 -0.1574331 0.0623560 -2.525 0.0118 *
AAPL.l2 -0.0534476 0.0577606 -0.925 0.3551
MSFT.l2 0.0692635 0.0616355 1.124 0.2615
const 0.0007368 0.0007801 0.944 0.3452
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02138 on 748 degrees of freedom
Multiple R-Squared: 0.05585, Adjusted R-squared: 0.0508
F-statistic: 11.06 on 4 and 748 DF, p-value: 1.013e-08
Covariance matrix of residuals:
AAPL MSFT
AAPL 0.0005318 0.0003997
MSFT 0.0003997 0.0004572
Correlation matrix of residuals:
AAPL MSFT
AAPL 1.0000 0.8106
MSFT 0.8106 1.0000
What to look for in VAR output:
Financial applications of VAR models:
Testing for cointegration:
The pairs trading concept:
Imagine two competing companies in the same industry:
Statistical representation:
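A minimal sketch of that representation, reusing the AAPL and MSFT prices pulled earlier in the VAR example (illustrative only; whether this particular pair is genuinely cointegrated is an empirical question):

# Hedge-ratio regression on log prices (aapl and msft from the VAR example above)
library(tseries)                        # for adf.test()
p_aapl <- log(aapl$adjusted)
p_msft <- log(msft$adjusted)
hedge  <- lm(p_aapl ~ p_msft)           # static hedge ratio

# The spread is the residual from that regression
spread <- residuals(hedge)

# If the spread is stationary, the pair is a candidate for pairs trading
# (strictly, Engle-Granger critical values apply to residual-based tests)
adf.test(spread)

# A simple trading signal: the z-score of the spread
z_score <- (spread - mean(spread)) / sd(spread)
plot(z_score, type = "l", main = "Pairs-Trading Spread (z-score)")
abline(h = c(-2, 0, 2), lty = 2)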
Financial applications:
\(\Delta y_t = \alpha\beta' y_{t-1} + \Gamma_1 \Delta y_{t-1} + ... + \varepsilon_t\)
# Create two cointegrated series
set.seed(123)
t <- 200
y <- cumsum(rnorm(t))              # a random walk (stochastic trend)
x <- 0.5 * y + rnorm(t, 0, 0.5)    # x shares y's stochastic trend
data <- cbind(y, x)

# Johansen test for cointegration
library(urca)
jotest <- ca.jo(data, type = "trace", K = 2, ecdet = "none", spec = "transitory")
summary(jotest)
######################
# Johansen-Procedure #
######################
Test type: trace statistic , with linear trend
Eigenvalues (lambda):
[1] 0.36467287 0.02553714
Values of teststatistic and critical values of test:
test 10pct 5pct 1pct
r <= 1 | 5.12 6.50 8.18 11.65
r = 0 | 94.94 15.66 17.95 23.52
Eigenvectors, normalised to first column:
(These are the cointegration relations)
y.l1 x.l1
y.l1 1.000000 1.0000000
x.l1 -1.950414 0.2287169
Weights W:
(This is the loading matrix)
y.l1 x.l1
y.d -0.04126551 -0.05008652
x.d 0.52788617 -0.02587802
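The VECM estimates printed below look like tsDyn output; the original call is not shown, but a call of roughly this form would produce a comparable fit (an assumption, not code from the source):

# Fit a VECM with one cointegrating relation and two lagged differences
library(tsDyn)
vecm_fit <- VECM(data, lag = 2, r = 1, estim = "ML")
summary(vecm_fit)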
#############
###Model VECM
#############
Full sample size: 200 End sample size: 197
Number of variables: 2 Number of estimated slope parameters 12
AIC -285.01 BIC -242.3283 SSR 260.4855
Cointegrating vector (estimated by ML):
y x
r1 1 -1.955183
| | ECT | Intercept | y -1 | x -1 | y -2 | x -2 |
|---|---|---|---|---|---|---|
| Equation y | -0.0739 (0.1245) | -0.0119 (0.0673) | -0.0434 (0.1211) | -0.0376 (0.1943) | -0.0241 (0.0970) | -0.1269 (0.1358) |
| Equation x | 0.5529 (0.0904)*** | -0.0129 (0.0489) | -0.0800 (0.0879) | 0.1005 (0.1410) | -0.0200 (0.0704) | 0.0198 (0.0986) |
Key parameters to examine:
Practical interpretation checklist:
Financial example: In a stock-dividend model, if stock prices adjust faster than dividends, the α coefficient for stock prices will be larger in magnitude.
Granger causality test
Model 1: x ~ Lags(x, 1:2) + Lags(y, 1:2)
Model 2: x ~ Lags(x, 1:2)
Res.Df Df F Pr(>F)
1 193
2 195 -2 28.628 1.295e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
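With the VAR fitted earlier, the same kind of question can also be put to the vars object directly; a minimal sketch using var_fit from the AAPL/MSFT example (this is not the call that produced the output above):

# Does MSFT Granger-cause AAPL, and vice versa?
causality(var_fit, cause = "MSFT")$Granger
causality(var_fit, cause = "AAPL")$Granger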
Example of potentially spurious Granger causality:
p-value: 0.0001460405
Conclusion: X appears to Granger-cause Y
But in reality, both are caused by Z
The hidden common factor problem:
The core idea:
Financial examples:
The Kalman filter:
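To give a flavor of the mechanics, here is a minimal local-level (random-walk-plus-noise) Kalman filter in base R (an illustrative sketch, not code from the source; the noise variances are arbitrary):

# Simulate a latent random-walk state observed with noise
set.seed(1)
n <- 100
state <- cumsum(rnorm(n, sd = 0.1))      # true unobserved level
obs   <- state + rnorm(n, sd = 0.5)      # noisy observations

# Kalman filter for the local-level model
q <- 0.1^2                               # state (process) variance
r <- 0.5^2                               # observation variance
a <- numeric(n); p <- numeric(n)         # filtered mean and variance
a_prev <- 0; p_prev <- 10                # vague prior
for (t in 1:n) {
  # Prediction step
  a_pred <- a_prev
  p_pred <- p_prev + q
  # Update step
  k    <- p_pred / (p_pred + r)          # Kalman gain
  a[t] <- a_pred + k * (obs[t] - a_pred)
  p[t] <- (1 - k) * p_pred
  a_prev <- a[t]; p_prev <- p[t]
}

plot(obs, col = "grey", pch = 20, main = "Local-Level Kalman Filter")
lines(state, col = "blue"); lines(a, col = "red", lwd = 2)
legend("topleft", c("Observations", "True state", "Filtered estimate"),
       col = c("grey", "blue", "red"), pch = c(20, NA, NA), lty = c(NA, 1, 1))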
The EGARCH model works with the logarithm of the conditional variance, which lets volatility respond asymmetrically to positive and negative shocks:

\[\log(\sigma_t^2) = \alpha_0 + \sum \beta_j \log(\sigma_{t-j}^2) + \sum \alpha_i \frac{|\varepsilon_{t-i}|}{\sigma_{t-i}} + \sum \gamma_k \frac{\varepsilon_{t-k}}{\sigma_{t-k}}\]
# Specify EGARCH model
spec_egarch = ugarchspec(variance.model = list(model = "eGARCH", garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(0, 0)),
distribution.model = "norm")
# Fit EGARCH model
fit_egarch = ugarchfit(spec_egarch, data = dax_returns)
print(fit_egarch)
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
-----------------------------------
GARCH Model : eGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 0.000593 0.000230 2.5793 0.009900
omega -0.102769 0.003003 -34.2237 0.000000
alpha1 -0.024264 0.007520 -3.2264 0.001253
beta1 0.988504 0.000312 3165.3825 0.000000
gamma1 0.061572 0.008332 7.3895 0.000000
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.000593 0.000307 1.9337 0.053148
omega -0.102769 0.018694 -5.4974 0.000000
alpha1 -0.024264 0.010174 -2.3848 0.017086
beta1 0.988504 0.001725 573.2100 0.000000
gamma1 0.061572 0.038450 1.6013 0.109303
LogLikelihood : 5971.651
Information Criteria
------------------------------------
Akaike -6.4192
Bayes -6.4043
Shibata -6.4192
Hannan-Quinn -6.4137
Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
statistic p-value
Lag[1] 0.03173 0.8586
Lag[2*(p+q)+(p+q)-1][2] 0.43088 0.7269
Lag[4*(p+q)+(p+q)-1][5] 1.02065 0.8550
d.o.f=0
H0 : No serial correlation
Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
statistic p-value
Lag[1] 0.03067 0.86098
Lag[2*(p+q)+(p+q)-1][5] 7.14903 0.04772
Lag[4*(p+q)+(p+q)-1][9] 8.06463 0.12520
d.o.f=2
Weighted ARCH LM Tests
------------------------------------
Statistic Shape Scale P-Value
ARCH Lag[3] 0.01681 0.500 2.000 0.8968
ARCH Lag[5] 0.37982 1.440 1.667 0.9184
ARCH Lag[7] 0.40937 2.315 1.543 0.9858
Nyblom stability test
------------------------------------
Joint Statistic: 2.54
Individual Statistics:
mu 0.2674
omega 0.1481
alpha1 0.1139
beta1 0.1410
gamma1 0.2306
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.28 1.47 1.88
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
------------------------------------
t-value prob sig
Sign Bias 1.25959 0.2080
Negative Sign Bias 0.34643 0.7291
Positive Sign Bias 0.01245 0.9901
Joint Effect 2.59481 0.4584
Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)
1 20 96.21 2.583e-12
2 30 117.40 1.351e-12
3 40 126.86 3.146e-11
4 50 185.24 1.040e-17
Elapsed time : 0.05077314
The financial theory behind asymmetry:
Model | Key Feature | When to Use | Typical Application |
---|---|---|---|
ARCH(q) | Uses q lags of squared returns | Simple volatility clustering | Historical baseline model |
GARCH(p,q) | Adds p lags of conditional variance | Persistent volatility | General financial returns |
EGARCH | Log form, asymmetric response | Leverage effects | Equity markets |
GJR-GARCH | Threshold-based asymmetry | Leverage effects | Alternative to EGARCH |
GARCH-M | Volatility in mean equation | Risk premium analysis | Asset pricing tests |
IGARCH | Restricted for unit root in variance | Very persistent volatility | FX markets, long memory |
FIGARCH | Fractional integration | Long memory processes | High-frequency data |
APARCH | Flexible power transformation | When standard models fail | Flexible modeling |
Traditional time series models can be enhanced with ML:
Key advantages:
Hybrid modeling approaches:
Implementation example:
# Step 1: Fit a traditional GARCH model (a ugarchspec and return series go in place of ...)
garch_model <- ugarchfit(...)

# Step 2: Extract GARCH-based features
volatility <- sigma(garch_model)        # conditional volatility path
residuals  <- residuals(garch_model)    # model residuals

# Step 3: Combine with other features
# (lagged_returns, technical_indicators, fundamental_data and future_returns
#  are placeholders that must be constructed for a real application)
features <- cbind(
  lagged_returns,
  volatility,
  residuals,
  technical_indicators,
  fundamental_data
)

# Step 4: Train an ML model on the combined features
library(randomForest)
ml_model <- randomForest(
  x = features,
  y = future_returns,
  ntree = 500
)
Benefits:
| Volatility Models | Multivariate Models |
|---|---|
| Model Types: | Model Types: |
| - ARCH: Models volatility based on past squared returns | - VAR: Captures dynamic relationships between variables |
| - GARCH: Adds lagged volatility for more persistent patterns | - VECM: For cointegrated series with long-run equilibrium |
| - EGARCH/GJR: Capture asymmetric effects (leverage effect) | - State Space Models: Extract latent variables from noisy data |
| - GARCH-M: Incorporates risk premium in mean equation | |
| - FIGARCH: Captures long memory in volatility | |
| When to use: | When to use: |
| - Market risk assessment | - Asset allocation |
| - Option pricing and hedging | - Pairs trading |
| - Portfolio optimization | - Yield curve analysis |
| - Value-at-Risk measurement | - Economic forecasting |
| - Trading strategy development | - Systemic risk assessment |
| | - Market interconnection analysis |
These advanced time series models provide powerful tools for financial modeling, risk management, and understanding market dynamics.