Introduction
From the CBOE VIX website:
“Cboe Global Markets revolutionized investing with the creation of the Cboe Volatility Index® (VIX® Index), the first benchmark index to measure the market’s expectation of future volatility. The VIX Index is based on options of the S&P 500® Index, considered the leading indicator of the broad U.S. stock market. The VIX Index is recognized as the world’s premier gauge of U.S. equity market volatility.”
In this tutorial, we will investigate finding a signal to use as a basis to trade the VIX.
VIX Data
I don’t have access to data for the VIX through Nasdaq Data Link, but for our purposes Yahoo Finance is sufficient.
Using the yfinance python module, we can pull what we need and quicky dump it to excel to retain it for future use.
Python Functions
First, a couple of useful functions:
Pull Data From Yahoo Finance
Here’s the code for the function to pull the data and dump to Excel:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
| # This function pulls data from Yahoo finance
def yf_data_updater(fund):
# Download data from YF
df_comp = yf.download(fund)
# Drop the column level with the ticker symbol
df_comp.columns = df_comp.columns.droplevel(1)
# Reset index
df_comp = df_comp.reset_index()
# Remove the "Price" header from the index
df_comp.columns.name = None
# Reset date column
df_comp['Date'] = df_comp['Date'].dt.tz_localize(None)
# Set 'Date' column as index
df_comp = df_comp.set_index('Date', drop=True)
# Drop data from last day because it's not accrate until end of day
df_comp = df_comp.drop(df_comp.index[-1])
# Export data to excel
file = fund + ".xlsx"
df_comp.to_excel(file, sheet_name='data')
print(f"The first and last date of data for {fund} is: ")
print(df_comp[:1])
print(df_comp[-1:])
print(f"Data updater complete for {fund} data")
return print(f"--------------------")
|
Set Number Of Decimal Places
1
2
3
| # Set number of decimal places in pandas
def dp(decimal_places):
pd.set_option('display.float_format', lambda x: f'%.{decimal_places}f' % x)
|
Import Data From CSV / XLSX
1
2
3
4
5
6
7
8
9
10
11
12
13
14
| def load_data(file):
# Import CSV
try:
df = pd.read_csv(file)
except:
pass
# Import excel
try:
df = pd.read_excel(file, sheet_name='data', engine='openpyxl')
except:
pass
return df
|
1
2
3
4
5
6
7
8
9
10
| # The `df_info` function returns some useful information about
# a dataframe, such as the columns, data types, and size.
def df_info(df):
print('The columns, shape, and data types are:')
print(df.info())
print('The first 5 rows are:')
display(df.head())
print('The last 5 rows are:')
display(df.tail())
|
Data Overview
Acquire Data
First, let’s get the data:
1
| yf_data_updater('^VIX')
|
Load Data
Then set our decimal places to something reasonable (like 2):
Now that we have the data, let’s load it up and take a look.
1
2
3
4
5
6
7
8
9
10
11
| # VIX
vix = load_data('^VIX.xlsx')
# Set 'Date' column as datetime
vix['Date'] = pd.to_datetime(vix['Date'])
# Drop 'Volume'
vix.drop(columns = {'Volume'}, inplace = True)
# Set Date as index
vix.set_index('Date', inplace = True)
|
Check For Missing Values & Forward Fill Any Missing Values
1
2
3
4
5
| # Check to see if there are any NaN values
vix[vix['High'].isna()]
# Forward fill to clean up missing data
vix['High'] = vix['High'].ffill()
|
DataFrame Info
Now, running:
Gives us the following:

Interesting Statistics
Some interesting statistics jump out at use when we look at the mean, standard deviation, min, and max values:
1
2
3
4
5
6
7
8
9
10
11
12
13
| vix_stats = vix.describe()
vix_stats.loc['mean + 1 std'] = {'Open': vix_stats.loc['mean']['Open'] + vix_stats.loc['std']['Open'],
'High': vix_stats.loc['mean']['High'] + vix_stats.loc['std']['High'],
'Low': vix_stats.loc['mean']['Low'] + vix_stats.loc['std']['Low'],
'Close': vix_stats.loc['mean']['Close'] + vix_stats.loc['std']['Close']}
vix_stats.loc['mean + 2 std'] = {'Open': vix_stats.loc['mean']['Open'] + 2 * vix_stats.loc['std']['Open'],
'High': vix_stats.loc['mean']['High'] + 2 * vix_stats.loc['std']['High'],
'Low': vix_stats.loc['mean']['Low'] + 2 * vix_stats.loc['std']['Low'],
'Close': vix_stats.loc['mean']['Close'] + 2 * vix_stats.loc['std']['Close']}
vix_stats.loc['mean - 1 std'] = {'Open': vix_stats.loc['mean']['Open'] - vix_stats.loc['std']['Open'],
'High': vix_stats.loc['mean']['High'] - vix_stats.loc['std']['High'],
'Low': vix_stats.loc['mean']['Low'] - vix_stats.loc['std']['Low'],
'Close': vix_stats.loc['mean']['Close'] - vix_stats.loc['std']['Close']}
|

And the levels for each decile:
1
2
| deciles = vix.quantile(np.arange(0, 1.1, 0.1))
display(deciles)
|

A quick histogram gives us the distribution for the entire dataset:

Now, let’s add the levels for the mean, mean plus 1 standard deviation, mean minus 1 standard deviation, and mean plus 2 standard deviations:

Historical VIX Plot
Here’s two plots for the dataset. The first covers 1990 - 2009, and the second 2010 - 2024. This is the daily high level.


From this plot, we can see the following:
- The VIX has really only jumped above 50 several times (GFC, COVID, recently in August of 2024)
- The highest levels (> 80) occured only during the GFC & COVID
- Interestingly, the VIX did not ever get above 50 during the .com bubble
Investigating A Signal
Next, we will consider the idea of a spike level in the VIX and how we might use a spike level to generate a signal. These elevated levels usually occur during market sell-off events or longer term drawdowns in the S&P 500. Sometimes the VIX reverts to recent levels after a spike, but other times levels remain elevated for weeks or even months.
Spike Level
We will start the 10 day simple moving average (SMA) of the daily high level to get an idea of what is happening recently with the VIX. We’ll then pick an arbitrary spike level (25% above the 10 day SMA), and our signal is generated if the VIX hits a level that is above the spike threshold.
The idea is that the 10 day SMA will smooth out the recent short term volatility in the VIX, and therefore any gradual increases in the VIX are not interpreted as spike events.
We also will generate the 20 and 50 day SMAs for reference, and again to see what is happening with the level of the VIX over slightly longer timeframes.
Here’s the code for the above:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
| # Define the spike multiplier for detecting significant spikes
spike_level = 1.25
# =========================
# Simple Moving Averages (SMA)
# =========================
# Calculate 10-period SMA of 'High'
vix['High_SMA_10'] = vix['High'].rolling(window=10).mean()
# Shift the 10-period SMA by 1 to compare with current 'High'
vix['High_SMA_10_Shift'] = vix['High_SMA_10'].shift(1)
# Calculate the spike level based on shifted SMA and spike multiplier
vix['Spike_Level_SMA'] = vix['High_SMA_10_Shift'] * spike_level
# Calculate 20-period SMA of 'High'
vix['High_SMA_20'] = vix['High'].rolling(window=20).mean()
# Determine if 'High' exceeds the spike level (indicates a spike)
vix['Spike_SMA'] = vix['High'] >= vix['Spike_Level_SMA']
# Calculate 50-period SMA of 'High' for trend analysis
vix['High_SMA_50'] = vix['High'].rolling(window=50).mean()
# =========================
# Exponential Moving Averages (EMA)
# =========================
# Calculate 10-period EMA of 'High'
vix['High_EMA_10'] = vix['High'].ewm(span=10, adjust=False).mean()
# Shift the 10-period EMA by 1 to compare with current 'High'
vix['High_EMA_10_Shift'] = vix['High_EMA_10'].shift(1)
# Calculate the spike level based on shifted EMA and spike multiplier
vix['Spike_Level_EMA'] = vix['High_EMA_10_Shift'] * spike_level
# Calculate 20-period EMA of 'High'
vix['High_EMA_20'] = vix['High'].ewm(span=20, adjust=False).mean()
# Determine if 'High' exceeds the spike level (indicates a spike)
vix['Spike_EMA'] = vix['High'] >= vix['Spike_Level_EMA']
# Calculate 50-period EMA of 'High' for trend analysis
vix['High_EMA_50'] = vix['High'].ewm(span=50, adjust=False).mean()
|
For this exercise, we will use simple moving averages.
Spike Totals By Year
To investigate the number of spike events (or signals) that we receive on a yearly basis, we can run the following:
1
2
3
4
5
6
7
8
9
10
| # Ensure the index is a DatetimeIndex
vix.index = pd.to_datetime(vix.index)
# Create a new column for the year extracted from the date index
vix['Year'] = vix.index.year
# Group by year and the "Spike_SMA" and "Spike_EMA" columns, then count occurrences
spike_count_SMA = vix.groupby(['Year', 'Spike_SMA']).size().unstack(fill_value=0)
spike_count_SMA
|
Which gives us the following:

Spike Counts (Signals) By Year
Here’s the plots for the spikes/signals generated over the past 3 decades:








More discussion to follow.
References
https://www.cboe.com/tradable_products/vix/https://github.com/ranaroussi/yfinance
Code
The jupyter notebook with the functions and all other code is available here.The html export of the jupyter notebook is available here.The pdf export of the jupyter notebook is available here.