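import pandas as pd

def getDailyVol(close, span0=100):
    # daily vol, reindexed to close
    # position of the first bar at or after (t - 1 day), for every timestamp t
    df0 = close.index.searchsorted(close.index - pd.Timedelta(days=1))
    df0 = df0[df0 > 0]  # drop bars that have no earlier bar to compare against
    # df0 - 1 is the last bar before (t - 1 day); pair it with timestamp t
    df0 = pd.Series(close.index[df0 - 1], index=close.index[close.shape[0] - df0.shape[0]:])
    df0 = close.loc[df0.index] / close.loc[df0.values].values - 1  # daily returns
    df0 = df0.ewm(span=span0).std()  # exponentially weighted std of daily returns
    return df0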
Notes from Advances in Financial Machine Learning: Labelling Financial Data - Work in Progress
Introduction
This is the second of a series of blog posts summarising chapters from Advances in Financial Machine Learning by Marcos Lopez de Prado. These notes are concerned with ways to label financial data for machine learning applications. I augment the code with my own approaches.
In this context, labelling means identifying the dependent variable, i.e. the target that a supervised learning model is trained to predict.
The Fixed-Time Horizon Method
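Consider a features matrix $X$ with $I$ rows, $\{X_i\}_{i=1,\dots,I}$, drawn from a series of bars with index $t = 1, \dots, T$, where $I \le T$. Each observation $X_i$ is assigned a label $y_i \in \{-1, 0, 1\}$:

$$y_i = \begin{cases} -1 & \text{if } r_{t_{i,0},\,t_{i,0}+h} < -\tau \\ 0 & \text{if } |r_{t_{i,0},\,t_{i,0}+h}| \le \tau \\ 1 & \text{if } r_{t_{i,0},\,t_{i,0}+h} > \tau \end{cases}$$

where $\tau$ is a predefined constant threshold, $t_{i,0}$ is the index of the bar immediately after $X_i$ occurs, $t_{i,0}+h$ is the index of the $h$-th bar after $t_{i,0}$, and $r_{t_{i,0},\,t_{i,0}+h}$ is the price return over a bar horizon $h$:

$$r_{t_{i,0},\,t_{i,0}+h} = \frac{p_{t_{i,0}+h}}{p_{t_{i,0}}} - 1$$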
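The fixed-time horizon rule can be sketched in a few lines of pandas. This is a minimal illustration rather than code from the book: the function name `fixed_horizon_labels` and the parameter names `h` (bar horizon) and `tau` (constant threshold) are my own, and for simplicity the return is measured from the bar itself rather than from the bar immediately after the observation.

```python
import pandas as pd

def fixed_horizon_labels(close: pd.Series, h: int, tau: float) -> pd.Series:
    """Label each bar -1, 0 or 1 from the return over the next h bars."""
    # forward return over h bars: p_{t+h} / p_t - 1
    fwd_ret = close.shift(-h) / close - 1
    labels = pd.Series(0, index=close.index, dtype="int64")
    labels[fwd_ret > tau] = 1    # return above the threshold
    labels[fwd_ret < -tau] = -1  # return below the negative threshold
    # drop the last h bars, whose horizon runs past the end of the series
    return labels[fwd_ret.notna()]

# hypothetical prices, for illustration only
prices = pd.Series([100.0, 101.0, 103.0, 102.0, 99.0, 100.0])
labels = fixed_horizon_labels(prices, h=2, tau=0.02)
print(labels.tolist())  # [1, 0, -1, 0]
```

Note that the same `tau` is applied to every bar, which is exactly the weakness discussed next.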
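Because the literature almost always works with time bars, $h$ implies a fixed time horizon. Despite its popularity, there are reasons to avoid this approach: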
- Time bars do not exhibit good statistical properties.
- The same threshold $\tau$ is applied regardless of the observed volatility.
There are two proposed alternatives to the fixed-time horizon method.
1. Computing Dynamic Thresholds
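The first alternative is to label data using a varying threshold, $\sigma_{t_{i,0}}$, estimated using a rolling exponentially weighted standard deviation of returns.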
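As a rough sketch of the idea, the constant threshold can be swapped for a per-bar volatility estimate. The function name `dynamic_threshold_labels`, the `span` value and the simulated prices are assumptions of mine for illustration. The threshold here is the one-bar exponentially weighted volatility and is not rescaled to the `h`-bar horizon, which a real implementation may want to do.

```python
import numpy as np
import pandas as pd

def dynamic_threshold_labels(close: pd.Series, h: int, span: int = 100) -> pd.DataFrame:
    """Label bars using a volatility-dependent threshold instead of a constant."""
    ret = close / close.shift(1) - 1       # one-bar returns
    sigma = ret.ewm(span=span).std()       # exponentially weighted std, uses data up to t only
    fwd_ret = close.shift(-h) / close - 1  # return over the next h bars
    label = pd.Series(0, index=close.index, dtype="int64")
    label[fwd_ret > sigma] = 1
    label[fwd_ret < -sigma] = -1
    out = pd.DataFrame({"sigma": sigma, "fwd_ret": fwd_ret, "label": label})
    # drop bars without a volatility estimate or without a full horizon
    return out.dropna()

# simulated prices, for illustration only
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))
out = dynamic_threshold_labels(prices, h=5, span=20)
print(out["label"].value_counts())
```

In quiet periods the threshold shrinks and in volatile periods it widens, so a move of the same size can receive different labels depending on the prevailing volatility.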
2. Use Volume or Dollar Bars
The second alternative is to use volume bars or dollar bars, whose volatilities should be closer to constant (homoscedastic).
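To make the idea concrete, here is a simplified sketch of building dollar bars from tick data. The column names `price` and `volume`, the function name `dollar_bars` and the `bar_size` parameter are assumptions for illustration. A bar is closed whenever cumulative traded value crosses a multiple of `bar_size`; the tick that crosses the boundary is assigned to the new bar, and a single very large tick can skip a bar id, which a production implementation would handle more carefully.

```python
import pandas as pd

def dollar_bars(ticks: pd.DataFrame, bar_size: float) -> pd.DataFrame:
    """Aggregate ticks into bars of roughly bar_size traded dollar value."""
    dollar = ticks["price"] * ticks["volume"]           # dollar value of each tick
    bar_id = (dollar.cumsum() // bar_size).astype(int)  # bucket by cumulative dollar value
    grouped = ticks.groupby(bar_id)
    bars = pd.DataFrame({
        "open": grouped["price"].first(),
        "high": grouped["price"].max(),
        "low": grouped["price"].min(),
        "close": grouped["price"].last(),
        "volume": grouped["volume"].sum(),
    })
    return bars.reset_index(drop=True)

# hypothetical ticks, for illustration only
ticks = pd.DataFrame({
    "price": [10.0, 10.0, 11.0, 12.0, 11.0],
    "volume": [50, 60, 40, 30, 100],
})
bars = dollar_bars(ticks, bar_size=1000)
print(bars)
```

Volume bars follow the same pattern with `ticks["volume"].cumsum()` in place of the cumulative dollar value.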
A final argument against the fixed-time horizon method concerns the path that prices actually follow. The author argues that every investment strategy has stop-loss limits, and that it is “unrealistic to build a strategy that profits from positions that would have been stopped out by the exchange.”
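A small example shows why the path matters. The helper `stopped_out_before_horizon` and the prices below are hypothetical: the position ends the horizon with a profit, so a fixed-time horizon label would mark it as a winner, even though it breached the stop-loss on the way.

```python
import pandas as pd

def stopped_out_before_horizon(close: pd.Series, start: int, h: int, stop_loss: float) -> bool:
    """Return True if a long position opened at `start` hits the stop within h bars."""
    # return of the position at every bar from entry to the horizon
    path = close.iloc[start:start + h + 1] / close.iloc[start] - 1
    return bool((path <= -stop_loss).any())

# hypothetical prices, for illustration only
prices = pd.Series([100.0, 95.0, 97.0, 104.0])
final_return = prices.iloc[3] / prices.iloc[0] - 1             # +4% at the horizon
hit_stop = stopped_out_before_horizon(prices, 0, 3, 0.03)      # touched -5% on the way
print(final_return > 0, hit_stop)
```

Here the position finishes up 4% but would have been closed at the 3% stop after the first bar, so the label assigned by the fixed-time horizon method does not reflect an outcome the strategy could have realised.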