## Technical section
Volatile adopts a Bayesian hierarchical model based on adjusted closing prices together with sector and industry information, estimating log-prices via polynomials in time.
Denote by $t = 1, \dots, T$ the times at which observations arrive. $T$ corresponds to the number of days in the training dataset, which is taken to be the last one year of data.
Furthermore, denote by $\sigma_j$, for $j = 0, \dots, D$, the prior scale parameters associated with the $j$-th order of a polynomial of degree $D$. The scales are currently set to decrease as $j$ increases, which penalises deviations from zero of higher-order parameters and thereby encourages simpler models.
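For illustration, here is a minimal numpy sketch of decaying prior scales; the harmonic decay below is a hypothetical choice, not necessarily the exact setting Volatile uses:

```python
import numpy as np

D = 7  # hypothetical polynomial degree
j = np.arange(D + 1)

# Harmonically decaying prior scales: sigma_j shrinks as the order j grows,
# penalising higher-order coefficients and encouraging simpler fits.
sigma = 1.0 / (1.0 + j)
print(sigma)  # [1.  0.5  0.333...  ...]
```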
We write:
- $k \to v(k)$ to indicate that an industry $k \in \{1, \dots, K\}$ belongs to a sector $v(k) \in \{1, \dots, V\}$, where $K$ is the number of industries and $V$ the number of sectors;
- $\ell \to k(\ell)$ to indicate that a stock $\ell \in \{1, \dots, N\}$ belongs to an industry $k(\ell)$, where $N$ is the number of stocks.
Then, we construct the hierarchical model

$$
\begin{aligned}
\phi^{\mathrm{m}}_j &\sim \mathcal{N}\big(0,\ \sigma_j^2\big), &\qquad \psi^{\mathrm{m}} &\sim \mathcal{N}\big(0,\ \sigma_\psi^2\big),\\
\phi^{\mathrm{s}}_{v,j} &\sim \mathcal{N}\big(\phi^{\mathrm{m}}_j,\ \sigma_j^2\big), &\qquad \psi^{\mathrm{s}}_v &\sim \mathcal{N}\big(\psi^{\mathrm{m}},\ \sigma_\psi^2\big),\\
\phi^{\mathrm{i}}_{k,j} &\sim \mathcal{N}\big(\phi^{\mathrm{s}}_{v(k),j},\ \sigma_j^2\big), &\qquad \psi^{\mathrm{i}}_k &\sim \mathcal{N}\big(\psi^{\mathrm{s}}_{v(k)},\ \sigma_\psi^2\big),\\
\phi_{\ell,j} &\sim \mathcal{N}\big(\phi^{\mathrm{i}}_{k(\ell),j},\ \sigma_j^2\big), &\qquad \psi_\ell &\sim \mathcal{N}\big(\psi^{\mathrm{i}}_{k(\ell)},\ \sigma_\psi^2\big),\\
y_{\ell,t} &\sim \mathcal{N}\Big(\textstyle\sum_{j=0}^{D} \phi_{\ell,j}\,(t/T)^j,\ \mathrm{softplus}(\psi_\ell)^2\Big),
\end{aligned}
$$

where $y_{\ell,t}$ is the log-price of stock $\ell$ at time $t$ and $\sigma_\psi$ is a fixed prior scale for the $\psi$ parameters. Parameters at market level, $\phi^{\mathrm{m}}$ and $\psi^{\mathrm{m}}$, are prior means for the sector-level parameters $\phi^{\mathrm{s}}$ and $\psi^{\mathrm{s}}$, which in turn are prior means for the industry-level parameters $\phi^{\mathrm{i}}$ and $\psi^{\mathrm{i}}$; finally, the latter are prior means for the stock-level parameters $\phi$ and $\psi$. Components of the parameters at each level are assumed to be conditionally independent given the parameters at the level above in the hierarchy. Whereas the $\phi$'s determine the coefficients of the polynomial model, the $\psi$'s determine the scales of the likelihood function.
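To make the generative structure concrete, here is a minimal numpy sketch that samples from a hierarchy of this shape; the toy sizes, the harmonic prior scales, the value of $\sigma_\psi$ and the softplus link are illustrative assumptions rather than Volatile's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

D, T = 3, 252                    # hypothetical degree and one year of trading days
V, K, N = 2, 4, 8                # toy numbers of sectors, industries, stocks
v = rng.integers(0, V, size=K)   # industry -> sector membership
k = rng.integers(0, K, size=N)   # stock -> industry membership

sigma = 1.0 / (1.0 + np.arange(D + 1))   # decaying prior scales sigma_j (assumed)
sigma_psi = 1.0                          # fixed prior scale for psi (assumed)
softplus = lambda x: np.log1p(np.exp(x))

# Each level's parameters are centred on the parameters of the level above.
phi_m = rng.normal(0.0, sigma)                       # market,     (D+1,)
phi_s = rng.normal(phi_m, sigma, size=(V, D + 1))    # sectors,    (V, D+1)
phi_i = rng.normal(phi_s[v], sigma)                  # industries, (K, D+1)
phi = rng.normal(phi_i[k], sigma)                    # stocks,     (N, D+1)

psi_m = rng.normal(0.0, sigma_psi)
psi_s = rng.normal(psi_m, sigma_psi, size=V)
psi_i = rng.normal(psi_s[v], sigma_psi)
psi = rng.normal(psi_i[k], sigma_psi)

# Likelihood: log-prices are polynomials in normalised time plus noise.
tt = (np.arange(1, T + 1) / T)[:, None] ** np.arange(D + 1)   # (T, D+1)
y = tt @ phi.T + softplus(psi) * rng.standard_normal((T, N))  # (T, N)
```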
In order to estimate the parameters, we condition on the adjusted closing log-prices $y_{\ell,t}$, for all stocks $\ell$ and times $t$, and estimate the mode of the posterior distribution, also known as the Maximum-A-Posteriori (MAP) estimate. From a frequentist perspective, this corresponds to a polynomial regression task where we minimise a regularised mean-squared error loss. In practice, we train the model sequentially at the different levels: first we train a market-level model to find the market-level parameters; then we fix those and train a sector-level model to find the sector-level parameters; and so on. A plot showing the decay of the losses during training can be saved in the current directory as `losses_decay.png` by adding the flag `--plot-losses` in the command line.
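To illustrate what MAP estimation amounts to at a single level, the sketch below fits one stock's parameters by minimising the negative log-posterior; the optimiser, the initialisation and the softplus parametrisation are assumptions for the sake of the example, not Volatile's actual training code:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(params, y, tt, sigma, sigma_psi, prior_phi, prior_psi):
    """Negative log-posterior of one stock's (phi, psi) given log-prices y.

    prior_phi and prior_psi are the parameters estimated at the level above,
    which act as prior means for the current level."""
    phi, psi = params[:-1], params[-1]
    scale = np.log1p(np.exp(psi))            # softplus link (assumed)
    mu = tt @ phi                            # polynomial trend over time
    nll = np.sum(0.5 * ((y - mu) / scale) ** 2 + np.log(scale))
    nlp = np.sum(0.5 * ((phi - prior_phi) / sigma) ** 2)
    nlp += 0.5 * ((psi - prior_psi) / sigma_psi) ** 2
    return nll + nlp

# Toy setup: one stock, degree-3 polynomial, one year of daily log-prices.
D, T = 3, 252
tt = (np.arange(1, T + 1) / T)[:, None] ** np.arange(D + 1)   # (T, D+1)
rng = np.random.default_rng(1)
y = 4.0 + 0.3 * np.arange(1, T + 1) / T + 0.02 * rng.standard_normal(T)

sigma = 1.0 / (1.0 + np.arange(D + 1))      # decaying prior scales (assumed)
res = minimize(neg_log_posterior, x0=np.zeros(D + 2),
               args=(y, tt, sigma, 1.0, np.zeros(D + 1), 0.0))
phi_hat, psi_hat = res.x[:-1], res.x[-1]
```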
Having obtained the estimates $\hat\phi$ and $\hat\psi$, we can use the likelihood mean $\hat\mu_\ell(t) = \sum_{j=0}^{D} \hat\phi_{\ell,j}\,(t/T)^j$ as an estimator of the log-price at any time in the past, as well as a predictor at times in the near future. As a measure of uncertainty, we take the learned scale of the likelihood, that is $\hat\sigma_\ell = \mathrm{softplus}(\hat\psi_\ell)$.
Because we model log-prices as Gaussian, the distribution of prices is a log-Normal distribution, whose mean and standard deviation can be derived in closed form from the estimators $\hat\mu_\ell(t)$ and $\hat\sigma_\ell$. They are, respectively,

$$
\hat P_\ell(t) = e^{\hat\mu_\ell(t) + \hat\sigma_\ell^2 / 2}
\qquad\text{and}\qquad
\hat s_\ell(t) = \hat P_\ell(t)\,\sqrt{e^{\hat\sigma_\ell^2} - 1}.
$$
We use these log-Normal statistics at times $t = 1, \dots, T$ to produce the stock estimation plot, and at time $t = T + 5$ to fill the prediction table. In order to produce the market, sector and industry estimation plots, we proceed analogously but with the estimators at the respective levels, that is $\hat\mu^{\mathrm{m}}(t)$ and $\hat\sigma^{\mathrm{m}}$ for the market, $\hat\mu^{\mathrm{s}}_v(t)$ and $\hat\sigma^{\mathrm{s}}_v$ for sectors, and $\hat\mu^{\mathrm{i}}_k(t)$ and $\hat\sigma^{\mathrm{i}}_k$ for industries.
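The closed-form log-Normal statistics translate directly into code; a minimal sketch:

```python
import numpy as np

def lognormal_stats(mu, sigma):
    """Mean and standard deviation of exp(X) for X ~ N(mu, sigma^2)."""
    mean = np.exp(mu + 0.5 * sigma ** 2)
    std = mean * np.sqrt(np.expm1(sigma ** 2))  # expm1 for numerical accuracy
    return mean, std

# Example: estimated log-price 4.6 with likelihood scale 0.05.
price_mean, price_std = lognormal_stats(4.6, 0.05)
```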
Given the selected model complexity, Volatile trains the model and provides a rate for each stock by introducing the following score:

$$
\mathrm{score}_\ell = \frac{\hat\mu_\ell(T + 5) - y_{\ell,T}}{\hat\sigma_\ell},
$$

where $y_{\ell,T}$ is the last available log-price and $\hat\mu_\ell(T + 5)$ is its prediction in 5 trading days (usually, that corresponds to the log-price in one week). If the future prediction is larger than the current price, the score will be positive; the larger the difference and the more confident we are about the prediction (or, equivalently, the smaller the standard deviation), the more positive the score will be. We can reason similarly if the score is negative. In other words, a large positive score indicates that the current price is undervalued with respect to its stock trend, therefore an opportunity to buy; a large negative score indicates, vice versa, that the current price is overvalued with respect to its stock trend, therefore a moment to sell.

Then, stocks are rated according to the following criteria:
- HIGHLY BELOW TREND if the score is larger than 3;
- BELOW TREND if the score is between 2 and 3;
- ALONG TREND if the score is between -2 and 2;
- ABOVE TREND if the score is between -3 and -2;
- HIGHLY ABOVE TREND if the score is smaller than -3.
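A compact sketch of this scoring and rating logic (function names are illustrative; the thresholds mirror the criteria above):

```python
def score(last_log_price, predicted_log_price, scale):
    """Standardised gap between the 5-day-ahead prediction and the last log-price."""
    return (predicted_log_price - last_log_price) / scale

def rate(s):
    # Thresholds mirror the rating criteria listed above.
    if s > 3:
        return "HIGHLY BELOW TREND"
    if s > 2:
        return "BELOW TREND"
    if s >= -2:
        return "ALONG TREND"
    if s >= -3:
        return "ABOVE TREND"
    return "HIGHLY ABOVE TREND"

print(rate(score(4.60, 4.72, 0.05)))  # score ~ 2.4 -> "BELOW TREND"
```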
Given the price trend estimate $\hat P_\ell(t)$ as a function of time $t$, the percentage trend growth is defined as $100\,\hat P_\ell'(t) / \hat P_\ell(t)$. Because $\hat P_\ell(t) = e^{\hat\mu_\ell(t) + \hat\sigma_\ell^2 / 2}$ with $\hat\sigma_\ell$ constant in time, this equals $100\,\hat\mu_\ell'(t)$; evaluated at the current time $t = T$, it gives the growth number appearing in the table.
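For a polynomial trend the growth has a simple closed form; a minimal sketch, using the same normalisation of time by $T$ as in the model above:

```python
import numpy as np

def percentage_trend_growth(phi_hat, t, T):
    """100 * d/dt of the log-price trend mu(t) = sum_j phi_j (t / T)^j,
    i.e. 100 * sum_j j * phi_j * t^(j-1) / T^j."""
    j = np.arange(len(phi_hat), dtype=float)
    t = float(t)
    return 100.0 * np.sum(j * phi_hat * t ** (j - 1) / T ** j)

phi_hat = np.array([4.0, 0.3, -0.05])  # illustrative fitted coefficients
print(percentage_trend_growth(phi_hat, t=252, T=252))  # growth at current time
```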
The volatility is a measure of how noisy a stock is over time. In Volatile, it is measured as the current standard deviation estimate divided by the current price estimate, that is $\hat s_\ell(T) / \hat P_\ell(T)$.
For each stock, we define its match as the stock in the list that is most closely correlated to it according to some metric. This information is particularly useful in pair trading, a simple trading strategy based on the strong correlation between a pair of stocks. If the two stocks evolve similarly up to some point and then diverge, one could long the underperforming stock and short the overperforming one, with the intention of closing the positions when their evolutions match again. In order to discover correlated stocks, we first train a model analogous to the one described above, but with a very high polynomial degree; we arbitrarily take $D = 52$, that is the number of weeks in a year of training data. The rationale behind this choice is that while a predictive model should exploit a low-complexity polynomial in order to avoid fitting oscillations that may not be inherent to the prediction (i.e. overfitting), a model purposed to discover correlations should be complex enough to indeed capture most of them. Then, because the polynomial is a smooth curve in time, we can again compute the percentage trend growth function $\mathrm{growth}_\ell(t)$ and measure stock correlations via the simple distance metric $d(\ell, \ell') = \frac{1}{T} \sum_{t=1}^{T} \big(\mathrm{growth}_\ell(t) - \mathrm{growth}_{\ell'}(t)\big)^2$. For every stock $\ell$, the stock $\ell'$ that minimises such a distance is named its match. Notice that because the percentage trend growth is a normalised derivative, the nominal value of the trend (and therefore of the price) does not matter when measuring the distance; the metric only looks at its relative variation.
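A minimal sketch of the matching step, assuming the growth curves have already been computed on a common time grid (the mean squared distance is the metric stated above):

```python
import numpy as np

def find_matches(growth):
    """growth: array of shape (num_stocks, T) holding percentage trend
    growth curves. Returns, for each stock, the index of the stock that
    minimises the mean squared distance between growth curves (its match)."""
    d = ((growth[:, None, :] - growth[None, :, :]) ** 2).mean(axis=-1)
    np.fill_diagonal(d, np.inf)  # a stock cannot be its own match
    return d.argmin(axis=1)

growth = np.random.default_rng(2).standard_normal((5, 252))  # toy curves
print(find_matches(growth))
```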
If the symbols passed to Volatile have different price currencies, we first find the most common currency and set it as the default, then we download the last year of exchange-rate information and convert all prices to the default currency. Training and score computation are executed using the converted prices. Mathematically, if $P_t$ is the price of a certain stock in its own currency, we define the converted price $\tilde P_t = c_t\,P_t$, where $c_t$ is the exchange rate from the original currency to the default one. Then, the corresponding log-prices follow the relation $\log \tilde P_t = \log P_t + \log c_t$. Because we model $\log \tilde P_t$ as a Gaussian, $\log P_t$ is also a Gaussian, with the log-exchange rate subtracted from the mean and the same standard deviation. Therefore, after mean and standard deviation estimates of $\log \tilde P_t$ are computed, estimators for $\log P_t$ can be promptly obtained, from which log-Normal mean and standard deviation estimators of $P_t$ can in turn be produced.
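A small numpy sketch of the conversion and of the mapping back to the original currency; the series and the stand-in estimate are illustrative, not Volatile's actual data handling:

```python
import numpy as np

rng = np.random.default_rng(3)
prices = 100.0 + rng.standard_normal(252).cumsum()    # price in its own currency
fx = 0.9 + 0.001 * rng.standard_normal(252).cumsum()  # rate to the default currency

converted = prices * fx            # training runs on these
log_converted = np.log(converted)  # = log(prices) + log(fx)

# After fitting, an estimate mu_tilde of the converted log-price is mapped
# back to the original currency by subtracting the current log-rate; the
# standard deviation estimate is unchanged.
mu_tilde = log_converted.mean()    # stand-in for a fitted estimate
mu_original = mu_tilde - np.log(fx[-1])
```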
We compute a measure of risk for the portfolio as follows:

$$
\mathrm{risk} = \frac{1}{n}\,\mathrm{std}\!\Big(\sum_{\ell=1}^{n} u_\ell\,P_{\ell,T}\Big)
= \frac{1}{n}\,\sqrt{\sum_{\ell=1}^{n} \sum_{\ell'=1}^{n} u_\ell\,u_{\ell'}\,\mathrm{cov}\big(P_{\ell,T},\,P_{\ell',T}\big)},
$$

where $\mathrm{std}$ stands for standard deviation, $\mathrm{cov}$ for covariance, $n$ is the number of different stocks in the portfolio, and $P_{\ell,T}$ and $u_\ell$ are respectively the price and the number of owned units of stock $\ell$ at time $T$. We then make the practical assumption that

$$
\mathrm{cov}\big(P_{\ell,T},\,P_{\ell',T}\big) \approx
\delta_{\ell\ell'}\,\hat s_\ell(T)\,\hat s_{\ell'}(T)
+ \delta_{k(\ell)\,k(\ell')}\,\hat s^{\mathrm{i}}_{k(\ell)}(T)\,\hat s^{\mathrm{i}}_{k(\ell')}(T)
+ \delta_{v(k(\ell))\,v(k(\ell'))}\,\hat s^{\mathrm{s}}_{v(k(\ell))}(T)\,\hat s^{\mathrm{s}}_{v(k(\ell'))}(T)
+ \hat s^{\mathrm{m}}(T)^2,
$$

where $\delta$ denotes a Kronecker delta and $\hat s_\ell$, $\hat s^{\mathrm{i}}_k$, $\hat s^{\mathrm{s}}_v$, $\hat s^{\mathrm{m}}$ are price standard deviation estimators at stock, industry, sector and market levels. Although the covariance approximation above does not exactly correspond to the model in use, it is useful to associate higher risk with stocks of higher volatility, and to increase the risk when multiple stocks in the portfolio belong to the same category.
Notice that because standard deviations are multiplied by the numbers of owned units, and because higher prices usually tend to have higher standard deviations, a larger invested capital tends to be associated with a higher risk. Furthermore, we divide by the number of stocks in the portfolio to promote diversification as a way to lower risk.
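A sketch of the risk computation under this approximate covariance; the array layout and names are illustrative:

```python
import numpy as np

def portfolio_risk(units, s_stock, s_ind, s_sec, s_mkt, ind, sec):
    """Portfolio risk under the approximate covariance described above.

    units:   owned units per stock, shape (n,)
    s_stock: stock-level price std estimates, shape (n,)
    s_ind:   industry-level std estimate attached to each stock, shape (n,)
    s_sec:   sector-level std estimate attached to each stock, shape (n,)
    s_mkt:   market-level std estimate, scalar
    ind/sec: industry and sector memberships per stock, shape (n,)
    """
    n = len(units)
    same_ind = ind[:, None] == ind[None, :]
    same_sec = sec[:, None] == sec[None, :]
    cov = (np.diag(s_stock ** 2)               # stock-level term (diagonal only)
           + same_ind * np.outer(s_ind, s_ind)  # shared-industry term
           + same_sec * np.outer(s_sec, s_sec)  # shared-sector term
           + s_mkt ** 2)                        # market term, common to all pairs
    return np.sqrt(units @ cov @ units) / n

units = np.array([10.0, 5.0, 8.0])
risk = portfolio_risk(units,
                      s_stock=np.array([2.0, 1.5, 3.0]),
                      s_ind=np.array([1.0, 1.0, 1.2]),
                      s_sec=np.array([0.8, 0.8, 0.8]),
                      s_mkt=0.5,
                      ind=np.array([0, 0, 1]),
                      sec=np.array([0, 0, 0]))
```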