[Draft]Adjustment to the PCA Approach #41
base: develop
Add an option to use a variable value of explained variance (like 45%, 55%, 65%) by PCA factors.
Good progress 👌
I left some comments regarding the code that we can discuss.
Now we should polish the docstrings and start writing the sphinx docs.
# pylint: disable=invalid-name
# pylint: disable=R0913
Let's rather do
# pylint: disable=invalid-name, too-many-arguments
:param matrix: (pd.DataFrame) DataFrame with returns that need to be standardized.
:param vol_matrix: (pd.DataFrame) DataFrame with historical trading volume data.
:param k: (int) Look-back window used for volume moving average.
:return: (pd.DataFrame) a volume-adjusted returns dataFrame
:return: (pd.DataFrame) A volume-adjusted returns dataFrame.
# Fill missing data with preceding values
returns = matrix.dropna(axis=0)
Should we rather fill values?
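To make the difference concrete, here is a minimal sketch on a toy frame (the data and column values are made up for illustration). The code comment in the snippet talks about filling with preceding values, while `dropna(axis=0)` removes every row that has any missing entry.

```python
import numpy as np
import pandas as pd

# Toy returns with gaps; the values are illustrative only.
returns = pd.DataFrame({
    "EEM": [0.01, np.nan, 0.03, -0.02],
    "BND": [0.00, 0.01, np.nan, 0.02],
})

# What the snippet does: drop every row that contains a NaN.
dropped = returns.dropna(axis=0)

# What the comment describes: carry the preceding value forward.
filled = returns.ffill()

print(len(dropped), len(filled))
```

Forward-filling keeps all rows but repeats the previous return on the missing dates, so whether it is the right choice here (as opposed to, say, filling with zero) is a judgment call for the authors.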
# Standardized: fill nan with zero / std: fill nan with 1
This can probably be removed now.
So the output is a dataframe containing the weight for each asset in a portfolio for each eigen vector.

:param matrix: (pd.DataFrame) Dataframe with index and columns containing asset returns.
:param explained_var: (float) The user-defined explained variance criteria.
We should add that if this parameter is given, it will override the n_components
parameter. And also mention that it should be in the range from 0 to 1.
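A rough sketch of the behavior described in this comment, in case it helps when wording the docstring. The helper name `resolve_pca_setting` is hypothetical and not part of the PR, and treating the range as 0 (exclusive) to 1 (inclusive) is an assumption.

```python
from typing import Optional, Tuple, Union


def resolve_pca_setting(n_components: int = 15,
                        explained_var: Optional[float] = None) -> Tuple[str, Union[int, float]]:
    """
    Hypothetical helper: decide which criterion selects the PCA factors.

    :param n_components: (int) Number of PCA components to use.
    :param explained_var: (float) Target explained variance in the range from 0 to 1.
        If given, it overrides the n_components parameter.
    :return: (tuple) Name of the criterion in effect and its value.
    """
    if explained_var is None:
        return "n_components", n_components

    # Assumption: 0 is excluded and 1 is allowed.
    if not 0.0 < explained_var <= 1.0:
        raise ValueError("explained_var should be in the range from 0 to 1.")

    return "explained_var", explained_var
```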
tests/test_etf_approach.py
Outdated
Tests the PCA Strategy from the Other Approaches module.
"""

import unittest
import os
import pandas as pd
import numpy as np
from arbitragelab.other_approaches import ETFStrategy


class TestPCAStrategy(unittest.TestCase):
    """
    Tests PCAStrategy class.
The naming should be fixed.
tests/test_etf_approach.py
Outdated
# Check target weights
self.assertAlmostEqual(target_weights.mean()['EEM'], 0.333333, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['BND'], -0.5, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['SPY'], -0.38888, delta=1e-5)

# Check drift argument
target_weights = self.etf_strategy.get_signals(smaller_etf, smaller_dataset, k=1, corr_window=252,
                                               residual_window=60, sbo=1.25, sso=1.25, ssc=0.5,
                                               sbc=0.75, size=1, drift=True)

# Check target weights
self.assertAlmostEqual(target_weights.mean()['EEM'], 0.333333, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['BND'], -0.5, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['SPY'], -0.38888, delta=1e-5)
It's interesting that these test values are the same.
tests/test_etf_approach.py
Outdated
# Check target weights
self.assertAlmostEqual(target_weights.mean()['EEM'], 0.333333, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['BND'], -0.5, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['SPY'], -0.38888, delta=1e-5)
And these too. Can we pick the values of the parameters so the outputs are different?
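One way to guard against this in the tests could be an explicit check that two parameter sets do not produce the same expected values. The sketch below uses a made-up stand-in, `toy_target_weights`, and toy data; it is not the strategy's `get_signals` method and only illustrates the shape of the check.

```python
import numpy as np
import pandas as pd


def toy_target_weights(returns: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Hypothetical stand-in for a signal generator: +1 above threshold, -1 below -threshold."""
    signals = np.where(returns > threshold, 1.0,
                       np.where(returns < -threshold, -1.0, 0.0))
    return pd.DataFrame(signals, index=returns.index, columns=returns.columns)


# Deterministic toy data, shifted so neither asset is symmetric around zero.
grid = np.linspace(-2.0, 2.0, 41)
returns = pd.DataFrame({"EEM": grid + 1.0, "BND": grid - 1.0})

loose = toy_target_weights(returns, threshold=0.5).mean()
strict = toy_target_weights(returns, threshold=1.5).mean()

# Guard: True only if the two parameter sets give distinguishable mean weights.
outputs_differ = not np.allclose(loose, strict, atol=1e-5)
```

In a `unittest` case this could be expressed as `self.assertFalse(np.allclose(...))` placed next to the value checks, so a parameter that has no effect on the output is caught rather than silently passing.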
def __init__(self, n_components: int = 15):
    """
    Initialize PCA StatArb Strategy.
Docstrings in this class should be fixed.
First, the correlation matrix to get PCA components is calculated using a
corr_window parameter. From this, we get weights to calculate PCA factor returns.
These weights are being recalculated each time we generate (residual_window) number
of signals.
All these descriptions should be updated to match the ETF Approach.
Made some code fixes to this PR.
condition = min(np.cumsum(expl_variance), key=lambda x: abs(x - explained_var))
# The number of components to use
num_pc = np.where(np.cumsum(expl_variance) == condition)[0][0] + 1
This part is not working as expected, I'll show an example.
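The example itself is not included in this thread. As one possible illustration (an assumption, not necessarily the case the reviewer had in mind): picking the cumulative value closest to `explained_var` can return fewer components than are needed to reach the target. The sketch below compares that selection with a threshold-based one; `closest_num_components` and `threshold_num_components` are hypothetical helper names and the variance ratios are made up.

```python
import numpy as np


def closest_num_components(expl_variance, explained_var):
    """Selection as in the snippet: cumulative value closest to the target."""
    cumulative = np.cumsum(expl_variance)
    condition = min(cumulative, key=lambda x: abs(x - explained_var))
    return int(np.where(cumulative == condition)[0][0] + 1)


def threshold_num_components(expl_variance, explained_var):
    """Alternative: smallest number of components whose cumulative variance reaches the target."""
    cumulative = np.cumsum(expl_variance)
    idx = int(np.searchsorted(cumulative, explained_var, side="left"))
    # Clamp in case the target is above the total because of rounding.
    return min(idx + 1, len(cumulative))


# Made-up explained variance ratios, sorted in descending order.
ratios = np.array([0.40, 0.25, 0.20, 0.15])

closest = closest_num_components(ratios, 0.45)      # 1 component, explains only 40%
threshold = threshold_num_components(ratios, 0.45)  # 2 components, explains 65%
```

Which behavior is intended (closest to the target, or at least the target) is for the authors to decide; the docstring should state it either way.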
A function to calculate weights (scaled eigen vectors) to use for factor return calculation with
asymptotic PCA.

Weights are calculated from PCA components as:

Weight = Eigen vector / std.(R)

So the output is a dataframe containing the weight for each asset in a portfolio for each eigen vector.
Please adjust this docstring to reflect the idea behind the asym PCA.
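For reference, a minimal sketch of the formula quoted in the docstring, Weight = Eigen vector / std.(R), using standard PCA on the correlation matrix. It does not implement asymptotic PCA; `pca_weights` and the random input are hypothetical and only show what the current wording describes.

```python
import numpy as np
import pandas as pd


def pca_weights(returns: pd.DataFrame, n_components: int) -> pd.DataFrame:
    """Hypothetical helper: eigenvectors of the correlation matrix scaled by asset std."""
    std = returns.std()
    corr = returns.corr()

    # eigh returns eigenvalues in ascending order, so take the largest ones explicitly.
    eigvals, eigvecs = np.linalg.eigh(corr.values)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]  # shape: (n_assets, n_components)

    # Weight = Eigen vector / std.(R), applied asset by asset.
    weights = components / std.values[:, None]

    # One row per eigenvector, one column per asset.
    return pd.DataFrame(weights.T, columns=returns.columns)


rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0.0, 0.01, size=(250, 4)), columns=["A", "B", "C", "D"])
weights = pca_weights(returns, n_components=2)
```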
Purpose
Describe the problem or feature in addition to a link to the issues.
Approach
How does this change address the problem?
Tests for New Behavior
What new tests were added to cover new features or behaviors?
Checklist
Make sure you did the following (if applicable):
Run ./pylint to make sure code style is consistent.
Learning
Describe the research stage
Links to blog posts, patterns, libraries or addons used to solve this problem