Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft]Adjustment to the PCA Approach #41

Open
wants to merge 27 commits into
base: develop
Choose a base branch
from
Open

[Draft]Adjustment to the PCA Approach #41

wants to merge 27 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Mar 2, 2021

Purpose

Describe the problem or feature in addition to a link to the issues.

Approach

How does this change address the problem?

Tests for New Behavior

What new tests were added to cover new features or behaviors?

Checklist

Make sure you did the following (if applicable):

  • Added tests for any new features or behaviors.
  • Ran ./pylint to make sure code style is consistent.
  • Built and reviewed the docs.
  • Added a note to the changelog.

Learning

Describe the research stage

Links to blog posts, patterns, libraries or addons used to solve this problem

@ghost ghost added documentation Improvements or additions to documentation enhancement New feature or request labels Mar 2, 2021
@ghost ghost requested a review from PanPip March 2, 2021 16:38
@ghost ghost self-assigned this Mar 2, 2021
@ghost
Copy link
Author

ghost commented Mar 7, 2021

Add an option to use a variable value of explained variance(like 45%, 55%, 65%) by PCA factors.

Copy link
Contributor

@PanPip PanPip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress 👌

I left some comments regarding the code that we can discuss.

Now we should polish the docstrings and start writing the sphinx docs.

Comment on lines 17 to 18
# pylint: disable=invalid-name
# pylint: disable=R0913
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rather do

# pylint: disable=invalid-name, too-many-arguments

:param matrix: (pd.DataFrame) DataFrame with returns that need to be standardized.
:param vol_matrix: (pd.DataFrame) DataFrame with histoircal trading volume data.
:param k: (int) Look-back window used for volume moving average.
:return: (pd.DataFrame) a volume-adjusted returns dataFrame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:return: (pd.DataFrame) A volume-adjusted returns dataFrame.

Comment on lines 64 to 65
# Fill missing data with preceding values
returns = matrix.dropna(axis=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rather fill values?

Comment on lines 116 to 117
# Standardized: fill nan with zero / std: fill nan with 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be removed now.


So the output is a dataframe containing the weight for each asset in a portfolio for each eigen vector.

:param matrix: (pd.DataFrame) Dataframe with index and columns containing asset returns.
:param explained_var (float) The user-defined explained variance criteria.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add that if this parameter is given it will override the n_components parameter. And also mention that it should've in the range from 0 to 1.

Comment on lines 6 to 18
Tests the PCA Strategy from the Other Approaches module.
"""

import unittest
import os
import pandas as pd
import numpy as np
from arbitragelab.other_approaches import ETFStrategy


class TestPCAStrategy(unittest.TestCase):
"""
Tests PCAStrategy class.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming should be fixed.

Comment on lines 113 to 126
# Check target weights
self.assertAlmostEqual(target_weights.mean()['EEM'], 0.333333, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['BND'], -0.5, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['SPY'], -0.38888, delta=1e-5)

# Check drift argument
target_weights = self.etf_strategy.get_signals(smaller_etf, smaller_dataset, k=1, corr_window=252,
residual_window=60, sbo=1.25, sso=1.25, ssc=0.5,
sbc=0.75, size=1, drift=True)

# Check target weights
self.assertAlmostEqual(target_weights.mean()['EEM'], 0.333333, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['BND'], -0.5, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['SPY'], -0.38888, delta=1e-5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that these test values are the same.

Comment on lines 134 to 137
# Check target weights
self.assertAlmostEqual(target_weights.mean()['EEM'], 0.333333, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['BND'], -0.5, delta=1e-5)
self.assertAlmostEqual(target_weights.mean()['SPY'], -0.38888, delta=1e-5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And these too. Can we pick the values of the parameters so the outputs are different?


def __init__(self, n_components: int = 15):
"""
Initialize PCA StatArb Strategy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings in this class should be fixed.

Comment on lines 327 to 330
First, the correlation matrix to get PCA components is calculated using a
corr_window parameter. From this, we get weights to calculate PCA factor returns.
These weights are being recalculated each time we generate (residual_window) number
of signals.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these descriptions should be updated to match the ETF Approach.

Copy link
Contributor

@PanPip PanPip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some code fixes to this PR.

Comment on lines +132 to +134
condition = min(np.cumsum(expl_variance), key=lambda x: abs(x - explained_var))
# The number of components to use
num_pc = np.where(np.cumsum(expl_variance) == condition)[0][0] + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is not working as expected, I'll show an example.

Comment on lines +149 to +156
A function to calculate weights (scaled eigen vectors) to use for factor return calculation with
asymptotic PCA.

Weights are calculated from PCA components as:

Weight = Eigen vector / std.(R)

So the output is a dataframe containing the weight for each asset in a portfolio for each eigen vector.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please adjust this docstring to reflect the idea behind the asym PCA.

@PanPip PanPip assigned PanPip and unassigned ghost Mar 19, 2021
@PanPip PanPip removed their assignment Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant