
add risk parity loss function & risk budgeting allocator #98

Merged
merged 8 commits into from
Dec 19, 2020

Conversation

turmeric-blend
Contributor

added a risk parity loss function based on this paper; here is a stack exchange version (Ze Vinicius's answer) to make things simple. Please double check my formulas. Thanks.

@jankrepl
Owner

Thanks for the PR, I will check it asap:)

@turmeric-blend
Contributor Author

for this loss function, the line term_2 = torch.sum(b*torch.log(weights.permute((0,2,1))), dim=1).unsqueeze(dim=-1) is returning nan values, caused by torch.log(weights.permute((0,2,1))) taking the log of a negative value. These negative values come from the weights output by the NumericalMarkowitz allocator. Not sure why NumericalMarkowitz returns negative values, but they are small, on the order of 10^-8.

@turmeric-blend turmeric-blend changed the title from "add risk parity loss function" to "add risk parity loss function & risk budgeting allocator" on Nov 24, 2020
@turmeric-blend
Contributor Author

the risk budgeting allocator is not working yet and I am kind of stuck.

  1. apparently cp.log(w) is not DPP, and I can't seem to make it work with the log; refer to Section 4.1

  2. I tried without cp.log(w), using just w, but ran into this error:

Please consider re-formulating your problem so that it is 
always solvable or increasing the number of solver iterations.

@jankrepl
Owner

jankrepl commented Nov 25, 2020

Ok, I read through the theory:) Let's discuss the allocator first.

the risk budgeting allocator is not working yet and I am kind of stuck.

  1. apparently cp.log(w) is not DPP, and I can't seem to make it work with the log; refer to Section 4.1
  2. I tried without cp.log(w), using just w, but ran into this error:
Please consider re-formulating your problem so that it is 
always solvable or increasing the number of solver iterations.

I just did a sanity check and it seems to be working in the most basic setup:

import cvxpy as cp
import numpy as np

def check_squared_PD(matrix):
    """Check matrix squared is positive definite."""
    return np.all(np.linalg.eigvals(matrix @ matrix) > 0)

np.random.seed(98)

n_assets = 5
max_weight = 1

# Define objective and constraints
covmat_sqrt = cp.Parameter((n_assets, n_assets))                           
b = cp.Parameter(n_assets, nonneg=True)                                    
w = cp.Variable(n_assets)       

term_1 = 0.5 * cp.sum_squares(covmat_sqrt @ w)                             
term_2 = cp.sum(b @ cp.log(w))                     
objective = cp.Minimize(term_1 - term_2)                   
constraint = [
              cp.sum(w) == 1,
#               cp.sum(b) == 1, # IMO this constraint does not make sense because b is a Parameter and not a Variable
              w >= 0,
              w <= max_weight]

# Set parameter values
some_vector = np.random.random(n_assets)
b.value = some_vector / some_vector.sum()  # sums up to one and nonnegative

some_matrix = np.random.random((n_assets, n_assets))
covmat_sqrt.value = some_matrix.T @ some_matrix  # Positive definite by construction

# Define Problem
prob = cp.Problem(objective, constraint)

assert prob.is_dcp()
assert prob.is_dpp()  # This will guarantee that `cvxpylayers` will work = one can get gradients w.r.t. parameters

# Solve
prob.solve()
print(f"b: {b.value}\nCovmat is PD: {check_squared_PD(covmat_sqrt.value)}\nWeights: {w.value}")
b: [0.25433564 0.19699964 0.10462297 0.27111695 0.17292481]
Covmat is PD: True
Weights: [0.09108105 0.73822756 0.01435531 0.05227587 0.1040602 ]

So if I understand your problem correctly, you are saying that there are some covariance matrices and some b vectors for which cvxpy fails to find a solution? If that is the case, my first guess is that your covmat_sqrt is nasty and not positive definite.

Another question I have is why did you decide to go for this formulation? Is there any benefit? They should be all equivalent, or not?

@jankrepl
Owner

jankrepl commented Nov 25, 2020

for this loss function, the line term_2 = torch.sum(b*torch.log(weights.permute((0,2,1))), dim=1).unsqueeze(dim=-1) is returning nan values, caused by torch.log(weights.permute((0,2,1))) taking the log of a negative value. These negative values come from the weights output by the NumericalMarkowitz allocator. Not sure why NumericalMarkowitz returns negative values, but they are small, on the order of 10^-8.

The way you wrote the loss function (containing a log), it would be impossible to have w_i=0, which is really limiting in general. I saw that you dealt with this via the eps parameter; however, why not go for this formulation?

@turmeric-blend
Contributor Author

turmeric-blend commented Nov 26, 2020

For the allocator, I need to catch up on my maths to understand whether covmat_sqrt being positive definite or not has any effect, but I guess it's one of the issues, since the problem is solvable with prob.solve(). Besides that, I think my formulation of term_2 was wrong. I do have one comment on yours though: in term_2 = cp.sum(b @ cp.log(w)), if we are taking the matrix multiplication @, then cp.sum() would be redundant?

Another question I have is why did you decide to go for this formulation?

One reason is that it is one of the simpler formulations of risk budgeting that is convex, see Spinu's eqn 4; also check out this video and this paper on other formulations of risk budgeting/parity, some of which may not be convex.

Is there any benefit? They should be all equivalent, or not?

If you mean compared to Sharpe Ratio, Minimum Variance and other metrics, then Risk Budgeting/Parity brings something different in the sense that it always tries to have each asset's weight contribute an equal amount of risk to the portfolio.

Refer to this blog for a comparison between different metrics and Risk Parity. Note that they used Spinu's version.

If you mean compared to other Risk Budgeting/Parity formulations, then I think the difference is in the ease of computation and whether the formulation is convex or not; again, refer to the video and paper.

The way you wrote the loss function (containing log) it would be impossible to have w_i=0 which is really limiting in general. I saw that you dealt with this via the eps parameter, however, why not to go for this formulation?

Since both the allocator and the loss function express the same Risk Budgeting/Parity concept, it seems natural to have the same formulas for both.

Anyway, as you can see, my theory is not strong enough to implement these formulas effectively. I think I will leave it to you to implement the allocator and loss function for Risk Budgeting/Parity, is that okay? I think I can learn a lot from the way you would implement it, and maybe next time I'll PR actually working code 🤣. It seems like your version of the allocator is basically done? And I'll leave it to you to decide which formulation to go for in the loss function. Feel free to let me know if there's anything I can help with.

@jankrepl
Owner

For the allocator, I need to catch up on my maths to understand whether covmat_sqrt being positive definite or not has any effect, but I guess it's one of the issues, since the problem is solvable with prob.solve(). Besides that, I think my formulation of term_2 was wrong. I do have one comment on yours though: in term_2 = cp.sum(b @ cp.log(w)), if we are taking the matrix multiplication @, then cp.sum() would be redundant?

You are totally right, the cp.sum is redundant:) Note that I just replaced * with @ because in the background cvxpy does it automatically. See the warning below:

This use of ``*`` has resulted in matrix multiplication.
Using ``*`` for matrix multiplication has been deprecated since CVXPY 1.1.
    Use ``*`` for matrix-scalar and vector-scalar multiplication.
    Use ``@`` for matrix-matrix and matrix-vector multiplication.
    Use ``multiply`` for elementwise multiplication.

  warnings.warn(__STAR_MATMUL_WARNING__, UserWarning)

Anyway, as you can see, my theory is not strong enough to implement these formulas effectively. I think I will leave it to you to implement the allocator and loss function for Risk Budgeting/Parity, is that okay? I think I can learn a lot from the way you would implement it, and maybe next time I'll PR actually working code 🤣. It seems like your version of the allocator is basically done? And I'll leave it to you to decide which formulation to go for in the loss function. Feel free to let me know if there's anything I can help with.

Come on, you wrote everything!!! You did a great job:) I will finalize this PR once I have a bit of time. I think it is a great addition! The allocator is more or less done and for the loss I will just try to use the other formula without log.

@turmeric-blend
Contributor Author

Hi, here is my take on this eqn without the log for the risk parity/budgeting loss function. I'm not really sure whether there is a proper way to reduce the last part of the eqn (the two sums), but here is my attempt:

import torch

n_assets = 5
weights = torch.rand((2,n_assets))
weights = weights.unsqueeze(dim=1)
covar = torch.rand((2,n_assets,n_assets))

volatility = torch.sqrt(torch.matmul(weights, torch.matmul(covar, weights.permute((0,2,1)))))
volatility.shape # (2,1,1)

term_top = (covar * weights)
term_top.shape # (2,5,5)

c = term_top / volatility
c.shape # (2,5,5)

term_front = volatility/n_assets
term_front.shape # (2,1,1)

term_back = weights * c
term_back.shape # (2,5,5)

loss = torch.sum(torch.sum((term_front - term_back)**2, dim=-1), dim=-1)
loss.shape # (2,)

On another note, I think it would be nice if the Risk Budgeting Allocator was included in the Resample Allocator as well :)

@jankrepl
Owner

jankrepl commented Dec 2, 2020

Hi, here is my take on this eqn without log for the risk parity/budgeting loss function. [...]

On another note, I think it would be nice if the Risk Budgeting Allocator was included in the Resample Allocator as well :)

Cool!! It looks correct at first sight; however, I would suggest testing it programmatically. Anyway, if you have some spare time, feel free to make it part of the actual source code and finalize this PR yourself! Especially if you need this change quickly and do not want to wait for me to have some spare time to do it:)

The steps to finalize the PR

  • Put your suggested code for the loss into the source code

  • Write unit tests for both the allocator and the loss. Currently the coverage is 100% and it would be cool if we could keep it that way. If it is too overwhelming feel free to ask any questions. Ideas for tests:

    • The allocator needs to sum up to 1 and the shape needs to be correct no matter what the input is
    • The loss should be 0 if e.g. the covariance matrix is torch.eye and the weights are 1/N.

    For some inspiration have a look at existing tests

  • Fix formatting issues (see travis build information and .travis.yml to see what is run)

    • pydocstyle
    • flake8
  • Document the new features in the documentation

As said above, if you want to do it, feel free to ask any questions and I will help!
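The second test idea above can be sketched as a unit test right away. The following is my illustration, not code from the PR: it inlines the option-2 loss formulation discussed in this thread instead of importing the deepdow classes, and risk_parity_loss is a hypothetical stand-in name.

```python
import torch

def risk_parity_loss(weights, covar):
    """Option-2 risk parity loss from this thread (hypothetical stand-in for the deepdow loss)."""
    n_assets = weights.shape[-1]
    w = weights.unsqueeze(1)                            # (n_samples, 1, n_assets)
    vol = torch.sqrt(w @ covar @ w.transpose(1, 2))     # (n_samples, 1, 1)
    c = (covar * w) / vol                               # (n_samples, n_assets, n_assets)
    budget = w @ c                                      # (n_samples, 1, n_assets)
    return torch.sum((vol / n_assets - budget) ** 2, dim=-1).view(-1)

def test_zero_loss_for_identity_covariance():
    """Loss should vanish for an equally weighted portfolio under identity covariance."""
    n_samples, n_assets = 4, 5
    weights = torch.ones(n_samples, n_assets) / n_assets
    covar = torch.eye(n_assets).expand(n_samples, -1, -1)
    loss = risk_parity_loss(weights, covar)
    assert loss.shape == (n_samples,)
    assert torch.allclose(loss, torch.zeros(n_samples), atol=1e-6)

test_zero_loss_for_identity_covariance()
```

In the real PR the loss would of course be imported from deepdow.losses instead of being redefined.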

@turmeric-blend
Contributor Author

turmeric-blend commented Dec 3, 2020

By the way, before continuing, I just want to ask about the weights output from the NumericalMarkowitz allocator having small negative values: shouldn't they all be positive?

for this loss function, the line term_2 = torch.sum(b*torch.log(weights.permute((0,2,1))), dim=1).unsqueeze(dim=-1) is returning nan values, caused by torch.log(weights.permute((0,2,1))) taking the log of a negative value. These negative values come from the weights output by the NumericalMarkowitz allocator. Not sure why NumericalMarkowitz returns negative values, but they are small, on the order of 10^-8.

you mentioned something

The way you wrote the loss function (containing a log), it would be impossible to have w_i=0, which is really limiting in general.

but I still didn't get it :/

@jankrepl
Owner

jankrepl commented Dec 3, 2020

By the way, before continuing, I just want to ask about the weights output from the NumericalMarkowitz allocator having small negative values: shouldn't they all be positive?

for this loss function, the line term_2 = torch.sum(b*torch.log(weights.permute((0,2,1))), dim=1).unsqueeze(dim=-1) is returning nan values, caused by torch.log(weights.permute((0,2,1))) taking the log of a negative value. These negative values come from the weights output by the NumericalMarkowitz allocator. Not sure why NumericalMarkowitz returns negative values, but they are small, on the order of 10^-8.

you mentioned something

The way you wrote the loss function (containing a log), it would be impossible to have w_i=0, which is really limiting in general.

but I still didn't get it :/

Sorry for the confusion. NumericalMarkowitz can indeed return small negative values because everything is solved numerically. I guess one could even create an issue for this and solve it with some "manual" rescaling. However, if the optimization algorithm converged, this should be a mostly cosmetic adjustment that is not going to affect the loss.

Nonpositive weights can become an annoying problem when the loss function only works on "long only" portfolios, for example when the loss formula contains logarithms of weights or similar (as in your initial implementation of RiskParity). The annoying thing about logarithms is that they are not defined even at 0. However, it is totally feasible for an allocation layer to give 0 weight to multiple assets. With that being said, the new formulation we discussed above is better because small negative or zero weights are not going to lead to errors.
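To make the logarithm issue concrete, here is a tiny check (my illustration, not code from the PR):

```python
import torch

# Weights as a numerical allocator might emit them: one exact zero
# and a tiny negative value caused by solver round-off.
w = torch.tensor([0.5, 0.0, -1e-8])

# log(0) is -inf and log of a negative number is nan; either value
# poisons any loss that sums over torch.log(weights).
print(torch.log(w))
```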

I hope what I wrote makes sense:) And hopefully I answered your question.

@turmeric-blend
Contributor Author

turmeric-blend commented Dec 4, 2020

Sorry for the confusion. NumericalMarkowitz can indeed return small negative values because everything is solved numerically. I guess one could even create an issue for this and solve it with some "manual" rescaling. However, if the optimization algorithm converged, this should be a mostly cosmetic adjustment that is not going to affect the loss.

Since any allocator layer using a numerical method can run into this small-negative-values issue, and the goal is for the optimization algorithm to converge anyway, why not perform manual rescaling (clipping weights at a minimum of 0) at the output as a standard across all numerically based allocator layers, since it doesn't really affect performance? This way we can have more flexibility in the loss functions.

At least this way loss functions can be designed without worrying about negative weights (the intuition/logic is still there: weights can't be negative). Yes, 0 is still an issue for log, but that would be a loss-function-specific issue.
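A minimal sketch of what such a rescaling could look like (my illustration; rescale_weights is a hypothetical helper, not an existing deepdow function):

```python
import torch

def rescale_weights(w):
    """Hypothetical post-processing: clip tiny numerical negatives to 0 and renormalize rows to sum to 1."""
    w = torch.clamp(w, min=0.0)
    return w / w.sum(dim=-1, keepdim=True)

# Tiny solver noise: the -1e-8 weight gets clipped and the row renormalized.
w = torch.tensor([[0.7, 0.3 + 1e-8, -1e-8]])
clean = rescale_weights(w)
print(clean)  # approximately [[0.7, 0.3, 0.0]]
```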

@jankrepl
Owner

jankrepl commented Dec 4, 2020

Sorry for the confusion. NumericalMarkowitz can indeed return small negative values because everything is solved numerically. I guess one could even create an issue for this and solve it with some "manual" rescaling. However, if the optimization algorithm converged, this should be a mostly cosmetic adjustment that is not going to affect the loss.

Since any allocator layer using a numerical method can run into this small-negative-values issue, and the goal is for the optimization algorithm to converge anyway, why not perform manual rescaling (clipping weights at a minimum of 0) at the output as a standard across all numerically based allocator layers, since it doesn't really affect performance? This way we can have more flexibility in the loss functions.

At least this way loss functions can be designed without worrying about negative weights (the intuition/logic is still there: weights can't be negative). Yes, 0 is still an issue for log, but that would be a loss-function-specific issue.

I totally agree. I created issue #102 and I suggest we address it in a totally separate PR. Does that sound good? When it comes to this PR, I guess we can assume for now that the weights are "correct".

@turmeric-blend
Contributor Author

Here is the Allocator Layer:

class NumericalRiskBudgeting(nn.Module):
    """Convex optimization layer stylized into portfolio optimization problem.

    Parameters
    ----------
    n_assets : int
        Number of assets.
    max_weight : float
        Maximum weight of a single asset; defaults to 1.

    Attributes
    ----------
    cvxpylayer : CvxpyLayer
        Custom layer used by a third party package called cvxpylayers.

    References
    ----------
    [1] https://github.com/cvxgrp/cvxpylayers
    [2] https://poseidon01.ssrn.com/delivery.php?ID=390114006122099106072078096104069112056050065027039051078098000067064086112072124109005035107005061025044074073029126125070080038037078052000085104024102000110030090033033078083120080087081020021019107089002123119030075084115101026124093011116092064098&EXT=pdf
    
    """
    
    def __init__(self, n_assets, max_weight=1):
        """Construct."""
        super().__init__()
        covmat_sqrt = cp.Parameter((n_assets, n_assets))                           
        b = cp.Parameter(n_assets, nonneg=True)  

        w = cp.Variable(n_assets)

        risk = 0.5 * cp.sum_squares(covmat_sqrt @ w)                             
        budget = b @ cp.log(w) 
        objective = cp.Minimize(risk - budget)                   
        constraint = [cp.sum(w) == 1,
                      w >= 0,
                      w <= max_weight]

        prob = cp.Problem(objective, constraint)

        assert prob.is_dpp()

        self.cvxpylayer = CvxpyLayer(prob, parameters=[covmat_sqrt, b], variables=[w])

    def forward(self, covmat_sqrt, b):
        """Perform forward pass.

        Parameters
        ----------
        covmat_sqrt : torch.Tensor
            Of shape (n_samples, n_assets, n_assets) representing the square root of the covariance matrix.
        b : torch.Tensor
            Of shape (n_samples, n_assets) representing the risk budget; the risk
            contribution of each component (asset) is equal to its budget, refer [2]

        Returns
        -------
        weights : torch.Tensor
            Of shape (n_samples, n_assets) representing the optimal weights as determined by the convex optimizer.
        
        """
        return self.cvxpylayer(covmat_sqrt, b)[0]

Here is the Loss Function:

For the loss function I am a bit sceptical about my implementation, especially option 1 vs option 2 (see the code below). The formula (option 1) states a dot product followed by a square and then a sum, but this sum is required across the last 2 dimensions, which is a bit unusual to me. On the other hand, option 2 seems more natural to me.

class RiskParity(Loss):
    """Risk Parity Portfolio.
    
    Parameters
    ----------
    returns_channel : int
        Which channel of the `y` target represents returns.
    input_type : str, {'log', 'simple'}
        What type of returns are we dealing with in `y`.
    output_type : str, {'log', 'simple'}
        What type of returns are we dealing with in the output.
    
    References
    ----------
    [1] https://quant.stackexchange.com/questions/3114/how-to-construct-a-risk-parity-portfolio/3115#3115
    
    """

    def __init__(self, returns_channel=0, input_type='log', output_type='simple'):
        self.returns_channel = returns_channel
        self.covariance_layer = CovarianceMatrix(sqrt=False)
        self.input_type = input_type
        self.output_type = output_type
        
    def __call__(self, weights, y):
        """.
        
        Parameters
        ----------
        weights : torch.Tensor
            Tensor of shape `(n_samples, n_assets)` representing the predicted weights by our portfolio optimizer.
        y : torch.Tensor
            Tensor of shape `(n_samples, n_channels, horizon, n_assets)` representing the evolution over the next
            `horizon` timesteps.
        
        Returns
        -------
         torch.Tensor
            Tensor of shape `(n_samples,)` representing the per sample risk parity.
        
        """
        n_assets = weights.shape[-1]
        covar = self.covariance_layer(y[:, self.returns_channel, ...]) # (n_samples, n_assets, n_assets)
        
        weights = weights.unsqueeze(dim=1)
        volatility = torch.sqrt(torch.matmul(weights, torch.matmul(covar, weights.permute((0,2,1))))) # (n_samples, 1, 1)
        c = (covar * weights) / volatility # (n_samples, n_assets, n_assets)
        risk = volatility/n_assets # (n_samples, 1, 1)
        
        # option 1         <------
        budget = weights * c # (n_samples, n_assets, n_assets)
        rp = torch.sum(torch.sum((risk - budget)**2, dim=-1), dim=-1) # (n_samples,)
        
        # option 2         <------
        # budget = torch.matmul(weights, c) # (n_samples, 1, n_assets)
        # rp = torch.sum((risk - budget)**2, dim=-1).view(-1) # (n_samples,)
                
        return rp
        
    def __repr__(self):
        """Generate representation string."""
        return "{}(returns_channel={}, input_type='{}', output_type='{}', eps={})".format(self.__class__.__name__,
                                                                                          self.returns_channel,
                                                                                          self.input_type,
                                                                                          self.output_type)

@jankrepl
Owner

jankrepl commented Dec 7, 2020

Amazing work:)

The allocator is perfect. Regarding the loss, see below my remarks

  • since you are not computing portfolio returns anywhere, you can totally drop the input_type and output_type from the __init__ as well as from the __repr__ (together with eps)
  • I think only option 2 is correct; I validated it by writing the forward pass via list comprehensions, which are easier to follow but probably less efficient.
def stupid_fpass(self, w, y):
    n_samples, n_assets = w.shape
    covar = self.covariance_layer(y[:, self.returns_channel, ...])

    var = torch.cat([(w[[i]] @ covar[i]) @ w[[i]].permute(1, 0) for i in range(n_samples)], dim=0)
    vol = torch.sqrt(var)

    lhs = vol / n_assets
    rhs = torch.cat([(1 / vol[i]) * w[[i]] * (w[[i]] @ covar[i]) for i in range(n_samples)], dim=0)

    res = torch.tensor([((lhs[i] - rhs[i]) ** 2).sum() for i in range(n_samples)])

    return res
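As a quick self-contained cross-check (my sketch, not code from the PR), the loop from stupid_fpass can be compared against the vectorized option-2 formulation on random positive semi-definite covariance matrices; the covariance layer is skipped and covar is built directly:

```python
import torch

torch.manual_seed(42)
n_samples, n_assets = 4, 6

w = torch.rand(n_samples, n_assets)
w = w / w.sum(dim=1, keepdim=True)             # valid long-only weights
A = torch.rand(n_samples, n_assets, n_assets)
covar = A @ A.transpose(1, 2)                  # PSD by construction

# Loop-based pass (mirrors stupid_fpass without the covariance layer)
var = torch.cat([(w[[i]] @ covar[i]) @ w[[i]].T for i in range(n_samples)], dim=0)
vol = torch.sqrt(var)
lhs = vol / n_assets
rhs = torch.cat([(1 / vol[i]) * w[[i]] * (w[[i]] @ covar[i]) for i in range(n_samples)], dim=0)
loop_loss = torch.stack([((lhs[i] - rhs[i]) ** 2).sum() for i in range(n_samples)])

# Vectorized option 2
wu = w.unsqueeze(1)
vol_v = torch.sqrt(wu @ covar @ wu.transpose(1, 2))
c = (covar * wu) / vol_v
vec_loss = torch.sum((vol_v / n_assets - wu @ c) ** 2, dim=-1).view(-1)

assert torch.allclose(loop_loss, vec_loss, atol=1e-5)
```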

Another validation I did was generating the input data in the following fashion.

import torch

n_samples = 15
n_channels = 1
horizon = 1000
n_assets = 7

w = torch.ones((n_samples, n_assets)) / n_assets

normal = torch.distributions.MultivariateNormal(torch.zeros(n_assets), torch.eye(n_assets))
y = normal.sample((n_samples, 1, horizon,))

So as we increase the horizon, the sample covariance matrix should get closer and closer to the identity matrix. And
since we are holding an equally weighted portfolio, the loss should go to zero, which is the case for option 2.
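Completing that validation in a self-contained way (my sketch; the sample covariance is computed inline instead of via deepdow's CovarianceMatrix layer), the option-2 loss indeed comes out close to zero for a long horizon:

```python
import torch

torch.manual_seed(0)
n_samples, horizon, n_assets = 15, 1000, 7

w = torch.ones(n_samples, n_assets) / n_assets
normal = torch.distributions.MultivariateNormal(torch.zeros(n_assets), torch.eye(n_assets))
y = normal.sample((n_samples, horizon))          # (n_samples, horizon, n_assets)

# Inline sample covariance, a stand-in for the CovarianceMatrix layer
yc = y - y.mean(dim=1, keepdim=True)
covar = yc.transpose(1, 2) @ yc / (horizon - 1)  # close to identity for long horizons

# Option-2 loss
wu = w.unsqueeze(1)
vol = torch.sqrt(wu @ covar @ wu.transpose(1, 2))
c = (covar * wu) / vol
loss = torch.sum((vol / n_assets - wu @ c) ** 2, dim=-1).view(-1)

print(loss.max())  # small, and it shrinks further as the horizon grows
```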

Other than the remarks above, I did not notice any other problems:) Thank you very much! It is a cool addition:) Do you want to give it a shot and make it part of the source code (+ the other things I described in #98 (comment))?:)

@turmeric-blend
Contributor Author

turmeric-blend commented Dec 11, 2020

Hi @jankrepl, thanks for your review. I think I will let you implement the testing part, as I am in no rush. This was more of a learning experience for me, although I feel bad for leaving it like this. I've recently been caught up in some other stuff, so I'll have to leave it as is for now, but when I am free again I'll definitely help out where possible (although it might be late). Again, thanks for your patience!! It was much appreciated!! :)

@jankrepl
Owner

@turmeric-blend Sure, no problem! I will do it:)))))

@codecov

codecov bot commented Dec 19, 2020

Codecov Report

Merging #98 (7e5c8b6) into master (cab9cac) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master       #98   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           18        18           
  Lines         1733      1762   +29     
=========================================
+ Hits          1733      1762   +29     
Impacted Files Coverage Δ
deepdow/layers/__init__.py 100.00% <100.00%> (ø)
deepdow/layers/allocate.py 100.00% <100.00%> (ø)
deepdow/losses.py 100.00% <100.00%> (ø)
deepdow/callbacks.py 100.00% <0.00%> (ø)
deepdow/layers/misc.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cab9cac...4ce881a.

@jankrepl jankrepl merged commit 478776c into jankrepl:master Dec 19, 2020