Gram-based CD/BCD/FISTA solvers for (group)Lasso when n_samples >> n_features #4

Draft
wants to merge 23 commits into
base: main


PABannier
Collaborator

@PABannier PABannier commented Apr 20, 2022

The goal of this PR is to add a coordinate descent (CD) / block coordinate descent (BCD) solver for the regime n_samples >> n_features.
Such problems are solved much faster by precomputing the Gram matrix XtX once and updating the gradient (rather than the residuals) at every CD update.
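
For readers following along, the idea can be sketched as below. This is a minimal NumPy illustration, not the code of this PR: the function name `gram_cd_lasso_sketch`, the objective scaling 1/(2 n_samples) ||y - Xw||^2 + alpha ||w||_1, and the stopping rule on the largest coordinate update are all assumptions made for the example.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * |.|."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def gram_cd_lasso_sketch(X, y, alpha, max_iter=1000, tol=1e-10):
    """Hypothetical helper: cyclic CD on the Lasso using only X.T @ X and X.T @ y."""
    n_samples, n_features = X.shape
    gram = X.T @ X / n_samples   # (n_features, n_features), computed once
    Xty = X.T @ y / n_samples    # (n_features,), computed once
    w = np.zeros(n_features)
    grad = -Xty.copy()           # gradient of the datafit at w = 0
    for _ in range(max_iter):
        max_update = 0.0
        for j in range(n_features):
            if gram[j, j] == 0.0:
                continue  # all-zero feature, its coefficient stays at 0
            old = w[j]
            w[j] = soft_threshold(old - grad[j] / gram[j, j], alpha / gram[j, j])
            delta = w[j] - old
            if delta != 0.0:
                # O(n_features) gradient update instead of an
                # O(n_samples) residual update
                grad += delta * gram[:, j]
                max_update = max(max_update, abs(delta))
        if max_update <= tol:
            break
    return w
```

After the one-off cost of forming the Gram matrix, each coordinate update touches a vector of length n_features only, which is where the gain comes from when n_samples is much larger.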

A quick experiment with 1e6 samples and 300 features:

########
Lasso
########

Celer: 5.43s
Gram: 1.89s

###########
Group Lasso
###########

Celer: 43.41s
Gram: 4.23s

@mathurinm mathurinm marked this pull request as draft April 21, 2022 06:20
@mathurinm mathurinm changed the title from "Faster CD solver when n_samples >> n_features" to "Implement Gram-based CD/BCD solver when n_samples >> n_features" Apr 21, 2022
@mathurinm
Collaborator

@PABannier thanks a lot! I tried it and it really shines on the data by @sehoff.

Caveat: the stopping criteria are not the same. Monitoring the decrease of the primal objective is a much looser criterion than checking the duality gap. I adjusted the tolerance manually to obtain similar results.
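
To make the caveat concrete, here is a minimal sketch of a duality gap computation for the Lasso. It assumes the objective 1/(2 n_samples) ||y - Xw||^2 + alpha ||w||_1 and builds a dual feasible point by rescaling the residual; the function name `lasso_duality_gap` is hypothetical and this is not necessarily how either solver computes its criterion.

```python
import numpy as np


def lasso_duality_gap(X, y, w, alpha):
    """Hypothetical helper: primal minus dual objective at w (always >= 0)."""
    n_samples = X.shape[0]
    residual = y - X @ w
    primal = residual @ residual / (2 * n_samples) + alpha * np.abs(w).sum()
    # Rescale the residual so that ||X.T @ theta||_inf <= n_samples * alpha,
    # i.e. so that theta is dual feasible.
    scale = max(1.0, np.max(np.abs(X.T @ residual)) / (n_samples * alpha))
    theta = residual / scale
    dual = (y @ y - (y - theta) @ (y - theta)) / (2 * n_samples)
    return primal - dual
```

The gap upper-bounds the suboptimality of the primal objective, whereas a small decrease between two iterations only says that the solver has slowed down.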

@sehoff

sehoff commented Apr 22, 2022

Thanks a lot! I tried it out (on slightly different data than I provided), and can confirm sizable speedups. I use warm-starts for celer, so I guess the figures are even biased towards celer.

Note: with the current settings of the Gram solver, res = gram_group_lasso(X, y, a, groups=grps, tol=1e-10, max_iter=10_000, check_freq=10), and very small alphas (alpha_max/1000), the solver does not converge within max_iter iterations.

Do you have any suggestion for a reasonably looser tolerance? With tol=1e-9 it converges after ~3500 iterations.
[Figure: speed_comparison_celer_gram]
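
For context on what alpha_max/1000 means here, a minimal sketch of how alpha_max and a geometric grid down to alpha_max/1000 could be computed for the group Lasso. It assumes unit group weights, the objective 1/(2 n_samples) ||y - Xw||^2 + alpha * sum_g ||w_g||_2, and groups given as a list of index arrays; the helper names and the groups format are assumptions, not the API used above.

```python
import numpy as np


def group_lasso_alpha_max(X, y, groups):
    """Hypothetical helper: smallest alpha for which w = 0 is a solution."""
    n_samples = X.shape[0]
    corr = X.T @ y / n_samples
    return max(np.linalg.norm(corr[g]) for g in groups)


def alpha_grid(alpha_max, ratio=1e-3, n_alphas=10):
    """Geometric grid from alpha_max down to ratio * alpha_max."""
    return np.geomspace(alpha_max, ratio * alpha_max, num=n_alphas)
```

The small end of such a grid is where the problem is least regularized and solvers typically need the most iterations.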

skglm/solvers/gram.py (outdated, resolved review thread)
@mathurinm
Collaborator

@sehoff hard to tell without the data, but consider increasing max_iter: the solver is probably not very far from convergence. You can run it in verbose mode to see whether you stop far from convergence or not.

It's easy to add warm start to the gram solvers, it should be done shortly. Beware in your comparison that the stopping criteria are not the same.

@PABannier
Collaborator Author

PABannier commented Apr 24, 2022

Quick update:

  • Updated stopping criterion to duality gap
  • Gram CD
  • Gram BCD
  • Gram CD - FISTA
  • Gram BCD - FISTA
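
As an illustration of the FISTA variants listed above, here is a minimal sketch of FISTA on the Lasso using only the Gram matrix. It is not the implementation in this PR: the name `gram_fista_lasso_sketch`, the objective scaling 1/(2 n_samples) ||y - Xw||^2 + alpha ||w||_1, and the stopping rule on the iterate change are assumptions.

```python
import numpy as np


def gram_fista_lasso_sketch(X, y, alpha, max_iter=5000, tol=1e-12):
    """Hypothetical helper: FISTA with gradients computed from X.T @ X."""
    n_samples, n_features = X.shape
    gram = X.T @ X / n_samples
    Xty = X.T @ y / n_samples
    lipschitz = np.linalg.norm(gram, ord=2)  # largest eigenvalue of the Gram
    w = np.zeros(n_features)
    if lipschitz == 0.0:
        return w  # X is all zeros, w = 0 is optimal
    z = w.copy()  # extrapolated point
    t = 1.0
    for _ in range(max_iter):
        w_old = w
        grad = gram @ z - Xty                # O(n_features ** 2) per iteration
        step = z - grad / lipschitz
        w = np.sign(step) * np.maximum(np.abs(step) - alpha / lipschitz, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = w + (t - 1.0) / t_new * (w - w_old)  # momentum step
        t = t_new
        if np.max(np.abs(w - w_old)) <= tol:
            break
    return w
```

Each iteration costs one product with the (n_features, n_features) Gram matrix and never touches the n_samples dimension.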

skglm/solvers/gram.py (outdated, resolved review thread)
skglm/solvers/gram.py (outdated, resolved review thread)
@mathurinm mathurinm changed the title from "Implement Gram-based CD/BCD solver when n_samples >> n_features" to "Gram-based CD/BCD/FISTA solvers for (group)Lasso when n_samples >> n_features" Apr 25, 2022
@mathurinm
Collaborator

mathurinm commented Apr 25, 2022

@sehoff support for weights (including infinite ones) is here. Let us know how it works!
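
One way infinite weights can be handled in a Gram-based BCD is sketched below: block soft-thresholding with an infinite threshold always returns zero, so the corresponding group is never activated. This is an illustrative assumption, not necessarily how the PR does it; the name `gram_bcd_group_lasso_sketch`, the groups format (list of index arrays) and the objective 1/(2 n_samples) ||y - Xw||^2 + alpha * sum_g weights[g] * ||w_g||_2 are hypothetical.

```python
import numpy as np


def block_soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_2; tau may be np.inf."""
    norm = np.linalg.norm(x)
    if norm <= tau:  # always true when tau is infinite
        return np.zeros_like(x)
    return (1.0 - tau / norm) * x


def gram_bcd_group_lasso_sketch(X, y, alpha, groups, weights,
                                max_iter=1000, tol=1e-10):
    """Hypothetical helper: proximal BCD on the weighted group Lasso."""
    n_samples, n_features = X.shape
    gram = X.T @ X / n_samples
    grad = -X.T @ y / n_samples  # gradient of the datafit at w = 0
    w = np.zeros(n_features)
    # Per-block Lipschitz constants: spectral norm of each diagonal block.
    lipschitz = [np.linalg.norm(gram[np.ix_(g, g)], ord=2) for g in groups]
    for _ in range(max_iter):
        max_update = 0.0
        for g, weight, lip in zip(groups, weights, lipschitz):
            if lip == 0.0:
                continue
            old = w[g].copy()
            w[g] = block_soft_threshold(old - grad[g] / lip, alpha * weight / lip)
            delta = w[g] - old
            if np.any(delta):
                grad += gram[:, g] @ delta  # gradient update via Gram columns
                max_update = max(max_update, np.max(np.abs(delta)))
        if max_update <= tol:
            break
    return w
```

For example, passing weights = np.array([1.0, 1.0, np.inf]) keeps the third group at exactly zero for any alpha > 0.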

@sehoff

sehoff commented Apr 26, 2022

> @sehoff hard to tell without the data but consider increasing max_iter, probably the solver is not very far from convergence. You can run it in verbose mode to see if you stop far from convergence or not.
>
> It's easy to add warm start to the gram solvers, it should be done shortly. Beware in your comparison that the stopping criteria are not the same.

Increasing max_iter works for me, thank you!

@PABannier
Collaborator Author

@sehoff For very small alphas, I'd recommend using FISTA instead. Have a look at the gram_fista_group_lasso function!

@sehoff

sehoff commented Apr 26, 2022

> @sehoff support for weights (infinite ones too) is here. Let us know how it works !
> @sehoff For very small alphas, I'd recommend using FISTA instead. Have a look at the gram_fista_group_lasso function!

Here are the results of my comparisons, using the data I provided in the Dropbox and tol=1e-8 in all solvers. The results shown refer to Case 3.1, i.e., with one weight set to infinity. As a side remark, this highlights that all three solvers can handle infinite weights!
Bottom line: especially for (very) small alphas, the Gram-based solvers significantly outperform the one in celer. So, depending on the alpha grid one ends up searching, each solver has its advantages.
Comparison 1:
[Figure: speed_comparison_celer_gram]
Comparison 2 (celer is left out here because it takes quite long to converge for alpha_max/1000):
[Figure: speed_comparison_bcd_fista]

@PABannier
Collaborator Author

@sehoff indeed, Celer is particularly efficient in settings where n_features >> n_samples. It implements a working-set strategy, which is particularly useful in the high-regularization regime (where there are few active features). In your first figure, at low levels of regularization, Celer's working-set strategy is less useful because your design matrix has far more samples than features.

Besides, for your data, the Gram solver has a cheaper update: X.T @ X, the quantity needed at every gradient update, has size (n_features, n_features) and does not depend on n_samples.

@PABannier
Collaborator Author

@sehoff Thanks for the plots. They are very insightful for us. By the way, if you ever find yourself in need of an automated way of benchmarking optimization routines, have a look at https://github.com/benchopt/benchopt

@PABannier
Collaborator Author

@mathurinm do we want to keep this PR open? FISTA and the Gram solver have been merged.

@mathurinm
Collaborator

This one supports groups while the merged PR doesn't, so it does not hurt to leave it open IMO

@mathurinm
Collaborator

Note: this PR was opened from a branch of the skglm repo; it should instead be opened from a branch of your fork, @PABannier.

4 participants