Gram-based CD/BCD/FISTA solvers for (group)Lasso when n_samples >> n_features
#4
@PABannier thanks a lot! I tried it and it really shines on the data by @sehoff. Caveat: the stopping criteria are not the same. Monitoring the primal decrease is much looser than checking the duality gap. I adjusted the tolerance manually to obtain similar results.
@sehoff hard to tell without the data, but consider increasing max_iter; the solver is probably not very far from convergence. You can run it in verbose mode to see whether you stop far from convergence or not. It's easy to add warm start to the Gram solvers; it should be done shortly. Beware in your comparison that the stopping criteria are not the same.
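To illustrate the caveat about stopping criteria, here is a minimal sketch of a duality gap computation for the Lasso. It is not code from this PR: the function names `primal` and `duality_gap` are hypothetical, and the objective scaling 1 / (2 n) * ||y - X w||^2 + alpha * ||w||_1 is an assumption (the scikit-learn convention). By weak duality the gap upper-bounds the suboptimality of `w`, whereas a small decrease of the primal between two epochs gives no such guarantee.

```python
import numpy as np


def primal(X, y, w, alpha):
    """Lasso objective 1 / (2 n) * ||y - X w||^2 + alpha * ||w||_1 (assumed scaling)."""
    residual = y - X @ w
    return residual @ residual / (2 * len(y)) + alpha * np.abs(w).sum()


def duality_gap(X, y, w, alpha):
    """Primal value minus the dual value at a rescaled residual; always >= 0."""
    n_samples = len(y)
    residual = y - X @ w
    corr = np.abs(X.T @ residual).max()
    # Shrink the residual so that the dual constraint
    # ||X^T dual_point||_inf <= n_samples * alpha holds.
    scale = 1.0 if corr <= n_samples * alpha else n_samples * alpha / corr
    dual_point = scale * residual
    diff = y - dual_point
    dual = (y @ y - diff @ diff) / (2 * n_samples)
    return primal(X, y, w, alpha) - dual
```

A solver that stops when `duality_gap(X, y, w, alpha) <= tol` therefore certifies that `w` is at most `tol` away from the optimal objective value, which is why tolerances are not directly comparable between the two criteria.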
Quick update:
@sehoff support for weights (infinite ones too) is here. Let us know how it works!
Increasing max_iter works for me, thank you!
@sehoff For very small alphas, I'd recommend using FISTA instead. Have a look at the
Here are the results of my comparisons, where I used the data I provided in the dropbox, and a
@sehoff indeed, Celer is particularly efficient in settings where  Besides, for your data, the Gram solver has a cheaper update since
@sehoff Thanks for the plots. They are very insightful for us. By the way, if you ever find yourself in need of an automated way of benchmarking optimization routines, look at https://github.com/benchopt/benchopt
@mathurinm do we want to keep this PR open? FISTA and Gram-solver have been merged
This one supports groups while the merged PR doesn't, so it does not hurt to leave it open IMO
note: this is a PR from the
The goal of this PR is to write a CD (BCD) solver for the case n_samples >> n_features. Such configurations are solved much faster by pre-computing a Gram matrix XtX and updating the gradient (rather than the residuals) at every CD cycle.
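The idea described above can be sketched as follows. This is a minimal NumPy illustration, not the solver implemented in this PR: the names `soft_threshold` and `gram_cd_lasso` are hypothetical, the objective scaling 1 / (2 n) * ||y - X w||^2 + alpha * ||w||_1 is an assumption, and groups, weights and warm start are left out.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * |.| for a scalar x."""
    return np.sign(x) * max(abs(x) - tau, 0.0)


def gram_cd_lasso(X, y, alpha, max_iter=100, tol=1e-10):
    """Minimize 1 / (2 n) * ||y - X w||^2 + alpha * ||w||_1 with Gram-based CD."""
    n_samples, n_features = X.shape
    # One-off cost in O(n_samples * n_features^2); X is never touched again.
    gram = X.T @ X / n_samples
    Xty = X.T @ y / n_samples
    w = np.zeros(n_features)
    grad = -Xty.copy()  # gradient of the smooth part, gram @ w - Xty, at w = 0
    for _ in range(max_iter):
        max_update = 0.0
        for j in range(n_features):
            lipschitz = gram[j, j]
            if lipschitz == 0.0:
                continue  # all-zero column: coefficient stays at 0
            old = w[j]
            w[j] = soft_threshold(old - grad[j] / lipschitz, alpha / lipschitz)
            delta = w[j] - old
            if delta != 0.0:
                # O(n_features) gradient update replaces the usual
                # O(n_samples) residual update.
                grad += delta * gram[:, j]
                max_update = max(max_update, abs(delta))
        if max_update <= tol:  # simple heuristic stop, not a duality gap
            break
    return w
```

Each coordinate update costs O(n_features) instead of O(n_samples), which is where the speed-up comes from when n_samples >> n_features; the price is the one-off Gram computation and O(n_features^2) memory.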
A quick experiment with 1e6 samples and 300 features: