
Feature request: the LLA algorithm for general folded concave penalties #185

idc9 opened this issue Jan 18, 2021 · 6 comments

idc9 commented Jan 18, 2021

I am very happy to see that someone is implementing the adaptive Lasso in Python (#169)! It would be great if celer also implemented the more general LLA algorithm for arbitrary folded concave penalties; see e.g. "One-step sparse estimates in nonconcave penalized likelihood models" (Zou and Li, 2008) and "Strong oracle optimality of folded concave penalized estimation" (Fan et al., 2014). The LLA algorithm is a mild but statistically very appealing generalization of AdaptiveLasso.

The main differences between the general LLA algorithm and AdaptiveLasso are:

  1. LLA typically uses a better initializer, e.g. a Lasso solution or simply 0, instead of the least squares solution.
  2. LLA allows for different penalties (e.g. with SCAD, the LLA algorithm satisfies the desirable strong oracle property).

The LLA algorithm should be fairly straightforward to implement, though I'm not yet very familiar with celer's backend.

LLA algorithm sketch

User input:

  1. tuning parameter \lambda
  2. concave penalty function g_{\lambda} (e.g. SCAD, MCP)
  3. initial value, \beta^0
  4. stopping criterion: either (A) stop after s = 1 step (the so-called "one-step estimator") or (B) stop at convergence

for s = 1, 2, ...

    w^s_j = g'_{\lambda}(|\beta^{s-1}_j|)  (Lasso weights computed at the current iterate \beta^{s-1})

    \beta^s = solution of the weighted Lasso problem with weights w^s

    check the stopping criterion
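A minimal sketch of this loop in Python, assuming scikit-learn's Lasso is acceptable for the inner solve: the weighted subproblem is reduced to a plain Lasso via the standard column-rescaling trick (divide column j of X by w_j, solve, rescale the coefficients back). The names `lla` and `mcp_weight`, the weight floor, and the MCP default gamma = 3 are illustrative choices, not celer's API:

```python
import numpy as np
from sklearn.linear_model import Lasso

def mcp_weight(beta, lam, gamma=3.0):
    """LLA weights w_j = g'_lambda(|beta_j|) for MCP: lam - |beta_j| / gamma,
    clipped at 0 (so coefficients past gamma * lam are unpenalized)."""
    return np.maximum(lam - np.abs(beta) / gamma, 0.0)

def lla(X, y, lam, weight_fn=mcp_weight, beta0=None, max_steps=1, tol=1e-6):
    p = X.shape[1]
    beta = np.zeros(p) if beta0 is None else np.asarray(beta0, dtype=float).copy()
    for _ in range(max_steps):  # max_steps=1 gives the one-step estimator (A)
        # Floor the weights so unpenalized features (w_j = 0) stay in the
        # design matrix instead of causing a division by zero.
        w = np.maximum(weight_fn(beta, lam), 1e-10 * lam)
        # Weighted Lasso via rescaling: column j divided by w_j; alpha=1
        # because each feature's penalty level is already carried by w_j.
        inner = Lasso(alpha=1.0, fit_intercept=False).fit(X / w, y)
        beta_new = inner.coef_ / w
        if np.max(np.abs(beta_new - beta)) < tol:  # stopping criterion (B)
            beta = beta_new
            break
        beta = beta_new
    return beta
```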
josephsalmon (Contributor) commented

Thanks for the suggestion and the references!
We are actually working on that.

We mostly need a better name than AdaptiveLasso, since we are actually targeting what you describe.
Moreover, this majorization/minimization approach has many names in the literature.

We'll try to choose a more suitable name...
I would lean toward "ReweightedL1", since the name makes it rather clear what the algorithm does (and it could then also be adapted to logistic regression variants).
wdyt @agramfort @mathurinm @arakotom

idc9 (Author) commented Jan 18, 2021

Great to hear!

I agree that "reweighted L1" is a good name, i.e. it tells you exactly what the algorithm does. That said, the statistics literature tends to use "LLA" (see the references above). This is relevant because of the importance of the one-step version of the algorithm, which many users may want.

mathurinm (Owner) commented

Hello @idc9, thanks for the interest.
For the reweighting scheme, I plan to have a few string options corresponding to known penalties (Lq, log), but the user will also be able to pass a callable, so that at iteration iter + 1, weights[j] = weight(solution[iter][j]).
This should allow a lot of flexibility, together with the n_reweightings setting.
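To make that interface concrete, here is a hypothetical sketch of such a callable under the design just described; neither the estimator name `IterativeReweightedLasso` nor its keyword arguments exist in celer at the time of this discussion:

```python
import numpy as np

def log_weight(coef_j, eps=1e-3):
    # Reweighting callable applied coordinatewise at each iteration:
    # weights[j] = weight(solution[iter][j]), here the classical
    # log-penalty choice 1 / (|beta_j| + eps).
    return 1.0 / (np.abs(coef_j) + eps)

# Hypothetical usage, mirroring the proposed design:
# clf = IterativeReweightedLasso(alpha=0.1, weight="log", n_reweightings=5)
# clf = IterativeReweightedLasso(alpha=0.1, weight=log_weight, n_reweightings=5)
# clf.fit(X, y)
```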

As far as the naming goes, I think IterativeReweightedLasso is more explicit and corresponds to the truth; we can duplicate the class as AdaptiveLasso, which seems to be the most popular name. It'll require a bit of explanation in the docstring.
I'll work on that in the upcoming month.

idc9 (Author) commented Jan 19, 2021

Awesome! It might be worth adding SCAD and MCP as defaults because of their nice statistical properties (and it's hard to find these penalties implemented in Python!).
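For reference, a sketch of the SCAD derivative from Fan and Li (2001) that would serve as the reweighting function in the LLA loop above; `scad_weight` and the conventional default a = 3.7 are illustrative, not existing celer identifiers:

```python
import numpy as np

def scad_weight(beta, lam, a=3.7):
    """LLA weights w_j = g'_lambda(|beta_j|) for SCAD: constant at lam for
    |beta_j| <= lam, then decaying linearly to 0 at |beta_j| = a * lam."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))
```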

josephsalmon (Contributor) commented

I totally agree for MCP (though I find SCAD overrated: see https://hal.inria.fr/hal-01267701v2/document, Section 5.2).

Moreover, in terms of history, I found an even earlier work from the signal/image community where the iterative reweighting scheme was used for non-convex sparse regularization (if you know of an older one, that would be nice):

"Sparse Multinomial Logistic Regression:Fast Algorithms and Generalization Bounds"
Balaji Krishnapuram, Lawrence Carin, Mario A.T. Figueiredo and Alexander J. Hartemink
http://www.lx.it.pt/~mtf/Krishnapuram_Carin_Figueiredo_Hartemink_2005.pdf

It does not help with the naming though...
And I'm fine with IterativeReweightedLasso.

arakotom commented

Hi all,
sorry for being late on this. IterativeReweightedLasso is a good name, and clear enough regardless of the underlying theoretical framework (DC or LLA).

Is there anything I can help with, or has most of the work already been done?
