Commit

add references to the paper
bkoseoglu committed Oct 10, 2024
1 parent c9429c1 commit 37254bb
Showing 1 changed file with 15 additions and 1 deletion.
16 changes: 15 additions & 1 deletion README.md
@@ -4,7 +4,7 @@
The basic idea is running a linear or logistic regression of the target on the Shapley values of
the original features, on the validation set,
discarding the features with negative coefficients, and ranking/filtering the rest according to their
statistical significance. For motivation and details, see the [example notebook](https://github.com/transferwise/shap-select/blob/main/docs/Quick%20feature%20selection%20through%20regression%20on%20Shapley%20values.ipynb)
statistical significance. For motivation and details, refer to our [research paper](https://arxiv.org/abs/2410.06815) and see the [example notebook](https://github.com/transferwise/shap-select/blob/main/docs/Quick%20feature%20selection%20through%20regression%20on%20Shapley%20values.ipynb).
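
The idea described above can be illustrated with a small, self-contained sketch. This is not the `shap-select` implementation: the helper names (`solve`, `ols_on_shap`, `select_features`), the synthetic data, the 0.05 threshold, and the hypothetical model weights are all assumptions made for illustration. It covers only the linear-regression case, replaces the t-test with a normal approximation, and uses the closed form for Shapley values of a linear model with independent features, `w_j * (x_j - mean_j)`, so that no external libraries are needed.

```python
import math
import random
from statistics import NormalDist


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_on_shap(shap_rows, y):
    """OLS of y on an intercept plus the Shapley-value columns.

    Returns (coefficients, coefficient / standard error), intercept excluded.
    """
    X = [[1.0] + list(row) for row in shap_rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [t - sum(b * v for b, v in zip(beta, r)) for r, t in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    # Diagonal of (X'X)^-1, obtained by solving against each unit vector.
    inv_diag = [
        solve(xtx, [1.0 if i == j else 0.0 for i in range(k)])[j]
        for j in range(k)
    ]
    se = [math.sqrt(sigma2 * d) for d in inv_diag]
    return beta[1:], [b / s for b, s in zip(beta[1:], se[1:])]


def select_features(names, shap_rows, y, alpha=0.05):
    """Keep features with a positive, significant coefficient; rank by p-value."""
    coefs, stats = ols_on_shap(shap_rows, y)
    rows = []
    for name, coef, stat in zip(names, coefs, stats):
        # Normal approximation to the two-sided t-test (an assumption).
        p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
        rows.append((name, coef, p_value, coef > 0 and p_value < alpha))
    return sorted(rows, key=lambda r: (not r[3], r[2]))


# Synthetic validation set (all values below are made up for illustration).
random.seed(0)
names = ["signal_a", "signal_b", "noise", "wrong_sign"]
true_effect = [2.0, 1.0, 0.0, 1.0]    # how the target really depends on each feature
model_weight = [2.0, 1.0, 0.5, -1.0]  # what a hypothetical overfitted linear model learned
X_val = [[random.gauss(0, 1) for _ in names] for _ in range(400)]
y_val = [
    sum(t * x for t, x in zip(true_effect, row)) + random.gauss(0, 0.5)
    for row in X_val
]
means = [sum(col) / len(col) for col in zip(*X_val)]
# Linear model, independent features: Shapley value of feature j is w_j * (x_j - mean_j).
shap_val = [
    [w * (x - mu) for w, x, mu in zip(model_weight, row, means)] for row in X_val
]

ranked = select_features(names, shap_val, y_val)
for name, coef, p_value, keep in ranked:
    print(f"{name:10s} coef={coef:+.2f} p={p_value:.3f} keep={keep}")
```

In this toy setup the two signal features get coefficients near 1 and are kept, while `wrong_sign` gets a strongly negative coefficient and is discarded even though it is statistically significant, which is the case the negative-coefficient rule targets.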

Earlier packages using Shapley values for feature selection exist; the advantages of this one are:
* Regression on the **validation set** to combat overfitting
@@ -109,3 +109,17 @@ selected_features_df = shap_select(model, X_val, y_val, task="multiclass", thres
</table>


## Citation

If you use `shap-select` in your research, please cite our paper:

```bibtex
@misc{kraev2024shapselectlightweightfeatureselection,
title={Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression},
author={Egor Kraev and Baran Koseoglu and Luca Traverso and Mohammed Topiwalla},
year={2024},
eprint={2410.06815},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.06815},
}
```
