-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a "drop_similar" argument to the TableVectorizer #1249
Comments
@GaelVaroquaux is this up for grabs? If yes, I'd like to contribute! :) |
@GaelVaroquaux does this change look good? Tests are passing although I can see |
Hi @Neilblaze , This is still up for grabs, and the change that you suggest are totally going in the right direction. However, unless I am wrong, we don't have the DropSimilar transformer yet. It is issue #1001 , the parent issue. That issue would need to be tackled first, and I suggest that we go through the whole process of tackling it and merging it before moving on with this one. Understanding the process will then make you more productive. Thanks!! |
@GaelVaroquaux Makes sense and thanks for the update! I'll keep track of #1001 for now, and hopefully when it gets merged into the |
There is no PR for #1001 for now. I'm not sure when we'll get it done. We'd love to, but we're shuffling so many things 😁 |
Yep, I saw that. So for the meantime, I'll try to grab some other open issues. Will ping here/on discord if I get stuck anywhere, and thanks! 😃 |
It would be great to make it really easy drop the redundant columns in the TableVectorizer (Using the DropSimilar transformer inside the TableVectorizer), by a simple additional argument.
This would have improvements both is speed/memory and maybe in statistical performance.
I realize that it makes the TableVectorizer even more a swiss-army knife than it currently is, but honestly, it's sooo useful and we use it everywhere, even as an element in complex pipelines.
The text was updated successfully, but these errors were encountered: