Allow users to specify which language to use when removing stop words in the LSA primitive #190

Open
thehomebrewnerd opened this issue Aug 4, 2022 · 0 comments

The LSA primitive applies a cleaning step that removes stop words. Currently this is hard-coded to remove English stop words:

swords = set(nltk.corpus.stopwords.words("english"))

The primitive should be updated so that users can specify any other language supported by nltk, allowing it to work properly on natural language columns that are not in English.
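A rough sketch of what this could look like, to make the request concrete. The helper names `load_stop_words` and `remove_stop_words` and the `language` parameter are hypothetical and not taken from the primitive's current code; the only nltk calls assumed are `stopwords.fileids()`, `stopwords.words()` and `nltk.download()`.

```python
def load_stop_words(language="english"):
    """Return the nltk stop word set for `language` (hypothetical helper)."""
    # Imported lazily so remove_stop_words below can be used without nltk.
    import nltk
    from nltk.corpus import stopwords

    try:
        supported = stopwords.fileids()
    except LookupError:
        # The stopwords corpus is not installed locally yet; fetch it once.
        nltk.download("stopwords", quiet=True)
        supported = stopwords.fileids()

    if language not in supported:
        raise ValueError(
            f"Unsupported language {language!r}; choose one of {sorted(supported)}"
        )
    return set(stopwords.words(language))


def remove_stop_words(text, stop_words):
    """Lowercase, split on whitespace, and drop tokens found in `stop_words`."""
    return [tok for tok in text.lower().split() if tok not in stop_words]
```

With something like this, the hard-coded line could become `swords = load_stop_words(language)`, where `language` defaults to `"english"` so existing behavior stays the same. Validating against `stopwords.fileids()` gives users a clear error if they pass a language nltk does not ship stop words for.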
