Why are the word_index values in the dictionary of imdb_dataset not unique? #121

Open
danli349 opened this issue Jan 3, 2025 · 1 comment

danli349 commented Jan 3, 2025

Hello:

Why are the word_index values in the dictionary of imdb_dataset not unique?
Thanks.

library(torchdatasets)  # assumed source of imdb_dataset(), per the reply below

max_features <- 10000
imdb_train <- imdb_dataset(
  root = ".",
  download = TRUE,
  split = "train",
  num_words = max_features
)
word_index <- imdb_train$vocabulary
head(table(word_index))

word_index
 31  32  33  34  35  36
130 216 196 197 165 160
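
For readers who want to check this kind of output themselves, here is a minimal sketch of how repeated index values can be detected in base R. It assumes the vocabulary behaves like a named integer vector (words as names, indices as values), which the issue does not confirm. `toy_vocab` and its contents are made up for illustration and are not taken from imdb_dataset().

```r
# Toy stand-in for a vocabulary: names are words, values are indices.
# Hypothetical data, NOT the real imdb_dataset() vocabulary.
toy_vocab <- c(the = 1L, film = 2L, movie = 2L, great = 3L, good = 3L, bad = 4L)

# anyDuplicated() returns 0 when every value is unique,
# otherwise the position of the first repeated value.
has_duplicates <- anyDuplicated(toy_vocab) > 0

# table() counts how many entries share each index value.
counts <- table(toy_vocab)

# Keep only the index values that occur more than once.
repeated <- counts[counts > 1]

# List the words that map to a repeated index.
words_sharing_index <- names(toy_vocab)[toy_vocab %in% as.integer(names(repeated))]

print(has_duplicates)
print(repeated)
print(words_sharing_index)
```

In the output above, a count greater than 1 under an index (for example 130 under 31) is what indicates that several entries share that index value.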

@cregouby
Contributor

Hello @danli349,

I guess your issue is related to {torchdatasets}, not to this project.

Would you be so kind as to open the issue in torchdatasets/issues, and then close the issue here?

I would also encourage you to use reprex::reprex() to include a reproducible example in your issue; everyone would then have spotted the mismatch immediately.
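
As a rough illustration of that suggestion, a reprex call could look like the sketch below. The snippet in `code_lines` is toy code chosen for the example, not the code from this issue, and the call is guarded so it only runs in an interactive session where {reprex} is installed.

```r
# Code to be rendered, one line per element (toy example, hypothetical data).
code_lines <- c(
  "x <- c(1L, 2L, 2L, 3L)",
  "table(x)"
)

# Render the snippet and its output as GitHub-flavoured markdown.
# venue = "gh" targets GitHub issues; the guard skips the call when
# {reprex} is missing or the session is not interactive.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(input = code_lines, venue = "gh")
}
```

The rendered markdown includes the loaded packages and the actual output, which makes it easier for maintainers to see which package a function comes from.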

Thanks
