Commit
Fix padding_idx logical error in Adaptive Input (facebookresearch#1629)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?

I think if we keep passing the vocabulary's **padding index** as `padding_idx` to every adaptive embedding band, some words can never be trained. e.g. if `cut_off` is (20000, 60000) and the vocab is larger than 60000, we cannot learn the [**20,000 + padding_idx**]th word or the [**60,000 + padding_idx**]th word, because the subtraction logic maps those words' ids to **padding_idx** within their band, so they always yield zero tensors. So, I changed `self.padding_idx` to `None` after assigning the vocab's `padding_idx` **only to the first (head) embedding**.

Pull Request resolved: facebookresearch#1629

Differential Revision: D19557340

Pulled By: myleott

fbshipit-source-id: e0c3b38862374d422a46dc62c248b2ecfbf08fd2
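The sketch below illustrates the construction loop the PR describes. It is not the fairseq source; names such as `build_adaptive_bands`, `initial_dim`, `factor`, and `output_dim` are assumptions made for illustration. The key point it shows is that the vocabulary's `padding_idx` is only meaningful for the head band, because tail bands receive shifted token ids.

```python
# Minimal sketch (assumed names, not the fairseq implementation) of building
# one (Embedding -> Linear) pair per adaptive vocabulary band.
import torch.nn as nn


def build_adaptive_bands(vocab_size, padding_idx, initial_dim, factor, output_dim, cutoff):
    """Build one embedding band per cutoff range.

    Tail bands see token ids shifted by cutoff[i - 1], so a word whose shifted
    id equals padding_idx (e.g. id 20000 + padding_idx with cutoff=(20000, 60000))
    would be frozen at a zero vector if padding_idx were passed to every band.
    Passing it only to the head band avoids that.
    """
    cutoff = list(cutoff) + [vocab_size]
    bands = nn.ModuleList()
    for i in range(len(cutoff)):
        prev = cutoff[i - 1] if i > 0 else 0
        size = cutoff[i] - prev
        dim = int(initial_dim // (factor ** i))
        bands.append(nn.Sequential(
            nn.Embedding(size, dim, padding_idx=padding_idx),
            nn.Linear(dim, output_dim, bias=False),
        ))
        # After the head band, the real padding token can no longer occur,
        # so drop padding_idx for the remaining (shifted) bands -- this is
        # the change the PR describes.
        padding_idx = None
    return bands
```

With this change, only the head `nn.Embedding` pins a row at zero for the actual padding token; every row in the tail bands stays trainable.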