Commit
Fix padding_idx logical error in Adaptive Input (facebookresearch#1629)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?

I think if we keep passing the vocabulary's **padding index** as `padding_idx` to every adaptive embedding band, some words can never be trained. e.g. if `cut_off` is (20000, 60000) and the vocab is larger than 60000, we cannot learn the [**20,000 + padding_idx**]th word or the [**60,000 + padding_idx**]th word, because the subtraction logic maps those words' ids to **padding_idx** within their band, so they always yield zero tensors. So, I changed `self.padding_idx` to `None` after assigning the vocab's `padding_idx` **only to the first (head) embedding**.

Pull Request resolved: facebookresearch#1629

Differential Revision: D19557340

Pulled By: myleott

fbshipit-source-id: e0c3b38862374d422a46dc62c248b2ecfbf08fd2
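The sketch below illustrates the construction loop the PR describes. It is not the fairseq source; names such as `build_adaptive_bands`, `initial_dim`, `factor`, and `output_dim` are assumptions made for illustration. The key point it shows is that the vocabulary's `padding_idx` is only meaningful for the head band, because tail bands receive shifted token ids.

```python
# Minimal sketch (assumed names, not the fairseq implementation) of building
# one (Embedding -> Linear) pair per adaptive vocabulary band.
import torch.nn as nn


def build_adaptive_bands(vocab_size, padding_idx, initial_dim, factor, output_dim, cutoff):
    """Build one embedding band per cutoff range.

    Tail bands see token ids shifted by cutoff[i - 1], so a word whose shifted
    id equals padding_idx (e.g. id 20000 + padding_idx with cutoff=(20000, 60000))
    would be frozen at a zero vector if padding_idx were passed to every band.
    Passing it only to the head band avoids that.
    """
    cutoff = list(cutoff) + [vocab_size]
    bands = nn.ModuleList()
    for i in range(len(cutoff)):
        prev = cutoff[i - 1] if i > 0 else 0
        size = cutoff[i] - prev
        dim = int(initial_dim // (factor ** i))
        bands.append(nn.Sequential(
            nn.Embedding(size, dim, padding_idx=padding_idx),
            nn.Linear(dim, output_dim, bias=False),
        ))
        # After the head band, the real padding token can no longer occur,
        # so drop padding_idx for the remaining (shifted) bands -- this is
        # the change the PR describes.
        padding_idx = None
    return bands
```

With this change, only the head `nn.Embedding` pins a row at zero for the actual padding token; every row in the tail bands stays trainable.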