Question about the processing of DeiT's two [CLS] tokens. #77

Open
liyunlongaaa opened this issue Jul 27, 2022 · 2 comments
Labels
question Further information is requested

Comments

@liyunlongaaa

Hi, sorry to bother you. The paper says that the two special [CLS] tokens from DeiT are averaged into a single [CLS] token, but in the code I see that they are concatenated with the input instead. What am I missing?

cls_tokens = self.v.cls_token.expand(B, -1, -1) 
dist_token = self.v.dist_token.expand(B, -1, -1)
x = torch.cat((cls_tokens, dist_token, x), dim=1)
@liyunlongaaa
Author

Oh, I see it now. The averaging happens later in the forward pass:

x = (x[:, 0] + x[:, 1]) / 2
Sorry to bother you, and thank you for your good work. I am new to the speech area and working on my master's degree, and I have to publish a dissertation before I can graduate. Thank you for helping me along the way, although I haven't published one yet, haha~

@liyunlongaaa liyunlongaaa reopened this Jul 27, 2022
@YuanGongND YuanGongND added the question Further information is requested label Jul 27, 2022
@YuanGongND
Owner

To use DeiT initialization, we have to set up the tokens in the same way as DeiT, so both tokens are concatenated to the input sequence. But as you point out, we average their two outputs in the forward pass.
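
The two steps discussed in this thread (concatenate both tokens at the input, average their outputs at the end) can be put together in a small sketch. This is only an illustration, not the repository's code: the class name `TwoTokenSketch`, the tiny dimensions, and the single `nn.TransformerEncoderLayer` standing in for the DeiT blocks are all assumptions made for the example, and positional embeddings are left out.

```python
import torch
import torch.nn as nn


class TwoTokenSketch(nn.Module):
    """Hypothetical toy model: DeiT-style class + distillation tokens."""

    def __init__(self, embed_dim: int = 8):
        super().__init__()
        # Two separate learnable tokens, laid out like DeiT's, so that
        # DeiT weights could in principle be loaded into them.
        self.cls_token = nn.Parameter(torch.randn(1, 1, embed_dim) * 0.02)
        self.dist_token = nn.Parameter(torch.randn(1, 1, embed_dim) * 0.02)
        # Stand-in for the DeiT transformer blocks (assumption for the sketch).
        self.encoder = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=2, dim_feedforward=16,
            dropout=0.0, batch_first=True,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B = x.shape[0]
        # Step 1: prepend both tokens to the patch sequence -> (B, N + 2, D).
        cls_tokens = self.cls_token.expand(B, -1, -1)
        dist_token = self.dist_token.expand(B, -1, -1)
        x = torch.cat((cls_tokens, dist_token, x), dim=1)
        x = self.encoder(x)
        # Step 2: average the outputs at the two token positions -> (B, D).
        return (x[:, 0] + x[:, 1]) / 2


model = TwoTokenSketch(embed_dim=8).eval()
patches = torch.randn(4, 10, 8)  # (batch, num_patches, embed_dim)
with torch.no_grad():
    pooled = model(patches)
print(pooled.shape)
```

So the concatenation and the averaging do not contradict each other: the tokens stay separate on the way in and are merged into one representation only at the output.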

Good luck with your dissertation.

-Yuan
