Hi, I would like to know whether the image features and the multimodal features carry any positional meaning with respect to the original image.
For example, blip2_feature_extractor produces a tensor of shape (1, 32, 768) for both the image features and the multimodal features. Does each of the 32 positions correspond to the same image patch in both outputs? And do positions 0 to 31 follow the order in which the vision encoder splits the image into patches?