Redundant outputs when performing in-context learning #77
Yes, this is a normal phenomenon, as the output format of a pretrained model is hard to control. The behavior is similar to that of LLaMA-1-30B, especially when the model is given only a limited number of in-context examples. We adjust the hyperparameters and post-process the model's outputs to refine the answers:
def short_answer(answer):
    # Shorten the pretrained model's uncontrollable output for benchmark evaluation:
    # keep only the first line and the first clause of the completion.
    answer = answer.split('\n')[0]
    answer = answer.split('. ')[0]
    answer = answer.split('\"')[0]
    answer = answer.split(', ')[0]
    answer = answer.strip()
    answer = answer.lower()
    # Drop a trailing period.
    answer = answer if len(answer) == 0 or answer[-1] != '.' else answer[:-1]
    # Strip common leading phrases and articles.
    answer = answer.replace('it is ', '', 1) if answer.startswith('it is ') else answer
    answer = answer.replace('it\'s ', '', 1) if answer.startswith('it\'s ') else answer
    answer = answer.replace('a ', '', 1) if answer.startswith('a ') else answer
    answer = answer.replace('an ', '', 1) if answer.startswith('an ') else answer
    answer = answer.replace('the ', '', 1) if answer.startswith('the ') else answer
    return answer

By the way, the evaluation code will be released in the near future. We will notify you, and you can also refer to our implementation then.
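For a concrete sense of what this does, here is how it trims a typical redundant completion down to a benchmark-style short answer (the raw string is illustrative, not an actual model output):

raw = 'A red bicycle. Question: What color is the car? Short answer: blue'
print(short_answer(raw))  # -> 'red bicycle'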
Thanks for your reply! I used the same hyperparameters and followed the post-processing procedure and the VQAv2 evaluation code. However, the VQA score is only 49 on the OK-VQA dataset with 16 in-context examples. I wonder where the problem is.
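For reference, in case the gap comes from the scoring step rather than generation: the standard VQAv2 metric is a soft accuracy over the 10 human answers, averaged over all leave-one-out annotator subsets. A minimal sketch, assuming both the prediction and the ground-truth answers have already been normalized (e.g. with short_answer above):

def vqa_accuracy(pred, gt_answers):
    # Official VQAv2 soft accuracy: an answer counts as 100% correct if at
    # least 3 of the remaining 9 annotators gave it; the per-question score
    # is the average over the 10 leave-one-out subsets of annotators.
    accs = []
    for i in range(len(gt_answers)):
        others = gt_answers[:i] + gt_answers[i + 1:]
        matches = sum(a == pred for a in others)
        accs.append(min(matches / 3.0, 1.0))
    return sum(accs) / len(accs)

The reported benchmark score is then 100 times the mean of vqa_accuracy over all questions.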
@yqy2001 thank you for the post-processing and hyperparameters! Do you have an approximate estimate for when the evaluation code might be released?
Hello, thanks for the great work!
I am trying to run some VQA tasks with Emu2. I selected the in-context examples following the approach described in the paper, used the same prompt, and called the 'generate' function in the same way as the inference example in the README.
However, the model may repeat the question and generate redundant outputs when the number of in-context examples is 4 or 8, as in the screenshot below. With 16 in-context examples, the model does not produce redundant outputs. I wonder whether this is normal or whether I have made a mistake somewhere; if so, how can I fix it?
Thanks in advance!
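If Emu2's generate wraps the standard Hugging Face generate (the README inference example suggests a HF-style interface, though this is an assumption), one common mitigation for the echoing is to stop decoding at the first newline, so the model never gets to start the next "Question:" block. A minimal sketch with a hypothetical StopOnNewline helper, not something from the Emu2 repo itself:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnNewline(StoppingCriteria):
    # Stop as soon as a newline token is generated, cutting the output off
    # before the model starts echoing the next in-context example.
    def __init__(self, tokenizer):
        # Heuristic: take the last id of the encoded newline, since some
        # tokenizers prepend a leading-space token.
        self.newline_id = tokenizer.encode('\n', add_special_tokens=False)[-1]

    def __call__(self, input_ids, scores, **kwargs):
        return input_ids[0, -1].item() == self.newline_id

# outputs = model.generate(..., stopping_criteria=StoppingCriteriaList([StopOnNewline(tokenizer)]))

Even with such a stop condition, the first-line truncation in short_answer above is still worth applying, since the model can also ramble within a single line.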