Redundant outputs when performing in-context learning #77
Yes, this is a normal phenomenon, as the output format of a pretrained model is hard to control. The behavior is similar to that of LLaMA-1-30B, especially when the model is given only a limited number of in-context examples. We adjust the hyperparameters and post-process the model's outputs to refine the answers:
def short_answer(answer):
    # Shorten the pretrained model's uncontrollable output for benchmark evaluation:
    # keep only the first line and the first clause of the completion.
    answer = answer.split('\n')[0]
    answer = answer.split('. ')[0]
    answer = answer.split('\"')[0]
    answer = answer.split(', ')[0]
    answer = answer.strip()
    answer = answer.lower()
    # Drop a trailing period.
    answer = answer if len(answer) == 0 or answer[-1] != '.' else answer[:-1]
    # Strip common leading phrases and articles.
    answer = answer.replace('it is ', '', 1) if answer.startswith('it is ') else answer
    answer = answer.replace('it\'s ', '', 1) if answer.startswith('it\'s ') else answer
    answer = answer.replace('a ', '', 1) if answer.startswith('a ') else answer
    answer = answer.replace('an ', '', 1) if answer.startswith('an ') else answer
    answer = answer.replace('the ', '', 1) if answer.startswith('the ') else answer
    return answer

By the way, the evaluation code will be released in the near future. We will notify you, and you can also refer to our implementation then.
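For a concrete sense of what this does, here is how it trims a typical redundant completion down to a benchmark-style short answer (the raw string is illustrative, not an actual model output):

raw = 'A red bicycle. Question: What color is the car? Short answer: blue'
print(short_answer(raw))  # -> 'red bicycle'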
Thanks for your reply! I used the same hyperparameters and followed the post-processing procedure and the VQAv2 evaluation code. However, the VQA score is only 49 on the OK-VQA dataset with 16 in-context examples. I wonder where the problem is.
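For reference, in case the gap comes from the scoring step rather than generation: the standard VQAv2 metric is a soft accuracy over the 10 human answers, averaged over all leave-one-out annotator subsets. A minimal sketch, assuming both the prediction and the ground-truth answers have already been normalized (e.g. with short_answer above):

def vqa_accuracy(pred, gt_answers):
    # Official VQAv2 soft accuracy: an answer counts as 100% correct if at
    # least 3 of the remaining 9 annotators gave it; the per-question score
    # is the average over the 10 leave-one-out subsets of annotators.
    accs = []
    for i in range(len(gt_answers)):
        others = gt_answers[:i] + gt_answers[i + 1:]
        matches = sum(a == pred for a in others)
        accs.append(min(matches / 3.0, 1.0))
    return sum(accs) / len(accs)

The reported benchmark score is then 100 times the mean of vqa_accuracy over all questions.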
@yqy2001 thank you for the post-processing and hyperparameters! Do you have an approximate estimate for when the evaluation code might be released?
Hello, thanks for the great work!
I am trying to run some VQA tasks with Emu2. I selected the in-context examples following the approach described in the paper, used the same prompt, and called the 'generate' function in the same way as the inference example in the README.
However, the model may repeat the question and generate redundant outputs when the number of in-context examples is 4 or 8, as in the screenshot below. With 16 in-context examples, the model does not produce redundant outputs. I wonder whether this is normal or whether I have made a mistake somewhere; if so, how can I fix it?
Thanks in advance!
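If Emu2's generate wraps the standard Hugging Face generate (the README inference example suggests a HF-style interface, though this is an assumption), one common mitigation for the echoing is to stop decoding at the first newline, so the model never gets to start the next "Question:" block. A minimal sketch with a hypothetical StopOnNewline helper, not something from the Emu2 repo itself:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnNewline(StoppingCriteria):
    # Stop as soon as a newline token is generated, cutting the output off
    # before the model starts echoing the next in-context example.
    def __init__(self, tokenizer):
        # Heuristic: take the last id of the encoded newline, since some
        # tokenizers prepend a leading-space token.
        self.newline_id = tokenizer.encode('\n', add_special_tokens=False)[-1]

    def __call__(self, input_ids, scores, **kwargs):
        return input_ids[0, -1].item() == self.newline_id

# outputs = model.generate(..., stopping_criteria=StoppingCriteriaList([StopOnNewline(tokenizer)]))

Even with such a stop condition, the first-line truncation in short_answer above is still worth applying, since the model can also ramble within a single line.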