Aider benchmark, DeepSeek-6.7B-Instruct model hardly generates SEARCH/REPLACE blocks, leading to very low pass rates #192
Comments
This is the qwen2.5-coder repo; maybe you should submit your issue to ds-coder?
@cyente What "ds-coder" are you referring to? Thanks.
@ytxmobile98 I think you need to set the |
Looks like |
Update 2024-12-11
I have done some further work over the past two days, testing the Qwen2.5-7B-Instruct and DeepSeek-Coder-6.7B-Instruct models, and found one key cause: the benchmarking program relies on SEARCH/REPLACE blocks to copy code from the chat history into the *.py files. While the output of the Qwen2.5 model mostly follows the expected format, the DeepSeek model seems to respond as if it were solving a regular coding problem rather than producing a diff. Example:
|
Update 2024-12-11
I have found that when running the Aider benchmark with the DeepSeek-Coder-6.7B-Instruct model, most of the responses generated by the model did not include the SEARCH/REPLACE blocks, which the benchmarking program uses to save the code into Python source files and run unit tests. See this comment.
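To make the failure mode concrete, here is a minimal sketch of how one might check whether a model response contains any SEARCH/REPLACE blocks at all. It assumes the marker layout from Aider's documented diff edit format (a `<<<<<<< SEARCH` line, a `=======` divider, and a `>>>>>>> REPLACE` line). The names `BLOCK_RE`, `find_edit_blocks`, and `apply_block` are hypothetical helpers written for illustration. This is not Aider's actual parser, which is likely more lenient.

```python
import re

# Assumed marker layout (from Aider's documented "diff" edit format):
#   <<<<<<< SEARCH
#   ...text to find...
#   =======
#   ...replacement text...
#   >>>>>>> REPLACE
BLOCK_RE = re.compile(
    r"^<{5,9} SEARCH[ \t]*\n"      # opening marker on its own line
    r"(?P<search>.*?)"             # text to search for (may be empty)
    r"^={5,9}[ \t]*\n"             # divider line
    r"(?P<replace>.*?)"            # replacement text
    r"^>{5,9} REPLACE[ \t]*$",     # closing marker on its own line
    re.DOTALL | re.MULTILINE,
)


def find_edit_blocks(response: str) -> list[tuple[str, str]]:
    """Return (search, replace) pairs for every block found in a response."""
    return [(m.group("search"), m.group("replace"))
            for m in BLOCK_RE.finditer(response)]


def apply_block(source: str, search: str, replace: str) -> str:
    """Apply one block to source text; raise if the search text is absent."""
    if search not in source:
        raise ValueError("SEARCH text not found in source")
    return source.replace(search, replace, 1)


# A response that follows the diff format.
diff_style = (
    "isogram.py\n"
    "<<<<<<< SEARCH\n"
    "def is_isogram(string):\n"
    "    pass\n"
    "=======\n"
    "def is_isogram(string):\n"
    "    return True\n"
    ">>>>>>> REPLACE\n"
)

# A response written like an answer to a regular coding problem.
plain_style = "Here is the solution:\n\ndef is_isogram(string):\n    return True\n"

stub = "def is_isogram(string):\n    pass\n"
blocks = find_edit_blocks(diff_style)
edited = apply_block(stub, *blocks[0])
```

With a check like this, a response in the second style yields no blocks, so nothing is written back and the stub file keeps its `pass` body, which matches the symptom described in this issue.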
Original post on 2024-11-29
I got extraordinarily low results when running the Aider benchmark with the DeepSeek-6.7B-Instruct model. When I inspected the output files, what astonished me most was that most of them do not contain valid solution code, only the original signature along with the pass statement. What steps did I miss to run the evaluations and get the desired results? Thanks.
My results
Edit mode: diff
Edit mode: whole
The output
When I inspected the outputs, I noticed that the majority of the code files had not been edited to contain the correct solution; they were still left with the signature and a simple pass statement. For example, the isogram test case has the following output isogram.py:
The model's config file
Meanwhile, here is the model's config.json file:
The bash scripts
run.sh
evaluate.sh
test.sh