Add LongBench V2 benchmark #249
Signed-off-by: eshwarprasadS <[email protected]>
Thanks for the PR @eshwarprasadS !
The PR has all of the right ideas; there are just a few minor changes you'll want to make, which I've outlined in this review. Once those are addressed, this should be good to merge.
src/instructlab/eval/longbench.py
Outdated
) / 2

# Calculate overall score
all_scores = [v for k, v in eval_results.items() if k != "overall_score"]
Why do we check if k != "overall_score"? We shouldn't have set this key yet.
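To illustrate the point above, here is a minimal sketch rather than the PR's actual code. It assumes the overall score is a plain mean of the per-task scores (the diff excerpt does not show how `all_scores` is aggregated), and the task names and values are hypothetical. If `"overall_score"` has not been written to `eval_results` yet, the filter changes nothing and can be dropped:

```python
from statistics import mean


def overall_score(eval_results: dict[str, float]) -> float:
    """Average every per-task score.

    Assumption: "overall_score" has not been added to eval_results yet,
    so no key needs to be filtered out. Using a mean is also an assumption.
    """
    return mean(eval_results.values())


# Hypothetical per-task scores, for illustration only.
eval_results = {"task_a": 40.0, "task_b": 60.0}

# The filtered list from the diff matches the unfiltered values
# as long as the key has not been set.
filtered = [v for k, v in eval_results.items() if k != "overall_score"]
assert filtered == list(eval_results.values())

# Set the aggregate key only after computing it.
eval_results["overall_score"] = overall_score(eval_results)
```

The filter would only matter if the function could be called again on a dict that already contains the aggregate key.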
…-cuda extras Signed-off-by: eshwarprasadS <[email protected]>
…y served openai-compatible model endpoints Signed-off-by: eshwarprasadS <[email protected]>
… name parameter Signed-off-by: eshwarprasadS <[email protected]>
Adds LongBench to the eval options.
Install extras with:
pip install instructlab-eval[longbench]
Uses the vLLM backend to serve the model for generation.
Runs like so:
Output JSON looks like so: