
Add LongBench V2 benchmark #249


Open · wants to merge 11 commits into main

Conversation


@eshwarprasadS eshwarprasadS commented Apr 30, 2025

Adding LongBench to the eval options.

Install extras with:

pip install instructlab-eval[longbench]

Uses the vLLM backend to serve the model for generation.

Runs like so:

evaluator = LongBenchEvaluator(
    model_path="path/to/model",
    num_gpus=N,
    output_file="path/to/results.json",
    eval_config={"batch_size": "auto"},
    vllm_config={"max_model_len": max_len}
)

results = evaluator.run()  # Returns LongBenchResult
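
Under the hood, num_gpus and the vllm_config entries presumably map onto vLLM's offline engine arguments. A minimal sketch of that mapping, assuming the evaluator forwards them roughly one-to-one (the vLLM calls are real API; the forwarding itself is an assumption about LongBenchEvaluator's internals):

from vllm import LLM, SamplingParams

# Sketch only: the actual wiring lives inside LongBenchEvaluator.
llm = LLM(
    model="path/to/model",    # from model_path
    tensor_parallel_size=2,   # presumably from num_gpus (example value)
    max_model_len=32768,      # from vllm_config["max_model_len"] (example value)
)
outputs = llm.generate(["<long-context prompt>"], SamplingParams(max_tokens=256))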

Output JSON looks like so:

{
  "en_multidoc": 0.5424139838230786,
  "zh_multidoc": 0.24335639081098673,
  "en_singledoc": 0.4233139199560039,
  "zh_singledoc": 0.46157875457875464,
  "en_summ": 0.27244809337990245,
  "zh_summ": 0.1359562304911904,
  "en_fewshot": 0.45692449627485754,
  "zh_fewshot": 0.24416666666666667,
  "en_synthetic": 0.3799285714285714,
  "zh_synthetic": 0.4775,
  "code_avg": 0.30225,
  "overall_score": 0.3581670097645466
}
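
For reference, overall_score here is exactly the unweighted mean of the eleven category scores above, which you can verify directly:

# Verifies that overall_score is the mean of the category scores above.
category_scores = [
    0.5424139838230786, 0.24335639081098673, 0.4233139199560039,
    0.46157875457875464, 0.27244809337990245, 0.1359562304911904,
    0.45692449627485754, 0.24416666666666667, 0.3799285714285714,
    0.4775, 0.30225,
]
overall = sum(category_scores) / len(category_scores)
assert abs(overall - 0.3581670097645466) < 1e-9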


@RobotSail RobotSail left a comment


Thanks for the PR, @eshwarprasadS!

The PR has all of the right ideas; there are just a few minor changes you'll want to make, which I've outlined in this review. Once we've addressed those, this should be good to merge.

    ) / 2

    # Calculate overall score
    all_scores = [v for k, v in eval_results.items() if k != "overall_score"]

Why do we check if k != "overall_score"? We shouldn't have set this key yet.
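
(A sketch of the simplification the comment points at: compute the mean before overall_score is ever inserted into eval_results, and the filter becomes unnecessary. Names follow the excerpt above.)

# Compute the mean first; overall_score isn't in the dict yet, so no filter is needed.
all_scores = list(eval_results.values())
eval_results["overall_score"] = sum(all_scores) / len(all_scores)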

…y served openai-compatible model endpoints

Signed-off-by: eshwarprasadS <[email protected]>
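
This commit suggests the evaluator can also target an already-served OpenAI-compatible endpoint instead of spinning up vLLM itself. A hypothetical sketch of that usage; the endpoint_url and api_key parameter names are illustrative assumptions, not the confirmed signature:

# Hypothetical sketch: the parameter names below are assumptions, not the
# actual LongBenchEvaluator signature.
evaluator = LongBenchEvaluator(
    model_path="served-model-name",
    endpoint_url="http://localhost:8000/v1",  # OpenAI-compatible server
    api_key="EMPTY",
    output_file="path/to/results.json",
)
results = evaluator.run()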
Labels: ci-failure, dependencies, testing