
Add Baseline to database #285

Open · wants to merge 42 commits into main

Conversation

PaliC (Collaborator) commented May 28, 2025

This PR adds a baseline run command. By design, this is a ranked leaderboard run with three caveats: 1) it always uses the reference implementation, 2) the submitting user is predetermined, and 3) it can only be run by admins. This is a follow-up to #283.

The purpose of this is to have a baseline in the database, so that we can compute aggregates relative to it, such as average improvement against the baseline, maximum improvement against the baseline, etc.
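For illustration, a minimal sketch of the kind of aggregate this enables, assuming scores are runtimes where lower is better; the function and its inputs are hypothetical, not the actual schema:

```python
def improvement_stats(run_times: list[float], baseline_time: float) -> dict[str, float]:
    # Speedup relative to the baseline run: > 1.0 means faster than the reference.
    speedups = [baseline_time / t for t in run_times if t > 0]
    return {
        "avg_improvement": sum(speedups) / len(speedups),
        "max_improvement": max(speedups),
    }

# improvement_stats([2.0, 1.0, 0.5], baseline_time=2.0)
# -> {'avg_improvement': 2.33..., 'max_improvement': 4.0}
```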

It would look something like this:

[Screenshot 2025-05-29 at 3:56 PM]

Copilot-generated summary

This pull request introduces significant changes to support baseline runs in the evaluation and leaderboard systems, alongside updates to workflows and administrative commands. The key updates include enabling baseline submissions, modifying the evaluation logic to handle baseline runs, and adding a new admin command to trigger baseline runs.

  • Support for baseline runs
  • Administrative enhancements
  • Workflow updates
  • Miscellaneous improvements

@PaliC PaliC marked this pull request as ready for review May 30, 2025 17:10
@Copilot Copilot AI review requested due to automatic review settings May 30, 2025 17:10
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces a “baseline” submission mode for reference runs alongside existing test, benchmark, profile, and leaderboard modes. It updates the pipeline, database, command handlers, and examples to detect, store, and report baseline runs.

  • Add detection and special handling of baseline submissions in prepare_submission, build_task_config, and run_config (a sketch follows this list)
  • Extend run_eval.py, reporting, and scoring logic to include baseline runs
  • Provide admin command to trigger baseline runs and update the example eval script
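
A minimal sketch of that detection flow, assuming a SubmissionMode enum (referenced in the review comments below) and a simplified prepare_submission signature; the PR's actual helpers and return shape may differ:

```python
from enum import Enum

class SubmissionMode(Enum):
    TEST = "test"
    BENCHMARK = "benchmark"
    PROFILE = "profile"
    LEADERBOARD = "leaderboard"
    BASELINE = "baseline"  # new mode introduced by this PR

def prepare_submission(script: str | None, mode: SubmissionMode) -> dict:
    # Baseline runs carry no user script, so they are detected before any
    # filename/content checks and routed to the reference implementation.
    if mode == SubmissionMode.BASELINE:
        return {"mode": mode.value, "source": "reference"}
    if script is None:
        raise ValueError("non-baseline submissions require a script")
    return {"mode": mode.value, "source": script}
```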

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/discord-cluster-manager/utils.py | Added baseline args and reset logic in build_task_config |
| src/discord-cluster-manager/submission.py | Detect baseline submissions before filename/content checks |
| src/discord-cluster-manager/run_eval.py | Support “baseline” mode in evaluation and PyTorch runner |
| src/discord-cluster-manager/report.py | Include baseline run in short report |
| src/discord-cluster-manager/leaderboard_db.py | Import baseline constants, create baseline entries, check for existing baseline run |
| src/discord-cluster-manager/consts.py | Define BASELINE_USER and BASELINE_USER_ID |
| src/discord-cluster-manager/cogs/submit_cog.py | Adjust score logic to choose between leaderboard or baseline |
| src/discord-cluster-manager/cogs/leaderboard_cog.py | Route submissions with no script as baseline |
| src/discord-cluster-manager/cogs/admin_cog.py | Add admin command to trigger baseline runs |
| examples/eval.py | Update example eval script to branch on baseline mode |
| .github/workflows/nvidia_workflow.yml | Bump checkout action and wrap input-file creation in retry |

Comments suppressed due to low confidence (3)

src/discord-cluster-manager/utils.py:248

  • SubmissionMode is referenced here without an import, and args is unconditionally reset to an empty list, which will override any existing arguments for non-baseline modes. Import SubmissionMode at the top and only initialize or clear args when mode == SubmissionMode.BASELINE, preserving the intended args in other cases.
args = []
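
A minimal sketch of the suggested guard, assuming SubmissionMode is imported at the top of utils.py (the import path below is an assumption):

```python
from submission import SubmissionMode  # assumed module; the comment only asks for the import

# Clear args only for baseline runs, so caller-supplied arguments
# survive untouched in every other mode.
if mode == SubmissionMode.BASELINE:
    args = []
```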

examples/eval.py:159

  • The name active_kernel is not defined in this scope, causing a NameError. Introduce a local definition (e.g., active_kernel = custom_kernel or ref_kernel based on a baseline flag) before using it.
submission_output = active_kernel(_clone_data(data))

examples/eval.py:312

  • Similar to the test runner, active_kernel is undefined here and will raise a NameError. Define active_kernel (e.g., based on is_baseline_run) before its use or revert to custom_kernel.
submission_output = active_kernel(_clone_data(data))
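
Both comments point at the same missing definition; a minimal sketch of the fix, assuming the is_baseline_run flag the comment mentions is in scope:

```python
# Pick the kernel under test once before producing submission_output;
# baseline runs fall back to the reference kernel.
active_kernel = ref_kernel if is_baseline_run else custom_kernel
submission_output = active_kernel(_clone_data(data))
```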

@PaliC PaliC requested review from msaroufim, ngc92 and S1ro1 May 30, 2025 17:34