Add Baseline to database #285
Pull Request Overview
This PR introduces a “baseline” submission mode for reference runs alongside existing test, benchmark, profile, and leaderboard modes. It updates the pipeline, database, command handlers, and examples to detect, store, and report baseline runs.
- Add detection and special handling of baseline submissions in `prepare_submission`, `build_task_config`, and `run_config`
- Extend `run_eval.py`, reporting, and scoring logic to include baseline runs
- Provide admin command to trigger baseline runs and update the example eval script
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/discord-cluster-manager/utils.py | Added baseline args and reset logic in `build_task_config` |
| src/discord-cluster-manager/submission.py | Detect baseline submissions before filename/content checks |
| src/discord-cluster-manager/run_eval.py | Support "baseline" mode in evaluation and PyTorch runner |
| src/discord-cluster-manager/report.py | Include baseline run in short report |
| src/discord-cluster-manager/leaderboard_db.py | Import baseline constants, create baseline entries, check for existing baseline run |
| src/discord-cluster-manager/consts.py | Define `BASELINE_USER` and `BASELINE_USER_ID` |
| src/discord-cluster-manager/cogs/submit_cog.py | Adjust score logic to choose between leaderboard or baseline |
| src/discord-cluster-manager/cogs/leaderboard_cog.py | Route submissions with no script as baseline |
| src/discord-cluster-manager/cogs/admin_cog.py | Add admin command to trigger baseline runs |
| examples/eval.py | Update example eval script to branch on baseline mode |
| .github/workflows/nvidia_workflow.yml | Bump checkout action and wrap input-file creation in retry |
Comments suppressed due to low confidence (3)
src/discord-cluster-manager/utils.py:248
- `SubmissionMode` is referenced here without an import, and `args` is unconditionally reset to an empty list, which will override any existing arguments for non-baseline modes. Import `SubmissionMode` at the top and only initialize or clear `args` when `mode == SubmissionMode.BASELINE`, preserving the intended args in other cases.

    args = []
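The guarded reset suggested in the comment above can be sketched as follows. This is a minimal, self-contained illustration; the enum values and the `build_args` helper are hypothetical stand-ins for the project's actual `SubmissionMode` and `build_task_config` code, which this review does not show in full.

```python
from enum import Enum
from typing import Optional


# Hypothetical stand-in for the project's SubmissionMode enum.
class SubmissionMode(Enum):
    TEST = "test"
    BENCHMARK = "benchmark"
    LEADERBOARD = "leaderboard"
    BASELINE = "baseline"


def build_args(mode: SubmissionMode, args: Optional[list]) -> list:
    # Only clear args for baseline runs; for every other mode, preserve
    # whatever arguments the caller supplied (empty list if none given).
    if mode == SubmissionMode.BASELINE:
        return []
    return list(args) if args else []
```

The point is simply that the reset is scoped to the baseline branch instead of running unconditionally.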
examples/eval.py:159
- The name `active_kernel` is not defined in this scope, causing a `NameError`. Introduce a local definition (e.g., `active_kernel = custom_kernel` or `ref_kernel` based on a baseline flag) before using it.

    submission_output = active_kernel(_clone_data(data))
examples/eval.py:312
- Similar to the test runner, `active_kernel` is undefined here and will raise a `NameError`. Define `active_kernel` (e.g., based on `is_baseline_run`) before its use or revert to `custom_kernel`.

    submission_output = active_kernel(_clone_data(data))
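Both comments call for the same one-line fix: bind `active_kernel` before it is used. A minimal sketch, assuming the `is_baseline_run` flag and a `_clone_data` helper as named in this PR (the helper body here is a placeholder, not the real one from `examples/eval.py`):

```python
def _clone_data(data):
    # Placeholder for the eval script's data-cloning helper.
    return list(data)


def run_case(data, custom_kernel, ref_kernel, is_baseline_run: bool):
    # Select the kernel once, up front, so the later call site never sees
    # an unbound name: baseline runs exercise the reference implementation,
    # everything else runs the user's submission.
    active_kernel = ref_kernel if is_baseline_run else custom_kernel
    submission_output = active_kernel(_clone_data(data))
    return submission_output
```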
This PR adds a baseline run command. By design, this is a ranked leaderboard run, with three caveats: 1) it just uses the reference implementation, 2) the user is always predetermined, and 3) it can only be run by admins. This is a follow-up to #283.
The purpose of this is so that we can have a baseline in the database. This way we can compute aggregates relative to the baseline, such as average improvement against baseline, maximum improvement against baseline, etc.
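The aggregates described above become a simple self-join once a baseline row exists. The following is an illustrative sketch only: the in-memory schema, the `BASELINE_USER` string, and the sample scores are made up for the example and do not reflect the real `leaderboard_db.py` schema.

```python
import sqlite3

# Toy schema: one row per run, with the baseline stored as a regular run
# under a reserved user name, as this PR proposes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (user TEXT, leaderboard TEXT, score REAL)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        ("BASELINE_USER", "softmax", 10.0),  # reference implementation
        ("alice", "softmax", 8.0),
        ("bob", "softmax", 5.0),
    ],
)

# Average speedup of user runs relative to the baseline run on the same
# leaderboard (lower score = faster, so baseline_score / user_score > 1
# means the user beat the baseline).
row = conn.execute(
    """
    SELECT AVG(b.score / r.score)
    FROM runs r
    JOIN runs b
      ON b.leaderboard = r.leaderboard AND b.user = 'BASELINE_USER'
    WHERE r.user != 'BASELINE_USER'
    """
).fetchone()
# → 1.625 for this sample data (avg of 10/8 and 10/5)
```

Max improvement, per-user improvement, and similar aggregates follow the same join pattern with a different aggregate function.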
It would look something like this

Copilot-generated summary
This pull request introduces significant changes to support baseline runs in the evaluation and leaderboard systems, alongside updates to workflows and administrative commands. The key updates include enabling baseline submissions, modifying the evaluation logic to handle baseline runs, and adding a new admin command to trigger baseline runs.
Support for Baseline Runs:
- examples/eval.py: Added logic to differentiate between baseline and leaderboard runs, including an `is_baseline_run` parameter in benchmarking functions and the use of `ref_kernel` for baseline runs instead of `custom_kernel`. [1] [2] [3] [4] [5] [6]
- src/discord-cluster-manager/cogs/leaderboard_cog.py: Updated submission logic to handle baseline runs without requiring a script and to create fake baseline submissions with predefined user data. [1] [2] [3] [4] [5] [6] [7]

Administrative Enhancements:
- src/discord-cluster-manager/cogs/admin_cog.py: Added a new admin command `baseline-run` to create or force-create baseline runs for leaderboards. This includes checks for existing baseline runs and integration with the submission system. [1] [2]

Workflow Updates:
- .github/workflows/nvidia_workflow.yml: Upgraded `actions/checkout` to version 4 and added the `nick-fields/retry` action to improve reliability in creating input files.

Miscellaneous Improvements:
- examples/eval.py: Enhanced the `main()` function to handle a new `baseline` mode and improved error handling for unimplemented modes. [1] [2]
- src/discord-cluster-manager/cogs/submit_cog.py: Updated leaderboard scoring logic to account for baseline runs in addition to leaderboard runs.