Add Baseline to database #285
Pull Request Overview
This PR introduces a “baseline” submission mode for reference runs alongside existing test, benchmark, profile, and leaderboard modes. It updates the pipeline, database, command handlers, and examples to detect, store, and report baseline runs.
- Add detection and special handling of baseline submissions in `prepare_submission`, `build_task_config`, and `run_config`
- Extend `run_eval.py`, reporting, and scoring logic to include baseline runs
- Provide admin command to trigger baseline runs and update the example eval script
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/discord-cluster-manager/utils.py | Added baseline args and reset logic in `build_task_config` |
| src/discord-cluster-manager/submission.py | Detect baseline submissions before filename/content checks |
| src/discord-cluster-manager/run_eval.py | Support "baseline" mode in evaluation and PyTorch runner |
| src/discord-cluster-manager/report.py | Include baseline run in short report |
| src/discord-cluster-manager/leaderboard_db.py | Import baseline constants, create baseline entries, check for existing baseline run |
| src/discord-cluster-manager/consts.py | Define `BASELINE_USER` and `BASELINE_USER_ID` |
| src/discord-cluster-manager/cogs/submit_cog.py | Adjust score logic to choose between leaderboard or baseline |
| src/discord-cluster-manager/cogs/leaderboard_cog.py | Route submissions with no script as baseline |
| src/discord-cluster-manager/cogs/admin_cog.py | Add admin command to trigger baseline runs |
| examples/eval.py | Update example eval script to branch on baseline mode |
| .github/workflows/nvidia_workflow.yml | Bump checkout action and wrap input-file creation in retry |
Comments suppressed due to low confidence (3)
src/discord-cluster-manager/utils.py:248
- `SubmissionMode` is referenced here without an import, and `args` is unconditionally reset to an empty list, which will override any existing arguments for non-baseline modes. Import `SubmissionMode` at the top and only initialize or clear `args` when `mode == SubmissionMode.BASELINE`, preserving the intended args in other cases.

    args = []
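The guarded reset suggested in the comment above can be sketched as follows. This is a minimal, self-contained illustration; the enum values and the `build_args` helper are hypothetical stand-ins for the project's actual `SubmissionMode` and `build_task_config` code, which this review does not show in full.

```python
from enum import Enum
from typing import Optional


# Hypothetical stand-in for the project's SubmissionMode enum.
class SubmissionMode(Enum):
    TEST = "test"
    BENCHMARK = "benchmark"
    LEADERBOARD = "leaderboard"
    BASELINE = "baseline"


def build_args(mode: SubmissionMode, args: Optional[list]) -> list:
    # Only clear args for baseline runs; for every other mode, preserve
    # whatever arguments the caller supplied (empty list if none given).
    if mode == SubmissionMode.BASELINE:
        return []
    return list(args) if args else []
```

The point is simply that the reset is scoped to the baseline branch instead of running unconditionally.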
examples/eval.py:159
- The name `active_kernel` is not defined in this scope, causing a `NameError`. Introduce a local definition (e.g., `active_kernel = custom_kernel` or `ref_kernel` based on a baseline flag) before using it.

    submission_output = active_kernel(_clone_data(data))
examples/eval.py:312
- Similar to the test runner, `active_kernel` is undefined here and will raise a `NameError`. Define `active_kernel` (e.g., based on `is_baseline_run`) before its use or revert to `custom_kernel`.

    submission_output = active_kernel(_clone_data(data))
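Both comments call for the same one-line fix: bind `active_kernel` before it is used. A minimal sketch, assuming the `is_baseline_run` flag and a `_clone_data` helper as named in this PR (the helper body here is a placeholder, not the real one from `examples/eval.py`):

```python
def _clone_data(data):
    # Placeholder for the eval script's data-cloning helper.
    return list(data)


def run_case(data, custom_kernel, ref_kernel, is_baseline_run: bool):
    # Select the kernel once, up front, so the later call site never sees
    # an unbound name: baseline runs exercise the reference implementation,
    # everything else runs the user's submission.
    active_kernel = ref_kernel if is_baseline_run else custom_kernel
    submission_output = active_kernel(_clone_data(data))
    return submission_output
```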
This PR adds a baseline run command. By design, this is a ranked leaderboard run, with three caveats: 1) it just uses the reference implementation, 2) the user is always predetermined, and 3) it can only be run by admins. This is a follow-up to #283.
The purpose of this is so that we can have a baseline in the database. This way we can compute aggregates relative to the baseline, such as average improvement against baseline, maximum improvement against baseline, etc.
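The aggregates described above become a simple self-join once a baseline row exists. The following is an illustrative sketch only: the in-memory schema, the `BASELINE_USER` string, and the sample scores are made up for the example and do not reflect the real `leaderboard_db.py` schema.

```python
import sqlite3

# Toy schema: one row per run, with the baseline stored as a regular run
# under a reserved user name, as this PR proposes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (user TEXT, leaderboard TEXT, score REAL)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        ("BASELINE_USER", "softmax", 10.0),  # reference implementation
        ("alice", "softmax", 8.0),
        ("bob", "softmax", 5.0),
    ],
)

# Average speedup of user runs relative to the baseline run on the same
# leaderboard (lower score = faster, so baseline_score / user_score > 1
# means the user beat the baseline).
row = conn.execute(
    """
    SELECT AVG(b.score / r.score)
    FROM runs r
    JOIN runs b
      ON b.leaderboard = r.leaderboard AND b.user = 'BASELINE_USER'
    WHERE r.user != 'BASELINE_USER'
    """
).fetchone()
# → 1.625 for this sample data (avg of 10/8 and 10/5)
```

Max improvement, per-user improvement, and similar aggregates follow the same join pattern with a different aggregate function.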
It would look something like this

Copilot-generated summary
This pull request introduces significant changes to support baseline runs in the evaluation and leaderboard systems, alongside updates to workflows and administrative commands. The key updates include enabling baseline submissions, modifying the evaluation logic to handle baseline runs, and adding a new admin command to trigger baseline runs.
Support for Baseline Runs:
- examples/eval.py: Added logic to differentiate between baseline and leaderboard runs, including an `is_baseline_run` parameter in benchmarking functions and the use of `ref_kernel` for baseline runs instead of `custom_kernel`. [1] [2] [3] [4] [5] [6]
- src/discord-cluster-manager/cogs/leaderboard_cog.py: Updated submission logic to handle baseline runs without requiring a script and to create fake baseline submissions with predefined user data. [1] [2] [3] [4] [5] [6] [7]

Administrative Enhancements:
- src/discord-cluster-manager/cogs/admin_cog.py: Added a new admin command `baseline-run` to create or force-create baseline runs for leaderboards. This includes checks for existing baseline runs and integration with the submission system. [1] [2]

Workflow Updates:
- .github/workflows/nvidia_workflow.yml: Upgraded `actions/checkout` to version 4 and added the `nick-fields/retry` action to improve reliability in creating input files.

Miscellaneous Improvements:
- examples/eval.py: Enhanced the `main()` function to handle a new `baseline` mode and improved error handling for unimplemented modes. [1] [2]
- src/discord-cluster-manager/cogs/submit_cog.py: Updated leaderboard scoring logic to account for baseline runs in addition to leaderboard runs.