test: fix large job #496

Closed
wants to merge 1 commit into from

cdoern
Contributor

@cdoern cdoern commented Apr 24, 2025

Fixes: #494

Signed-off-by: Charlie Doern <[email protected]>
@mergify mergify bot added the CI/CD Affects CI/CD configuration label Apr 24, 2025

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

@ktdreyer
Contributor

I edited the PR description to link this with #496; hope you don't mind.

From discussion with @cdoern today: A while ago, the .github/workflows/e2e-nvidia-l40s-x4.yml file in this repository was copied from https://github.com/instructlab/instructlab/blob/main/.github/workflows/e2e-nvidia-l40s-x4.yml, and nothing and no one keeps it up to date with the instructlab copy, so this training copy can fall behind the canonical instructlab version. In this PR, @cdoern has copied the latest changes from instructlab into this repo.

e2e workflow failed on this PR: View run, please investigate.

@booxter
Contributor

booxter commented Apr 28, 2025

#496 was merged and linked issue closed. Do we still need this PR?

@ktdreyer
Contributor

Yeah, we solved #494 with #496, rather than solving it in this PR.

Charlie's point from last week remains true: The .github/workflows/e2e-nvidia-l40s-x4.yml file in this tree was a copy of the one in instructlab's tree, and when we edit instructlab's copy without making those changes here as well, then this file falls out of date, and it's difficult to root-cause CI failures.

To fix the immediate issue, we could manually read through the .github/workflows/e2e-nvidia-l40s-x4.yml file in instructlab and make any synchronization changes here.
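
As a rough aid for that manual read-through, here is a minimal sketch of a drift check in Python. It assumes the upstream file is reachable at a raw-content URL derived from the blob link quoted earlier in this thread; `fetch_upstream`, `workflow_drift`, and the constants are hypothetical names, not anything that exists in either repository. The two files may differ on purpose in places, so the output is only a starting point for review.

```python
import difflib
import urllib.request

# Assumption: raw-content URL derived from the blob link quoted in this thread.
UPSTREAM_URL = (
    "https://raw.githubusercontent.com/instructlab/instructlab/main/"
    ".github/workflows/e2e-nvidia-l40s-x4.yml"
)
# Path of the copy in this repository, as named in the thread.
LOCAL_PATH = ".github/workflows/e2e-nvidia-l40s-x4.yml"


def fetch_upstream(url: str = UPSTREAM_URL) -> str:
    """Download the upstream workflow file and return it as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def workflow_drift(local_text: str, upstream_text: str) -> list[str]:
    """Return unified-diff lines; an empty list means the copies match."""
    return list(
        difflib.unified_diff(
            local_text.splitlines(),
            upstream_text.splitlines(),
            fromfile="training (local)",
            tofile="instructlab (upstream)",
            lineterm="",
        )
    )


# Example usage from the repository root (needs network access):
#   with open(LOCAL_PATH, encoding="utf-8") as f:
#       print("\n".join(workflow_drift(f.read(), fetch_upstream())))
```

Lines prefixed with `-` exist only in the local copy and lines prefixed with `+` exist only upstream, which should make it easier to spot changes that were never carried over.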

I'm not sure how we would improve this long-term. Maybe there are more code items or YAML blocks we could centralize in https://github.com/instructlab/ci-actions/.

@cdoern cdoern closed this May 28, 2025
Labels
CI/CD Affects CI/CD configuration
Development

Successfully merging this pull request may close these issues.

e2e large ci failures
3 participants