First draft of methodology for modeling generative AI systems #221

Merged: 20 commits, Oct 3, 2024

Conversation

bokelley
Contributor

No description provided.

@@ -0,0 +1,12 @@
```
Contributor

where is this snippet used?

Contributor Author

It's not, but figured we'd want it when we do the model later? Wasn't sure if we'd do an AI variant of calculations.mdx, so really just here as a placeholder.

| --------- | -------------- |
| Total reserved time | 118 days |
| Reservation start time | January 2022 (?) |
| GPU hours for final model | 1,082,990 |
Contributor

Is "GPU hours" supposed to be the hours during which the GPUs were used for both the intermediate and the final training?

In this case, with a cluster of 384 GPUs, the total GPU hours represent 117.5 cluster days (1,082,990 / 384 / 24), which is essentially the same as the total reserved time.

I would have expected the total reserved time in days to exceed the GPU hours for the final model alone.

Contributor Author

Right - that's where we backfill the intermediate down below when we normalize. If you have ideas on how to clarify please lmk
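
For anyone following this thread, the cluster-day arithmetic from the review comment above can be checked with a short sketch. The function name `cluster_days` is made up for illustration and is not part of the docs in this PR. It assumes every GPU in the 384-GPU cluster is busy for the whole run, which is the same simplification the comment makes.

```python
# Figures quoted in the table and the review comment above.
GPU_HOURS_FINAL_MODEL = 1_082_990
CLUSTER_GPUS = 384
HOURS_PER_DAY = 24


def cluster_days(gpu_hours: float, gpus: int) -> float:
    """Convert GPU hours to wall-clock days.

    Assumes all GPUs in the cluster run concurrently for the full period
    (hypothetical helper, for illustration only).
    """
    return gpu_hours / gpus / HOURS_PER_DAY


days = cluster_days(GPU_HOURS_FINAL_MODEL, CLUSTER_GPUS)
print(f"{days:.1f} cluster days")  # 117.5 cluster days
```

Rounded to whole days this gives 118, matching the "Total reserved time" row, which is why the final-model GPU hours appear to fill the entire reservation before any intermediate training is accounted for.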


Embodied emissions = (cluster embodied emissions per hour) x (training time)

Usage emissions = (usage energy per GPU-hour) x (total GPU hours) x (average grid intensity during training)
Contributor

If, instead of total GPU hours, we had the GPU hours utilized in each hour, we could potentially use the actual grid mix during that hour. I'm thinking of cases where training could be paused and resumed so that it only runs when marginal grid mix intensity is low.

Contributor Author

Yep agreed - though I was thinking this would be easiest to represent as a lower average grid mix?
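
The two views in this thread can be reconciled with a small sketch: an hour-by-hour calculation gives the same result as the usage-emissions formula above, provided the "average grid intensity" is weighted by the GPU hours used in each hour. All function names, units, and numbers below are hypothetical and chosen only to make the arithmetic easy to follow; none of them come from the methodology itself.

```python
def usage_emissions_average(energy_per_gpu_hour, total_gpu_hours, avg_grid_intensity):
    """Usage emissions = energy per GPU-hour x total GPU hours x average grid intensity."""
    return energy_per_gpu_hour * total_gpu_hours * avg_grid_intensity


def usage_emissions_hourly(energy_per_gpu_hour, gpu_hours_by_hour, intensity_by_hour):
    """Same quantity, summed hour by hour with each hour's own grid intensity."""
    if len(gpu_hours_by_hour) != len(intensity_by_hour):
        raise ValueError("series must have the same length")
    return energy_per_gpu_hour * sum(
        h * i for h, i in zip(gpu_hours_by_hour, intensity_by_hour)
    )


def weighted_average_intensity(gpu_hours_by_hour, intensity_by_hour):
    """Grid intensity averaged over the hours, weighted by GPU hours used."""
    total = sum(gpu_hours_by_hour)
    return sum(h * i for h, i in zip(gpu_hours_by_hour, intensity_by_hour)) / total


# Hypothetical example: training pauses during the dirtiest hour.
energy = 0.5                 # assumed kWh per GPU-hour
gpu_hours = [384, 0, 384]    # GPU hours used in each clock hour
intensity = [200, 500, 300]  # assumed gCO2e per kWh in each clock hour

hourly = usage_emissions_hourly(energy, gpu_hours, intensity)
weighted = weighted_average_intensity(gpu_hours, intensity)
averaged = usage_emissions_average(energy, sum(gpu_hours), weighted)
```

In this example the weighted average intensity is 250, lower than the plain time average of about 333, so pausing during high-intensity hours does show up as "a lower average grid mix" as long as the average is taken over the hours actually used.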

@MikeFreyberger MikeFreyberger changed the base branch from main to preview October 3, 2024 18:24
@bokelley bokelley merged commit 6379647 into preview Oct 3, 2024
2 checks passed
@bokelley bokelley deleted the genai branch October 3, 2024 19:04