Add aws instance type to affinity terms in the pod template #3783

Open
austinzh wants to merge 1 commit into master from u/austinzh/add-template

Conversation

austinzh

When a pool has multiple instance types (e.g., a mixed CPU and GPU pool), we would like to be able to specify the instance type.

@austinzh austinzh force-pushed the u/austinzh/add-template branch from cecffa3 to 5dafcb4 Compare January 19, 2024 22:26
@austinzh austinzh force-pushed the u/austinzh/add-template branch from 5dafcb4 to 86537ac Compare January 19, 2024 22:31
@88manpreet
Contributor

Code looks OK to me. Can you add unit tests and record a few manual tests in the relevant ticket?

Member

@nemacysts nemacysts left a comment

@austinzh just curious: how are we expecting folks to use this?

imo, the easiest approach (user-experience-wise, that is) would be to have a flag like --job-type X (where X could be things like generic, model-training, model-inference, etc.) and have ML Compute handle updating which instance types those map to in the background. That way, Spark users don't need to worry about which instance types they need, want, or can use.

(that said, we would likely still want something like this for power-users and whatnot that want to run on specific hardware for whatever reason)
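To make the idea concrete, here is a minimal sketch of how a `--job-type` flag could sit alongside `--aws-instance-types`. Everything in it is an assumption for illustration: the mapping table, the function names, and the instance types assigned to each job type are hypothetical and are not taken from this PR or from ML Compute. The explicit flag takes precedence, which keeps the power-user path open.

```python
import argparse
from typing import Dict, List

# Hypothetical mapping: the job type names come from the comment above, but the
# instance types assigned to them are placeholders, not real defaults.
JOB_TYPE_TO_INSTANCE_TYPES: Dict[str, List[str]] = {
    "generic": [],  # empty list means "no instance type constraint"
    "model-training": ["g4dn.2xlarge"],
    "model-inference": ["g4dn.xlarge"],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with both the friendly flag and the power-user flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job-type",
        choices=sorted(JOB_TYPE_TO_INSTANCE_TYPES),
        default="generic",
        help="High-level job type; mapped to instance types behind the scenes",
    )
    parser.add_argument(
        "--aws-instance-types",
        type=lambda value: value.split(","),
        default=None,
        help="AWS instance types for executors, separated by commas (,)",
    )
    return parser


def resolve_instance_types(args: argparse.Namespace) -> List[str]:
    """Return the explicit instance types if given, else the job type's mapping."""
    if args.aws_instance_types is not None:
        return args.aws_instance_types
    # Copy so callers cannot mutate the shared mapping table.
    return list(JOB_TYPE_TO_INSTANCE_TYPES[args.job_type])
```

With this shape, updating what a job type maps to only requires changing the table, and users who pass only `--job-type` never see instance type names.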

@@ -265,6 +249,11 @@ def add_subparser(subparsers):
default=default_spark_pool,
)

list_parser.add_argument(
"--aws-instance-types",
help="AWS instance types for executor, seperate by comma(,)",
Member

small wording edit:

Suggested change
help="AWS instance types for executor, seperate by comma(,)",
help="AWS instance types for executor, separated by commas (,)",

it might also be nice to have argparse handle the splitting for us with something like:

Suggested change
help="AWS instance types for executor, seperate by comma(,)",
help="AWS instance types for executor, separated by commas (,)",
type=lambda instances: [instance for instance in instances.split(",")],
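For reference, a self-contained sketch of the same idea using a named function instead of a lambda. The helper name `comma_separated` is hypothetical, and stripping whitespace and dropping empty items are additions beyond the suggestion above, included on the assumption that users may type `a, b` or leave a trailing comma.

```python
import argparse
from typing import List


def comma_separated(value: str) -> List[str]:
    """Split a comma-separated string, trimming whitespace and skipping empty items."""
    items = [item.strip() for item in value.split(",")]
    return [item for item in items if item]


parser = argparse.ArgumentParser()
parser.add_argument(
    "--aws-instance-types",
    type=comma_separated,  # argparse calls this on the raw string value
    default=[],
    help="AWS instance types for executor, separated by commas (,)",
)

# Simulate a command line instead of reading sys.argv.
args = parser.parse_args(["--aws-instance-types", "g4dn.xlarge, g4dn.2xlarge"])
print(args.aws_instance_types)
```

A named function also gives the converter a readable name in argparse error messages and makes it easy to unit-test on its own.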

@@ -522,6 +511,47 @@ def should_enable_compact_bin_packing(disable_compact_bin_packing, cluster_manag
return True


# inplace add a low priority podAffinityTerm for compact bin packing
def add_compact_bin_packing_affinity_term(pod: Dict, spark_pod_label: str):
Member

suggestion: i'd probably rename pod here to pod_template to reduce confusion

suggestion: if y'all ever want to get rid of the incompletely typed Dict here, a possible option would be to use the models from the kubernetes client (e.g., https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodTemplate.md) internally and then serialize to yaml at the very end :)

suggestion: imo, it's a little preferable to not mutate inputs in-place since pure functions are generally easier to work with/test - but it's not a particularly big deal :)

suggestion (if this remains an impure function): typing this as def add_compact_bin_packing_affinity_term(pod: Dict, spark_pod_label: str) -> None and removing the return would reduce confusion

(same points apply to add_node_affinity_terms() below)
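A rough sketch of what the non-mutating variant could look like, for illustration only. The function name, the label key `example.com/spark-pod`, the weight, and the topology key are all assumptions; the body of the real function is not visible in this hunk, so this shows the pattern rather than the PR's actual logic.

```python
import copy
from typing import Any, Dict

# Hypothetical label key; the real key used by the PR is not shown in this diff.
SPARK_POD_LABEL_KEY = "example.com/spark-pod"


def with_compact_bin_packing_affinity_term(
    pod_template: Dict[str, Any], spark_pod_label: str
) -> Dict[str, Any]:
    """Return a copy of pod_template with a low-priority podAffinityTerm added."""
    # Deep copy so the caller's dict is never modified.
    new_template = copy.deepcopy(pod_template)
    preferred_terms = (
        new_template.setdefault("spec", {})
        .setdefault("affinity", {})
        .setdefault("podAffinity", {})
        .setdefault("preferredDuringSchedulingIgnoredDuringExecution", [])
    )
    preferred_terms.append(
        {
            "weight": 1,  # lowest allowed weight, i.e. a weak preference
            "podAffinityTerm": {
                "labelSelector": {
                    "matchLabels": {SPARK_POD_LABEL_KEY: spark_pod_label}
                },
                "topologyKey": "kubernetes.io/hostname",
            },
        }
    )
    return new_template
```

Because the input is left untouched, a test can compare the original and the result directly without rebuilding fixtures between cases.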

].setdefault("nodeSelectorTerms", []).extend(
[
{
"key": "node.kubernetes.io/instance-type",
Member

just curious: do we want users to specify an instance type (e.g., g4dn.xlarge vs g4dn.2xlarge) or would we be fine having them specify a family (e.g., g4dn) and letting karpenter spin up the most optimal instance type for the given requests at the time?
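If both options were supported, one possible shape is sketched below. It is an assumption-heavy illustration: the function name is hypothetical, it assumes the cluster's nodes carry Karpenter's `karpenter.k8s.aws/instance-family` label, and it treats any value without a `.` as a family name. It emits separate `nodeSelectorTerms` because Kubernetes ORs terms together, while the `matchExpressions` inside a single term are ANDed.

```python
from typing import Any, Dict, List

INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"
# Assumption: nodes are provisioned by Karpenter's AWS provider, which sets this label.
INSTANCE_FAMILY_LABEL = "karpenter.k8s.aws/instance-family"


def build_node_selector_terms(values: List[str]) -> List[Dict[str, Any]]:
    """Build nodeSelectorTerms from a mix of instance types and instance families."""
    # Heuristic (assumption): full instance types contain a dot, families do not.
    instance_types = [value for value in values if "." in value]
    families = [value for value in values if "." not in value]

    terms: List[Dict[str, Any]] = []
    if instance_types:
        terms.append(
            {
                "matchExpressions": [
                    {"key": INSTANCE_TYPE_LABEL, "operator": "In", "values": instance_types}
                ]
            }
        )
    if families:
        terms.append(
            {
                "matchExpressions": [
                    {"key": INSTANCE_FAMILY_LABEL, "operator": "In", "values": families}
                ]
            }
        )
    return terms
```

For example, `build_node_selector_terms(["g4dn"])` would let the provisioner pick any size in the g4dn family, while `build_node_selector_terms(["g4dn.xlarge"])` pins the exact type.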

@nemacysts
Member

@austinzh do we still want to get this merged?
