feat: Add ability to run tasks dataproc. #948
…s into benb/run_pipeline_on_dataproc_task
…s into benb/run_pipeline_on_dataproc_task
    )
except google.api_core.exceptions.NotFound:
    return False
else:
is this `else` needed here? won't the following lines be executed anyway if no exception is thrown in the `try` block?
yeah... I was getting a lint error if I had the `return` within the `try`. I don't think the rule properly accounts for the `return` in the `except` block. I can lift them out of the "else" no problem though.
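For reference, a minimal sketch of the two shapes being discussed. `NotFound`, `get_job`, and the `'DONE'` state are hypothetical stand-ins (not the Dataproc client API), used only to show that both forms behave the same when the `except` branch returns:

```python
class NotFound(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.NotFound."""


def get_job(jobs, job_id):
    """Hypothetical lookup that raises NotFound for unknown job ids."""
    try:
        return jobs[job_id]
    except KeyError as e:
        raise NotFound(job_id) from e


def complete_with_else(jobs, job_id):
    try:
        state = get_job(jobs, job_id)
    except NotFound:
        return False
    else:
        # Runs only when the try block raised nothing.
        return state == 'DONE'


def complete_without_else(jobs, job_id):
    try:
        state = get_job(jobs, job_id)
    except NotFound:
        return False
    # Reached only when no exception was raised, because the
    # except branch above has already returned.
    return state == 'DONE'
```

Since the `except` branch returns, the code after the `try` statement is only reached on success, so lifting it out of the `else` does not change behavior.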
request={
    'project_id': Env.GCLOUD_PROJECT,
    'region': Env.GCLOUD_REGION,
    'job_id': f'{self.task_name}-{self.run_id}',
nit, but you could make `job_id` an instance attribute, `self.job_id = f'{self.task_name}-{self.run_id}'`, so that you do it in just 1 spot instead of 4
👍
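A rough sketch of what that suggestion could look like. The class below is a hypothetical, simplified stand-in for the task (it is not the code in this PR), and the project and region values are placeholders:

```python
class RunJobTaskSketch:
    """Hypothetical, simplified stand-in for the task under review."""

    def __init__(self, task_name, run_id):
        self.task_name = task_name
        self.run_id = run_id
        # Build the id once; every request below reuses it.
        self.job_id = f'{self.task_name}-{self.run_id}'

    def get_job_request(self):
        return {
            'project_id': 'example-project',  # placeholder value
            'region': 'example-region',  # placeholder value
            'job_id': self.job_id,
        }

    def cancel_job_request(self):
        return {
            'project_id': 'example-project',  # placeholder value
            'region': 'example-region',  # placeholder value
            'job_id': self.job_id,
        }
```

If the task's constructor is managed by a framework, a read-only `@property` that returns the same f-string would be another way to keep the format in one place.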
@@ -166,11 +175,13 @@ def run(self):
            'cluster': get_cluster_config(self.reference_genome, self.run_id),
        },
    )
    while True:
    wait_s = 0
    while wait_s < TIMEOUT_S:
do we also want this waiting behavior in `BaseRunJobOnDataprocTask`?
I left it out because I couldn't think of a good timeout... there's going to be wide variability depending on who's using it and how much compute they ask for. Setting an extreme timeout (like 48 hours) might be better though.
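One possible shape for a shared waiting helper with a generous default, sketched under assumptions: `wait_until_done`, `is_done`, and `DEFAULT_TIMEOUT_S` are hypothetical names, and the 3 second poll interval is an arbitrary choice rather than a value taken from this PR.

```python
import time

# Hypothetical default: the "extreme" 48 hour ceiling mentioned above.
DEFAULT_TIMEOUT_S = 48 * 60 * 60


def wait_until_done(
    is_done,
    timeout_s=DEFAULT_TIMEOUT_S,
    poll_interval_s=3,
    sleep=time.sleep,
):
    """Poll is_done() until it returns True or timeout_s has been waited.

    Returns True if the work finished in time, False on timeout.
    The sleep argument is injectable so tests do not actually wait.
    """
    waited_s = 0
    while waited_s < timeout_s:
        if is_done():
            return True
        sleep(poll_interval_s)
        waited_s += poll_interval_s
    return False
```

A caller could pass something like `operation.done` as `is_done` and override `timeout_s` per task, which would let the job task and the cluster task share the loop while choosing different ceilings.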
* Add service account credentialing (#997)
  * Add service account credentialing
  * ruff
* feat: Handle parsing empty predicted sex into Unknown (#1000)
* Add helper functions for querying `Terra Data Repository` (#998)
  * Add service account credentialing
  * ruff
  * First pass
  * tests passing
  * add coverage of bigquery test
  * change function names
  * use generators everywhere
  * bq requirement
  * resolver
  * Update sample id name
* Build Sex Check Table from TDR Metrics (#999)
* refactor: Move feature flags to FeatureFlag enum. (#1002)
  * refactor: Move feature flags out of environment to their own dataclass
  * lint: ruff
  * ruff
* bugfix: exclude samples from relationship checking that are not present in the expected loadable samples (#1003)
  * bugfix: exclude samples from relationship checking that are not present in the expected loadable samples
  * cleanup
* feat: add remap and family loading failures as validation exceptions … (#1005)
  * feat: add remap and family loading failures as validation exceptions rather than runtime errors
  * move on
  * Update write_remapped_and_subsetted_callset_test.py
  * ruff
* feat: Add ability to run tasks dataproc. (#948)
  * Support gcs dirs in rsync
  * ws
  * Add create dataproc cluster task
  * add dataproc
  * ruff
  * requirements
  * still struggling
  * Gencode refactor to remove gcs
  * bump reqs
  * Run dataproc job
  * lib
  * running
  * merge requirements
  * Flip'em
  * Better exception handling
  * Cleaner approach if less generalizable
  * write a test
  * Fix tests
  * lint
  * Add test for success
  * refactor to use a base class... better for adding support for multiple jobs
  * cleanup
  * ruff
  * Fix missing mock
  * Fix flapping test
  * pr comments
Note that it is not actually enabled and the feature flag isn't created here.