feat: Add ability to run tasks dataproc. #948

bpblanken · 2024-11-01T15:57:16Z

Note that it is not actually enabled and the feature flag isn't created here.

jklugherz · 2025-01-06T18:09:28Z

v03_pipeline/lib/tasks/dataproc/base_run_job_on_dataproc.py

+            )
+        except google.api_core.exceptions.NotFound:
+            return False
+        else:


is this else needed here? won't the following lines be executed anyway if no exception is thrown in the try block?

yeah... I was getting a lint error if I had the return within the try. I don't think the rule properly accounts for the return in the except block. I can lift them out of the "else" no problem though.
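For reference, a minimal sketch of the two forms being discussed. The `NotFound` class here is a local stand-in for `google.api_core.exceptions.NotFound`, and the function names and the `'DONE'` check are hypothetical, chosen only to keep the example dependency-free; this is not the code from the PR.

```python
class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound (assumption, keeps the sketch dependency-free)."""


def complete_with_else(get_job):
    # Form from the diff: the success path lives in the `else` branch.
    try:
        job = get_job()
    except NotFound:
        return False
    else:
        return job == 'DONE'


def complete_without_else(get_job):
    # Equivalent form: the `except` branch returns, so the code after the
    # try statement only runs when no NotFound was raised.
    try:
        job = get_job()
    except NotFound:
        return False
    return job == 'DONE'
```

Both functions behave the same for every input, because the only way past the `except` branch is the no-exception path. The `else` form just keeps the extra statements out of the `try` body, which is what some lint rules ask for.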

jklugherz · 2025-01-06T18:12:48Z

v03_pipeline/lib/tasks/dataproc/base_run_job_on_dataproc.py

+                request={
+                    'project_id': Env.GCLOUD_PROJECT,
+                    'region': Env.GCLOUD_REGION,
+                    'job_id': f'{self.task_name}-{self.run_id}',


nit, but you could make job_id an instance attribute self.job_id = f'{self.task_name}-{self.run_id}' so that you do it in just 1 spot instead of 4
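A rough sketch of that suggestion, assuming a plain class rather than the real task. `RunJobSketch` and `job_request` are hypothetical names, and a read-only property is used instead of assigning in `__init__`; either would define the id format in one place.

```python
class RunJobSketch:
    """Hypothetical stand-in for the task class; not the actual implementation."""

    def __init__(self, task_name: str, run_id: str):
        self.task_name = task_name
        self.run_id = run_id

    @property
    def job_id(self) -> str:
        # Single place where the Dataproc job id format is defined.
        return f'{self.task_name}-{self.run_id}'

    def job_request(self, project_id: str, region: str) -> dict:
        # Mirrors the shape of the request dict in the diff, reusing job_id.
        return {
            'project_id': project_id,
            'region': region,
            'job_id': self.job_id,
        }
```

Every call site can then read `self.job_id` instead of rebuilding the f-string.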

jklugherz · 2025-01-06T18:17:10Z

v03_pipeline/lib/tasks/dataproc/create_dataproc_cluster.py

@@ -166,11 +175,13 @@ def run(self):
                'cluster': get_cluster_config(self.reference_genome, self.run_id),
            },
        )
-        while True:
+        wait_s = 0
+        while wait_s < TIMEOUT_S:


do we also want this waiting behavior in BaseRunJobOnDataprocTask?

I left it out because I couldn't think of a good timeout... there's going to be wide variability depending on who's using it and how much compute they ask for. Setting an extreme timeout (like 48 hours) might be better though.
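A minimal sketch of the bounded wait shown in the diff, generalized so the timeout is a parameter. The helper name `wait_until_done`, the `is_done` callback, and the poll interval are assumptions for illustration; the 48-hour value is only the figure floated in the comment above.

```python
import time

# Example value only: the "extreme" timeout mentioned in the thread.
FORTY_EIGHT_HOURS_S = 48 * 60 * 60


def wait_until_done(is_done, timeout_s, poll_interval_s=3, sleep=time.sleep):
    """Poll is_done() until it returns True or about timeout_s seconds of waiting have passed."""
    wait_s = 0
    while wait_s < timeout_s:
        if is_done():
            return True
        sleep(poll_interval_s)
        wait_s += poll_interval_s
    # Timed out without ever seeing a finished job.
    return False
```

A caller could pass `timeout_s=FORTY_EIGHT_HOURS_S` for long-running jobs and treat a `False` return as a failure. Note that `wait_s` counts only the time spent sleeping, not the time spent inside `is_done()`.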

* Add service account credentialing (#997)
* Add service account credentialing
* ruff
* feat: Handle parsing empty predicted sex into Unknown (#1000)
* Add helper functions for querying `Terra Data Repository` (#998)
* Add service account credentialing
* ruff
* First pass
* tests passing
* add coverage of bigquery test
* change function names
* use generators everywhere
* bq requirement
* resolver
* Update sample id name
* Build Sex Check Table from TDR Metrics (#999)
* refactor: Move feature flags to FeatureFlag enum. (#1002)
* refactor: Move feature flags out of environment to their own dataclass
* lint: ruff
* ruff
* bugfix: exclude samples from relationship checking that are not present in the expected loadable samples (#1003)
* bugfix: exclude samples from relationship checking that are not present in the expected loadable samples
* cleanup
* feat: add remap and family loading failures as validation exceptions … (#1005)
* feat: add remap and family loading failures as validation exceptions rather than runtime errors
* move on
* Update write_remapped_and_subsetted_callset_test.py
* ruff
* feat: Add ability to run tasks dataproc. (#948)
* Support gcs dirs in rsync
* ws
* Add create dataproc cluster task
* add dataproc
* ruff
* requirements
* still struggling
* Gencode refactor to remove gcs
* bump reqs
* Run dataproc job
* lib
* running
* merge requirements
* Flip'em
* Better exception handling
* Cleaner approach if less generalizable
* write a test
* Fix tests
* lint
* Add test for success
* refactor to use a base class... better for adding support for multiple jobs
* cleanup
* ruff
* Fix missing mock
* Fix flapping test
* pr comments

bpblanken added 12 commits October 24, 2024 09:17

Support gcs dirs in rsync

92055cb

ws

cc97c0a

Add create dataproc cluster task

f28a2eb

add dataproc

3140303

ruff

e68a327

requirements

39a39e5

still struggling

7e1a05e

Gencode refactor to remove gcs

eb034a2

merge

a66870f

bump reqs

dc1c483

merge

7917845

Run dataproc job

c95da39

bpblanken changed the base branch from dev to benb/create_dataproc_cluster_task November 1, 2024 15:57

lib

8254e25

Base automatically changed from benb/create_dataproc_cluster_task to dev November 6, 2024 06:08

bpblanken added 15 commits November 7, 2024 11:01

running

99fe6fb

merge

9b9b54c

merge requirements

5f75ba6

Flip'em

0672f66

Merge branch 'dev' of github.com:broadinstitute/seqr-loading-pipelines into benb/run_pipeline_on_dataproc_task

7804223

merge

fb97fee

Merge branch 'dev' of github.com:broadinstitute/seqr-loading-pipelines into benb/run_pipeline_on_dataproc_task

0f860c7

Better exception handling

10f7e3a

Cleaner approach if less generalizable

c5edfe5

write a test

339c370

Fix tests

8260cb0

lint

0eb6ef1

Add test for success

998fe9d

refactor to use a base class... better for adding support for multiple jobs

1b69751

cleanup

aea4a74

ruff

e830d15

bpblanken changed the title from "Benb/run pipeline on dataproc task" to "feat: Add ability to run tasks dataproc." Jan 2, 2025

bpblanken added 2 commits January 2, 2025 10:17

Fix missing mock

d427017

Fix flapping test

eafd47f

bpblanken marked this pull request as ready for review January 2, 2025 17:59

bpblanken requested a review from a team as a code owner January 2, 2025 17:59

jklugherz reviewed Jan 6, 2025

pr comments

e33ea5b

jklugherz approved these changes Jan 6, 2025

bpblanken merged commit 25db277 into dev Jan 6, 2025
3 checks passed

bpblanken deleted the benb/run_pipeline_on_dataproc_task branch January 6, 2025 18:57
