Specify which folders to include in the .zip that is created and uploaded by the databricks_pyspark_step_launcher to Databricks #17948
Replies: 2 comments
-
We have the same issue in our project. A solution would be highly appreciated!
-
Closing this in favor of the corresponding GitHub issue: #18099
-
Context
We use the databricks_pyspark_step_launcher to execute assets on Databricks. As we also deal with dbt-integrations, and a lot of other code that is not connected to the ingestion jobs running on Databricks, we want to be careful about what we include in the .zip that is generated by the databricks_pyspark_step_launcher.
Problem
In some instances, the .zip file gets quite large (due to the number of files in the repos), and the upload time to Databricks increases tremendously. We want to avoid waiting several minutes just for the code to arrive in Databricks.
Question / Solution
There should be a way of specifying which folders are included when the databricks_pyspark_step_launcher builds the .zip. I investigated a bit, and it seems that this is the code where the zipping is executed. The referenced function build_pyspark_zip (here) seems to take a parameter exclude, where you can pass a list of folders to exclude from the zipping. However, this option is not propagated upwards to the databricks_pyspark_step_launcher.
If there is no other way to achieve this, I am happy to create an official issue for that and work on a PR that exposes this "exclude option" at the databricks_pyspark_step_launcher level.
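To make the request concrete, here is a rough standalone sketch of what zipping with an exclude list could look like. It uses only the Python standard library and is not the dagster-pyspark code. The function name build_zip_with_excludes, the demo folder names, and the choice to treat each exclude entry as a regular expression matched against the path relative to the zipped root are all my assumptions for illustration.

```python
import os
import re
import tempfile
import zipfile


def build_zip_with_excludes(zip_file, path, exclude=()):
    """Zip all files under `path`, skipping those matching any regex in `exclude`.

    Hypothetical helper for illustration only; not the dagster-pyspark implementation.
    Returns the sorted list of archived paths (relative, forward slashes).
    """
    patterns = [re.compile(p) for p in exclude]
    written = []
    with zipfile.ZipFile(zip_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(path):
            for fname in files:
                abs_fname = os.path.join(root, fname)
                # Normalise to a relative, forward-slash path so patterns are portable.
                rel_fname = os.path.relpath(abs_fname, path).replace(os.sep, "/")
                if any(p.search(rel_fname) for p in patterns):
                    continue  # excluded: never added to the archive
                zf.write(abs_fname, rel_fname)
                written.append(rel_fname)
    return sorted(written)


def demo():
    """Build a tiny fake repo and zip it while excluding two folders."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
        for rel in ("ingestion/job.py", "dbt_project/models/a.sql", "docs/readme.md"):
            full = os.path.join(src, *rel.split("/"))
            os.makedirs(os.path.dirname(full), exist_ok=True)
            with open(full, "w") as f:
                f.write("placeholder")
        # The archive is written outside `src` so it does not get zipped into itself.
        return build_zip_with_excludes(
            os.path.join(out, "code.zip"),
            src,
            exclude=[r"^dbt_project/", r"^docs/"],
        )


if __name__ == "__main__":
    print(demo())
```

With something like this exposed at the step launcher level, only the ingestion code would be archived and uploaded, which is the behaviour I am asking for.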