When using Dagster with AWS EMR, do the necessary dependencies to run Dagster need to be installed on the cluster itself? That is, would I have to connect to the master node and install Dagster on it in order to run jobs using the cluster? The question was originally asked in Dagster Slack.
Replies: 1 comment
There are a couple ways to use Dagster with EMR. They're outlined in the Using Dagster with Spark guide: https://docs.dagster.io/integrations/spark.
In the "Asset body submits Spark job" way, EMR doesn't need to be aware of Dagster at all.
In the "Asset accepts and produces DataFrames or RDDs" way, you do need to get Dagster onto the cluster, but the
emr_pyspark_step_launcher
might be able to help with this.If you're not using PySpark (or not even using Spark at all), then you'll definitely go the first way. Otherwise, you have a choice.
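And here's a rough sketch of the step-launcher route, assuming recent `dagster-aws` and `dagster-pyspark` packages; the exact config field names have shifted between versions, so treat them as illustrative, and the cluster ID, bucket, and region are placeholders:

```python
from pathlib import Path

from dagster import Definitions, asset
from dagster_aws.emr import emr_pyspark_step_launcher
from dagster_pyspark import pyspark_resource


@asset(required_resource_keys={"pyspark", "pyspark_step_launcher"})
def people(context):
    # With the step launcher resource attached, this body executes on the
    # EMR cluster rather than in the local Dagster process.
    rows = [("Alice", 30), ("Bob", 25)]
    return context.resources.pyspark.spark_session.createDataFrame(rows, ["name", "age"])


defs = Definitions(
    assets=[people],
    resources={
        "pyspark": pyspark_resource,
        "pyspark_step_launcher": emr_pyspark_step_launcher.configured(
            {
                "cluster_id": "j-XXXXXXXXXXXXX",        # placeholder
                "region_name": "us-east-1",             # placeholder
                "staging_bucket": "my-staging-bucket",  # placeholder
                "deploy_local_job_package": True,       # field name may differ by version
                "local_job_package_path": str(Path(__file__).parent),
                "wait_for_logs": True,
            }
        ),
    },
)
```

The idea is that the launcher stages your local job package on S3 and submits it as an EMR step for you, which is the "help" mentioned above; check the dagster-aws docs for the exact config your version expects.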