
What dependencies do I need to install on the EMR cluster itself to use Dagster? #17960

Answered by sryza
dagsir[bot] bot asked this question in Q&A

There are a couple of ways to use Dagster with EMR. Both are outlined in the Using Dagster with Spark guide: https://docs.dagster.io/integrations/spark.

With the "Asset body submits Spark job" approach, EMR doesn't need to be aware of Dagster at all, so nothing Dagster-related has to be installed on the cluster.
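To make that concrete, here is a minimal sketch of what the asset body could do, assuming you submit work through the EMR `AddJobFlowSteps` API via a boto3 EMR client. The helper names (`build_spark_submit_step`, `submit_spark_step`), the step name, and the script URI are hypothetical and only for illustration; this is not taken from the guide.

```python
from typing import Any, Dict, List, Optional


def build_spark_submit_step(
    script_uri: str,
    name: str = "spark-job",  # hypothetical step name
    extra_args: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an EMR step definition that runs spark-submit on the cluster."""
    args = ["spark-submit", "--deploy-mode", "cluster", script_uri]
    args.extend(extra_args or [])
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        # command-runner.jar is how EMR runs arbitrary commands as a step.
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def submit_spark_step(emr_client: Any, cluster_id: str, step: Dict[str, Any]) -> str:
    """Submit one step to a running cluster and return its step ID.

    emr_client is assumed to be a boto3 EMR client, i.e. boto3.client("emr").
    """
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

Inside a function decorated with `@asset`, you would call `submit_spark_step(boto3.client("emr"), cluster_id, build_spark_submit_step("s3://<your-bucket>/job.py"))` and then wait for the step to finish. Only the machine running Dagster needs `dagster` and `boto3`; the cluster just runs the Spark script.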

With the "Asset accepts and produces DataFrames or RDDs" approach, you do need to get Dagster onto the cluster, but the emr_pyspark_step_launcher might be able to help with this.

If you're not using PySpark (or not using Spark at all), the first approach is the one to use. Otherwise, either approach is open to you.

Answer selected by sryza