Dagster use case suitability inquiry #4134
I am considering implementing Dagster, but although I have read the documentation (which is very good), I am still unsure whether it suits my use case. I would like the opinions of those more experienced than me before committing to a technology that I do not fully comprehend.

Suppose you are building a client-facing app which displays analytics on clients' usage of a variety of software tools. I was thinking of creating pipelines using Dagster that:

It seems to me that this can be described as “a graph of functional computations that produce and consume data assets” and hence Dagster may be the correct tool. I see numerous benefits to using Dagster, namely the data testability, the monitorability, seamless integration with an existing Python stack, etc. Furthermore, it would be extensible if I progressed to more heavyweight data science pipelines.

However, I also read that “…Dagster is not a technology for product developers. It is for data scientists, data engineers, analysts, and the infrastructure engineers that support them.”

If I wanted users to be able to sync their data on demand, does Dagster remain the correct tool? I appreciate that pipeline runs can be triggered via GraphQL requests, but was it created with this usage in mind - i.e. launching a pipeline run whenever a user wants to refresh their data? Presumably there is an overhead to running pipelines? Am I confusing things, and would it be more practical to run the (relatively) lightweight tasks listed above as tasks triggered by a message queueing service?

I appreciate this is not a specific question, but any guidance and/or advice on areas I may not have considered would be greatly appreciated.
Replies: 1 comment
Hi @ClementFinch - based on the work you described, it sounds to me like, at least in this context, you are a data engineer. Dagster is helpful if you want to be able to:
On the other hand, as you brought up, Dagster has some overhead relative to other tools. In particular, it logs multiple records to a database for every step of every run. If you expect your users to fire thousands of queries per second and need those queries to return in a few seconds, Dagster is probably not the right choice. If you expect someone to manually kick off a job via your interface a few times a day, then that's a fairly common way Dagster is used.
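To put rough numbers on that overhead, here is a back-of-envelope sketch of the average write load on the run database in the two scenarios. The step count and the records-per-step figure are illustrative assumptions, not measured Dagster values, and `event_log_writes_per_second` is a hypothetical helper rather than anything Dagster provides.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def event_log_writes_per_second(
    runs_per_day: float, steps_per_run: int, records_per_step: int
) -> float:
    """Average number of log records written per second for a given run rate."""
    return runs_per_day * steps_per_run * records_per_step / SECONDS_PER_DAY


# Assumed workload shape (hypothetical numbers for illustration only).
STEPS_PER_RUN = 5
RECORDS_PER_STEP = 6

# Scenario A: a handful of manual "refresh my data" clicks per day.
occasional = event_log_writes_per_second(10, STEPS_PER_RUN, RECORDS_PER_STEP)

# Scenario B: 2000 requests per second, each launching its own run.
heavy = event_log_writes_per_second(
    2000 * SECONDS_PER_DAY, STEPS_PER_RUN, RECORDS_PER_STEP
)

print(f"occasional refreshes: {occasional:.4f} writes/s")
print(f"per-request runs:     {heavy:,.0f} writes/s")
```

Under these assumptions the first scenario is negligible, while the second multiplies every request into dozens of database writes. That is the reason a per-request workload is usually better served by a lighter task queue.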