How to best add metadata to partitions in a backfill situation #18610
derHeinzer asked this question in Q&A · Unanswered · 1 comment
Reply:
Hi, I am also interested in a solution for this!
Hi everybody!
I've encountered a challenge in Dagster and would like to share my thoughts with the community.
I'm working on adding metadata to a partitioned asset, and it works smoothly when materializing one partition at a time. However, I also want it to work in backfill situations, such as when initially materializing an asset that represents a database table. One specific piece of metadata I want the asset to hold is the record count per partition.
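As a minimal sketch of the "record count per partition" metadata, assuming the loaded rows are available as dicts with a hypothetical `partition_key` column, the counts can be shaped into one metadata dict per partition. The function name and the `num_records` key are illustrative, not part of Dagster; inside an asset, the list of partition keys for the run would come from the execution context (e.g. `context.asset_partition_keys_for_output()`).

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def record_counts_by_partition(
    rows: Iterable[Mapping[str, object]],
    partition_keys: Iterable[str],
    partition_column: str = "partition_key",  # hypothetical column name
) -> Dict[str, Dict[str, int]]:
    """Count rows per partition and return one metadata dict per partition.

    Every key in `partition_keys` gets an entry, with an explicit 0 when no
    rows belong to it, so each partition in a backfill range receives metadata.
    """
    counts = Counter(str(row[partition_column]) for row in rows)
    return {key: {"num_records": counts.get(key, 0)} for key in partition_keys}


rows = [
    {"partition_key": "2023-09-01"},
    {"partition_key": "2023-09-01"},
    {"partition_key": "2023-09-02"},
]
print(record_counts_by_partition(rows, ["2023-09-01", "2023-09-02", "2023-09-03"]))
```

Emitting an explicit 0 for empty partitions keeps the per-partition plot free of gaps that would otherwise be indistinguishable from "not materialized yet".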
I can yield additional AssetObservation events carrying this metadata for each partition, but the values don't show up in the web server's metadata plots by partition (they do show up in the plots by execution time, which is not what I want).
Alternatively, I've made the plots work by yielding additional AssetMaterialization events. However, these must be yielded after the single Output object; otherwise, the materialization generated from the Output is the latest one and wins, and the plots still don't work. The drawback is that data provenance breaks, because these latest materialization events carry no data provenance tags.
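The ordering trade-off described above can be illustrated with a deliberately simplified model in which only the last event per partition is kept. This is an assumption-labelled toy, not Dagster's internal logic: `MaterializationEvent` is a hypothetical stand-in class and the `provenance` tag name is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class MaterializationEvent:
    """Hypothetical stand-in for a per-partition materialization event (not Dagster's class)."""
    partition: str
    metadata: Dict[str, object] = field(default_factory=dict)
    tags: Dict[str, str] = field(default_factory=dict)


def latest_by_partition(events: Iterable[MaterializationEvent]) -> Dict[str, MaterializationEvent]:
    """Keep only the last event per partition: later events overwrite earlier ones."""
    latest: Dict[str, MaterializationEvent] = {}
    for event in events:
        latest[event.partition] = event
    return latest


# Event derived from the Output: carries provenance tags but no record count.
from_output = MaterializationEvent("2023-09-01", tags={"provenance": "abc123"})
# Extra event yielded by hand: carries the record count but no provenance tags.
extra = MaterializationEvent("2023-09-01", metadata={"num_records": 2})

# Extra event last: the count survives, the provenance tags are lost.
extra_last = latest_by_partition([from_output, extra])["2023-09-01"]
# Output event last: the provenance tags survive, the count is lost.
output_last = latest_by_partition([extra, from_output])["2023-09-01"]
```

In this model neither ordering keeps both pieces of information, which is why a single event per partition that carries both the metadata and the tags is the more robust shape.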
To work around this issue, I've implemented a somewhat hacky function that I call in the asset body:
In a single-partition run, I attach the partition information to the Output metadata. In a multi-partition run (such as a backfill), however, this approach doesn't work because the Output object can't hold metadata per partition. Hence, I manually call _get_output_asset_materializations and attach the right metadata to each yielded event.
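The idea of attaching the right metadata to each per-partition event can be sketched in plain Python, assuming each event is represented as a dict with hypothetical `partition`, `metadata` and `tags` fields. This is not the author's helper (which isn't shown) and it does not call Dagster internals; in Dagster, each dict would roughly correspond to an `AssetMaterialization(asset_key=..., partition=..., metadata=...)`, with the shared tags passed via its `tags` argument where the installed version supports it.

```python
from typing import Dict, List, Mapping, Optional


def build_partition_events(
    per_partition_metadata: Mapping[str, Mapping[str, object]],
    shared_tags: Mapping[str, str],
    shared_metadata: Optional[Mapping[str, object]] = None,
) -> List[Dict[str, object]]:
    """Build one event per partition carrying its own metadata plus the shared tags.

    Partition-specific values override shared metadata entries with the same key,
    and every event gets its own copy of the shared (e.g. provenance) tags.
    """
    events: List[Dict[str, object]] = []
    for partition_key, metadata in per_partition_metadata.items():
        merged = dict(shared_metadata or {})
        merged.update(metadata)
        events.append(
            {"partition": partition_key, "metadata": merged, "tags": dict(shared_tags)}
        )
    return events


events = build_partition_events(
    {"2023-09-01": {"num_records": 2}, "2023-09-02": {"num_records": 0}},
    shared_tags={"provenance": "abc123"},  # hypothetical tag name
    shared_metadata={"table": "sales"},    # hypothetical run-wide metadata
)
```

Because every event carries both the per-partition values and the shared tags, no event has to be ordered after another to keep either piece of information.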
I'm wondering whether this is an acceptable way to handle it. I would prefer a cleaner solution but haven't found one yet. Would it make sense to add a feature that lets the Output object hold partition-specific metadata?
Thank you very much for your thoughts on this one!