-
Relates to #20802
-
Hi @ammerzon, just wanted to let you know that improving partitions, with a specific eye on improving the performance of large (25,000+) partition sets, is being actively worked on. In terms of advice, splitting the partition is likely the most viable option, but I do get that that's annoyingly complicated in a lot of cases. I'll reiterate what the linked discussion says: the performance issues are primarily with UI load time, mostly things like synced/unsynced status and reporting the materialization status of each partition. If you're able to, I'd recommend writing your asset with the large number of partitions first, and only if you experience performance issues, consider restructuring the asset.
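To make that concrete, here is a minimal sketch of defining an asset with the full partition count up front; the `item_ids` list and the `item_report` asset are hypothetical stand-ins for whatever your partitions correspond to:

```python
from dagster import AssetExecutionContext, StaticPartitionsDefinition, asset

# Hypothetical: 30,000 item IDs, one partition each (deliberately above 25,000).
item_ids = [f"item-{i:05d}" for i in range(30_000)]

@asset(partitions_def=StaticPartitionsDefinition(item_ids))
def item_report(context: AssetExecutionContext) -> None:
    # Each run materializes exactly one item's partition.
    context.log.info(f"Processing {context.partition_key}")
```

If the UI stays responsive at your partition count, you can keep this shape; if not, the grouping/splitting discussed below is the fallback.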
-
Hey @jamiedemaria, could you list the different options that we currently have if we want to configure an asset with more than 25,000 partitions? You've listed the option of reducing/grouping partitions together. This is possible but brings a lot of problems: from my experience, grouping the partitions makes the backfill process easy, but complicates everything after that. Let's take a scenario where you have […]. Since the partitions are grouped together, you can't easily configure your runs to consume the next […]. I've also tried turning off […]. If you have any other ways of doing this that you could share, it would be very helpful.
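For reference, the grouping workaround looks roughly like this sketch; `item_ids` and `CHUNK_SIZE` are placeholder names, and it trades partition count for coarser runs (one failed item fails its whole chunk):

```python
from dagster import AssetExecutionContext, StaticPartitionsDefinition, asset

# Hypothetical: 100,000 items grouped into 100 chunk partitions.
item_ids = [f"item-{i:06d}" for i in range(100_000)]
CHUNK_SIZE = 1_000

chunks = {
    f"chunk-{i // CHUNK_SIZE:04d}": item_ids[i : i + CHUNK_SIZE]
    for i in range(0, len(item_ids), CHUNK_SIZE)
}

@asset(partitions_def=StaticPartitionsDefinition(list(chunks)))
def grouped_items(context: AssetExecutionContext) -> None:
    # One run handles an entire chunk, which is exactly what makes
    # per-item retries and run sizing awkward after the backfill.
    for item_id in chunks[context.partition_key]:
        context.log.info(f"Processing {item_id}")
```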
-
I'd also like to follow updates on this issue. My use case is extracting football player match reports from an API and landing the raw results as JSON files. Initially I will manage 500 players, and with, say, 200 matches played on average over their careers, that is already 100,000 partitions, and I will likely acquire many more in time. I guess I can sub-partition by player, but it all sounds a bit messy.
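In case it helps, a sub-partitioning sketch along those lines might look like the following; the names are hypothetical, and the total partition count is still the product of the two dimensions, so this organizes the keys rather than reducing their number:

```python
from dagster import (
    AssetExecutionContext,
    DynamicPartitionsDefinition,
    MultiPartitionsDefinition,
    StaticPartitionsDefinition,
    asset,
)

player_ids = [f"player-{i:03d}" for i in range(500)]

# Static player dimension crossed with a dynamic match dimension that
# grows as new match reports appear.
match_report_partitions = MultiPartitionsDefinition(
    {
        "player": StaticPartitionsDefinition(player_ids),
        "match": DynamicPartitionsDefinition(name="matches"),
    }
)

@asset(partitions_def=match_report_partitions)
def match_report(context: AssetExecutionContext) -> None:
    keys = context.partition_key.keys_by_dimension
    context.log.info(f"Fetching match {keys['match']} for {keys['player']}")
```

One wrinkle with this shape is that the dynamic `matches` dimension is shared across all players, so the cross product includes (player, match) pairs that never occurred.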
-
The Dagster documentation advises keeping the number of partitions per asset at or below 25,000.
Consider this use case where each partition corresponds to an ISBN: given over 25,000 ISBNs, this would exceed the recommended partition limit.
In exploring potential solutions, I encountered several challenges.
Does anyone have an idea how this can be handled instead?
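One pattern worth considering (a sketch, not an established recommendation) is to register ISBNs as dynamic partitions from a sensor, so partitions accumulate only as books actually appear; `fetch_new_isbns()` is a hypothetical helper, and this delays rather than avoids the 25,000 recommendation:

```python
from dagster import (
    AssetExecutionContext,
    DynamicPartitionsDefinition,
    RunRequest,
    SensorEvaluationContext,
    SensorResult,
    asset,
    sensor,
)

isbn_partitions = DynamicPartitionsDefinition(name="isbns")

@asset(partitions_def=isbn_partitions)
def book_metadata(context: AssetExecutionContext) -> None:
    # One run per ISBN partition.
    context.log.info(f"Fetching metadata for ISBN {context.partition_key}")

def fetch_new_isbns() -> list[str]:
    # Hypothetical stand-in for whatever source supplies new ISBNs.
    return []

@sensor(asset_selection=[book_metadata])
def isbn_sensor(context: SensorEvaluationContext) -> SensorResult:
    new_isbns = fetch_new_isbns()
    # Register the new keys, then request a run for each of them.
    return SensorResult(
        run_requests=[RunRequest(partition_key=isbn) for isbn in new_isbns],
        dynamic_partitions_requests=[isbn_partitions.build_add_request(new_isbns)],
    )
```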