This looks great to me. It preserves existing behaviour while offering a more elegant way to handle partitioned inputs. I hadn't considered "Currently, you would need to check if the input type is a dict, but even that isn't foolproof, because what if the value of an individual partition is a dict?"! I would be interested to see how this is documented to help users avoid this pitfall, since it's subtle that adding the type changes the IO manager's behaviour.
-
Some questions: (1) Is this a mistake? Why does
(2) You write:
Can't you currently call
(3) Is the idea that
If that is the case, IMO that's confusing. I would expect an abstraction with this generic a name to be usable with any IO manager. Given that we already have the available information in the framework to determine whether a loaded value represents multiple partitions or a single partition, why not have the framework do the wrapping in
In this case
Advantages:
The other half of this is
-
I strongly favor the "framework-does-the-wrapping" approach I outlined above. Any thoughts on that?
-
Sometimes the input of an asset computation is fed by multiple partitions of an upstream asset. For example:
This discussion is about what Python types should be passed between the IO manager and the asset input in this case. I.e. it concerns what should go in the question-marks below:
Status quo
The DB IO managers return a `DataFrame`, which can include values from all the partitions.
`UPathIOManager` returns a dictionary that maps each input partition key to the input value for that partition key.
The reason these are different is that the DB IO managers load all the partitions using a single SQL query, but the `UPathIOManager` loads each partition independently from a separate file.
While it's a little weird, I believe it's fine that they handle these in different ways. Different IO managers generally operate on different types.
Problem with the status quo that led me to create this Discussion
What input type should your asset expect if its inputs are loaded using the `UPathIOManager`, and it sometimes receives multiple partitions and other times receives a single partition?
Currently, you would need to check if the input type is a dict, but even that isn't foolproof, because what if the value of an individual partition is a dict?
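To make the ambiguity concrete, here is a minimal sketch in plain Python. It does not use Dagster; `describe_input` and the sample values are hypothetical stand-ins for an asset body and for what a `UPathIOManager`-style loader might hand over.

```python
def describe_input(value):
    """Naive check: treat any dict as a mapping of partition key -> value."""
    if isinstance(value, dict):
        return f"multiple partitions: {sorted(value)}"
    return "single partition"


# Several partitions, keyed by partition key (hypothetical data).
multi = {"2023-01-01": {"rows": 3}, "2023-01-02": {"rows": 5}}

# A single partition whose stored value happens to be a dict itself.
single = {"rows": 3}

multi_result = describe_input(multi)
single_result = describe_input(single)  # wrongly reported as multiple partitions
```

The second call is misclassified because the `isinstance(value, dict)` check cannot tell a mapping of partition keys apart from a single partition's dict-valued payload.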
I have encountered this in two situations:
Proposal
Introduce a `PartitionedInput` type that extends `dict`. Then here's how the IO manager and asset would get implemented:
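A rough sketch of the idea, using plain Python stand-ins rather than the real Dagster classes. Only the name `PartitionedInput` and the fact that it extends `dict` come from the proposal; `load_input`, `total_rows`, and the assumption that the loader wraps only the multi-partition case are hypothetical choices made for illustration.

```python
from typing import Any, Dict, List


class PartitionedInput(dict):
    """Marker type: a dict mapping partition key -> value for that partition."""


def load_input(partition_keys: List[str], store: Dict[str, Any]) -> Any:
    # Hypothetical stand-in for an IO manager's load step.
    # Assumption: a single partition is returned bare, several are wrapped.
    if len(partition_keys) == 1:
        return store[partition_keys[0]]
    return PartitionedInput({key: store[key] for key in partition_keys})


def total_rows(upstream: Any) -> int:
    # Hypothetical asset body: the marker type removes the ambiguity,
    # even though each partition's value is itself a plain dict.
    if isinstance(upstream, PartitionedInput):
        return sum(value["rows"] for value in upstream.values())
    return upstream["rows"]


store = {"2023-01-01": {"rows": 3}, "2023-01-02": {"rows": 5}}
one = total_rows(load_input(["2023-01-01"], store))
both = total_rows(load_input(["2023-01-01", "2023-01-02"], store))
```

Because `PartitionedInput` subclasses `dict`, code that already treats the multi-partition value as a dictionary should keep working, while new code can check for the more specific type.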