New Feature: Add a TidyData Class #5335
Replies: 17 comments
-
Any reason to not support this explicitly via For
-
@NowanIlfideme That sounds like a good approach to me. Want to do a PR?
-
Yeah, feel free to open a PR, basing it on @aseyboldt's previous approach or not. ArviZ and xarray have evolved a lot since Adrian's first thoughts about this, so there may be easier approaches now. This issue is just a guide to express the need and make the goal explicit.
-
@twiecki @AlexAndorra Does this mean that the |
-
Hi @kc611! Mmmh, I think it's more about extending the current implementation to work with xarray datasets (and thus pandas DataFrames), as shown in the example above. That way, you get all the dims, coords and associated indexes defined at the same time and place, and conveniently so.
-
Alright I'll see if I can come up with something. Should this class also support implicit conversion for |
-
Awesome, looking forward to it @kc611 !
I think inputs could be both pd.DataFrame and xr.Dataset, but the conversion should not be made by PyMC; it should be made by the user.
-
I don't think anyone would input xarrays, we only need DataFrames.
-
@twiecki I have a related draft running (implemented using I think it'll be a better idea if you have a look at it (the draft PR) anyway in its current state.
-
Disagree completely - I would love to input xarray datasets; in fact, for several of my latest models I've had to manually add the values to the model. DataFrames are good too, but while 2D DataFrames are easily converted to xarray, MultiIndex ones can be trickier (the implementers just need to make sure that they don't auto-expand the MultiIndex during conversion, i.e. they need to choose the proper converter). Finally, with ArviZ integrating with xarray more and more, I think PyMC3 doing so would be great as well.
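To make the "auto-expand" concern concrete, here is a minimal sketch using only pandas. The data, column names, and the use of `unstack` as a stand-in are assumptions for illustration; as far as I know, `DataFrame.to_xarray()` on a MultiIndexed frame turns the index levels into dimensions in an analogous way.

```python
import pandas as pd

# Hypothetical sparse long-format data: not every (city, year) pair is observed.
df = pd.DataFrame(
    {
        "city": ["Paris", "Paris", "Rome"],
        "year": [2019, 2020, 2020],
        "temp": [21.0, 22.3, 27.1],
    }
).set_index(["city", "year"])

# Expanding the MultiIndex into separate dimensions yields the full
# city x year grid; unobserved combinations are filled with NaN.
grid = df["temp"].unstack("year")

n_observed = len(df)                      # rows actually present in the data
n_cells = grid.size                       # cells in the expanded grid
n_missing = int(grid.isna().sum().sum())  # cells invented by the expansion
```

With many levels or sparse combinations the expanded grid can grow much larger than the observed data, which is why the choice of converter matters.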
-
I am trying to understand the difference between the proposed TidyData and the current Data. Is there a reason why their functionalities need to be in separate objects? Is TidyData something that should not be changed after model specification? Is TidyData a "2D" version of Data?
-
For now it's expected to do indexing for string data. Maybe its functionality can be extended to things like cleaning/tidying up data (as the name suggests), such as filling in missing values automatically (using nearest neighbours) or one-hot encoding categorical data. I don't know the extent of its use in PyMC models, though.
No, I think it'll be a good idea to add that functionality too. That said, I think it'll be a better idea to properly discuss the scope and use cases of this new class (and whether a new class is needed at all) before I continue adding random stuff in my PR :-p
-
Started looking a bit closer and I'm a bit confused about the API. Currently, we can specify dims with:
coords = {"date": df_data.index, "city": df_data.columns}
with pm.Model(coords=coords) as model:
    city_offset = pm.Normal("city_offset", mu=0.0, sd=3.0, dims="city")
I think Anyone have a full grasp on this? CC @aseyboldt
-
It can simply make use of
-
This has been dormant for quite a while. Is there still interest in pursuing this, or can it be closed?
-
I'm interested as a user. Usually I need to wrap the model class into something I dynamically construct, using my own encoding scheme. Having this built into PyMC would make this much more standardized.
-
@NowanIlfideme Are you interested in giving this a shot? We'd help of course.
-
This issue is up for grabs, with the goal of developing a new feature called pm.TidyData. The goal is to automatically translate strings from a dataframe into integer arrays for indexing in models, and then make sure we still get the right labels in the output, for plots and diagnostics.
@aseyboldt implemented a first version and an example, but the implementation is not mature enough, so we decided to not include it in #3551.
This would be a very useful new feature though, which is why we created this issue -- to remind ourselves of it and in case someone wants to give it a try 😉
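For anyone picking this up, the string-to-integer translation described above can be sketched with plain pandas. This is not the proposed pm.TidyData API, just an illustration of the idea; the DataFrame, the column names, and the `city_offset` values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical tidy (long-format) data: one row per observation.
df = pd.DataFrame(
    {
        "city": ["Paris", "Berlin", "Paris", "Rome", "Berlin"],
        "temp": [21.0, 18.5, 22.3, 27.1, 17.9],
    }
)

# Encode the string column as integer codes plus the unique labels.
# sort=True makes the label order deterministic (alphabetical here).
city_idx, city_labels = pd.factorize(df["city"], sort=True)

# The labels can serve as coordinates for the "city" dimension.
coords = {"city": list(city_labels)}

# Stand-in for a per-city model parameter, ordered like the labels
# (Berlin, Paris, Rome); the integer codes pick one value per row.
city_offset = np.array([0.5, -1.0, 2.0])
per_row_offset = city_offset[city_idx]

# Indexing the labels with the codes recovers the original strings,
# which is what keeps plots and diagnostics correctly labelled.
recovered = list(city_labels[city_idx])
```

Keeping `coords` next to `city_idx` is the part a TidyData-like class would presumably automate, so that the labels never drift out of sync with the integer codes.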