NestedSeries Implementation #331
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #331      +/-   ##
==========================================
+ Coverage   98.11%   98.19%   +0.08%
==========================================
  Files          18       19       +1
  Lines        1748     1829      +81
==========================================
+ Hits         1715     1796      +81
  Misses         33       33
# Allow boolean masking given a Series of booleans
if isinstance(key, pd.Series) and pd.api.types.is_bool_dtype(key.dtype):
    flat_df = self.to_flat()  # Use the flat representation
    if not key.index.equals(flat_df.index):
        raise ValueError("Boolean mask must have the same index as the flattened nested dataframe.")
    # Apply the mask to the series, return a new NestedFrame
    return NestedFrame(index=self._series.index).add_nested(flat_df[key], name=self._series.name)
    # return NestedFrame(index=self._series.index).add_nested(flat_df[key], name=self._series.name)
Dead code?
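For reference, the index check in the snippet above can be sketched with plain pandas. This is only an illustration under assumptions: `apply_flat_mask` is a hypothetical helper, not part of nested-pandas, and a DataFrame with a repeated index stands in for the output of `to_flat()`.

```python
import pandas as pd


def apply_flat_mask(flat_df: pd.DataFrame, key: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: filter a flat frame with a boolean mask."""
    if not pd.api.types.is_bool_dtype(key.dtype):
        raise TypeError("Mask must have a boolean dtype.")
    if not key.index.equals(flat_df.index):
        raise ValueError("Boolean mask must have the same index as the flat frame.")
    return flat_df[key]


# Stand-in for a flattened nested column: the outer index repeats per nested row.
flat_df = pd.DataFrame(
    {"t": [1.0, 2.0, 3.0, 4.0], "flux": [10, 20, 30, 40]},
    index=[0, 0, 1, 1],
)

mask = flat_df["flux"] > 15
filtered = apply_flat_mask(flat_df, mask)

# A mask built against a different index is rejected rather than silently aligned.
bad_mask = pd.Series([True, False], index=[0, 1])
try:
    apply_flat_mask(flat_df, bad_mask)
    rejected = False
except ValueError:
    rejected = True
```

Requiring an identical index keeps the mask tied to specific nested rows, which matters because the flat index contains duplicates.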
# if len(key) == 1 and not isinstance(new_array.dtype.field_dtype(key[0]), NestedDtype):
#     # If only one field is requested, return it as a pd.Series
#     return self._series[key[0]]
Dead code or future plan?
if not isinstance(self.dtype, NestedDtype):
    return super().__getitem__(key)

# Return a flatten series for a single field
Suggested change:
- # Return a flatten series for a single field
+ # Return a flattened series for a single field
# Handle boolean masking
if isinstance(key, pd.Series) and pd.api.types.is_bool_dtype(key.dtype):
    return self.nest[key]
Cool!
class NestedSeries(pd.Series):
    """
    A Series that can contain nested data structures, such as lists or dictionaries.
    This class extends the functionality of a standard pandas Series to handle nested data.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
What happens if a user does binary operations on a `NestedSeries` with a `Series`? I suspect that you may want to follow the same procedure for extending a pandas class that `_SeriesFromNest` uses.
I had been wondering whether `_SeriesFromNest` and `NestedSeries` could be dovetailed, but on reflection I do think they serve different purposes: the former tracks a series (field) extracted from a nest, and the latter represents the nest as a first-class object. Do you agree?
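To make the binary-operations question concrete, here is a minimal sketch of the generic pandas subclassing hook, `_constructor`. It is an assumption that this is relevant to what `_SeriesFromNest` does (I have not checked its implementation), and `TaggedSeries` is a hypothetical stand-in, not a class from this PR.

```python
import pandas as pd


class TaggedSeries(pd.Series):
    """Hypothetical stand-in for a pandas Series subclass such as NestedSeries."""

    @property
    def _constructor(self):
        # pandas calls this to build the results of most operations.
        return TaggedSeries


left = TaggedSeries([1, 2, 3])
right = pd.Series([10, 20, 30])

# The subclass is the left operand, so its _constructor builds the result.
result = left + right
print(type(result).__name__, result.tolist())
```

Whether `right + left` also keeps the subclass depends on which operand's method pandas dispatches to, so that case would be worth covering with a test.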
I wonder if this means that this PR resolves (or helps resolve) #284.
@@ -585,17 +606,18 @@ def to_flatten_inner(self, field: str) -> pd.Series:
>>> from nested_pandas import NestedFrame
>>> from nested_pandas.datasets import generate_data
>>> nf = generate_data(5, 2, seed=1).rename(columns={"nested": "inner"})
>>> nf["b"] = "b"  # Shorten width of example output
Funny! 🙏 for the comment. Is that because 'black' formatting interferes with doctests?
Resolves #304. This PR has gotten large enough that it might have been better to split it up into a few smaller steps; sorry about that. I've yet to write documentation, but I think it makes sense to write it as a follow-up PR while we iron out any implementation/API behaviors here.
Some design choices I made here, which we can definitely do differently: