NestedSeries Implementation #331

dougbrn · 2025-08-18T19:14:38Z

Resolves #304. This PR grew large enough that it probably should have been split into a few smaller steps; sorry about that. I have not written documentation yet, but I think it makes sense to do that in a follow-up PR while we iron out the implementation and API behavior here.

Some design choices I made here, which we can definitely do differently:

NestedSeries exposes a set of the nest accessor functions for direct use.
NestedSeries works with non-nested dtypes just as a normal pandas Series does, but nested-specific properties and methods are tagged with a decorator that raises an exception when they are used with a non-nested dtype.
When returning a non-nested series, still try to return a native pandas Series. NestedSeries is not meant as a blanket replacement for pandas Series where it isn't needed.
Masking always returns a NestedSeries, instead of sometimes returning a NestedFrame.
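
The decorator idea in the second point could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `NestedDtype` and `FakeSeries` here are hypothetical stand-ins so the snippet runs without pandas or nested-pandas, `SketchSeries` and its placeholder `to_flat` body are made up for the example, and the choice of `TypeError` is an assumption (the description only says an exception is thrown).

```python
import functools


class NestedDtype:
    """Hypothetical stand-in for the real nested dtype class."""


class FakeSeries:
    """Hypothetical stand-in for a Series subclass; it only carries a dtype."""

    def __init__(self, dtype):
        self.dtype = dtype


def nested_only(func):
    """Raise if the wrapped method is called on a non-nested dtype."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if not isinstance(self.dtype, NestedDtype):
            # Exception type is an assumption made for this sketch.
            raise TypeError(
                f"'{func.__name__}' requires a nested dtype, got {self.dtype!r}"
            )
        return func(self, *args, **kwargs)

    return wrapper


class SketchSeries(FakeSeries):
    @nested_only
    def to_flat(self):
        # Placeholder body; a real method would unpack the nested data.
        return "flat"
```

With this sketch, `SketchSeries(NestedDtype()).to_flat()` runs normally, while `SketchSeries("int64").to_flat()` raises before the method body executes, so the non-nested code path never reaches nested-specific logic.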

github-actions · 2025-08-18T19:21:45Z

Before [`919fe82`]	After [`47b8b7c`]	Ratio	Benchmark (Parameter)
1.24±0.01ms	1.35±0ms	1.09	benchmarks.NestedFrameReduce.time_run
10.9±0.1ms	11.1±0.2ms	1.02	benchmarks.NestedFrameQuery.time_run
11.6±0.4ms	11.7±0.3ms	1.01	benchmarks.NestedFrameAddNested.time_run
177M	179M	1.01	benchmarks.ReadFewColumnsHTTPS.peakmem_run
136M	136M	1	benchmarks.CountNestedBy.peakmem_run
102M	102M	1	benchmarks.NestedFrameAddNested.peakmem_run
107M	107M	1	benchmarks.NestedFrameQuery.peakmem_run
106M	106M	1	benchmarks.NestedFrameReduce.peakmem_run
271M	270M	1	benchmarks.ReassignHalfOfNestedSeries.peakmem_run
250M	247M	0.99	benchmarks.AssignSingleDfToNestedSeries.peakmem_run


codecov · 2025-08-18T21:14:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.19%. Comparing base (46acb8e) to head (7e1233c).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #331      +/-   ##
==========================================
+ Coverage   98.11%   98.19%   +0.08%     
==========================================
  Files          18       19       +1     
  Lines        1748     1829      +81     
==========================================
+ Hits         1715     1796      +81     
  Misses         33       33


gitosaurus · 2025-08-20T18:39:03Z

src/nested_pandas/series/accessor.py

        # Allow boolean masking given a Series of booleans
        if isinstance(key, pd.Series) and pd.api.types.is_bool_dtype(key.dtype):
            flat_df = self.to_flat()  # Use the flat representation
            if not key.index.equals(flat_df.index):
                raise ValueError("Boolean mask must have the same index as the flattened nested dataframe.")
            # Apply the mask to the series, return a new NestedFrame
-            return NestedFrame(index=self._series.index).add_nested(flat_df[key], name=self._series.name)
+            # return NestedFrame(index=self._series.index).add_nested(flat_df[key], name=self._series.name)
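
For readers following the index check in this snippet: the flattened frame repeats each outer index value once per nested row, so a usable mask has to be built from (or aligned to) that flat index. The sketch below uses plain pandas only; `apply_flat_mask` and the sample data are hypothetical and stand in for the accessor logic, which is not reproduced here.

```python
import pandas as pd


def apply_flat_mask(flat_df, key):
    """Filter a flattened frame with a boolean Series that shares its index."""
    if not (isinstance(key, pd.Series) and pd.api.types.is_bool_dtype(key.dtype)):
        raise TypeError("key must be a boolean pandas Series")
    if not key.index.equals(flat_df.index):
        raise ValueError("Boolean mask must have the same index as the flattened frame.")
    return flat_df[key]


# Flat representation: outer index values repeat once per nested row.
flat = pd.DataFrame(
    {"t": [1.0, 2.0, 3.0, 4.0, 5.0], "flux": [0.5, 2.0, 1.5, 0.1, 3.0]},
    index=[0, 0, 1, 1, 2],
)

# A mask derived from the flat frame carries the same repeated index.
mask = flat["flux"] > 1.0
kept = apply_flat_mask(flat, mask)
```

A mask built independently, for example `pd.Series([True, False, True, False, True])` with its default index, fails the `equals` check and raises the `ValueError`.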


gitosaurus · 2025-08-20T18:39:41Z

src/nested_pandas/series/accessor.py

+            # if len(key) == 1 and not isinstance(new_array.dtype.field_dtype(key[0]), NestedDtype):
+            #    # If only one field is requested, return it as a pd.Series
+            #    return self._series[key[0]]


Dead code or future plan?

gitosaurus · 2025-08-20T18:43:06Z

src/nested_pandas/series/nestedseries.py

+        if not isinstance(self.dtype, NestedDtype):
+            return super().__getitem__(key)
+
+        # Return a flatten series for a single field


Suggested change

-        # Return a flatten series for a single field
+        # Return a flattened series for a single field

gitosaurus · 2025-08-20T18:43:37Z

src/nested_pandas/series/nestedseries.py

+        # Handle boolean masking
+        if isinstance(key, pd.Series) and pd.api.types.is_bool_dtype(key.dtype):
+            return self.nest[key]


gitosaurus · 2025-08-20T23:32:11Z

src/nested_pandas/series/nestedseries.py

+class NestedSeries(pd.Series):
+    """
+    A Series that can contain nested data structures, such as lists or dictionaries.
+    This class extends the functionality of a standard pandas Series to handle nested data.
+    """
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+


What happens if a user does a binary operation between a NestedSeries and a plain Series? I suspect you will want to follow the same procedure for extending a pandas class that _SeriesFromNest uses.

I had been wondering whether _SeriesFromNest and NestedSeries could be dovetailed, but on reflection I do think they are serving different purposes: the former tracks a series (field) extracted from a nest, and the latter represents the nest as a first-class object. Do you agree?

I wonder if this means that this PR resolves (or helps resolve) #284.
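
To make the binary-operation question concrete, here is a small sketch of the documented pandas subclassing hook, the `_constructor` property. Whether this matches what `_SeriesFromNest` does is an assumption; the class names below are hypothetical and unrelated to nested-pandas.

```python
import pandas as pd


class PlainSubclass(pd.Series):
    """Subclass with no constructor override."""


class StickySubclass(pd.Series):
    """Subclass that asks pandas to rebuild results as itself."""

    @property
    def _constructor(self):
        return StickySubclass


other = pd.Series([10, 20, 30])

# The result type is decided by the left operand's _constructor.
plain_result = PlainSubclass([1, 2, 3]) + other
sticky_result = StickySubclass([1, 2, 3]) + other

print(type(plain_result).__name__)
print(type(sticky_result).__name__)
```

Without the override, arithmetic falls back to a plain `pd.Series`; with it, the subclass survives the operation. Given the stated goal of returning native pandas Series for non-nested results, a real implementation might want `_constructor` to choose the type conditionally rather than always returning the subclass.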

gitosaurus · 2025-08-20T23:35:18Z

src/nested_pandas/series/accessor.py

@@ -585,17 +606,18 @@ def to_flatten_inner(self, field: str) -> pd.Series:
        >>> from nested_pandas import NestedFrame
        >>> from nested_pandas.datasets import generate_data
        >>> nf = generate_data(5, 2, seed=1).rename(columns={"nested": "inner"})
+        >>> nf["b"] = "b"  # Shorten width of example output


Funny! 🙏 for the comment. Is that because 'black' formatting interferes with doctests?

dougbrn added 2 commits August 18, 2025 11:16

barebones nestedseries

3ae532b

preserve some pd.Series outputs; fix broken tests

a7b5f2f

dougbrn added 2 commits August 18, 2025 14:12

add a few nestedseries methods; black run

a16c2ac

black run

26e9e12

dougbrn added 14 commits August 18, 2025 14:19

ci fixes

8376d91

ci fixes

6501dc0

initial setitem and getitem

d855cf8

setitem use nest accessor directly

fbbb183

boolean masking enabled

2497664

revert to super setitem

a4be8d7

extra space

d1c354c

nested_only decorator, a few tests

fba5cab

representation tests

cff8d0b

more tests

1fde41d

more tests; fix issue with accessor masking

fb3dda6

stay packed for nested[['sub_column']]

56f8ae4

setitem nest accessor inclusion; weird formatting fix

57b926f

remove redundant setitem logic

1a6c807

dougbrn changed the title ~~[WIP] NestedSeries Implementation~~ NestedSeries Implementation Aug 20, 2025

dougbrn marked this pull request as ready for review August 20, 2025 17:11

remove trace

7e1233c

dougbrn requested review from gitosaurus and hombit August 20, 2025 17:22

gitosaurus reviewed Aug 20, 2025
