Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Currently, pandas.IntervalArray suffers from 3 major limitations:
(Retracted, apparently no longer the case: they are limited to data with the same closedness on both sides.)
- All data points in an array are limited to the same closedness (i.e. the same array can store only closed intervals or only open intervals).
- Intervals do not allow missing values.
  - In particular, one cannot represent unbounded intervals for data types that lack an actual infinity value, such as int32.
- Some dtypes, such as string, are not allowed.
A practical application of (1) that I am very interested in is storing information about the range of valid values for the columns of another DataFrame.
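To make limitation (1) concrete, here is a minimal sketch of the "valid ranges per column" use case. The column names and ranges are made up for illustration, and the behaviour shown is what I would expect from current pandas versions rather than a guarantee:

```python
import pandas as pd

# Hypothetical valid ranges for two columns of some other DataFrame.
valid_ranges = {
    "probability": pd.Interval(0.0, 1.0, closed="both"),               # [0, 1]
    "duration": pd.Interval(0.0, float("inf"), closed="neither"),      # (0, inf)
}

# Storing both intervals in one IntervalArray is expected to fail,
# because the two intervals differ in closedness.
try:
    pd.arrays.IntervalArray(list(valid_ranges.values()))
    mixed_closedness_supported = True
except ValueError:
    mixed_closedness_supported = False
```

Note that the unbounded side only works here because float has an infinity value; the same trick is not available for int32.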
Feature Description
Given the better integration with pyarrow since pandas 2.0, we can recreate IntervalDtype using pyarrow.struct:
import pyarrow as pa


def arrow_interval_dtype(subtype):
    # Struct type holding both bounds plus one inclusivity flag per bound.
    fields = [
        ("lower_bound", subtype),
        ("upper_bound", subtype),
        ("lower_inclusive", pa.bool_()),
        ("upper_inclusive", pa.bool_()),
    ]
    return pa.struct(fields)
Contrary to the current IntervalDtype, this would solve all 3 major problems at once:
- Each element of the resulting StructArray can have separate closedness.
- Pyarrow data types all support missing values.
- We can in principle use any ordered data type for the subtype.
Alternative Solutions
None.
Additional Context
Additionally, a common request is to add extra operations for interval dtypes:
- ENH: Interval type should support intersection, union & overlaps & difference #21998
- ENH: Arithmetic operations on intervals #43629
- API: Implement interval-point joins #21901
- Features which Interval / IntervalIndex should probably have #19480
Additionally, one could imagine having an IntervalUnion type that can represent finite unions of intervals, combining the interval type discussed here with the pyarrow list type. This type would naturally arise when performing unions of intervals, such as [0, 2]∪[3, 5]. The nice thing here is that the resulting space is mathematically closed under the standard set operations (union, intersection, complement, difference).
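To sketch what a normalized IntervalUnion could look like, here is a small pure-Python example of the union operation with per-endpoint closedness. The Interval class and interval_union function are hypothetical names for illustration only, and the sketch assumes every input interval is non-empty:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lower: float
    upper: float
    lower_inclusive: bool = True
    upper_inclusive: bool = True


def interval_union(intervals):
    """Merge intervals into a sorted list of disjoint, non-touching intervals."""
    # Sort by lower bound; at equal lower bounds, inclusive bounds come first.
    ordered = sorted(intervals, key=lambda iv: (iv.lower, not iv.lower_inclusive))
    merged = []
    for iv in ordered:
        if merged:
            last = merged[-1]
            # Two intervals connect if they overlap, or if they touch at a
            # point that at least one of them includes.
            connects = iv.lower < last.upper or (
                iv.lower == last.upper
                and (iv.lower_inclusive or last.upper_inclusive)
            )
            if connects:
                # Extend the upper end only if iv reaches further.
                if (iv.upper, iv.upper_inclusive) > (last.upper, last.upper_inclusive):
                    merged[-1] = Interval(
                        last.lower, iv.upper, last.lower_inclusive, iv.upper_inclusive
                    )
                continue
        merged.append(iv)
    return merged


disjoint = interval_union([Interval(3, 5), Interval(0, 2)])        # [0, 2] ∪ [3, 5]
touching = interval_union([Interval(0, 2, True, False), Interval(2, 3)])
gap_at_point = interval_union(
    [Interval(0, 2, True, False), Interval(2, 3, False, True)]
)
```

The resulting list is what would be stored in a single element of a list-typed array whose items use the interval struct type discussed above.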