Incorrect error message raised for Polars.DataFrame.rolling: error is about index column not period #20081

Open
2 tasks done
stucash opened this issue Nov 30, 2024 · 0 comments
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

stucash commented Nov 30, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

dates = [
    "2020-01-01 13:45:48",
    "2020-01-01 16:42:13",
    "2020-01-01 16:45:09",
    "2020-01-02 18:12:48",
    "2020-01-03 19:45:32",
    "2020-01-08 23:16:43",
]
df = pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]})

# Raises InvalidOperationError: `dt` is still a str column here.
df.rolling(index_column="dt", period="3d").agg(pl.mean("a").alias("mean_a"))

# The error disappears once `dt` is parsed to Datetime:
df2 = df.with_columns(pl.col("dt").str.to_datetime())
df2.rolling(index_column="dt", period="3d").agg(pl.mean("a").alias("mean_a"))

Log output

---------------------------------------------------------------------------
InvalidOperationError                     Traceback (most recent call last)
Cell In[12], line 1
----> 1 df.rolling(index_column='dt', period='3d').agg(pl.mean('a').alias('bb'))

File $PROJECT_HOME/lib/python3.12/site-packages/polars/dataframe/group_by.py:846, in RollingGroupBy.agg(self, *aggs, **named_aggs)
    818 def agg(
    819     self,
    820     *aggs: IntoExpr | Iterable[IntoExpr],
    821     **named_aggs: IntoExpr,
    822 ) -> DataFrame:
    823     """
    824     Compute aggregations for each group of a group by operation.
    825 
   (...)
    834         The resulting columns will be renamed to the keyword used.
    835     """
    836     return (
    837         self.df.lazy()
    838         .rolling(
    839             index_column=self.time_column,
    840             period=self.period,
    841             offset=self.offset,
    842             closed=self.closed,
    843             group_by=self.group_by,
    844         )
    845         .agg(*aggs, **named_aggs)
--> 846         .collect(no_optimization=True)
    847     )

File $PROJECT_HOME/lib/python3.12/site-packages/polars/lazyframe/frame.py:2029, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, collapse_joins, no_optimization, streaming, engine, background, _eager, **_kwargs)
   2027 # Only for testing purposes
   2028 callback = _kwargs.get("post_opt_callback", callback)
-> 2029 return wrap_df(ldf.collect(callback))

InvalidOperationError: unsupported data type: str for `period`, expected UInt64, UInt32, Int64, Int32, Datetime, Date, Duration, or Time

Issue description

The error message blames period, but the actual cause is that the index_column dt has dtype str. Judging by the types listed in the message, it is the index_column, not period, that has to be one of those types.

Expected behavior

The error message should name dt (i.e., the index column), not period. It seems the index column is not strictly required to be a datetime, since the message also lists integer types, so I guess the error should be raised for the index column whenever its dtype is not one of the supported types.
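To make the expected behavior concrete, here is a minimal sketch in plain Python of a check that names the index column in its message. It is only an illustration: `check_index_column` and `SUPPORTED_INDEX_DTYPES` are hypothetical names, not part of Polars, and the list of supported dtypes is copied from the error message in the log output on the assumption that it describes the index column.

```python
# Dtype names copied from the error message in the log output above.
# Assumption: this list applies to the index column, not to `period`.
SUPPORTED_INDEX_DTYPES = frozenset(
    {"UInt64", "UInt32", "Int64", "Int32", "Datetime", "Date", "Duration", "Time"}
)


def check_index_column(name: str, dtype_name: str) -> None:
    """Hypothetical validation: raise an error that names the index column."""
    if dtype_name not in SUPPORTED_INDEX_DTYPES:
        expected = ", ".join(sorted(SUPPORTED_INDEX_DTYPES))
        raise TypeError(
            f"unsupported data type: {dtype_name} for index column `{name}`, "
            f"expected one of: {expected}"
        )


# A supported dtype passes silently.
check_index_column("dt", "Datetime")

# The case from the reproducible example: `dt` is a str column.
try:
    check_index_column("dt", "str")
except TypeError as exc:
    message = str(exc)
    print(message)
```

With a message of this shape, the reproducible example would point straight at dt and make the str.to_datetime() fix obvious.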

Installed versions

--------Version info---------
Polars:              1.16.0
Index type:          UInt32
Platform:            Linux-6.11.10-1-liquorix-amd64-x86_64-with-glibc2.36
Python:              3.12.0 (main, Oct  5 2024, 16:16:05) [GCC 12.2.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.10.0
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             <not installed>
pandas               2.2.3
pyarrow              18.1.0
pydantic             2.10.2
pyiceberg            <not installed>
sqlalchemy           2.0.36
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>