Using int_range or int_ranges in streaming mode (LazyFrame.sink_*) causes a polars.exceptions.InvalidOperationError. That makes some operations (e.g. prefix explodes) impossible within streaming mode.
Example:
import polars as pl

# imagine a larger-than-ram dataset
lf = pl.scan_parquet("hf://datasets/allenai/tulu-3-sft-mixture/data/*.parquet").head(100)

# get indices
lf = lf.with_columns(
    # indices=pl.lit([0, 1, 2, 3])  # works
    indices=pl.int_ranges(0, pl.col("messages").list.len())  # doesn't work
).explode("indices")

# prefix explode
lf = lf.select(
    messages=pl.col("messages").list.slice(0, pl.col("indices"))
)
print(lf.explain(streaming=True))

# fails here
lf.sink_parquet("output.parquet")

# but this works
# lf.collect().write_parquet("output.parquet")
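For readers unfamiliar with the term, here is a minimal plain-Python sketch of what the "prefix explode" above is meant to produce. It is an illustration only, not how Polars implements it: `prefix_explode` is a hypothetical helper, and the sample data is made up. Each input list is turned into one output row per index, holding the first `index` elements, which mirrors `int_ranges(0, len)` followed by `list.slice(0, index)`.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def prefix_explode(rows: Sequence[Sequence[T]]) -> Iterator[list[T]]:
    """Yield, for each input list, its prefixes of length 0 .. len - 1."""
    for row in rows:
        # One output row per index, like int_ranges(0, len) + explode.
        for n in range(len(row)):
            # First n elements, like list.slice(0, n).
            yield list(row[:n])


# Hypothetical stand-in for the "messages" column.
messages = [["a", "b", "c"], ["x", "y"]]
result = list(prefix_explode(messages))
print(result)  # [[], ['a'], ['a', 'b'], [], ['x']]
```

Note that in this sketch an empty input list simply yields no rows, which may differ from how Polars treats empty lists when exploding.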