Expected Result
A new partitioned Parquet dataset should be created, either locally or in S3.
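For reference, a partitioned write does not produce a single file. It produces a directory tree with one column=value subdirectory per distinct partition value. As far as I know, this Hive-style layout is what Polars' partition_by and PyArrow's partition_cols write. The stdlib-only sketch below only computes that layout for a few made-up rows. hive_partition_layout is a hypothetical helper, not part of Kedro, Polars, or PyArrow, and it writes nothing to disk.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def hive_partition_layout(rows, partition_cols, root):
    """Hypothetical helper: group rows by their partition-column values and
    return {directory: rows} using Hive-style "column=value" path segments."""
    layout = defaultdict(list)
    for row in rows:
        # One "column=value" segment per partition column, nested in the given order.
        segments = [f"{col}={row[col]}" for col in partition_cols]
        directory = str(PurePosixPath(root, *segments))
        layout[directory].append(row)
    return dict(layout)


# Made-up rows with a "dt1y" column, matching the partition column in the report.
rows = [
    {"dt1y": 2022, "value": 1},
    {"dt1y": 2023, "value": 2},
    {"dt1y": 2023, "value": 3},
]

for directory, group in hive_partition_layout(rows, ["dt1y"], "/tmp/test.parquet").items():
    print(directory, len(group))
```

In this layout, the configured filepath (/tmp/test.parquet) becomes a directory rather than a file. A partitioned save therefore needs a real location (local path or S3 prefix) to write into, not a single output stream.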
Actual Result
From the Rust implementation:
DatasetError: Failed while saving data to data set
EagerPolarsDataset(file_format=parquet, filepath=/tmp/test.parquet,
load_args={}, protocol=file, save_args={'partition_by': ['dt1y']}).
'BytesIO' object cannot be converted to 'PyString'
From PyArrow:
DatasetError: Failed while saving data to data set
LazyPolarsDataset(filepath=/tmp/test.parquet, load_args={}, protocol=file,
save_args={'pyarrow_options': {'compression': zstd, 'partition_cols': ['dt1y'],
'write_statistics': True}, 'use_pyarrow': True}).
Argument 'filesystem' has incorrect type (expected pyarrow._fs.FileSystem, got
NoneType)
Your Environment
Kedro version used (pip show kedro or kedro -V): 0.19.3
Polars: 1.9.0 and 1.6.0
Python version used (python -V): 3.11
Operating system and version: macOS (M1), using Docker Compose + Docker Desktop
Hi @alexdavis24, I've been able to replicate the issue. I'm not very familiar with Polars or PyArrow, but I think your analysis that the issue lies in saving through BytesIO is correct. In the save method, the data is first written to a BytesIO buffer, and fsspec then writes that buffer to the target path. This completely bypasses the PyArrow filesystem, so it shouldn't require you to pass a filesystem argument. However, if PyArrow is somehow being invoked with a None filesystem, the issue might be in how fsspec or the BytesIO buffer is handled.
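The reasoning above can be made concrete with a stdlib-only model. write_single_file, write_partitioned and save_via_buffer are hypothetical stand-ins, not the actual kedro-datasets or Polars code. The final copy to the target path, which the real dataset does with fsspec, is left out. The model shows one thing. A writer that must create a directory tree cannot target a single in-memory BytesIO buffer. That is consistent with the "'BytesIO' object cannot be converted to 'PyString'" error from the Rust implementation.

```python
import io
import os
import tempfile


def write_single_file(data: bytes, target):
    """Stand-in for a non-partitioned writer: accepts a path or any binary file-like object."""
    if isinstance(target, (str, os.PathLike)):
        with open(target, "wb") as fh:
            fh.write(data)
    else:
        target.write(data)


def write_partitioned(parts: dict, root):
    """Stand-in for a partitioned writer: it has to create directories, so it only accepts a path."""
    if not isinstance(root, (str, os.PathLike)):
        raise TypeError(f"{type(root).__name__!r} object cannot be used as a directory path")
    for directory, data in parts.items():
        os.makedirs(os.path.join(root, directory), exist_ok=True)
        with open(os.path.join(root, directory, "part-0.bin"), "wb") as fh:
            fh.write(data)


def save_via_buffer(writer, payload):
    """Model of the described save flow: serialise into a BytesIO buffer first and
    return its bytes (the real dataset would then hand them to fsspec)."""
    buffer = io.BytesIO()
    writer(payload, buffer)
    return buffer.getvalue()


# A single-file writer is happy to write into the buffer.
print(save_via_buffer(write_single_file, b"abc"))

# A partitioned writer cannot, because a buffer is not a directory.
try:
    save_via_buffer(write_partitioned, {"dt1y=2023": b"abc"})
except TypeError as exc:
    print("partitioned save failed:", exc)

# Writing straight to a directory path instead of a buffer works.
with tempfile.TemporaryDirectory() as root:
    write_partitioned({"dt1y=2023": b"abc"}, root)
    print(sorted(os.listdir(root)))
```

Under these assumptions, partitioned saves would need a code path that passes the target location itself to the writer rather than going through an intermediate buffer.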
I managed to get things working with the following catalog entry:
That is, I removed the explicit filesystem argument and also removed use_pyarrow: True. I don't know whether this produces the desired result, though. Let me know what you think.
Description
Passing an explicit filesystem argument within the catalog (see code below) does not work.