BUG: to_csv to Google Cloud Storage ignores mode='a' #51821

Closed
3 tasks done
Courvoisier13 opened this issue Mar 7, 2023 · 6 comments
Labels: Bug; IO CSV (read_csv, to_csv); Needs Info (clarification about behavior needed to assess issue)

Comments

@Courvoisier13

### Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

### Reproducible Example

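import pandas as pd

# Sample data from the report; the loop below does not use it.
df = pd.DataFrame({
    'account-start': ['2017-02-03', '2017-03-03', '2017-01-01'],
    'client': ['Alice Anders', 'Bob Baker', 'Charlie Chaplin'],
    'balance': [-1432.32, 10.43, 30000.00],
    'db-id': [1234, 2424, 251],
    'proxy-id': [525, 1525, 2542],
    'rank': [52, 525, 32],
    # ...
})

# gs_path and temp_gs_path point to objects in a GCS bucket;
# their values were not given in the report.
header = True
to_csv_mode = 'w'
with pd.read_csv(gs_path, chunksize=1) as reader:
    for r in reader:
        # First chunk: write with a header. Later chunks: request append, no header.
        r.to_csv(temp_gs_path, index=False, header=header, mode=to_csv_mode)
        header = False
        to_csv_mode = 'a'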


### Issue Description

I tried the following:

- The code runs from within Cloud Run and has access to Cloud Storage.

However, the file created in the GCS bucket is overwritten on every write instead of being appended to after the first chunk (to_csv_mode = 'a' is ignored). The file therefore ends up containing only the last chunk.

### Expected Behavior

With mode='a', each chunk should be appended to the existing CSV file.

### Installed Versions

INSTALLED VERSIONS
------------------
python : 3.9.16.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0
Version : #1 SMP Sun Jan 10 15:06:54 PST 2016
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.3
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 58.1.0
pip : 23.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.8.2
gcsfs : 2022.8.2
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : 1.4.39
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.1
Courvoisier13 added the Bug and Needs Triage labels on Mar 7, 2023
@twoertwein
Member

I believe pandas uses fsspec for Google Cloud Storage: you would need to tell fsspec.open, through storage_options={...}, how to open the file.

@phofl
Member

phofl commented Mar 7, 2023

@Courvoisier13 could you try that and report the results?

phofl added the IO CSV and Needs Info labels and removed the Needs Triage label on Mar 7, 2023
@Courvoisier13
Author

I got an answer from Google. They said:

As described in the official documentation [1] Google Cloud Storage objects are immutable, which means append is not a functionality that Google Cloud Storage supports. If you write to the same object name, it is always going to replace the existing object.

This means mode='a' cannot work with Google Cloud Storage. It would be nice if pandas could give a warning or an outright error in this case.
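Until something like that exists, a check of this kind could live in user code. The following is only a sketch: `check_append_supported` and `NO_APPEND_SCHEMES` are hypothetical names, not part of pandas or fsspec, and the list of schemes is an assumption that would need adjusting for the backends actually in use.

```python
from urllib.parse import urlsplit

# Assumption: URL schemes whose backends replace objects instead of appending.
# This set is illustrative, not an official list.
NO_APPEND_SCHEMES = {"gs", "gcs"}


def check_append_supported(path: str, mode: str) -> None:
    """Raise ValueError if `mode` requests append on a scheme listed above."""
    scheme = urlsplit(path).scheme.lower()
    if "a" in mode and scheme in NO_APPEND_SCHEMES:
        raise ValueError(
            f"mode={mode!r} requests append, but {scheme}:// objects are "
            "immutable; the existing object would be replaced"
        )


# Local paths and plain writes pass silently.
check_append_supported("/tmp/out.csv", "a")
check_append_supported("gs://bucket/out.csv", "w")
```

Calling it right before each `to_csv` would turn the silent overwrite into an explicit failure.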

@twoertwein
Member

It might be good to clarify how mode is being used in the to_csv documentation.

@twoertwein
Member

I thought mode wasn't forwarded to fsspec.open, but it is. It might be worth opening an issue at fsspec to let them trigger a warning or error. Pandas cannot catch all the corner cases of the many protocols that fsspec supports.
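Because each write to an existing object replaces it, one possible workaround for the chunked loop in the report is to open the destination once and pass the same handle to every `to_csv` call, so that only one object is ever written. The sketch below assumes the target can be opened as a text handle (for GCS, presumably through something like `fsspec.open(path, "w")`, which is not tested here). `write_chunks` is a hypothetical helper, and an in-memory buffer stands in for the remote file.

```python
import io

import pandas as pd


def write_chunks(chunks, handle):
    """Write an iterable of DataFrames to one already-open text handle."""
    header = True
    for chunk in chunks:
        # Only the first chunk carries the header row.
        chunk.to_csv(handle, index=False, header=header)
        header = False


# Small stand-in data; the column names are borrowed from the report.
df = pd.DataFrame({
    "client": ["Alice Anders", "Bob Baker", "Charlie Chaplin"],
    "rank": [52, 525, 32],
})
chunks = [df.iloc[i:i + 1] for i in range(len(df))]  # one row per chunk

buffer = io.StringIO()  # stands in for the opened destination file
write_chunks(chunks, buffer)
result = buffer.getvalue()
```

The reader returned by `pd.read_csv(..., chunksize=...)` could be passed as `chunks` directly, which keeps memory use at one chunk at a time.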

@mroeschke
Member

Closing due to fsspec/gcsfs#533
