
Checksum x-amz-checksum-crc32 seems to be added INSIDE my files #4435

Open
1 task done
jgaucher-cs opened this issue Feb 10, 2025 · 8 comments
Labels
bug This issue is a confirmed bug. p2 This is a standard priority issue potential-regression Marking this issue as a potential regression to be checked by team member s3 third-party

Comments

@jgaucher-cs

jgaucher-cs commented Feb 10, 2025

Describe the bug

When uploading a local file to my S3 storage, some kind of checksum seems to be added directly inside the contents of my file:

326d # added by boto3

# my file contents ...

0 # added by boto3
x-amz-checksum-crc32:6da4RA== # added by boto3

Regression Issue

  • Select this option if this issue appears to be a regression.

It does not happen with boto3==1.35.41

Expected Behavior

The uploaded file contents should not be modified.

Current Behavior

The uploaded file contents are modified.

Reproduction Steps

Create a dummy file locally:

echo toto > /tmp/toto.txt

Install boto3:

pip install boto3==1.36.16

Initialize an S3 client and upload the file:

import boto3
s3_session = boto3.session.Session()
client = s3_session.client(
    service_name="s3",
    aws_access_key_id="my-access",
    aws_secret_access_key="my-secret",
    endpoint_url="https://my-url",
    region_name="my-region",
)
client.upload_file("/tmp/toto.txt", "my-bucket", "my-folder/toto.txt")

# Now read uploaded file contents:
print(client.get_object(Bucket="my-bucket", Key="my-folder/toto.txt")["Body"].read())
# returns: b'5\r\ntoto\n\r\n0\r\nx-amz-checksum-crc32:+H0IvQ==\r\n\r\n'
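The returned bytes have the shape of an aws-chunked request body. First comes a hexadecimal chunk length: 5 here, the length of b"toto\n". In the first example, 326d is 0x326d = 12909 bytes. After the length come CRLF and the data, then a terminating zero-length chunk, and finally the x-amz-checksum-crc32 trailer. The sketch below splits such a stored body back into the original payload and its trailers. It is simplified and assumes unsigned chunks followed by trailing headers. The helper names decode_aws_chunked and crc32_b64 are hypothetical and not part of boto3. It may help recover objects that were already uploaded this way.

```python
import base64
import zlib


def decode_aws_chunked(raw: bytes) -> tuple[bytes, dict[str, str]]:
    """Split an aws-chunked body into its payload and trailing headers.

    Assumed (simplified) layout; chunk extensions such as ';chunk-signature=...'
    are ignored:
        <hex-size>\r\n<data>\r\n ... 0\r\n<name>:<value>\r\n ... \r\n
    """
    payload = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal, terminated by CRLF.
        end = raw.index(b"\r\n", pos)
        size = int(raw[pos:end].split(b";")[0], 16)
        pos = end + 2
        if size == 0:
            break  # the zero-length chunk marks the end of the data
        payload += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    trailers = {}
    # Trailing headers follow the final chunk, one per line, until an empty line.
    for line in raw[pos:].split(b"\r\n"):
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii").strip()] = value.decode("ascii").strip()
    return bytes(payload), trailers


def crc32_b64(data: bytes) -> str:
    """CRC32 as a big-endian 4-byte value, base64-encoded (x-amz-checksum-crc32 format)."""
    return base64.b64encode(zlib.crc32(data).to_bytes(4, "big")).decode("ascii")


stored = b"5\r\ntoto\n\r\n0\r\nx-amz-checksum-crc32:+H0IvQ==\r\n\r\n"
payload, trailers = decode_aws_chunked(stored)
print(payload)   # b'toto\n'
print(trailers)  # {'x-amz-checksum-crc32': '+H0IvQ=='}
# Recompute the checksum of the recovered payload to compare it with the trailer.
print(crc32_b64(payload) == trailers["x-amz-checksum-crc32"])
```

The final line recomputes the CRC32 of the recovered payload so it can be checked against the trailer value before the cleaned bytes are re-uploaded.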

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.36.16

Environment details (OS name and version, etc.)

Linux Ubuntu 22

@jgaucher-cs jgaucher-cs added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Feb 10, 2025
@github-actions github-actions bot added the potential-regression Marking this issue as a potential regression to be checked by team member label Feb 10, 2025
@jgaucher-cs
Author

It seems to be resolved if I add this at the top of my code:

import os
os.environ["AWS_REQUEST_CHECKSUM_CALCULATION"] = "when_required"
os.environ["AWS_RESPONSE_CHECKSUM_VALIDATION"] = "when_required"

@khushail khushail added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Feb 10, 2025
@khushail khushail self-assigned this Feb 10, 2025
@khushail

Hi @jgaucher-cs, thanks for reaching out. This change was recently announced by the Python SDK team in "Announcement: S3 default integrity change":

In AWS SDK for Python v1.36.0, we released changes to the S3 client that adopts new default integrity protections. For more information on default integrity behavior, please refer to the official [SDK documentation](https://docs.aws.amazon.com/sdkref/latest/guide/feature-dataintegrity.html)

The workaround you found is also described there as a way to opt out of the default checksum behavior.

Hope that clarifies your questions. Please feel free to reach out if this does not help.

Thanks

@khushail khushail added s3 p2 This is a standard priority issue response-requested Waiting on additional information or feedback. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. potential-regression Marking this issue as a potential regression to be checked by team member labels Feb 10, 2025
@jgaucher-cs
Author

Hi @khushail, thank you for your response. The official SDK documentation says:

Amazon S3 independently calculates a checksum on the server side and validates it against the provided value before durably storing the object and its checksum in the object's metadata.

In my case, the checksum is not stored in the object's metadata; it is stored inside the file contents, which makes the file corrupt and unusable (e.g. a Python script can no longer be run). Why is that?

@ZeniT21

ZeniT21 commented Feb 11, 2025

I also have this problem. Does anyone have any ideas?

@jonathan343
Contributor

Hey @jgaucher-cs @ZeniT21,

I was not able to reproduce this issue when making requests to Amazon S3. When making the request to S3 using the example you provided, I receive the following body: b'toto\n'. Are you using a third-party S3 compatible service?

As mentioned above, you can opt out of the default checksum calculation by setting either the AWS_REQUEST_CHECKSUM_CALCULATION environment variable or the request_checksum_calculation config option to when_required, as described in the boto3 configuration guide.

@nleconte-csgroup

Hey @jonathan343

Are you using a third-party S3 compatible service?

Yes, indeed. We are using S3-compatible services from cloud providers other than AWS (Orange Flexible Engine and/or OVH). The issue might be that they are not fully compliant.

@ZeniT21

ZeniT21 commented Feb 11, 2025

I used an older version and it worked fine.

@khushail khushail added potential-regression Marking this issue as a potential regression to be checked by team member investigating This issue is being investigated and/or work is in progress to resolve the issue. p1 This is a high priority issue and removed response-requested Waiting on additional information or feedback. p2 This is a standard priority issue labels Feb 11, 2025
@khushail

Hi @jgaucher-cs, since you are using third-party services that might not be compatible and might not support aws-chunked requests, the workaround described in the announcement shared earlier (which is what you are already using) should work.
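The sketch below illustrates why the file ends up modified. It shows aws-chunked framing with a trailing CRC32 checksum. With the new defaults the SDK sends the upload body in this form and signals it with headers such as Content-Encoding: aws-chunked and x-amz-trailer. A server that ignores those headers and stores the request body as-is keeps the framing inside the object. This is a simplified illustration, not botocore's implementation. The function names and the chunk_size value are hypothetical.

```python
import base64
import zlib


def crc32_b64(data: bytes) -> str:
    """CRC32 as a big-endian 4-byte value, base64-encoded."""
    return base64.b64encode(zlib.crc32(data).to_bytes(4, "big")).decode("ascii")


def encode_aws_chunked(data: bytes, chunk_size: int = 64 * 1024) -> bytes:
    """Frame data as unsigned aws-chunked chunks followed by a CRC32 trailer."""
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Chunk header: length in lowercase hex, then CRLF; data is followed by CRLF.
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n"  # zero-length chunk: end of data
    # Trailing checksum header, then an empty line ends the body.
    out += f"x-amz-checksum-crc32:{crc32_b64(data)}\r\n\r\n".encode("ascii")
    return bytes(out)


body = encode_aws_chunked(b"toto\n")
print(body)
```

Encoding b"toto\n" yields 5\r\ntoto\n\r\n0\r\nx-amz-checksum-crc32:…\r\n\r\n. That is the same structure as the bytes read back in the original report, which is what a server that does not decode aws-chunked bodies would store.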

@khushail khushail added third-party p2 This is a standard priority issue and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. p1 This is a high priority issue labels Feb 11, 2025
@khushail khushail removed their assignment Feb 11, 2025