Replace random data generation with pseudo random #43118

Open: wants to merge 2 commits into main


@alzimmermsft (Member) commented Nov 27, 2024

Description

Changes the random data generation logic to produce pseudo-random data. This reduces git pack file sizes and helps future work on session records, where request and response bodies may be compressed to shrink session record sizes.
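Why pseudo-random data helps: git stores objects zlib-compressed (and delta-compressed in pack files), and uniformly random bytes are essentially incompressible. The diff further down draws a 31-byte pseudoRandom array from the seeded Random; the rest of that hunk is not shown, so the following is only a minimal sketch under the assumption that such a short pattern is tiled across the buffer. The names repeatedPattern, fullyRandom and deflatedSize are hypothetical, and java.util.zip.Deflater stands in for git's zlib compression.

```java
import java.util.Random;
import java.util.zip.Deflater;

public class CompressibilityDemo {
    // Hypothetical helper: fills the buffer by repeating a short pattern drawn from a seeded Random.
    static byte[] repeatedPattern(int size, long seed, int patternLength) {
        Random rand = new Random(seed);
        byte[] pattern = new byte[patternLength];
        rand.nextBytes(pattern);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = pattern[i % patternLength];
        }
        return data;
    }

    // Baseline: every byte comes straight from the seeded Random.
    static byte[] fullyRandom(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return data;
    }

    // Returns the number of bytes DEFLATE produces for the input.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        int size = 1 << 20; // 1 MiB
        long seed = 42L;
        System.out.println("fully random:   " + deflatedSize(fullyRandom(size, seed)) + " bytes");
        System.out.println("31-byte repeat: " + deflatedSize(repeatedPattern(size, seed, 31)) + " bytes");
    }
}
```

On a 1 MiB buffer the tiled data should deflate to a small fraction of the fully random data, which stays at roughly its original size; exact numbers depend on the JDK's zlib.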

The .git folder sizes shrank as follows with this change:

[Before/after screenshots of .git folder sizes for Blob, Changefeed, Cryptography, NIO, Datalake, and Shares]

This reduces .git folder sizes from 1258 MB to 688 MB, saving 570 MB each time a full set of session records needs to be downloaded.
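In relative terms, the saving is a little under half of the original size:

```latex
1258\ \text{MB} - 688\ \text{MB} = 570\ \text{MB}, \qquad \frac{570}{1258} \approx 0.453 \;(\approx 45\%)
```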

A follow-up to this change should investigate reducing how much data the Cryptography library uses in testing, to see whether its session record sizes can be shrunk. Its request and response bodies are encrypted data streams, which won't compress well because encrypted data is effectively random.
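The point about encrypted bodies can be checked with a small experiment: a highly repetitive plaintext compresses well, but its ciphertext does not. This sketch uses AES-GCM from the JDK purely as a representative cipher (an assumption; it says nothing about how the Cryptography library is configured), and EncryptedPayloadDemo and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class EncryptedPayloadDemo {
    // Size of the input after GZIP compression.
    static int gzipSize(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(input);
        }
        return out.size();
    }

    // Plaintext that compresses very well: one ASCII phrase repeated to fill the buffer.
    static byte[] repetitivePlaintext(int size) {
        byte[] phrase = "highly repetitive test payload ".getBytes(StandardCharsets.US_ASCII);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = phrase[i % phrase.length];
        }
        return data;
    }

    // AES-GCM with a fresh 128-bit key and 12-byte IV; stands in for any encrypted request body.
    static byte[] encrypt(byte[] plaintext) throws GeneralSecurityException {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(plaintext); // ciphertext followed by a 16-byte authentication tag
    }

    public static void main(String[] args) throws Exception {
        byte[] plaintext = repetitivePlaintext(1 << 20);
        byte[] ciphertext = encrypt(plaintext);
        System.out.println("plaintext:  " + plaintext.length + " -> " + gzipSize(plaintext) + " bytes gzipped");
        System.out.println("ciphertext: " + ciphertext.length + " -> " + gzipSize(ciphertext) + " bytes gzipped");
    }
}
```

This is likely also why the pseudo-random approach cannot help the Cryptography tests much: encryption hides whatever structure the plaintext has, so shrinking those records probably means sending less data.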

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@alzimmermsft self-assigned this Nov 27, 2024
github-actions bot added the Storage label (Storage Service: Queues, Blobs, Files) Nov 27, 2024
@azure-sdk (Collaborator)

API change check

API changes are not detected in this pull request.

@kyleknap (Member) left a comment

Looks good. I just had one small suggestion. Otherwise 🚢

@@ -269,7 +269,20 @@ public static byte[] getRandomByteArray(int size, TestResourceNamer testResource
long seed = UUID.fromString(testResourceNamer.randomUuid()).getMostSignificantBits() & Long.MAX_VALUE;
Random rand = new Random(seed);
byte[] data = new byte[size];
rand.nextBytes(data);
byte[] pseudoRandom = new byte[31];
rand.nextBytes(pseudoRandom);
Review comment (Member):

It would be great to have a comment here explaining why we use this approach to randomness. Unless someone looks at the blame and tracks it down to this pull request, it is not obvious that it is needed to reduce the download size of the test assets, and someone could unknowingly change it back.
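In response to this suggestion, a documented version of the helper might look like the following. This is a hypothetical sketch, not the code in this PR: TestDataSketch and pseudoRandomBytes are illustrative names, the method takes a UUID string instead of a TestResourceNamer, and tiling the 31-byte pattern across the buffer is an assumption because the full diff is not shown. Only the seed derivation mirrors the diff.

```java
import java.util.Random;
import java.util.UUID;

public final class TestDataSketch {
    private static final int PATTERN_LENGTH = 31;

    /**
     * Returns deterministic, highly compressible test data (hypothetical helper).
     *
     * Why not fully random bytes: recorded test sessions are stored in git, and fully
     * random payloads do not compress, which inflates the pack files everyone has to
     * download. Tiling a short pseudo-random pattern keeps the data non-trivial while
     * letting zlib and delta compression shrink it. Do not switch back to filling the
     * whole buffer with Random.nextBytes without considering recording sizes.
     */
    static byte[] pseudoRandomBytes(int size, String uuid) {
        // Same derivation as the diff: non-negative high 64 bits of the UUID as the seed,
        // so the same UUID always yields the same bytes.
        long seed = UUID.fromString(uuid).getMostSignificantBits() & Long.MAX_VALUE;
        Random rand = new Random(seed);
        byte[] pattern = new byte[PATTERN_LENGTH];
        rand.nextBytes(pattern);
        byte[] data = new byte[size];
        // Copy the pattern repeatedly; the last copy may be partial.
        for (int offset = 0; offset < size; offset += PATTERN_LENGTH) {
            System.arraycopy(pattern, 0, data, offset, Math.min(PATTERN_LENGTH, size - offset));
        }
        return data;
    }

    public static void main(String[] args) {
        String uuid = "3f2504e0-4f89-11d3-9a0c-0305e82c3301";
        byte[] a = pseudoRandomBytes(100, uuid);
        byte[] b = pseudoRandomBytes(100, uuid);
        System.out.println(java.util.Arrays.equals(a, b)); // prints true
    }
}
```

Keeping the rationale in the Javadoc, next to the code that depends on it, addresses the concern that a future contributor could revert the change without finding this pull request.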

Labels: Storage (Storage Service: Queues, Blobs, Files)
Projects: None yet

4 participants