Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] OpenReadAsync overprovisions buffer memory #47821

Open
rlrossiter opened this issue Jan 14, 2025 · 3 comments · May be fixed by #47958
Open

[BUG] OpenReadAsync overprovisions buffer memory #47821

rlrossiter opened this issue Jan 14, 2025 · 3 comments · May be fixed by #47958
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files)

Comments

@rlrossiter
Copy link
Contributor

Library name and version

Azure.Storage.Blobs 12.23.0

Describe the bug

OpenReadAsync, if not provided a buffer size, defaults to 4MB. OpenReadAsync, as part of it's initialization, pulls the blob properties to fail fast if the blob does not exist. The blob length in these properties is used to initialize the initialLenght (note: misspelled) of the LazyLoadingReadonlyStream.

There is an inefficiency here though. In the case where allowModifications = false and initialLength < Constants.DefaultStreamingDownloadSize, the stream will pre-initialize a buffer of 4MB (the default download size), but will never use the entire buffer. the math for the buffer initialization should be

int maxBufferSize = allowModifications ? Constants.DefaultStreamingDownloadSize : Math.Min(initialLength, Constants.DefaultStreamingDownloadSize);
_bufferSize = bufferSize ?? maxBufferSize;
_buffer = ArrayPool<byte>.Shared.Rent(_bufferSize);

In the case that a user requests a larger buffer (by specifying bufferSize in the read options), the client will honor that and create that buffer. Otherwise the default should allocate the buffer size wisely.

Expected behavior

LazyLoadingReadonlyStream with a blob that does not allow modifications should create the smallest buffer necessary

Actual behavior

LazyLoadingReadonlyStream always creates a 4 MB buffer (if buffer size not specified)

Reproduction Steps

Create 2KB blob
BlobOpenReadOptions options = new BlobOpenReadOptions(allowModifications: false)
var stream = await blobClient.OpenReadAsync(options: options)

stream._buffer is a byte array is 4MB in size.

Environment

No response

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 14, 2025
Copy link

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

@github-actions github-actions bot added Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files) labels Jan 14, 2025
@tadassukys
Copy link

The issue is especially relevant for scenarios when working with large amounts of blobs. I faced this issue recently while running the code that opened streams of 700+ blobs (all blobs actually are very small in size, less than 1KiB) and the process heap size exploded to 3GiB.
I think the method BlobClient.OperRead() has the same issue as the internal buffer of Azure.Storage.LazyLoadingReadOnlyStream instance by default is initialized to 4MiB too.

@rlrossiter rlrossiter linked a pull request Jan 23, 2025 that will close this issue
@rlrossiter
Copy link
Contributor Author

@tadassukys we not only saw that, but because LazyLoadingReadOnlyStream.Dispose() calls ArrayPool<byte>.Shared.Return() with clearArray: true, our flame charts were showing significant amounts of time spent on ZeroMemory.

I've submitted a PR ( #47958 ) that tries to minimize the buffer allocations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants