
[QUERY] Storage blob requests are queued? #48140

Closed
vRune4 opened this issue Feb 6, 2025 · 13 comments
Labels
- Client: This issue points to a problem in the data-plane of the library.
- customer-reported: Issues that are reported by GitHub users external to the Azure organization.
- needs-team-attention: Workflow: This issue needs attention from Azure service team or SDK team.
- question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
- Service Attention: Workflow: This issue is responsible by Azure service team.
- Storage: Storage Service (Queues, Blobs, Files).

Comments


vRune4 commented Feb 6, 2025

Library name and version

Azure.Storage.Blobs 12.23.0

Query/Question

If I load meta information sequentially like this:

	[HttpGet]
	public ActionResult<IAsyncEnumerable<ASBFileInfo>> GetAsync()
	{
		return Ok(ReadMetaInfoAsync());
	}

	private async IAsyncEnumerable<ASBFileInfo> ReadMetaInfoAsync([EnumeratorCancellation] CancellationToken cancellationToken = default)
	{
		BlobContainerClient containerClient = _blobServiceClient.GetBlobContainerClient("test");
		foreach (string name in BlobIds)
		{
			BlobClient blobClient = containerClient.GetBlobClient(name);
			// the following line is redundant, but necessary for testing purposes. Do not do this in production.
			Azure.Response<BlobProperties> metaInfo = await blobClient.GetPropertiesAsync(null, cancellationToken);
			yield return new ASBFileInfo(name, metaInfo.Value.ContentType, metaInfo.Value.ContentLength);
		}
	}

(BlobIds is a list of about a hundred blobs)

Application Insights shows this:

[Screenshot: Application Insights dependency telemetry for the sequential run]

Each request takes <10 ms, which I consider quite good, but a colleague challenged me to do this in parallel.

	private async IAsyncEnumerable<ASBFileInfo> ReadMetaInfoInParallelAsync([EnumeratorCancellation] CancellationToken cancellationToken = default)
	{
		BlobContainerClient containerClient = _blobServiceClient.GetBlobContainerClient("test");
		List<Task<Response<BlobProperties>>> tasks = new(BlobIds.Length);
		foreach (string name in BlobIds)
		{
			BlobClient blobClient = containerClient.GetBlobClient(name);
			tasks.Add(blobClient.GetPropertiesAsync(null, cancellationToken));
		}
		
		await Task.WhenAll(tasks);
		foreach (Task<Response<BlobProperties>> task in tasks)
		{
			Response<BlobProperties> metaInfo = await task;
			yield return new ASBFileInfo("unknown", metaInfo.Value.ContentType, metaInfo.Value.ContentLength);
		}
	}

After this change I get a worse result. At best it takes as long as the original approach, but it can easily take twice as long. Each individual request seemingly takes more than 100 ms:

[Screenshot: Application Insights dependency telemetry for the parallel run]

It looks really strange to me. It does not look like each request is queued up internally, but I suspect the long durations could be an artifact of the telemetry.

The main reason I ask: I see this in our application. Simple requests like GetPropertiesAsync() take hundreds of milliseconds to complete. Several blobs are effectively fetched in parallel, and there is a significant delay in the GetPropertiesAsync() calls.

The blob client is created as a singleton:

builder.Services.AddAzureClients(clientBuilder =>
{
	const string accountName = "mystorage";
	StorageSharedKeyCredential credential = new (accountName, "myaccesskeybecauserbacaddsevenmoreoverhead");
	clientBuilder.AddBlobServiceClient(new Uri($"https://{accountName}.blob.core.windows.net/"), credential);
});
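
(A hedged aside, not part of the original issue: if connection reuse or pool limits turn out to matter for the parallel case, the same registration can also pass client options. The ConfigureOptions call comes from Microsoft.Extensions.Azure; the SocketsHttpHandler values below are illustrative assumptions to experiment with, not a confirmed fix.)

builder.Services.AddAzureClients(clientBuilder =>
{
	const string accountName = "mystorage";
	StorageSharedKeyCredential credential = new(accountName, "myaccesskeybecauserbacaddsevenmoreoverhead");

	clientBuilder
		.AddBlobServiceClient(new Uri($"https://{accountName}.blob.core.windows.net/"), credential)
		.ConfigureOptions(options =>
		{
			// Sketch: explicit transport so connection pooling can be tuned.
			// Requires Azure.Core.Pipeline and System.Net.Http; the values are arbitrary.
			options.Transport = new HttpClientTransport(new HttpClient(new SocketsHttpHandler
			{
				PooledConnectionLifetime = TimeSpan.FromMinutes(5),
				MaxConnectionsPerServer = 50
			}));
		});
});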

Environment

Azure Container Apps.

resource containerApp 'Microsoft.App/containerApps@2024-08-02-preview' = {
  name: resourceNames.containerApps
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${userId}': {}
    }
  }
  properties: {
    environmentId: managedEnvId
    configuration: {
      runtime: { dotnet: { autoConfigureDataProtection: true } }
      activeRevisionsMode: 'Single'
      ingress: {
        external: true
        targetPort: 8080
        exposedPort: 0
        transport: 'auto'
        allowInsecure: false
      }
      registries: [
        {
          server: loginServer
          identity: userId
        }
      ]
    }
    template: {
      containers: [
        {
          image: '${loginServer}/${dockerImageName}'
          name: serviceName
          resources: {
            cpu: json('0.25')
            memory: '0.5Gi'
          }
          env: [
            {
              name: 'AppConfig__Name'
              value: appConfigName
            }
            {
              name: 'AppConfig__Prefix'
              value: serviceName
            }
            {
              name: 'AZURE_CLIENT_ID'
              value: clientId
            }
          ]
        }
      ]
      scale: {
        minReplicas: 1
        maxReplicas: 1
      }
    }
  }
}
<TargetFramework>net9.0</TargetFramework>
@vRune4 vRune4 changed the title [QUERY] Storage blob requests queue [QUERY] Storage blob requests are queued? Feb 6, 2025
@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 6, 2025
@jsquire jsquire self-assigned this Feb 6, 2025
@jsquire jsquire added Storage Storage Service (Queues, Blobs, Files) Client This issue points to a problem in the data-plane of the library. labels Feb 6, 2025
@github-actions github-actions bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Feb 6, 2025
jsquire (Member) commented Feb 6, 2025

Hi @vRune4. Thanks for reaching out and we regret that you're experiencing difficulties. The behavior that you're seeing is unrelated to the Blob Storage package.

The reason that you're seeing this is because you're attempting to return a deferred enumerable (IAsyncEnumerable<ASBFileInfo>) as the result of an HTTP request. This does not work the way you'd like it to. The caller only has access to the entries that have already been materialized; they cannot use your enumerable to get more values. Callers do not have access to your service's internal state, including the container client that you intend to use to service them. Callers receive only a static set of data that was serialized by your service and then deserialized by the caller.

To return the full set of data, you'd have to first materialize the set - such as by creating a list and then populating it by walking the enumerable. For example:

[HttpGet]
public async Task<ActionResult<IEnumerable<ASBFileInfo>>> GetAsync()
{
    var returnList = new List<ASBFileInfo>();

    await foreach (var info in ReadMetaInfoAsync())
    {
        returnList.Add(info);
    }

    return Ok(returnList);
}
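
(A hedged aside, not from the thread: if the System.Linq.Async package is referenced (an assumption; it is not shown in this issue), the same materialization can be condensed with its ToListAsync extension.)

// Sketch only. Assumes the System.Linq.Async package for the ToListAsync extension method.
[HttpGet]
public async Task<ActionResult<IEnumerable<ASBFileInfo>>> GetAsync(CancellationToken cancellationToken)
{
    List<ASBFileInfo> returnList = await ReadMetaInfoAsync(cancellationToken).ToListAsync(cancellationToken);
    return Ok(returnList);
}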

@jsquire jsquire added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Feb 6, 2025

github-actions bot commented Feb 6, 2025

Hi @vRune4. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

vRune4 (Author) commented Feb 6, 2025

/unresolve

The reason that you're seeing this is because you're attempting to return a deferred enumerable

Look at the screenshot I posted. Each request to the storage account starts reasonably soon, then lingers for a long time (>100 ms).

If I instead make sure I fire off my requests sequentially (my first example, which uses the same async enumerable return type), they each finish in less than ten milliseconds.

With the code you posted:

[Screenshot: Application Insights telemetry for the run using the suggested code]

StorageBlobLogs
|project TimeGenerated, AuthenticationType, ServerLatencyMs, DurationMs, ResponseBodySize

shows that, from the storage account's point of view, these requests were typically handled in under 20 ms. So the client adds hundreds of milliseconds of delay for no discernible reason.

Maybe it is just really wonky telemetry. That still does not explain why calling GetProperties() a hundred times sequentially finishes in a second, while trying to run in parallel takes closer to 4 seconds.

(Oh, and pulling back to five -- just five! -- blobs shows similar delays.)
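
(A hedged diagnostic sketch, not from the thread: timing each call locally makes it easier to compare the client-observed latency against the ServerLatencyMs/DurationMs columns from StorageBlobLogs. The _logger field is an assumed ILogger; substitute whatever logging is already in place.)

private async Task TimePropertiesCallsAsync(CancellationToken cancellationToken = default)
{
	BlobContainerClient containerClient = _blobServiceClient.GetBlobContainerClient("test");
	foreach (string name in BlobIds)
	{
		BlobClient blobClient = containerClient.GetBlobClient(name);
		// Stopwatch (System.Diagnostics) measures the end-to-end client-side duration.
		Stopwatch stopwatch = Stopwatch.StartNew();
		await blobClient.GetPropertiesAsync(cancellationToken: cancellationToken);
		stopwatch.Stop();
		// _logger is a hypothetical ILogger field, shown for illustration only.
		_logger.LogInformation("GetProperties for {Blob}: {Elapsed} ms client-side", name, stopwatch.ElapsedMilliseconds);
	}
}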

@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. labels Feb 6, 2025
jsquire (Member) commented Feb 6, 2025

Your parallel results all share a connection and have to compete for both network and other resources on your local host. If you're seeing worse performance in parallel, it indicates that you're using more concurrency than is helpful for your host. Because the telemetry that you're looking at is emitted by the client, you'll see the client's view of the world. If a task is in a paused state waiting on I/O and the thread pool takes time to schedule the async completion because there are no resources, that still counts toward the time the client sees for the request. In other words, that's your end-to-end execution time on the client and does not accurately reflect the duration of the actual service call.

This is not related to the Azure SDK package in use; it does not internally do any queueing of requests. Each operation you call is executed immediately. The client has no insight into, nor influence over, how resources are shared or allocated; it just requests them based on the patterns in your code.
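
(A hedged sketch, not an SDK-team recommendation: one way to test the "too much concurrency" theory above is to cap the degree of parallelism instead of starting all requests at once. Parallel.ForEachAsync is available on .NET 6 and later; the limit of 8 below is an arbitrary assumption to experiment with, and the approach also keeps each blob name paired with its result.)

private async Task<List<ASBFileInfo>> ReadMetaInfoBoundedAsync(CancellationToken cancellationToken = default)
{
	// Sketch: bounded parallelism over the same GetPropertiesAsync calls.
	// MaxDegreeOfParallelism = 8 is an arbitrary value to tune, not a recommendation.
	// ConcurrentBag requires System.Collections.Concurrent.
	BlobContainerClient containerClient = _blobServiceClient.GetBlobContainerClient("test");
	ConcurrentBag<ASBFileInfo> results = new();

	ParallelOptions options = new()
	{
		MaxDegreeOfParallelism = 8,
		CancellationToken = cancellationToken
	};

	await Parallel.ForEachAsync(BlobIds, options, async (name, ct) =>
	{
		BlobClient blobClient = containerClient.GetBlobClient(name);
		Response<BlobProperties> metaInfo = await blobClient.GetPropertiesAsync(cancellationToken: ct);
		results.Add(new ASBFileInfo(name, metaInfo.Value.ContentType, metaInfo.Value.ContentLength));
	});

	return results.ToList();
}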

@jsquire jsquire added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Feb 6, 2025
@github-actions github-actions bot removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Feb 6, 2025

github-actions bot commented Feb 6, 2025

Hi @vRune4. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

vRune4 (Author) commented Feb 6, 2025

I did another run. This time I just feed the task results into a List...

[Screenshot: Application Insights telemetry for the list-based run]

The storage account log looks like this:

TimeGenerated AuthenticationType ServerLatencyMs DurationMs
2025-02-06T18:49:58.3306687Z AccountKey 3 3
2025-02-06T18:49:58.3288505Z AccountKey 2 2
2025-02-06T18:49:58.3287247Z AccountKey 2 2
2025-02-06T18:49:58.320479Z AccountKey 2 2
2025-02-06T18:49:58.3204232Z AccountKey 2 2
2025-02-06T18:49:58.3171838Z AccountKey 2 2
2025-02-06T18:49:58.3160938Z AccountKey 2 2
2025-02-06T18:49:58.1352835Z AccountKey 6 6
2025-02-06T18:49:58.1239915Z AccountKey 2 2
2025-02-06T18:49:58.1231167Z AccountKey 2 2
2025-02-06T18:49:58.1208995Z AccountKey 3 3
2025-02-06T18:49:58.1208382Z AccountKey 3 3
2025-02-06T18:49:58.1201469Z AccountKey 2 2
2025-02-06T18:49:58.11993Z AccountKey 2 2
2025-02-06T18:49:58.0241502Z AccountKey 2 2
2025-02-06T18:49:58.0239236Z AccountKey 6 6
2025-02-06T18:49:58.0236284Z AccountKey 2 2
2025-02-06T18:49:58.0213346Z AccountKey 2 2
2025-02-06T18:49:58.019889Z AccountKey 2 2
2025-02-06T18:49:58.0196831Z AccountKey 2 2
2025-02-06T18:49:58.0194631Z AccountKey 4 4
2025-02-06T18:49:58.017095Z AccountKey 2 2
2025-02-06T18:49:58.0160945Z AccountKey 2 2
2025-02-06T18:49:58.0160353Z AccountKey 2 2
2025-02-06T18:49:58.0152778Z AccountKey 2 2
2025-02-06T18:49:58.0151398Z AccountKey 2 2
2025-02-06T18:49:57.9206533Z AccountKey 5 5
2025-02-06T18:49:57.9178986Z AccountKey 2 2
2025-02-06T18:49:57.9168866Z AccountKey 2 2
2025-02-06T18:49:57.9162154Z AccountKey 2 2
2025-02-06T18:49:57.8236158Z AccountKey 6 6
2025-02-06T18:49:57.734319Z AccountKey 6 6
2025-02-06T18:49:57.7270347Z AccountKey 3 3
2025-02-06T18:49:57.7248355Z AccountKey 2 2
2025-02-06T18:49:57.7237913Z AccountKey 2 2
2025-02-06T18:49:57.7236685Z AccountKey 3 3
2025-02-06T18:49:57.7235147Z AccountKey 3 3
2025-02-06T18:49:57.7231488Z AccountKey 3 3
2025-02-06T18:49:57.7220099Z AccountKey 2 2
2025-02-06T18:49:57.7215176Z AccountKey 2 2
2025-02-06T18:49:57.7201791Z AccountKey 2 2
2025-02-06T18:49:57.7201269Z AccountKey 2 2
2025-02-06T18:49:57.7195208Z AccountKey 2 2
2025-02-06T18:49:57.6264109Z AccountKey 6 6
2025-02-06T18:49:57.6246679Z AccountKey 2 2
2025-02-06T18:49:57.6231981Z AccountKey 8 8
2025-02-06T18:49:57.622522Z AccountKey 5 5
2025-02-06T18:49:57.6213551Z AccountKey 2 2
2025-02-06T18:49:57.6211493Z AccountKey 2 2
2025-02-06T18:49:57.6210997Z AccountKey 2 2
2025-02-06T18:49:57.6207706Z AccountKey 2 2
2025-02-06T18:49:57.6201176Z AccountKey 2 2
2025-02-06T18:49:57.6200961Z AccountKey 2 2
2025-02-06T18:49:57.617949Z AccountKey 2 2
2025-02-06T18:49:57.5274749Z AccountKey 2 2
2025-02-06T18:49:57.527043Z AccountKey 2 2
2025-02-06T18:49:57.5263021Z AccountKey 5 5
2025-02-06T18:49:57.5253642Z AccountKey 2 2
2025-02-06T18:49:57.524453Z AccountKey 3 3
2025-02-06T18:49:57.5243719Z AccountKey 2 2
2025-02-06T18:49:57.5243318Z AccountKey 2 2
2025-02-06T18:49:57.5230859Z AccountKey 2 2
2025-02-06T18:49:57.4361858Z AccountKey 2 2
2025-02-06T18:49:57.434822Z AccountKey 2 2
2025-02-06T18:49:57.432849Z AccountKey 2 2
2025-02-06T18:49:57.4327652Z AccountKey 2 2
2025-02-06T18:49:57.4326445Z AccountKey 6 6
2025-02-06T18:49:57.4323058Z AccountKey 2 2
2025-02-06T18:49:57.4315639Z AccountKey 2 2
2025-02-06T18:49:57.4239163Z AccountKey 3 3
2025-02-06T18:49:57.4221253Z AccountKey 6 6
2025-02-06T18:49:57.420398Z AccountKey 2 2
2025-02-06T18:49:57.4203035Z AccountKey 2 2
2025-02-06T18:49:57.4189008Z AccountKey 2 2
2025-02-06T18:49:57.4185912Z AccountKey 2 2
2025-02-06T18:49:57.4185804Z AccountKey 2 2
2025-02-06T18:49:57.329473Z AccountKey 2 2
2025-02-06T18:49:57.3293389Z AccountKey 2 2
2025-02-06T18:49:57.3279877Z AccountKey 3 3
2025-02-06T18:49:57.3268878Z AccountKey 2 2
2025-02-06T18:49:57.3254466Z AccountKey 2 2
2025-02-06T18:49:57.2297778Z AccountKey 2 2
2025-02-06T18:49:57.2293651Z AccountKey 5 5
2025-02-06T18:49:57.2279183Z AccountKey 5 5
2025-02-06T18:49:57.2268725Z AccountKey 2 2
2025-02-06T18:49:57.2236049Z AccountKey 2 2
2025-02-06T18:49:57.2232431Z AccountKey 2 2
2025-02-06T18:49:57.2226432Z AccountKey 3 3
2025-02-06T18:49:57.2219245Z AccountKey 2 2
2025-02-06T18:49:57.2181839Z AccountKey 2 2
2025-02-06T18:49:57.2180964Z AccountKey 2 2
2025-02-06T18:49:57.217248Z AccountKey 3 3
2025-02-06T18:49:57.1351125Z AccountKey 7 7
2025-02-06T18:49:57.1283198Z AccountKey 2 2
2025-02-06T18:49:57.1253662Z AccountKey 6 6
2025-02-06T18:49:57.1233315Z AccountKey 3 3
2025-02-06T18:49:57.1230897Z AccountKey 2 2
2025-02-06T18:49:57.0272647Z AccountKey 7 7
2025-02-06T18:49:57.0182347Z AccountKey 2 2
2025-02-06T18:49:56.9277547Z AccountKey 7 7

vRune4 (Author) commented Feb 6, 2025

Your parallel results all share a connection and have to compete for both network and other resources on your local host.

The operations in question are extremely lightweight. The payload fits within a network packet or two.

Look again. Test this yourself.

I can reduce the test to five blobs. That means five concurrent tasks. It shows the same pattern.

vRune4 (Author) commented Feb 6, 2025

FIVE concurrent requests (or at least an attempt at concurrency), rather than 100:

[Screenshot: Application Insights telemetry for five concurrent requests]

TimeGenerated AuthenticationType ServerLatencyMs DurationMs
2025-02-06T19:10:38.5859902Z AccountKey 3 3
2025-02-06T19:10:38.5821146Z AccountKey 3 3
2025-02-06T19:10:38.5257318Z AccountKey 6 6
2025-02-06T19:10:38.5257078Z AccountKey 7 7
2025-02-06T19:10:38.5129914Z AccountKey 7 7

It takes almost as much time as firing off 100 (one hundred!) sequential requests:

[Screenshot: Application Insights telemetry for 100 sequential requests]

An ocean of difference. It isn't in my code. It isn't in the storage account.

vRune4 (Author) commented Feb 6, 2025

/unresolve

@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. labels Feb 6, 2025
jsquire (Member) commented Feb 6, 2025

@vRune4 : I'll pass this over to the owners of the package for further investigation, as you'd like.

@jsquire jsquire removed their assignment Feb 6, 2025
@jsquire jsquire added the Service Attention Workflow: This issue is responsible by Azure service team. label Feb 6, 2025

github-actions bot commented Feb 6, 2025

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

seanmcc-msft (Member) commented:

We concur with everything @jsquire said; the clients in the Storage package function exactly the same as the clients in the other SDKs in this repo.

vRune4 (Author) commented Feb 7, 2025

/unresolve

We concur with everything @jsquire said; the clients in the Storage package function exactly the same as the clients in the other SDKs in this repo.

So no clients can handle concurrent requests to the storage account? Its performance is abysmal, unless my code carefully sequences all requests.

@seanmcc-msft you really ought to read my comments. Just to be clear: "deferred enumerable" is irrelevant. I can put everything into a List instead and not defer anything. It does not change what I observe.
