[FEATURE REQ] Performance Improvement for DownloadBlobTo API #36910

Open
IyerBhuvi opened this issue Jun 8, 2023 · 7 comments
Labels
Client This issue points to a problem in the data-plane of the library. Container Registry customer-reported Issues that are reported by GitHub users external to the Azure organization. feature-request This issue requires a new behavior in the product in order be resolved.
Milestone

Comments

@IyerBhuvi
Member

Library name

Azure.Containers.ContainerRegistry

Please describe the feature.

Context
We used the DownloadBlobTo method, as shared in the trailing mail, to test copying large blobs from a repository in ACR to a container in a Storage Account.

The copy application runs on an Azure VM with a configuration close to that of our Service Fabric VM nodes. The VM configuration is as follows:
RAM: 8 GB, CPU core count: 2, Disk Size: 127 GB Standard HDD LR
The performance results are as follows: copying a single blob (file type .vhd) takes from 1 minute for a 500 MB file to 27 minutes for a 30 GB file.
When we copy multiple files in parallel within the same process, it takes around 1 minute for four 500 MB files and 39 minutes for four 30 GB files. CPU utilization of the VM running the copy application is around 20% for a single copy at both sizes (500 MB, 30 GB), and between 50% and 75% for parallel copies of 500 MB and 30 GB files respectively.
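For reference, the timings above can be turned into average throughput. The following is a minimal Python sketch of that arithmetic only; it assumes 1 GB = 1024 MB and treats "around 1 minute" as exactly 1 minute, so the results are approximate.

```python
# Average throughput implied by the timings reported above.
# Assumptions: 1 GB = 1024 MB; "around 1 minute" is taken as exactly 1 minute.

def throughput_mb_per_s(total_mb: float, minutes: float) -> float:
    """Return average throughput in MB/s for total_mb copied in `minutes`."""
    return total_mb / (minutes * 60)

# (total size in MB, duration in minutes) for each reported run
runs = {
    "1 x 500 MB in 1 min": (500, 1),
    "1 x 30 GB in 27 min": (30 * 1024, 27),
    "4 x 500 MB in ~1 min": (4 * 500, 1),
    "4 x 30 GB in 39 min": (4 * 30 * 1024, 39),
}

for label, (size_mb, minutes) in runs.items():
    print(f"{label}: {throughput_mb_per_s(size_mb, minutes):.1f} MB/s")
```

Under these assumptions the single-file runs come out at roughly 8 MB/s and 19 MB/s, and the parallel runs at roughly 33 MB/s and 53 MB/s in aggregate.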

Ask
The minimum expected load for our service requires us to copy at least 4 blobs in parallel, with blob sizes ranging from 30 GB to 200 GB. In our testing, the CPU utilization of the VM is very high while the process is executing. As seen in the snippet below, the CPU utilization of the VM goes up when the copy application is running.

Given the above perf numbers, we strongly suspect that running the copy via DownloadBlobTo might have unintended impacts when used in our service (e.g., a VMSS node restart due to high CPU utilization, which is an impactful operation because it would affect other services running on the same node).
Hence, the ask is to optimize the DownloadBlobTo API to improve its performance, measured in CPU cycles consumed by the application.

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-triage Workflow: This issue needs the team to triage. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files) labels Jun 8, 2023
@jsquire jsquire added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team CXP Attention and removed needs-team-triage Workflow: This issue needs the team to triage. labels Jun 8, 2023
@github-actions

github-actions bot commented Jun 8, 2023

Thank you for your feedback. This has been routed to the support team for assistance.

@navba-MSFT navba-MSFT added Service Attention Workflow: This issue is responsible by Azure service team. and removed CXP Attention labels Jun 9, 2023
@navba-MSFT
Contributor

Adding Service team to look into this.

@github-actions

github-actions bot commented Jun 9, 2023

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

@annelo-msft annelo-msft added Container Registry and removed Storage Storage Service (Queues, Blobs, Files) Service Attention Workflow: This issue is responsible by Azure service team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jun 9, 2023
@github-actions github-actions bot added the needs-team-triage Workflow: This issue needs the team to triage. label Jun 9, 2023
@annelo-msft
Member

annelo-msft commented Jun 9, 2023

@IyerBhuvi, to help us categorize this issue, can you help us understand the following? The issue says that you used Azure.Containers.ContainerRegistry to obtain the file from ACR. Did you also use the Azure.Storage.Blobs library to upload the file to your storage instance? If not, can you share the approach you used for that?

Additionally, as discussed offline, we have recommended repeating the test with the ContainerRegistryContentClient [DownloadBlobStreaming](https://learn.microsoft.com/en-us/dotnet/api/azure.containers.containerregistry.containerregistrycontentclient.downloadblobstreamingasync?view=azure-dotnet-preview) API, with the expectation that it will improve performance. We are also interested in learning what performance characteristics you see with this approach.

@annelo-msft annelo-msft removed the needs-team-triage Workflow: This issue needs the team to triage. label Jun 9, 2023
@pallavit pallavit added this to the 2023-07 milestone Jun 12, 2023
@IyerBhuvi
Member Author

IyerBhuvi commented Jun 13, 2023

@annelo-msft, yes, we have used Azure.Storage.Blobs.
To copy the blob to the storage account, we open a Stream that writes to a storage page blob and pass this stream to the DownloadBlobTo method as the second parameter.

We have also tried the [DownloadBlobStreaming API](https://learn.microsoft.com/en-us/dotnet/api/azure.containers.containerregistry.containerregistrycontentclient.downloadblobstreamingasync?view=azure-dotnet-preview). The performance results and the setup on which the test was done are as follows:

Setup Details:
The copy application, which copies large blobs from ACR to the Storage Account, runs on an Azure VM with a configuration close to that of our Service Fabric VM nodes. The VM configuration is as follows:
RAM: 8 GB, CPU core count: 2, Disk Size: 127 GB Standard HDD LR.

Performance Results for 500 MB Blob
Parallel copy of four 500 MB files from ACR to the storage account takes 1 minute and 14 seconds with the DownloadBlobStreaming API.
The CPU usage of the VM exceeds 90%.

Results for Blob size > 2 GB

For any digest with size > 2 GB, DownloadBlobStreaming throws the exception "System.OverflowException: 'Value was either too large or too small for an Int32.'". From our analysis, the root cause is that the type of ContentLength in ResponseHeaders is Int32. As a result, any value > 2147483647 triggers the above exception. Please find the stack trace below.

Stack trace:
   at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
   at Azure.Core.ResponseHeaders.get_ContentLength()
   at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.CheckContentLength(Response response)
   at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.d__58.MoveNext()
   at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.d__57.MoveNext()
   at ACRtoSA.ACRManager.d__0.MoveNext() in C:\Users\testVM\source\repos\ACRtoSAStreaming\ACRtoSAStreaming\Program.cs:line 138
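
To make the size limit concrete, here is a minimal Python sketch of the range check behind the reported overflow. It is an illustration of the arithmetic only and does not reproduce the SDK's header-parsing code; the helper names are hypothetical, and sizes are assumed to be binary (1 GB = 1024^3 bytes).

```python
# Range check illustrating why a content length above 2147483647 cannot be
# held in a signed 32-bit integer, while a signed 64-bit integer has room.
# Assumption: blob sizes are binary (1 GB = 1024**3 bytes).

INT32_MAX = 2**31 - 1  # 2147483647
INT64_MAX = 2**63 - 1

def fits_int32(n: int) -> bool:
    """True if n is representable as a signed 32-bit integer."""
    return -2**31 <= n <= INT32_MAX

def fits_int64(n: int) -> bool:
    """True if n is representable as a signed 64-bit integer."""
    return -2**63 <= n <= INT64_MAX

blob_sizes = {
    "500 MB": 500 * 1024**2,
    "2 GB": 2 * 1024**3,
    "30 GB": 30 * 1024**3,
    "200 GB": 200 * 1024**3,
}

for label, size in blob_sizes.items():
    print(f"{label}: {size} bytes, int32={fits_int32(size)}, int64={fits_int64(size)}")
```

Only the 500 MB size passes the 32-bit check; every size from 2 GB upward needs a 64-bit length.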

@IyerBhuvi
Member Author

The SLA we are expecting for the DownloadBlobTo API is something comparable to AzCopy's SLA.

@annelo-msft
Member

> The SLA for the DownloadBlobTo API we are expecting is something that is comparable to AzCopy's SLA.

Hi @IyerBhuvi - I don't think we would be able to achieve performance comparable to AzCopy in the ACR library, because we are required to validate the content digest of the registry blobs in the SDK. This adds some performance overhead.

It is possible that we could find an approach to computing the content digest that is faster than what we are doing today. This is not currently planned work for this semester. We can evaluate its priority based on the cost of the work, balanced against our other priorities.

Out of curiosity, if you need an SLA comparable to Storage copy, would it be possible for you to use Azure Storage for your blobs instead of ACR?
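
As an illustration of the digest validation mentioned above, here is a minimal Python sketch of a chunked copy that hashes every byte with SHA-256 as it passes through. It is not the SDK's implementation; the function name, chunk size, and sample payload are hypothetical. It assumes the registry digest has the form "sha256:<hex>".

```python
import hashlib
import io

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical buffer size, not taken from the SDK

def copy_and_verify(source, destination, expected_digest, chunk_size=CHUNK_SIZE):
    """Copy source to destination in chunks while hashing each chunk.

    Returns the number of bytes copied. Raises ValueError if the computed
    SHA-256 digest does not match expected_digest ("sha256:<hex>").
    """
    hasher = hashlib.sha256()
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)       # CPU cost: every byte is hashed once
        destination.write(chunk)   # I/O cost: every byte is written once
        copied += len(chunk)
    actual = "sha256:" + hasher.hexdigest()
    if actual != expected_digest:
        raise ValueError(f"digest mismatch: expected {expected_digest}, got {actual}")
    return copied

# Usage with in-memory streams standing in for the registry and the page blob.
payload = b"example layer bytes" * 1000
digest = "sha256:" + hashlib.sha256(payload).hexdigest()
dest = io.BytesIO()
copied = copy_and_verify(io.BytesIO(payload), dest, digest, chunk_size=1024)
print(copied)
```

Because the hash is updated per chunk, memory use stays bounded by the chunk size, but the hashing work grows linearly with blob size, which is the kind of overhead a plain storage-to-storage copy does not pay.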

@annelo-msft annelo-msft modified the milestones: 2023-07, Backlog Jun 30, 2023
@annelo-msft annelo-msft added feature-request This issue requires a new behavior in the product in order be resolved. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Jun 30, 2023

5 participants