[FEATURE REQ] Performance Improvement for DownloadBlobTo API #36910
Comments
Thank you for your feedback. This has been routed to the support team for assistance.
Adding Service team to look into this.
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.
@IyerBhuvi, to help us categorize this issue, can you help us understand the following? It says that you used Azure.Containers.ContainerRegistry to obtain the file from ACR. Did you also use the Azure.Storage.Blobs library to upload the file to your storage instance? If not, can you share the approach you used for that? Additionally, as discussed offline, we have recommended repeating the test with the ContainerRegistryContentClient [DownloadBlobStreaming](https://learn.microsoft.com/en-us/dotnet/api/azure.containers.containerregistry.containerregistrycontentclient.downloadblobstreamingasync?view=azure-dotnet-preview) API, with the expectation that it will improve performance. We are also interested in learning what performance characteristics you see with this approach.
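For reference, a minimal sketch of the recommended streaming approach, pairing DownloadBlobStreamingAsync with an Azure.Storage.Blobs upload; the registry endpoint, repository, digest, and storage URL below are placeholders, not values from this issue:

```csharp
using Azure.Containers.ContainerRegistry;
using Azure.Identity;
using Azure.Storage.Blobs;

// Placeholder endpoints and names for illustration only.
var acr = new ContainerRegistryContentClient(
    new Uri("https://myregistry.azurecr.io"), "myrepo", new DefaultAzureCredential());
var blob = new BlobClient(
    new Uri("https://myaccount.blob.core.windows.net/mycontainer/myblob.vhd"),
    new DefaultAzureCredential());

// Stream the registry blob directly into the storage upload, so the
// content is never fully buffered in memory or written to a temp file.
var download = await acr.DownloadBlobStreamingAsync("sha256:<digest>");
using Stream content = download.Value.Content;
await blob.UploadAsync(content, overwrite: true);
```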
@annelo-msft, yes, we have used Azure.Storage.Blobs. We have also tried the [DownloadBlobStreaming](https://learn.microsoft.com/en-us/dotnet/api/azure.containers.containerregistry.containerregistrycontentclient.downloadblobstreamingasync?view=azure-dotnet-preview) API. The setup on which the test was run and the performance results are as follows:

Setup Details: [attachment]

Performance Results for 500 MB Blob: [attachment]

Results for Blob size > 2 GB: For any digest with size > 2 GB, DownloadBlobStreaming throws the exception "System.OverflowException: Value was either too large or too small for an Int32." From our analysis, the root cause is that the type of ContentLength in ResponseHeaders is Int32; as a result, any value > 2147483647 hits this exception. Please find the stack trace below:

```
at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
at Azure.Core.ResponseHeaders.get_ContentLength()
at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.CheckContentLength(Response response)
at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.d__58.MoveNext()
at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.d__57.MoveNext()
at ACRtoSA.ACRManager.d__0.MoveNext() in C:\Users\testVM\source\repos\ACRtoSAStreaming\ACRtoSAStreaming\Program.cs:line 138
```
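As a side note, the 2 GB boundary reported above lines up exactly with Int32.MaxValue; a trivial repro of the parse failure (illustrating the arithmetic, not the SDK's actual code):

```csharp
// int.MaxValue is 2,147,483,647 (~2 GiB), so a Content-Length header for
// any larger blob cannot be represented once it is parsed as an Int32.
long threeGiB = 3L * 1024 * 1024 * 1024;      // 3221225472 > int.MaxValue
int parsed = int.Parse(threeGiB.ToString());  // throws System.OverflowException:
                                              // "Value was either too large or too small for an Int32."
```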
The SLA we are expecting for the DownloadBlobTo API is something comparable to AzCopy's SLA.
Hi @IyerBhuvi - I don't think we would be able to achieve performance comparable to AzCopy in the ACR library, because the SDK is required to validate the content digest of registry blobs, and this adds some performance overhead. It is possible that we could find an approach to computing the content digest that is faster than what we are doing today, but this is not currently planned work for this semester. We can evaluate its priority based on the cost of the work, balanced against our other priorities. Out of curiosity: if you need an SLA comparable to Storage copy, would it be possible for you to use Azure Storage for your blobs instead of ACR?
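For context on where that overhead comes from, here is a rough sketch of what digest validation entails (the SDK's internal implementation may differ): every downloaded byte has to pass through a SHA-256 hash so the result can be compared to the registry digest, which is per-byte CPU work that a plain, unvalidated transfer avoids.

```csharp
using System.Security.Cryptography;

// Registry digests have the form "sha256:<hex>", so verifying a downloaded
// blob means hashing its entire content and comparing against the digest.
static async Task<bool> MatchesDigestAsync(Stream content, string expectedDigest)
{
    using SHA256 sha256 = SHA256.Create();
    byte[] hash = await sha256.ComputeHashAsync(content);
    string actual = "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    return actual == expectedDigest;
}
```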
Library name
Azure.Containers.ContainerRegistry
Please describe the feature.
Context
We used the DownloadTo method, as shared in the trailing mail, to test the scenario of copying large blobs from a repository in ACR to a container in a Storage Account.
An Azure VM with a configuration close to that of our Service Fabric VM nodes runs the application that copies the large blob from ACR to the Storage Account. The VM configuration is as follows:
RAM: 8 GB, CPU core count: 2, Disk Size: 127 GB Standard HDD LR
The performance results are as follows: the time taken to copy a single blob file (file type: .vhd) ranges from 1 minute for a 500 MB VHD file to 27 minutes for a 30 GB VHD file.
If we copy multiple files in parallel within the same process, the time taken is around 1 minute for 4 files of 500 MB each and 39 minutes for 4 files of 30 GB each. CPU utilization of the VM running the copy application is around ~20% for a single VHD copy at either size (500 MB, 30 GB), and between 50% and 75% for the parallel copies of 500 MB and 30 GB files respectively.
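For reproducibility, a sketch of the kind of test harness described above, assuming ContainerRegistryContentClient.DownloadBlobToAsync with a local temp file as the intermediate step; names and paths are placeholders:

```csharp
using Azure.Containers.ContainerRegistry;
using Azure.Storage.Blobs;

// Copies a single registry blob to a storage container via a local temp file.
static async Task CopyBlobAsync(ContainerRegistryContentClient acr,
                                BlobContainerClient container, string digest)
{
    string tempPath = Path.GetTempFileName();
    try
    {
        await acr.DownloadBlobToAsync(digest, tempPath);             // ACR -> local disk
        BlobClient blob = container.GetBlobClient(digest + ".vhd");  // placeholder blob name
        await blob.UploadAsync(tempPath, overwrite: true);           // local disk -> storage
    }
    finally
    {
        File.Delete(tempPath);
    }
}

// The "4 files in parallel" scenario is then just:
// await Task.WhenAll(digests.Select(d => CopyBlobAsync(acr, container, d)));
```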
Ask
The minimum expected load for our service requires us to copy at least 4 blobs in parallel, with blob sizes ranging from 30 GB to 200 GB. Based on the testing we have done, the CPU utilization of the VM is very high while the process is executing; as seen in the attached snapshot, the VM's CPU utilization climbs when the copy application is running.
Given the above perf numbers, we strongly suspect that running the copy via DownloadBlobTo could have unintended impact when used in our service (e.g., a VMSS node restart due to high CPU utilization, which is an impactful event because it would affect other services running on the same node).
Hence, the ask is to optimize the DownloadBlobTo API to improve performance as measured by the CPU cycles consumed by the application.
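If it helps when evaluating any future optimization, a simple way to quantify "CPU cycles of the application" for a copy run is process CPU time versus wall-clock time (a generic measurement sketch, not tied to any SDK API):

```csharp
using System.Diagnostics;

var proc = Process.GetCurrentProcess();
TimeSpan cpuBefore = proc.TotalProcessorTime;
var wall = Stopwatch.StartNew();

// ... run the blob copy workload here ...

wall.Stop();
proc.Refresh();
TimeSpan cpuUsed = proc.TotalProcessorTime - cpuBefore;
double utilization = cpuUsed.TotalMilliseconds /
                     (wall.Elapsed.TotalMilliseconds * Environment.ProcessorCount);
Console.WriteLine($"Wall: {wall.Elapsed}, CPU: {cpuUsed}, Avg utilization: {utilization:P0}");
```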