-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store absolute blob URIs instead of just the blob name #929
base: main
Are you sure you want to change the base?
Conversation
/azp run |
Azure Pipelines could not run because the pipeline triggers exclude this branch/path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just the usual nits about code style and comments.
I definitely recommend doing manual testing for this, assuming we don't already have an automated test for the scenario you outlined.
string blobUrl = this.messageManager.GetBlobUrl(blobName); | ||
string blobData = blobProperty.StringValue; | ||
|
||
// We now store blobs as absolute URIs to minimize chance of 'blob not found' errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is too vague. I think it needs to be more specific to help future maintainers understand the implications:
// We now store blobs as absolute URIs to minimize chance of 'blob not found' errors | |
// As of 1.15.1, we store blobs as absolute URIs so that we don't have to assume what task hub they were created for. | |
// This is necessary in cases where a common tracking store is shared across multiple task hubs. |
entity.Properties.Add(blobPropertyName, new EntityProperty(blobName)); | ||
|
||
// Store blob URL so the blob can always be downloaded from the absolute path | ||
string bloburl = this.messageManager.GetBlobUrl(blobName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C# naming nit: should be blobUrl
@@ -1234,7 +1260,19 @@ async Task DecompressLargeEntityProperties(DynamicTableEntity entity, List<strin | |||
if (entity.Properties.TryGetValue(blobPropertyName, out EntityProperty property)) | |||
{ | |||
string blobName = property.StringValue; | |||
string decompressedMessage = await this.messageManager.DownloadAndDecompressAsBytesAsync(blobName); | |||
|
|||
// We now store blob URIs instead of just blob names to prevent blob not found errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about making the code comment more specific. I think the important point here is that older versions store just the blob name, but newer versions store blobs as full URLs. I don't think it's necessary to talk about "blob not found" here because that assumes very specific context of a bug fix which future maintainers won't know about.
Background:
(1) Large inputs and outputs to orchestrators are stored as blobs inside containers prefixed by the application's taskhub name. For example, if "app A" has taskhub "taskhubA", then its blob container will be of the form "taskhubA-largemessages/".
(2) Apps may share an instance and history table across taskhubs via the trackingStore setting. This means that app A and app B with taskhubs taskhubA and taskhubB respectively may both re-use a history and instance table inside a tracking store named, for example, "mytrackingStore".
Problem scenario
Consider two apps "A" and "B" with different taskhubs: taskhubA and taskhubB respectively. Assume they both make requests to a singleton orchestrator with ID "mySingleton" and send large inputs to it.
This scenario may lead to the following buggy sequence:
(1) App A creates "mySingleton" in the shared instance and history tables. Since "mySingleton" has a large input, it is stored as a blob inside taskhubA as "taskhubA-largeMessages/myBlob". Then it stores the name of the blob, myBlob, in the history and instance tables.
(2) App B now attempts to process a request for "mySingleton". It finds in the shared history table that there's already an entry to it. When reading its history, it needs to download the input blob. All app B knows about the blob is that it is named "myBlob" so it tries to download it from its taskhub container: "taskhubB-largemessages".
However, myBlob cannot be found there, it can only be found in taskhubA-largemessages. As a result, app B fails in this operation. App B believes this may be a transient failure, so it will attempt this operation again in some time, but will inevitably fail again.
Now app B has a poison message, needing manual intervention to fix.
Solution
There are multiple solutions to this problem.
Solution 1:
If I did not have to worry about backwards compatibility, I would ensure that all blobs created for shared instance/history tables would be stored in a shared container as well. I think it would make sense to name this container after the value of
trackingStorePrefix
. However, this is a dangerous change to make - I worry it could break in-progress orchestrations as well as our ability to delete pre-existing blobs when performing purge operations.Solution 2 (this PR):
This PR makes a simple change: it makes the instance and history table store the entire blob URI, not just the blob name. This way, the framework does not have to assume that the blob URI can be constructed from its taskhub name, instead it will always know the precise location of the blob.
For backwards compatibility, if a blob field cannot be parsed as a URI, then we assume the blob must come from the taskhub container.
In other words, new blobs will be recorded as absolute URLs in the instance and history tables. Meanwhile, we should still be able to process pre-existing blobs because we fallback to the old logic when we can't parse the blob name as a URI.
Finally: This PR modifies a code-path that I'm not terribly familiar with, so feel free to challenge my assumptions. However, this PR is informed by a real and recent poison message investigation.