Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store absolute blob URIs instead of just the blob name #929

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

davidmrdavid
Copy link
Collaborator

@davidmrdavid davidmrdavid commented Jul 28, 2023

Background:
(1) Large inputs and outputs to orchestrators are stored as blobs inside containers prefixed by the application's taskhub name. For example, if "app A" has taskhub "taskhubA", then its blob container will be of the form "taskhubA-largemessages/".

(2) Apps may share an instance and history table across taskhubs via the trackingStore setting. This means that app A and app B with taskhubs taskhubA and taskhubB respectively may both re-use a history and instance table inside a tracking store named, for example, "mytrackingStore".

Problem scenario

Consider two apps "A" and "B" with different taskhubs: taskhubA and taskhubB respectively. Assume they both make requests to a singleton orchestrator with ID "mySingleton" and send large inputs to it.

This scenario may lead to the following buggy sequence:

(1) App A creates "mySingleton" in the shared instance and history tables. Since "mySingleton" has a large input, it is stored as a blob inside taskhubA as "taskhubA-largeMessages/myBlob". Then it stores the name of the blob, myBlob, in the history and instance tables.

(2) App B now attempts to process a request for "mySingleton". It finds in the shared history table that there's already an entry to it. When reading its history, it needs to download the input blob. All app B knows about the blob is that it is named "myBlob" so it tries to download it from its taskhub container: "taskhubB-largemessages".

However, myBlob cannot be found there, it can only be found in taskhubA-largemessages. As a result, app B fails in this operation. App B believes this may be a transient failure, so it will attempt this operation again in some time, but will inevitably fail again.

Now app B has a poison message, needing manual intervention to fix.

Solution
There are multiple solutions to this problem.

Solution 1:
If I did not have to worry about backwards compatibility, I would ensure that all blobs created for shared instance/history tables would be stored in a shared container as well. I think it would make sense to name this container after the value of trackingStorePrefix. However, this is a dangerous change to make - I worry it could break in-progress orchestrations as well as our ability to delete pre-existing blobs when performing purge operations.

Solution 2 (this PR):
This PR makes a simple change: it makes the instance and history table store the entire blob URI, not just the blob name. This way, the framework does not have to assume that the blob URI can be constructed from its taskhub name, instead it will always know the precise location of the blob.

For backwards compatibility, if a blob field cannot be parsed as a URI, then we assume the blob must come from the taskhub container.

In other words, new blobs will be recorded as absolute URLs in the instance and history tables. Meanwhile, we should still be able to process pre-existing blobs because we fallback to the old logic when we can't parse the blob name as a URI.

Finally: This PR modifies a code-path that I'm not terribly familiar with, so feel free to challenge my assumptions. However, this PR is informed by a real and recent poison message investigation.

@davidmrdavid davidmrdavid changed the title Store large data blobs in "shared" container when using a dedicated tracking store Store absolute blob URIs instead of just the blob name Jul 28, 2023
@davidmrdavid davidmrdavid marked this pull request as ready for review July 28, 2023 23:07
@davidmrdavid davidmrdavid marked this pull request as draft July 28, 2023 23:32
@davidmrdavid
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@davidmrdavid davidmrdavid marked this pull request as ready for review September 25, 2023 18:48
Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just the usual nits about code style and comments.

I definitely recommend doing manual testing for this, assuming we don't already have an automated test for the scenario you outlined.

string blobUrl = this.messageManager.GetBlobUrl(blobName);
string blobData = blobProperty.StringValue;

// We now store blobs as absolute URIs to minimize chance of 'blob not found' errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is too vague. I think it needs to be more specific to help future maintainers understand the implications:

Suggested change
// We now store blobs as absolute URIs to minimize chance of 'blob not found' errors
// As of 1.15.1, we store blobs as absolute URIs so that we don't have to assume what task hub they were created for.
// This is necessary in cases where a common tracking store is shared across multiple task hubs.

entity.Properties.Add(blobPropertyName, new EntityProperty(blobName));

// Store blob URL so the blob can always be downloaded from the absolute path
string bloburl = this.messageManager.GetBlobUrl(blobName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C# naming nit: should be blobUrl

@@ -1234,7 +1260,19 @@ async Task DecompressLargeEntityProperties(DynamicTableEntity entity, List<strin
if (entity.Properties.TryGetValue(blobPropertyName, out EntityProperty property))
{
string blobName = property.StringValue;
string decompressedMessage = await this.messageManager.DownloadAndDecompressAsBytesAsync(blobName);

// We now store blob URIs instead of just blob names to prevent blob not found errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about making the code comment more specific. I think the important point here is that older versions store just the blob name, but newer versions store blobs as full URLs. I don't think it's necessary to talk about "blob not found" here because that assumes very specific context of a bug fix which future maintainers won't know about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants