Adds config for min and delta backoff poll intervals #174

dmetzgar · 2023-06-27T21:30:09Z

PR for #172. Makes the minimum and delta backoff intervals configurable for both activity and orchestration instance polling. When many pods run TaskHubWorkers, the default poll interval of 50ms consumes a significant amount of DTU. This can become more pronounced when SQL Azure changes the query plan due to parameter sniffing. In production, averaged over 24 hours, we see 12 polls per second on each of the NewTasks and Instances tables.
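The thread doesn't spell out the backoff formula, so as a rough illustration of what "minimum" and "delta backoff" intervals control, here is a minimal Python sketch. It assumes a randomized exponential backoff, a common shape for min/delta/max settings: after n consecutive empty polls the wait is min + (2^n - 1) × delta (with ±20% jitter on the delta), capped at a maximum. `BackoffPolicy`, its field names, and all values are hypothetical and are not this project's actual settings or implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BackoffPolicy:
    """Hypothetical poll backoff: min + (2^n - 1) * jittered delta, capped at max.

    All intervals are in milliseconds; n counts consecutive empty polls.
    """
    min_interval: float
    delta_backoff: float
    max_interval: float
    jitter: float = 0.2  # randomize each delta by +/-20% so workers drift apart
    rng: random.Random = field(default_factory=random.Random)
    empty_polls: int = 0

    def next_delay(self) -> float:
        """Record an empty poll and return how long to wait before polling again."""
        self.empty_polls += 1
        exponent = min(self.empty_polls, 30)  # cap so 2**n can never overflow a float
        factor = self.rng.uniform(1 - self.jitter, 1 + self.jitter)
        increment = (2 ** exponent - 1) * self.delta_backoff * factor
        return float(min(self.min_interval + increment, self.max_interval))

    def reset(self) -> None:
        """Work was found: the next wait starts again from min_interval."""
        self.empty_polls = 0


# Hypothetical values: poll instances aggressively, activities much less often.
instance_polling = BackoffPolicy(min_interval=50, delta_backoff=50, max_interval=1_000, jitter=0)
activity_polling = BackoffPolicy(min_interval=500, delta_backoff=500, max_interval=10_000, jitter=0)

print([instance_polling.next_delay() for _ in range(5)])  # [100.0, 200.0, 400.0, 800.0, 1000.0]
print([activity_polling.next_delay() for _ in range(5)])  # [1000.0, 2000.0, 4000.0, 8000.0, 10000.0]
```

Raising the minimum and delta shifts the whole curve up, so an idle worker reaches its cap in fewer polls and issues fewer queries against the database. Keeping separate policies for activity and instance polling is what lets the two be tuned independently.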

@cgillum, here is a behavior we've seen with new task polling that you might be able to confirm. It appears that activities scheduled by an orchestration are inserted into the NewTasks table with a lock. The TaskHubWorker that is running the orchestration instance is then expected to run those activities as well, and other workers can pick them up only after the lock expires. I think this is the behavior because when I put orchestration and activity into separate TaskHubWorkers, the activities don't get executed. As I see it, the only times other workers need to pick up these tasks are when the TaskHubWorker goes down due to a failure or deployment, or when the task is a timer set further in the future than the lock expiration. Therefore, we have set the activity polling intervals much higher than the instance polling intervals.

dmetzgar · 2023-06-27T21:31:17Z

@microsoft-github-policy-service agree company="UiPath"

cgillum · 2023-08-31T18:06:34Z

> It appears that activities scheduled by an orchestration are inserted into the NewTasks table with a lock.

I'm pretty confident that this is not the case. There should be no lock when rows are added to the NewTasks table. We would much prefer that these activities can be load balanced across multiple workers.

> I think this is the behavior because when I put orchestration and activity into separate TaskHubWorkers, the activities don't get executed.

We actually require that all TaskHubWorkers register the exact same set of activities and orchestrations. If you don't, expect runtime exceptions reporting that an activity or orchestration wasn't found on any worker that didn't register it. I'd like to add a feature that allows splitting them up, but I haven't been able to prioritize it yet.
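To make the registration rule concrete, here is a hypothetical Python sketch; none of these names are the framework's API. Because rows in the NewTasks table aren't locked to the scheduling worker, whichever worker polls first can dequeue any activity, so a worker that lacks the implementation fails at run time. `Worker`, `TaskNotRegisteredError`, and `same_registrations` are invented for illustration.

```python
from collections import deque
from typing import Callable, Dict, List


class TaskNotRegisteredError(RuntimeError):
    """Raised when a worker dequeues a task it has no implementation for."""


class Worker:
    """Hypothetical worker holding its own registry of activity implementations."""

    def __init__(self, name: str, activities: Dict[str, Callable[[str], str]]):
        self.name = name
        self.activities = activities

    def execute(self, activity: str, payload: str) -> str:
        impl = self.activities.get(activity)
        if impl is None:
            raise TaskNotRegisteredError(f"{activity!r} is not registered on {self.name}")
        return impl(payload)


def same_registrations(workers: List[Worker]) -> bool:
    """Startup check for the rule above: every worker registers the same names."""
    return len({frozenset(w.activities) for w in workers}) <= 1


# Unlocked rows in a shared table: whichever worker polls first dequeues them.
new_tasks = deque([("SayHello", "Tokyo"), ("SayHello", "Seattle")])
full = Worker("pod-a", {"SayHello": lambda city: f"Hello, {city}!"})
orchestration_only = Worker("pod-b", {})

print(full.execute(*new_tasks.popleft()))  # Hello, Tokyo!
try:
    orchestration_only.execute(*new_tasks.popleft())
except TaskNotRegisteredError as err:
    print(err)  # 'SayHello' is not registered on pod-b
print(same_registrations([full, orchestration_only]))  # False
```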

If you're observing that activities typically run on the same worker as the orchestrations that schedule them, it's likely because we reset the backoff polling interval on the local worker when we detect that an orchestration has scheduled activities (or sub-orchestrations). We do this to minimize the latency between scheduling tasks and having them start running. The side effect is that tasks are biased to run on the worker that scheduled them. Distribution across workers typically happens when load is higher and the local worker can't fetch and execute the tasks as quickly.
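The bias described above can be seen in a deliberately simplified model: two workers, one task at a time, fixed intervals, no jitter. The local worker resets its backoff when its orchestration schedules a task and polls again after a short delay; the remote worker is idle and fully backed off. The function names and all numbers are hypothetical, chosen only to illustrate the effect, not measured from this project.

```python
import math


def first_poller(schedule_ms: float, local_delay_ms: float,
                 remote_phase_ms: float, remote_interval_ms: float) -> str:
    """Return which worker dequeues a task inserted at schedule_ms.

    local:  its orchestration just scheduled the task, so it reset its backoff
            and polls again after local_delay_ms.
    remote: idle and backed off, polling every remote_interval_ms from remote_phase_ms.
    Ties go to the local worker.
    """
    local_poll = schedule_ms + local_delay_ms
    # First remote poll strictly after the insert (a poll at the same instant missed it).
    k = math.floor((schedule_ms - remote_phase_ms) / remote_interval_ms) + 1
    remote_poll = remote_phase_ms + k * remote_interval_ms
    return "local" if local_poll <= remote_poll else "remote"


def local_share(local_delay_ms: float, remote_interval_ms: int) -> float:
    """Fraction of tasks the local worker wins, over every 1 ms insert offset in one remote interval."""
    wins = sum(first_poller(t, local_delay_ms, 0, remote_interval_ms) == "local"
               for t in range(remote_interval_ms))
    return wins / remote_interval_ms


# Hypothetical: local polls 50 ms after scheduling; remote is backed off to 1 s.
print(local_share(50, 1_000))     # 0.951: the scheduling worker almost always wins
# Under load the local worker is busy; model that as 1 s of extra delay before it polls.
print(local_share(1_050, 1_000))  # 0.0: the idle remote worker picks up every task
```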

Commit a164483: Adds config for min and delta backoff poll intervals

cgillum approved these changes Aug 31, 2023

Commit b90c348: Update CHANGELOG.md

cgillum linked an issue Aug 31, 2023 that may be closed by this pull request: Configurable minimum for backoff retry for polling interval #172 (Closed)

cgillum merged commit b3a7bf4 into microsoft:main Aug 31, 2023 (2 checks passed)
