Adds config for min and delta backoff poll intervals #174

Merged (2 commits) on Aug 31, 2023


dmetzgar
Contributor

@dmetzgar dmetzgar commented Jun 27, 2023

PR for #172. Makes the minimum and delta backoff intervals configurable for both activity and orchestration instance polling. When many pods run TaskHubWorkers, the default poll interval of 50ms consumes a significant amount of DTU. This can become more pronounced when SQL Azure changes the query plan due to parameter sniffing. In production, we see 12 polls per second on each of the NewTasks and Instances tables, averaged over 24 hours.
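To illustrate why the minimum and delta values matter, here is a minimal sketch of a backoff poller. It is not the code in this PR: the class, the parameter names, the 3000ms maximum, and the doubling growth rule are all assumptions made for illustration. Only the 50ms default minimum comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class BackoffPoller:
    """Hypothetical backoff helper; names and growth rule are assumptions."""
    min_interval_ms: float = 50.0     # floor; 50ms is the default cited above
    delta_backoff_ms: float = 50.0    # growth step (assumed value)
    max_interval_ms: float = 3000.0   # ceiling (assumed value)
    empty_polls: int = 0              # consecutive polls that found no work

    def next_interval_ms(self, found_work: bool) -> float:
        """Return how long to wait before the next poll."""
        if found_work:
            self.empty_polls = 0      # work found: drop back to the minimum
        else:
            self.empty_polls += 1     # nothing found: back off further
        grown = self.min_interval_ms + self.delta_backoff_ms * (2 ** self.empty_polls - 1)
        return min(self.max_interval_ms, grown)


def max_polls_per_second(min_interval_ms: float, workers: int) -> float:
    """Upper bound on polls per second against one table when every worker sits at the floor."""
    return 1000.0 / min_interval_ms * workers
```

Under these assumptions a single busy worker at the 50ms floor can issue up to 20 polls per second per table, so raising the minimum lowers that ceiling proportionally, while a larger delta makes idle workers back off faster.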

One behavior we've seen on new task polling that you might be able to confirm, @cgillum: It appears that the behavior of activities scheduled by an orchestration are inserted to the NewTasks table with a lock. The TaskHubWorker that is running the orchestration instance is then also expected to run the activities, and only after the lock expires can they be picked up by other workers. I think this is the behavior because when I put orchestration and activity into separate TaskHubWorkers, the activities don't get executed. From my point of view, the times when tasks need to be picked up by polling are when the TaskHubWorker goes down due to failure or deployment, or when the task is a timer set further into the future than the lock expiration. Therefore, we have set the activity polling intervals much higher than the instance polling intervals.

@dmetzgar
Contributor Author

@microsoft-github-policy-service agree company="UiPath"

@cgillum
Member

cgillum commented Aug 31, 2023

> It appears that the behavior of activities scheduled by an orchestration are inserted to the NewTasks table with a lock.

I'm pretty confident that this is not the case. There should be no lock when rows are added to the NewTasks table. We would much prefer that these activities can be load balanced across multiple workers.

> I think this is the behavior because when I put orchestration and activity into separate TaskHubWorkers, the activities don't get executed.

We actually require that all TaskHubWorkers register the exact same set of activities and orchestrations. If you don't do this, you can expect runtime exceptions complaining about how either an activity wasn't found or an orchestration wasn't found on the worker which didn't register that orchestration or activity. I'd like to add a feature that allows splitting them up, but I haven't been able to prioritize it yet.
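As a rough illustration of that requirement, the sketch below checks that every worker registers the same names. It is a hypothetical helper, not part of the library: the function name and the idea of representing each worker's registrations as a set of strings are assumptions.

```python
def registration_mismatches(workers: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each worker to the orchestration/activity names it fails to register.

    `workers` maps a (hypothetical) worker name to the set of names it registers.
    An empty result means every worker registers the same set.
    """
    all_names = set().union(*workers.values())  # everything registered anywhere
    return {
        worker: all_names - names
        for worker, names in workers.items()
        if all_names - names
    }
```

A worker that shows up in the result would be a candidate for the "not found" runtime exceptions described above whenever it picks up a task for one of its missing names.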

If you're observing that activities typically run on the same worker as the orchestrations that schedule them, it's likely because we reset the backoff polling interval on the local worker when we detect an orchestration has scheduled activities (or sub-orchestrations). We do this to minimize the latency between scheduling tasks and having them start running. The side-effect of this is that tasks are biased to run on the same worker that scheduled them. Distribution typically happens when load is higher and the local worker can't fetch the tasks and execute them as quickly.
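The bias described above can be sketched with a toy model. This is not the library's implementation: the `Worker` class, its fields, and the rule that a reset schedules the next poll one minimum interval from now are assumptions chosen to show the effect.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Toy model of a polling worker (hypothetical, for illustration only)."""
    name: str
    interval_ms: float       # current backoff interval
    next_poll_at_ms: float   # absolute time of the next scheduled poll

    def reset_backoff(self, now_ms: float, min_interval_ms: float) -> None:
        """Assumed reset rule: drop to the minimum and poll again soon."""
        self.interval_ms = min_interval_ms
        self.next_poll_at_ms = min(self.next_poll_at_ms, now_ms + min_interval_ms)


def first_poller(workers: list[Worker]) -> str:
    """Name of the worker whose next poll comes first, so it sees new tasks first."""
    return min(workers, key=lambda w: w.next_poll_at_ms).name
```

In this model, an idle worker that has backed off would normally lose the race to a worker with an earlier scheduled poll, but once it resets after scheduling activities it polls first and tends to pick up its own tasks. Other workers only get them when the local one cannot keep up.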

@cgillum cgillum linked an issue Aug 31, 2023 that may be closed by this pull request
@cgillum cgillum merged commit b3a7bf4 into microsoft:main Aug 31, 2023
2 checks passed
Development

Successfully merging this pull request may close these issues.

Configurable minimum for backoff retry for polling interval