vine: count child_count properly in the transfer server process #4078

Open, wants to merge 7 commits into base: master

JinZhou5042 (Member) commented Feb 26, 2025

Proposed Changes

I found this problem while investigating #4076.

In the worker's transfer server process, child_count is incremented before we even check whether lnk is valid or whether the fork succeeded. This leads to inaccurate counting, especially when the connection is NULL (timeout) or the fork fails. As a result, the worker can block indefinitely in the waitpid call, waiting for a child process that doesn't exist.
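To make the drift concrete, here is a small, hypothetical simulation of the accounting only. It does not call fork() and is not the actual cctools code; the accept_event struct and the simulate_buggy/simulate_fixed functions are illustrative assumptions. Each event models one pass through the accept loop, where the link may be NULL (timeout) and fork() may fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one pass through the transfer server's accept loop. */
struct accept_event {
	bool link_ok; /* false models a NULL lnk (accept timed out) */
	bool fork_ok; /* false models fork() returning -1 */
};

/* Buggy ordering: child_count is bumped before lnk or fork() is checked.
 * Returns the counter; *real receives how many children really exist. */
static int simulate_buggy(const struct accept_event *ev, size_t n, int *real)
{
	assert(real != NULL);
	int child_count = 0;
	*real = 0;
	for (size_t i = 0; i < n; i++) {
		child_count++; /* counted before anything is checked */
		if (!ev[i].link_ok)
			continue; /* timeout: no child, but already counted */
		if (!ev[i].fork_ok)
			continue; /* fork failed: no child, but already counted */
		(*real)++; /* a child was actually created */
	}
	return child_count;
}

/* Fixed ordering: count only children that were actually forked. */
static int simulate_fixed(const struct accept_event *ev, size_t n, int *real)
{
	assert(real != NULL);
	int child_count = 0;
	*real = 0;
	for (size_t i = 0; i < n; i++) {
		if (!ev[i].link_ok)
			continue; /* timeout: nothing to count */
		if (!ev[i].fork_ok)
			continue; /* fork failure handled separately, not counted */
		(*real)++;
		child_count++; /* counted only after a successful fork */
	}
	return child_count;
}
```

For example, with four events (success, timeout, fork failure, success), simulate_buggy returns 4 while only 2 children exist, and simulate_fixed returns 2. Any wait that expects child_count children to exit could then end up waiting for children that were never created.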

The solution is to increment child_count only after a child process has been forked successfully, and to handle fork failures separately, so the count stays accurate and the worker does not block unnecessarily.
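A minimal sketch of the corrected ordering, assuming a POSIX system. It is not the code from this PR: handle_connection, reap_children, the void *link stand-in for the accepted lnk, and the global child_count are hypothetical. The ECHILD resync in reap_children is an extra defensive choice of this sketch, not something the PR describes.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of transfer children currently alive (illustrative global). */
static int child_count = 0;

/* Hypothetical stand-in for serving one transfer over an accepted link.
 * NULL models an accept that timed out. */
static void handle_connection(void *link)
{
	if (!link)
		return; /* timeout: no child, so no count change */

	pid_t pid = fork();
	if (pid < 0) {
		/* fork failed: report it, but do not touch child_count */
		perror("fork");
		return;
	}
	if (pid == 0) {
		/* child: the real code would serve the transfer here */
		_exit(0);
	}
	/* parent: a child now really exists, so count it */
	child_count++;
}

/* Reap children; if block is nonzero, wait until every counted child exits. */
static void reap_children(int block)
{
	while (child_count > 0) {
		int status;
		pid_t pid = waitpid(-1, &status, block ? 0 : WNOHANG);
		if (pid > 0) {
			child_count--; /* one counted child has been reaped */
		} else if (pid == 0) {
			break; /* WNOHANG: children remain but none has exited yet */
		} else if (errno == EINTR) {
			continue; /* interrupted by a signal: retry */
		} else {
			/* ECHILD: counter and reality disagree; resync instead of looping */
			child_count = 0;
			break;
		}
	}
}
```

Calling handle_connection(NULL) leaves child_count at 0, while two calls with a non-NULL link raise it to 2 (assuming both forks succeed); reap_children(1) then returns once both children have exited, leaving child_count at 0.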

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test: Run local tests prior to pushing.
  • make format: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
  • make lint: Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a GitHub label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a GitHub label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

JinZhou5042 marked this pull request as ready for review February 26, 2025 20:03
JinZhou5042 requested a review from dthain February 26, 2025 20:24
JinZhou5042 self-assigned this Feb 26, 2025