
hard ordering constraints in HPC mirroring #1297

Open
petersilva opened this issue Nov 14, 2024 · 9 comments · Fixed by #1329
Labels
bug (Something isn't working) · Design (impacts API, or code structure changes) · Developer (not a problem, more of a note to self for devs about work to do) · HPC (related to high performance computing mirroring use case) · Refactor (change implementation of existing functionality)

Comments


petersilva commented Nov 14, 2024

See Corresponding C implementation issue for background: MetPX/sarrac#174

Things that need to be done to robustly support that on the python side:

If we want to implement exchangeSplit properly, then as pointed out in MetPX/sarrac#174 (comment), we need to establish instances over an entire cluster, not just a single node, in order to get singleton processing working properly.

e.g... a winnow publishes to 20 exchanges... we have 4 consuming nodes, each with 5 instances... we would want the bindings to look something like:

x1 -> n1i1, x2->n1i2, ... x5-> n1i5, x6->n2i1 ... x10 -> n2i5, x11 -> n3i1, ... x15 -> n3i5 ....

or another mapping would be to have subsets of instances on each node...

x1 -> n1i1 .... x6 -> n2i6, x11->n3i11 ...

Would have to do the math one way or another, and create the exchanges, queues, and bindings.
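The binding arithmetic above can be sketched as follows (illustrative only, not sr3 code; names like `x1`/`n1i1` follow the example in this comment):

```python
# Illustrative sketch: compute exchange -> (node, instance) bindings for
# n_exchanges winnow output exchanges spread over n_nodes nodes with
# n_instances consuming instances each (20 = 4 * 5 in the example).

def round_robin_bindings(n_exchanges, n_nodes, n_instances):
    """Mapping 1: fill each node's instances in turn.
    x1->n1i1, x2->n1i2, ... x5->n1i5, x6->n2i1, ..."""
    assert n_exchanges == n_nodes * n_instances
    bindings = {}
    for x in range(n_exchanges):
        node = x // n_instances      # which node this exchange lands on
        inst = x % n_instances       # which instance on that node
        bindings["x%d" % (x + 1)] = "n%di%d" % (node + 1, inst + 1)
    return bindings

def node_subset_bindings(n_exchanges, n_nodes, n_instances):
    """Mapping 2: subsets of globally numbered instances on each node.
    x1->n1i1, ... x6->n2i6, x11->n3i11, ..."""
    assert n_exchanges == n_nodes * n_instances
    bindings = {}
    for x in range(n_exchanges):
        node = x // n_instances
        bindings["x%d" % (x + 1)] = "n%di%d" % (node + 1, x + 1)
    return bindings
```

With 20 exchanges, 4 nodes, and 5 instances, `round_robin_bindings` reproduces the first mapping above and `node_subset_bindings` the second.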


petersilva commented Nov 14, 2024

  • I think counting instances per node is probably simpler/more consistent/easier in practice.
  • if we have singletons (unique instances across nodes), then we need VIPs and failover in case one node dies.


petersilva commented Nov 14, 2024

so... maybe we need to define a topology putting hostnames in a directive?

cluster_nodes transfer04 transfer05 transfer06

so with such a layout, transfer04 is "NODE" 0, transfer05 is NODE 1, and transfer06 is NODE 2.
the count of the hostnames gives the node count, and can distribute things across it
using NODE numbers and the count as a divisor.
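A sketch of how such a directive could yield NODE numbers and a divisor (the `cluster_nodes` directive and all helper names here are assumptions from this comment, not an existing sr3 option):

```python
# Sketch of the proposed directive (assumed, not an existing option):
#   cluster_nodes transfer04 transfer05 transfer06
# The hostname list gives each host a NODE number, and the list length
# is the divisor for distributing work across the cluster.
import socket

def parse_cluster_nodes(directive_line):
    return directive_line.split()[1:]          # ["transfer04", "transfer05", ...]

def my_node_number(hosts, hostname=None):
    hostname = hostname or socket.gethostname()
    return hosts.index(hostname)               # transfer04 -> 0, transfer05 -> 1, ...

def exchange_is_mine(exchange_index, hosts, node_number):
    # distribute exchanges across nodes using the node count as a divisor
    return exchange_index % len(hosts) == node_number
```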

@petersilva

the other wrinkle is... if you add/remove a node, you may need to redo all the bindings... so then #35 becomes a dependency.

@petersilva

fwiw... it seems auditing the flow of file operations is much simpler and more straightforward than implementing all this. We could figure it out today and make hand-crafted configurations for each subscriber, but it would be quite painful to modify... having sr3 make the necessary calculations & linkages would be an easier approach for the analysts, but it's still a long haul.

All the implementation will do is serialize the transfers to preserve ordering, which, in the vast majority of cases, is not needed. It will reduce performance, but by how much is unclear.

An audit that identifies files with potential access race conditions will preserve parallelism, and maximize performance... but it requires more analysis for deployment/developers.

@petersilva

related to #35 , #624

@petersilva

After some more auditing... maybe a tweak to the existing mdelaylatest plugin to suppress unlinks when the timing is suspicious... have to think about it.
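One way such a tweak might look, as a sketch (an assumption about the approach, not the actual mdelaylatest code; the 30-second window matches the quiescence delay used in mirroring):

```python
# Sketch only (an assumption about the tweak, not actual mdelaylatest
# code): treat an unlink as suspicious when it arrives within the
# mirroring settle window of a link/copy to the same file, and suppress
# it rather than removing data another pending event depends on.

SUSPICIOUS_WINDOW = 30.0   # seconds; matches the quiescence delay used in mirroring

def should_suppress_unlink(unlink_time, last_link_time, window=SUSPICIOUS_WINDOW):
    return (unlink_time - last_link_time) < window
```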

@petersilva petersilva added bug Something isn't working Design impacts API, or code structure changes Developer not a problem, more of a note to self for devs about work to do. Refactor change implementation of existing functionality. HPC related to high performance computing mirroring use case labels Dec 4, 2024
@petersilva

so... reviewed how post_exchangeSplit works... and it's not helpful for this problem. I am changing it (both C and python.) the old/current code splits based on the checksum/integrity value. will switch to picking based on a checksum of the relPath (or retrievePath if relPath is missing... should be pretty darn rare.)

Will use that to fix #1322 as well.
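The proposed selection could look roughly like this (a sketch; `split_partition` and the message-dict shape are illustrative, not the actual sr3/sarrac code):

```python
# Sketch of the proposed selection: hash the relPath (or retrievePath
# when relPath is absent) so every event for the same path picks the
# same partition, preserving per-file ordering on one consumer.
import hashlib

def split_partition(msg, n_partitions):
    path = msg.get("relPath") or msg.get("retrievePath", "")
    digest = hashlib.sha512(path.encode()).digest()
    return digest[0] % n_partitions    # stable value in [0, n_partitions)
```

The point is determinism per path: unlike splitting on the content checksum, successive rewrites of the same path still route to the same consumer.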

petersilva added a commit that referenced this issue Jan 10, 2025
with an update to an ordinary file, each new content replaces the old
but sometimes there is a sequence of events, where order is necessary.
so dispatch the earlier event immediately, rather than discarding it.
@petersilva petersilva reopened this Jan 10, 2025
@petersilva

OK... in a test case for hpc-mirroring with the linux kernel, encountered this during git clone...

  • it creates a temporary file.
  • hard links to it.... 3 times.
  • removes the tmp file.

mirroring delays everything by 30 seconds to allow for quiescence, so the tmp_file being linked to is gone by the time it is being actioned on the destination side.

with the hardlink logic we had (a copy of the symlink logic), the link would just fail.
Now we have added fall-through logic: if the link fails, we copy the data from the source.
net effect: the file is no longer a link, but a separate copy.

In the case above... we end up with three copies of the data, instead of three links to 1 file.

The fall-through logic solves the "problem" in that git works afterward... but this subtle difference...
does it matter?
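The fall-through described above can be sketched as (illustrative; `fetch_from_source` is a hypothetical stand-in for re-downloading the data from the mirror source):

```python
# Sketch of the fall-through: try the hard link first; if the link
# target has already been removed on the destination, fall back to
# fetching the content, leaving a separate copy rather than a link.
import os

def mirror_hardlink(link_target, new_path, fetch_from_source):
    try:
        os.link(link_target, new_path)   # preserves single-inode semantics
        return "linked"
    except FileNotFoundError:
        fetch_from_source(new_path)      # separate copy, not a link
        return "copied"
```

In the git-clone case above, all three links to the vanished tmp file take the "copied" branch, which is why the result is three independent copies instead of three links to one file.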

@petersilva

note... in the above case, the temporary file has a different name from the links created to it, so the idea of tracking operations on the same path by having them assigned to the same node... fails...
