hard ordering constraints in HPC mirroring #1297
so... maybe we need to define a topology by putting hostnames in a directive?
so with such a layout, transfer04 is "NODE" 0, transfer05 is NODE 1, and transfer06 is NODE 2.
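A minimal sketch of that numbering, assuming the topology ends up as an ordered list of hostnames. The directive does not exist yet, so `TOPOLOGY` and `node_index` are hypothetical names used only for illustration:

```python
import socket

# Hypothetical topology: the position of a hostname in the list is its node number.
TOPOLOGY = ["transfer04", "transfer05", "transfer06"]


def node_index(hostname=None, topology=TOPOLOGY):
    """Return the node number of a host, i.e. its position in the topology list."""
    if hostname is None:
        hostname = socket.gethostname()
    short = hostname.split(".")[0]  # ignore any domain suffix
    return topology.index(short)    # raises ValueError if the host is not listed
```

With this layout, adding or removing a hostname shifts the numbers of every host after it, which is the rebinding wrinkle.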
the other wrinkle is... if you add/remove a node, you may need to redo all the bindings... so then #35 becomes a dependency.
fwiw... it seems like auditing the flow of file operations is much simpler and more straightforward than implementing all this. We could figure it out today and make hand-crafted configurations for each subscriber, but they would be quite painful to modify. Having sr3 make the necessary calculations & linkages would be an easier approach for the analysts, but it's still a long haul. All the implementation will do is serialize the transfers to preserve ordering, which, in the vast majority of cases, is not needed. It will reduce performance, but how much is unclear. An audit that identifies files with potential access race conditions will preserve parallelism and maximize performance... but it requires more analysis from deployment/developers.
After some more auditing... maybe a tweak to the existing mdelaylatest plugin to suppress unlinks when the timing is suspicious... have to think about it.
so... reviewed how post_exchangeSplit works... and it's not helpful for this problem. I am changing it (both C and Python). The old/current code splits based on the checksum/integrity value. Will switch to picking based on a Will use that to fix #1322 as well.
With an update to an ordinary file, each new content replaces the old, but sometimes there is a sequence of events where order is necessary. In that case, dispatch the earlier event immediately rather than discarding it.
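A rough sketch of that behaviour, not the actual plugin code. The class `DelayLatest`, its methods, and the `ordered` flag are all hypothetical; the caller is assumed to set `ordered=True` for events whose sequence matters:

```python
import time


class DelayLatest:
    """Hold the latest event per path for `delay` seconds before releasing it."""

    def __init__(self, delay=30.0):
        self.delay = delay
        self.pending = {}  # path -> (arrival_time, event)

    def add(self, path, event, ordered=False, now=None):
        """Queue an event; return any events that must be dispatched right away."""
        now = time.time() if now is None else now
        released = []
        earlier = self.pending.pop(path, None)
        if earlier is not None and ordered:
            # Order matters: send the earlier event now instead of dropping it.
            released.append(earlier[1])
        # For an ordinary update the earlier event is simply superseded.
        self.pending[path] = (now, event)
        return released

    def due(self, now=None):
        """Return and remove the events that have waited at least `delay` seconds."""
        now = time.time() if now is None else now
        ready = [p for p, (t, _) in self.pending.items() if now - t >= self.delay]
        return [self.pending.pop(p)[1] for p in ready]
```

Two plain writes to the same path collapse into one pending event, while a write followed by an order-sensitive event releases the write first and then delays the later event as usual.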
OK... in the test case for hpc-mirroring with the linux kernel, encountered this:
Mirroring delays everything by 30 seconds to allow for quiescence, so the tmp_file being linked to is gone by the time the link is actioned on the destination side. With the hardlink logic we had (a copy of the symlink logic) the link would just fail. In the case above... we end up with three copies of the data, instead of three links to 1 file. The fall-through logic solves the "problem" in that git works after... but this subtle difference...
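The fall-through can be illustrated with a small sketch. This is not the sr3 code: `link_or_fetch` is a hypothetical helper, and `fetch` stands in for whatever normally transfers the file content:

```python
import os


def link_or_fetch(target, link_path, fetch):
    """Try to hard-link link_path to target; if that fails, fetch the data instead.

    Returns "linked" or "copied" so the caller can audit which path was taken.
    """
    try:
        os.link(target, link_path)
        return "linked"
    except OSError:
        # The target (e.g. a temporary file) is already gone, so fall through
        # to a normal transfer. The result is an independent copy of the data,
        # not another name for a shared inode.
        fetch(link_path)
        return "copied"
```

When the temporary target has vanished, every link request ends up as a separate file, which is how three links become three copies.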
note... in the above case... the temporary file has a different name from the links created to it, so the idea of tracking operations on the same path by assigning them to the same node... fails...
See the corresponding C implementation issue for background: MetPX/sarrac#174
Things that need to be done to robustly support that on the Python side:
If we want to implement exchangeSplit properly, then as pointed out here: MetPX/sarrac#174 (comment)
we need to establish instances over an entire cluster, not just a single node, in order to get singleton processing working properly.
e.g... a winnow publishes to 20 exchanges... we have 4 consuming nodes, each with 5 instances... we would want the bindings to look something like:
x1 -> n1i1, x2 -> n1i2, ... x5 -> n1i5, x6 -> n2i1, ... x10 -> n2i5, x11 -> n3i1, ... x15 -> n3i5, ...
or another mapping would be to have subsets of instances on each node...
x1 -> n1i1, ... x6 -> n2i6, x11 -> n3i11, ...
Would have to do the math one way or another, and create the exchanges, queues and bindings.
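The math for both mappings could look something like the sketch below. The function name `exchange_bindings` and the `xN`/`nNiN` labels are illustrative only, and it assumes exactly one exchange per instance:

```python
def exchange_bindings(n_nodes, per_node, global_numbering=False):
    """Map exchange labels to node/instance labels, one exchange per instance.

    global_numbering=False: instance numbers restart at 1 on each node.
    global_numbering=True:  instance numbers are unique across the cluster.
    """
    bindings = {}
    for x in range(1, n_nodes * per_node + 1):
        node = (x - 1) // per_node + 1
        inst = x if global_numbering else (x - 1) % per_node + 1
        bindings[f"x{x}"] = f"n{node}i{inst}"
    return bindings


if __name__ == "__main__":
    for exchange, instance in exchange_bindings(4, 5).items():
        print(exchange, "->", instance)
```

Creating the actual exchanges, queues and bindings on the broker would still be a separate step on top of this calculation.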