Storage prefixes always #954
base: master
Branch updated: 159a409 to 2719728
return KATANA_CHECKED(katana::PropertyGraph::MakeEphemeral(
    TopologyFromCSR(edge_indices, edge_destinations)));
A reasonable usage of this Python function is to call it and then call write on the result to store it to a specific location. Will that work correctly? It relies on writing the RDG to a different location from where it started (in the tmp dir).
I thought a lot about that use case when I was writing this, so it should work correctly. But I need to check whether we are currently testing it.
I'm pretty sure I test it from Python.
Could you point me in the right direction to short-circuit my search a little?
katana/python/test/test_convert_graph.py, line 227 in 95ad4cf:
    @pytest.mark.required_env("KATANA_TEST_DATASETS")
Through the power of searching for write, then looking at the tests for importing.
~EphemeralStoragePrefix() {
  std::vector<std::string> files;
  auto list_future = FileListAsync(prefix_.path(), &files);
  if (!list_future.valid()) {
    KATANA_LOG_WARN(
        "unable to list files, not cleaning up ephemeral storage");
    return;
  }

  auto list_future_res = list_future.get();
  if (!list_future_res) {
    KATANA_LOG_WARN(
        "unable to list files, not cleaning up ephemeral storage: {}",
        list_future_res.error());
    // The listing failed, so skip deletion, as the warning says.
    return;
  }

  std::unordered_set deletable_files(files.begin(), files.end());
  auto delete_res = FileDelete(prefix_.path(), deletable_files);
  if (!delete_res) {
    KATANA_LOG_WARN(
        "unable to delete files, not cleaning up ephemeral storage: {}",
        delete_res.error());
  }
}
I just realized something you may not be aware of. On Linux/Unix, you can create, open, and immediately delete a file. The file disappears for anyone listing the directory, but it still exists as long as it is open. You can use that for temp files that you don't need to close and reopen. It avoids the need to explicitly clean up the files.
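A minimal POSIX sketch of that create/open/unlink trick, assuming Linux and a writable /tmp. MakeUnlinkedTempFile, NameExists, and DemoRoundTrip are hypothetical helpers for illustration, not katana or tsuba code.

```cpp
#include <cassert>
#include <stdlib.h>     // mkstemp (POSIX)
#include <string>
#include <sys/stat.h>   // stat, fstat
#include <sys/types.h>  // ssize_t
#include <unistd.h>     // unlink, read, write, lseek, close
#include <vector>

// Hypothetical helper: creates a uniquely named file under `dir`, then
// immediately unlinks it. The returned descriptor keeps the data alive; the
// kernel frees it when the last descriptor is closed (including on crash).
// Returns -1 on failure.
int MakeUnlinkedTempFile(const std::string& dir, std::string* name_out) {
  std::string tmpl = dir + "/ephemeral-XXXXXX";
  // mkstemp rewrites the trailing XXXXXX in place, so it needs a mutable,
  // NUL-terminated buffer.
  std::vector<char> buf(tmpl.begin(), tmpl.end());
  buf.push_back('\0');
  int fd = mkstemp(buf.data());
  if (fd < 0) {
    return -1;
  }
  if (name_out != nullptr) {
    *name_out = buf.data();
  }
  // Remove the directory entry; the open descriptor still refers to the file.
  if (unlink(buf.data()) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Returns true if `path` currently has a directory entry.
bool NameExists(const std::string& path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0;
}

// Usage: write through the unlinked file, read it back, and confirm that it
// has no remaining directory entries (st_nlink == 0) while still open.
bool DemoRoundTrip(const std::string& dir) {
  std::string name;
  int fd = MakeUnlinkedTempFile(dir, &name);
  if (fd < 0) return false;
  bool ok = !NameExists(name);  // invisible to directory listings
  const std::string msg = "scratch data";
  ok = ok && write(fd, msg.data(), msg.size()) ==
                 static_cast<ssize_t>(msg.size());
  ok = ok && lseek(fd, 0, SEEK_SET) == 0;
  std::string back(msg.size(), '\0');
  ok = ok && read(fd, &back[0], back.size()) ==
                 static_cast<ssize_t>(back.size());
  ok = ok && back == msg;
  struct stat st;
  ok = ok && fstat(fd, &st) == 0 && st.st_nlink == 0;
  close(fd);  // last reference gone: the kernel reclaims the storage
  return ok;
}
```

The trade-off is that such a file has no name, so nothing that addresses storage by path (such as a storage prefix handed to a graph) can reach it.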
The contract here is that PropertyGraph has a storage prefix that it is free to do whatever it wants with. So I don't think we will be able to use those sorts of temporary files without invasive changes to the storage layer.
OK. We might need to set up a signal handler to do this work, then, since destructors don't run if the program crashes. But for now we should just document that our handling of temporary files is not complete and that we will leak temp files in a number of cases. One such case is Python interpreter exit with a live graph reference in the global scope. Python finalizers are not guaranteed to be called in that case, so C++ destructors may not be either. This could very easily happen when someone restarts their Jupyter "kernel" (Python interpreter). The result could be a new set of temp files every time they restart the kernel.
I have two thoughts:
That is true. One downside is that in that model we don't nuke the ephemeral location unless we die, so we could potentially build up a lot of cruft there.
Defines a system-wide policy for choosing a temporary directory.
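One way such a policy could look, as a minimal C++17 sketch: an explicit override wins; otherwise std::filesystem::temp_directory_path() chooses the directory (on POSIX it typically honors TMPDIR before falling back to /tmp). The override variable KATANA_EPHEMERAL_DIR and the function ChooseTempDirectory are hypothetical; the policy this commit actually adds may differ.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical policy: a non-empty KATANA_EPHEMERAL_DIR (assumed name) wins;
// otherwise defer to the standard library's notion of the temp directory.
// temp_directory_path() throws if the directory it picks does not exist.
fs::path ChooseTempDirectory() {
  if (const char* override_dir = std::getenv("KATANA_EPHEMERAL_DIR");
      override_dir != nullptr && *override_dir != '\0') {
    return fs::path(override_dir);
  }
  return fs::temp_directory_path();
}
```

Keeping the decision in one function means every ephemeral prefix in the process agrees on a root, which is also what a later global cleanup would need to know.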
Branch updated: 2719728 to 056f20d
Add methods to URI to check if one URI is a prefix of another.
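A sketch of what a component-aware prefix check might do. The point is that a plain string prefix test is not enough: "s3://bucket/data" is a string prefix of "s3://bucket/database" but does not contain it. IsPrefixOf is a hypothetical free function operating on strings, not the URI method added here, and it assumes both inputs are already normalized (no "..", no repeated separators).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `prefix` names `uri` itself or a directory
// that contains it, comparing whole '/'-separated components.
bool IsPrefixOf(std::string prefix, const std::string& uri) {
  // Ignore trailing separators on the prefix ("a/b/" behaves like "a/b";
  // "/" becomes "", which matches any absolute path).
  while (!prefix.empty() && prefix.back() == '/') {
    prefix.pop_back();
  }
  if (uri.compare(0, prefix.size(), prefix) != 0) {
    return false;  // differs within the first prefix.size() characters
  }
  // Equal strings, or the next character starts a new path component.
  return uri.size() == prefix.size() || uri[prefix.size()] == '/';
}
```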
A utility class that wraps a storage prefix and deletes all files under that prefix when it is destroyed.
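A local-filesystem sketch of that RAII idea, assuming C++17 std::filesystem. The class in this PR goes through the storage layer (FileListAsync/FileDelete) so that it also handles remote prefixes; ScopedStoragePrefix is a hypothetical name used only here.

```cpp
#include <cassert>
#include <filesystem>
#include <iostream>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical RAII wrapper: owns a directory and deletes everything under it
// when destroyed.
class ScopedStoragePrefix {
public:
  explicit ScopedStoragePrefix(fs::path path) : path_(std::move(path)) {
    fs::create_directories(path_);
  }

  ~ScopedStoragePrefix() {
    // Destructors must not throw: use the error_code overload and only log.
    std::error_code ec;
    fs::remove_all(path_, ec);
    if (ec) {
      std::cerr << "unable to clean up " << path_ << ": " << ec.message()
                << "\n";
    }
  }

  // Non-copyable, so two objects never delete the same prefix.
  ScopedStoragePrefix(const ScopedStoragePrefix&) = delete;
  ScopedStoragePrefix& operator=(const ScopedStoragePrefix&) = delete;

  const fs::path& path() const { return path_; }

private:
  fs::path path_;
};
```

Like the destructor in the PR, cleanup failures are logged rather than propagated, and nothing here runs if the process is killed.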
RDG and PropertyGraph both now provide a MakeEphemeral(), which creates a graph that is backed by an ephemeral storage location and approximates an in-memory graph.
Some instances of this behavior can be replaced with MakeEphemeral() and some can be replaced with a call that provides a storage prefix.
Branch updated: 056f20d to 9163154
If that's a concern, we could choose random subdirectories of that prefix, each holding the parts of one property graph, and automatically remove that subdirectory when the property graph goes away (leaving the global cleanup to library unload so that we don't have to know about all the live ephemeral objects when we die). But the check in libtsuba would be just as simple.
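A sketch of the per-graph half of that hybrid, assuming a local filesystem and C++17: each graph gets a randomly named subdirectory under a shared ephemeral root, removes only its own subdirectory when it goes away, and a global cleanup can still remove the root as a whole. MakeGraphSubdir and the "graph-" naming scheme are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <random>
#include <stdexcept>

namespace fs = std::filesystem;

// Hypothetical helper: creates root/graph-<16 random hex digits> and returns it.
fs::path MakeGraphSubdir(const fs::path& root) {
  fs::create_directories(root);  // shared ephemeral root; no-op if present
  std::random_device rd;
  std::mt19937_64 gen(rd());
  for (int attempt = 0; attempt < 16; ++attempt) {
    char name[32];
    std::snprintf(name, sizeof(name), "graph-%016llx",
                  static_cast<unsigned long long>(gen()));
    fs::path dir = root / name;
    // create_directory returns false if the directory already exists, so a
    // (very unlikely) name collision just triggers another attempt.
    if (fs::create_directory(dir)) {
      return dir;
    }
  }
  throw std::runtime_error("could not create a unique graph subdirectory");
}
```

Per-graph removal is then a remove_all on the returned path, and the process-wide cleanup is a remove_all on the root.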
This part confused me. Callers of
This is actually the current state of this PR (sort of). I don't know if this is an actual concern, but since I had already written some code to manage per-graph locations, I implemented exactly the hybrid you described. I didn't actually write the signal handler to do cleanup, but I can tack that onto this PR before I merge.
I don't think it is impossible that we would get this complaint. But I don't feel all that strongly, and it is definitely a cleaner interface to have all the
Because we need to be able to return ephemeral graphs to the remote API as handles, and we cannot guarantee that the workers will still be running when the next request related to the graph comes in. So the ephemeral graphs need to be persisted for the client session or something like that.
There is no way to keep a So the proper (but maybe a little snotty) answer to your question is that there is no such thing as an in-memory
So is the idea with
My preference would be to drop Ephemeral from the name of the factory function and just promote the Make variant without a path to be what (I think) everyone expected it was anyway.
/// Make a property graph from topology and associate it with an ephemeral
/// storage prefix. This approximates an in-memory graph.
static Result<std::unique_ptr<PropertyGraph>> MakeEphemeral(
I support Tyler's suggestion of keeping the name Make.
All good from my side.
Yes. In particular
I propose (via PR) complicating both RDG and PropertyGraph. They can both now optionally be constructed as "ephemeral". An ephemeral graph is backed by a storage location that will be deleted when the in-memory graph goes out of scope. It cannot be committed to that location, but it can be written to another location. An existing graph cannot be made ephemeral, but an ephemeral graph can be made non-ephemeral by writing it to a new location. This functionality is used in three primary places:
- the katana.local.Graph Python class
- PropertyGraphs on the fly as intermediate state
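A toy model of the lifecycle rules above, assuming a local filesystem: an ephemeral graph deletes its backing directory when destroyed, refuses to commit in place, and becomes non-ephemeral once written to a new location. ToyGraph and all of its methods are hypothetical and only mimic the rules; they are not the RDG or PropertyGraph API. Deleting the old scratch directory inside Write() is a choice made for this sketch, so that it is not leaked once the graph stops owning it.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <memory>
#include <stdexcept>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

class ToyGraph {
public:
  // Ephemeral: backed by a scratch directory this object owns.
  static std::unique_ptr<ToyGraph> MakeEphemeral(const fs::path& scratch) {
    fs::create_directories(scratch);
    return std::unique_ptr<ToyGraph>(new ToyGraph(scratch, true));
  }
  // Persistent: backed by a caller-chosen location.
  static std::unique_ptr<ToyGraph> Make(const fs::path& prefix) {
    fs::create_directories(prefix);
    return std::unique_ptr<ToyGraph>(new ToyGraph(prefix, false));
  }

  ~ToyGraph() {
    if (ephemeral_) {
      std::error_code ec;
      fs::remove_all(prefix_, ec);  // best effort; never throw here
    }
  }

  // Committing in place is only allowed for non-ephemeral graphs.
  void Commit() {
    if (ephemeral_) {
      throw std::logic_error("cannot commit an ephemeral graph in place");
    }
    Save(prefix_);
  }

  // Writing to a new location works for any graph; afterwards the graph is
  // backed by that location and is no longer ephemeral.
  void Write(const fs::path& new_prefix) {
    if (ephemeral_ && new_prefix == prefix_) {
      throw std::logic_error("write an ephemeral graph to a new location");
    }
    fs::create_directories(new_prefix);
    Save(new_prefix);
    if (ephemeral_) {
      std::error_code ec;
      fs::remove_all(prefix_, ec);  // sketch choice: drop old scratch now
    }
    prefix_ = new_prefix;
    ephemeral_ = false;
  }

  bool ephemeral() const { return ephemeral_; }
  const fs::path& prefix() const { return prefix_; }

private:
  ToyGraph(fs::path prefix, bool ephemeral)
      : prefix_(std::move(prefix)), ephemeral_(ephemeral) {}

  // Stand-in for serializing the graph's files.
  void Save(const fs::path& dir) const {
    std::ofstream out(dir / "topology");
    out << "toy";
  }

  fs::path prefix_;
  bool ephemeral_;
};
```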