[Storage] Garbage collector #250

Closed
odesenfans wants to merge 2 commits from the od-garbage-collection branch

odesenfans (Contributor)

Added a new process in charge of deleting files from local storage.
Files can now be marked for deletion by being listed in
the `scheduled_deletions` collection. The garbage collector process
periodically scans this collection and deletes every file
whose `delete_by` datetime field is in the past.
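A minimal sketch of one garbage collection pass, assuming plain dicts stand in for the `scheduled_deletions` collection (file hash mapped to `delete_by`) and for local storage (file hash mapped to content). The function names `collect_garbage` and `garbage_collector` and the one-hour polling interval are hypothetical, not taken from this PR.

```python
import asyncio
from datetime import datetime, timezone
from typing import Dict, List, Optional


def collect_garbage(
    scheduled_deletions: Dict[str, datetime],
    local_storage: Dict[str, bytes],
    now: Optional[datetime] = None,
) -> List[str]:
    """Run one GC pass; return the hashes of the files that were deleted."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Select every entry whose delete_by datetime is in the past.
    expired = [
        file_hash
        for file_hash, delete_by in scheduled_deletions.items()
        if delete_by < now
    ]
    for file_hash in expired:
        # pop() with a default tolerates files that were already removed.
        local_storage.pop(file_hash, None)
        # Drop the entry so the file is not processed again.
        del scheduled_deletions[file_hash]
    return expired


async def garbage_collector(
    scheduled_deletions: Dict[str, datetime],
    local_storage: Dict[str, bytes],
    period_seconds: float = 3600.0,  # hypothetical polling interval
) -> None:
    """Long-running process: run a GC pass, then sleep, forever."""
    while True:
        collect_garbage(scheduled_deletions, local_storage)
        await asyncio.sleep(period_seconds)
```

Keeping the pass (`collect_garbage`) separate from the loop (`garbage_collector`) makes the deletion logic testable without waiting on the timer.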

Files are now automatically marked for deletion when a user uploads
them through the `/storage/add_json` and `/storage/add_file` endpoints.
The deletion is cancelled if a user creates a message that uses this
content within a grace period (one hour by default).
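The scheduling and cancellation steps can be sketched as two small functions, again assuming a dict that maps each file hash to its `delete_by` datetime. `schedule_deletion`, `cancel_deletion` and `DEFAULT_GRACE_PERIOD` are hypothetical names; only the one-hour default comes from the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical constant matching the "one hour by default" grace period.
DEFAULT_GRACE_PERIOD = timedelta(hours=1)


def schedule_deletion(
    scheduled_deletions: Dict[str, datetime],
    file_hash: str,
    now: Optional[datetime] = None,
    grace_period: timedelta = DEFAULT_GRACE_PERIOD,
) -> datetime:
    """Would be called after /storage/add_json or /storage/add_file stores a file."""
    if now is None:
        now = datetime.now(timezone.utc)
    delete_by = now + grace_period
    scheduled_deletions[file_hash] = delete_by
    return delete_by


def cancel_deletion(scheduled_deletions: Dict[str, datetime], file_hash: str) -> bool:
    """Would be called when a message using file_hash is processed.

    Returns True if a pending deletion was removed.
    """
    return scheduled_deletions.pop(file_hash, None) is not None
```

If no message arrives before `delete_by`, the entry stays in place and the garbage collector removes the file on its next pass.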

Added a migration script that goes through all the files currently
stored on a CCN and schedules for deletion every file that is not
referenced by an Aleph message.
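The core of such a migration is a set difference between stored files and files referenced by messages. A minimal sketch, assuming the referenced hashes have already been collected into a set; the function name and the one-hour grace period given to orphaned files are assumptions, not details stated in this PR.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional, Set


def schedule_orphans_for_deletion(
    stored_files: Iterable[str],
    message_file_hashes: Set[str],
    scheduled_deletions: Dict[str, datetime],
    now: Optional[datetime] = None,
    grace_period: timedelta = timedelta(hours=1),  # assumed, not from the PR
) -> List[str]:
    """Schedule every stored file that no Aleph message references."""
    if now is None:
        now = datetime.now(timezone.utc)
    orphans = [h for h in stored_files if h not in message_file_hashes]
    for file_hash in orphans:
        # setdefault keeps an existing (possibly earlier) deadline intact.
        scheduled_deletions.setdefault(file_hash, now + grace_period)
    return orphans
```

Scheduling rather than deleting immediately leaves a window to inspect the result before any file is actually removed.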

odesenfans requested a review from hoh on May 5, 2022 19:42
odesenfans (Contributor Author) commented on May 5, 2022

To discuss:

  • Ideally, we should have integration tests.
  • Do we want to use the same mechanism for the garbage collection linked to FORGET messages? (If so, this can be a separate PR.)
  • We need to analyze the results of the migration script on a fully synchronized node.
  • Documentation.
  • With this mechanism, we can change how data is fetched across the network. At the moment, `storage.get_hash_content` takes a `store_value` boolean parameter that determines whether the file is stored in local storage. With the garbage collector, we can replace this parameter with `store_permanently`. This way, if a user requests a file through the API, any given node can cache it for an hour instead of not storing it at all, which could reduce the load on the network. Note that this PR assumes that every call where `store_value` is true originates from the processing of an Aleph message, so these files do not need to be scheduled for deletion.
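The proposed `store_permanently` behavior could look roughly like the sketch below. It is a hypothetical, synchronous `get_content` function and does not reproduce the real `storage.get_hash_content` signature; `fetch_from_network` and `CACHE_DURATION` are likewise assumed names, and the dicts stand in for local storage and the `scheduled_deletions` collection.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional

CACHE_DURATION = timedelta(hours=1)  # hypothetical: "cache it for an hour"


def get_content(
    file_hash: str,
    local_storage: Dict[str, bytes],
    scheduled_deletions: Dict[str, datetime],
    fetch_from_network: Callable[[str], bytes],
    store_permanently: bool,
    now: Optional[datetime] = None,
) -> bytes:
    if now is None:
        now = datetime.now(timezone.utc)
    content = local_storage.get(file_hash)
    if content is None:
        content = fetch_from_network(file_hash)
        # Always store the fetched file locally...
        local_storage[file_hash] = content
        if not store_permanently:
            # ...but only as a temporary cache entry that the GC will remove.
            scheduled_deletions[file_hash] = now + CACHE_DURATION
    if store_permanently:
        # A message depends on this file: cancel any pending deletion.
        scheduled_deletions.pop(file_hash, None)
    return content
```

Under this design, an API request that happens to precede the corresponding message still benefits: the later message processing finds the file locally and simply cancels its deletion.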

odesenfans force-pushed the od-garbage-collection branch 7 times, most recently from 5c3c693 to 7865478 on May 10, 2022 17:29
odesenfans force-pushed the od-garbage-collection branch from 7865478 to 455394e on May 10, 2022 17:37
New tests:
* Tests based on the API, checking that files added with
  the /storage/add_* endpoints are scheduled for deletion.
* A test to check that a STORE message will dequeue the deletion
  of the underlying file.
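The two assertions these tests make can be illustrated with a self-contained, in-memory stand-in. Unlike the real tests, this sketch does not go through the HTTP API; `InMemoryNode`, its methods and the SHA-256 hashing are hypothetical choices made only to keep the example runnable.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Dict


class InMemoryNode:
    """Hypothetical stand-in for a CCN: local storage plus scheduled deletions."""

    def __init__(self, grace_period: timedelta = timedelta(hours=1)) -> None:
        self.local_storage: Dict[str, bytes] = {}
        self.scheduled_deletions: Dict[str, datetime] = {}
        self.grace_period = grace_period

    def add_file(self, content: bytes, now: datetime) -> str:
        # Mimics /storage/add_file: store the file and schedule its deletion.
        file_hash = hashlib.sha256(content).hexdigest()
        self.local_storage[file_hash] = content
        self.scheduled_deletions[file_hash] = now + self.grace_period
        return file_hash

    def process_store_message(self, file_hash: str) -> None:
        # A STORE message referencing the file dequeues its deletion.
        self.scheduled_deletions.pop(file_hash, None)


def test_add_file_schedules_deletion() -> None:
    node = InMemoryNode()
    now = datetime(2022, 5, 10, 12, 0, tzinfo=timezone.utc)
    file_hash = node.add_file(b"hello", now)
    assert node.scheduled_deletions[file_hash] == now + timedelta(hours=1)


def test_store_message_dequeues_deletion() -> None:
    node = InMemoryNode()
    now = datetime(2022, 5, 10, 12, 0, tzinfo=timezone.utc)
    file_hash = node.add_file(b"hello", now)
    node.process_store_message(file_hash)
    # The deletion is cancelled, but the file itself stays in local storage.
    assert file_hash not in node.scheduled_deletions
    assert node.local_storage[file_hash] == b"hello"
```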
odesenfans (Contributor Author) commented:

Replaced by #269, as running a GC automatically on the main file collection might be a bit dangerous for user data.

odesenfans closed this on May 15, 2022