Use a memory limited hashset with LocalVocab
#1570
base: master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1570      +/-   ##
==========================================
- Coverage   89.21%   89.20%   -0.01%
==========================================
  Files         372      373       +1
  Lines       34723    34789      +66
  Branches     3915     3919       +4
==========================================
+ Hits        30979    31035      +56
- Misses       2471     2480       +9
- Partials     1273     1274       +1
A first round of reviews.
src/util/HashSet.h
Outdated
@@ -32,4 +37,88 @@ template <class T,
using HashSetWithMemoryLimit =
    absl::flat_hash_set<T, HashFct, EqualElem, Alloc>;

template <class T,
The hash set has a slot array that may hold more slots than elements, and it may rehash, roughly doubling its size.
Please also take this into account. It requires some more interaction with the absl code, but the documentation is good and the interface is all there.
I have added the logic to track the slot sizes. But I do not yet know how to deal with the size doubling.
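One possible way to handle the growth without knowing the growth factor, as a hedged sketch: re-read the slot count after every insert and account for whatever it is now. The example uses `std::unordered_set` and its `bucket_count()` purely as a stand-in so that it compiles without absl; `SlotTrackingSet`, `trackedSlots()`, and the one-pointer-per-slot cost are assumptions for illustration, not the PR's code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_set>

// Hypothetical wrapper; std::unordered_set stands in for the absl hash set.
class SlotTrackingSet {
 public:
  explicit SlotTrackingSet(std::size_t limitBytes) : limitBytes_(limitBytes) {}

  // Insert `value`, then re-sync the slot accounting with the container.
  // Throws if the slot array alone exceeds the limit.
  void insert(int value) {
    set_.insert(value);
    // A rehash shows up as a changed slot count. We do not predict the
    // growth factor, we only observe the count after the insert.
    trackedSlots_ = set_.bucket_count();
    if (slotBytes() > limitBytes_) {
      throw std::runtime_error("slot array exceeds the memory limit");
    }
  }

  std::size_t trackedSlots() const { return trackedSlots_; }

  // Assumption: each slot costs one pointer. The real per-slot cost depends
  // on the container implementation.
  std::size_t slotBytes() const { return trackedSlots_ * sizeof(void*); }

 private:
  std::unordered_set<int> set_;
  std::size_t limitBytes_;
  std::size_t trackedSlots_ = 0;
};
```

Note that this sketch detects the growth only after it has happened, so the limit can be exceeded briefly during the rehash. Enforcing the limit strictly would require a check before the container grows, for example in the allocator.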
@@ -90,5 +91,7 @@ class Literal {
  static Literal literalWithoutQuotes(
      std::string_view rdfContentWithoutQuotes,
      std::optional<std::variant<Iri, std::string>> descriptor = std::nullopt);

  size_t getDynamicMemoryUsage() const { return content_.capacity(); }
Actually, you can figure out whether the small buffer optimization applies
(basically, check if `&content <= content.data() < (&content + sizeof(content))`, with the addresses compared as byte pointers),
but again, this is not super important.
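A minimal sketch of that check, assuming the string keeps its short contents inside its own object (which mainstream standard libraries do, but the standard does not require). `usesSmallBuffer` and `dynamicMemoryUsage` are hypothetical helper names.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Returns true if `s.data()` points into the bytes of the string object
// itself, i.e. the contents live in the small buffer and not on the heap.
inline bool usesSmallBuffer(const std::string& s) {
  const char* begin = reinterpret_cast<const char*>(&s);
  const char* end = begin + sizeof(std::string);
  const char* data = s.data();
  // std::less gives a well-defined total order even for pointers into
  // unrelated objects, which plain `<` does not guarantee.
  std::less<const char*> lt;
  return !lt(data, begin) && lt(data, end);
}

// Heap bytes attributed to `s`: zero when the small buffer is in use.
// Like the diff above, this counts `capacity()` and ignores the extra
// byte for the null terminator.
inline std::size_t dynamicMemoryUsage(const std::string& s) {
  return usesSmallBuffer(s) ? 0 : s.capacity();
}
```

For a long string the data always lives outside the object, so `dynamicMemoryUsage` falls back to `capacity()`. Whether a given short string is stored inline depends on the standard library in use.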
A preliminary round of reviews.
src/parser/Iri.h
Outdated
// memory usage as this does not currently take into account small string
// optimization of `std::string`
size_t getDynamicMemoryUsage() const {
  return sizeof(std::string) + iri_.capacity();
We have to make clear whether the hash set expects only the dynamic part of the memory usage, without the sizeof() part (probably yes, as this is managed differently by a lot of containers).
Removed this from Iri.h and Literal.h. I think the allocator (I am not sure whether I can use the existing AllocatorWithLimit or need to write a new one) could keep track of the static memory, while the SizeGetters just return the dynamic memory usage.
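The split discussed here could look roughly like the following sketch: the size getter reports only the dynamic part, and the `sizeof(T)` part is added in exactly one place by whoever owns the storage. `StringSizeGetter` and `memoryPerElement` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical size getter: reports the dynamic memory only, without the
// sizeof() of the object itself.
struct StringSizeGetter {
  std::size_t operator()(const std::string& s) const { return s.capacity(); }
};

// Hypothetical helper on the container/allocator side: the static part is
// added here, once, so size getters never have to include it.
template <typename T, typename SizeGetter>
std::size_t memoryPerElement(const T& value, SizeGetter getter) {
  return sizeof(T) + getter(value);
}
```

Keeping `sizeof(T)` out of the getters avoids double counting when the container already accounts for its own slots or nodes.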

// Try to allocate the amount of memory requested
void increaseMemoryUsed(ad_utility::MemorySize amount) {
  memoryLeft_.ptr()->wlock()->decrease_if_enough_left_or_throw(amount);
This is a little wasteful, as we always have to acquire a global mutex.
For a follow-up (or preparatory) PR, you could implement a wrapper around the memoryLeftThreadsafe object that stores a small (single-threaded) pool of memory and only goes to the global wlock() when that pool is exhausted.
Currently, increase(1), increase(1), increase(1) needs a global synchronization for each of the three inserts, which seems wasteful.
(But as you have abstracted away all the memory handling, that is easy to integrate.)
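A rough sketch of such a wrapper, under stated assumptions: `GlobalMemoryLeft` is a simplified stand-in for the synchronized memory counter (a plain `std::mutex` instead of the project's `wlock()`), and `LocalMemoryPool`, its chunk size, and the lock counter are hypothetical and exist only to make the effect visible.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <stdexcept>

// Simplified stand-in for the globally synchronized memory counter.
class GlobalMemoryLeft {
 public:
  explicit GlobalMemoryLeft(std::size_t bytes) : left_(bytes) {}

  void decreaseIfEnoughLeftOrThrow(std::size_t amount) {
    std::lock_guard<std::mutex> lock(mutex_);
    ++numLockAcquisitions_;
    if (amount > left_) throw std::runtime_error("memory limit exceeded");
    left_ -= amount;
  }

  void increase(std::size_t amount) {
    std::lock_guard<std::mutex> lock(mutex_);
    ++numLockAcquisitions_;
    left_ += amount;
  }

  // Inspection helpers for the example; they do not count as acquisitions.
  std::size_t left() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return left_;
  }
  std::size_t numLockAcquisitions() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return numLockAcquisitions_;
  }

 private:
  mutable std::mutex mutex_;
  std::size_t left_;
  std::size_t numLockAcquisitions_ = 0;
};

// Single-threaded pool: reserves memory from the global counter in chunks
// and serves small requests locally without touching the mutex.
class LocalMemoryPool {
 public:
  LocalMemoryPool(GlobalMemoryLeft& global, std::size_t chunkSize)
      : global_(global), chunkSize_(chunkSize) {}
  LocalMemoryPool(const LocalMemoryPool&) = delete;
  LocalMemoryPool& operator=(const LocalMemoryPool&) = delete;

  // Hand the unused reservation back to the global counter.
  ~LocalMemoryPool() {
    if (reserved_ > 0) global_.increase(reserved_);
  }

  void increaseMemoryUsed(std::size_t amount) {
    if (amount > reserved_) {
      const std::size_t missing = amount - reserved_;
      std::size_t request = std::max(chunkSize_, missing);
      try {
        global_.decreaseIfEnoughLeftOrThrow(request);
      } catch (const std::runtime_error&) {
        // Not enough left for a full chunk: retry with exactly what is
        // needed, so the pool never fails earlier than the global limit.
        request = missing;
        global_.decreaseIfEnoughLeftOrThrow(request);
      }
      reserved_ += request;
    }
    reserved_ -= amount;
  }

  std::size_t reserved() const { return reserved_; }

 private:
  GlobalMemoryLeft& global_;
  std::size_t chunkSize_;
  std::size_t reserved_ = 0;
};
```

With a chunk size of 100, three calls to `increaseMemoryUsed(1)` touch the global mutex once instead of three times. The trade-off is that memory reserved by one pool is unavailable to other threads until that pool is destroyed, so the chunk size should stay small relative to the overall limit.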
7499478 to 1d976dd
…the memory usage instead of `ad_utility::MemorySize`
…ry used by its `AllocationMemoryLeftThreadSafe` object.
…iteralOrIri` objects. Also added comments.
…ncrease and decrease memory usage
…. Change the default template argument of `SizeGetter` from `SizeOfSizeGetter` to `DefaultValueSizeGetter`.
…ables used to count slot size to have same name. Initialize `memoryUsed_` and `currentNumSlots_` in class.
…izeGetter variables.
915afb0 to a73c84a
Conformance check passed ✅ No test result changes.
Quality Gate passed
`CustomHashSetWithMemoryLimit`: a wrapper class around `absl::node_hash_set` that tracks the memory used by the elements of the hash set.
`LocalVocab`: changed to use the wrapper hash set implementation.