perf(index): deduplicate map values #61
Comments
Alternative: use bitflags to keep fast lookup and stack allocation while allowing for multi-typed instances. Note: this is a different example than the one above, chosen to illustrate instances with multiple types. With vecs: {
"types": [
"<http://example.org/Employee>",
"<http://example.org/Organization>",
"<http://example.org/Person>",
"<http://example.org/Researcher>"
],
"map": {
"8836142820109335346": [
0,
2,
3
],
"16856853018305323212": [
1
],
"15184171154146416828": [
0,
2
],
"8501891617301323111": [
0,
3
]
}
} With bitflags: {
"types": [
"<http://example.org/Employee>",
"<http://example.org/Organization>",
"<http://example.org/Person>",
"<http://example.org/Researcher>"
],
"map": {
"8836142820109335346": 13, # == 0b1101
"16856853018305323212": 2, # == 0b0010
"15184171154146416828": 5, # == 0b0101
"8501891617301323111": 9 # == 0b1001
}
} Significantly more compact than the current format...
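To make the mapping between the two formats concrete, here is a minimal sketch of the packing, assuming bit `i` of the mask (counting from the least significant bit) stands for index `i` in `types`. The function names `indices_to_mask` and `mask_to_indices` are hypothetical and not part of the project:

```rust
/// Pack a list of type indices into a bitmask: bit `i` is set when
/// index `i` is present. Assumes every index is below 64 (the width of `u64`).
fn indices_to_mask(indices: &[u8]) -> u64 {
    indices.iter().fold(0u64, |mask, &i| mask | (1u64 << i))
}

/// Unpack a bitmask into the ascending list of type indices it contains.
fn mask_to_indices(mask: u64) -> Vec<u8> {
    (0u8..64).filter(|&i| mask & (1u64 << i) != 0).collect()
}
```

Under this convention `[0, 2]` packs to `0b0101` (5). Setting the same bit twice is a no-op, so a mask cannot hold duplicate entries.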
Using bitflags would limit us to 64 types, and using bitvecs/bitarrays would be too much overhead. Let's stick to SmallVec.
Aren't there infinite bitflags, which compact multiple
Yeah, I think that's what bitvecs/bitarrays are (e.g. https://github.com/ferrilab/bitvec). But it seems like more overhead than just using SmallVecs.
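The 64-type limit comes from the width of a `u64` mask. A small sketch of where it bites, using the standard library's `checked_shl` (the helper name `type_bit` is hypothetical):

```rust
/// Return the single-bit mask for a type index, or `None` when the
/// index does not fit in a `u64` (that is, when index >= 64).
fn type_bit(index: u32) -> Option<u64> {
    1u64.checked_shl(index)
}
```

Indices 0 through 63 each get their own bit. The 65th type (index 64) has no bit left, so it would need a wider integer or a growable bit vector.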
The index currently looks like this:
In each value of `map`, we should store only unique values. For performance reasons, we may want to keep using SmallVec instead of HashSet (on the heap), but then we should add a check on append. The desired structure of the above example would be:
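As for the check on append itself, a minimal sketch could look like the following. It uses a plain `Vec<usize>` as a stand-in for SmallVec so that it needs no external crate, and the helper name `push_unique` is hypothetical. SmallVec dereferences to a slice and has `push`, so the same body should carry over, but that is an assumption about how the index is wired up:

```rust
/// Append `value` only if it is not already present.
/// Returns `true` when the value was inserted, `false` when it was a duplicate.
/// `Vec` stands in for `SmallVec` here (assumption, see the note above).
fn push_unique(values: &mut Vec<usize>, value: usize) -> bool {
    if values.contains(&value) {
        // Linear scan: cheap for the short lists expected per map entry.
        false
    } else {
        values.push(value);
        true
    }
}
```

The linear scan is O(n) per append. That is usually acceptable when each entry holds only a handful of type indices, and it avoids the hashing and heap allocation of a `HashSet`.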