-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement xxhash non-zero seed support #9
Comments
Hi there! 👋 I've been pondering over the What if we introduce a unified function, let's call it Here's a quick breakdown of the idea:
Here’s a rough sketch of what the implementation might look like in Rust: enum HashFunctionType {
XXH32,
XXH64,
XXH3_64,
XXH3_128,
}
fn hash_with_function(hash_function: HashFunctionType, data: &[u8]) -> u128 {
match hash_function {
HashFunctionType::XXH32 => {
let seed: u32 = 0; // Hard-coded seed for XXH32
xxh32::hash_with_seed(data, seed) as u128
},
HashFunctionType::XXH64 => {
let seed: u64 = 0; // Hard-coded seed for XXH64
xxh64::hash_with_seed(data, seed) as u128
},
HashFunctionType::XXH3_64 => {
let seed: u64 = 0; // Hard-coded seed for XXH3_64
xxh3_64::hash_with_seed(data, seed) as u128
},
HashFunctionType::XXH3_128 => {
let seed: u128 = 0; // Hard-coded seed for XXH3_128
xxh3_128::hash_with_seed(data, seed)
},
}
}
// Usage examples
let hash32 = seeded_xx(HashType::XX32, 12345, b"hello world");
let hash64 = seeded_xx(HashType::XX64, 12345, b"hello world"); A few points to consider: I believe this approach keeps things user-friendly while maintaining the flexibility and robustness of our implementation. What do you all think? Looking forward to your feedback and insights! 🌟 Cheers, |
thx @gmottajr for the suggestion. From the implementation perspective, we clearly should unifie the design, just like i did with most other implementation here - e.g. there is a shared code to instantiate hasher, keep track on the current state when using aggregate hasher, and finalize it - all done via From the usage perspective, it would break a lot of common practices, without really gaining us much. It is easier to write |
Hi Yuri, Anyway, I'm eager to contribute to the project in ways that align with its current needs and goals. In case you want to proceed with this change or If there are other areas where you think my skills could be useful, please let me know. I'm open to any suggestions or areas where my efforts could add value. |
@gmottajr thank you, glad to see so much interest in my projects :) As a simple starter, perhaps try maplibre/martin#1093 ? At least the first two items are very easy to do. Let me know if you run into any problems. You can also ping me on OSM-US Slack (see link in any MapLibre's or Martin readme) |
I am not certain this is needed, but if anyone knows more about this, and finds it useful, putting down some thoughts on how it can be done.
The xxhash functions
xxh32, xxh64, xxh3_64, xxh3_128
support, and in case of xxh32 and xxh64 require (in their underlying implementation) to provide a seed value (u32
for xxh32, andu64
for the rest). Currently, the seed value is always set to 0.Some users may need to have a seed value at least for
xxh32, xxh64
for them to match expectations. For that, we could provide additional non-zero-seeded SQL functions:SEEDED_XXH32(seed, value1, value2, ...)
- the first argument is treated as an integer seed value, the rest are processed as with all other scalar hashing functions.NULL
will be returned if seed value isNULL
, regardless of the rest of the values.SEEDED_XXH32_CONCAT(seed, column1, column2, ...)
- aggregated/window function, will only process the seed value on the first row, and ignore it on the subsequent ones. Not sure this is what the users will expect though.The text was updated successfully, but these errors were encountered: