Question about segcache eviction policy #5
Both TTL and eviction will cause items to be removed from the cache. When the cache size is not large enough, items may get evicted, and when you have TTLs set, items will be removed after reaching TTL. In your case, if you have 500s TTL, and your test runs for 1 min, then no items should be removed/expired. I do not understand your question, |
@1a1a11a Thanks for the answer! In my case, I actually found the key still exists in the cache, but no value is associated with that key because the value is set to 0. I guess it's evicted or expired. The TTL is set to 500s, so the reason is the cache size is too small? Where can I check the cache size? |
@luzhang6 How are you reading this key right now? Using the function API or accessing it remotely via RPC? Was it 0 immediately after you set the key, if you try to read your own write right away? Can you maybe present a minimal setup to reproduce this problem? If so we can try to reproduce independently and save some back-and-forth. |
@thinkingfish Thanks! I think we use function API in our setup. We create random keys and values, and put into the cache, then we read the keys from the cache again to validate the values. However, some values are 0s. So we think those keys expired somehow in our test. |
@luzhang6 - if the key is not present in the cache, a miss is returned. You shouldn't get an actual value response. Are you using some client to communicate with the Pelikan Segcache server? Or are you using the "seg" storage library directly? |
It'll return the |
Thank you @brayniac ! I'm also wondering whether there is a mistake on the caller side. Here is the implementation of the caller (it's a Rust-Java binding). Java side code: https://github.com/beinan/data_cache/blob/master/cache_jni/java_lib/src/main/java/alluxio/sdk/file/cache/NativeCacheManager.java And a unit test in Java: https://github.com/beinan/data_cache/blob/master/cache_jni/java_lib/src/test/java/alluxio/sdk/file/cache/NativeCacheManagerTest.java The unit test runs reliably and is not flaky at all, but the microbench fails: https://github.com/Alluxio/alluxio/pull/16448/files#diff-465844ed4d3733bb67ded1023462180095271ba97f3cbd21c196c0427d1eeefa Some of the keys are missing, as @luzhang6 mentioned. Any ideas or suggestions? Thanks a lot! |
@beinan - if you can provide a minimum repro in pure Rust, I can take a look. I'm not going to be able to troubleshoot the bindings. It looks like you're handling the "None" case properly - but not sure if I'm missing some subtle issue in the bindings or if the issue is in calling code. To reduce the debug area, please write some basic Rust program that demonstrates the issue. |
I wonder if this is because of the int and byte conversion. What if the microbench sets a fixed byte of value 255 instead of a random number? I am feeling lazy and don't want to set up a working Java system on my laptop today, but I can try this myself in a few days.
|
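One way to see why a fixed byte of 255 is a useful probe: Java's `byte` is signed, so a byte written as 255 on the Rust side reads back as -1 on the Java side unless it is masked. The sketch below only illustrates that conversion in Rust; the helper names are hypothetical and nothing here is taken from the binding code linked above.

```rust
/// Hypothetical helper: reinterpret a raw byte the way a signed Java `byte` sees it.
fn as_java_byte(raw: u8) -> i8 {
    raw as i8
}

/// Hypothetical helper: recover the unsigned value, equivalent to `b & 0xFF` in Java.
fn to_unsigned(signed: i8) -> i32 {
    (signed as i32) & 0xFF
}

fn main() {
    let stored: u8 = 255;
    let seen = as_java_byte(stored);
    // Widening the signed byte without masking keeps the sign.
    println!(
        "stored={} signed={} widened={} masked={}",
        stored,
        seen,
        seen as i32,
        to_unsigned(seen)
    );
}
```

If the microbench compares a widened signed byte against an unsigned expectation, values of 128 and above would mismatch while smaller values would pass, which is one pattern to look for in the failures.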
I guess the other basic question here would be "what is the expected value"? |
The microbenchmark generates a small random integer as the filler value. I think a hex value that's tied to the pageID or a fixed magic number might be easier for debugging.
-Yao
|
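A minimal sketch of the deterministic-filler idea, assuming a numeric page ID and a byte-buffer value. The function names and the fill pattern are hypothetical and are not part of the microbench. The pattern never produces a zero byte, so a zeroed buffer is easy to tell apart from real data.

```rust
/// Hypothetical filler: each byte depends on the page ID and its offset,
/// and is always in 1..=251, so it can never be confused with a zeroed buffer.
fn filler_for_page(page_id: u64, len: usize) -> Vec<u8> {
    (0..len)
        .map(|i| (page_id.wrapping_add(i as u64) % 251) as u8 + 1)
        .collect()
}

/// Return the offset of the first byte that differs from the expected filler, if any.
fn first_mismatch(page_id: u64, actual: &[u8]) -> Option<usize> {
    let expected = filler_for_page(page_id, actual.len());
    expected.iter().zip(actual).position(|(e, a)| e != a)
}

fn main() {
    let page = filler_for_page(42, 16);
    println!("intact page:  {:?}", first_mismatch(42, &page));
    println!("zeroed page:  {:?}", first_mismatch(42, &[0u8; 16]));
}
```

Because the expected value can be recomputed from the page ID alone, a failing read can be checked after the fact without logging what was originally written.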
Right, but it looks like it prints the expected and returned values on a mismatch. I wonder whether the result is consistent at all.
|
@thinkingfish @brayniac I adjusted the hashpower from 16 to 32 and almost did not see those errors. We roughly have 1 - 10 million keys, what is a reasonable number for the hashpower? Also can you please give more details on what is hashpower and how we should tune it? Thanks! cc @beinan |
@luzhang6 - hashpower controls the number of item slots in the hashtable. If there are too few, items might not be stored, because the hashtable has a fixed size that is set at initialization. You'd get an error Result for the insert if there is no space in the hashtable to insert the item. See the docs here for the meaning of the value: https://github.com/pelikan-io/pelikan/blob/main/src/storage/seg/src/builder.rs#L29 |
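As a rough sizing aid, here is a sketch that assumes the hashtable has 2^N entries for hashpower N, with 1/8 of them reserved for metadata, so the item limit is 7 * 2^(N - 3). That formula is an assumption based on one reading of the builder docs linked above and should be checked against them; the function names are hypothetical and the code does not call the seg crate.

```rust
/// Assumed item limit for a given hashpower: 7 * 2^(N - 3).
/// This formula is an assumption; verify it against the linked builder docs.
fn max_items(hash_power: u8) -> u64 {
    7 * (1u64 << (hash_power - 3))
}

/// Smallest hashpower whose assumed item limit covers `keys` (no headroom added).
fn min_hash_power(keys: u64) -> u8 {
    (3..=60u8)
        .find(|&n| max_items(n) >= keys)
        .expect("key count too large for this sketch")
}

fn main() {
    for n in [16u8, 24, 32] {
        println!("hashpower {:2} -> up to {} items", n, max_items(n));
    }
    for keys in [1_000_000u64, 10_000_000] {
        println!("{} keys -> hashpower >= {}", keys, min_hash_power(keys));
    }
}
```

Under this assumption, a hashpower of 16 holds far fewer than a million items, which would be consistent with the insert failures seen in the thread, while a value in the low-to-mid 20s would cover 1 to 10 million keys. Some headroom above the minimum is probably sensible.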
@luzhang6 Oh! when you get the key in the microbench you don't check the return status, so you won't be able to differentiate a miss (which returns |
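To illustrate the point about checking the return status, here is a sketch that uses a plain `HashMap` as a stand-in for the cache. It is not the seg API, and the `Check` enum and `check` function are hypothetical. The idea is only that a miss is reported separately from a value that is present but wrong.

```rust
use std::collections::HashMap;

/// Hypothetical result type that keeps a miss distinct from a wrong value.
#[derive(Debug, PartialEq)]
enum Check {
    Miss,
    Match,
    Mismatch { expected: Vec<u8>, actual: Vec<u8> },
}

/// Look up `key` in the stand-in cache and classify the outcome.
fn check(cache: &HashMap<Vec<u8>, Vec<u8>>, key: &[u8], expected: &[u8]) -> Check {
    match cache.get(key) {
        None => Check::Miss,
        Some(actual) if actual.as_slice() == expected => Check::Match,
        Some(actual) => Check::Mismatch {
            expected: expected.to_vec(),
            actual: actual.clone(),
        },
    }
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(b"page-1".to_vec(), vec![7u8]);

    println!("{:?}", check(&cache, b"page-1", &[7]));
    println!("{:?}", check(&cache, b"page-1", &[9]));
    println!("{:?}", check(&cache, b"page-2", &[7]));
}
```

Counting misses and mismatches separately in the microbench would show whether the zeros come from items that were never stored or evicted, or from the value path itself.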
@brayniac @thinkingfish Thanks! Will take a look at the reference doc. I see, currently we just check whether the value equals the expected value in our test. |
Hi @thinkingfish @brayniac, I'd like to ask another question, currently we use the segcache in memory, how can we switch to the disk mode when using segcache? |
@luzhang6 - the current support is really focused around using Intel Persistent Memory (PMEM) and can be enabled with the "datapool path" configuration: https://github.com/pelikan-io/pelikan/blob/main/src/storage/seg/src/builder.rs#L138 It's important to note, this is based on using memory mapped file access. If the filesystem is not mounted with DAX option, then access will go through the page cache and there will not be strong benefits and could potentially be significant impacts from colocated processes. I would not currently recommend trying to use this with a standard block device. |
Closing this issue- the question around SSD is a great one and we have plans to improve support for SSD. Will make that into the roadmap when we publish and please let me know if anybody wants to collaborate on it. |
Thank you @thinkingfish ! Is it possible to pull me into the SSD project? I think this would be a very decent replacement for Presto/Trino's built-in cache store, since we're currently suffering from GC issues with the Java implementation. Thanks! |
I just created a Discord server, you are welcome to join! https://discord.gg/EuSQn6TQh7 It's just created so still quiet right now but hopefully will become more active soon. |
Hi @brayniac @thinkingfish @kevyang, I have recently been testing segcache and would like to ask about its eviction policy: are both TTL and segment size factors that can trigger eviction of a cache item?
For example, I set a 500s TTL, and my test takes about 1 minute to run. If I am still getting 0s for a cached key, is that because the cache segment is full and the value was kicked out? Thank you.