Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: bug in segcache cas implementation #68

Merged
merged 10 commits into from
Feb 19, 2023

Conversation

hderms
Copy link
Contributor

@hderms hderms commented Feb 9, 2023

fixes #59

Pretty sure this is the cause of the flakiness because if you set:

diff --git a/src/server/segcache/tests/common.rs b/src/server/segcache/tests/common.rs
index 9f33681..e551a35 100644
--- a/src/server/segcache/tests/common.rs
+++ b/src/server/segcache/tests/common.rs
@@ -235,7 +235,7 @@ fn test(name: &str, data: &[(&str, Option<&str>)]) {
             }
         }

-        std::thread::sleep(Duration::from_millis(10));
+        std::thread::sleep(Duration::from_millis(1000));
         let mut buf = vec![0; 4096];

         if let Some(response) = response {

so you enforce a long wait in between running commands, then the test will fail every time on my machine (x86_64 Ubuntu WSL2).

I think the reason it doesn't fail all the time is due to machine load/architecture performance differences that cause the GET to occur either before or after the relatively immediate expiration of the key. If Pelikan was slower we'd probably have found this bug already.

The test that specifically fails (or at least the one that seems to be the cause of the nondeterministic failures) is here:

test(
"cas stored",
&[
// store the key
("set 4 0 0 1\r\n4\r\n", Some("STORED\r\n")),
// cas with the correct cas value
("cas 4 0 0 1 1\r\n0\r\n", Some("STORED\r\n")),
// check that the value was updated
("get 4\r\n", Some("VALUE 4 0 1\r\n0\r\nEND\r\n")),
],
);

If I run that sequence of commands on memcached, the get command will always succeed, regardless of how much time has passed.

However, when I run those commands against segcache, the key is expired by the time I try to get it (at least that's the likely behavior I see, based on the GET returning nothing)

@swlynch99
Copy link
Contributor

To check this in CI can you revert #58 here as well? It won't give 100% certainty but at least if that test fails then we know the issue is still there.

@hderms
Copy link
Contributor Author

hderms commented Feb 9, 2023

To check this in CI can you revert #58 here as well? It won't give 100% certainty but at least if that test fails then we know the issue is still there.

ok I reverted it as part of this PR

@brayniac
Copy link
Collaborator

brayniac commented Feb 9, 2023

Per discussion on discord... this is actually a little more buggy than it appeared. Will edit the diff

Fix the CAS behavior for negative TTLs and improves the comments in the code for this weird case.
@brayniac brayniac changed the title fix: flakey macOS build caused by CAS operation setting 0 expiration to 1 fix: bug in segcache cas implementation Feb 9, 2023
Copy link
Contributor

@swlynch99 swlynch99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM. I don't know enough to know if this is the best way to do things but it seems good to me.

BTW you still have a rustfmt failure.

@brayniac brayniac merged commit c33ae7d into pelikan-io:main Feb 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segcache integration test is flaky within MacOS CI
4 participants