Larger tests become flaky #114
@bitspittle friendly poke in case your email settings are still swallowing issues ^^ |
Thanks for the poke! I did indeed miss it, and you are welcome to poke like this anytime I don't respond within 48 hours. I probably won't be able to take a close look at this until after my talk on June 7 (well and a few days after, which I'll be spending with friends in the Bay Area), but my intention is to prioritize after that. Worst case there's actually a threading issue in core Kotter itself, which would be worth understanding. I do appreciate you bringing this up; I won't push out 1.1.3 until we either fix this or understand it better. |
Just an FYI, finally back from my talk and getting back into working on stuff again. Will probably be busy today and tomorrow but plan on looking into this flaky issue this week. Thanks for your patience! |
I am able to repro this issue locally. Coincidentally enough, it may be related to the same bug opened by another user around this same time, see also: #116 Hopefully we can figure things out tomorrow. It sure seems like the keyboard input flow from the previous section occasionally doesn't shut down early enough, so when the next section starts up, it briefly sees some of its keys getting swallowed. "Good news" is that this is the sort of thing that in practice would only happen at the speeds of a computer running tests in a tight loop; it would not be expected for any user to notice this themselves (and even if they somehow did, it would be a single character missed at the beginning of what they were typing, which would be easy to retype). |
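For illustration, here is a hedged sketch of the failure shape, written with the `testSession` API that appears later in this thread (the typed strings and import paths are assumptions, not the actual repro):

```
import com.varabyte.kotter.foundation.input.* // assumed path for input()/Keys
import com.varabyte.kotterx.test.foundation.testSession

fun main() = testSession { terminal ->
    // Two back-to-back sections that both read input. If the first section's
    // input flow is still winding down when the second one starts up, the
    // second section's leading keystrokes can get swallowed.
    section { input() }.runUntilInputEntered {
        terminal.type("first")
        terminal.press(Keys.ENTER)
    }
    section { input() }.runUntilInputEntered {
        terminal.type("second") // the leading "s" is what occasionally goes missing
        terminal.press(Keys.ENTER)
    }
}
```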
Been a week since my last reply, but I got hit with a bunch of Kobweb stuff, sorry for yet another delay! Finally had some time to dig today. So looking into this, this is really tricky. The issue fundamentally comes down to faster-than-human speeds and coroutine timing. Basically, when you call …
With the flake you are seeing, what's happening is:
I'm still hoping to fix this in a clean way, but so far my efforts have all failed. I tried tying the "byte to …

If you want a temporary fix, you can do something like this:

```
onInputEntered {
    signal()
    userInput = input
}
+ delay(16) // Give Kotter some time to set up `input` machinery
callback()
```
|
Hello @bitspittle, more specifically, on each of our … Most of the time it ran through, but in one of six runs it got stuck (usually the "Enter" key gets eaten, similar to the strings in the original issue). Do you have an idea on how to continue?
|
Nuts! That's disappointing to hear. You're using 1.2.0? It's been a long time since we had this discussion last. Can you either share your project with me or share exact code so I don't have to piece together code from old comments? |
Yeah, I only started looking into this branch again yesterday. It's kind of a mess. On this branch we are still using …
Sure, branch: https://github.com/MaibornWolff/codecharta/tree/tech/3483/integrate-kotter You need to disable caching in the gradle config and then spam build runs; the error usually occurs rather quickly.
|
So I added

```
onInputEntered {
    /* ... */
}
delay(20) // <--------
onInputReady()
```

and I ran … Maybe you can share your exact repro steps? I guess you're using Gradle? |
Thank you again for looking into it :)
You need to run the tests in the dialog test file, as they have multiple inputs back to back, and those tests are what we are trying to fix.
So step-by-step:
|
I'm getting ktlint failures
|
AFK for a few hours, will follow up later! |
Okay 👍 I will go to sleep now, it's 1 AM here. I will check back tomorrow. I just pushed the formatting changes and the delay as discussed previously. Should be easier to test now |
To reproduce pretty quickly...

```
@Test
fun `run until frozen`() {
    var i = 0
    while (true) {
        println("Test #${i++}...")
        `should prompt user twice for input file when first input file is invalid`()
    }
}
```

Note that this freezes even without any delay. Hmmmmm... |
I'm going to have to stop for a while here to cook dinner, but it looks like the issue may be something I didn't expect. Basically this code:
must sometimes be dropping a key press? Because sometimes it ends up only selecting "Move nodes" and not "Remove nodes" (implying one of the "down" presses was lost). Then, because you didn't register any callback for move nodes, your test hangs. |
Oh wait, there are a BUNCH of … |
Totally unrelated to the bug, here are two optional suggestions which you can ignore:

```
onKeyPressed {
    when (key) {
        Keys.UP ->
            if (selection > 0) {
                selection -= 1
            }
        Keys.DOWN ->
            if (selection < choices.size - 1) {
                selection += 1
            }
        Keys.HOME -> {
            selection = 0
        }
        Keys.END -> {
            selection = choices.size - 1
        }
        Keys.ENTER -> {
            result = choices[selection]
            signal()
        }
    }
}
```

```
// BEFORE
if (selection < choices.size - 1) {
    selection += 1
}

// AFTER
if (selection < choices.lastIndex) {
    selection += 1
}
```
|
BTW I was going to recommend upgrading to 1.2.0, which should have been identical (although some extension method imports had changed, still a pretty trivial fix), but now a test is failing that wasn't in 1.1.3. Until I figure it out, I'd stick with 1.1.3... |
I think the version I pushed eight hours ago included them already. But maybe I missed one... |
I will test that.
I thought the delay should help with the freezes and dropped input? Or am I misunderstanding something here 😲 |
The infinite test doesn't really stop for me (commit d063c96d8554522396af67eb7c61173f5e03453f). (Edit: which is a good thing; with 1000 successful runs, the test would definitely be safe enough.) |
So when I wrote this comment, I actually didn't have …
In my case, once the infinite test ran forever, that should mean all tests will pass. You're not seeing that? |
No, sadly even with a two-minute timeout, the tests sometimes get stuck if I build the complete project. The infinite test is not similar enough to a complete project build. (Edit: It failed after three successful runs.)
|
I think what you may want to do is change all your default callbacks in all methods that use … Your tests will still fail, but they shouldn't hang anymore, and maybe you'll get a more meaningful error message. I'm deep diving into the 1.1.3 vs 1.2.0 bug. It's very hard to understand and debug so far, and I see the different output but I have no idea what the cause is! I'm a little worried 1.2.0 has some issue that got introduced somehow, so I'm prioritizing it at the moment. |
Sorry, I'm not sure if I understand that correctly, can you expand on this? 🤔
No worries, that's very understandable. |
Ah, it's not as straightforward as I first thought. But maybe something like this:

```
// Before
internal fun Session.myCollectParserArgs(
    fileCallback: suspend RunScope.() -> Unit = {},
    actionCallback: suspend RunScope.() -> Unit = {},
    printCallback: suspend RunScope.() -> Unit = {},
    setRootCallback: suspend RunScope.() -> Unit = {},
    moveFromCallback: suspend RunScope.() -> Unit = {},
    moveToCallback: suspend RunScope.() -> Unit = {},
    removeNodesCallback: suspend RunScope.() -> Unit = {},
    outFileCallback: suspend RunScope.() -> Unit = {},
): List<String> {

// After
internal fun Session.myCollectParserArgs(
    defaultCallback: suspend RunScope.() -> Unit = {},
    fileCallback: suspend RunScope.() -> Unit = defaultCallback,
    actionCallback: suspend RunScope.() -> Unit = defaultCallback,
    printCallback: suspend RunScope.() -> Unit = defaultCallback,
    setRootCallback: suspend RunScope.() -> Unit = defaultCallback,
    moveFromCallback: suspend RunScope.() -> Unit = defaultCallback,
    moveToCallback: suspend RunScope.() -> Unit = defaultCallback,
    removeNodesCallback: suspend RunScope.() -> Unit = defaultCallback,
    outFileCallback: suspend RunScope.() -> Unit = defaultCallback,
): List<String> {
```

Then, in test code:

```
session.myCollectParserArgs(
    defaultCallback = { terminal.press(ENTER) },
    /* ... */
)
```

That way, if an unexpected option is ever chosen (due to a dropped keypress), you won't end up on an option that just does nothing. |
I will check back tomorrow, thank you for your time :) |
(FYI you have a debug println in … |
Here's a test I think is a fairly concise way to break input. I usually see a failure within 20-30 tests.

```
@Test
fun `input should not break across multiple sections`() = testSession { terminal ->
    var testRunCount = 1
    while (true) {
        println("Test run #$testRunCount...")
        terminal.clear()
        val numCharsToType = testRunCount % 20
        section {
            input()
        }.runUntilInputEntered {
            // delay(20) // uncomment me and test stops crashing
            terminal.type(testRunCount.toString())
            terminal.type(": ")
            for (i in 0 until numCharsToType) terminal.type('a')
            terminal.press(Keys.ENTER)
        }
        terminal.assertMatches {
            text("${testRunCount}: ${"a".repeat(numCharsToType)} ")
        }
        testRunCount++
    }
}
```
|
Heads up, I'm narrowing down on a fix. I already have something locally that works, but I'm seeing if I can get a better solution in the next hour or two. If not, I'll go back to my first approach. Ideally, once in, your tests should just work without needing any more `delay` workarounds. |
... instead of the section lifecycle. This is unfortunate, but it eliminates a very, very difficult timing bug: the base flow (which converts system input into Kotter `Key` instances) is always running, while the consuming flows are started and stopped per section. Due to factors out of our control, their handlers can take milliseconds to connect, at which point any test that creates multiple sections that all have inputs in them has a chance to fail. Ideally we want to make all the input jobs tied to the Section lifecycle (so if a section doesn't do any work with input, then no resources will be used), but this will still have the tricky timing issues mentioned above, so it may not be trivial. Fixes #114
OK @phanlezz, if you set your kotter version to 1.2.1-SNAPSHOT (and apply the patch from this comment), your tests should start passing, with no `delay` hacks needed. Let me know if you are (or are not) seeing that! If you already set your version to 1.2.1-SNAPSHOT recently, note that it normally takes like 24 hours before Gradle will check again. You should be able to tell by seeing kotter artifacts get downloaded when you run something like … At the moment, I needed to keep the "worse" solution mentioned in the comment above... which is not that terrible, really. Basically, I expanded the lifecycle of all input processing to the whole session. So any section that uses any input functionality will essentially spin stuff up that stays alive for the whole program. This isn't that bad. Still, I feel like I got SO CLOSE with the version which actually keeps the lifecycles of input logic tied to the sections that use them, but there are so many moving parts that I could never come up with something that didn't leave this or that edge case broken. I may try again tomorrow after sleeping on things a bit more. Coroutine flows are awesome but also a nightmare :) |
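As a rough picture of what "expanded to the whole session" means, here is a hedged sketch in plain kotlinx.coroutines (not Kotter's actual internals; `dispatch` is a made-up callback):

```
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Input machinery keyed to the session: spun up lazily on first use, then kept
// alive for the rest of the program instead of being torn down per section.
class SessionInput(private val scope: CoroutineScope, private val keys: Flow<Char>) {
    private var job: Job? = null

    fun ensureRunning(dispatch: (Char) -> Unit) {
        if (job == null) {
            // Started once; later sections reuse the same collector, so there
            // is no per-section startup window for keys to fall into.
            job = scope.launch { keys.collect { dispatch(it) } }
        }
    }
}
```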
Sorry for the late response, I was pretty busy over the weekend/Sunday. |
Oh that sounds good. I think we disabled caching because errors which happen during high-CPU situations would not show up until another person tested it. We did not find a good way to actually rebuild everything, so this is good to know. We might consider re-enabling it now...
Nice catch :) Thank you for the kind words. I just felt bad, because I asked for help, and the branch was not in a directly runnable state.
I will try out the snapshot with the patch and will report back the results!
Thanks for your work! |
Now some testing. I added your test (temporarily), limited to 1k iterations, to …
I tested the snapshot with the patch you provided, but the dialog still got stuck 🤯 Hopefully I missed something and it's an easy fix. I pushed my latest state to the branch so that you can check it out.
|
I looked up a bunch of stuff about structured concurrency and I have some ideas to try today. After sleeping on it, I think what is happening is that test code is moving so fast that we end up requesting a thread from the I/O thread pool before previous tests/sections have finished cleaning theirs up. If correct, this might be an easy fix, but no promises. |
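To make that hypothesis concrete, here is a hedged, standalone illustration in plain kotlinx.coroutines (not Kotter code): `cancel()` returns immediately, so whatever runs next races the old job's cleanup.

```
import kotlinx.coroutines.*

fun main() = runBlocking {
    val previousReader = launch(Dispatchers.IO) {
        try {
            awaitCancellation() // stands in for a blocking input read
        } finally {
            println("previous reader cleaning up...")
        }
    }
    delay(50) // let the reader actually start
    previousReader.cancel() // returns immediately; cleanup happens asynchronously
    println("next section starting (the old reader may still hold its I/O thread)")
    previousReader.join() // only after join() is the old reader guaranteed gone
}
```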
This is unfortunate, but it eliminates a very, very difficult timing bug: the base flow (which converts system input into Kotter `Key` instances) is always running, while the consuming flows are started and stopped per section. Due to factors out of our control, their handlers can take milliseconds to connect, at which point any test that creates multiple sections that all have inputs in them has a chance to fail. Another approach might be to tie `KeyFlowKey` to the Section lifecycle instead, as .... Fixes #114
This now means that if a section doesn't use any input machinery, the user won't pay the price for it. This was a very complicated change, which should be 100% invisible to users. It all exists in service of letting developers write tests that aren't flaky. Before this change, calling `terminal.type` inside your run block had a chance of happening so fast that those keypresses would get dropped because the input machinery wasn't hooked up yet. Now, we have hooks in place that notify us when the input machinery is hooked up, and we block progress in various places to make sure that by the time the user starts typing some stuff, we'll be ready for them, even in tests. Fixes #114
OK @phanlezz try the latest 1.2.1-SNAPSHOT and let me know if that works! Today was insane, a deep dive into hell and back. Local testing on my end seems good so far, but of course feel free to confirm or deny. The current version should be a lot more robust / correct, and I was able to get things working where lifecycles for input logic are now only tied to the sections that use it. However, now that I know a bit more, I think I can do better, so I'm going to keep poking at this a bit tomorrow. |
Sounds awesome! I will start the testing now.
|
Good news: The …

```
Suppressed: java.lang.InterruptedException
    at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:94)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:69)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:48)
    at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
    at com.varabyte.kotter.platform.concurrent.locks.ReentrantReadWriteLock$ReaderLock.lock(ReentrantReadWriteLock.kt:44)
    at com.varabyte.kotter.runtime.concurrent.ConcurrentScopedData.isActive(ConcurrentScopedData.kt:402)
    at com.varabyte.kotter.runtime.Session.assertNoActiveSections$kotter(Session.kt:49)
    at com.varabyte.kotter.foundation.SessionKt.session(Session.kt:83)
    at com.varabyte.kotter.foundation.SessionKt.session$default(Session.kt:67)
    at com.varabyte.kotterx.test.foundation.TestSessionKt.testSession(TestSession.kt:36)
    at com.varabyte.kotterx.test.foundation.TestSessionKt.testSession$default(TestSession.kt:30)
    at de.maibornwolff.codecharta.filter.structuremodifier.ParserDialogTest.should output correct arguments when print structure is selected(ParserDialogTest.kt:29)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
    at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
    ... 77 more
```
|
Before, we tried to let various jobs ask nicely if the upstream flow would wait for them to be ready. However, it turns out it is easier (and more robust) just to block progress until these jobs finish connecting to the flows before continuing. For a concrete example, here is a test that used to fail:

```
section { }.runUntilSignal {
    onKeyPressed {
        if (key == Keys.Q) signal()
    }
    terminal.press(Keys.Q)
}
```

The reason this would fail is that the `terminal.press(Keys.Q)` line would often get called SO QUICKLY and at JUST SUCH A TIME that its key would get processed when the upstream flow was ready but the downstream "key processing job" was not. Now, what happens is `onKeyPressed` simply blocks, imperceptibly to the human eye, so that by the time `terminal.press(Keys.Q)` is called, everything is guaranteed to be hooked up and ready to go. Fixes #114
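A hedged sketch of that blocking idea in plain kotlinx.coroutines (not Kotter's actual internals): complete a latch from the shared flow's `onSubscription` callback, which only fires once the subscriber is registered, and make registration block on it.

```
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Assumes `scope` dispatches to other threads (e.g. Dispatchers.IO), so the
// runBlocking below cannot deadlock the collector it is waiting for.
fun CoroutineScope.attachKeyHandler(keys: MutableSharedFlow<Char>, onKey: (Char) -> Unit): Job {
    val connected = CompletableDeferred<Unit>()
    val job = launch {
        keys
            .onSubscription { connected.complete(Unit) } // subscriber is now registered
            .collect { onKey(it) }
    }
    // Block the caller until the handler is truly hooked up, so a key pressed
    // immediately afterwards cannot slip past an unattached collector.
    runBlocking { connected.await() }
    return job
}
```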
Alright my friend @phanlezz, I have now eaten, drunk, and slept in this code for three days, and I think I finally got it. After exploring probably a dozen intermediate approaches that never saw the light of day, I think I have something that is both very robust and relatively simple (probably not a coincidence). I have run tests on your code a bunch locally now, as well as my own, and so far it is holding up. I hope you'll be able to confirm the same on your end. We shall see! See also: … P.S. Hopefully you will download 1.2.1-SNAPSHOT automatically since it's been ~24 hours, but if you ever need to force it, here is some advice I got from ChatGPT (which looks right but you'll have to confirm) |
I didn't get notified that you answered, GitHub swallowed the notification. Sorry for letting you wait. |
What is "auto run"? I'd like to run it myself to see if I can reproduce locally. Also, did you see which test exactly failed in ParserDialogTest? I woke up this morning and realized there's one more place I can probably simplify. If it works it might be the place the input event is getting missed. Or maybe we're running into a system issue where you're running out of I/O threads due to how hard you are hammering the system with tests (especially if things are running massively in parallel)? Not likely but maybe worth mentioning. |
```
#!/bin/bash
counter=1
correctRuns=0
while [ $counter -le 20 ];
do
    echo "Iteration:" $counter
    returnCode="1"
    ./gradlew test --rerun-tasks -q
    returnCode="$?"
    if [ $returnCode = "0" ]; then
        correctRuns=$((correctRuns+1))
    else
        # replace with echo if test should continue on failure
        exit
    fi
    counter=$((counter+1))
done
echo "Correct runs:" $correctRuns
```
Sadly, I didn't save every log, but I have those two here:
> I increased the timeout after the first exit, as I realized that 10 s might be a little too short.
👍
It could definitely be something like that. That's why I suggested that maybe it is fine as is, because that I/O stuff should not happen on GitHub. I mentioned a few times that it seems like a CPU-busy thing to me, as Gradle spawns one (?) worker per thread and, although the dialog test itself runs in sequence, many other tests are executed simultaneously. |
New 1.2.1-SNAPSHOT is published. I simplified Kotter a bit further. (I had introduced an intermediate buffer at one point while flailing, and I realized this morning, with the final approach I went with, that I didn't need it anymore.) I'm not sure I can go any simpler at this point. Thank you for the autorun script. I did a quick test run with two iterations and it looks like it's working. I'll give it a try on my machine with 20 iterations when I eat lunch today.
Yeah, it could be something like that! The current test timeout defaults are chosen assuming a fairly normal machine under normal load. It would be interesting to tell the difference between a super delayed run, where the system is just waiting for a worker on the I/O thread pool to get its chance to run, and a really sensitive race condition, where my input system is truly left waiting forever on a concurrency latch that will never open. Either way, the system is in WAY better shape than it was in 1.2.0, so thank you! |
BTW @phanlezz are you the same user as Nereboss? I'd like to share thanks in the release notes when I publish 1.2.1, and I'm wondering if I should just be calling out your account or both. |
I just ran both these tests in a loop, 1000 times each, without issue. So I'm going to assume any timeout that occurs after the latest snapshot will be related to thread starvation. |
We are different people :) You can mention us both. |
Yeah, I don't think that the tests fail in isolation at all. In the current test suite there is still one test in … |
So I went off today for three hours and kickstarted an autorun. Sure enough it failed on the second iteration, of course it did :P But not in Kotter? Once in … This was with my local version of your code, so I ended up going back to … BTW, it's up to you, but I don't think you have to have … Finally, I'll say it again in the release notes, but really, thanks so much for the codecharta codebase. Having a clone of it was really helpful to make progress on this issue. |
Got to iteration 17 this time. Failure was in: … So far no Kotter on my end :) (unless these libraries are somehow delegating to Kotter and I didn't realize it!) |
Describe the bug
Tests for a function that calls multiple extension methods become flaky.
In our application we have multiple parsers that each have a dialog which consists of multiple questions.
All questions are tested separately from one another, and no test there has ever been flaky.
Now the dialog of a parser is one function that calls multiple questions, a bit like this:

```
Dialog {
    result1 = question1()
    result2 = question2()
    return result1 + result2
}
```
However, the tests for that dialog function become flaky and it looks like the terminal is somehow "swallowing keys" with the starting keys of the user input missing.
To Reproduce
I wrote a minimal example to make this issue reproducible:
We have extension functions for the different questions like this
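(The snippet itself didn't survive extraction; below is a hedged reconstruction of the shape such a helper takes, based on the `section`/`onInputEntered` usage elsewhere in this thread. The name `question1` and the prompt text are stand-ins.)

```
fun Session.question1(): String {
    var answer = ""
    section {
        text("Question 1: ")
        input()
    }.runUntilInputEntered {
        onInputEntered { answer = input }
    }
    return answer
}
```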
Tests for this function alone are not flaky; here is a test to try it out:
Now we build a dialog from multiple questions with a function like this (the code of question2 is identical to the above shown question1):
Tests for this dialog function are flaky, with the start of the simulated input being cut off (more often for the 2nd question than the first, but it does happen for the first as well). Here is such a test to try it out:
It seems like the tests fail more often when the CPU is busy. They also fail more on weaker hardware.
To make it more easily reproducible, we pasted the test for the dialog 6 times into a test class. That way it usually fails within the first 10 executions.
Expected behavior
The tests are not flaky.
Screenshots
Example of the first 2 letters of the input for the 2nd question being "swallowed"
Desktop (please complete the following information):
Additional context
As a testing environment we used JUnit 5, Gradle 8, and Kotlin version 1.9.23.
Annotating the tests with `@Execution(ExecutionMode.SAME_THREAD)` also did not influence the result.