
Larger tests become flaky #114

Open
Nereboss opened this issue May 13, 2024 · 55 comments
Labels
bug Something isn't working

Comments

@Nereboss

Describe the bug

Tests for a function that calls multiple extension methods become flaky.

In our application we have multiple parsers, each with a dialog consisting of multiple questions.
All questions are tested separately from one another, and none of those tests has ever been flaky.

Now the dialog of a parser is one function that calls multiple questions, a bit like this:

Dialog {
    result1 = question1()
    result2 = question2()
    return result1 + result2
}

However, the tests for that dialog function become flaky, and it looks like the terminal is somehow "swallowing keys": the first characters of the simulated user input go missing.

To Reproduce

I wrote a minimal example to make this issue reproducible:

We have extension functions for the different questions like this:

fun Session.question1(callback: suspend RunScope.() -> Unit): String {
    var userInput = "-1"
    section {
        textLine("render some text and then display the input:")
        input()
    }.runUntilSignal {
        onInputEntered {
            signal()
            userInput = input
        }
        callback()
    }
    return userInput
}

Tests for this function alone are not flaky, here is a test to try it out:

@Test
fun `does question1 ever become flaky`() {
    val testString = "Long input that should test if the length has any effect on anything"
    var result: String
    testSession { terminal ->
        result = question1(callback = {
            terminal.type(testString)
            terminal.press(Keys.ENTER)
        })
        assertThat(result).isEqualTo(testString)
    }
}

Now we build a dialog from multiple questions with a function like this (the code of question2 is identical to question1 shown above):

fun Session.dialog(
    callback: suspend RunScope.() -> Unit = {},
): String {
    val result1: String = question1(callback)
    val result2: String = question2(callback)

    return "$result1|||$result2"
}

Tests for this dialog function are flaky, with the start of the simulated input being cut off (more often for the second question than the first, but it does happen for the first as well). Here is such a test to try it out:

@Test
fun `test if calling the function that calls fun1 from here becomes flaky`() {
    val testString = "Long input that should test if the length has any effect on anything"
    var result: String
    testSession { terminal ->
        result = dialog(callback = {
            terminal.type(testString)
            terminal.press(Keys.ENTER)
        })
        assertThat(result).isEqualTo("$testString|||$testString")
    }
}

It seems like the tests fail more often when the CPU is busy. They also fail more often on weaker hardware.
To make it more easily reproducible, we pasted the test for the dialog six times into a test class. That way it usually fails within the first 10 executions.

Expected behavior
The tests are not flaky.

Screenshots
Example of the first 2 letters of the input for the 2nd question being "swallowed"

Desktop (please complete the following information):

  • OS: macOS (also tested on Windows with the same result)
  • Version 1.1.3-SNAPSHOT

Additional context
As a testing environment we used JUnit 5, Gradle 8, and Kotlin 1.9.23.
Annotating the tests with @Execution(ExecutionMode.SAME_THREAD) also did not influence the result.

@Nereboss Nereboss added the bug Something isn't working label May 13, 2024
@Nereboss
Author

@bitspittle friendly poke in case your email settings are still swallowing issues ^^

@bitspittle
Contributor

Thanks for the poke! I did indeed miss it, and you are welcome to poke like this anytime I don't respond within 48 hours.

I probably won't be able to take a close look at this until after my talk on June 7 (well and a few days after, which I'll be spending with friends in the Bay Area), but my intention is to prioritize after that. Worst case there's actually a threading issue in core Kotter itself, which would be worth understanding.

I do appreciate you bringing this up; I won't push out 1.1.3 until we either fix this or understand it better.

@bitspittle
Contributor

Just an FYI, finally back from my talk and getting back into working on stuff again. Will probably be busy today and tomorrow but plan on looking into this flaky issue this week. Thanks for your patience!

@bitspittle
Contributor

I am able to repro this issue locally. Coincidentally enough, it may be related to the same bug opened by another user around this same time, see also: #116

Hopefully we can figure things out tomorrow. It sure seems like the keyboard input flow from the previous section occasionally doesn't shut down early enough, so the next section briefly sees some of its keys get swallowed as it starts up.


"Good news" is that this is the sort of thing that in practice would only happen at the speeds of a computer running tests in a tight loop; it would not be expected for any user to notice this themselves (and even if they somehow did, it would be a single character missed at the beginning of what they were typing, which would be easy to retype).

@bitspittle
Contributor

bitspittle commented Jun 22, 2024

Been a week since my last reply, but I got hit with a bunch of Kobweb stuff, sorry for yet another delay! Finally had some time to dig today.

So looking into this, this is really tricky. The issue fundamentally comes down to faster-than-human speeds and coroutine timing. Basically, when you call input we do two things:

  1. create a Kotlin flow that reads bytes from the terminal and converts them to Kotter Key instances. This flow is tied to the session lifecycle.
  2. launch a Kotlin coroutine job that collects input Key events and updates a bunch of state that ends up in the final output that you see rendered on the screen, e.g. this is where arrow key presses, the ENTER key, etc. are handled. This work is tied to the section lifecycle.

With the flake you are seeing, what's happening is:

  1. You start the first section, which spins up the "byte to Key" flow and shortly after launches the "process Key" job.
  2. Eventually, the first section exits. The "byte to Key" flow is left alive while the "process Key" job is cancelled.
  3. You run the second section, which launches a second "process Key" job. During this time, you immediately start sending keypresses.
  4. The "byte to Key" flow handles these key presses, converting them to Key instances and sending them off. However...
  5. The "process Key" job is not ready yet, so some of those initial Key events are dropped in that brief window.
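The dropped-key window in step 5 can be reproduced without Kotter at all. Here is a deterministic, stdlib-only Kotlin sketch of the same producer/consumer race; all names here are invented for illustration and are not Kotter API:

```kotlin
import java.util.concurrent.atomic.AtomicReference

// A long-lived producer forwards "key" events to whichever consumer is
// currently registered; anything that arrives between consumers is dropped.
fun main() {
    // Stands in for the downstream hookup of the session-scoped "byte to Key" flow.
    val consumer = AtomicReference<((Char) -> Unit)?>(null)

    // Stands in for the "byte to Key" flow: always alive, drops events when
    // no "process Key" job is registered yet.
    fun send(key: Char) {
        consumer.get()?.invoke(key) ?: println("dropped '$key'")
    }

    val received = StringBuilder()

    // First "section": consumer is attached, keys arrive fine.
    consumer.set { received.append(it) }
    "ab".forEach(::send)
    consumer.set(null) // section exits; its "process Key" job is cancelled

    // Second "section": keys are typed before the new consumer is hooked up.
    "cd".forEach(::send) // these land in the gap and are dropped
    consumer.set { received.append(it) }
    "ef".forEach(::send)

    println(received) // prints "abef" -- the 'c' and 'd' were swallowed
}
```

In real Kotter the gap is milliseconds wide, which is why only machine-speed tests (or a busy CPU) hit it.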

I'm still hoping to fix this in a clean way, but so far my efforts have all failed.

I tried tying the "byte to Key" flow to the section lifecycle instead of the session lifecycle, but it's not easy -- it's tricky to cancel and then restart the flow. Meanwhile, tying the "process Key" job to the session lifecycle caused at least one of my other input tests to hang. Neat.

If you want a temporary fix, you can do something like this:

        onInputEntered {
            signal()
            userInput = input
        }
+       delay(16) // Give Kotter some time to set up `input` machinery
        callback()

@phanlezz

phanlezz commented Dec 6, 2024

Hello @bitspittle,
I tried adding a delay(20) to all our prompts, but now the tests sometimes get stuck indefinitely.

More specifically, in every one of our Session.myPrompt([...]) functions I added the delay, as you described, before callback() inside the runUntilSignal block after the section.

Most of the time it ran through, but in one of six runs it got stuck. (Usually the "Enter" key gets eaten, similar to the strings in the original issue).

Do you have an idea on how to continue?

edit:
I was not able to recreate the original example while using the delays, currently it just gets stuck.

@bitspittle
Contributor

Nuts! That's disappointing to hear. You're using 1.2.0?

It's been a long time since we had this discussion last. Can you either share your project with me or share exact code so I don't have to piece together code from old comments?

@phanlezz

phanlezz commented Dec 6, 2024

Nuts! That's disappointing to hear. You're using 1.2.0?

It's been a long time since we had this discussion last.

Yeah, only started yesterday looking into this branch again. It's kinda a mess. On this branch we are still using 1.1.3-SNAPSHOT, do you reckon the new version can help with this issue?

Can you either share your project with me or share exact code so I don't have to piece together code from old comments?

Sure, branch: https://github.com/MaibornWolff/codecharta/tree/tech/3483/integrate-kotter
Current test file: dialog, customPrompts

You need to disable caching in the Gradle config and then spam build runs; the error usually occurs rather quickly.

Sorry for all this extra work, I just started working on this issue.

@bitspittle
Contributor

So I added delay(20) at this line

onInputEntered {
   /* ... */
}
delay(20) // <--------
onInputReady()

and I ran InquirerTest.kt and ParserDialogTest.kt a bunch of times (using the IntelliJ UI, so e.g. go to InquirerTest.kt and click on the "run" arrow next to the class InquirerTest line), and I can't hit any instances of things hanging.

Maybe you can share your exact repro steps? I guess you're using Gradle?

@phanlezz

phanlezz commented Dec 6, 2024

Thank you again for looking into it :)
There are two important things:

  1. The Inquirer only provides the base templates for prompts. Those base prompts are then used in multiple dialogs (e.g. ParserDialog.kt) to ask the user for the necessary information. The tests (InquirerTest) that directly cover the prompts never failed, IIRC.
  2. The problem has a much higher chance of occurring when the whole project is being built (./gradlew clean build in analysis/). It seems like high CPU load is required to reproduce it. (If I remember correctly, I only got it to crash with the dialog tests directly one or two times.)

You need to run the tests in the dialog test file, as they have multiple inputs back to back, and those tests are what we are trying to fix.

With "run the tests for the dialog" I mean you need to build the whole project, which includes those tests, because the dialog tests alone do not stress the CPU enough.

So step-by-step:

  1. Disable gradle caching, reload gradle model, clean cache manually if necessary
  2. Build the project back to back

I'm also using IntelliJ; the build and clean tasks in the sidebar are usually enough.

You can DM me on Discord if you want to; I just joined your server.

@bitspittle
Contributor

I'm getting ktlint failures

> Task :filter:StructureModifier:ktlintMainSourceSetCheck FAILED
/Users/d9n/tmp/codecharta/analysis/filter/StructureModifier/src/main/kotlin/de/maibornwolff/codecharta/filter/structuremodifier/ParserDialog.kt:25:13 Newline expected after opening parenthesis
/Users/d9n/tmp/codecharta/analysis/filter/StructureModifier/src/main/kotlin/de/maibornwolff/codecharta/filter/structuremodifier/ParserDialog.kt:26:13 Parameter should start on a newline

@phanlezz

phanlezz commented Dec 7, 2024

Try ktlintFormat under formatting. I'm so sorry, this branch is a mess 😢

@bitspittle
Contributor

AFK for a few hours, will follow up later!

@phanlezz

phanlezz commented Dec 7, 2024

AFK for a few hours, will follow up later!

Okay 👍 I will go to sleep now, it's 1 AM here. I will check back tomorrow. I just pushed the formatting changes and the delay as discussed previously. Should be easier to test now.

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

To reproduce pretty quickly...

  • In ParserDialogTest.kt
  • Add this infinite test:
    @Test
    fun `run until frozen`() {
        var i = 0
        while (true) {
            println("Test #${i++}...")
            `should prompt user twice for input file when first input file is invalid`()
        }
    }
  • See how far you get before things freeze. (Not very in my experience)

Note that this freezes even without any delay. Hmmmmm...

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

I'm going to have to stop for a while here to cook dinner, but it looks like the issue may be something I didn't expect. Basically, this code:

                actionCallback = {
                    terminal.press(Keys.DOWN)
                    terminal.press(Keys.DOWN)
                    terminal.press(Keys.DOWN)
                    terminal.press(Keys.ENTER)
                },

must be sometimes dropping a key press? Because sometimes it ends up only selecting "Move nodes" and not "Remove nodes" (implying one of the "down" presses was lost). Then, because you didn't register any callback for move nodes, your test hangs.

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

Oh wait, there are a BUNCH of onInputReady calls in Inquirer.kt. Search for ALL of them and add a delay(20) before them.

@bitspittle
Contributor

Totally unrelated to the bug, here are two optional suggestions, which you can ignore:

  1. Add support for HOME and END keys
onKeyPressed {
    when (key) {
        Keys.UP -> {
            if (selection > 0) {
                selection -= 1
            }
        }
        Keys.DOWN -> {
            if (selection < choices.size - 1) {
                selection += 1
            }
        }
        Keys.HOME -> {
            selection = 0
        }
        Keys.END -> {
            selection = choices.size - 1
        }
        Keys.ENTER -> {
            result = choices[selection]
            signal()
        }
    }
}
  2. Use the lastIndex extension property for readability:
// BEFORE
if (selection < choices.size - 1) {
    selection += 1
}

// AFTER
if (selection < choices.lastIndex) {
    selection += 1
}
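The selection logic from these two suggestions can be sketched as a pure, stdlib-only function that is easy to unit-test. `Key` here is a stand-in enum for illustration, not Kotter's `Keys`:

```kotlin
// Pure selection-movement logic: given the current selection, a key, and the
// list of choices, return the new selection index.
enum class Key { UP, DOWN, HOME, END }

fun move(selection: Int, key: Key, choices: List<String>): Int = when (key) {
    Key.UP -> if (selection > 0) selection - 1 else selection
    Key.DOWN -> if (selection < choices.lastIndex) selection + 1 else selection
    Key.HOME -> 0                    // jump to the first choice
    Key.END -> choices.lastIndex     // jump to the last choice
}

fun main() {
    val choices = listOf("Move nodes", "Remove nodes", "Set root")
    println(move(0, Key.DOWN, choices)) // 1
    println(move(1, Key.END, choices))  // 2
    println(move(0, Key.UP, choices))   // 0 (already at the top)
    println(move(2, Key.HOME, choices)) // 0
}
```

Keeping the movement logic pure like this also makes it independently testable without spinning up a terminal session.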

@bitspittle
Contributor

BTW I was going to recommend upgrading to 1.2.0, which should have been identical (although some extension method imports changed, still a pretty trivial fix), but now a test is failing that wasn't failing in 1.1.3. Until I figure it out, I'd stick with 1.1.3...

@phanlezz

phanlezz commented Dec 7, 2024

Oh wait, there are a BUNCH of onInputReady calls in Inquirer.kt. Search for ALL of them and add a delay(20) before them.

I think the version I pushed eight hours ago included them already. But maybe I missed one...

@phanlezz

phanlezz commented Dec 7, 2024

To reproduce pretty quickly...

* In `ParserDialogTest.kt`

* Add this infinite test:
    @Test
    fun `run until frozen`() {
        var i = 0
        while (true) {
            println("Test #${i++}...")
            `should prompt user twice for input file when first input file is invalid`()
        }
    }
* See how far you get before things freeze. (Not very in my experience)

I will test that.

Note that this freezes even without any delay. Hmmmmm...

I thought the delay should help with the freezes and dropped input? Or am I misunderstanding something here 😲

@phanlezz

phanlezz commented Dec 7, 2024

The infinite test doesn't really stop for me (commit d063c96d8554522396af67eb7c61173f5e03453f)

(Edit: Which is a good thing, with 1000 successful runs, the test would definitely be safe enough)

@bitspittle
Contributor

I thought the delay should help with the freezes and dropped input? Or am I misunderstanding something here

So when I wrote this comment, I actually didn't have delays everywhere yet. I only had them in one place, without realizing that the pattern was duplicated across many functions and I had to add them everywhere. Once I did, all your tests passed for me.

The infinite test doesn't really stop for me

In my case, once the infinite test ran forever, all the tests passed. You're not seeing that?

@phanlezz

phanlezz commented Dec 7, 2024


No, sadly, even with a two-minute timeout the tests sometimes get stuck if I build the complete project. The infinite test is not similar enough to a complete project build (./gradlew clean build).

Edit: It failed after three successful runs

Don't mind the repetition, it was something I also tried

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

I think what you may want to do is change all your default callbacks in all methods that use input from {} to { press(ENTER) }

Your tests will still fail but they shouldn't hang anymore and maybe you'll get a more meaningful error message.

I'm deep diving into the 1.1.3 vs 1.2.0 bug. It's very hard to understand and debug so far, and I see the different output but I have no idea what the cause is! I'm a little worried 1.2.0 has some issue that got introduced somehow, so I'm prioritizing it at the moment.

@phanlezz

phanlezz commented Dec 7, 2024

I think what you may want to do is change all your default callbacks in all methods that use input from {} to { press(ENTER) }

Your tests will still fail but they shouldn't hang anymore and maybe you'll get a more meaningful error message.

Sorry, I'm not sure if I understand that correctly; can you expand on this? 🤔

I'm deep diving into the 1.1.3 vs 1.2.0 bug. It's very hard to understand and debug so far, and I see the different output but I have no idea what the cause is! I'm a little worried 1.2.0 has some issue that got introduced somehow, so I'm prioritizing it at the moment.

No worries, that's very understandable.

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

Sorry I'm not sure if i understand that correctly, can you expand on this? 🤔

Ah, it's not as straightforward as I first thought. But maybe something like this:

// Before
internal fun Session.myCollectParserArgs(
    fileCallback: suspend RunScope.() -> Unit = {},
    actionCallback: suspend RunScope.() -> Unit = {},
    printCallback: suspend RunScope.() -> Unit = {},
    setRootCallback: suspend RunScope.() -> Unit = {},
    moveFromCallback: suspend RunScope.() -> Unit = {},
    moveToCallback: suspend RunScope.() -> Unit = {},
    removeNodesCallback: suspend RunScope.() -> Unit = {},
    outFileCallback: suspend RunScope.() -> Unit = {},
): List<String> {

// After
internal fun Session.myCollectParserArgs(
    defaultCallback: suspend RunScope.() -> Unit = {},
    fileCallback: suspend RunScope.() -> Unit = defaultCallback,
    actionCallback: suspend RunScope.() -> Unit = defaultCallback,
    printCallback: suspend RunScope.() -> Unit = defaultCallback,
    setRootCallback: suspend RunScope.() -> Unit = defaultCallback,
    moveFromCallback: suspend RunScope.() -> Unit = defaultCallback,
    moveToCallback: suspend RunScope.() -> Unit = defaultCallback,
    removeNodesCallback: suspend RunScope.() -> Unit = defaultCallback,
    outFileCallback: suspend RunScope.() -> Unit = defaultCallback,
): List<String> {

Then, in test code:

session.myCollectParserArgs(
    defaultCallback = { terminal.press(ENTER) },
    /* ... */
)

That way, if an unexpected option is ever chosen (due to a dropped keypress), you won't end up on an option that just does nothing.
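The "fail loudly instead of hanging" idea can be sketched without Kotter. This stdlib-only Kotlin example uses invented names (`prompt`, `sendKey`) and a poll timeout so the sketch terminates either way; a real test with an empty callback would block forever at the marked line:

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.TimeUnit

// A toy prompt that waits for one key from its callback.
fun prompt(callback: (sendKey: (Char) -> Unit) -> Unit): Char? {
    val keys = ArrayBlockingQueue<Char>(16)
    callback { keys.put(it) }
    // Poll with a timeout so this sketch always returns; a real test with an
    // empty callback would block here indefinitely (i.e. the hang).
    return keys.poll(100, TimeUnit.MILLISECONDS)
}

fun main() {
    val hung = prompt { }                        // empty default: no key arrives
    println(hung)                                // prints "null" (would hang for real)

    val answered = prompt { send -> send('\n') } // ENTER default: completes
    println(answered == '\n')                    // prints "true"
}
```

With an ENTER default, a dropped keypress that lands on an unexpected option still lets the dialog finish, so the test fails with a readable assertion message instead of timing out.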

@phanlezz

phanlezz commented Dec 7, 2024

I will check back tomorrow, thank you for your time :)

@bitspittle
Contributor

(FYI you have a debug println in myPromptConfirm. Also, you've said multiple times you feel bad about the state of the branch, but honestly, your code has been just fine and easy to navigate. So I hope when I leave this comment, it doesn't make you feel bad! At first I assumed I had left the println in kotter actually!)

@bitspittle
Contributor

It actually got stuck again 😲

@RepeatedTest(value = REPS)
fun `should output correct arguments when extract path is selected`() {
    // given
    val pathToBeExtracted = "/root/src/main"
    val outputFileName = "output"

    testSession { terminal ->
        val parserArguments =
            myCollectParserArgs(
                defaultCallback = { terminal.press(Keys.ENTER) },
                fileCallback = {
                    terminal.type(sampleProjectPath.toString())
                    terminal.press(Keys.ENTER)
                },
                actionCallback = {
                    terminal.press(Keys.DOWN)
                    terminal.press(Keys.ENTER)
                },
                setRootCallback = {
                    terminal.type(pathToBeExtracted)
                    terminal.press(Keys.ENTER)
                },
                outFileCallback = {
                    terminal.type(outputFileName)
                    terminal.press(Keys.ENTER)
                },
            )

internal fun Session.myCollectParserArgs(
    defaultCallback: suspend RunScope.() -> Unit = {},
    fileCallback: suspend RunScope.() -> Unit = defaultCallback,
    actionCallback: suspend RunScope.() -> Unit = defaultCallback,
    printCallback: suspend RunScope.() -> Unit = defaultCallback,
    setRootCallback: suspend RunScope.() -> Unit = defaultCallback,
    moveFromCallback: suspend RunScope.() -> Unit = defaultCallback,
    moveToCallback: suspend RunScope.() -> Unit = defaultCallback,
    removeNodesCallback: suspend RunScope.() -> Unit = defaultCallback,
    outFileCallback: suspend RunScope.() -> Unit = defaultCallback,
): List<String> {

Nuts! I guess the enter key press got dropped. I bet this is just a really really nasty timing issue / race condition, especially between one section getting torn down and a new one starting.

There's gotta be a way to guarantee that the input callback won't get triggered too early but I'm not seeing it yet. I'll sleep on it.
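One way to provide such a guarantee is a readiness handshake: the sender blocks until the input consumer reports it is hooked up, so no keypress can land in the gap. Here is a hypothetical, stdlib-only Kotlin sketch; all names are invented and this is not Kotter API:

```kotlin
import java.util.concurrent.CountDownLatch

// Toy input machinery with an explicit "I'm hooked up" signal.
class InputMachinery {
    private val ready = CountDownLatch(1)
    private var consumer: ((Char) -> Unit)? = null

    fun start(handler: (Char) -> Unit) {
        consumer = handler
        ready.countDown() // signal: safe to start typing now
    }

    fun awaitReady() = ready.await()

    fun send(key: Char) {
        consumer?.invoke(key)
    }
}

fun main() {
    val machinery = InputMachinery()
    val received = StringBuilder()

    // The consumer attaches "late" on another thread, as when a section starts.
    Thread {
        Thread.sleep(50)
        machinery.start { received.append(it) }
    }.start()

    machinery.awaitReady() // without this line, 'h' and 'i' could be dropped
    "hi".forEach(machinery::send)
    println(received) // prints "hi"
}
```

The latch also gives the necessary happens-before edge, so the consumer reference written by the background thread is visible to the sender.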

@bitspittle
Contributor

Here's a test I think is a fairly concise way to break input. I usually see a failure within 20-30 tests.

@Test
fun `input should not break across multiple sections`() = testSession { terminal ->
    var testRunCount = 1
    while (true) {
        println("Test run #$testRunCount...")
        terminal.clear()

        val numCharsToType = testRunCount % 20
        section {
            input()
        }.runUntilInputEntered {
            // delay(20) // uncomment me and test stops crashing
            terminal.type(testRunCount.toString())
            terminal.type(": ")
            for (i in 0 until numCharsToType) terminal.type('a')
            terminal.press(Keys.ENTER)
        }

        terminal.assertMatches {
            text("${testRunCount}: ${"a".repeat(numCharsToType)} ")
        }

        testRunCount++
    }
}

@bitspittle
Contributor

Heads up, I'm narrowing down on a fix. I already have something locally that works, but I'm seeing if I can get a better solution in the next hour or two. If not, I'll go back to my first approach.

Ideally, once in, your tests should just work without needing any more delay tricks.

bitspittle added a commit that referenced this issue Dec 9, 2024
... instead of the section lifecycle.

This is unfortunate but eliminates a very, very difficult timing
bug that occurs because the base flow (which converts system input
into Kotter `Key` instances) is always running while the consuming
flows are started and stopped per section, which due to factors out
of our control, can result with their handlers taking milliseconds to
connect, at which point any test that creates multiple sections that
all have inputs in them have a chance to fail.

Ideally we want to make all the input jobs tied to the Section
lifecycle (so if a section doesn't do any work with input, then
no resources will be used), but this will still have the tricky
timing issues mentioned above, so it may not be trivial.

Fixes #114
bitspittle added a commit that referenced this issue Dec 9, 2024
@bitspittle
Contributor

OK @phanlezz , if you set your kotter version to 1.2.1-SNAPSHOT (and apply the patch from this comment), your tests should start passing, with no delay required anymore (so please delete those lines).

Let me know if you are (or are not) seeing that!

If you already set your version to 1.2.1-SNAPSHOT recently, note that it normally takes like 24 hours before Gradle will check again. You should be able to tell by seeing Kotter artifacts get downloaded when you run something like ./gradlew test or ./gradlew build.


At the moment, I needed to keep the "worse" solution mentioned in the comment above.... which is not that terrible, really. Basically, I expanded the lifecycle of all input processing to the whole session. So any section that uses any input functionality will essentially spin stuff up that stays alive for the whole program. This isn't that bad.

Still, I feel like I got SO CLOSE with the version which actually keeps the lifecycles of input logic tied to the sections that use them, but there are so many moving parts that I could never come up with something that didn't leave this or that edge case broken. I may try again tomorrow after sleeping on things a bit more. Coroutine flows are awesome but also a nightmare :)

@phanlezz

phanlezz commented Dec 9, 2024

Sorry for the late response, I was pretty busy over the weekend. I will check out all your comments now :)
I will check out all your comments now :)

@phanlezz

phanlezz commented Dec 9, 2024

BTW, you may find this convenient. Without disabling caching, you can rerun all tests:

$ ./gradlew test --rerun-tasks

Oh, that sounds good. I think we disabled caching because errors that happen during high-CPU situations would not show up until another person tested it. We did not find a good way to actually rebuild everything, so this is good to know. We might consider re-enabling it now...

(FYI you have a debug println in myPromptConfirm. Also, you've said multiple times you feel bad about the state of the branch, but honestly, your code has been just fine and easy to navigate. So I hope when I leave this comment, it doesn't make you feel bad! At first I assumed I had left the println in kotter actually!)

Nice catch :) Thank you for the kind words. I just felt bad because I asked for help and the branch was not in a directly runnable state.

OK @phanlezz , if you set your kotter version to 1.2.1-SNAPSHOT (and apply the patch from this comment), your tests should start passing, with no delay required anymore (so please delete those lines).

Let me know if you are (or are not) seeing that!

If you already set your version to 1.2.1-SNAPSHOT recently, note that it normally takes like 24 hours before Gradle will check again. You should be able to tell by seeing kotter artifacts get downloaded when you run something like ./gradle test or ./gradlew build.

At the moment, I needed to keep the "worse" solution mentioned in the comment above.... which is not that terrible, really. Basically, I expanded the lifecycle of all input processing to the whole session. So any section that uses any input functionality will essentially spin stuff up that stays alive for the whole program. This isn't that bad.

I will try out the snapshot with the patch and will report back the results!

Still, I feel like I got SO CLOSE with the version which actually keeps the lifecycles of input logic tied to the sections that use them, but there are so many moving parts that I could never come up with something that didn't leave this or that edge case broken. I may try again tomorrow after sleeping on things a bit more. Coroutine flows are awesome but also a nightmare :)

Thanks for your work!

@phanlezz

phanlezz commented Dec 9, 2024

Now some testing. I added your test (temporarily), limited to 1k iterations, to InquirerTest.kt
#114 (comment)

With 1.1.3-SNAPSHOT this test got stuck when run in isolation with no delay; with the delay, it got stuck after 3 full build runs.

I tested the snapshot with the patch you provided, but the dialog still got stuck 🤯

Hopefully I missed something and it's an easy fix. I pushed my latest state to the branch so that you can check it out.
Even in isolation the tests now get stuck randomly; they get stuck more often than not.

The test you provided does not get stuck.
Different tests from the ParserDialogTest get stuck; it is not the same one every time, but should output correct arguments when print structure is selected seems to have a higher failure rate. When I only execute the ParserDialogTest, the chance of getting stuck seems to be around 90%. If I disable the aforementioned test, the rate drops to ~30%.

@bitspittle
Contributor

I looked up a bunch of stuff about structured concurrency and I have some ideas to try today.

After sleeping on it, I think what is happening is that the test code is moving so fast that we end up requesting a thread from the I/O thread pool before previous tests/sections have finished cleaning theirs up. If correct, this might be an easy fix, but no promises.

bitspittle added a commit that referenced this issue Dec 10, 2024
This is unfortunate but eliminates a very, very difficult timing
bug that occurs because the base flow (which converts system input
into Kotter `Key` instances) is always running while the consuming
flows are started and stopped per section, which due to factors out
of our control, can result with their handlers taking milliseconds to
connect, at which point any test that creates multiple sections that
all have inputs in them have a chance to fail.

Another approach might be to tie `KeyFlowKey` to the Section lifecycle
instead, as ....

Fixes #114
bitspittle added a commit that referenced this issue Dec 10, 2024
bitspittle added a commit that referenced this issue Dec 10, 2024
This now means that if a section doesn't use any input
machinery, that the user won't pay the price for it.

This was a very complicated change, which should be 100%
invisible to users. It all exists in service of letting
developers write tests that aren't flaky.

Before this change, calling `terminal.type` inside your
run block had a chance of happening so fast that those
keypresses would get dropped because the input machinery
wasn't hooked up yet.

Now, we have hooks in place that notify us when the input
machinery is hooked up, and we block progress in various
places to make sure that by the time the user starts typing
some stuff, we'll be ready for them, even in tests.

Fixes #114
@bitspittle
Copy link
Contributor

bitspittle commented Dec 10, 2024

OK @phanlezz try the latest 1.2.1-SNAPSHOT and let me know if that works! Today was insane, a deep dive into hell and back.

Local testing on my end seems good so far, but of course feel free to confirm or deny.

The current version should be a lot more robust / correct, and I was able to get things working where lifecycles for input logic are now only tied to the sections that use it. However, now that I know a bit more, I think I can do better, so I'm going to keep poking at this a bit tomorrow.

@phanlezz
Copy link

phanlezz commented Dec 10, 2024

> OK @phanlezz try the latest 1.2.1-SNAPSHOT and let me know if that works! Today was insane, a deep dive into hell and back.
>
> Local testing on my end seems good so far, but of course feel free to confirm or deny.
>
> The current version should be a lot more robust / correct, and I was able to get things working where lifecycles for input logic are now only tied to the sections that use it. However, now that I know a bit more, I think I can do better, so I'm going to keep poking at this a bit tomorrow.

Sounds awesome! I will start the testing now. Should I swap the in-memory versions back to the other ones?

Edit: The previous function doesn't exist anymore, which answers the question 😄

@phanlezz
Copy link

Good news: ParserDialogTest.kt runs pretty well in isolation; I was not able to crash it.
Sadly, when running all tests, ParserDialogTest.kt sometimes fails, roughly once every five runs.
It timed out again, with a 10-second timeout.

I'm not sure if this is any helpful:

   Suppressed: java.lang.InterruptedException
   	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:94)
   	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:69)
   	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
   	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:48)
   	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
   	at com.varabyte.kotter.platform.concurrent.locks.ReentrantReadWriteLock$ReaderLock.lock(ReentrantReadWriteLock.kt:44)
   	at com.varabyte.kotter.runtime.concurrent.ConcurrentScopedData.isActive(ConcurrentScopedData.kt:402)
   	at com.varabyte.kotter.runtime.Session.assertNoActiveSections$kotter(Session.kt:49)
   	at com.varabyte.kotter.foundation.SessionKt.session(Session.kt:83)
   	at com.varabyte.kotter.foundation.SessionKt.session$default(Session.kt:67)
   	at com.varabyte.kotterx.test.foundation.TestSessionKt.testSession(TestSession.kt:36)
   	at com.varabyte.kotterx.test.foundation.TestSessionKt.testSession$default(TestSession.kt:30)
   	at de.maibornwolff.codecharta.filter.structuremodifier.ParserDialogTest.should output correct arguments when print structure is selected(ParserDialogTest.kt:29)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
   	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
   	at org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
   	... 77 more

bitspittle added a commit that referenced this issue Dec 10, 2024
Before, we tried to let various jobs ask nicely if the upstream flow
would wait for them to be ready.

However, it turns out it is easier (and more robust) just to block
progress until these jobs finish connecting to the flows before
continuing.

For a concrete example, here is a test that used to fail:

```
section { }.runUntilSignal {
  onKeyPressed {
    if (key == Keys.Q) signal()
  }
  terminal.press(Keys.Q)
}
```

The reason this would fail is often the `terminal.press(Keys.Q)` line would
get called SO QUICKLY and at JUST SUCH A TIME that its key would get processed
when the upstream flow was ready but the downstream "key processing job" was
not.

Now, what happens is `onKeyPressed` simply blocks, imperceptibly to the human
eye, so that by the time `terminal.press(Keys.Q)` is called, everything is
guaranteed to be hooked up and ready to go.

Fixes #114
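The blocking approach this commit describes can also be sketched with a plain latch. Again a hypothetical illustration (the names `ReadyKeySource` and `demoNoDroppedKey` are invented; Kotter's internals differ): registration does not return until the handler is actually wired up, so a key emitted immediately afterwards cannot race ahead of it.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.CountDownLatch

// Hypothetical sketch of the fix: registration blocks until the handler
// is connected, so a key emitted right afterwards cannot be lost.
class ReadyKeySource {
    private val listeners = CopyOnWriteArrayList<(Char) -> Unit>()
    fun emit(key: Char) = listeners.forEach { it(key) }

    // Blocks the caller until the listener is connected on a worker thread.
    fun onKeyBlockingUntilReady(listener: (Char) -> Unit) {
        val ready = CountDownLatch(1)
        Thread {
            listeners.add(listener)  // the slow "hook up" step
            ready.countDown()        // signal: machinery is connected
        }.start()
        ready.await()                // caller cannot race ahead of registration
    }
}

fun demoNoDroppedKey(): List<Char> {
    val source = ReadyKeySource()
    val received = mutableListOf<Char>()
    source.onKeyBlockingUntilReady { received += it }
    source.emit('Q')                 // guaranteed to be seen
    return received
}

fun main() {
    println(demoNoDroppedKey())
}
```

The latency added by the latch is a one-time handshake per section, which is why the commit message can claim it is imperceptible to a human user while still being decisive for fast-moving tests.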
@bitspittle
Copy link
Contributor

Alright my friend @phanlezz, I have now eaten, drunk, and slept in this code for three days, and I think I finally got it.

After exploring probably a dozen intermediate approaches that never saw the light of day, I think I have something that is both very robust and relatively simple (probably not a coincidence).

I have run tests on your code a bunch locally now, as well as my own, and so far it is holding up. I hope you'll be able to confirm the same on your end. We shall see!

See also:
https://bsky.app/profile/bitspittle.bsky.social/post/3lcy4vycluc2a

P.S. Hopefully you will download 1.2.1-SNAPSHOT automatically since it's been ~24 hours, but if you ever need to force it, here is some advice I got from ChatGPT (which looks right but you'll have to confirm)

@phanlezz
Copy link

I didn't get notified that you answered, GitHub swallowed the notification. Sorry for letting you wait.
I will test it right now and report back! Regarding the SNAPSHOTs, I usually delete the .gradle cache, re-download everything, and then compare the md5sum of the *.jar file before and after to make sure they are different. Not sure if this is reliable at all, but it worked in the past 😄 I will test the options given by GPT

@phanlezz
Copy link

phanlezz commented Dec 11, 2024

I ... I don't really know how to say it, but it failed again 😢
After 39 successful runs the ParserDialogTest got stuck again, with a 60 second timeout.
[screenshot]

And after another 19 successful runs an assertion in InquirerTest failed with output (img below):

[screenshot]

I think we reached a point where it runs well enough, and I can continue with the integration. I need to check if those errors also happen during our GitHub workflows.

I noticed strange behavior with the run time of one of the dialog tests: should output correct arguments when print structure is selected() usually takes about 9 times longer than the second-longest (9 s vs 1 s). I haven't looked into it at all because I don't have much time, so it could just be some weird difference in implementation on our side.

I triple checked to make sure that I have the correct snapshot version, and the md5sum from the repository matches the md5sum of the local jar that Gradle used. Maybe it is just weird behavior on my side, if you can recreate it...? 🤔

@bitspittle
Copy link
Contributor

What is "auto run"? I'd like to run it myself to see if I can reproduce locally. Also, did you see which test exactly failed in ParserDialogTest?

I woke up this morning and realized there's one more place I can probably simplify. If it works it might be the place the input event is getting missed.

Or maybe we're running into a system issue where you're running out of I/O threads due to how hard you are hammering the system with tests (especially if things are running massively in parallel)? Not likely but maybe worth mentioning.

@phanlezz
Copy link

> What is "auto run"? I'd like to run it myself to see if I can reproduce locally. Also, did you see which test exactly failed in ParserDialogTest?

autorun.sh

```bash
#!/bin/bash

counter=1
correctRuns=0

while [ $counter -le 20 ]; do
  echo "Iteration: $counter"
  ./gradlew test --rerun-tasks -q
  returnCode="$?"

  if [ "$returnCode" = "0" ]; then
    correctRuns=$((correctRuns+1))
  else
    # replace 'exit' with 'echo' if the loop should continue on failure
    exit
  fi

  counter=$((counter+1))
done

echo "Correct runs: $correctRuns"
```

Sadly, I didn't save every log, but I have those two here:

  • java.util.concurrent.TimeoutException: should output correct arguments when print structure is selected() timed out after 10 seconds
  • java.util.concurrent.TimeoutException: should output correct arguments when move nodes is selected() timed out after 60 seconds

(I increased the timeout after the first exit, as I realized that 10 s might be a little too short.)

> I woke up this morning and realized there's one more place I can probably simplify. If it works it might be the place the input event is getting missed.

👍

> Or maybe we're running into a system issue where you're running out of I/O threads due to how hard you are hammering the system with tests (especially if things are running massively in parallel)? Not likely but maybe worth mentioning.

It could definitely be something like that. That's why I suggested that maybe it is fine as is, because the I/O issue should not happen on GitHub. I mentioned a few times that it seems like a CPU-busy thing to me, as Gradle spawns one (?) worker per thread and, although the dialog test itself runs in sequence, many other tests are executed simultaneously.

@bitspittle
Copy link
Contributor

New 1.2.1-SNAPSHOT is published. I simplified Kotter a bit further. (I had introduced an intermediate buffer at one point while flailing, and I realized this morning, with the final approach I went with, that I didn't need it anymore.) I'm not sure I can go any simpler at this point.

Thank you for the autorun script. I did a quick test run with two iterations and looks like it's working. I'll give it a try on my machine with 20 iterations when I eat lunch today.

> I mentioned a few times that it seems like a CPU-busy thing to me, as Gradle spawns one (?) worker per thread and, although the dialog test itself runs in sequence, many other tests are executed simultaneously.

Yeah, it could be something like that! The current test timeout defaults are chosen assuming a fairly normal machine under normal load. It would be interesting to be able to tell the difference between a super-delayed run, where the system is just waiting for a worker on the I/O thread pool to get its chance to run, and a really sensitive race condition, where my input system is truly left waiting forever on a concurrency latch that will never open.

Either way, the system is in WAY better shape than it was in 1.2.0, so thank you!
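If the remaining timeouts really are thread starvation from tests hammering the machine in parallel, one knob worth trying (a hedged suggestion, not something confirmed in this thread) is limiting Gradle's test parallelism and raising the default JUnit 5 timeout, assuming the project runs JUnit 5 under the Gradle wrapper:

```kotlin
// build.gradle.kts -- hypothetical tuning for heavily-loaded test runs
tasks.withType<Test> {
    // fewer concurrent test forks leaves headroom for Kotter's I/O threads
    maxParallelForks = (Runtime.getRuntime().availableProcessors() / 2).coerceAtLeast(1)
    // give slow runs more time before JUnit 5 gives up
    systemProperty("junit.jupiter.execution.timeout.default", "60 s")
}
```

This would not fix a genuine race, but it would help distinguish "starved but correct" runs from "deadlocked forever" ones.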

@bitspittle
Copy link
Contributor

BTW @phanlezz are you the same user as Nereboss? I'd like to share thanks in the release notes when I publish 1.2.1, and I'm wondering if I should just be calling out your account or both.

@bitspittle
Copy link
Contributor

> java.util.concurrent.TimeoutException: should output correct arguments when print structure is selected() timed out after 10 seconds
> java.util.concurrent.TimeoutException: should output correct arguments when move nodes is selected() timed out after 60 seconds

I just ran both these tests in a loop, 1000 times each, without issue. So I'm going to assume any timeout that occurs after the latest snapshot will be related to thread starvation.

@phanlezz
Copy link

> BTW @phanlezz are you the same user as Nereboss? I'd like to share thanks in the release notes when I publish 1.2.1, and I'm wondering if I should just be calling out your account or both.

We are different people :) You can mention us both.

@phanlezz
Copy link

phanlezz commented Dec 11, 2024

> java.util.concurrent.TimeoutException: should output correct arguments when print structure is selected() timed out after 10 seconds
> java.util.concurrent.TimeoutException: should output correct arguments when move nodes is selected() timed out after 60 seconds
>
> I just ran both these tests in a loop, 1000 times each, without issue. So I'm going to assume any timeout that occurs after the latest snapshot will be related to thread starvation.

Yeah, I don't think the tests fail in isolation at all. In the current test suite there is still one test in InquirerTest.kt that runs 1k times, iirc, and I let the infinite run with should output correct arguments when print structure is selected() go until ~25k iterations.
Thread starvation might be the right term here :)

@bitspittle
Copy link
Contributor

bitspittle commented Dec 11, 2024

So I went off today for three hours and kickstarted an autorun. Sure enough, it failed on the second iteration, of course it did :P

But not in Kotter? Once in ParseFileToTable.kt and a second time in ParseFolderToTable.kt.

test.zip

This was with my local version of your code, so I ended up going back to tech/3483/integrate-kotter and pulling latest. Will test again shortly.

BTW, it's up to you, but I don't think you need the input should not break across multiple sections test in your codebase. My own tests already cover it, so you can save yourself a small bit of testing time by removing it, if you want.

Finally, I'll say it again in the release notes, but really, thanks so much for the codecharta codebase. Having a clone of it was really helpful to make progress on this issue.

@bitspittle
Copy link
Contributor

Got to iteration 17 this time. Failure was in: JavaSonarAnalyzerTest. commented out blocks of code is zero if issue is not found()

test.zip

So far no Kotter on my end :) (unless these libraries are somehow delegating to Kotter and I didn't realize it!)
