fix(common/core/web): error from early fat-finger termination due to OS interruptions #5479

Merged: jahorton merged 6 commits into master from fix/common/core/web/empty-alternates on Jul 27, 2021

Contributor

@jahorton jahorton commented Jul 22, 2021

Fixes #5467.

No wonder it was an elusive little bugger. I knew something smelled like a race condition as I started investigating... turns out it's probably a race condition caused by the interaction between standard OS context-swapping (or garbage collection) and a timer based on the system clock. That's why a direct repro is nigh-impossible to find or construct.

I should note that some of the Sentry issues do indicate occasions where they arose before 14.0.277, which was when #5352 landed. That said, the lion's share of the reports for the tagged issues are indeed since that release, which is why it's become as prominent now as it is.

So, while it's not exactly perfect to just give up immediately once control returns to the engine... at least we've noticed that the correction algorithm itself is fairly capable on its own. To keep things responsive, we'll simply maintain the current timeout logic, even though the delay comes from external pressure (the OS or the browser) rather than from our algorithm. As a result, fat-finger calculations are bypassed for any keystroke hit by such an externally-caused delay.

User Testing

@keymanapp/testers

Much of this is taken from standard KMW acceptance testing and adapted toward the aspects of KMW modified by this PR, though we'll only worry about testing against a single platform here.

Platform: Chrome emulation / Android

Ensure that you see the "Console" tab while performing these tests.

  • TEST_CRITICAL: If any errors (red text entries in the "Console" area) or warnings appear during these tests, fail this test entry and report a screenshot of them.

Console tab:

image

It should resemble Windows' CMD prompt and the macOS Terminal.

Utilize the "Test unminified Keymanweb" testing page to ensure the following:

  • TEST_1: KMW properly handles input to the controls.
  • Attempt to add the following keyboards:
    • TEST_2: By keyboard name: sil_ipa. (For our currently-deprecated IPA keyboard.)
      • TEST_3: ŋ should be a subkey of n - ensure it works.
    • TEST_4: By language code: "km".
      • Note that if "km" doesn't return the khmer_angkor keyboard, you'll want to load that one by keyboard name for these tests.
      • TEST_5: Type in the following sequence: ស, ុ, ្រ (subkey of រ), ក. You should see ស្រុក.
      • TEST_6: Continuing from the last test, hit backspace in sequence. As this is a reorder (keyboard) rule test, you should see:
        • ស្រុ
        • ស្រ
    • TEST_7: By language name: spanish.

Utilize the "Prediction - robust testing" testing page for the following:

  • Swap to the "English - EuroLatin (SIL)" keyboard.
    • TEST_8: Use long-press . to output ' and then press e. The two characters should not combine.
    • TEST_9: Ensure that long-press p and SHIFT + long-press g displays properly and that the subkeys produce the expected output.
    • TEST_10: Type the following sequence, pressing near the center of the key each time:
      • L (shift layer)
      • k (default layer)
      • v (default layer)
      • Expected result: you should see suggestions for Love and Live, along with one other random suggestion.

Comment on lines 176 to 178
if(alternates.length == 0) {
  alternates = undefined;
}
Contributor Author

The true core of the fix. This PR's other changes are simply to promote robustness at other levels, just in case.

Member

(From our slack discussion)

  • I'm not entirely clear from the PR description what the root cause of the problem that we are fixing is.
  • How do we get into the situation where alternates.length == 0?
  • Why is a zero length alternates array a problem?
  • Are there other code paths where we might need to do the same thing as you are doing here, or is the extra test in languageProcessor.ts that you've made in this PR sufficient to cover all cases? In which case, is this even needed?

Contributor Author

The root cause: if the OS pauses the thread evaluating the script between the time we start our timer (when we save TIMEOUT_THRESHOLD) and the first check against the timer, the JS thread could be "out of time" when the OS allows it to resume. So, from the JS thread's perspective, the timeout can expire before a single iteration of the loop occurs, with near-zero thread-execution time.

Should that occur, we will have initialized the alternates array, but never been able to put anything within it. Hence, alternates.length == 0.
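
A minimal sketch of the race described above, assuming a time-bounded loop with an injectable clock. All names here (`KeyCandidate`, `collectAlternates`, `scriptedClock`, the 33 ms budget) are hypothetical and chosen for illustration; this is not the actual Keyman code. It only shows how a pause between computing the deadline and the first timer check can leave the array empty.

```typescript
// Hypothetical names throughout; this is an illustration, not the Keyman source.
type Clock = () => number;

interface KeyCandidate {
  key: string;
  probability: number;
}

// Collects fat-finger candidates until the time budget runs out.
function collectAlternates(
  candidates: KeyCandidate[],
  now: Clock,
  budgetMs: number
): KeyCandidate[] {
  const deadline = now() + budgetMs; // the timer "starts" here
  const alternates: KeyCandidate[] = [];

  for (const candidate of candidates) {
    // If the thread was suspended after the deadline was computed,
    // this check can already fail on the very first pass.
    if (now() >= deadline) {
      break;
    }
    alternates.push(candidate);
  }
  return alternates;
}

// A fake clock that replays scripted timestamps, repeating the last one.
function scriptedClock(times: number[]): Clock {
  let i = 0;
  return () => times[Math.min(i++, times.length - 1)];
}

const candidates: KeyCandidate[] = [
  { key: "k", probability: 0.7 },
  { key: "l", probability: 0.2 },
  { key: "j", probability: 0.1 },
];

// No interruption: every clock reading is well inside the budget.
const uninterrupted = collectAlternates(candidates, scriptedClock([0, 1, 2, 3]), 33);

// Interruption: the thread is paused for 500 ms right after the deadline is set.
const interrupted = collectAlternates(candidates, scriptedClock([0, 500]), 33);

console.log(uninterrupted.length, interrupted.length); // prints "3 0"
```

Injecting the clock is only a device for making the pause reproducible in a sketch; with the real system clock the pause depends on the OS scheduler or garbage collector, which is why no direct repro exists.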

Why would that be a problem? Because the lm-layer's worker thread has no optimal way to handle it. Other changes in this PR can at least prevent a crash, but the empty array would rob the predictive-text engine of all information about the current keystroke, resulting in poorer suggestions. Preventing it here allows us to at least preserve info about the actually-typed keystroke.

This is the only place where the .alternates property is set. Even then, the extra test there serves as a stop-gap to perform a similar prevention & fix check for any other paths that may exist in the future. (It's one of those "extra robustness checks" I referred to above.)

While we could just rely on the languageProcessor.ts check... I'd still like to handle it at the most likely 'source' of the error - the OS-interrupted timer.


While there are other possible ways to end up with a zero-length array without timing out early because of the OS, they're pretty niche and unlikely. For the sil_euro_latin keyboard, they're also impossible. (Requires missing rules and/or rules with BEEP.)
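
The two layers of protection discussed in this comment can be sketched roughly as below. The types and function names (`Transform`, `Alternate`, `Transcription`, `normalizeAlternates`, `predictionInput`) are simplified, hypothetical stand-ins rather than the real Keyman interfaces. One detail the sketch relies on is standard JavaScript behavior: an empty array is truthy, so `alternates || transform` alone does not fall back when `alternates` is `[]`.

```typescript
// Hypothetical, simplified types; not the actual Keyman interfaces.
interface Transform {
  insert: string;
  deleteLeft: number;
}

interface Alternate {
  sample: Transform;
  p: number;
}

interface Transcription {
  transform: Transform;
  alternates?: Alternate[];
}

// Source-level guard: an empty array carries no information, so drop it.
function normalizeAlternates(
  alternates: Alternate[] | undefined
): Alternate[] | undefined {
  return alternates && alternates.length > 0 ? alternates : undefined;
}

// Consumer-level guard: fall back to the actually-typed keystroke.
function predictionInput(transcription: Transcription): Alternate[] | Transform {
  const alternates = normalizeAlternates(transcription.alternates);
  return alternates || transcription.transform;
}

const typed: Transform = { insert: "k", deleteLeft: 0 };
const empty: Alternate[] = [];

// An empty array is truthy, so a bare `||` keeps the empty array.
const naive = empty || typed;

// With normalization, the empty array is replaced by the typed keystroke.
const guarded = predictionInput({ transform: typed, alternates: empty });

console.log(Array.isArray(naive), guarded === typed); // prints "true true"
```

Normalizing at the point where the array is built keeps every later `||` fallback meaningful, while the consumer-side check covers any other path that might produce an empty array in the future.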

Member

@mcdurdin mcdurdin left a comment

Good work on tracking this elusive little bug down! A few questions for my understanding...

  let transform = transcription.transform;
- var promise = this.currentPromise = this.lmEngine.predict(transcription.alternates || transcription.transform, context);
+ var promise = this.currentPromise = this.lmEngine.predict(alternates || transcription.transform, context);
Member

alternates will never be falsy per L326-332 above?

Member

@mcdurdin mcdurdin left a comment

LGTM pending user testing

@jahorton
Contributor Author

LGTM pending user testing

I, uh... since there's no easy repro for the original error, it's not super-clear exactly how to test this. Guess it's mostly a "make sure @jahorton didn't break anything"?

@jahorton jahorton changed the title from "fix(common/core/web): empty alternates array due to OS context swapping" to "fix(common/core/web): error from early fat-finger termination due to OS interruptions" Jul 22, 2021
@jahorton
Contributor Author

Title changed to be slightly more user-readable. I mean, it's still not perfect, but at least it's a degree less technical.

@mcdurdin
Member

I, uh... since there's no easy repro for the original error, it's not super-clear exactly how to test this. Guess it's mostly a "make sure @jahorton didn't break anything"?

Yeah, given I assume we will be porting this back to 14.0-stable, good idea just to do an acceptance test right? It is touching pretty core keystroke processing... I don't foresee any issues but then we never do, do we? 🤣

@jahorton
Contributor Author

I, uh... since there's no easy repro for the original error, it's not super-clear exactly how to test this. Guess it's mostly a "make sure @jahorton didn't break anything"?

Yeah, given I assume we will be porting this back to 14.0-stable, good idea just to do an acceptance test right? It is touching pretty core keystroke processing... I don't foresee any issues but then we never do, do we? 🤣

In that case... maaaybe we delay to do a round of it whenever we feel like pushing up a new version? Acceptance testing for web is quite costly.

Though, given the nature of this specific case... we should be able to drop the cost by running it against a single platform. There's nothing platform-specific here.

@mcdurdin
Member

In that case... maaaybe we delay to do a round of it whenever we feel like pushing up a new version? Acceptance testing for web is quite costly.

I think we want to push this out fairly soon given we are seeing many crash reports stemming from it.

Though, given the nature of this specific case... we should be able to drop the cost by running it against a single platform. There's nothing platform-specific here.

Agreed. Chrome on something?

@mcdurdin mcdurdin modified the milestones: A15S9, A15S10 Jul 26, 2021
@MakaraSok
Collaborator

MakaraSok commented Jul 26, 2021

User Testing

@keymanapp/testers

Platform: Chrome emulation / Android

How the test was done:

  1. load the PR and build
  2. on Chrome 92.0.4515.107 (x64), in Developer mode, Pixel 2 was selected for the test on ~/keymanweb/testing/unminified.html and ~/keymanweb/testing/prediction-mtnt.html as indicated below.

  • TEST_CRITICAL: FAILED - 17 issues were shown in the Console area.

image


Utilize the "Test unminified Keymanweb" testing page to ensure the following:

  • TEST_1: PASSED - able to type as usual
  • Attempt to add the following keyboards:
    • TEST_2: PASSED - sil_ipa was added successfully
      • TEST_3: PASSED - ŋ is indeed a subkey of n and it does output the character
    • TEST_4: PASSED - khmer_angkor keyboard loaded correctly
      • TEST_5: FAILED - Typed the sequence ស, ុ, ្រ (subkey of រ), ក; no suggestion was given. (This test was done using the Android Emulator from Android Studio 4.2.2 running on macOS Big Sur (Pixel 4, API 30, Android 11). The keyboard was installed from Keyman.com and the wordlist was installed manually from the package in this PR. Please let me know if this should be done with "Prediction - robust testing". There appears to be no obvious way to enable predictive text for khmer_angkor.)
        image .
      • TEST_6: PASSED - the backspace does work as expected.
    • TEST_7: FAILED - language name: spanish cannot be found from the globe key.
    image

Utilize the "Prediction - robust testing" testing page for the following:

  • Swap to the "English - EuroLatin (SIL)" keyboard.
    • TEST_8: PASSED - long-press . to output ' and then press e, the two characters don't combine (per expectation).
    • TEST_9: PASSED - long-press p and SHIFT + long-press g displays properly; their subkeys do produce the expected output
    • TEST_10: PASSED - type Lkv pressing near the center of the key each time, the suggestions are Love and Live, along with one other random suggestion

@jahorton
Contributor Author

jahorton commented Jul 26, 2021

@MakaraSok - Clarifications needed:

TEST_5: Your note there is not part of the test. That test PASSED. Those concerns may be valid, but that's for something completely separate and unrelated to this PR. (No part of the test said to attempt loading a lexical model with the keyboard.)

Secondly... that's a Keyman for Android screenshot. Not exactly Chrome emulation of an Android device (as in, outside of the Keyman app), but I'll let that slide - I'm not super-particular about the test environment.

TEST_7: Just to confirm - no Spanish keyboard popped up after attempting to add it using the third keyboard-adding option on the page, "Add a keyboard by language name(s)"? It won't be present by default, hence why it's in the "attempt to add" section of the tests.

@jahorton
Contributor Author

Finally, TEST_CRITICAL: that's the "Issues" area, not the "Console" area. I did make sure to point out exactly which area I meant in that screenshot above. Those "issues" are fine and do not fail this test.

As there's only a yellow entry and no red entries in the actual main console area, this test also PASSES.

@MakaraSok
Collaborator

Thanks a lot for the clarifications. TEST_5, TEST_7, and TEST_CRITICAL are all PASSED.

TEST_5: Your note there is not part of the test. That test PASSED. Those concerns may be valid, but that's for something completely separate and unrelated to this PR. (No part of the test said to attempt loading a lexical model with the keyboard.)

Secondly... that's a Keyman for Android screenshot. Not exactly Chrome emulation of an Android device (as in, outside of the Keyman app), but I'll let that slide - I'm not super-particular about the test environment.

I misread the instructions. My fault.
Understood. Gonna read more carefully next time.

TEST_7: Just to confirm - no Spanish keyboard popped up after attempting to add it using the third keyboard-adding option on the page, "Add a keyboard by language name(s)"? It won't be present by default, hence why it's in the "attempt to add" section of the tests.

I now see the intention of the instructions vividly. The line should have read: "try and add spanish to the 'Add a keyboard by language name(s)' field."
image

Finally, TEST_CRITICAL: that's the "Issues" area, not the "Console" area. I did make sure to point out exactly which area I meant in that screenshot above. Those "issues" are fine and do not fail this test.

As there's only a yellow entry and no red entries in the actual main console area, this test also PASSES.

The screenshot of the test result should have been focused on the tab:
image

The screenshot provided in the instructions was informative, but it shows only one issue, while there were a dozen during testing. That is why the details of what was seen were reported: they were thought to be "warnings of some kind" that may not have been expected.

@jahorton
Contributor Author

jahorton commented Jul 27, 2021

I now see the intention of the instructions vividly. The line should have read: "try and add spanish to the 'Add a keyboard by language name(s)' field."

Good to know; I'll make note of this for the Web acceptance-testing instructions, which is where those items were sourced from.

The screenshot provided in the instructions was informative, but it shows only one issue, while there were a dozen during testing. That is why the details of what was seen were reported: they were thought to be "warnings of some kind" that may not have been expected.

That's fair; I've noticed the items on the "issues" tab before and totally understand that. Unfortunately, they're unrelated, previously-noted, and not easy to resolve at this time. I actually tried pretty hard to have you ignore the "Issues" tab, but apparently didn't succeed.

@jahorton jahorton merged commit fc1251b into master Jul 27, 2021
@jahorton jahorton deleted the fix/common/core/web/empty-alternates branch July 27, 2021 01:42
@keyman-server
Collaborator

Changes in this pull request will be available for download in Keyman version 15.0.89-alpha

Successfully merging this pull request may close these issues.

bug(common/models): Error in ModelCompositor.predict causes keyboard failure