-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Very poor performance in some cases #5
Comments
So, I think part of the performance hit was actually (possibly?) related to: #6. I'm not sure what I did, but after some refactoring, I'm not getting such a massive performance drain when building errors? I still feel like somehow the number of alternations should be capped somehow 🤔 |
@msujew Okay, I've narrowed this down quite a bit, and I've discovered it's not just a performance issue upon errors, but in some particular parsing paths cause hugely degraded performance in this package. (I'll happily concede that it may actually be a mistake in the grammar, but the effect of that is that no errors / validation problems are thrown, and this occurs when a file is parsed correctly.) I've frozen a branch here. If you check it out and do a I've narrowed down to a single test-case. It's a Any idea what's going on here? |
I'm going to try to take a look tomorrow to see if I can set some breakpoints and do some logging and figure out what exactly is happening in that function that is taking 12 seconds. |
@matthew-dean I cannot reproduce the performance degradation you're experiencing (on a few years old Intel). I wrote a small test for the file in question: describe.only('can parse colors.less file quickly', () => {
console.time('colors.less');
const result = fs.readFileSync(__dirname + '/css/colors.less', 'utf8')
const { cst, lexerResult, parser } = cssParser.parse(result)
expect(lexerResult.errors.length).toBe(0)
expect(parser.errors.length).toBe(0)
console.timeEnd('colors.less');
}) And received:
Seems pretty slow, but that's mostly due to reading from disk and some warmup in the lookahead. The parser can probably be optimized quite a bit. Reading the file a second time right after the first call yields:
Note that I had to change the comment terminal for a successful parse to also tokenize -['comment', '\\/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*\\/'],
+['comment', '(\\/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*\\/|\\/\\/[^\\n\\r]*)'], |
@msujew That's the wrong parser. The CSS Parser doesn't demonstrate that slow-down. It was the Less Parser. That's why comments were not parsing in Here's what I've discovered so far. This package really grinds to a halt with recursion, because it will predict paths per alt that encompass the remainder of the file. Originally, I had this, which matches the CSS spec exactly: $.RULE('declarationList', () => {
$.OR({
DEF: [
{
ALT: () => {
$.OPTION(() => $.SUBRULE($.declaration))
$.OPTION2(() => {
$.CONSUME(T.Semi)
$.SUBRULE3($.declarationList)
})
}
},
{
ALT: () => {
$.SUBRULE($.innerAtRule)
$.SUBRULE($.declarationList)
}
},
{
ALT: () => {
$.SUBRULE($.qualifiedRule, { ARGS: [{ inner: true }] })
$.SUBRULE2($.declarationList)
}
}
]
})
}) This mirrors the railroad diagrams for a declaration list. This didn't cause a problem until extending into the Less Parser. Less / Sass allow identifiers for inner qualified rules, which means long-chain ambiguous rule starts ( The combination of those two things is what caused I intuited what this package was doing when I had logging turned on and the "ambiguous alternatives" warning was not only long, but contained all tokens for the remainder of the file, which it termed as prefixes. I re-tooled the above declarationList to look like this, which has the exact same parsing outcome, but eliminates recursion: $.RULE('declarationList', () => {
let needsSemi = false
$.MANY({
GATE: () => !needsSemi || (needsSemi && $.LA(1).tokenType === T.Semi),
DEF: () => {
$.OR([
{
ALT: () => {
$.SUBRULE($.declaration)
needsSemi = true
}
},
{
ALT: () => {
$.OR2([
{ ALT: () => $.SUBRULE($.innerAtRule) },
{ ALT: () => $.SUBRULE2($.qualifiedRule, { ARGS: [{ inner: true }] }) },
{ ALT: () => $.CONSUME(T.Semi) }
])
needsSemi = false
}
}
])
}
})
}) I guess I sort of mistakenly figured that this package would handle ambiguity, because it handles longer-path prediction, but anytime ambiguity happens, The good news is that with that refactoring, speed isn't as much as an issue. Parsing of I appreciate your help! |
I see, good to know 👍 Yeah, as mentioned in the other issue, having ambiguities present (especially those with long prefixes) in your grammar can be quite the performance bottleneck. While chevrotain-allstar fixes some of the ambiguity issues appearing in the LL(k) strategy, there are some true ambiguities that just cannot be resolved until they are fully evaluated. I assume with the recursion case, you actually hit the theoretical worse case boundary of the ALL(*) algorithm. Normally, ALL(*) is supposed to operate in O(n), but the worst case makes it explode into O(n^4). It's outlined in the original paper. |
This is really good to know! Thank you! |
One thing I've noticed when using this lookahead strategy is that while the successful parses are very fast, the failures are orders of magnitude slower than Chevrotain's default. I thought maybe it was during error logging, but I built a custom error message provider, and that had no effect.
In some of my debugging, I think this is because the alternations that are built can contain hundreds of entries by the time it's sent to
buildNoViableAltMessage
, whereas Chevrotain's default lookahead strategy might produce 10 for the sameOR
block.There's obviously no benefit for an error message handler to receive hundreds or thousands of possible paths as there's no error message that could include all of them that would be useful in that circumstance, so could that be scaled back to a max limit of paths? (Maybe configurable?)
Addendum
So, I've found another side effect of this problem which is backtracking. I know Chevrotain recommends against backtracking if at all possible, but because I had a custom error message provider, I kept seeing
buildNoViableAltMessage
called but then no error thrown. I figured out that error messages are built even if they aren't used. So, backtracking, which is already warned may have poor performance, is exacerbated greatly when using backtracking + chevrotain-allstar. It takes Chevrotain a long time to recover from a backtracking rule because the thread is locked by this package building expected paths.The text was updated successfully, but these errors were encountered: