Proposed fix for potential ReDoS vulnerability in the sed lexer #2120
While investigating the performance issue raised in Issue #2057 regarding sed syntax highlighting, I discovered what appears to be a ReDoS-like vulnerability.
The issue seems to result from a combination of factors, and I’ve identified the following two potential problems:
1. Parsing of `s` and `y` commands appears to have a ReDoS-like vulnerability.
2. For commands that take a label (`:` or `t`), Rouge currently requires a space between the command and the label, which sed does not require.
Issue with parsing `s` and `y` commands
When `s` or `y` commands are invalid and many backslashes follow them, the number of backtracking operations in the regular expression grows exponentially. I believe the cause lies in the `edot` regular expression defined in `lib/rouge/lexers/sed.rb`.
Relevant code: `lib/rouge/lexers/sed.rb`, lines 73 to 79 and line 91 (commit bf007b7).
The current `edot` is defined as `/\\.|./m`. A `\` character can be matched by either side of the `|` operator (as the start of `\\.` or by `.`), so when there are many backslashes, the number of ways to split them into `edot` tokens, and with it the amount of backtracking, grows exponentially. Using the non-greedy `*?` does not reduce the number of backtracking operations. With a run of backslashes in the low 20s the input is still processed relatively quickly, but with more than 30 it seems to take a considerable amount of time.
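To make the growth concrete, the sketch below is a rough model, not Rouge code: it counts how many ways a run of backslashes can be split into consecutive `edot` tokens, once with the alternatives of the current `/\\.|./m` and once with `/\\.|[^\\]/m`, where `[^\\]` cannot match a backslash. When the surrounding pattern ultimately fails, a backtracking engine may try each of these splits, so the count is only a proxy for the real work. The constant names and `count_splits` are hypothetical.

```ruby
# Rough model (an assumption, not code from sed.rb): count the distinct ways a
# string can be split into consecutive edot tokens. A backtracking engine whose
# overall match fails may end up trying every one of these splits.

OLD_EDOT_ALTS = [/\A\\./m, /\A./m].freeze     # the two alternatives of /\\.|./m
NEW_EDOT_ALTS = [/\A\\./m, /\A[^\\]/m].freeze # the two alternatives of /\\.|[^\\]/m

# Dynamic programming over positions: ways[i] is the number of token splits
# of the first i characters of str.
def count_splits(str, alternatives)
  ways = Array.new(str.length + 1, 0)
  ways[0] = 1
  (0...str.length).each do |i|
    next if ways[i].zero?

    rest = str[i..]
    alternatives.each do |re|
      m = re.match(rest)
      # Each alternative that matches here extends every split ending at i.
      ways[i + m[0].length] += ways[i] if m
    end
  end
  ways[str.length]
end

[10, 20, 30].each do |n|
  backslashes = "\\" * n # n literal backslash characters
  puts "#{n} backslashes: old=#{count_splits(backslashes, OLD_EDOT_ALTS)} " \
       "new=#{count_splits(backslashes, NEW_EDOT_ALTS)}"
end
```

For 10, 20 and 30 backslashes the old alternatives allow 89, 10946 and 1346269 splits (Fibonacci growth, roughly 1.6 times more per extra backslash), while the new ones allow exactly one, which is consistent with the slowdown described above.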
A simple fix would be to change the definition of `edot` to `/\\.|[^\\]/m`, so that a backslash can only be matched by the first alternative. I tested this change: it significantly reduced processing time, and all other test cases passed.
Before the fix: (timing screenshot not reproduced)
After the fix: (timing screenshot not reproduced)
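The sketch below compares the two `edot` definitions inside a simplified, hypothetical `s` rule (command, delimiter, pattern, delimiter, replacement, delimiter). It is not the rule from `sed.rb`; it only assumes that `edot` is interpolated into lazily repeated groups, as the `*?` remark above suggests. It checks that both versions capture a valid command containing an escaped delimiter identically, then times both on an invalid command followed by backslashes. Timings depend on the machine and Ruby version, so no particular numbers are implied.

```ruby
# The two edot definitions discussed in this PR.
OLD_EDOT = /\\.|./m
NEW_EDOT = /\\.|[^\\]/m

# Hypothetical, simplified `s` rule for illustration only (NOT copied from
# sed.rb): s, delimiter, pattern, same delimiter, replacement, same delimiter.
def s_rule(edot)
  /(s)(.)(#{edot}*?)(\2)(#{edot}*?)(\2)/m
end

OLD_S = s_rule(OLD_EDOT)
NEW_S = s_rule(NEW_EDOT)

# A valid command whose pattern contains an escaped delimiter (\/).
valid = 's/a\/b/c/'
p OLD_S.match(valid).captures
p NEW_S.match(valid).captures

# An invalid command followed by a run of backslashes: neither version matches,
# but the old edot has many more token splits to explore before giving up.
[10, 16, 22].each do |n|
  input = 's/' + '\\' * n # '\\' is a single backslash in single quotes
  [["old", OLD_S], ["new", NEW_S]].each do |label, re|
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = re.match(input)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    puts format("%s n=%d matched=%s %.6fs", label, n, !result.nil?, elapsed)
  end
end
```

Keeping `n` small here keeps the old pattern fast enough to run; the point is that its work grows with each extra backslash while the new pattern's does not.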
Issue with spaces after commands with labels
In sed, the space after commands that take a label, such as `:` or `t`, is optional. However, Rouge currently requires one. As a result, in the reported issue the `:` command is not recognized correctly, and the label that follows it is mistakenly treated as part of the command. The lexer then parses it as an incomplete `s` command, which leads to the excessive processing time.
Relevant code: `lib/rouge/lexers/sed.rb`, lines 124 to 125 (commit bf007b7).
This pull request does not address the optional space after label commands.
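The difference can be illustrated with two hypothetical patterns for label commands (`:`, `b`, `t`): one that requires whitespace between the command and the label, and one that makes it optional. They are not the rules on lines 124 to 125 of `sed.rb`, and the label is simplified to a run of characters other than whitespace and `;`. As stated above, this PR does not change this behavior; the sketch only shows why `:loop` or `tloop` is not recognized when a space is required.

```ruby
# Hypothetical patterns for label commands (`:`, `b`, `t`), for illustration
# only; these are NOT the rules in lib/rouge/lexers/sed.rb. The label is
# simplified to a run of characters other than whitespace and ";".
SPACE_REQUIRED = /\A([:bt])(\s+)([^;\s]+)/
SPACE_OPTIONAL = /\A([:bt])(\s*)([^;\s]+)/

# Returns the captured label, or nil when the pattern does not match.
def label_of(pattern, command)
  m = pattern.match(command)
  m && m[3]
end

[":loop", ": loop", "tloop", "t loop"].each do |cmd|
  puts "#{cmd.inspect}: required=#{label_of(SPACE_REQUIRED, cmd).inspect} " \
       "optional=#{label_of(SPACE_OPTIONAL, cmd).inspect}"
end
```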