Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed fix for potential ReDoS vulnerability in the sed lexer #2120

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yoshi389111
Copy link

@yoshi389111 yoshi389111 commented Mar 22, 2025

While investigating the performance issue raised in Issue #2057 regarding sed syntax highlighting, I discovered what appears to be a ReDoS-like vulnerability.

The issue seems to result from a combination of factors, and I’ve identified the following two potential problems:

  • The parsing of s and y commands appears to have a ReDoS-like vulnerability.
  • For commands with labels (such as : or t), Rouge currently requires a space between the command and the label, which is not necessary in sed.

Issue with parsing s and y commands

When s or y commands are invalid, and multiple backslashes follow them, the number of backtracking operations in the regular expression increases exponentially. I believe the cause lies in the edot regular expression defined in lib/rouge/lexers/sed.rb.

edot = /\\.|./m
state :command do
mixin :whitespace
# subst and transliteration
rule %r/(s)(.)(#{edot}*?)(\2)(#{edot}*?)(\2)/m do |m|

rule %r/(y)(.)(#{edot}*?)(\2)(#{edot}*?)(\2)/m do |m|

The current edot is defined as /\\.|./m, where the \ character matches both sides of the or operator, causing backtracking to increase at an exponential rate when there are many backslashes (note that using the non-greedy *? does not reduce the number of backtracking operations).

If the number of backslashes is in the low 20s, it can be processed relatively quickly, but if there are more than 30, it seems to take a considerable amount of time.

A simple fix would be to change the definition of edot to /\\.|[^\\]/m. I tested this change, and it resulted in a significant reduction in processing time. Additionally, all other test cases passed successfully.

Before the fix:

$ time (echo "s/$(printf '\\a%.0s' $(seq 1 30))" | bin/rougify highlight -l sed)
s/\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a

real    1m27.914s
user    1m27.878s
sys     0m0.069s

After the fix:

$ time (echo "s/$(printf '\\a%.0s' $(seq 1 30))" | bin/rougify highlight -l sed)
s/\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a

real    0m0.422s
user    0m0.333s
sys     0m0.054s
$ time (echo "s/$(printf '\\a%.0s' $(seq 1 300))" | bin/rougify highlight -l sed)
s/\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a

real    0m0.440s
user    0m0.380s
sys     0m0.055s

Issue with spaces after commands with labels

In sed, spaces after commands with labels like : or t are optional. However, in Rouge, a space is currently required. As a result, in the reported issue, the : command is not correctly recognized, and the label that follows it is mistakenly treated as part of the command. This causes the lexer to incorrectly parse it as an incomplete s command, leading to excessive processing time.

# commands that take a label argument
rule %r/([:btT])(\s+)(\S+)/ do

This pull request does not address the optional space after label commands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some sed code take very long time to be highlighted
1 participant