-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixed uri regex issue #3815
base: main
Are you sure you want to change the base?
fixed uri regex issue #3815
Conversation
pkg/detectors/uri/uri.go
Outdated
@@ -23,7 +23,7 @@ var _ detectors.Detector = (*Scanner)(nil) | |||
var _ detectors.CustomFalsePositiveChecker = (*Scanner)(nil) | |||
|
|||
var ( | |||
keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`) | |||
keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\w-\.]{3,50}:([\w-\.]{3,50})@[-.%\w\/:]+\b`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\S
matches any non-whitespace character, which is very broad. Instead, we are now using \w
, which matches [A-Za-z0-9_]
, and extending it by adding a few special characters to suit our needs.
pkg/detectors/uri/uri.go
Outdated
@@ -23,7 +23,7 @@ var _ detectors.Detector = (*Scanner)(nil) | |||
var _ detectors.CustomFalsePositiveChecker = (*Scanner)(nil) | |||
|
|||
var ( | |||
keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`) | |||
keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\w-\.]{3,50}:([\w-\.]{3,50})@[-.%\w\/:]+\b`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, passwords in URIs can consist of any non-whitespace characters. Using \w limits the match to a specific set of characters, which might exclude valid ones. Are you noticing any detection issues when using \S instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. http://username:p%[email protected]" Will not be detected when using \w
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, actually as the scenario mentioned in the linked issue, it matches some special chars like "
which it should not I guess. I can add more special characters like %$^...
in the set to expand the functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add some more possible special characters for password
441760e
to
546eb5c
Compare
pkg/detectors/uri/uri.go
Outdated
@@ -23,7 +23,7 @@ var _ detectors.Detector = (*Scanner)(nil) | |||
var _ detectors.CustomFalsePositiveChecker = (*Scanner)(nil) | |||
|
|||
var ( | |||
keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`) | |||
keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\w-\.]{3,50}:([\w-\.%$^&#@]{3,50})@[-.%\w\/:]+\b`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good source of truth is RFC 3986. An unencoded @
isn't a valid password character because it's the delimiter between userinfo
and host
; likewise, :
isn't a valid username character.
The right-hand side of @
is also problematic. e.g., %
and /
aren't valid characters for a host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I improved the regex and removed the invalid chars as you mentioned.
pkg/detectors/uri/uri_test.go
Outdated
{ | ||
name: "valid pattern - capture two outputs", | ||
input: fmt.Sprintf("%s token = '%s'", keyword, validPattern2), | ||
want: []string{"http://username:[email protected]", "http://username:[email protected]"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While not related to #3815, the results should be deduplicated.
want: []string{"http://username:[email protected]", "http://username:[email protected]"}, | |
want: []string{"http://username:[email protected]"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -23,7 +23,7 @@ var _ detectors.Detector = (*Scanner)(nil) | |||
var _ detectors.CustomFalsePositiveChecker = (*Scanner)(nil) | |||
|
|||
var ( | |||
keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`) | |||
keyPat = regexp.MustCompile(`\b(?:https?:\/\/)?[\w-\.$~!]{3,50}:([\w-\.%$^&#]{3,50})@[-.\w]+\b`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing a large number of valid characters for usernames and passwords. The host pattern is also still fairly permissive and would match things that could never be valid, e.g. @----__-2as-2
.
Description:
This PR fixes github issue #3686
Checklist:
make test-community
)?make lint
this requires golangci-lint)?