feat(heuristics): add Fake Email analyzer to validate maintainer email domain #1106
base: main
c6f35f7 to 8d29103
f945882 to a7103e4
@@ -56,6 +56,11 @@ When a heuristic fails, with `HeuristicResult.FAIL`, then that is an indicator b
- **Description**: Checks if the package name is suspiciously similar to any package name in a predefined list of popular packages. The similarity check incorporates the Jaro-Winkler distance and considers keyboard layout proximity to identify potential typosquatting.
- **Rule**: Return `HeuristicResult.FAIL` if the similarity ratio between the package name and any popular package name meets or exceeds a defined threshold; otherwise, return `HeuristicResult.PASS`.
- **Dependency**: None.

11. **Fake Email**
- **Description**: Checks if the package maintainer or author has a suspicious or invalid email .
Suggested change:
- - **Description**: Checks if the package maintainer or author has a suspicious or invalid email .
+ - **Description**: Checks if the package maintainer or author has a suspicious or invalid email.
- **Rule**: Return `HeuristicResult.FAIL` if the email format is invalid or the email domain has no MX records ; otherwise, return `HeuristicResult.PASS`.
Suggested change:
- - **Rule**: Return `HeuristicResult.FAIL` if the email format is invalid or the email domain has no MX records ; otherwise, return `HeuristicResult.PASS`.
+ - **Rule**: Return `HeuristicResult.FAIL` if the email format is invalid or the email domain has no MX records; otherwise, return `HeuristicResult.PASS`.
depends_on=None,
)

def is_valid_email(self, email: str) -> bool:
Validating email addresses is a complex task. What we have here is more like a sanity check, verifying that the address is vaguely of the right format. That may be enough for the purpose of this check, in which case this method should be renamed and re-documented to make that clear. Alternatively, if we do really want to ensure that email addresses are valid, this method will need to be expanded considerably. @behnazh-w
See the top two answers on this Stack Overflow thread for more information: https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address
As part of the above, I think the regex checking and DNS resolution steps should be split into separate functions. This could also simplify the related tests.
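One possible shape for that split is sketched below. It is not the code from this PR: the names `extract_domain`, `has_mx_records`, and `looks_deliverable` are hypothetical, the pattern is intentionally a loose sanity check rather than full address validation, and the MX lookup assumes `dnspython` is installed. Passing the lookup in as a parameter lets tests replace it with a stub, so they need no network access.

```python
import re
from typing import Callable, Optional

# Deliberately loose: exactly one "@", no whitespace, and a dot in the domain.
# This is a sanity check, not RFC-level validation.
_LOOSE_EMAIL_RE = re.compile(r"^[^@\s]+@([^@\s]+\.[^@\s]+)$")


def extract_domain(email: str) -> Optional[str]:
    """Return the domain if the address is vaguely email-shaped, else None."""
    match = _LOOSE_EMAIL_RE.match(email.strip())
    return match.group(1) if match else None


def has_mx_records(domain: str) -> bool:
    """Return True if an MX lookup for the domain succeeds (needs dnspython)."""
    # Imported lazily so the format check works without dnspython installed.
    import dns.exception
    import dns.resolver

    try:
        # resolve() raises NoAnswer or NXDOMAIN when there are no MX records.
        dns.resolver.resolve(domain, "MX")
    except dns.exception.DNSException:
        # Assumption: timeouts and other DNS errors also count as "no MX".
        return False
    return True


def looks_deliverable(
    email: str, mx_lookup: Callable[[str], bool] = has_mx_records
) -> bool:
    """Combine the format check with an injectable MX lookup."""
    domain = extract_domain(email)
    return domain is not None and mx_lookup(domain)
```

In a test, `looks_deliverable("alice@example.com", mx_lookup=lambda d: True)` exercises only the format step, while the DNS step can be covered separately.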
Another alternative is to use the Python library `email-validator` to more formally validate emails. It also uses `dnspython`, but handles some of the more complicated validation aspects.
https://pypi.org/project/email-validator/
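A minimal sketch of what that could look like, assuming `email-validator` is installed. The wrapper name `is_plausible_email` is hypothetical and this is not the code that was merged; it only relies on the library's `validate_email` function and `EmailNotValidError` exception.

```python
def is_plausible_email(email: str, check_deliverability: bool = True) -> bool:
    """Return True if email-validator accepts the address, else False."""
    # Imported lazily so that importing this module does not fail where the
    # optional dependency is missing.
    from email_validator import EmailNotValidError, validate_email

    try:
        # With check_deliverability=True the library also performs DNS checks
        # on the domain; pass False to check syntax only.
        validate_email(email, check_deliverability=check_deliverability)
    except EmailNotValidError:
        return False
    return True
```

Calling it with `check_deliverability=False` keeps unit tests offline, while the default also performs the DNS check on the domain.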
I used the `email-validator` approach.
…l domains Signed-off-by: Amine <[email protected]>
Signed-off-by: Amine <[email protected]>
Signed-off-by: Amine <[email protected]>
…mail domain validation Signed-off-by: Amine <[email protected]>
759ab97 to d99495c
Summary
This PR adds a new heuristic analyzer called `FakeEmailAnalyzer`. It verifies the validity of maintainer email addresses listed in a PyPI package by checking both the format and the existence of MX records for their domains. This helps detect packages with fake or throwaway emails, which are often indicative of malicious intent.

Description of changes
`FakeEmailAnalyzer` that:
`detect_malicious_metadata_check.py` to include and invoke this new analyzer.

Related issues
None

Checklist
`verified` label should appear next to all of your commits on GitHub.