feat(heuristics): add Fake Email analyzer to validate maintainer email domain #1106

Open: wants to merge 4 commits into main

AmineRaouane (Member) commented Jun 16, 2025

Summary

This PR adds a new heuristic analyzer called FakeEmailAnalyzer. It verifies the validity of maintainer email addresses listed in a PyPI package by checking both the format and the existence of MX records for their domains. This helps detect packages with fake or throwaway emails, which are often indicative of malicious intent.

Description of changes

  • Implemented FakeEmailAnalyzer that:
    • Validates email format using a regex.
    • Verifies the existence of MX records for the email domain via DNS resolution.
  • Updated detect_malicious_metadata_check.py to include and invoke this new analyzer.
  • The analyzer handles DNS errors and skips analysis if no email is present.
  • The rule combines quickUndetailed with failed(Heuristics.FAKE_EMAIL.value) because a package that is rushed onto a platform by someone using a fake email address suggests an actor who is trying to distribute the package quickly while obscuring their identity and avoiding investigation.
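
As a rough illustration of the behaviour listed above (not the code in this PR), the decision flow could be sketched as follows. The `HeuristicResult` enum here is a local stand-in, the `SKIP` member, the `analyze_email` name, and the `has_mx` callback are assumptions made for the example, and the regex is only a loose shape check.

```python
import re
from enum import Enum
from typing import Callable, Optional


class HeuristicResult(Enum):
    """Local stand-in for the project's result type (assumed shape)."""

    PASS = "PASS"
    FAIL = "FAIL"
    SKIP = "SKIP"


# Loose shape check only: one "@", no whitespace, and a dot in the domain part.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@([^@\s]+\.[^@\s]+)$")


def analyze_email(email: Optional[str], has_mx: Callable[[str], bool]) -> HeuristicResult:
    """Hypothetical helper: skip without an email, fail on bad shape or missing MX."""
    if not email:
        # No email present, so there is nothing to analyze.
        return HeuristicResult.SKIP
    match = EMAIL_SHAPE.match(email.strip())
    if match is None:
        return HeuristicResult.FAIL
    domain = match.group(1).lower()
    # has_mx is injected so the DNS lookup can be swapped out or stubbed.
    return HeuristicResult.PASS if has_mx(domain) else HeuristicResult.FAIL
```

Passing the MX lookup in as a callable keeps the sketch free of network access; a real lookup would go through DNS resolution as described above.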

Related issues

None

Checklist

  • I have reviewed the contribution guide.
  • My PR title and commits follow the Conventional Commits convention.
  • My commits include the "Signed-off-by" line.
  • I have signed my commits following the instructions provided by GitHub. Note that we run GitHub's commit verification tool to check the commit signatures. A green verified label should appear next to all of your commits on GitHub.
  • I have updated the relevant documentation, if applicable.
  • I have tested my changes and verified they work as expected.

oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Jun 16, 2025
AmineRaouane force-pushed the fake-emails-heuristic branch from c6f35f7 to 8d29103, June 16, 2025 21:26
AmineRaouane force-pushed the fake-emails-heuristic branch 5 times, most recently from f945882 to a7103e4, July 5, 2025 18:59
@@ -56,6 +56,11 @@ When a heuristic fails, with `HeuristicResult.FAIL`, then that is an indicator b
- **Description**: Checks if the package name is suspiciously similar to any package name in a predefined list of popular packages. The similarity check incorporates the Jaro-Winkler distance and considers keyboard layout proximity to identify potential typosquatting.
- **Rule**: Return `HeuristicResult.FAIL` if the similarity ratio between the package name and any popular package name meets or exceeds a defined threshold; otherwise, return `HeuristicResult.PASS`.
- **Dependency**: None.

11. **Fake Email**
- **Description**: Checks if the package maintainer or author has a suspicious or invalid email .
Member

Suggested change
- **Description**: Checks if the package maintainer or author has a suspicious or invalid email .
- **Description**: Checks if the package maintainer or author has a suspicious or invalid email.


11. **Fake Email**
- **Description**: Checks if the package maintainer or author has a suspicious or invalid email .
- **Rule**: Return `HeuristicResult.FAIL` if the email format is invalid or the email domain has no MX records ; otherwise, return `HeuristicResult.PASS`.
Member

Suggested change
- **Rule**: Return `HeuristicResult.FAIL` if the email format is invalid or the email domain has no MX records ; otherwise, return `HeuristicResult.PASS`.
- **Rule**: Return `HeuristicResult.FAIL` if the email format is invalid or the email domain has no MX records; otherwise, return `HeuristicResult.PASS`.

depends_on=None,
)

def is_valid_email(self, email: str) -> bool:
Member

Validating email addresses is a complex task. What we have here is more like a sanity check, verifying that the address is vaguely of the right format. That may be enough for the purpose of this check, in which case this method should be renamed and re-documented to make that clear. Alternatively, if we do really want to ensure that email addresses are valid, this method will need to be expanded considerably. @behnazh-w
See the top two answers on this Stack Overflow thread for more information: https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address

Member

As part of the above, I think the regex check and the DNS resolution step should be split into separate functions. This could also simplify the related tests.

Member

benmss, Jul 10, 2025

Another alternative is to use the Python library email-validator to more formally validate emails. It also uses dnspython, but handles some of the more complicated validation aspects.
https://pypi.org/project/email-validator/
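
For reference, a minimal sketch of what calling email-validator could look like. The `check_email` wrapper is a hypothetical name, and `.normalized` is the attribute used by email-validator 2.x (older releases expose `.email` instead). The wrapper defaults `check_deliverability` to `False` so it runs without network access; the library's own default performs a DNS check.

```python
from typing import Optional


def check_email(email: str, check_deliverability: bool = False) -> Optional[str]:
    """Return the normalized address, or None if email-validator rejects it."""
    # Imported lazily so this module still loads when the optional
    # third-party dependency is not installed.
    from email_validator import EmailNotValidError, validate_email

    try:
        result = validate_email(email, check_deliverability=check_deliverability)
    except EmailNotValidError:
        # Covers both syntax errors and, when enabled, failed deliverability checks.
        return None
    return result.normalized
```

With `check_deliverability=True` the library resolves the domain itself, so the heuristic would not need a separate MX lookup.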

Member Author

I used the email-validator approach.

AmineRaouane force-pushed the fake-emails-heuristic branch from 759ab97 to d99495c, July 12, 2025 15:55