[FR][Packaging] Reference implementation in Python for the glob pattern expansion specified as attachment to PEP 639 #4299

Open
abravalheri opened this issue Mar 7, 2025 · 1 comment · Fixed by pypa/packaging.python.org#1826

Comments

@abravalheri

Reference: https://discuss.python.org/t/pep-639-round-3-improving-license-clarity-with-better-package-metadata/53020/174

Could we please have a reference implementation in Python documented somewhere for the glob part, one that complies with the mandatory requirements of the PEP? (Maybe as an attachment, or something in the PyPA docs?)

The original intention, "let's document whatever the stdlib's glob does, so that we can implement it in other languages", was generally agreed on in the Discourse thread. However, the final text departs significantly from that intention and requires many more validations, which Python's stdlib does not implement itself.

Setuptools received something similar to the following in a contribution, "Validate license-files glob patterns" by cdce8p (pull request #4841 in pypa/setuptools; thanks @cdce8p):

import os
import re
from glob import glob


def find_pattern(pattern: str) -> list[str]:
    """
    >>> find_pattern("/LICENSE.MIT")
    Traceback (most recent call last):
    ...
    ValueError: Pattern '/LICENSE.MIT' should be relative...
    >>> find_pattern("../LICENSE.MIT")
    Traceback (most recent call last):
    ...
    ValueError: Pattern '../LICENSE.MIT' cannot contain '..'...
    >>> find_pattern("LICEN{CSE*")
    Traceback (most recent call last):
    ...
    ValueError: Pattern 'LICEN{CSE*' contains invalid characters...
    """
    if ".." in pattern:
        raise ValueError(f"Pattern {pattern!r} cannot contain '..'")
    if pattern.startswith((os.sep, "/")) or ":\\" in pattern:
        raise ValueError(
            f"Pattern {pattern!r} should be relative and must not start with '/'"
        )
    if re.match(r'^[\w\-\.\/\*\?\[\]]+$', pattern) is None:
        raise ValueError(
            f"Pattern '{pattern}' contains invalid characters. "
            "https://packaging.python.org/en/latest/specifications/pyproject-toml/#license-files"
        )
    found = glob(pattern, recursive=True)
    if not found:
        raise ValueError(f"Pattern '{pattern}' did not match any files.")
    return found

Is it enough/complete/correct? (At first glance I would say yes, judging by the text of the PEP, but I would like a second opinion.)
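One way to probe that question is to separate the pure validation from the filesystem lookup, so the rules can be exercised without any files on disk. The following is only a sketch under stated assumptions: the name `validate_license_glob` is hypothetical, and the segment-wise `..` check is a deliberate variation on the substring check above (it would accept a name such as `LICENSE..txt`). Whether that reading matches the PEP is exactly the kind of thing a second opinion should confirm.

```python
import re

# Same character set as the snippet above: word characters, "-", ".", "/",
# and the glob metacharacters "*", "?", "[" and "]".
_ALLOWED = re.compile(r"[\w\-./*?\[\]]+")


def validate_license_glob(pattern: str) -> None:
    """Raise ValueError if `pattern` breaks one of the rules sketched here.

    Hypothetical helper: it only validates the pattern text and never
    touches the filesystem.
    """
    if pattern.startswith("/"):
        raise ValueError(f"Pattern {pattern!r} should be relative")
    # Assumption: only a whole path segment equal to ".." counts as a
    # parent directory indicator.
    if ".." in pattern.split("/"):
        raise ValueError(f"Pattern {pattern!r} cannot contain '..' segments")
    # Backslashes, colons, braces, spaces, etc. all fail this check.
    if _ALLOWED.fullmatch(pattern) is None:
        raise ValueError(f"Pattern {pattern!r} contains invalid characters")


if __name__ == "__main__":
    for candidate in ("LICENSE*", "licenses/**/*.txt", "../LICENSE", "LICEN{CSE*"):
        try:
            validate_license_glob(candidate)
            print(f"{candidate!r}: accepted")
        except ValueError as exc:
            print(f"{candidate!r}: rejected ({exc})")
```

Keeping validation free of I/O also makes it straightforward to port the same checks to other languages, which was the motivation for documenting the glob rules in the first place.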

/cc @befeleme

@befeleme
Contributor

Hi, I'm sorry, I overlooked the original post. The implementation looks correct when compared with the specification in the PEP.

I think the best way to include this in the documentation would be to create a new PyPA specification page, similar to https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization and link to it from the pyproject.toml (and core metadata?) specification, like here: https://packaging.python.org/en/latest/specifications/pyproject-toml/#name
