-
-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index pip-installed packages #109
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,110 @@ | ||
import importlib.metadata | ||
import json | ||
import sys | ||
from copy import deepcopy | ||
from typing import Any | ||
from importlib.metadata import Distribution | ||
from typing import TypedDict | ||
|
||
from packaging.utils import canonicalize_name | ||
from packaging.version import Version | ||
|
||
from .._compat import REPODATA_INFO, REPODATA_PACKAGES | ||
from .._utils import fix_package_dependencies | ||
from ..package_index import query_package | ||
|
||
IN_VENV = sys.prefix != sys.base_prefix | ||
|
||
|
||
class PkgEntry(TypedDict): | ||
name: str | ||
version: str | ||
file_name: str | ||
install_dir: str | ||
sha256: str | None | ||
imports: list[str] | ||
depends: list[str] | ||
|
||
|
||
def get_pkg_entry_micropip(dist: Distribution, url: str) -> PkgEntry: | ||
name = dist.name | ||
version = dist.version | ||
sha256 = dist.read_text("PYODIDE_SHA256") | ||
assert sha256 | ||
imports = (dist.read_text("top_level.txt") or "").split() | ||
requires = dist.read_text("PYODIDE_REQUIRES") | ||
if not requires: | ||
fix_package_dependencies(name) | ||
requires = dist.read_text("PYODIDE_REQUIRES") | ||
if requires: | ||
depends = json.loads(requires) | ||
else: | ||
depends = [] | ||
|
||
return dict( | ||
name=name, | ||
version=version, | ||
file_name=url, | ||
install_dir="site", | ||
sha256=sha256, | ||
imports=imports, | ||
depends=depends, | ||
) | ||
|
||
|
||
async def get_pkg_entry_pip(dist: Distribution) -> PkgEntry | None: | ||
resp = await query_package(dist.name) | ||
ver = resp.releases.get(Version(dist.version), None) | ||
if ver is None: | ||
return None | ||
wheel = next(ver) | ||
await wheel.download({}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see this method doesn't yet exist. This call could be very expensive and would need to manage the lifecycle of the downloaded content. It may also need to re-implement things that pip does like caching or re-using pip's cache. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I said, this PR is intended as a worst possible implementation to get the discussion moving forward. I think the situation is that we only need to download the wheel if the index does not implement PEP 658. Since PyPI finally does implement PEP 658 we should do that if possible. As far as I know
Yes that's a great idea. Does pip expose any of its caching logic? If not is it reasonably stable? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Pip does expose a
It does appear as if that "wheel cache" is only for locally-built wheels. I haven't found a way to query for pip's http cache. Here's what pip gives for
It's possible the |
||
requires = [req.name for req in wheel.requires(set())] | ||
return dict( | ||
name=dist.name, | ||
version=dist.version, | ||
file_name=wheel.url, | ||
install_dir="site", | ||
sha256=wheel.sha256, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So IIUC, we are fetching the installed package again to get the URL and the sha256? I am not sure if is it a reliable behavior. What happens if a user installed packages from non-PyPI repositories or from URL? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It works if they installed from url (at least in principle it can be made to work) because we can get the url from @pradyunsg @henryiii is there something I'm missing? If someone has a pip.conf with an extra-index-url or passes this as a cli instruction, is it possible to reconstruct which packages in the venv came from which indices? If not maybe we could make a packaging PEP adding INDEX to package metadata. |
||
imports=[], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a bit unfortunate that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Several good examples in that thread. Also, namespace packages aren't reflected accurately either:
(jaraco.functools is one of many packages that exposes the top-level |
||
depends=requires, | ||
) | ||
|
||
|
||
async def freeze2() -> str: | ||
"""Produce a json string which can be used as the contents of the | ||
``repodata.json`` lock file. | ||
|
||
If you later load Pyodide with this lock file, you can use | ||
:js:func:`pyodide.loadPackage` to load packages that were loaded with :py:mod:`micropip` | ||
this time. Loading packages with :js:func:`~pyodide.loadPackage` is much faster | ||
and you will always get consistent versions of all your dependencies. | ||
|
||
You can use your custom lock file by passing an appropriate url to the | ||
``lockFileURL`` of :js:func:`~globalThis.loadPyodide`. | ||
""" | ||
hoodmane marked this conversation as resolved.
Show resolved
Hide resolved
|
||
packages = deepcopy(REPODATA_PACKAGES) | ||
for dist in importlib.metadata.distributions(): | ||
name = dist.name | ||
url = dist.read_text("PYODIDE_URL") | ||
if url: | ||
pkg_entry = get_pkg_entry_micropip(dist, url) | ||
elif IN_VENV: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm unsure why IN_VENV is relevant. Packages can be installed without virtualenvs (e.g. on PYTHONPATH). And virtualenvs can expose site-packages. What is the motivation for varying on IN_VENV? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it is wrong to branch on
Right... some day would be good to test our system against this, it is a good feature. The Pyodide |
||
res = await get_pkg_entry_pip(dist) | ||
if not res: | ||
continue | ||
pkg_entry = res | ||
else: | ||
continue | ||
|
||
packages[canonicalize_name(name)] = pkg_entry | ||
|
||
# Sort | ||
packages = dict(sorted(packages.items())) | ||
package_data = { | ||
"info": REPODATA_INFO, | ||
"packages": packages, | ||
} | ||
return json.dumps(package_data) | ||
hoodmane marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
def freeze() -> str: | ||
|
@@ -24,32 +122,10 @@ def freeze() -> str: | |
packages = deepcopy(REPODATA_PACKAGES) | ||
for dist in importlib.metadata.distributions(): | ||
name = dist.name | ||
version = dist.version | ||
url = dist.read_text("PYODIDE_URL") | ||
if url is None: | ||
continue | ||
|
||
sha256 = dist.read_text("PYODIDE_SHA256") | ||
assert sha256 | ||
imports = (dist.read_text("top_level.txt") or "").split() | ||
requires = dist.read_text("PYODIDE_REQUIRES") | ||
if not requires: | ||
fix_package_dependencies(name) | ||
requires = dist.read_text("PYODIDE_REQUIRES") | ||
if requires: | ||
depends = json.loads(requires) | ||
else: | ||
depends = [] | ||
|
||
pkg_entry: dict[str, Any] = dict( | ||
name=name, | ||
version=version, | ||
file_name=url, | ||
install_dir="site", | ||
sha256=sha256, | ||
imports=imports, | ||
depends=depends, | ||
) | ||
pkg_entry = get_pkg_entry_micropip(dist, url) | ||
packages[canonicalize_name(name)] = pkg_entry | ||
|
||
# Sort | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach seems risky. What if the package doesn't have a wheel, but only an sdist? What if the sdist is chosen over the wheel? What if the package was installed from a local checkout that advertises the same version (but is different)? What if the package was installed from a local checkout that advertises a local version (that's not present in PyPI)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I agree with these concerns. I agree that there needs to be some sort of architecture to our packaging system and currently there is no architecture merely an accumulation of behaviors. I will answer your questions in defense of this approach because I think the answers contain interesting information about our domain logic and not because this approach is any good.
This is a great question. It will fail. If the wheel is platformed, then the pip build frontend cannot build it. The only way to build a Pyodide platformed wheel is with our own
pyodide build
frontend. I looked into patching pip so that it uses our frontend but I am pretty sure it is not a reasonable task without support from the pip team. But for a non-platformedpy3-none-none
package, it can just upload an sdist and then pip will download it and install it. The problem is that this whole lock file idea is predicated on the idea that the packages came from somewhere, and that we didn't do a local build. Probably if we find packages that were installed from sdist, thenfreeze()
should raise an error complaining specifically about this.Same issue, can only install from a url. So yeah we would indeed freeze a different wheel than was actually installed. We should probably check for this? Note that
pip freeze
also can't save you from this situation, although in normal Python it is fine if pip itself installed the package from sdist. But the fact that if you dothen
.venv-new
can end up being different than the current environment is indicating thatpip freeze
does not generate a true lock but something weaker. Our aspiration is to generate a lock and we should error with an explanation if we cannot generate one.