[FEA]: Introduce Python module with CCCL headers (#3201)
* Add cccl/python/cuda_cccl directory and use it from cuda_parallel, cuda_cooperative

* Run `copy_cccl_headers_to_cuda_include()` before `setup()`

* Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path.
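
For illustration, a minimal sketch of that lookup; it assumes the intermediate cuda/_include layout used at this stage of the PR, which later commits replace with cuda.cccl.include_paths (shown further down):

```python
# Sketch only: the (intentionally empty) __init__.py makes cuda/_include an
# importable package, so its on-disk location is the directory that setup.py
# copied the CCCL headers into.
from pathlib import Path

import cuda._include  # layout from this stage of the PR, later removed


def find_cccl_include_path() -> Path:
    # Parent directory of the package's __init__.py == the bundled include dir.
    return Path(cuda._include.__file__).parent
```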

* Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel

* Bug fix: cuda/_include only exists after shutil.copytree() ran.

* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py

* Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions)

* Replace := operator (needs Python 3.8+)

* Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md

* Restore original README.md: `pip3 install -e` now works on first pass.

* cuda_cccl/README.md: FOR INTERNAL USE ONLY

* Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under #3201 (comment))

Command used: ci/update_version.sh 2 8 0

* Modernize pyproject.toml, setup.py

Trigger for this change:

* #3201 (comment)

* #3201 (comment)

* Install CCCL headers under cuda.cccl.include

Trigger for this change:

* #3201 (comment)

Unexpected discovery: the cuda.cooperative unit tests pass entirely without CCCL headers.

* Factor out cuda_cccl/cuda/cccl/include_paths.py

* Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative

* Add missing Copyright notice.

* Add missing __init__.py (cuda.cccl)

* Add `"cuda.cccl"` to `autodoc.mock_imports`

* Move the cuda.cccl.include_paths import into the function where it is used. (Attempt to resolve the Build and Verify Docs failure.)

* Add # TODO: move this to a module-level import

* Modernize cuda_cooperative/pyproject.toml, setup.py

* Convert cuda_cooperative to use hatchling as build backend.

* Revert "Convert cuda_cooperative to use hatchling as build backend."

This reverts commit 61637d6.

* Move numpy from [build-system] requires -> [project] dependencies

* Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH

* Remove copy_license() and use license_files=["../../LICENSE"] instead.

* Further modernize cuda_cccl/setup.py to use pathlib

* Trivial simplifications in cuda_cccl/pyproject.toml

* Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code

* Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml

* Add taplo-pre-commit to .pre-commit-config.yaml

* taplo-pre-commit auto-fixes

* Use pathlib in cuda_cooperative/setup.py

* CCCL_PYTHON_PATH in cuda_cooperative/setup.py

* Modernize cuda_parallel/pyproject.toml, setup.py

* Use pathlib in cuda_parallel/setup.py

* Add `# TOML lint & format` comment.

* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml

* Use pathlib in cuda/cccl/include_paths.py

* pre-commit autoupdate (EXCEPT clang-format, which was manually restored)

* Fixes after git merge main

* Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result'

```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>

  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```

* Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy`

* Introduce cuda_cooperative/constraints.txt

* Also add cuda_parallel/constraints.txt

* Add `--constraint constraints.txt` in ci/test_python.sh

* Update Copyright dates

* Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo was archived by its owner on Jul 1, 2024)

For completeness: The other repo took a long time to install into the pre-commit cache; so long that it led to timeouts in the CCCL CI.

* Remove unused cuda_parallel jinja2 dependency (noticed by chance).

* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead.

* Make cuda_cooperative, cuda_parallel testing completely independent.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Fix sign-compare warning (#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]"

This reverts commit ea33a21.

Error message: #3201 (comment)

* Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Restore original ci/matrix.yaml [skip-rapids]

* Use for loop in test_python.sh to avoid code duplication.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]

* Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]"

This reverts commit ec206fd.

* Implement suggestion by @shwina (#3201 (review))

* Address feedback by @leofang

---------

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
rwgk and bernhardmgruber authored Jan 17, 2025
1 parent a128d86 commit 3e1e6e0
Showing 24 changed files with 288 additions and 271 deletions.
11 changes: 11 additions & 0 deletions .pre-commit-config.yaml
@@ -43,6 +43,17 @@ repos:
hooks:
- id: ruff # linter
- id: ruff-format # formatter

# TOML lint & format
- repo: https://github.com/ComPWA/taplo-pre-commit
rev: v0.9.3
hooks:
# See https://github.com/NVIDIA/cccl/issues/3426
# - id: taplo-lint
# exclude: "^docs/"
- id: taplo-format
exclude: "^docs/"

- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
hooks:
33 changes: 18 additions & 15 deletions ci/test_python.sh
@@ -8,25 +8,28 @@ print_environment_details

fail_if_no_gpu

readonly prefix="${BUILD_DIR}/python/"
export PYTHONPATH="${prefix}:${PYTHONPATH:-}"
begin_group "⚙️ Existing site-packages"
pip freeze
end_group "⚙️ Existing site-packages"

pushd ../python/cuda_cooperative >/dev/null
for module in cuda_parallel cuda_cooperative; do

run_command "⚙️ Pip install cuda_cooperative" pip install --force-reinstall --upgrade --target "${prefix}" .[test]
run_command "🚀 Pytest cuda_cooperative" python -m pytest -v ./tests
pushd "../python/${module}" >/dev/null

popd >/dev/null
TEMP_VENV_DIR="/tmp/${module}_venv"
rm -rf "${TEMP_VENV_DIR}"
python -m venv "${TEMP_VENV_DIR}"
. "${TEMP_VENV_DIR}/bin/activate"
echo 'cuda-cccl @ file:///home/coder/cccl/python/cuda_cccl' > /tmp/cuda-cccl_constraints.txt
run_command "⚙️ Pip install ${module}" pip install -c /tmp/cuda-cccl_constraints.txt .[test]
begin_group "⚙️ ${module} site-packages"
pip freeze
end_group "⚙️ ${module} site-packages"
run_command "🚀 Pytest ${module}" python -m pytest -v ./tests
deactivate

pushd ../python/cuda_parallel >/dev/null
popd >/dev/null

# Temporarily install the package twice to populate include directory as part of the first installation
# and to let manifest discover these includes during the second installation. Do not forget to remove the
# second installation after https://github.com/NVIDIA/cccl/issues/2281 is addressed.
run_command "⚙️ Pip install cuda_parallel once" pip install --force-reinstall --upgrade --target "${prefix}" .[test]
run_command "⚙️ Pip install cuda_parallel twice" pip install --force-reinstall --upgrade --target "${prefix}" .[test]
run_command "🚀 Pytest cuda_parallel" python -m pytest -v ./tests

popd >/dev/null
done

print_time_summary
2 changes: 2 additions & 0 deletions ci/update_version.sh
@@ -37,6 +37,7 @@ CUB_CMAKE_VERSION_FILE="lib/cmake/cub/cub-config-version.cmake"
LIBCUDACXX_CMAKE_VERSION_FILE="lib/cmake/libcudacxx/libcudacxx-config-version.cmake"
THRUST_CMAKE_VERSION_FILE="lib/cmake/thrust/thrust-config-version.cmake"
CUDAX_CMAKE_VERSION_FILE="lib/cmake/cudax/cudax-config-version.cmake"
CUDA_CCCL_VERSION_FILE="python/cuda_cccl/cuda/cccl/_version.py"
CUDA_COOPERATIVE_VERSION_FILE="python/cuda_cooperative/cuda/cooperative/_version.py"
CUDA_PARALLEL_VERSION_FILE="python/cuda_parallel/cuda/parallel/_version.py"

@@ -110,6 +111,7 @@ update_file "$CUDAX_CMAKE_VERSION_FILE" "set(cudax_VERSION_MAJOR \([0-9]\+\))" "
update_file "$CUDAX_CMAKE_VERSION_FILE" "set(cudax_VERSION_MINOR \([0-9]\+\))" "set(cudax_VERSION_MINOR $minor)"
update_file "$CUDAX_CMAKE_VERSION_FILE" "set(cudax_VERSION_PATCH \([0-9]\+\))" "set(cudax_VERSION_PATCH $patch)"

update_file "$CUDA_CCCL_VERSION_FILE" "^__version__ = \"\([0-9.]\+\)\"" "__version__ = \"$major.$minor.$patch\""
update_file "$CUDA_COOPERATIVE_VERSION_FILE" "^__version__ = \"\([0-9.]\+\)\"" "__version__ = \"$pymajor.$pyminor.$major.$minor.$patch\""
update_file "$CUDA_PARALLEL_VERSION_FILE" "^__version__ = \"\([0-9.]\+\)\"" "__version__ = \"$pymajor.$pyminor.$major.$minor.$patch\""

1 change: 1 addition & 0 deletions docs/repo.toml
@@ -347,6 +347,7 @@ autodoc.mock_imports = [
"numba",
"pynvjitlink",
"cuda.bindings",
"cuda.cccl",
"llvmlite",
"numpy",
]
2 changes: 2 additions & 0 deletions python/cuda_cccl/.gitignore
@@ -0,0 +1,2 @@
cuda/cccl/include
*egg-info
3 changes: 3 additions & 0 deletions python/cuda_cccl/README.md
@@ -0,0 +1,3 @@
## Note

This package is currently FOR INTERNAL USE ONLY and not meant to be used/installed explicitly.
8 changes: 8 additions & 0 deletions python/cuda_cccl/cuda/cccl/__init__.py
@@ -0,0 +1,8 @@
# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

from cuda.cccl._version import __version__
from cuda.cccl.include_paths import get_include_paths

__all__ = ["__version__", "get_include_paths"]
7 changes: 7 additions & 0 deletions python/cuda_cccl/cuda/cccl/_version.py
@@ -0,0 +1,7 @@
# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

# This file is generated by ci/update_version.sh
# Do not edit this file manually.
__version__ = "2.8.0"
63 changes: 63 additions & 0 deletions python/cuda_cccl/cuda/cccl/include_paths.py
@@ -0,0 +1,63 @@
# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

import os
import shutil
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path
from typing import Optional


def _get_cuda_path() -> Optional[Path]:
cuda_path = os.environ.get("CUDA_PATH")
if cuda_path:
cuda_path = Path(cuda_path)
if cuda_path.exists():
return cuda_path

nvcc_path = shutil.which("nvcc")
if nvcc_path:
return Path(nvcc_path).parent.parent

default_path = Path("/usr/local/cuda")
if default_path.exists():
return default_path

return None


@dataclass
class IncludePaths:
cuda: Optional[Path]
libcudacxx: Optional[Path]
cub: Optional[Path]
thrust: Optional[Path]

def as_tuple(self):
# Note: higher-level ... lower-level order:
return (self.thrust, self.cub, self.libcudacxx, self.cuda)


@lru_cache()
def get_include_paths() -> IncludePaths:
# TODO: once docs env supports Python >= 3.9, we
# can move this to a module-level import.
from importlib.resources import as_file, files

cuda_incl = None
cuda_path = _get_cuda_path()
if cuda_path is not None:
cuda_incl = cuda_path / "include"

with as_file(files("cuda.cccl.include")) as f:
cccl_incl = Path(f)
assert cccl_incl.exists()

return IncludePaths(
cuda=cuda_incl,
libcudacxx=cccl_incl / "libcudacxx",
cub=cccl_incl,
thrust=cccl_incl,
)
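
A short usage sketch for the new helper (it mirrors how cuda.cooperative consumes it in the _nvrtc.py diff further down; the exact flag format shown is taken from that code):

```python
# Build compiler include flags from the discovered paths, skipping any
# component that could not be located (e.g. no CUDA toolkit on PATH).
from cuda.cccl import get_include_paths

flags = [
    f"--include-path={path}"
    for path in get_include_paths().as_tuple()
    if path is not None
]
print(flags)
```
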
29 changes: 29 additions & 0 deletions python/cuda_cccl/pyproject.toml
@@ -0,0 +1,29 @@
# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

[build-system]
requires = ["setuptools>=61.0.0"]
build-backend = "setuptools.build_meta"

[project]
name = "cuda-cccl"
description = "Experimental Package with CCCL headers to support JIT compilation"
authors = [{ name = "NVIDIA Corporation" }]
classifiers = [
"Programming Language :: Python :: 3 :: Only",
"Environment :: GPU :: NVIDIA CUDA",
"License :: OSI Approved :: Apache Software License",
]
requires-python = ">=3.9"
dynamic = ["version", "readme"]

[project.urls]
Homepage = "https://github.com/NVIDIA/cccl"

[tool.setuptools.dynamic]
version = { attr = "cuda.cccl._version.__version__" }
readme = { file = ["README.md"], content-type = "text/markdown" }

[tool.setuptools.package-data]
cuda = ["cccl/include/**/*"]
51 changes: 51 additions & 0 deletions python/cuda_cccl/setup.py
@@ -0,0 +1,51 @@
# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

import shutil
from pathlib import Path

from setuptools import setup
from setuptools.command.build_py import build_py

PROJECT_PATH = Path(__file__).resolve().parent
CCCL_PATH = PROJECT_PATH.parents[1]


class CustomBuildPy(build_py):
"""Copy CCCL headers BEFORE super().run()
Note that the CCCL headers cannot be referenced directly:
setuptools (and pyproject.toml) does not support relative paths that
reference files outside the package directory (like ../../).
This is a restriction designed to avoid inadvertently packaging files
that are outside the source tree.
"""

def run(self):
cccl_headers = [
("cub", "cub"),
("libcudacxx", "include"),
("thrust", "thrust"),
]

inc_path = PROJECT_PATH / "cuda" / "cccl" / "include"
inc_path.mkdir(parents=True, exist_ok=True)

for proj_dir, header_dir in cccl_headers:
src_path = CCCL_PATH / proj_dir / header_dir
dst_path = inc_path / proj_dir
if dst_path.exists():
shutil.rmtree(dst_path)
shutil.copytree(src_path, dst_path)

init_py_path = inc_path / "__init__.py"
init_py_path.write_text("# Intentionally empty.\n")

super().run()


setup(
license_files=["../../LICENSE"],
cmdclass={"build_py": CustomBuildPy},
)
1 change: 0 additions & 1 deletion python/cuda_cooperative/.gitignore
@@ -1,3 +1,2 @@
cuda/_include
env
*egg-info
1 change: 0 additions & 1 deletion python/cuda_cooperative/MANIFEST.in

This file was deleted.

1 change: 1 addition & 0 deletions python/cuda_cooperative/README.md
@@ -7,6 +7,7 @@ Please visit the documentation here: https://nvidia.github.io/cccl/python.html.
## Local development

```bash
pip3 install -e ../cuda_cccl
pip3 install -e .[test]
pytest -v ./tests/
```
46 changes: 9 additions & 37 deletions python/cuda_cooperative/cuda/cooperative/experimental/_nvrtc.py
@@ -3,9 +3,6 @@
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

import functools
import importlib.resources as pkg_resources
import os
import shutil

from cuda.bindings import nvrtc
from cuda.cooperative.experimental._caching import disk_cache
@@ -20,22 +17,6 @@ def CHECK_NVRTC(err, prog):
raise RuntimeError(f"NVRTC error: {log.decode('ascii')}")


def get_cuda_path():
cuda_path = os.environ.get("CUDA_PATH", "")
if os.path.exists(cuda_path):
return cuda_path

nvcc_path = shutil.which("nvcc")
if nvcc_path is not None:
return os.path.dirname(os.path.dirname(nvcc_path))

default_path = "/usr/local/cuda"
if os.path.exists(default_path):
return default_path

return None


# cpp is the C++ source code
# cc = 800 for Ampere, 900 Hopper, etc
# rdc is true or false
@@ -47,24 +28,15 @@ def compile_impl(cpp, cc, rdc, code, nvrtc_path, nvrtc_version):
check_in("rdc", rdc, [True, False])
check_in("code", code, ["lto", "ptx"])

with pkg_resources.path("cuda", "_include") as include_path:
# Using `.parent` for compatibility with pip install --editable:
include_path = pkg_resources.files("cuda.cooperative").parent.joinpath(
"_include"
)
cub_path = include_path
thrust_path = include_path
libcudacxx_path = os.path.join(include_path, "libcudacxx")
cuda_include_path = os.path.join(get_cuda_path(), "include")

opts = [
b"--std=c++17",
bytes(f"--include-path={cub_path}", encoding="ascii"),
bytes(f"--include-path={thrust_path}", encoding="ascii"),
bytes(f"--include-path={libcudacxx_path}", encoding="ascii"),
bytes(f"--include-path={cuda_include_path}", encoding="ascii"),
bytes(f"--gpu-architecture=compute_{cc}", encoding="ascii"),
]
opts = [b"--std=c++17"]

# TODO: move this to a module-level import (after docs env modernization).
from cuda.cccl import get_include_paths

for path in get_include_paths().as_tuple():
if path is not None:
opts += [f"--include-path={path}".encode("ascii")]
opts += [f"--gpu-architecture=compute_{cc}".encode("ascii")]
if rdc:
opts += [b"--relocatable-device-code=true"]

34 changes: 32 additions & 2 deletions python/cuda_cooperative/pyproject.toml
@@ -1,11 +1,41 @@
# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

[build-system]
requires = ["packaging", "setuptools>=61.0.0", "wheel"]
requires = ["setuptools>=61.0.0"]
build-backend = "setuptools.build_meta"

[project]
name = "cuda-cooperative"
description = "Experimental Core Library for CUDA Python"
authors = [{ name = "NVIDIA Corporation" }]
classifiers = [
"Programming Language :: Python :: 3 :: Only",
"Environment :: GPU :: NVIDIA CUDA",
"License :: OSI Approved :: Apache Software License",
]
requires-python = ">=3.9"
dependencies = [
"cuda-cccl",
"numpy",
"numba>=0.60.0",
"pynvjitlink-cu12>=0.2.4",
"cuda-python==12.*",
"jinja2",
]
dynamic = ["version", "readme"]

[project.optional-dependencies]
test = ["pytest", "pytest-xdist"]

[project.urls]
Homepage = "https://developer.nvidia.com/"

[tool.setuptools.dynamic]
version = { attr = "cuda.cooperative._version.__version__" }
readme = { file = ["README.md"], content-type = "text/markdown" }

[tool.ruff]
extend = "../../pyproject.toml"
