legate-boost

Gradient boosting machine (GBM) implementation on Legate. The primary goal of legate-boost is to provide a state-of-the-art distributed GBM implementation on Legate, capable of running on CPUs or GPUs at supercomputer scale.

API Documentation

For developers - see contributing

Installation

Install using conda.

# stable release
conda install -c legate -c conda-forge -c nvidia legate-boost

# nightly release
conda install -c legate/label/experimental -c legate -c conda-forge -c nvidia legate-boost

On systems without a GPU, the CPU-only package should be installed automatically. On systems with a GPU and a compatible CUDA version, the GPU package should be installed automatically.

To force conda to choose one variant, pass the build string *_cpu* or *_gpu*. For example (--dry-run only previews the solve; drop it to actually install):

# nightly release (CPU-only)
conda install --dry-run -c legate/label/experimental -c legate -c conda-forge -c nvidia \
    'legate-boost=*=*_cpu*'

For more details on building from source and setting up a development environment, see contributing.md.

Simple example

Run with the legate launcher

legate example_script.py

>>> import cupynumeric as cn
>>> import legateboost as lb

>>> X = cn.random.random((1000, 10))
>>> y = cn.random.random(X.shape[0])
>>> model = lb.LBRegressor().fit(X, y)

Features

Model ensembling

legate-boost can create models from linear combinations of other models. Ensembling is as easy as:

>>> import cupynumeric as cn
>>> import legateboost as lb

>>> X = cn.random.random((1000, 10))
>>> X_train_a = X[:500]
>>> X_train_b = X[500:]
>>> y = cn.random.random(X.shape[0])
>>> y_train_a = y[:500]
>>> y_train_b = y[500:]

>>> model_a = lb.LBRegressor().fit(X_train_a, y_train_a)
>>> len(model_a)
100
>>> model_b = lb.LBRegressor().fit(X_train_b, y_train_b)
>>> len(model_b)
100
>>> model_c = (model_a + model_b) * 0.5
>>> len(model_c)
200
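To make the arithmetic above concrete, here is a minimal NumPy sketch of an additive ensemble that supports the same +, * and len() operations. The class AdditiveModel and its fixed "learners" are hypothetical stand-ins, not legate-boost's implementation. The sketch assumes that adding two models concatenates their estimators (which the len() values above suggest) and that multiplying by a scalar scales every estimator's contribution, so (model_a + model_b) * 0.5 predicts the average of the two models. It ignores any base score or intercept a real model carries.

```python
import numpy as np


class AdditiveModel:
    """Hypothetical stand-in for a boosted model: a weighted sum of base learners.

    Not legate-boost's LBRegressor; it only mimics the +, * and len() behaviour.
    """

    def __init__(self, learners, weights=None):
        self.learners = list(learners)
        if weights is None:
            weights = [1.0] * len(self.learners)
        self.weights = list(weights)

    def __len__(self):
        # One entry per base learner, so len(a + b) == len(a) + len(b).
        return len(self.learners)

    def __add__(self, other):
        # Assumption: adding models concatenates their estimators, so predictions add.
        return AdditiveModel(self.learners + other.learners,
                             self.weights + other.weights)

    def __mul__(self, scale):
        # Assumption: scaling a model scales every estimator's contribution.
        return AdditiveModel(self.learners, [w * scale for w in self.weights])

    def predict(self, X):
        # Sum of weighted base-learner outputs (no base score / intercept term).
        return sum(w * f(X) for w, f in zip(self.weights, self.learners))


rng = np.random.default_rng(0)
X = rng.random((1000, 10))

# Each "learner" is a fixed function of one column, standing in for a fitted tree.
model_a = AdditiveModel([lambda X, j=j: 0.01 * X[:, j % 10] for j in range(100)])
model_b = AdditiveModel([lambda X, j=j: 0.02 * X[:, j % 10] for j in range(100)])
model_c = (model_a + model_b) * 0.5

print(len(model_a), len(model_b), len(model_c))  # 100 100 200
```

Under these assumptions, model_c.predict(X) equals 0.5 * (model_a.predict(X) + model_b.predict(X)), which is what makes this kind of ensembling a plain linear combination.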

Probabilistic regression

legate-boost can learn distributions for continuous data. This is useful in cases where simply predicting the mean does not carry enough information about the training data.

An example of probabilistic regression can be found here: examples/probabilistic_regression.
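A common way to boost a distribution rather than a point estimate is to predict two values per row, a mean mu and a log-variance log(sigma^2), and to minimize the Gaussian negative log-likelihood. The NumPy sketch below assumes that parameterization (we have not confirmed it matches legate-boost's own objective) and computes the per-row loss and the gradients a boosting round would fit, checking them against finite differences. The function names are hypothetical.

```python
import numpy as np


def gaussian_nll(y, mu, log_var):
    """Per-row negative log-likelihood of y under N(mu, exp(log_var))."""
    return 0.5 * (np.log(2 * np.pi) + log_var + (y - mu) ** 2 * np.exp(-log_var))


def gaussian_nll_grad(y, mu, log_var):
    """Analytic gradients of gaussian_nll with respect to mu and log_var."""
    inv_var = np.exp(-log_var)
    g_mu = -(y - mu) * inv_var
    g_log_var = 0.5 - 0.5 * (y - mu) ** 2 * inv_var
    return g_mu, g_log_var


# Check the analytic gradients against central finite differences.
rng = np.random.default_rng(0)
y = rng.normal(size=5)
mu = rng.normal(size=5)
log_var = rng.normal(size=5)
eps = 1e-6

g_mu, g_lv = gaussian_nll_grad(y, mu, log_var)
fd_mu = (gaussian_nll(y, mu + eps, log_var) - gaussian_nll(y, mu - eps, log_var)) / (2 * eps)
fd_lv = (gaussian_nll(y, mu, log_var + eps) - gaussian_nll(y, mu, log_var - eps)) / (2 * eps)
print(np.max(np.abs(g_mu - fd_mu)), np.max(np.abs(g_lv - fd_lv)))
```

The mean gradient vanishes when mu equals y, and the log-variance gradient vanishes when exp(log_var) equals the squared residual (y - mu)^2, so the model is pushed to report a larger variance where its mean predictions are less accurate.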

Batch training

legate-boost can train on datasets that do not fit into memory by splitting the dataset into batches and training the model with partial_fit.

>>> import cupynumeric as cn
>>> import legateboost as lb
>>> from sklearn.utils import gen_even_slices
>>> X = cn.random.random((1000, 10))
>>> y = cn.random.random(X.shape[0])

>>> total_estimators = 100
>>> estimators_per_batch = 10
>>> n_batches = total_estimators // estimators_per_batch

>>> train_batches = [(X[i], y[i]) for i in gen_even_slices(X.shape[0], n_batches)]
>>> model = lb.LBRegressor(n_estimators=estimators_per_batch)
>>> for i in range(total_estimators // estimators_per_batch):
...     X_batch, y_batch = train_batches[i % n_batches]
...     model = model.partial_fit(X_batch, y_batch)

The above example can be found here: examples/batch_training.
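The bookkeeping in the batch loop is easy to check by hand: 1000 rows are split into n_batches = 100 // 10 = 10 even slices of 100 rows, and the loop visits each slice once. The sketch below reproduces that arithmetic in plain Python. even_slices is a hypothetical helper written to mirror (as we understand it) the slices sklearn.utils.gen_even_slices yields, and the sketch assumes each partial_fit call appends estimators_per_batch new estimators.

```python
def even_slices(n, n_packs):
    """Yield n_packs contiguous slices covering range(n), sizes differing by at most one.

    Hypothetical helper mirroring our understanding of sklearn.utils.gen_even_slices.
    """
    start = 0
    for pack in range(n_packs):
        # The first n % n_packs slices get one extra row.
        size = n // n_packs + (1 if pack < n % n_packs else 0)
        if size > 0:
            yield slice(start, start + size)
            start += size


total_estimators = 100
estimators_per_batch = 10
n_batches = total_estimators // estimators_per_batch  # 10

sizes = [s.stop - s.start for s in even_slices(1000, n_batches)]

# Replay the training loop, counting visits per batch and estimators added.
visits = [0] * n_batches
n_estimators = 0
for i in range(total_estimators // estimators_per_batch):
    visits[i % n_batches] += 1
    n_estimators += estimators_per_batch  # assumption: each partial_fit adds this many

print(sizes)         # [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
print(visits)        # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(n_estimators)  # 100
```

Because the loop indexes train_batches with i % n_batches, raising total_estimators above n_batches * estimators_per_batch would cycle through the batches more than once rather than fail.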

Different model types

legate-boost supports tree models, linear models, kernel ridge regression models, custom user models and any combinations of these models.

The following example shows a model combining linear and decision tree base learners on a synthetic dataset.

model = lb.LBRegressor(base_models=(lb.models.Linear(), lb.models.Tree(max_depth=1),), **params).fit(X, y)
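To illustrate what mixing base learners means for boosting, the NumPy sketch below fits a squared-error gradient-boosting model whose rounds alternate between a least-squares linear fit and a depth-1 tree (a stump), each trained on the current residuals. The round-robin schedule, the learning rate and the helper names (fit_linear, fit_stump, boost) are assumptions for illustration; they are not legate-boost's API, nor necessarily how lb.LBRegressor schedules its base_models. The synthetic target is a linear trend plus a step, which neither learner type represents on its own.

```python
import numpy as np


def fit_linear(X, r):
    """Least-squares linear model (with intercept) fitted to residuals r."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, r, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef


def fit_stump(X, r):
    """Depth-1 regression tree: the single split that minimises squared error on r."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # drop the max so both sides are non-empty
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= t, lv, rv)


def boost(X, y, base_fitters, n_estimators=20, learning_rate=0.5):
    """Squared-error gradient boosting, cycling through base_fitters round-robin."""
    base_score = y.mean()
    pred = np.full(len(y), base_score)
    learners = []
    for i in range(n_estimators):
        fit = base_fitters[i % len(base_fitters)]  # assumed round-robin schedule
        h = fit(X, y - pred)  # residual = negative gradient of 0.5 * (y - pred)**2
        pred += learning_rate * h(X)
        learners.append(h)
    return lambda Z: base_score + learning_rate * sum(h(Z) for h in learners)


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
# Linear trend in column 0 plus a step in column 1, with a little noise.
y = 2.0 * X[:, 0] + np.where(X[:, 1] > 0, 1.0, -1.0) + 0.1 * rng.normal(size=200)

mixed = boost(X, y, (fit_linear, fit_stump))
linear_only = boost(X, y, (fit_linear,))

mse_baseline = np.mean((y - y.mean()) ** 2)
mse_mixed = np.mean((mixed(X) - y) ** 2)
mse_linear = np.mean((linear_only(X) - y) ** 2)
print(mse_baseline, mse_linear, mse_mixed)
```

On this data the mixed model's training error should fall well below that of boosting with fit_linear alone, since the stumps absorb the step that a linear model cannot fit.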

The second example shows a model combining kernel ridge regression and decision tree base learners on the wine quality dataset.

model = lb.LBRegressor(base_models=(lb.models.KRR(sigma=0.5), lb.models.Tree(max_depth=5),), **params).fit(X, y)
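For reference, kernel ridge regression with a Gaussian (RBF) kernel fits dual weights w by solving (K + alpha I) w = y, where K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), and predicts with k(x, X) @ w. The NumPy sketch below implements exactly that on a toy 1-D problem. The kernel form, the ridge parameter alpha and the helper names are assumptions; lb.models.KRR(sigma=0.5) may define sigma, the regularization, or the set of kernel centres differently, and inside a boosted model it would be fit to residuals rather than to y.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2)) between rows of A and B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def fit_krr(X, y, sigma=0.5, alpha=1e-3):
    """Kernel ridge regression: solve (K + alpha * I) w = y for the dual weights w."""
    K = rbf_kernel(X, X, sigma)
    w = np.linalg.solve(K + alpha * np.eye(len(X)), y)
    return lambda Z: rbf_kernel(Z, X, sigma) @ w


rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(100, 1))
y = np.sin(2 * X[:, 0])

model = fit_krr(X, y, sigma=0.5)
X_test = np.linspace(-1.5, 1.5, 7)[:, None]
err = np.max(np.abs(model(X_test) - np.sin(2 * X_test[:, 0])))
print(err)
```

A smaller sigma lets the fit follow sharper local structure at the cost of smoothness, while a larger alpha shrinks the dual weights toward zero.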
