This Kedro project was showcased at PyCon Ireland in November 2024 to demonstrate the integration of MLOps tools, including Kedro, MLflow, and Airflow. The demo ML pipeline addresses a common ML problem: collecting and preprocessing data from multiple sources, training and evaluating a model, and deploying it.
- Install dependencies from `requirements.txt`.
- Set up the Kedro VS Code extension to visualize your pipelines in the IDE. See: Kedro VS Code Extension.
- Use `kedro run` to execute and test your pipeline locally.
- Install `kedro-mlflow` to track artifacts and runs, and to leverage the model registry. See: Kedro-MLflow Documentation.
- Install `kedro-airflow` or explore other deployment plugins to convert and deploy your pipeline to different platforms. See: Kedro-Airflow Documentation | Deployment Plugins.
In order to get the best out of the template:
- Don't remove any lines from the `.gitignore` file we provide
- Make sure your results can be reproduced by following a data engineering convention
- Don't commit data to your repository
- Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in `conf/local/`
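The rules above about data and credentials can be checked mechanically. The following is a minimal sketch, not part of the template: the function name, the `FORBIDDEN_PREFIXES` policy, and the assumption that data lives in a top-level `data` folder are all hypothetical choices made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical policy for this sketch: directories whose contents should stay out of git.
# "data" assumes the project keeps its datasets in a top-level data/ folder.
FORBIDDEN_PREFIXES = ("conf/local", "data")
# Placeholder files that keep otherwise empty folders in the repository.
ALLOWED_NAMES = {".gitkeep"}


def find_forbidden_paths(tracked_paths):
    """Return the tracked paths that live under a forbidden directory."""
    offenders = []
    for raw in tracked_paths:
        path = PurePosixPath(raw)
        if path.name in ALLOWED_NAMES:
            continue
        for prefix in FORBIDDEN_PREFIXES:
            prefix_parts = PurePosixPath(prefix).parts
            # Compare whole path components so "database/x" does not match "data".
            if path.parts[: len(prefix_parts)] == prefix_parts:
                offenders.append(raw)
                break
    return offenders
```

One way to use it is to pass in the lines printed by `git ls-files` and fail a pre-commit or CI step when the returned list is not empty.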
Declare any dependencies in `requirements.txt` for `pip` installation.
To install them, run:
pip install -r requirements.txt
You can run your Kedro project with:
kedro run
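If you want to trigger a run from a script instead of typing the command, one option is to wrap the CLI call. This is a sketch under assumptions: the helper names are hypothetical, and the `data_science` pipeline name is inferred from the test path in this README rather than confirmed by the project's pipeline registry.

```python
import subprocess


def build_kedro_run_command(pipeline=None, env=None):
    """Build the argument list for `kedro run`, optionally selecting a pipeline and environment."""
    cmd = ["kedro", "run"]
    if pipeline:
        cmd += ["--pipeline", pipeline]
    if env:
        cmd += ["--env", env]
    return cmd


def run_kedro(pipeline=None, env=None):
    """Run the project from its root folder; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_kedro_run_command(pipeline, env), check=True)


# Example (not executed here): run_kedro(pipeline="data_science")
```

Building the command separately from executing it keeps the argument logic easy to test without Kedro installed.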
Have a look at the files `src/tests/test_run.py` and `src/tests/pipelines/data_science/test_pipeline.py` for instructions on how to write your tests. Run the tests as follows:
pytest
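As an illustration of the kind of test `pytest` picks up, here is a minimal sketch. The `split_data` function is a hypothetical stand-in for a node function and is not taken from this project; the real tests live in the files mentioned above.

```python
def split_data(rows, test_ratio):
    """Hypothetical node: split rows into (train, test) without shuffling."""
    if not 0 <= test_ratio <= 1:
        raise ValueError("test_ratio must be between 0 and 1")
    n_test = int(len(rows) * test_ratio)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def test_split_data_sizes():
    # 20% of 10 rows should land in the test split.
    train, test = split_data(list(range(10)), test_ratio=0.2)
    assert len(train) == 8
    assert len(test) == 2


def test_split_data_keeps_every_row():
    # No row may be lost or duplicated by the split.
    rows = list(range(7))
    train, test = split_data(rows, test_ratio=0.3)
    assert train + test == rows
```

Testing node functions as plain Python functions keeps the tests fast, because they need neither the data catalog nor a full pipeline run.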
To configure the coverage threshold, look at the `.coveragerc` file.
To see and update the dependency requirements for your project, use `requirements.txt`. You can install the project requirements with `pip install -r requirements.txt`.
Further information about project dependencies
Note: Using `kedro jupyter` or `kedro ipython` to run your notebook provides these variables in scope: `catalog`, `context`, `pipelines` and `session`.
Jupyter, JupyterLab, and IPython are already included in the project requirements by default, so once you have run `pip install -r requirements.txt` you will not need to take any extra steps before you use them.
To use Jupyter notebooks in your Kedro project, you need to install Jupyter:
pip install jupyter
After installing Jupyter, you can start a local notebook server:
kedro jupyter notebook
To use JupyterLab, you need to install it:
pip install jupyterlab
You can also start JupyterLab:
kedro jupyter lab
And if you want to run an IPython session:
kedro ipython
To automatically strip out all output cell contents before committing to `git`, you can use tools like `nbstripout`. For example, you can add a hook in `.git/config` with `nbstripout --install`. This will run `nbstripout` before anything is committed to `git`.
Note: Your output cells will be retained locally.
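To make concrete what "stripping output" means, the sketch below clears outputs from a notebook loaded as JSON. It is a conceptual illustration only, not how `nbstripout` is implemented, and the real tool may handle more cases. The function name is hypothetical; the `cells`, `cell_type`, `outputs`, and `execution_count` keys follow the notebook version 4 file format.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    # A JSON round-trip gives a deep copy, so the caller's notebook is untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned
```

Markdown cells pass through unchanged, and the source of each code cell is preserved, so only the results of running the notebook are dropped.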
Further information about building project documentation and packaging your project