Corrected links in documentation (#784)
yetudada authored Sep 9, 2020
1 parent 8e6e8b2 commit b5099ce
Showing 14 changed files with 65 additions and 60 deletions.
2 changes: 1 addition & 1 deletion docs/source/02_get_started/01_prerequisites.md
@@ -2,7 +2,7 @@

- Kedro supports macOS, Linux and Windows (7 / 8 / 10 and Windows Server 2016+). If you encounter any problems on these platforms, please check the [frequently asked questions](../11_faq/01_faq.md), and / or the Kedro community support on [Stack Overflow](https://stackoverflow.com/questions/tagged/kedro).

- To work with Kedro, we highly recommend that you [download and install Anaconda](https://www.anaconda.com/download/#macos) (Python 3.x version).
- To work with Kedro, we highly recommend that you [download and install Anaconda](https://www.anaconda.com/download/) (Python 3.x version).

- If you are using PySpark, you will also need to [install Java](https://www.oracle.com/technetwork/java/javase/downloads/index.html). If you are a Windows user, you will need admin rights to complete the installation.

13 changes: 6 additions & 7 deletions docs/source/02_get_started/03_hello_kedro.md
@@ -15,8 +15,9 @@ Here, the `return_greeting` function is wrapped by a node called `return_greetin
```python
from kedro.pipeline import node


# Prepare first node
def return_greeting():
# Prepare first node
return "Hello"


@@ -28,8 +29,8 @@ return_greeting_node = node(
The `join_statements` function is wrapped by a node called `join_statements_node`, which names a single input (`my_salutation`) and a single output (`my_message`):

```python
# Prepare second node
def join_statements(greeting):
# Prepare second node
return f"{greeting} Kedro!"


@@ -83,27 +84,25 @@ The Runner is an object that runs the pipeline. Kedro resolves the order in whic
It's now time to stitch the code together. Here is the full example:

```python
"""Content of hello_kedro.py"""
"""Contents of hello_kedro.py"""
from kedro.io import DataCatalog, MemoryDataSet
from kedro.pipeline import node, Pipeline
from kedro.runner import SequentialRunner

# Prepare a data catalog
data_catalog = DataCatalog({"example_data": MemoryDataSet()})


# Prepare first node
def return_greeting():
# Prepare first node
return "Hello"


return_greeting_node = node(
return_greeting, inputs=None, outputs="my_salutation"
)


# Prepare second node
def join_statements(greeting):
# Prepare second node
return f"{greeting} Kedro!"


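The rest of the example is collapsed in this diff view. For context (not part of this commit), the hidden tail roughly wires the two nodes into a pipeline and runs it — a reconstruction based on the names visible above, not the file's exact lines:

```python
# Hedged reconstruction of the collapsed tail of hello_kedro.py
join_statements_node = node(
    join_statements, inputs="my_salutation", outputs="my_message"
)

# Assemble the nodes into a pipeline and run it with the sequential runner
pipeline = Pipeline([return_greeting_node, join_statements_node])
runner = SequentialRunner()
print(runner.run(pipeline, data_catalog))  # should print {'my_message': 'Hello Kedro!'}
```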
2 changes: 1 addition & 1 deletion docs/source/02_get_started/05_example_project.md
@@ -55,7 +55,7 @@ For project-specific settings to share across different installations (for examp

The folder contains three files for the example, but you can add others as you require:

- `catalog.yml` - [Configures the Data Catalog](../04_data_catalog/04_data_catalog#using-the-data-catalog-within-kedro-configuration) with the file paths and load/save configuration required for different datasets
- `catalog.yml` - [Configures the Data Catalog](../05_data/01_data_catalog#using-the-data-catalog-within-kedro-configuration) with the file paths and load/save configuration required for different datasets
- `logging.yml` - Uses Python's default [`logging`](https://docs.python.org/3/library/logging.html) library to set up logging
- `parameters.yml` - Allows you to define parameters for machine learning experiments e.g. train / test split and number of iterations

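For context (not part of this commit), each top-level entry in `catalog.yml` becomes a named dataset in the project's `DataCatalog`. A minimal sketch of the equivalent in code, with a hypothetical `companies` CSV dataset:

```python
from kedro.io import DataCatalog

# from_config builds the same catalog object that Kedro assembles from catalog.yml
catalog = DataCatalog.from_config(
    {
        "companies": {
            "type": "pandas.CSVDataSet",  # dataset class resolved from kedro.extras.datasets
            "filepath": "data/01_raw/companies.csv",
        }
    }
)
print(catalog.list())  # ['companies']
```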
3 changes: 1 addition & 2 deletions docs/source/02_get_started/06_starters.md
@@ -1,6 +1,5 @@
# Kedro starters


Kedro starters are used to create projects that contain code to run as-is, or to adapt and extend. They provide pre-defined example code and configuration that can be reused, for example:

* As example code for a typical Kedro project
@@ -65,7 +64,7 @@ Under the hood, the value will be passed to the [`--checkout` flag in Cookiecutt

## Use a starter in interactive mode

By default, when you create a new project using a starter, `kedro new` launches in [interactive mode](./04_new_project.md). You will be prompted to provide the following variables:
By default, when you create a new project using a starter, `kedro new` launches [by asking a few questions](./04_new_project.md#create-a-new-project-interactively). You will be prompted to provide the following variables:

* `project_name` - A human readable name for your new project
* `repo_name` - A name for the directory that holds your project repository
4 changes: 2 additions & 2 deletions docs/source/03_tutorial/02_tutorial_template.md
@@ -10,7 +10,7 @@ In this section, we discuss the project set-up phase, which is the first part of

## Create a new project

Navigate to your chosen working directory and run the following to [create a new empty Kedro project](../02_get_started/04_new_project.md) using the default interactive prompts:
Navigate to your chosen working directory and run the following to [create a new empty Kedro project](../02_get_started/04_new_project.md#create-a-new-project-interactively) using the default interactive prompts:

```bash
kedro new
@@ -34,7 +34,7 @@ isort>=4.3.21, <5.0 # Used for linting code with `kedro lint`
jupyter>=1.0.0, <2.0 # Used to open a Kedro-session in Jupyter Notebook & Lab
jupyter_client>=5.1.0, <7.0 # Used to open a Kedro-session in Jupyter Notebook & Lab
jupyterlab==0.31.1 # Used to open a Kedro-session in Jupyter Lab
kedro==0.16.3
kedro==0.16.5
nbstripout==0.3.3 # Strips the output of a Jupyter Notebook and writes the outputless version to the original file
pytest-cov>=2.5, <3.0 # Produces test coverage reports
pytest-mock>=1.7.1,<2.0 # Wrapper around the mock package for easier use with pytest
6 changes: 4 additions & 2 deletions docs/source/03_tutorial/04_create_pipelines.md
@@ -421,7 +421,7 @@ test_size: 0.2
random_state: 3
```
These are the parameters fed into the `DataCatalog` when the pipeline is executed. More information about [parameters](../04_kedro_project_setup/01_configuration.md#parameters) is available in later documentation for advanced usage.
These are the parameters fed into the `DataCatalog` when the pipeline is executed. More information about [parameters](../04_kedro_project_setup/02_configuration.md#Parameters) is available in later documentation for advanced usage.
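For context (not part of this commit), a node can receive these values by declaring the special `parameters` input; a minimal sketch, with illustrative dataset names (`master_table`, `train`, `test`):

```python
from typing import Any, Dict, Tuple

import pandas as pd
from kedro.pipeline import node


def split_data(
    data: pd.DataFrame, parameters: Dict[str, Any]
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    # test_size and random_state come straight from conf/base/parameters.yml
    shuffled = data.sample(frac=1, random_state=parameters["random_state"])
    n_test = int(len(shuffled) * parameters["test_size"])
    return shuffled.iloc[n_test:], shuffled.iloc[:n_test]


split_data_node = node(
    split_data,
    inputs=["master_table", "parameters"],  # "parameters" is injected by Kedro
    outputs=["train", "test"],
)
```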

### Register the dataset
The next step is to register the dataset that will save the trained model, by adding the following definition to `conf/base/catalog.yml`:
@@ -433,7 +433,7 @@ regressor:
versioned: true
```

> *Note:* Versioning is enabled for `regressor`, which means that the pickled output of the `regressor` will be versioned and saved every time the pipeline is run. This allows us to keep the history of the models built using this pipeline. Further details can be found in the [Versioning](../05_data/02_kedro_io.md#versioning).
> *Note:* Versioning is enabled for `regressor`, which means that the pickled output of the `regressor` will be
> versioned and saved every time the pipeline is run. This allows us to keep the history of the models built using
> this pipeline. Further details can be found in the [Versioning](../05_data/02_kedro_io.md#versioning) section.
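For context (not part of this commit), `versioned: true` on a `pickle.PickleDataSet` entry corresponds roughly to constructing the dataset with a `Version` in code — a sketch, with the filepath assumed since that line of the catalog entry is collapsed here:

```python
from kedro.extras.datasets.pickle import PickleDataSet
from kedro.io.core import Version

# Version(load=None, save=None) means: load the latest version, save under a new timestamp
regressor = PickleDataSet(
    filepath="data/06_models/regressor.pickle",  # assumed path for the sketch
    version=Version(load=None, save=None),
)
```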

### Assemble the data science pipeline
To create a pipeline for the price prediction model, add the following to the top of `src/kedro_tutorial/pipelines/data_science/pipeline.py`:
9 changes: 5 additions & 4 deletions docs/source/06_nodes_and_pipelines/02_pipelines.md
@@ -161,7 +161,7 @@ Modular pipelines serve the following main purposes:

### How do I create modular pipelines?

For projects created using Kedro version 0.16.0 or later, Kedro ships a [project-specific CLI command](../07_extend_kedro/05_plugins.md#global-and-project-commands) `kedro pipeline create <pipeline_name>`, which does the following for you:
For projects created using Kedro version 0.16.0 or later, Kedro ships a [project-specific CLI command](../09_development/03_commands_reference.md) `kedro pipeline create <pipeline_name>`, which does the following for you:
1. Adds a new modular pipeline in a `src/<python_package>/pipelines/<pipeline_name>/` directory
2. Creates boilerplate configuration files, `catalog.yml` and `parameters.yml`, in `conf/<env>/pipelines/<pipeline_name>/`, where `<env>` defaults to `base`
3. Makes a placeholder for the pipeline unit tests in `src/tests/pipelines/<pipeline_name>/`
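
For context (not part of this commit), the scaffolded pipeline exposes a `create_pipeline()` entry point that the project's pipeline registration code calls — a minimal sketch with illustrative node and dataset names:

```python
from kedro.pipeline import Pipeline, node


def preprocess(raw_data):
    # placeholder transformation for the sketch
    return raw_data


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [node(preprocess, inputs="raw_data", outputs="preprocessed_data")]
    )
```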
@@ -184,7 +184,7 @@ You can manually delete all the files that belong to a modular pipeline. However

* All the modular pipeline code in `src/<python_package>/pipelines/<pipeline_name>/`
* Configuration files in `conf/<env>/pipelines/<pipeline_name>/`, where `<env>` defaults to `base`. If the files are located in a different config environment, run `kedro pipeline delete <pipeline_name> --env <env_name>`.
* Pipeline unit tests in `src/tests/pipelines/<pipeline_name>/`
* Pipeline unit tests in `src/tests/pipelines/<pipeline_name>/`

### Modular pipeline structure

@@ -207,7 +207,8 @@ pipeline = my_modular_pipeline_1.create_pipeline()
Here is a list of recommendations for developing a modular pipeline:

* A modular pipeline should include a `README.md`, with all the information regarding the execution of the pipeline for the end users
* A modular pipeline _may_ have external dependencies specified in `requirements.txt`. These dependencies are _not_ currently installed by the [`kedro install`](../04_kedro_project_setup/01_dependencies.md#kedro-install) command, so the users of your pipeline would have to run `pip install -r src/<python_package>/pipelines/<pipeline_name>/requirements.txt`
* A modular pipeline _may_ have external dependencies specified in `requirements.txt`. These dependencies are _not_
currently installed by the [`kedro install`](../09_development/03_commands_reference.md#Install-all-package-dependencies) command, so the users of your pipeline would have to run `pip install -r src/<python_package>/pipelines/<pipeline_name>/requirements.txt`
* To ensure portability, modular pipelines should use relative imports when accessing their own objects and absolute imports otherwise. Look at an example from `src/new_kedro_project/pipelines/modular_pipeline_1/pipeline.py` below:

<details>
@@ -265,7 +266,7 @@ project_hooks = ProjectHooks()
### How do I share a modular pipeline?

#### Packaging a modular pipeline
Since Kedro 0.16.4 you can package a modular pipeline by executing `kedro pipeline package <pipeline_name>` command, which will generate a new [wheel file](https://pythonwheels.com/) for it. By default, the wheel file will be saved into `src/dist` directory inside your project, however this can be changed using `--destination` (`-d`) option.
Since Kedro 0.16.4 you can package a modular pipeline by executing `kedro pipeline package <pipeline_name>` command, which will generate a new [wheel file](https://pythonwheels.com/) for it. By default, the wheel file will be saved into `src/dist` directory inside your project, however this can be changed using the `--destination` (`-d`) option.

When packaging your modular pipeline, Kedro will also automatically package files from 3 locations :

16 changes: 9 additions & 7 deletions docs/source/07_extend_kedro/01_custom_datasets.md
@@ -90,7 +90,7 @@ src/kedro_pokemon/extras

## Implement the `_load` method with `fsspec`

Many of the built-in Kedro datasets rely on [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) as a consistent interface to different data sources, as described earlier in the section about the [Data Catalog](./04_data_catalog.html#specifying-the-location-of-the-dataset). In this example, it's particularly convenient to use `fsspec` in conjunction with `Pillow` to read image data, since it allows the dataset to work flexibly with different image locations and formats.
Many of the built-in Kedro datasets rely on [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) as a consistent interface to different data sources, as described earlier in the section about the [Data Catalog](../05_data/01_data_catalog.md#specifying-the-location-of-the-dataset). In this example, it's particularly convenient to use `fsspec` in conjunction with `Pillow` to read image data, since it allows the dataset to work flexibly with different image locations and formats.

Here is the implementation of the `_load` method using `fsspec` and `Pillow` to read the data of a single image into a `numpy` array:
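The implementation itself is collapsed in this diff view; for context (not part of this commit), a rough sketch of such a `_load`, assuming `fsspec`, `Pillow` and the `kedro.io.core` path helpers:

```python
from pathlib import PurePosixPath

import fsspec
import numpy as np
from PIL import Image

from kedro.io import AbstractDataSet
from kedro.io.core import get_filepath_str, get_protocol_and_path


class ImageDataSet(AbstractDataSet):
    def __init__(self, filepath: str):
        protocol, path = get_protocol_and_path(filepath)
        self._protocol = protocol
        self._filepath = PurePosixPath(path)
        self._fs = fsspec.filesystem(self._protocol)

    def _load(self) -> np.ndarray:
        # fsspec opens the file regardless of the backing store (local, S3, GCS, ...)
        load_path = get_filepath_str(self._filepath, self._protocol)
        with self._fs.open(load_path) as f:
            image = Image.open(f).convert("RGBA")
            return np.asarray(image)

    def _save(self, data: np.ndarray) -> None:
        raise NotImplementedError("covered later on this page")

    def _describe(self):
        return dict(filepath=self._filepath, protocol=self._protocol)
```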

@@ -193,7 +193,7 @@ You can open the file to verify that the data was written back correctly.

## Implement the `_describe` method

The `_describe` method is used for printing purposes. The convention in Kedro is for the method to return a dictionary describing the attributes of the dataset .
The `_describe` method is used for printing purposes. The convention in Kedro is for the method to return a dictionary describing the attributes of the dataset.

```python
from kedro.io import AbstractDataSet
@@ -284,7 +284,7 @@ class ImageDataSet(AbstractDataSet):

Currently, the `ImageDataSet` only works with a single image, but this example needs to load all Pokemon images from the raw data directory for further processing.

Kedro's [`PartitionedDataSet`](./07_kedro_io/01_advanced_io.html#partitioned-dataset) is a convenient way to load multiple separate data files of the same underlying dataset type into a directory.
Kedro's [`PartitionedDataSet`](../05_data/02_kedro_io.md#partitioned-dataset) is a convenient way to load multiple separate data files of the same underlying dataset type into a directory.

To use `PartitionedDataSet` with `ImageDataSet` to load all Pokemon PNG images, add this to the data catalog YAML so that `PartitionedDataSet` loads all PNG files from the data directory using `ImageDataSet`:

@@ -315,7 +315,8 @@ $ ls -la data/01_raw/pokemon-images-and-types/images/images/*.png | wc -l

> *Note*: Versioning doesn't work with PartitionedDataSet. You can't use both of them at the same time.

To add [Versioning](./05_data/02_kedro_io.md#versioning) support to the new dataset we need to extend the [AbstractVersionedDataSet](/kedro.io.AbstractVersionedDataSet) to:
To add [Versioning](../05_data/02_kedro_io.md#versioning) support to the new dataset we need to extend the
[AbstractVersionedDataSet](/kedro.io.AbstractVersionedDataSet) to:

* Accept a `version` keyword argument as part of the constructor
* Adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively
@@ -397,7 +398,7 @@ class ImageDataSet(AbstractVersionedDataSet):
The graphic shows the differences between the original `ImageDataSet` and the versioned `ImageDataSet`:

![Visual code diff graphic](../meta/images/diffs-graphic.png)
![](../meta/images/diffs-graphic.png)

To test the code, you need to enable versioning support in the data catalog:

@@ -439,7 +440,7 @@ In [2]: context.catalog.save('pikachu', data=img)
Inspect the content of the data directory to find a new version of the data, written by `save`.

You may also want to consult the [in-depth documentation about the Versioning API](./05_data/kedro#versioning).
You may also want to consult the [in-depth documentation about the Versioning API](../05_data/02_kedro_io.md#versioning).

## Thread-safety

@@ -505,7 +506,8 @@ We provide additional examples of [how to use parameters through the data catalo
## How to contribute a custom dataset implementation
One of the easiest ways to contribute back to Kedro is to share a custom dataset. Kedro has a `kedro.extras.datasets` sub-package where you can add a new custom dataset implementation to share it with others. You can find out more in the [Kedro contribution guide](https://github.com/quantumblacklabs/kedro/blob/develop/CONTRIBUTING.md) on Github.
One of the easiest ways to contribute back to Kedro is to share a custom dataset. Kedro has a `kedro.extras.datasets` sub-package where you can add a new custom dataset implementation to share it with others. You can find out more in
the [Kedro contribution guide](https://github.com/quantumblacklabs/kedro/blob/master/CONTRIBUTING.md) on Github.
To contribute your custom dataset:
18 changes: 9 additions & 9 deletions docs/source/09_development/01_set_up_vscode.md
@@ -5,13 +5,13 @@
Start by opening a new project directory in VS Code and installing the Python plugin under **Tools and languages**:

![Tools and languages graphic](../meta/images/vscode_startup.png)
![](../meta/images/vscode_startup.png)

Python is an interpreted language; to run Python code you must tell VS Code which interpreter to use. From within VS Code, select a Python 3 interpreter by opening the **Command Palette** (`Cmd + Shift + P` for macOS), start typing the **Python: Select Interpreter** command to search, then select the command.

At this stage, you should be able to see the `conda` environment that you have created. Select the environment:

![Conda environment graphic](../meta/images/vscode_setup_interpreter.png)
![](../meta/images/vscode_setup_interpreter.png)

### Advanced: For those using `venv` / `virtualenv`

@@ -95,7 +95,7 @@ We're going to need you to modify your `tasks.json`. To do this, go to **Termina

To start a build, go to **Terminal > Run Build Task...** or press `Cmd + Shift + B` for macOS. You can run other tasks by going to **Terminal > Run** and choosing which task you want to run.

![Terminal run graphic](../meta/images/vscode_run.png)
![](../meta/images/vscode_run.png)


## Debugging
@@ -138,19 +138,19 @@ Edit the `launch.json` that opens in the editor with:

To add a breakpoint in your `pipeline.py` script, for example, click on the left hand side of the line of code:

![Click on code line graphic](../meta/images/vscode_set_breakpoint.png)
![](../meta/images/vscode_set_breakpoint.png)

Click on **Debug** button on the left pane:

![Debug graphic](../meta/images/vscode_debug_button.png)
![](../meta/images/vscode_debug_button.png)

Then select the debug config **Python: Kedro Run** and click **Debug** (the green play button):

![Debug config graphic](../meta/images/vscode_run_debug.png)
![](../meta/images/vscode_run_debug.png)

Execution should stop at the breakpoint:

![Execution stopped at breakpoint graphic](../meta/images/vscode_breakpoint.png)
![](../meta/images/vscode_breakpoint.png)

### Advanced: Remote Interpreter / Debugging

@@ -233,7 +233,7 @@ ssh -vNL 3000:127.0.0.1:3000 <your_username>@<remote_server>

Go to the **Debugging** section in VS Code and select the newly created remote debugger profile:

![Select Kedro remote debugger graphic](../meta/images/vscode_remote_debugger.png)
![](../meta/images/vscode_remote_debugger.png)

You will need to set a breakpoint in VS Code as described [above](#debugging) and start the debugger by clicking the green play triangle:

@@ -255,4 +255,4 @@ Enter the following in your `settings.json` file:

and start editing your `catalog` files.

> Different schemas for different Kedro versions can be found [here](https://github.com/quantumblacklabs/kedro/tree/develop/static/jsonschema).
> Different schemas for different Kedro versions can be found [here](https://github.com/quantumblacklabs/kedro/tree/master/static/jsonschema).