Creating a training definition

A training definition is a manifest that describes the training run you want to execute. In the right panel, click the Add training definition link and create a New training definition.

Provide the experiment name and description.
Upload your training model code by dropping the zip file found here into the section marked for uploading the model file. In our case we will be using the code from here, a sample program written in Python that uses PyTorch to train one of several model architectures on the CIFAR-10 dataset. In this example, we will be using VGG-16.
Select pytorch 0.3 as the framework from the dropdown.
Enter the following as the execution command:

tar -xf ${DATA_DIR}/cifar-10-python.tar.gz ; python3 main.py --cifar_path ./ --checkpoint_path ${RESULT_DIR} --epochs 10

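The execution command is the entry point that starts your program. Here it unpacks the CIFAR-10 archive from ${DATA_DIR}, then runs main.py for 10 epochs and writes checkpoints to ${RESULT_DIR}.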
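
As a minimal sketch of how an entry-point script could accept the flags passed in the execution command above, the following uses Python's standard argparse module. The flag names (--cifar_path, --checkpoint_path, --epochs) come from the command itself; the function name parse_args, the default values, and the help strings are assumptions for illustration, and the actual main.py in the sample code may be organized differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the three flags that appear in the execution command.

    Hypothetical helper: defaults and help text are illustrative only.
    """
    parser = argparse.ArgumentParser(description="Entry-point sketch")
    parser.add_argument("--cifar_path", default="./",
                        help="directory containing the extracted CIFAR-10 files")
    parser.add_argument("--checkpoint_path", default="./checkpoints",
                        help="directory where checkpoints are written")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs")
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)


# Usage: pass the same flags as the execution command, with a literal
# path standing in for ${RESULT_DIR}.
args = parse_args(["--cifar_path", "./",
                   "--checkpoint_path", "/tmp/results",
                   "--epochs", "10"])
print(args.cifar_path, args.checkpoint_path, args.epochs)
```

Passing the argument list explicitly, as in the usage lines, makes it easy to check the parsing locally before uploading the zip file.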

In the Training definition attributes section of the right panel, set Compute plan to 1/2 x NVIDIA® Tesla® K80 (1 GPU). This specifies the GPUs available to your training run. To run on more powerful hardware, select a different compute configuration.
This lab does not use hyperparameter optimization, so set Hyperparameter optimization method to none.
Click the create button. This creates the training definition and returns you to the create experiment page.

Click the create and run button to start the training.

Hurray! You have a training run in progress. Follow the directions here to monitor its progress.
