A training definition is a manifest that provides the details of the training you will be running. On the right panel, click the Add training definition link and create a New training definition.
- Provide the experiment name and description.
- Upload your training model code by dropping the zip file found here into the section marked for uploading the model file. In our case we will be using the code from here, a sample program written in Python that uses PyTorch to train a model on the CIFAR-10 dataset with a choice of model architectures. In this example, we will be using VGG-16.
- Select pytorch 0.3 as the framework from the dropdown.
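- Copy the following as the execution command:
  tar -xf ${DATA_DIR}/cifar-10-python.tar.gz ; python3 main.py --cifar_path ./ --checkpoint_path ${RESULT_DIR} --epochs 10
  The execution command is your entry point; it is what triggers your program. Here it first extracts the CIFAR-10 archive from ${DATA_DIR} into the working directory, then runs main.py for 10 epochs and writes checkpoints to ${RESULT_DIR}.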
- In the Training definition attributes section on the right panel, select Compute plan as 1/2 x NVIDIA® Tesla® K80 (1 GPU). This specifies the GPUs that you will be able to use for your training. Running on more powerful hardware is as easy as selecting a different compute configuration.
- We will not be doing hyperparameter optimization as part of this lab, so set Hyperparameter optimization method to none.
- Click the create button. This creates the training definition and brings you back to the create experiment page.
- Click the create and run button to start the training.
- Hurray! You have a training running. Follow the directions here to see the progress of the training.
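
To make the execution command from the steps above more concrete, here is a minimal sketch of how its Python step might be interpreted. It is not the actual main.py from the sample code, whose argument handling is not shown here; the parser, the defaults, and the helper `parse_python_step` are hypothetical and written only for illustration. The sketch assumes that `DATA_DIR` and `RESULT_DIR` are environment variables set by the training service before the command runs, and it only reuses the three flags that appear in the command.

```python
import argparse
import shlex
from string import Template


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three flags in the execution command."""
    parser = argparse.ArgumentParser(description="Illustrative stand-in for main.py")
    parser.add_argument("--cifar_path", default="./",
                        help="directory holding the extracted CIFAR-10 files")
    parser.add_argument("--checkpoint_path", default=".",
                        help="directory where checkpoints are written")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs")
    return parser


def parse_python_step(command: str, env: dict) -> argparse.Namespace:
    """Expand ${VAR} references in the step after ';' and parse its flags."""
    # The command has two steps separated by ';': tar extraction, then python3.
    python_step = command.split(";", 1)[1].strip()
    # Substitute ${DATA_DIR}-style references from the given environment mapping.
    expanded = Template(python_step).safe_substitute(env)
    tokens = shlex.split(expanded)
    # tokens[0] is "python3" and tokens[1] is "main.py"; the rest are flags.
    return build_parser().parse_args(tokens[2:])


if __name__ == "__main__":
    command = ("tar -xf ${DATA_DIR}/cifar-10-python.tar.gz ; "
               "python3 main.py --cifar_path ./ "
               "--checkpoint_path ${RESULT_DIR} --epochs 10")
    # Example values only; the real directories are chosen by the service.
    args = parse_python_step(command, {"DATA_DIR": "/data", "RESULT_DIR": "/results"})
    print(args.cifar_path, args.checkpoint_path, args.epochs)
```

Because the archive is extracted into the working directory, `--cifar_path ./` points the program at the freshly extracted data, while `--checkpoint_path ${RESULT_DIR}` keeps the outputs in the results location. Changing `--epochs` in the execution command is the simplest way to shorten or lengthen the run.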