Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 18b53b2 commit 9cca43a
Showing 4 changed files with 82 additions and 41 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,62 +1,103 @@ | ||
--- | ||
hide: | ||
- toc | ||
--- | ||
# Quick Start Guide | ||
|
||
# Quick Start | ||
This article walks through the full AI Lab development and training workflow: preparing datasets, debugging a script in a Notebook, and running a training job. | ||
|
||
This document provides a simple guide for users to use the DCE 5.0 AI Lab platform for | ||
the entire development and training process of datasets, Notebooks, and job training. | ||
## Preparing Your Dataset | ||
|
||
1. Click **Data Management** -> **Datasets** in the navigation bar, | ||
then click **Create**. Create three datasets as follows: | ||
Start by clicking on **Data Management** -> **Datasets**, and then select the **Create** button to set up the three datasets outlined below. | ||
|
||
- Code: [https://github.com/d-run/drun-samples](https://github.com/d-run/drun-samples/tree/main/tensorflow/tf-fashion-mnist-sample) | ||
- For faster access in China, use Gitee: [https://gitee.com/samzong_lu/training-sample-code.git](https://gitee.com/samzong_lu/training-sample-code.git) | ||
- Data: [https://github.com/zalandoresearch/fashion-mnist](https://github.com/zalandoresearch/fashion-mnist) | ||
- For faster access in China, use Gitee: [https://gitee.com/samzong_lu/fashion-mnist.git](https://gitee.com/samzong_lu/fashion-mnist.git) | ||
- Empty PVC: Create an empty PVC to output the trained model and logs after training. | ||
### Dataset: Training Code | ||
|
||
!!! note | ||
- **Code Source:** [https://github.com/samzong/training-sample-code.git](https://github.com/samzong/training-sample-code.git). This repository contains a simple TensorFlow code sample. | ||
- If you're located in China, you can access it more quickly via Gitee: [https://gitee.com/samzong_lu/training-sample-code.git](https://gitee.com/samzong_lu/training-sample-code.git) | ||
- The code can be found at: `tensorflow/tf-fashion-mnist-sample` | ||
|
||
|
||
!!! note | ||
|
||
Currently, only the `StorageClass` with read-write mode `ReadWriteMany` is supported. Please use NFS or the recommended [JuiceFS](https://juicefs.com/en/). | ||
|
||
### Dataset: Training Data | ||
|
||
For this training session, we will use the Fashion-MNIST dataset, which can be found at [https://github.com/zalandoresearch/fashion-mnist.git](https://github.com/zalandoresearch/fashion-mnist.git). | ||
|
||
If you're in China, you can use Gitee for a quicker download: [https://gitee.com/samzong_lu/fashion-mnist.git](https://gitee.com/samzong_lu/fashion-mnist.git) | ||
|
||
|
||
!!! note | ||
|
||
If the training dataset isn't created beforehand, the training script downloads it automatically at run time. Preparing the dataset in advance speeds up training. | ||
|
||
### Dataset: Empty Dataset | ||
|
||
AI Lab allows you to use `PVC` as the data source type for datasets. After creating an empty PVC and binding it to a dataset, you can use this empty dataset to store the output of future training jobs, including models and logs. | ||
|
||
Currently, only `StorageClass` with `ReadWriteMany` mode is supported. Please use NFS or the recommended [JuiceFS](https://juicefs.com/zh-cn/). | ||
|
||
<!-- add screenshot later --> | ||
## Environment Dependency: TensorFlow | ||
|
||
<!-- add screenshot later --> | ||
When running the script, you'll need the `TensorFlow` Python library. You can use AI Lab's environment dependency management feature to download and prepare the necessary Python libraries in advance, eliminating the need for image builds. | ||
|
||
<!-- add screenshot later --> | ||
> Check out the [Environment Dependency](./dataset/environments.md) guide to add a `CONDA` environment. | ||
2. Prepare the development environment by clicking **Notebooks** in the navigation bar, | ||
then click **Create**. Associate the three datasets created in the previous step and | ||
fill in the mount paths as shown in the image below: | ||
```yaml | ||
name: tensorflow | ||
channels: | ||
- defaults | ||
- conda-forge | ||
dependencies: | ||
- python=3.12 | ||
- tensorflow | ||
prefix: /opt/conda/envs/tensorflow | ||
``` | ||
<!-- add screenshot later --> | ||
!!! note | ||
3. Wait for the Notebook to be created successfully, click the access link in | ||
the list to enter the Notebook. Execute the following command in the Notebook terminal to start the job training. | ||
After the environment is successfully set up, you only need to mount this environment to the Notebook or training jobs, using the base image provided by AI Lab. | ||
```shell | ||
python /home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py | ||
``` | ||
## Using a Notebook to Debug Your Script | ||
<!-- add screenshot later --> | ||
Prepare your development environment by clicking on **Notebooks** in the navigation bar, then hit **Create**. | ||
4. Click **Job Center** -> **Jobs** in the navigation bar, create a `Tensorflow Single` job. | ||
Refer to the image below for job configuration and enable the **Job Analysis (Tensorboard)** feature. | ||
Click **Create** and wait for the status to complete. | ||
- Associate the [three datasets](#preparing-your-dataset) you prepared earlier, filling in the mount paths as shown in the image below. Make sure to configure the empty dataset in the output dataset location. | ||
- Image address: `release.daocloud.io/baize/jupyter-tensorflow-full:v1.8.0-baize` | ||
- Command: `python` | ||
- Arguments: `/home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py` | ||
- Select and bind the [environment dependency package](#environment-dependency-tensorflow). | ||
Wait for the Notebook to be successfully created, then click the access link in the list to enter the Notebook. In the Notebook terminal, run the following command to start the training job: | ||
![Enter Notebook](../images/baize-05.png) | ||
!!! note | ||
The script uses TensorFlow; if you forget to associate the dependency library, you can temporarily install it using `pip install tensorflow`. | ||
|
||
```shell | ||
python /home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py | ||
``` | ||
|
||
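Before launching `train.py`, it can help to confirm that the bound environment actually exposes TensorFlow. The following is a minimal sketch using only the Python standard library; the helper name `module_available` is hypothetical and is not part of AI Lab or the sample repository.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    # find_spec returns None when no top-level module with this name exists.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("tensorflow"):
        print("tensorflow found; train.py should be able to import it")
    else:
        # Fallback mentioned in the note above: a temporary pip install.
        print("tensorflow missing; bind the environment or run: pip install tensorflow")
```

Running this in the Notebook terminal before the training command shows whether the environment dependency was mounted, without starting a full training run.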
## Creating a Training Job | ||
|
||
1. Click on **Job Center** -> **Training Jobs** in the navigation bar to create a standalone `TensorFlow` job. | ||
2. Fill in the basic parameters and click **Next**. | ||
3. In the job resource configuration, correctly set up the job resources and click **Next**. | ||
|
||
- **Image:** If you prepared the environment dependency package earlier, you can use the default image. Otherwise, make sure the image includes the `TensorFlow` Python library. | ||
- **Shell:** Use `bash`. | ||
- **Enable Command:** | ||
|
||
```bash | ||
python /home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py | ||
``` | ||
|
||
4. In the advanced configuration, enable **Job Analysis (TensorBoard)**, and click **OK**. | ||
|
||
!!! note | ||
|
||
For large datasets or models, it is recommended to enable GPU configuration in the resource configuration step. | ||
Logs will be saved in the output dataset at `/home/jovyan/model/train/logs/`. | ||
|
||
|
||
<!-- add screenshot later --> | ||
5. Return to the training job list and wait for the status to change to **Success**. Click on the **┇** icon on the right side of the list to view details, clone jobs, update priority, view logs, and delete jobs, among other options. | ||
|
||
5. In the job created in the previous step, you can click the specific job analysis to | ||
view the job status and optimize the job training. | ||
6. Once the job is successfully created, click on **Job Analysis** in the left navigation bar to check the job status and fine-tune your training. | ||
|
||
<!-- add screenshot later --> | ||
![View Job](../images/baize-07.png) |
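As a quick sanity check on the paths quoted in the guide above, a short script can report which of them are missing inside the Notebook or job container. This is a sketch under stated assumptions: `missing_paths` is a hypothetical helper, the two constants are the script path and log directory quoted in the guide, and the log directory is only expected to appear once a run has written logs. The mount path of the training data is not given in the guide, so add it yourself.

```python
from pathlib import Path

# Paths quoted in the guide. The training-data mount path is not stated there,
# so append whatever path you entered in the Notebook form.
TRAIN_SCRIPT = Path("/home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py")
LOG_DIR = Path("/home/jovyan/model/train/logs/")


def missing_paths(paths):
    """Return the entries of `paths` that do not exist, in input order."""
    return [Path(p) for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    for path in missing_paths([TRAIN_SCRIPT, LOG_DIR]):
        print(f"missing: {path}")
```

An empty result means both paths exist. A missing `TRAIN_SCRIPT` usually points at a code dataset that was not associated or was mounted elsewhere.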