Improved Docs models Usage examples (ultralytics#4214)
glenn-jocher authored Aug 7, 2023
1 parent 9a2c069 commit ff5fa57
Showing 15 changed files with 420 additions and 223 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/greetings.yml
@@ -32,7 +32,7 @@ jobs:
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our [Tips for Best Training Results](https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results/).
Join the vibrant [Ultralytics Discord](https://discord.gg/YVsATxj6wr) 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.
Join the vibrant [Ultralytics Discord](https://ultralytics.com/discord) 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.
## Install
24 changes: 14 additions & 10 deletions docs/models/fast-sam.md
@@ -155,15 +155,19 @@ Additionally, you can try FastSAM through a [Colab demo](https://colab.research.

We would like to acknowledge the FastSAM authors for their significant contributions to the field of real-time instance segmentation:

```bibtex
@misc{zhao2023fast,
title={Fast Segment Anything},
author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
year={2023},
eprint={2306.12156},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
!!! note ""

=== "BibTeX"

```bibtex
@misc{zhao2023fast,
title={Fast Segment Anything},
author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
year={2023},
eprint={2306.12156},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

The original FastSAM paper can be found on [arXiv](https://arxiv.org/abs/2306.12156). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/CASIA-IVA-Lab/FastSAM). We appreciate their efforts in advancing the field and making their work accessible to the broader community.
53 changes: 36 additions & 17 deletions docs/models/index.md
@@ -17,32 +17,51 @@ In this documentation, we provide information on four major models:
5. [YOLOv7](./yolov7.md): Updated YOLO models released in 2022 by the authors of YOLOv4.
6. [YOLOv8](./yolov8.md): The latest version of the YOLO family, featuring enhanced capabilities such as instance segmentation, pose/keypoints estimation, and classification.
7. [Segment Anything Model (SAM)](./sam.md): Meta's Segment Anything Model (SAM).
7. [Mobile Segment Anything Model (MobileSAM)](./mobile-sam.md): MobileSAM for mobile applications by Kyung Hee University.
8. [Fast Segment Anything Model (FastSAM)](./fast-sam.md): FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
9. [YOLO-NAS](./yolo-nas.md): YOLO Neural Architecture Search (NAS) Models.
10. [Realtime Detection Transformers (RT-DETR)](./rtdetr.md): Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
8. [Mobile Segment Anything Model (MobileSAM)](./mobile-sam.md): MobileSAM for mobile applications by Kyung Hee University.
9. [Fast Segment Anything Model (FastSAM)](./fast-sam.md): FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
10. [YOLO-NAS](./yolo-nas.md): YOLO Neural Architecture Search (NAS) Models.
11. [Realtime Detection Transformers (RT-DETR)](./rtdetr.md): Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.

You can use many of these models directly in the Command Line Interface (CLI) or in a Python environment. Below are examples of how to use the models with CLI and Python:

## CLI Example
## Usage

Use the `model` argument to pass a model YAML such as `model=yolov8n.yaml` or a pretrained *.pt file such as `model=yolov8n.pt`
You can use these models for detection and segmentation tasks via the `ultralytics` pip package. The following is a sample code snippet showing how to use them for training and inference:

```bash
yolo task=detect mode=train model=yolov8n.pt data=coco128.yaml epochs=100
```
!!! example ""

## Python Example
This example provides simple inference code for the YOLO, SAM and RT-DETR models. For more options, including handling inference results, see [Predict](../modes/predict.md) mode; a short results-handling sketch also follows the examples below. For using models with additional modes, see [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md).

PyTorch pretrained models as well as model YAML files can also be passed to the `YOLO()`, `SAM()`, `NAS()` and `RTDETR()` classes to create a model instance in python:
=== "Python"

```python
from ultralytics import YOLO
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()`, `SAM()`, `NAS()` and `RTDETR()` classes to create a model instance in Python:

model = YOLO("yolov8n.pt") # load a pretrained YOLOv8n model
```python
from ultralytics import YOLO

model.info() # display model information
model.train(data="coco128.yaml", epochs=100) # train the model
```
# Load a COCO-pretrained YOLOv8n model
model = YOLO('yolov8n.pt')

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)

# Run inference with the YOLOv8n model on the 'bus.jpg' image
results = model('path/to/bus.jpg')
```

=== "CLI"

CLI commands are available to run the models directly:

```bash
# Load a COCO-pretrained YOLOv8n model and train it on the COCO8 example dataset for 100 epochs
yolo train model=yolov8n.pt data=coco8.yaml epochs=100 imgsz=640

# Load a COCO-pretrained YOLOv8n model and run inference on the 'bus.jpg' image
yolo predict model=yolov8n.pt source=path/to/bus.jpg
```
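
The same pattern extends to the other model classes, and the `Results` objects returned by inference can be iterated directly. Below is a minimal results-handling sketch; the weight file names `sam_b.pt` and `yolo_nas_s.pt` are assumptions based on the standard Ultralytics model zoo, and `path/to/bus.jpg` is a placeholder:

```python
from ultralytics import NAS, RTDETR, SAM, YOLO

# Each class accepts a pretrained *.pt weights file
yolo_model = YOLO('yolov8n.pt')
sam_model = SAM('sam_b.pt')
nas_model = NAS('yolo_nas_s.pt')
rtdetr_model = RTDETR('rtdetr-l.pt')

# Inference returns a list of Results objects, one per image
results = yolo_model('path/to/bus.jpg')
for r in results:
    print(r.boxes.xyxy)  # bounding boxes as an (N, 4) tensor of xyxy coordinates
    print(r.boxes.conf)  # per-box confidence scores
    print(r.boxes.cls)   # per-box class indices
```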

For more details on each model, their supported tasks, modes, and performance, please visit their respective documentation pages linked above.
24 changes: 14 additions & 10 deletions docs/models/mobile-sam.md
@@ -12,7 +12,7 @@ The MobileSAM paper is now available on [arXiv](https://arxiv.org/pdf/2306.14289

A demonstration of MobileSAM running on a CPU can be accessed at this [demo link](https://huggingface.co/spaces/dhkim2810/MobileSAM). On a Mac i5 CPU, inference takes approximately 3 seconds. On the Hugging Face demo, the interface and lower-performance CPUs make the response slower, but it continues to function effectively.

MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [SegmentAnythingin3D](https://github.com/Jumpat/SegmentAnythingin3D).
MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [Segment Anything in 3D](https://github.com/Jumpat/SegmentAnythingin3D).

MobileSAM is trained on a single GPU with a 100k-image dataset (1% of the original images) in less than a day. The code for this training will be made available in the future.

@@ -85,15 +85,19 @@ model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])

We have implemented `MobileSAM` and `SAM` using the same API. For more usage information, please see the [SAM page](./sam.md).
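
For instance, the MobileSAM weights load through the same `SAM` class and accept the same prompt arguments shown above. A minimal sketch, using the `mobile_sam.pt` weights file referenced elsewhere on this page:

```python
from ultralytics import SAM

# Load MobileSAM weights through the shared SAM API
model = SAM('mobile_sam.pt')

# Point prompt: segment the object at pixel (900, 370); label=1 marks a foreground point
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
```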

### Citing MobileSAM
## Citations and Acknowledgements

If you find MobileSAM useful in your research or development work, please consider citing our paper:

```bibtex
@article{mobile_sam,
title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
journal={arXiv preprint arXiv:2306.14289},
year={2023}
}
```
!!! note ""

=== "BibTeX"

```bibtex
@article{mobile_sam,
title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
journal={arXiv preprint arXiv:2306.14289},
year={2023}
}
```
67 changes: 47 additions & 20 deletions docs/models/rtdetr.md
@@ -15,7 +15,7 @@ Real-Time Detection Transformer (RT-DETR), developed by Baidu, is a cutting-edge

### Key Features

- **Efficient Hybrid Encoder:** Baidu's RT-DETR uses an efficient hybrid encoder that processes multi-scale features by decoupling intra-scale interaction and cross-scale fusion. This unique Vision Transformers-based design reduces computational costs and allows for real-time object detection.
- **Efficient Hybrid Encoder:** Baidu's RT-DETR uses an efficient hybrid encoder that processes multiscale features by decoupling intra-scale interaction and cross-scale fusion. This unique Vision Transformers-based design reduces computational costs and allows for real-time object detection.
- **IoU-aware Query Selection:** Baidu's RT-DETR improves object query initialization by utilizing IoU-aware query selection. This allows the model to focus on the most relevant objects in the scene, enhancing the detection accuracy.
- **Adaptable Inference Speed:** Baidu's RT-DETR supports flexible adjustments of inference speed by using different decoder layers without the need for retraining. This adaptability, pictured in the sketch below, facilitates practical application in various real-time object detection scenarios.
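
The adjustable decoder depth can be pictured with a toy sketch. The code below is illustrative only: a minimal PyTorch stand-in for the idea, not Baidu's implementation.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Illustrative stand-in for a DETR-style decoder stack whose depth can be
    truncated at inference time (not RT-DETR's actual architecture)."""

    def __init__(self, dim: int = 256, num_layers: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, queries, memory, num_layers=None):
        # Run only the first `num_layers` decoder layers; fewer layers means faster inference
        for layer in self.layers[: num_layers or len(self.layers)]:
            queries = layer(queries, memory)
        return queries


decoder = ToyDecoder()
queries = torch.randn(1, 300, 256)  # 300 object queries
memory = torch.randn(1, 1024, 256)  # flattened encoder features
fast = decoder(queries, memory, num_layers=3)  # shallower decoding, no retraining needed
```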

@@ -28,16 +28,39 @@ The Ultralytics Python API provides pre-trained PaddlePaddle RT-DETR models with

## Usage

### Python API
You can use RT-DETR for object detection tasks via the `ultralytics` pip package. The following is a sample code snippet showing how to use RT-DETR models for training and inference:

```python
from ultralytics import RTDETR
!!! example ""

model = RTDETR("rtdetr-l.pt")
model.info() # display model information
model.train(data="coco8.yaml") # train
model.predict("path/to/image.jpg") # predict
```
This example provides simple inference code for RT-DETR. For more options, including handling inference results, see [Predict](../modes/predict.md) mode. For using RT-DETR with additional modes, see [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md).

=== "Python"

```python
from ultralytics import RTDETR

# Load a COCO-pretrained RT-DETR-l model
model = RTDETR('rtdetr-l.pt')

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)

# Run inference with the RT-DETR-l model on the 'bus.jpg' image
results = model('path/to/bus.jpg')
```

=== "CLI"

```bash
# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640

# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg
```

### Supported Tasks

@@ -54,20 +54,24 @@ model.predict("path/to/image.jpg") # predict
| Validation | :heavy_check_mark: |
| Training | :heavy_check_mark: |

# Citations and Acknowledgements
## Citations and Acknowledgements

If you use Baidu's RT-DETR in your research or development work, please cite the [original paper](https://arxiv.org/abs/2304.08069):

```bibtex
@misc{lv2023detrs,
title={DETRs Beat YOLOs on Real-time Object Detection},
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
year={2023},
eprint={2304.08069},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
!!! note ""

=== "BibTeX"

```bibtex
@misc{lv2023detrs,
title={DETRs Beat YOLOs on Real-time Object Detection},
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
year={2023},
eprint={2304.08069},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

We would like to acknowledge Baidu and the [PaddlePaddle](https://github.com/PaddlePaddle/PaddleDetection) team for creating and maintaining this valuable resource for the computer vision community. Their contribution to the field with the development of the Vision Transformers-based real-time object detector, RT-DETR, is greatly appreciated.

Expand Down
39 changes: 22 additions & 17 deletions docs/models/sam.md
@@ -72,6 +72,7 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t
# Run inference
model('path/to/image.jpg')
```

=== "CLI"

```bash
@@ -99,6 +100,7 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t
predictor.set_image(cv2.imread("ultralytics/assets/zidane.jpg")) # set with np.ndarray
results = predictor(bboxes=[439, 437, 524, 709])
results = predictor(points=[900, 370], labels=[1])

# Reset image
predictor.reset_image()
```
@@ -114,9 +116,8 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt")
predictor = SAMPredictor(overrides=overrides)

# segment with additional args
# Segment with additional args
results = predictor(source="ultralytics/assets/zidane.jpg", crop_n_layers=1, points_stride=64)

```

- For additional args for `Segment everything`, see the [`Predictor/generate` Reference](../reference/models/sam/predict.md).
@@ -140,11 +141,11 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t

Here we compare Meta's smallest SAM model, SAM-b, with Ultralytics' smallest segmentation model, [YOLOv8n-seg](../tasks/segment.md):

| Model                                          | Size                       | Parameters             | Speed (CPU)                |
|------------------------------------------------|----------------------------|------------------------|----------------------------|
| Meta's SAM-b                                   | 358 MB                     | 94.7 M                 | 51096 ms/im                |
| [MobileSAM](mobile-sam.md)                     | 40.7 MB                    | 10.1 M                 | 46122 ms/im                |
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone  | 23.7 MB                    | 11.8 M                 | 115 ms/im                  |
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7 MB** (53.4x smaller) | **3.4 M** (27.9x less) | **59 ms/im** (866x faster) |

This comparison shows the order-of-magnitude differences in model size and speed between the models. Whereas SAM offers unique capabilities for automatic segmentation, it is not a direct competitor to YOLOv8 segment models, which are smaller, faster and more efficient.
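
To get a rough feel for such timings on your own hardware, a minimal sketch (results will vary by machine; `path/to/bus.jpg` is a placeholder, and the table figures above are the authors' measurements, not this snippet's output):

```python
import time

from ultralytics import YOLO

# Time single-image CPU inference for the smallest YOLOv8 segmentation model
model = YOLO('yolov8n-seg.pt')
model('path/to/bus.jpg', device='cpu')  # warm-up run; the first call includes setup costs

t0 = time.perf_counter()
model('path/to/bus.jpg', device='cpu')
print(f'{(time.perf_counter() - t0) * 1000:.0f} ms/im')
```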
@@ -205,16 +206,20 @@ Auto-annotation with pre-trained models can dramatically cut down the time and e
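
As a hedged sketch of how auto-annotation is typically invoked (assuming the `auto_annotate` helper at this import path and the default weight names in the `ultralytics` package; the data path is a placeholder):

```python
from ultralytics.data.annotator import auto_annotate

# Generate segmentation labels for a folder of images by chaining a
# detection model with SAM (weight names assume the standard model zoo)
auto_annotate(data='path/to/images', det_model='yolov8x.pt', sam_model='sam_b.pt')
```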

If you find SAM useful in your research or development work, please consider citing the paper:

```bibtex
@misc{kirillov2023segment,
title={Segment Anything},
author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
year={2023},
eprint={2304.02643},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
!!! note ""

=== "BibTeX"

```bibtex
@misc{kirillov2023segment,
title={Segment Anything},
author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
year={2023},
eprint={2304.02643},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

We would like to express our gratitude to Meta AI for creating and maintaining this valuable resource for the computer vision community.
