Improved Docs models Usage examples (ultralytics#4214)
glenn-jocher authored Aug 7, 2023
1 parent 9a2c069 commit ff5fa57
Showing 15 changed files with 420 additions and 223 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/greetings.yml
@@ -32,7 +32,7 @@ jobs:
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our [Tips for Best Training Results](https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results/).
Join the vibrant [Ultralytics Discord](https://discord.gg/YVsATxj6wr) 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.
Join the vibrant [Ultralytics Discord](https://ultralytics.com/discord) 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.
## Install
24 changes: 14 additions & 10 deletions docs/models/fast-sam.md
@@ -155,15 +155,19 @@ Additionally, you can try FastSAM through a [Colab demo](https://colab.research.

We would like to acknowledge the FastSAM authors for their significant contributions to the field of real-time instance segmentation:

```bibtex
@misc{zhao2023fast,
title={Fast Segment Anything},
author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
year={2023},
eprint={2306.12156},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
!!! note ""

=== "BibTeX"

```bibtex
@misc{zhao2023fast,
title={Fast Segment Anything},
author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
year={2023},
eprint={2306.12156},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

The original FastSAM paper can be found on [arXiv](https://arxiv.org/abs/2306.12156). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/CASIA-IVA-Lab/FastSAM). We appreciate their efforts in advancing the field and making their work accessible to the broader community.
53 changes: 36 additions & 17 deletions docs/models/index.md
@@ -17,32 +17,51 @@ In this documentation, we provide information on four major models:
5. [YOLOv7](./yolov7.md): Updated YOLO models released in 2022 by the authors of YOLOv4.
6. [YOLOv8](./yolov8.md): The latest version of the YOLO family, featuring enhanced capabilities such as instance segmentation, pose/keypoints estimation, and classification.
7. [Segment Anything Model (SAM)](./sam.md): Meta's Segment Anything Model (SAM).
7. [Mobile Segment Anything Model (MobileSAM)](./mobile-sam.md): MobileSAM for mobile applications by Kyung Hee University.
8. [Fast Segment Anything Model (FastSAM)](./fast-sam.md): FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
9. [YOLO-NAS](./yolo-nas.md): YOLO Neural Architecture Search (NAS) Models.
10. [Realtime Detection Transformers (RT-DETR)](./rtdetr.md): Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
8. [Mobile Segment Anything Model (MobileSAM)](./mobile-sam.md): MobileSAM for mobile applications by Kyung Hee University.
9. [Fast Segment Anything Model (FastSAM)](./fast-sam.md): FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
10. [YOLO-NAS](./yolo-nas.md): YOLO Neural Architecture Search (NAS) Models.
11. [Realtime Detection Transformers (RT-DETR)](./rtdetr.md): Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.

You can use many of these models directly in the Command Line Interface (CLI) or in a Python environment. Below are examples of how to use the models with CLI and Python:

## CLI Example
## Usage

Use the `model` argument to pass a model YAML such as `model=yolov8n.yaml` or a pretrained *.pt file such as `model=yolov8n.pt`
You can use these models for detection and segmentation tasks via the `ultralytics` pip package. The following is a sample code snippet showing how to use them for training and inference:

```bash
yolo task=detect mode=train model=yolov8n.pt data=coco128.yaml epochs=100
```
!!! example ""

## Python Example
This example provides simple inference code for the YOLO, SAM and RT-DETR models. For more options, including handling inference results, see [Predict](../modes/predict.md) mode; a short results-handling sketch also follows the examples below. For using models with additional modes, see [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md).

PyTorch pretrained models as well as model YAML files can also be passed to the `YOLO()`, `SAM()`, `NAS()` and `RTDETR()` classes to create a model instance in python:
=== "Python"

```python
from ultralytics import YOLO
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()`, `SAM()`, `NAS()` and `RTDETR()` classes to create a model instance in Python:

model = YOLO("yolov8n.pt") # load a pretrained YOLOv8n model
```python
from ultralytics import YOLO

model.info() # display model information
model.train(data="coco128.yaml", epochs=100) # train the model
```
# Load a COCO-pretrained YOLOv8n model
model = YOLO('yolov8n.pt')

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)

# Run inference with the YOLOv8n model on the 'bus.jpg' image
results = model('path/to/bus.jpg')
```

=== "CLI"

CLI commands are available to run the models directly:

```bash
# Load a COCO-pretrained YOLOv8n model and train it on the COCO8 example dataset for 100 epochs
yolo train model=yolov8n.pt data=coco8.yaml epochs=100 imgsz=640

# Load a COCO-pretrained YOLOv8n model and run inference on the 'bus.jpg' image
yolo predict model=yolov8n.pt source=path/to/bus.jpg
```
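
The same pattern extends to the other model classes, and the `Results` objects returned by inference can be iterated directly. Below is a minimal results-handling sketch; the weight file names `sam_b.pt` and `yolo_nas_s.pt` are assumptions based on the standard Ultralytics model zoo, and `path/to/bus.jpg` is a placeholder:

```python
from ultralytics import NAS, RTDETR, SAM, YOLO

# Each class accepts a pretrained *.pt weights file
yolo_model = YOLO('yolov8n.pt')
sam_model = SAM('sam_b.pt')
nas_model = NAS('yolo_nas_s.pt')
rtdetr_model = RTDETR('rtdetr-l.pt')

# Inference returns a list of Results objects, one per image
results = yolo_model('path/to/bus.jpg')
for r in results:
    print(r.boxes.xyxy)  # bounding boxes as an (N, 4) tensor of xyxy coordinates
    print(r.boxes.conf)  # per-box confidence scores
    print(r.boxes.cls)   # per-box class indices
```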

For more details on each model, their supported tasks, modes, and performance, please visit their respective documentation pages linked above.
24 changes: 14 additions & 10 deletions docs/models/mobile-sam.md
@@ -12,7 +12,7 @@ The MobileSAM paper is now available on [arXiv](https://arxiv.org/pdf/2306.14289

A demonstration of MobileSAM running on a CPU can be accessed at this [demo link](https://huggingface.co/spaces/dhkim2810/MobileSAM). On a Mac i5 CPU, inference takes approximately 3 seconds. On the Hugging Face demo, the interface and lower-performance CPUs make the response slower, but it continues to function effectively.

MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [SegmentAnythingin3D](https://github.com/Jumpat/SegmentAnythingin3D).
MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [Segment Anything in 3D](https://github.com/Jumpat/SegmentAnythingin3D).

MobileSAM is trained on a single GPU with a 100k-image dataset (1% of the original images) in less than a day. The code for this training will be made available in the future.

@@ -85,15 +85,19 @@ model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])

We have implemented `MobileSAM` and `SAM` using the same API. For more usage information, please see the [SAM page](./sam.md).
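
For instance, the MobileSAM weights load through the same `SAM` class and accept the same prompt arguments shown above. A minimal sketch, using the `mobile_sam.pt` weights file referenced elsewhere on this page:

```python
from ultralytics import SAM

# Load MobileSAM weights through the shared SAM API
model = SAM('mobile_sam.pt')

# Point prompt: segment the object at pixel (900, 370); label=1 marks a foreground point
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
```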

### Citing MobileSAM
## Citations and Acknowledgements

If you find MobileSAM useful in your research or development work, please consider citing our paper:

```bibtex
@article{mobile_sam,
title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
journal={arXiv preprint arXiv:2306.14289},
year={2023}
}
```
!!! note ""

=== "BibTeX"

```bibtex
@article{mobile_sam,
title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
journal={arXiv preprint arXiv:2306.14289},
year={2023}
}
```
67 changes: 47 additions & 20 deletions docs/models/rtdetr.md
@@ -15,7 +15,7 @@ Real-Time Detection Transformer (RT-DETR), developed by Baidu, is a cutting-edge

### Key Features

- **Efficient Hybrid Encoder:** Baidu's RT-DETR uses an efficient hybrid encoder that processes multi-scale features by decoupling intra-scale interaction and cross-scale fusion. This unique Vision Transformers-based design reduces computational costs and allows for real-time object detection.
- **Efficient Hybrid Encoder:** Baidu's RT-DETR uses an efficient hybrid encoder that processes multiscale features by decoupling intra-scale interaction and cross-scale fusion. This unique Vision Transformers-based design reduces computational costs and allows for real-time object detection.
- **IoU-aware Query Selection:** Baidu's RT-DETR improves object query initialization by utilizing IoU-aware query selection. This allows the model to focus on the most relevant objects in the scene, enhancing the detection accuracy.
- **Adaptable Inference Speed:** Baidu's RT-DETR supports flexible adjustments of inference speed by using different decoder layers without the need for retraining. This adaptability, pictured in the sketch below, facilitates practical application in various real-time object detection scenarios.
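
The adjustable decoder depth can be pictured with a toy sketch. The code below is illustrative only: a minimal PyTorch stand-in for the idea, not Baidu's implementation.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Illustrative stand-in for a DETR-style decoder stack whose depth can be
    truncated at inference time (not RT-DETR's actual architecture)."""

    def __init__(self, dim: int = 256, num_layers: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, queries, memory, num_layers=None):
        # Run only the first `num_layers` decoder layers; fewer layers means faster inference
        for layer in self.layers[: num_layers or len(self.layers)]:
            queries = layer(queries, memory)
        return queries


decoder = ToyDecoder()
queries = torch.randn(1, 300, 256)  # 300 object queries
memory = torch.randn(1, 1024, 256)  # flattened encoder features
fast = decoder(queries, memory, num_layers=3)  # shallower decoding, no retraining needed
```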

@@ -28,16 +28,39 @@ The Ultralytics Python API provides pre-trained PaddlePaddle RT-DETR models with

## Usage

### Python API
You can use RT-DETR for object detection tasks via the `ultralytics` pip package. The following is a sample code snippet showing how to use RT-DETR models for training and inference:

```python
from ultralytics import RTDETR
!!! example ""

model = RTDETR("rtdetr-l.pt")
model.info() # display model information
model.train(data="coco8.yaml") # train
model.predict("path/to/image.jpg") # predict
```
This example provides simple inference code for RT-DETR. For more options, including handling inference results, see [Predict](../modes/predict.md) mode. For using RT-DETR with additional modes, see [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md).

=== "Python"

```python
from ultralytics import RTDETR

# Load a COCO-pretrained RT-DETR-l model
model = RTDETR('rtdetr-l.pt')

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)

# Run inference with the RT-DETR-l model on the 'bus.jpg' image
results = model('path/to/bus.jpg')
```

=== "CLI"

```bash
# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640

# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg
```

### Supported Tasks

@@ -54,20 +54,24 @@ model.predict("path/to/image.jpg") # predict
| Validation | :heavy_check_mark: |
| Training | :heavy_check_mark: |

# Citations and Acknowledgements
## Citations and Acknowledgements

If you use Baidu's RT-DETR in your research or development work, please cite the [original paper](https://arxiv.org/abs/2304.08069):

```bibtex
@misc{lv2023detrs,
title={DETRs Beat YOLOs on Real-time Object Detection},
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
year={2023},
eprint={2304.08069},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
!!! note ""

=== "BibTeX"

```bibtex
@misc{lv2023detrs,
title={DETRs Beat YOLOs on Real-time Object Detection},
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
year={2023},
eprint={2304.08069},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

We would like to acknowledge Baidu and the [PaddlePaddle](https://github.com/PaddlePaddle/PaddleDetection) team for creating and maintaining this valuable resource for the computer vision community. Their contribution to the field with the development of the Vision Transformers-based real-time object detector, RT-DETR, is greatly appreciated.

Expand Down
39 changes: 22 additions & 17 deletions docs/models/sam.md
@@ -72,6 +72,7 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t
# Run inference
model('path/to/image.jpg')
```

=== "CLI"

```bash
@@ -99,6 +100,7 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t
predictor.set_image(cv2.imread("ultralytics/assets/zidane.jpg")) # set with np.ndarray
results = predictor(bboxes=[439, 437, 524, 709])
results = predictor(points=[900, 370], labels=[1])

# Reset image
predictor.reset_image()
```
@@ -114,9 +116,8 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt")
predictor = SAMPredictor(overrides=overrides)

# segment with additional args
# Segment with additional args
results = predictor(source="ultralytics/assets/zidane.jpg", crop_n_layers=1, points_stride=64)

```

- For additional args for `Segment everything`, see the [`Predictor/generate` Reference](../reference/models/sam/predict.md).
@@ -140,11 +141,11 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t

Here we compare Meta's smallest SAM model, SAM-b, with Ultralytics' smallest segmentation model, [YOLOv8n-seg](../tasks/segment.md):

| Model                                          | Size                       | Parameters             | Speed (CPU)                |
|------------------------------------------------|----------------------------|------------------------|----------------------------|
| Meta's SAM-b                                   | 358 MB                     | 94.7 M                 | 51096 ms/im                |
| [MobileSAM](mobile-sam.md)                     | 40.7 MB                    | 10.1 M                 | 46122 ms/im                |
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone  | 23.7 MB                    | 11.8 M                 | 115 ms/im                  |
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7 MB** (53.4x smaller) | **3.4 M** (27.9x less) | **59 ms/im** (866x faster) |

This comparison shows the order-of-magnitude differences in model size and speed between the models. Whereas SAM offers unique capabilities for automatic segmentation, it is not a direct competitor to YOLOv8 segment models, which are smaller, faster and more efficient.
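
To get a rough feel for such timings on your own hardware, a minimal sketch (results will vary by machine; `path/to/bus.jpg` is a placeholder, and the table figures above are the authors' measurements, not this snippet's output):

```python
import time

from ultralytics import YOLO

# Time single-image CPU inference for the smallest YOLOv8 segmentation model
model = YOLO('yolov8n-seg.pt')
model('path/to/bus.jpg', device='cpu')  # warm-up run; the first call includes setup costs

t0 = time.perf_counter()
model('path/to/bus.jpg', device='cpu')
print(f'{(time.perf_counter() - t0) * 1000:.0f} ms/im')
```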
@@ -205,16 +206,20 @@ Auto-annotation with pre-trained models can dramatically cut down the time and e
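
As a hedged sketch of how auto-annotation is typically invoked (assuming the `auto_annotate` helper at this import path and the default weight names in the `ultralytics` package; the data path is a placeholder):

```python
from ultralytics.data.annotator import auto_annotate

# Generate segmentation labels for a folder of images by chaining a
# detection model with SAM (weight names assume the standard model zoo)
auto_annotate(data='path/to/images', det_model='yolov8x.pt', sam_model='sam_b.pt')
```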

If you find SAM useful in your research or development work, please consider citing the paper:

```bibtex
@misc{kirillov2023segment,
title={Segment Anything},
author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
year={2023},
eprint={2304.02643},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
!!! note ""

=== "BibTeX"

```bibtex
@misc{kirillov2023segment,
title={Segment Anything},
author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
year={2023},
eprint={2304.02643},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

We would like to express our gratitude to Meta AI for creating and maintaining this valuable resource for the computer vision community.
