commit
fea03cf22d
501 changed files with 25367 additions and 12463 deletions
@@ -0,0 +1,49 @@ |
||||
# Ultralytics YOLO 🚀, AGPL-3.0 license |
||||
# Builds ultralytics/ultralytics:latest-python image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics |
||||
# Image is CPU-optimized for ONNX, OpenVINO and PyTorch YOLOv8 deployments |
||||
|
||||
# Use the official Python 3.10 slim-bookworm as base image |
||||
FROM python:3.10-slim-bookworm |
||||
|
||||
# Downloads to user config dir |
||||
ADD https://ultralytics.com/assets/Arial.ttf https://ultralytics.com/assets/Arial.Unicode.ttf /root/.config/Ultralytics/ |
||||
|
||||
# Install linux packages |
||||
# g++ required to build 'tflite_support' and 'lap' packages, libusb-1.0-0 required for 'tflite_support' package |
||||
RUN apt update \ |
||||
&& apt install --no-install-recommends -y python3-pip git zip curl htop libgl1-mesa-glx libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0 |
||||
# RUN alias python=python3 |
||||
|
||||
# Create working directory |
||||
WORKDIR /usr/src/ultralytics |
||||
|
||||
# Copy contents |
||||
# COPY . /usr/src/app (causes issues since the build context is not a .git repository) |
||||
RUN git clone https://github.com/ultralytics/ultralytics /usr/src/ultralytics |
||||
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt /usr/src/ultralytics/ |
||||
|
||||
# Remove python3.11/EXTERNALLY-MANAGED or use 'pip install --break-system-packages' to avoid the 'externally-managed-environment' error on newer Ubuntu/Debian images |
||||
# RUN rm -rf /usr/lib/python3.11/EXTERNALLY-MANAGED |
||||
|
||||
# Install pip packages |
||||
RUN python3 -m pip install --upgrade pip wheel |
||||
RUN pip install --no-cache -e ".[export]" thop --extra-index-url https://download.pytorch.org/whl/cpu |
||||
|
||||
# Run exports to AutoInstall packages |
||||
RUN yolo export model=tmp/yolov8n.pt format=edgetpu imgsz=32 |
||||
RUN yolo export model=tmp/yolov8n.pt format=ncnn imgsz=32 |
||||
# Requires <= Python 3.10, bug with paddlepaddle==2.5.0 |
||||
RUN pip install --no-cache paddlepaddle==2.4.2 x2paddle |
||||
# Remove exported models |
||||
RUN rm -rf tmp |
||||
|
||||
# Usage Examples ------------------------------------------------------------------------------------------------------- |
||||
|
||||
# Build and Push |
||||
# t=ultralytics/ultralytics:latest-python && sudo docker build -f docker/Dockerfile-python -t $t . && sudo docker push $t |
||||
|
||||
# Run |
||||
# t=ultralytics/ultralytics:latest-python && sudo docker run -it --ipc=host $t |
||||
|
||||
# Pull and Run with local volume mounted |
||||
# t=ultralytics/ultralytics:latest-python && sudo docker pull $t && sudo docker run -it --ipc=host -v "$(pwd)"/datasets:/usr/src/datasets $t |
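# Run a quick CLI inference inside the container (illustrative example; uses the yolov8n.pt bundled in the image and a sample image URL)
# t=ultralytics/ultralytics:latest-python && sudo docker run -it --ipc=host $t yolo predict model=yolov8n.pt source='https://ultralytics.com/images/bus.jpg'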
@@ -1 +1 @@ |
||||
docs.ultralytics.com |
||||
docs.ultralytics.com |
||||
|
@@ -1,28 +1,26 @@ |
||||
# Security Policy |
||||
--- |
||||
description: Discover how Ultralytics ensures the safety of user data and systems. Check out the measures we have implemented, including Snyk and GitHub CodeQL Scanning. |
||||
keywords: Ultralytics, Security Policy, data security, open-source projects, Snyk scanning, CodeQL scanning, vulnerability detection, threat prevention |
||||
--- |
||||
|
||||
At [Ultralytics](https://ultralytics.com), the security of our users' data and systems is of utmost importance. To |
||||
ensure the safety and security of our [open-source projects](https://github.com/ultralytics), we have implemented |
||||
several measures to detect and prevent security vulnerabilities. |
||||
# Security Policy |
||||
|
||||
[](https://snyk.io/advisor/python/ultralytics) |
||||
At [Ultralytics](https://ultralytics.com), the security of our users' data and systems is of utmost importance. To ensure the safety and security of our [open-source projects](https://github.com/ultralytics), we have implemented several measures to detect and prevent security vulnerabilities. |
||||
|
||||
## Snyk Scanning |
||||
|
||||
We use [Snyk](https://snyk.io/advisor/python/ultralytics) to regularly scan the YOLOv8 repository for vulnerabilities |
||||
and security issues. Our goal is to identify and remediate any potential threats as soon as possible, to minimize any |
||||
risks to our users. |
||||
We use [Snyk](https://snyk.io/advisor/python/ultralytics) to regularly scan all Ultralytics repositories for vulnerabilities and security issues. Our goal is to identify and remediate any potential threats as soon as possible, to minimize any risks to our users. |
||||
|
||||
[](https://snyk.io/advisor/python/ultralytics) |
||||
|
||||
## GitHub CodeQL Scanning |
||||
|
||||
In addition to our Snyk scans, we also use |
||||
GitHub's [CodeQL](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/about-code-scanning-with-codeql) |
||||
scans to proactively identify and address security vulnerabilities. |
||||
In addition to our Snyk scans, we also use GitHub's [CodeQL](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/about-code-scanning-with-codeql) scans to proactively identify and address security vulnerabilities across all Ultralytics repositories. |
||||
|
||||
[](https://github.com/ultralytics/ultralytics/actions/workflows/codeql.yaml) |
||||
|
||||
## Reporting Security Issues |
||||
|
||||
If you suspect or discover a security vulnerability in the YOLOv8 repository, please let us know immediately. You can |
||||
reach out to us directly via our [contact form](https://ultralytics.com/contact) or |
||||
via [security@ultralytics.com](mailto:security@ultralytics.com). Our security team will investigate and respond as soon |
||||
as possible. |
||||
If you suspect or discover a security vulnerability in any of our repositories, please let us know immediately. You can reach out to us directly via our [contact form](https://ultralytics.com/contact) or via [security@ultralytics.com](mailto:security@ultralytics.com). Our security team will investigate and respond as soon as possible. |
||||
|
||||
We appreciate your help in keeping the YOLOv8 repository secure and safe for everyone. |
||||
We appreciate your help in keeping all Ultralytics open-source projects secure and safe for everyone. |
||||
|
@@ -0,0 +1,81 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn about the Caltech-101 dataset, its structure and uses in machine learning. Includes instructions to train a YOLO model using this dataset. |
||||
keywords: Caltech-101, dataset, YOLO training, machine learning, object recognition, ultralytics |
||||
--- |
||||
|
||||
# Caltech-101 Dataset |
||||
|
||||
The [Caltech-101](https://data.caltech.edu/records/mzrjq-6wc02) dataset is a widely used dataset for object recognition tasks, containing around 9,000 images from 101 object categories. The categories were chosen to reflect a variety of real-world objects, and the images themselves were carefully selected and annotated to provide a challenging benchmark for object recognition algorithms. |
||||
|
||||
## Key Features |
||||
|
||||
- The Caltech-101 dataset comprises around 9,000 color images divided into 101 categories. |
||||
- The categories encompass a wide variety of objects, including animals, vehicles, household items, and people. |
||||
- The number of images per category varies, with about 40 to 800 images in each category. |
||||
- Images are of variable sizes, with most images being medium resolution. |
||||
- Caltech-101 is widely used for training and testing in the field of machine learning, particularly for object recognition tasks. |
||||
|
||||
## Dataset Structure |
||||
|
||||
Unlike many other datasets, the Caltech-101 dataset is not formally split into training and testing sets. Users typically create their own splits based on their specific needs. However, a common practice is to use a random subset of images for training (e.g., 30 images per category) and the remaining images for testing. |
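A minimal sketch of that practice using only the Python standard library, assuming a local class-folder copy of the dataset at `caltech-101/` (the source and output paths are placeholders):

```python
import random
import shutil
from pathlib import Path

random.seed(0)
source = Path('caltech-101')  # assumed local copy: one sub-folder per category
train_dir, test_dir = Path('caltech101-split/train'), Path('caltech101-split/test')

for class_dir in sorted(p for p in source.iterdir() if p.is_dir()):
    images = sorted(class_dir.glob('*.jpg'))
    random.shuffle(images)
    train_imgs, test_imgs = images[:30], images[30:]  # ~30 images per category for training
    for subset_dir, subset in ((train_dir, train_imgs), (test_dir, test_imgs)):
        (subset_dir / class_dir.name).mkdir(parents=True, exist_ok=True)
        for img in subset:
            shutil.copy(img, subset_dir / class_dir.name / img.name)
```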
||||
|
||||
## Applications |
||||
|
||||
The Caltech-101 dataset is extensively used for training and evaluating deep learning models in object recognition tasks, such as Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and various other machine learning algorithms. Its wide variety of categories and high-quality images make it an excellent dataset for research and development in the field of machine learning and computer vision. |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLO model on the Caltech-101 dataset for 100 epochs, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='caltech101', epochs=100, imgsz=416) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=caltech101 model=yolov8n-cls.pt epochs=100 imgsz=416 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The Caltech-101 dataset contains high-quality color images of various objects, providing a well-structured dataset for object recognition tasks. Here are some examples of images from the dataset: |
||||
|
||||
 |
||||
|
||||
The example showcases the variety and complexity of the objects in the Caltech-101 dataset, emphasizing the significance of a diverse dataset for training robust object recognition models. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the Caltech-101 dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{fei2007learning, |
||||
title={Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories}, |
||||
author={Fei-Fei, Li and Fergus, Rob and Perona, Pietro}, |
||||
journal={Computer vision and Image understanding}, |
||||
volume={106}, |
||||
number={1}, |
||||
pages={59--70}, |
||||
year={2007}, |
||||
publisher={Elsevier} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge Li Fei-Fei, Rob Fergus, and Pietro Perona for creating and maintaining the Caltech-101 dataset as a valuable resource for the machine learning and computer vision research community. For more information about the Caltech-101 dataset and its creators, visit the [Caltech-101 dataset website](https://data.caltech.edu/records/mzrjq-6wc02). |
@@ -0,0 +1,78 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore the Caltech-256 dataset, a diverse collection of images used for object recognition tasks in machine learning. Learn to train a YOLO model on the dataset. |
||||
keywords: Ultralytics, YOLO, Caltech-256, dataset, object recognition, machine learning, computer vision, deep learning |
||||
--- |
||||
|
||||
# Caltech-256 Dataset |
||||
|
||||
The [Caltech-256](https://data.caltech.edu/records/nyy15-4j048) dataset is an extensive collection of images used for object classification tasks. It contains around 30,000 images divided into 257 categories (256 object categories and 1 background category). The images are carefully curated and annotated to provide a challenging and diverse benchmark for object recognition algorithms. |
||||
|
||||
## Key Features |
||||
|
||||
- The Caltech-256 dataset comprises around 30,000 color images divided into 257 categories. |
||||
- Each category contains a minimum of 80 images. |
||||
- The categories encompass a wide variety of real-world objects, including animals, vehicles, household items, and people. |
||||
- Images are of variable sizes and resolutions. |
||||
- Caltech-256 is widely used for training and testing in the field of machine learning, particularly for object recognition tasks. |
||||
|
||||
## Dataset Structure |
||||
|
||||
Like Caltech-101, the Caltech-256 dataset does not have a formal split between training and testing sets. Users typically create their own splits according to their specific needs. A common practice is to use a random subset of images for training and the remaining images for testing. |
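To verify claims such as the per-class minimum before defining a split, the images in each class folder can be counted with a short script; a minimal sketch, assuming the dataset has been extracted locally to `256_ObjectCategories/` (the path is an assumption):

```python
from pathlib import Path

root = Path('256_ObjectCategories')  # assumed local copy: one sub-folder per category

# Count .jpg images in every class folder
counts = {d.name: len(list(d.glob('*.jpg'))) for d in root.iterdir() if d.is_dir()}

print(f'{len(counts)} classes, {sum(counts.values())} images in total')
print('smallest class:', min(counts, key=counts.get), '->', min(counts.values()))
```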
||||
|
||||
## Applications |
||||
|
||||
The Caltech-256 dataset is extensively used for training and evaluating deep learning models in object recognition tasks, such as Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and various other machine learning algorithms. Its diverse set of categories and high-quality images make it an invaluable dataset for research and development in the field of machine learning and computer vision. |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLO model on the Caltech-256 dataset for 100 epochs, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='caltech256', epochs=100, imgsz=416) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=caltech256 model=yolov8n-cls.pt epochs=100 imgsz=416 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The Caltech-256 dataset contains high-quality color images of various objects, providing a comprehensive dataset for object recognition tasks. Here are some examples of images from the dataset ([credit](https://ml4a.github.io/demos/tsne_viewer.html)): |
||||
|
||||
 |
||||
|
||||
The example showcases the diversity and complexity of the objects in the Caltech-256 dataset, emphasizing the importance of a varied dataset for training robust object recognition models. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the Caltech-256 dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{griffin2007caltech, |
||||
title={Caltech-256 object category dataset}, |
||||
author={Griffin, Gregory and Holub, Alex and Perona, Pietro}, |
||||
year={2007} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge Gregory Griffin, Alex Holub, and Pietro Perona for creating and maintaining the Caltech-256 dataset as a valuable resource for the machine learning and computer vision research community. For more information about the Caltech-256 dataset and its creators, visit the [Caltech-256 dataset website](https://data.caltech.edu/records/nyy15-4j048). |
@@ -0,0 +1,80 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore the CIFAR-10 dataset, widely used for training in machine learning and computer vision, and learn how to use it with Ultralytics YOLO. |
||||
keywords: CIFAR-10, dataset, machine learning, image classification, computer vision, YOLO, Ultralytics, training, testing, deep learning, Convolutional Neural Networks, Support Vector Machines |
||||
--- |
||||
|
||||
# CIFAR-10 Dataset |
||||
|
||||
The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) (Canadian Institute For Advanced Research) dataset is a collection of images used widely for machine learning and computer vision algorithms. It was developed by researchers at the CIFAR institute and consists of 60,000 32x32 color images in 10 different classes. |
||||
|
||||
## Key Features |
||||
|
||||
- The CIFAR-10 dataset consists of 60,000 images, divided into 10 classes. |
||||
- Each class contains 6,000 images, split into 5,000 for training and 1,000 for testing. |
||||
- The images are colored and of size 32x32 pixels. |
||||
- The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. |
||||
- CIFAR-10 is commonly used for training and testing in the field of machine learning and computer vision. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The CIFAR-10 dataset is split into two subsets: |
||||
|
||||
1. **Training Set**: This subset contains 50,000 images used for training machine learning models. |
||||
2. **Testing Set**: This subset consists of 10,000 images used for testing and benchmarking the trained models. |
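The two subsets above can also be inspected directly with torchvision, independently of Ultralytics; a minimal sketch (the data downloads to a local `./data` folder):

```python
from torchvision import datasets

train_set = datasets.CIFAR10(root='./data', train=True, download=True)
test_set = datasets.CIFAR10(root='./data', train=False, download=True)

print(len(train_set), len(test_set))  # 50000 10000
print(train_set.classes)              # ['airplane', 'automobile', 'bird', ...]
```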
||||
|
||||
## Applications |
||||
|
||||
The CIFAR-10 dataset is widely used for training and evaluating deep learning models in image classification tasks, such as Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and various other machine learning algorithms. The diversity of the dataset in terms of classes and the presence of color images make it a well-rounded dataset for research and development in the field of machine learning and computer vision. |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLO model on the CIFAR-10 dataset for 100 epochs with an image size of 32x32, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='cifar10', epochs=100, imgsz=32) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=cifar10 model=yolov8n-cls.pt epochs=100 imgsz=32 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The CIFAR-10 dataset contains color images of various objects, providing a well-structured dataset for image classification tasks. Here are some examples of images from the dataset: |
||||
|
||||
 |
||||
|
||||
The example showcases the variety and complexity of the objects in the CIFAR-10 dataset, highlighting the importance of a diverse dataset for training robust image classification models. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the CIFAR-10 dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@TECHREPORT{Krizhevsky09learningmultiple, |
||||
author={Alex Krizhevsky}, |
||||
title={Learning multiple layers of features from tiny images}, |
||||
institution={}, |
||||
year={2009} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge Alex Krizhevsky for creating and maintaining the CIFAR-10 dataset as a valuable resource for the machine learning and computer vision research community. For more information about the CIFAR-10 dataset and its creator, visit the [CIFAR-10 dataset website](https://www.cs.toronto.edu/~kriz/cifar.html). |
@@ -0,0 +1,80 @@ |
||||
--- |
||||
comments: true |
||||
description: Discover how to leverage the CIFAR-100 dataset for machine learning and computer vision tasks with YOLO. Gain insights on its structure, use, and utilization for model training. |
||||
keywords: Ultralytics, YOLO, CIFAR-100 dataset, image classification, machine learning, computer vision, YOLO model training |
||||
--- |
||||
|
||||
# CIFAR-100 Dataset |
||||
|
||||
The [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) (Canadian Institute For Advanced Research) dataset is a significant extension of the CIFAR-10 dataset, composed of 60,000 32x32 color images in 100 different classes. It was developed by researchers at the CIFAR institute, offering a more challenging dataset for more complex machine learning and computer vision tasks. |
||||
|
||||
## Key Features |
||||
|
||||
- The CIFAR-100 dataset consists of 60,000 images, divided into 100 classes. |
||||
- Each class contains 600 images, split into 500 for training and 100 for testing. |
||||
- The images are colored and of size 32x32 pixels. |
||||
- The 100 different classes are grouped into 20 coarse categories for higher level classification. |
||||
- CIFAR-100 is commonly used for training and testing in the field of machine learning and computer vision. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The CIFAR-100 dataset is split into two subsets: |
||||
|
||||
1. **Training Set**: This subset contains 50,000 images used for training machine learning models. |
||||
2. **Testing Set**: This subset consists of 10,000 images used for testing and benchmarking the trained models. |
||||
|
||||
## Applications |
||||
|
||||
The CIFAR-100 dataset is extensively used for training and evaluating deep learning models in image classification tasks, such as Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and various other machine learning algorithms. The diversity of the dataset in terms of classes and the presence of color images make it a more challenging and comprehensive dataset for research and development in the field of machine learning and computer vision. |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLO model on the CIFAR-100 dataset for 100 epochs with an image size of 32x32, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='cifar100', epochs=100, imgsz=32) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=cifar100 model=yolov8n-cls.pt epochs=100 imgsz=32 |
||||
``` |
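After training, the held-out test split can be used to report top-1 and top-5 accuracy. A minimal sketch, assuming `runs/classify/train/weights/best.pt` is the checkpoint written by the run above (the exact path depends on your local run settings):

```python
from ultralytics import YOLO

# Load the trained checkpoint (placeholder path; adjust to your run directory)
model = YOLO('runs/classify/train/weights/best.pt')

# Evaluate on the CIFAR-100 test split
metrics = model.val(data='cifar100', imgsz=32)
print(metrics.top1, metrics.top5)  # top-1 and top-5 accuracy
```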
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The CIFAR-100 dataset contains color images of various objects, providing a well-structured dataset for image classification tasks. Here are some examples of images from the dataset: |
||||
|
||||
 |
||||
|
||||
The example showcases the variety and complexity of the objects in the CIFAR-100 dataset, highlighting the importance of a diverse dataset for training robust image classification models. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the CIFAR-100 dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@TECHREPORT{Krizhevsky09learningmultiple, |
||||
author={Alex Krizhevsky}, |
||||
title={Learning multiple layers of features from tiny images}, |
||||
institution={}, |
||||
year={2009} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge Alex Krizhevsky for creating and maintaining the CIFAR-100 dataset as a valuable resource for the machine learning and computer vision research community. For more information about the CIFAR-100 dataset and its creator, visit the [CIFAR-100 dataset website](https://www.cs.toronto.edu/~kriz/cifar.html). |
@@ -0,0 +1,79 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn how to use the Fashion-MNIST dataset for image classification with the Ultralytics YOLO model. Covers dataset structure, labels, applications, and usage. |
||||
keywords: Ultralytics, YOLO, Fashion-MNIST, dataset, image classification, machine learning, deep learning, neural networks, training, testing |
||||
--- |
||||
|
||||
# Fashion-MNIST Dataset |
||||
|
||||
The [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset is a database of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. |
||||
|
||||
## Key Features |
||||
|
||||
- Fashion-MNIST contains 60,000 training images and 10,000 testing images of Zalando fashion articles. |
||||
- The dataset comprises grayscale images of size 28x28 pixels. |
||||
- Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255. |
||||
- Fashion-MNIST is widely used for training and testing in the field of machine learning, especially for image classification tasks. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The Fashion-MNIST dataset is split into two subsets: |
||||
|
||||
1. **Training Set**: This subset contains 60,000 images used for training machine learning models. |
||||
2. **Testing Set**: This subset consists of 10,000 images used for testing and benchmarking the trained models. |
||||
|
||||
## Labels |
||||
|
||||
Each training and test example is assigned to one of the following labels: |
||||
|
||||
0. T-shirt/top |
||||
1. Trouser |
||||
2. Pullover |
||||
3. Dress |
||||
4. Coat |
||||
5. Sandal |
||||
6. Shirt |
||||
7. Sneaker |
||||
8. Bag |
||||
9. Ankle boot |
||||
|
||||
## Applications |
||||
|
||||
The Fashion-MNIST dataset is widely used for training and evaluating deep learning models in image classification tasks, such as Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and various other machine learning algorithms. The dataset's simple and well-structured format makes it an essential resource for researchers and practitioners in the field of machine learning and computer vision. |
||||
|
||||
## Usage |
||||
|
||||
To train a CNN model on the Fashion-MNIST dataset for 100 epochs with an image size of 28x28, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='fashion-mnist', epochs=100, imgsz=28) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=fashion-mnist model=yolov8n-cls.pt epochs=100 imgsz=28 |
||||
``` |
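Once trained, the classifier can be run on a single image, and the predicted class index maps to the label list above. A minimal sketch, assuming `runs/classify/train/weights/best.pt` is the resulting checkpoint and `shirt.png` is a local test image (both paths are placeholders; the `probs` attributes shown require a recent Ultralytics version):

```python
from ultralytics import YOLO

model = YOLO('runs/classify/train/weights/best.pt')  # placeholder checkpoint path
results = model('shirt.png', imgsz=28)               # placeholder test image

# results[0].probs holds the class probabilities; model.names maps indices to label names
probs = results[0].probs
print(model.names[int(probs.top1)], float(probs.top1conf))
```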
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The Fashion-MNIST dataset contains grayscale images of Zalando fashion articles, providing a well-structured dataset for image classification tasks. Here are some examples of images from the dataset: |
||||
|
||||
 |
||||
|
||||
The example showcases the variety and complexity of the images in the Fashion-MNIST dataset, highlighting the importance of a diverse dataset for training robust image classification models. |
||||
|
||||
## Acknowledgments |
||||
|
||||
If you use the Fashion-MNIST dataset in your research or development work, please acknowledge the dataset by linking to the [GitHub repository](https://github.com/zalandoresearch/fashion-mnist). This dataset was made available by Zalando Research. |
@@ -0,0 +1,83 @@ |
||||
--- |
||||
comments: true |
||||
description: Understand how to use ImageNet, an extensive annotated image dataset for object recognition research, with Ultralytics YOLO models. Learn about its structure, usage, and significance in computer vision. |
||||
keywords: Ultralytics, YOLO, ImageNet, dataset, object recognition, deep learning, computer vision, machine learning, dataset training, model training, image classification, object detection |
||||
--- |
||||
|
||||
# ImageNet Dataset |
||||
|
||||
[ImageNet](https://www.image-net.org/) is a large-scale database of annotated images designed for use in visual object recognition research. It contains over 14 million images, with each image annotated using WordNet synsets, making it one of the most extensive resources available for training deep learning models in computer vision tasks. |
||||
|
||||
## Key Features |
||||
|
||||
- ImageNet contains over 14 million high-resolution images spanning thousands of object categories. |
||||
- The dataset is organized according to the WordNet hierarchy, with each synset representing a category. |
||||
- ImageNet is widely used for training and benchmarking in the field of computer vision, particularly for image classification and object detection tasks. |
||||
- The annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been instrumental in advancing computer vision research. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The ImageNet dataset is organized using the WordNet hierarchy. Each node in the hierarchy represents a category, and each category is described by a synset (a collection of synonymous terms). The images in ImageNet are annotated with one or more synsets, providing a rich resource for training models to recognize various objects and their relationships. |
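Since the category folders are named by WordNet synset IDs (for example `n01440764`), it is often useful to translate an ID into human-readable terms. A minimal sketch using NLTK's WordNet corpus (NLTK is not an Ultralytics dependency and must be installed separately):

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download('wordnet', quiet=True)  # one-time download of the WordNet corpus

wnid = 'n01440764'  # an ImageNet folder name: 'n' part of speech + 8-digit synset offset
synset = wn.synset_from_pos_and_offset(wnid[0], int(wnid[1:]))
print(synset.lemma_names())  # ['tench', 'Tinca_tinca']
```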
||||
|
||||
## ImageNet Large Scale Visual Recognition Challenge (ILSVRC) |
||||
|
||||
The annual [ImageNet Large Scale Visual Recognition Challenge (ILSVRC)](http://image-net.org/challenges/LSVRC/) has been an important event in the field of computer vision. It has provided a platform for researchers and developers to evaluate their algorithms and models on a large-scale dataset with standardized evaluation metrics. The ILSVRC has led to significant advancements in the development of deep learning models for image classification, object detection, and other computer vision tasks. |
||||
|
||||
## Applications |
||||
|
||||
The ImageNet dataset is widely used for training and evaluating deep learning models in various computer vision tasks, such as image classification, object detection, and object localization. Some popular deep learning architectures, such as AlexNet, VGG, and ResNet, were developed and benchmarked using the ImageNet dataset. |
||||
|
||||
## Usage |
||||
|
||||
To train a deep learning model on the ImageNet dataset for 100 epochs with an image size of 224x224, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='imagenet', epochs=100, imgsz=224) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=imagenet model=yolov8n-cls.pt epochs=100 imgsz=224 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The ImageNet dataset contains high-resolution images spanning thousands of object categories, providing a diverse and extensive dataset for training and evaluating computer vision models. Here are some examples of images from the dataset: |
||||
|
||||
 |
||||
|
||||
The example showcases the variety and complexity of the images in the ImageNet dataset, highlighting the importance of a diverse dataset for training robust computer vision models. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the ImageNet dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{ILSVRC15, |
||||
author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei}, |
||||
title={ImageNet Large Scale Visual Recognition Challenge}, |
||||
year={2015}, |
||||
journal={International Journal of Computer Vision (IJCV)}, |
||||
volume={115}, |
||||
number={3}, |
||||
pages={211-252} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the ImageNet team, led by Olga Russakovsky, Jia Deng, and Li Fei-Fei, for creating and maintaining the ImageNet dataset as a valuable resource for the machine learning and computer vision research community. For more information about the ImageNet dataset and its creators, visit the [ImageNet website](https://www.image-net.org/). |
@@ -0,0 +1,78 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore the compact ImageNet10 Dataset developed by Ultralytics. Ideal for fast testing of computer vision training pipelines and CV model sanity checks. |
||||
keywords: Ultralytics, YOLO, ImageNet10 Dataset, Image detection, Deep Learning, ImageNet, AI model testing, Computer vision, Machine learning |
||||
--- |
||||
|
||||
# ImageNet10 Dataset |
||||
|
||||
The [ImageNet10](https://github.com/ultralytics/yolov5/releases/download/v1.0/imagenet10.zip) dataset is a small-scale subset of the [ImageNet](https://www.image-net.org/) database, developed by [Ultralytics](https://ultralytics.com) and designed for CI tests, sanity checks, and fast testing of training pipelines. This dataset is composed of the first image in the training set and the first image from the validation set of the first 10 classes in ImageNet. Although significantly smaller, it retains the structure and diversity of the original ImageNet dataset. |
||||
|
||||
## Key Features |
||||
|
||||
- ImageNet10 is a compact version of ImageNet, with 20 images representing the first 10 classes of the original dataset. |
||||
- The dataset is organized according to the WordNet hierarchy, mirroring the structure of the full ImageNet dataset. |
||||
- It is ideally suited for CI tests, sanity checks, and rapid testing of training pipelines in computer vision tasks. |
||||
- Although not designed for model benchmarking, it can provide a quick indication of a model's basic functionality and correctness. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The ImageNet10 dataset, like the original ImageNet, is organized using the WordNet hierarchy. Each of the 10 classes in ImageNet10 is described by a synset (a collection of synonymous terms). The images in ImageNet10 are annotated with one or more synsets, providing a compact resource for testing models to recognize various objects and their relationships. |
||||
|
||||
## Applications |
||||
|
||||
The ImageNet10 dataset is useful for quickly testing and debugging computer vision models and pipelines. Its small size allows for rapid iteration, making it ideal for continuous integration tests and sanity checks. It can also be used for fast preliminary testing of new models or changes to existing models before moving on to full-scale testing with the complete ImageNet dataset. |
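For example, a continuous-integration job might run a one-epoch training pass on ImageNet10 as a smoke test; a minimal pytest-style sketch (the test name and settings are illustrative, not part of the Ultralytics test suite):

```python
from ultralytics import YOLO


def test_classification_pipeline_smoke():
    """Train for one epoch on ImageNet10 to verify the classification pipeline runs end to end."""
    model = YOLO('yolov8n-cls.pt')
    results = model.train(data='imagenet10', epochs=1, imgsz=32)
    assert results is not None  # training finished and returned metrics
```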
||||
|
||||
## Usage |
||||
|
||||
To test a deep learning model on the ImageNet10 dataset with an image size of 224x224, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Test Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='imagenet10', epochs=5, imgsz=224) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=imagenet10 model=yolov8n-cls.pt epochs=5 imgsz=224 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The ImageNet10 dataset contains a subset of images from the original ImageNet dataset. These images are chosen to represent the first 10 classes in the dataset, providing a diverse yet compact dataset for quick testing and evaluation. |
||||
|
||||
 |
||||
The example showcases the variety and complexity of the images in the ImageNet10 dataset, highlighting its usefulness for sanity checks and quick testing of computer vision models. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the ImageNet10 dataset in your research or development work, please cite the original ImageNet paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{ILSVRC15, |
||||
author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei}, |
||||
title={ImageNet Large Scale Visual Recognition Challenge}, |
||||
year={2015}, |
||||
journal={International Journal of Computer Vision (IJCV)}, |
||||
volume={115}, |
||||
number={3}, |
||||
pages={211-252} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the ImageNet team, led by Olga Russakovsky, Jia Deng, and Li Fei-Fei, for creating and maintaining the ImageNet dataset. The ImageNet10 dataset, while a compact subset, is a valuable resource for quick testing and debugging in the machine learning and computer vision research community. For more information about the ImageNet dataset and its creators, visit the [ImageNet website](https://www.image-net.org/). |
@@ -0,0 +1,113 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn about the ImageNette dataset and its usage in deep learning model training. Find code snippets for model training and explore ImageNette datatypes. |
||||
keywords: ImageNette dataset, Ultralytics, YOLO, Image classification, Machine Learning, Deep learning, Training code snippets, CNN, ImageNette160, ImageNette320 |
||||
--- |
||||
|
||||
# ImageNette Dataset |
||||
|
||||
The [ImageNette](https://github.com/fastai/imagenette) dataset is a subset of the larger [Imagenet](http://www.image-net.org/) dataset, but it only includes 10 easily distinguishable classes. It was created to provide a quicker, easier-to-use version of Imagenet for software development and education. |
||||
|
||||
## Key Features |
||||
|
||||
- ImageNette contains images from 10 easily distinguishable classes: tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, and parachute. |
||||
- The dataset comprises colored images of varying dimensions. |
||||
- ImageNette is widely used for training and testing in the field of machine learning, especially for image classification tasks. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The ImageNette dataset is split into two subsets: |
||||
|
||||
1. **Training Set**: This subset contains several thousand images used for training machine learning models. The exact number varies per class. |
||||
2. **Validation Set**: This subset consists of several hundred images used for validating and benchmarking the trained models. Again, the exact number varies per class. |
||||
|
||||
## Applications |
||||
|
||||
The ImageNette dataset is widely used for training and evaluating deep learning models in image classification tasks, such as Convolutional Neural Networks (CNNs), and various other machine learning algorithms. The dataset's straightforward format and well-chosen classes make it a handy resource for both beginner and experienced practitioners in the field of machine learning and computer vision. |
||||
|
||||
## Usage |
||||
|
||||
To train a model on the ImageNette dataset for 100 epochs with a standard image size of 224x224, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='imagenette', epochs=100, imgsz=224) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=imagenette model=yolov8n-cls.pt epochs=100 imgsz=224 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The ImageNette dataset contains colored images of various objects and scenes, providing a diverse dataset for image classification tasks. Here are some examples of images from the dataset: |
||||
|
||||
 |
||||
|
||||
The example showcases the variety and complexity of the images in the ImageNette dataset, highlighting the importance of a diverse dataset for training robust image classification models. |
||||
|
||||
## ImageNette160 and ImageNette320 |
||||
|
||||
For faster prototyping and training, the ImageNette dataset is also available in two reduced sizes: ImageNette160 and ImageNette320. These datasets maintain the same classes and structure as the full ImageNette dataset, but the images are resized to a smaller dimension. As such, these versions of the dataset are particularly useful for preliminary model testing, or when computational resources are limited. |
||||
|
||||
To use these datasets, simply replace 'imagenette' with 'imagenette160' or 'imagenette320' in the training command. The following code snippets illustrate this: |
||||
|
||||
!!! example "Train Example with ImageNette160" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model with ImageNette160 |
||||
results = model.train(data='imagenette160', epochs=100, imgsz=160) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model with ImageNette160 |
||||
yolo classify train data=imagenette160 model=yolov8n-cls.pt epochs=100 imgsz=160 |
||||
``` |
||||
|
||||
!!! example "Train Example with ImageNette320" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model with ImageNette320 |
||||
results = model.train(data='imagenette320', epochs=100, imgsz=320) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model with ImageNette320 |
||||
yolo classify train data=imagenette320 model=yolov8n-cls.pt epochs=100 imgsz=320 |
||||
``` |
||||
|
||||
These smaller versions of the dataset allow for rapid iterations during the development process while still providing valuable and realistic image classification tasks. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the ImageNette dataset in your research or development work, please acknowledge it appropriately. For more information about the ImageNette dataset, visit the [ImageNette dataset GitHub page](https://github.com/fastai/imagenette). |
@@ -0,0 +1,84 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore the ImageWoof dataset, designed for challenging dog breed classification. Train AI models with Ultralytics YOLO using this dataset. |
||||
keywords: ImageWoof, image classification, dog breeds, machine learning, deep learning, Ultralytics, YOLO, dataset |
||||
--- |
||||
|
||||
# ImageWoof Dataset |
||||
|
||||
The [ImageWoof](https://github.com/fastai/imagenette) dataset is a subset of the ImageNet dataset consisting of 10 classes that are challenging to classify, since they are all dog breeds. It was created as a more difficult task for image classification algorithms to solve, aiming to encourage the development of more advanced models. |
||||
|
||||
## Key Features |
||||
|
||||
- ImageWoof contains images of 10 different dog breeds: Australian terrier, Border terrier, Samoyed, Beagle, Shih-Tzu, English foxhound, Rhodesian ridgeback, Dingo, Golden retriever, and Old English sheepdog. |
||||
- The dataset provides images at various resolutions (full size, 320px, 160px), accommodating different computational capabilities and research needs. |
||||
- It also includes a version with noisy labels, providing a more realistic scenario where labels might not always be reliable. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The ImageWoof dataset structure is based on the dog breed classes, with each breed having its own directory of images. |
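In the standard class-folder layout described in the classification dataset overview, this looks roughly as follows (an illustrative sketch; in the actual download the breed directories are named by WordNet synset IDs rather than breed names, and a `val/` split sits alongside `train/`):

```
imagewoof/
|-- train/
|   |-- class1/  # one directory per dog breed
|   |   |-- img1.jpg
|   |   |-- img2.jpg
|   |   |-- ...
|   |-- class2/
|   |   |-- ...
|   |-- ...
|-- val/
|   |-- class1/
|   |   |-- ...
|   |-- ...
```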
||||
|
||||
## Applications |
||||
|
||||
The ImageWoof dataset is widely used for training and evaluating deep learning models in image classification tasks, especially when it comes to more complex and similar classes. The dataset's challenge lies in the subtle differences between the dog breeds, pushing the limits of a model's performance and generalization. |
||||
|
||||
## Usage |
||||
|
||||
To train a CNN model on the ImageWoof dataset for 100 epochs with an image size of 224x224, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='imagewoof', epochs=100, imgsz=224) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=imagewoof model=yolov8n-cls.pt epochs=100 imgsz=224 |
||||
``` |
||||
|
||||
## Dataset Variants |
||||
|
||||
The ImageWoof dataset comes in three different sizes to accommodate various research needs and computational capabilities: |
||||
|
||||
1. **Full Size (imagewoof)**: This is the original version of the ImageWoof dataset. It contains full-sized images and is ideal for final training and performance benchmarking. |
||||
|
||||
2. **Medium Size (imagewoof320)**: This version contains images resized to have a maximum edge length of 320 pixels. It's suitable for faster training without significantly sacrificing model performance. |
||||
|
||||
3. **Small Size (imagewoof160)**: This version contains images resized to have a maximum edge length of 160 pixels. It's designed for rapid prototyping and experimentation where training speed is a priority. |
||||
|
||||
To use these variants in your training, simply replace 'imagewoof' in the dataset argument with 'imagewoof320' or 'imagewoof160'. For example: |
||||
|
||||
```python |
||||
# For medium-sized dataset |
||||
model.train(data='imagewoof320', epochs=100, imgsz=224) |
||||
|
||||
# For small-sized dataset |
||||
model.train(data='imagewoof160', epochs=100, imgsz=224) |
||||
``` |
||||
|
||||
It's important to note that using smaller images will likely yield lower performance in terms of classification accuracy. However, it's an excellent way to iterate quickly in the early stages of model development and prototyping. |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The ImageWoof dataset contains colorful images of various dog breeds, providing a challenging dataset for image classification tasks. Here are some examples of images from the dataset: |
||||
|
||||
 |
||||
|
||||
The example showcases the subtle differences and similarities among the different dog breeds in the ImageWoof dataset, highlighting the complexity and difficulty of the classification task. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the ImageWoof dataset in your research or development work, please make sure to acknowledge the creators of the dataset by linking to the [official dataset repository](https://github.com/fastai/imagenette). |
||||
|
||||
We would like to acknowledge the FastAI team for creating and maintaining the ImageWoof dataset as a valuable resource for the machine learning and computer vision research community. For more information about the ImageWoof dataset, visit the [ImageWoof dataset repository](https://github.com/fastai/imagenette). |
@@ -0,0 +1,120 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore image classification datasets supported by Ultralytics, learn the standard dataset format, and set up your own dataset for training models. |
||||
keywords: Ultralytics, image classification, dataset, machine learning, CIFAR-10, ImageNet, MNIST, torchvision |
||||
--- |
||||
|
||||
# Image Classification Datasets Overview |
||||
|
||||
## Dataset format |
||||
|
||||
The folder structure for classification datasets in torchvision typically follows a standard format: |
||||
|
||||
``` |
||||
root/ |
||||
|-- class1/ |
||||
| |-- img1.jpg |
||||
| |-- img2.jpg |
||||
| |-- ... |
||||
| |
||||
|-- class2/ |
||||
| |-- img1.jpg |
||||
| |-- img2.jpg |
||||
| |-- ... |
||||
| |
||||
|-- class3/ |
||||
| |-- img1.jpg |
||||
| |-- img2.jpg |
||||
| |-- ... |
||||
| |
||||
|-- ... |
||||
``` |
||||
|
||||
In this folder structure, the `root` directory contains one subdirectory for each class in the dataset. Each subdirectory is named after the corresponding class and contains all the images for that class. Each image file is named uniquely and is typically in a common image file format such as JPEG or PNG. |
||||
|
||||
**Example** |
||||
|
||||
For example, in the CIFAR10 dataset, the folder structure would look like this: |
||||
|
||||
``` |
||||
cifar-10/ |
||||
| |
||||
|-- train/ |
||||
| |-- airplane/ |
||||
| | |-- 10008_airplane.png |
||||
| | |-- 10009_airplane.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- automobile/ |
||||
| | |-- 1000_automobile.png |
||||
| | |-- 1001_automobile.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- bird/ |
||||
| | |-- 10014_bird.png |
||||
| | |-- 10015_bird.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- ... |
||||
| |
||||
|-- test/ |
||||
| |-- airplane/ |
||||
| | |-- 10_airplane.png |
||||
| | |-- 11_airplane.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- automobile/ |
||||
| | |-- 100_automobile.png |
||||
| | |-- 101_automobile.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- bird/ |
||||
| | |-- 1000_bird.png |
||||
| | |-- 1001_bird.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- ... |
||||
``` |
||||
|
||||
In this example, the `train` directory contains subdirectories for each class in the dataset, and each class subdirectory contains all the images for that class. The `test` directory has a similar structure. The `root` directory also contains other files that are part of the CIFAR10 dataset. |
||||
|
||||
## Usage |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='path/to/dataset', epochs=100, imgsz=640) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=path/to/dataset model=yolov8n-cls.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Supported Datasets |
||||
|
||||
Ultralytics supports the following datasets with automatic download: |
||||
|
||||
* [Caltech 101](caltech101.md): A dataset containing images of 101 object categories for image classification tasks. |
||||
* [Caltech 256](caltech256.md): An extended version of Caltech 101 with 256 object categories and more challenging images. |
||||
* [CIFAR-10](cifar10.md): A dataset of 60K 32x32 color images in 10 classes, with 6K images per class. |
||||
* [CIFAR-100](cifar100.md): An extended version of CIFAR-10 with 100 object categories and 600 images per class. |
||||
* [Fashion-MNIST](fashion-mnist.md): A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks. |
||||
* [ImageNet](imagenet.md): A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories. |
||||
* [ImageNet-10](imagenet10.md): A smaller subset of ImageNet with 10 categories for faster experimentation and testing. |
||||
* [Imagenette](imagenette.md): A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing. |
||||
* [Imagewoof](imagewoof.md): A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks. |
||||
* [MNIST](mnist.md): A dataset of 70,000 grayscale images of handwritten digits for image classification tasks. |
||||
|
||||
### Adding your own dataset |
||||
|
||||
If you have your own dataset and would like to use it for training classification models with Ultralytics, ensure that it follows the format specified above under "Dataset format" and then point your `data` argument to the dataset directory. |
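For example, if your images live under `path/to/your/dataset` with `train/` and `test/` sub-folders laid out as shown above, training and evaluating a classifier looks like this (a minimal sketch; the dataset path is a placeholder):

```python
from ultralytics import YOLO

# Load a pretrained classification model
model = YOLO('yolov8n-cls.pt')

# Point 'data' at the root folder that contains the train/ and test/ sub-folders (placeholder path)
results = model.train(data='path/to/your/dataset', epochs=100, imgsz=224)

# Evaluate on the held-out split after training
metrics = model.val()
```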
@@ -0,0 +1,86 @@ |
||||
--- |
||||
comments: true |
||||
description: Detailed guide on the MNIST Dataset, a benchmark in the machine learning community for image classification tasks. Learn about its structure, usage and application. |
||||
keywords: MNIST dataset, Ultralytics, image classification, machine learning, computer vision, deep learning, AI, dataset guide |
||||
--- |
||||
|
||||
# MNIST Dataset |
||||
|
||||
The [MNIST](http://yann.lecun.com/exdb/mnist/) (Modified National Institute of Standards and Technology) dataset is a large database of handwritten digits that is commonly used for training various image processing systems and machine learning models. It was created by "re-mixing" the samples from NIST's original datasets and has become a benchmark for evaluating the performance of image classification algorithms. |
||||
|
||||
## Key Features |
||||
|
||||
- MNIST contains 60,000 training images and 10,000 testing images of handwritten digits. |
||||
- The dataset comprises grayscale images of size 28x28 pixels. |
||||
- The images are normalized to fit into a 28x28 pixel bounding box and anti-aliased, introducing grayscale levels. |
||||
- MNIST is widely used for training and testing in the field of machine learning, especially for image classification tasks. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The MNIST dataset is split into two subsets: |
||||
|
||||
1. **Training Set**: This subset contains 60,000 images of handwritten digits used for training machine learning models. |
||||
2. **Testing Set**: This subset consists of 10,000 images used for testing and benchmarking the trained models. |
||||
|
||||
## Extended MNIST (EMNIST) |
||||
|
||||
Extended MNIST (EMNIST) is a newer dataset developed and released by NIST to be the successor to MNIST. While MNIST included images only of handwritten digits, EMNIST includes all the images from NIST Special Database 19, which is a large database of handwritten uppercase and lowercase letters as well as digits. The images in EMNIST were converted into the same 28x28 pixel format, by the same process, as were the MNIST images. Accordingly, tools that work with the older, smaller MNIST dataset will likely work unmodified with EMNIST. |
||||
|
||||
## Applications |
||||
|
||||
The MNIST dataset is widely used for training and evaluating deep learning models in image classification tasks, such as Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and various other machine learning algorithms. The dataset's simple and well-structured format makes it an essential resource for researchers and practitioners in the field of machine learning and computer vision. |
||||
|
||||
## Usage |
||||
|
||||
To train a CNN model on the MNIST dataset for 100 epochs with an image size of 32x32, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='mnist', epochs=100, imgsz=32) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=mnist model=yolov8n-cls.pt epochs=100 imgsz=32 |
||||
``` |
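After training, you may want to check classification accuracy on the MNIST test split. The following is a minimal sketch assuming the default Ultralytics run directory layout; adjust the weights path to wherever your training run saved `best.pt`.

```python
from ultralytics import YOLO

# Load the weights produced by the training run above
# (the path below is an assumption based on the default 'runs/classify/train' output directory)
model = YOLO('runs/classify/train/weights/best.pt')

# Validate on the MNIST test split and report top-1 / top-5 accuracy
metrics = model.val(data='mnist', imgsz=32)
print(metrics.top1, metrics.top5)
```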
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The MNIST dataset contains grayscale images of handwritten digits, providing a well-structured dataset for image classification tasks. Here are some examples of images from the dataset: |
||||
|
||||
 |
||||
|
||||
The example showcases the variety and complexity of the handwritten digits in the MNIST dataset, highlighting the importance of a diverse dataset for training robust image classification models. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the MNIST dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{lecun2010mnist, |
||||
title={MNIST handwritten digit database}, |
||||
author={LeCun, Yann and Cortes, Corinna and Burges, CJ}, |
||||
journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist}, |
||||
volume={2}, |
||||
year={2010} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge Yann LeCun, Corinna Cortes, and Christopher J.C. Burges for creating and maintaining the MNIST dataset as a valuable resource for the machine learning and computer vision research community. For more information about the MNIST dataset and its creators, visit the [MNIST dataset website](http://yann.lecun.com/exdb/mnist/). |
@ -0,0 +1,97 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore Argoverse, a comprehensive dataset for autonomous driving tasks including 3D tracking, motion forecasting and depth estimation used in YOLO. |
||||
keywords: Argoverse dataset, autonomous driving, YOLO, 3D tracking, motion forecasting, LiDAR data, HD maps, ultralytics documentation |
||||
--- |
||||
|
||||
# Argoverse Dataset |
||||
|
||||
The [Argoverse](https://www.argoverse.org/) dataset is a collection of data designed to support research in autonomous driving tasks, such as 3D tracking, motion forecasting, and stereo depth estimation. Developed by Argo AI, the dataset provides a wide range of high-quality sensor data, including high-resolution images, LiDAR point clouds, and map data. |
||||
|
||||
!!! note |
||||
|
||||
The Argoverse dataset *.zip file required for training was removed from Amazon S3 after the shutdown of Argo AI by Ford, but we have made it available for manual download on [Google Drive](https://drive.google.com/file/d/1st9qW3BeIwQsnR0t8mRpvbsSWIo16ACi/view?usp=drive_link). |
||||
|
||||
## Key Features |
||||
|
||||
- Argoverse contains over 290K labeled 3D object tracks and 5 million object instances across 1,263 distinct scenes. |
||||
- The dataset includes high-resolution camera images, LiDAR point clouds, and richly annotated HD maps. |
||||
- Annotations include 3D bounding boxes for objects, object tracks, and trajectory information. |
||||
- Argoverse provides multiple subsets for different tasks, such as 3D tracking, motion forecasting, and stereo depth estimation. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The Argoverse dataset is organized into three main subsets: |
||||
|
||||
1. **Argoverse 3D Tracking**: This subset contains 113 scenes with over 290K labeled 3D object tracks, focusing on 3D object tracking tasks. It includes LiDAR point clouds, camera images, and sensor calibration information. |
||||
2. **Argoverse Motion Forecasting**: This subset consists of 324K vehicle trajectories collected from 60 hours of driving data, suitable for motion forecasting tasks. |
||||
3. **Argoverse Stereo Depth Estimation**: This subset is designed for stereo depth estimation tasks and includes over 10K stereo image pairs with corresponding LiDAR point clouds for ground truth depth estimation. |
||||
|
||||
## Applications |
||||
|
||||
The Argoverse dataset is widely used for training and evaluating deep learning models in autonomous driving tasks such as 3D object tracking, motion forecasting, and stereo depth estimation. The dataset's diverse set of sensor data, object annotations, and map information make it a valuable resource for researchers and practitioners in the field of autonomous driving. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the Argoverse dataset, the `Argoverse.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Argoverse.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Argoverse.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/Argoverse.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/Argoverse.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n model on the Argoverse dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='Argoverse.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=Argoverse.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Data and Annotations |
||||
|
||||
The Argoverse dataset contains a diverse set of sensor data, including camera images, LiDAR point clouds, and HD map information, providing rich context for autonomous driving tasks. Here are some examples of data from the dataset, along with their corresponding annotations: |
||||
|
||||
 |
||||
|
||||
- **Argoverse 3D Tracking**: This image demonstrates an example of 3D object tracking, where objects are annotated with 3D bounding boxes. The dataset provides LiDAR point clouds and camera images to facilitate the development of models for this task. |
||||
|
||||
The example showcases the variety and complexity of the data in the Argoverse dataset and highlights the importance of high-quality sensor data for autonomous driving tasks. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the Argoverse dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@inproceedings{chang2019argoverse, |
||||
title={Argoverse: 3D Tracking and Forecasting with Rich Maps}, |
||||
author={Chang, Ming-Fang and Lambert, John and Sangkloy, Patsorn and Singh, Jagjeet and Bak, Slawomir and Hartnett, Andrew and Wang, Dequan and Carr, Peter and Lucey, Simon and Ramanan, Deva and others}, |
||||
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, |
||||
pages={8748--8757}, |
||||
year={2019} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge Argo AI for creating and maintaining the Argoverse dataset as a valuable resource for the autonomous driving research community. For more information about the Argoverse dataset and its creators, visit the [Argoverse dataset website](https://www.argoverse.org/). |
@ -0,0 +1,94 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn how COCO, a leading dataset for object detection and segmentation, integrates with Ultralytics. Discover ways to use it for training YOLO models. |
||||
keywords: Ultralytics, COCO dataset, object detection, YOLO, YOLO model training, image segmentation, computer vision, deep learning models |
||||
--- |
||||
|
||||
# COCO Dataset |
||||
|
||||
The [COCO](https://cocodataset.org/#home) (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks. |
||||
|
||||
## Key Features |
||||
|
||||
- COCO contains 330K images, with 200K images having annotations for object detection, segmentation, and captioning tasks. |
||||
- The dataset comprises 80 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment. |
||||
- Annotations include object bounding boxes, segmentation masks, and captions for each image. |
||||
- COCO provides standardized evaluation metrics like mean Average Precision (mAP) for object detection, and mean Average Recall (mAR) for segmentation tasks, making it suitable for comparing model performance. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The COCO dataset is split into three subsets: |
||||
|
||||
1. **Train2017**: This subset contains 118K images for training object detection, segmentation, and captioning models. |
||||
2. **Val2017**: This subset has 5K images used for validation purposes during model training. |
||||
3. **Test2017**: This subset consists of 20K images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the [COCO evaluation server](https://codalab.lisn.upsaclay.fr/competitions/7384) for performance evaluation. |
||||
|
||||
## Applications |
||||
|
||||
The COCO dataset is widely used for training and evaluating deep learning models in object detection (such as YOLO, Faster R-CNN, and SSD), instance segmentation (such as Mask R-CNN), and keypoint detection (such as OpenPose). The dataset's diverse set of object categories, large number of annotated images, and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the COCO dataset, the `coco.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/coco.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/coco.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n model on the COCO dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='coco.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=coco.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
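The standardized COCO metrics mentioned above can also be computed directly with the `val` mode. Below is a minimal sketch using the pretrained `yolov8n.pt` weights; it runs validation on the COCO val split defined in `coco.yaml` and prints the resulting mAP values.

```python
from ultralytics import YOLO

# Evaluate a pretrained model on the COCO validation split defined in coco.yaml
model = YOLO('yolov8n.pt')
metrics = model.val(data='coco.yaml')

print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50
```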
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The COCO dataset contains a diverse set of images with various object categories and complex scenes. Here are some examples of images from the dataset, along with their corresponding annotations: |
||||
|
||||
 |
||||
|
||||
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts. |
||||
|
||||
The example showcases the variety and complexity of the images in the COCO dataset and the benefits of using mosaicing during the training process. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the COCO dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{lin2015microsoft, |
||||
title={Microsoft COCO: Common Objects in Context}, |
||||
author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár}, |
||||
year={2015}, |
||||
eprint={1405.0312}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the [COCO dataset website](https://cocodataset.org/#home). |
@ -0,0 +1,84 @@ |
||||
--- |
||||
comments: true |
||||
description: Discover the benefits of using the practical and diverse COCO8 dataset for object detection model testing. Learn to configure and use it via Ultralytics HUB and YOLOv8. |
||||
keywords: Ultralytics, COCO8 dataset, object detection, model testing, dataset configuration, detection approaches, sanity check, training pipelines, YOLOv8 |
||||
--- |
||||
|
||||
# COCO8 Dataset |
||||
|
||||
## Introduction |
||||
|
||||
[Ultralytics](https://ultralytics.com) COCO8 is a small, but versatile object detection dataset composed of the first 8 |
||||
images of the COCO train 2017 set, 4 for training and 4 for validation. This dataset is ideal for testing and debugging |
||||
object detection models, or for experimenting with new detection approaches. With 8 images, it is small enough to be |
||||
easily manageable, yet diverse enough to test training pipelines for errors and act as a sanity check before training |
||||
larger datasets. |
||||
|
||||
This dataset is intended for use with Ultralytics [HUB](https://hub.ultralytics.com) |
||||
and [YOLOv8](https://github.com/ultralytics/ultralytics). |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the COCO8 dataset, the `coco8.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/coco8.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/coco8.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n model on the COCO8 dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=coco8.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
Here are some examples of images from the COCO8 dataset, along with their corresponding annotations: |
||||
|
||||
<img src="https://user-images.githubusercontent.com/26833433/236818348-e6260a3d-0454-436b-83a9-de366ba07235.jpg" alt="Dataset sample image" width="800"> |
||||
|
||||
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts. |
||||
|
||||
The example showcases the variety and complexity of the images in the COCO8 dataset and the benefits of using mosaicing during the training process. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the COCO dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{lin2015microsoft, |
||||
title={Microsoft COCO: Common Objects in Context}, |
||||
author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár}, |
||||
year={2015}, |
||||
eprint={1405.0312}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the [COCO dataset website](https://cocodataset.org/#home). |
@ -0,0 +1,91 @@ |
||||
--- |
||||
comments: true |
||||
description: Understand how to utilize the vast Global Wheat Head Dataset for building wheat head detection models. Features, structure, applications, usage, sample data, and citation. |
||||
keywords: Ultralytics, YOLO, Global Wheat Head Dataset, wheat head detection, plant phenotyping, crop management, deep learning, outdoor images, annotations, YAML configuration |
||||
--- |
||||
|
||||
# Global Wheat Head Dataset |
||||
|
||||
The [Global Wheat Head Dataset](http://www.global-wheat.com/) is a collection of images designed to support the development of accurate wheat head detection models for applications in wheat phenotyping and crop management. Wheat heads, also known as spikes, are the grain-bearing parts of the wheat plant. Accurate estimation of wheat head density and size is essential for assessing crop health, maturity, and yield potential. The dataset, created by a collaboration of nine research institutes from seven countries, covers multiple growing regions to ensure models generalize well across different environments. |
||||
|
||||
## Key Features |
||||
|
||||
- The dataset contains over 3,000 training images from Europe (France, UK, Switzerland) and North America (Canada). |
||||
- It includes approximately 1,000 test images from Australia, Japan, and China. |
||||
- Images are outdoor field images, capturing the natural variability in wheat head appearances. |
||||
- Annotations include wheat head bounding boxes to support object detection tasks. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The Global Wheat Head Dataset is organized into two main subsets: |
||||
|
||||
1. **Training Set**: This subset contains over 3,000 images from Europe and North America. The images are labeled with wheat head bounding boxes, providing ground truth for training object detection models. |
||||
2. **Test Set**: This subset consists of approximately 1,000 images from Australia, Japan, and China. These images are used for evaluating the performance of trained models on unseen genotypes, environments, and observational conditions. |
||||
|
||||
## Applications |
||||
|
||||
The Global Wheat Head Dataset is widely used for training and evaluating deep learning models in wheat head detection tasks. The dataset's diverse set of images, capturing a wide range of appearances, environments, and conditions, makes it a valuable resource for researchers and practitioners in the field of plant phenotyping and crop management. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the Global Wheat Head Dataset, the `GlobalWheat2020.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/GlobalWheat2020.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/GlobalWheat2020.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/GlobalWheat2020.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/GlobalWheat2020.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n model on the Global Wheat Head Dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='GlobalWheat2020.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=GlobalWheat2020.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Data and Annotations |
||||
|
||||
The Global Wheat Head Dataset contains a diverse set of outdoor field images, capturing the natural variability in wheat head appearances, environments, and conditions. Here are some examples of data from the dataset, along with their corresponding annotations: |
||||
|
||||
 |
||||
|
||||
- **Wheat Head Detection**: This image demonstrates an example of wheat head detection, where wheat heads are annotated with bounding boxes. The dataset provides a variety of images to facilitate the development of models for this task. |
||||
|
||||
The example showcases the variety and complexity of the data in the Global Wheat Head Dataset and highlights the importance of accurate wheat head detection for applications in wheat phenotyping and crop management. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the Global Wheat Head Dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{david2020global, |
||||
title={Global Wheat Head Detection (GWHD) Dataset: A Large and Diverse Dataset of High-Resolution RGB-Labelled Images to Develop and Benchmark Wheat Head Detection Methods}, |
||||
author={David, Etienne and Madec, Simon and Sadeghi-Tehran, Pouria and Aasen, Helge and Zheng, Bangyou and Liu, Shouyang and Kirchgessner, Norbert and Ishikawa, Goro and Nagasawa, Koichi and Badhon, Minhajul and others}, |
||||
journal={arXiv preprint arXiv:2005.02162}, |
||||
year={2020} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the researchers and institutions that contributed to the creation and maintenance of the Global Wheat Head Dataset as a valuable resource for the plant phenotyping and crop management research community. For more information about the dataset and its creators, visit the [Global Wheat Head Dataset website](http://www.global-wheat.com/). |
@ -0,0 +1,104 @@ |
||||
--- |
||||
comments: true |
||||
description: Navigate through supported dataset formats, methods to utilize them and how to add your own datasets. Get insights on porting or converting label formats. |
||||
keywords: Ultralytics, YOLO, datasets, object detection, dataset formats, label formats, data conversion |
||||
--- |
||||
|
||||
# Object Detection Datasets Overview |
||||
|
||||
Training a robust and accurate object detection model requires a comprehensive dataset. This guide introduces various formats of datasets that are compatible with the Ultralytics YOLO model and provides insights into their structure, usage, and how to convert between different formats. |
||||
|
||||
## Supported Dataset Formats |
||||
|
||||
### Ultralytics YOLO format |
||||
|
||||
The Ultralytics YOLO format is a dataset configuration format that allows you to define the dataset root directory, the relative paths to training/validation/testing image directories or *.txt files containing image paths, and a dictionary of class names. Here is an example: |
||||
|
||||
```yaml |
||||
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..] |
||||
path: ../datasets/coco8 # dataset root dir |
||||
train: images/train # train images (relative to 'path') 4 images |
||||
val: images/val # val images (relative to 'path') 4 images |
||||
test: # test images (optional) |
||||
|
||||
# Classes (80 COCO classes) |
||||
names: |
||||
0: person |
||||
1: bicycle |
||||
2: car |
||||
... |
||||
77: teddy bear |
||||
78: hair drier |
||||
79: toothbrush |
||||
``` |
||||
|
||||
Labels for this format should be exported to YOLO format with one `*.txt` file per image. If there are no objects in an image, no `*.txt` file is required. The `*.txt` file should be formatted with one row per object in `class x_center y_center width height` format. Box coordinates must be in **normalized xywh** format (from 0 to 1). If your boxes are in pixels, you should divide `x_center` and `width` by image width, and `y_center` and `height` by image height. Class numbers should be zero-indexed (start with 0). |
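As a quick illustration of the normalization step described above, here is a small helper that converts a pixel-space box (given as top-left corner plus width and height) into a normalized `class x_center y_center width height` row. The function name and the example values are only for demonstration.

```python
def to_yolo_row(cls_id, x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert a pixel-space box (top-left corner + size) to a normalized YOLO label row."""
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return f"{cls_id} {x_center:.6f} {y_center:.6f} {box_w / img_w:.6f} {box_h / img_h:.6f}"


# Example: a person (class 0) in a 640x480 image
print(to_yolo_row(0, x_min=100, y_min=120, box_w=200, box_h=240, img_w=640, img_h=480))
# -> '0 0.312500 0.500000 0.312500 0.500000'
```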
||||
|
||||
<p align="center"><img width="750" src="https://user-images.githubusercontent.com/26833433/91506361-c7965000-e886-11ea-8291-c72b98c25eec.jpg"></p> |
||||
|
||||
The label file corresponding to the above image contains 2 persons (class `0`) and a tie (class `27`): |
||||
|
||||
<p align="center"><img width="428" src="https://user-images.githubusercontent.com/26833433/112467037-d2568c00-8d66-11eb-8796-55402ac0d62f.png"></p> |
||||
|
||||
When using the Ultralytics YOLO format, organize your training and validation images and labels as shown in the example below. |
||||
|
||||
<p align="center"><img width="700" src="https://user-images.githubusercontent.com/26833433/134436012-65111ad1-9541-4853-81a6-f19a3468b75f.png"></p> |
||||
|
||||
## Usage |
||||
|
||||
Here's how you can use these formats to train your model: |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=coco8.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Supported Datasets |
||||
|
||||
Here is a list of the supported datasets and a brief description for each: |
||||
|
||||
- [**Argoverse**](./argoverse.md): A collection of sensor data collected from autonomous vehicles. It contains 3D tracking annotations for car objects. |
||||
- [**COCO**](./coco.md): Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset with 80 object categories. |
||||
- [**COCO8**](./coco8.md): A smaller subset of the COCO dataset, COCO8 is more lightweight and faster to train. |
||||
- [**GlobalWheat2020**](./globalwheat2020.md): A dataset containing images of wheat heads for the Global Wheat Challenge 2020. |
||||
- [**Objects365**](./objects365.md): A large-scale object detection dataset with 365 object categories and 600k images, aimed at advancing object detection research. |
||||
- [**OpenImagesV7**](./open-images-v7.md): A comprehensive dataset by Google with 1.7M train images and 42k validation images. |
||||
- [**SKU-110K**](./sku-110k.md): A dataset containing images of densely packed retail products, intended for retail environment object detection. |
||||
- [**VisDrone**](./visdrone.md): A dataset focusing on drone-based images, containing various object categories like cars, pedestrians, and cyclists. |
||||
- [**VOC**](./voc.md): PASCAL VOC is a popular object detection dataset with 20 object categories including vehicles, animals, and furniture. |
||||
- [**xView**](./xview.md): A dataset containing high-resolution satellite imagery, designed for the detection of various object classes in overhead views. |
||||
|
||||
### Adding your own dataset |
||||
|
||||
If you have your own dataset and would like to use it for training detection models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file. |
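For illustration only, here is one way to generate such a configuration file programmatically; the paths and class names below are hypothetical placeholders for your own data.

```python
import yaml

# Hypothetical custom dataset: adjust paths and class names to your own data
config = {
    'path': '../datasets/my_dataset',  # dataset root dir
    'train': 'images/train',           # train images (relative to 'path')
    'val': 'images/val',               # val images (relative to 'path')
    'nc': 2,                           # number of classes (optional in recent versions; derived from 'names')
    'names': {0: 'widget', 1: 'gadget'},
}

with open('my_dataset.yaml', 'w') as f:
    yaml.safe_dump(config, f, sort_keys=False)
```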
||||
|
||||
## Port or Convert Label Formats |
||||
|
||||
### COCO Dataset Format to YOLO Format |
||||
|
||||
You can easily convert labels from the popular COCO dataset format to the YOLO format using the following code snippet: |
||||
|
||||
```python |
||||
from ultralytics.data.converter import convert_coco |
||||
|
||||
convert_coco(labels_dir='../coco/annotations/') |
||||
``` |
||||
|
||||
This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format. |
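If your COCO-format annotations also carry segmentation polygons or keypoints, the converter can carry those over as well. The keyword arguments below are based on the converter's options and are a hedged sketch; check them against the version of the package you have installed.

```python
from ultralytics.data.converter import convert_coco

# Also convert segmentation polygons (and optionally keypoints) to YOLO-format labels
convert_coco(labels_dir='../coco/annotations/', use_segments=True, use_keypoints=False)
```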
||||
|
||||
Remember to double-check if the dataset you want to use is compatible with your model and follows the necessary format conventions. Properly formatted datasets are crucial for training successful object detection models. |
@ -0,0 +1,92 @@ |
||||
--- |
||||
comments: true |
||||
description: Discover the Objects365 dataset, a wide-scale, high-quality resource for object detection research. Learn to use it with the Ultralytics YOLO model. |
||||
keywords: Objects365, object detection, Ultralytics, dataset, YOLO, bounding boxes, annotations, computer vision, deep learning, training models |
||||
--- |
||||
|
||||
# Objects365 Dataset |
||||
|
||||
The [Objects365](https://www.objects365.org/) dataset is a large-scale, high-quality dataset designed to foster object detection research with a focus on diverse objects in the wild. Created by a team of [Megvii](https://en.megvii.com/) researchers, the dataset offers a wide range of high-resolution images with a comprehensive set of annotated bounding boxes covering 365 object categories. |
||||
|
||||
## Key Features |
||||
|
||||
- Objects365 contains 365 object categories, with 2 million images and over 30 million bounding boxes. |
||||
- The dataset includes diverse objects in various scenarios, providing a rich and challenging benchmark for object detection tasks. |
||||
- Annotations include bounding boxes for objects, making it suitable for training and evaluating object detection models. |
||||
- Objects365 pre-trained models significantly outperform ImageNet pre-trained models, leading to better generalization on various tasks. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The Objects365 dataset is organized into a single set of images with corresponding annotations: |
||||
|
||||
- **Images**: The dataset includes 2 million high-resolution images, each containing a variety of objects across 365 categories. |
||||
- **Annotations**: The images are annotated with over 30 million bounding boxes, providing comprehensive ground truth information for object detection tasks. |
||||
|
||||
## Applications |
||||
|
||||
The Objects365 dataset is widely used for training and evaluating deep learning models in object detection tasks. The dataset's diverse set of object categories and high-quality annotations make it a valuable resource for researchers and practitioners in the field of computer vision. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the Objects365 dataset, the `Objects365.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Objects365.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Objects365.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/Objects365.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/Objects365.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n model on the Objects365 dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='Objects365.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=Objects365.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Data and Annotations |
||||
|
||||
The Objects365 dataset contains a diverse set of high-resolution images with objects from 365 categories, providing rich context for object detection tasks. Here are some examples of the images in the dataset: |
||||
|
||||
 |
||||
|
||||
- **Objects365**: This image demonstrates an example of object detection, where objects are annotated with bounding boxes. The dataset provides a wide range of images to facilitate the development of models for this task. |
||||
|
||||
The example showcases the variety and complexity of the data in the Objects365 dataset and highlights the importance of accurate object detection for computer vision applications. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the Objects365 dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@inproceedings{shao2019objects365, |
||||
title={Objects365: A Large-scale, High-quality Dataset for Object Detection}, |
||||
author={Shao, Shuai and Li, Zeming and Zhang, Tianyuan and Peng, Chao and Yu, Gang and Li, Jing and Zhang, Xiangyu and Sun, Jian}, |
||||
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, |
||||
pages={8425--8434}, |
||||
year={2019} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the team of researchers who created and maintain the Objects365 dataset as a valuable resource for the computer vision research community. For more information about the Objects365 dataset and its creators, visit the [Objects365 dataset website](https://www.objects365.org/). |
@ -0,0 +1,110 @@ |
||||
--- |
||||
comments: true |
||||
description: Dive into Google's Open Images V7, a comprehensive dataset offering a broad scope for computer vision research. Understand its usage with deep learning models. |
||||
keywords: Open Images V7, object detection, segmentation masks, visual relationships, localized narratives, computer vision, deep learning, annotations, bounding boxes |
||||
--- |
||||
|
||||
# Open Images V7 Dataset |
||||
|
||||
[Open Images V7](https://storage.googleapis.com/openimages/web/index.html) is a versatile and expansive dataset championed by Google. Aimed at propelling research in the realm of computer vision, it boasts a vast collection of images annotated with a plethora of data, including image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives. |
||||
|
||||
 |
||||
|
||||
## Key Features |
||||
|
||||
- Encompasses ~9M images annotated in various ways to suit multiple computer vision tasks. |
||||
- Houses a staggering 16M bounding boxes across 600 object classes in 1.9M images. These boxes are primarily hand-drawn by experts ensuring high precision. |
||||
- Visual relationship annotations totaling 3.3M are available, detailing 1,466 unique relationship triplets, object properties, and human activities. |
||||
- V5 introduced segmentation masks for 2.8M objects across 350 classes. |
||||
- V6 introduced 675k localized narratives that amalgamate voice, text, and mouse traces highlighting described objects. |
||||
- V7 introduced 66.4M point-level labels on 1.4M images, spanning 5,827 classes. |
||||
- Encompasses 61.4M image-level labels across a diverse set of 20,638 classes. |
||||
- Provides a unified platform for image classification, object detection, relationship detection, instance segmentation, and multimodal image descriptions. |
||||
|
||||
## Dataset Structure |
||||
|
||||
Open Images V7 is structured in multiple components catering to varied computer vision challenges: |
||||
|
||||
- **Images**: About 9 million images, often showcasing intricate scenes with an average of 8.3 objects per image. |
||||
- **Bounding Boxes**: Over 16 million boxes that demarcate objects across 600 categories. |
||||
- **Segmentation Masks**: These detail the exact boundary of 2.8M objects across 350 classes. |
||||
- **Visual Relationships**: 3.3M annotations indicating object relationships, properties, and actions. |
||||
- **Localized Narratives**: 675k descriptions combining voice, text, and mouse traces. |
||||
- **Point-Level Labels**: 66.4M labels across 1.4M images, suitable for zero/few-shot semantic segmentation. |
||||
|
||||
## Applications |
||||
|
||||
Open Images V7 is a cornerstone for training and evaluating state-of-the-art models in various computer vision tasks. The dataset's broad scope and high-quality annotations make it indispensable for researchers and developers specializing in computer vision. |
||||
|
||||
## Dataset YAML |
||||
|
||||
Typically, datasets come with a YAML (YAML Ain't Markup Language) file that delineates the dataset's configuration. For Open Images V7, the Ultralytics configuration is provided in the `open-images-v7.yaml` file shown below. For accurate paths and class definitions, refer to the dataset's official repository or documentation. |
||||
|
||||
!!! example "OpenImagesV7.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/open-images-v7.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n model on the Open Images V7 dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! warning |
||||
|
||||
The complete Open Images V7 dataset comprises 1,743,042 training images and 41,620 validation images, requiring approximately **561 GB of storage space** upon download. |
||||
|
||||
Executing the commands provided below will trigger an automatic download of the full dataset if it's not already present locally. Before running the example below, it's crucial to: |
||||
|
||||
- Verify that your device has enough storage capacity (see the sketch after this list). |
||||
- Ensure a robust and speedy internet connection. |
||||
|
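A minimal pre-flight sketch of the storage check mentioned above; the mount point and threshold are assumptions, so point it at the drive that will actually hold your datasets.

```python
import shutil

# Approximate free space on the target drive, in GB
free_gb = shutil.disk_usage('/').free / 1e9
required_gb = 561  # approximate size of the full Open Images V7 download

print(f"Free: {free_gb:.0f} GB, required: ~{required_gb} GB")
if free_gb < required_gb:
    raise SystemExit("Not enough free disk space for the full Open Images V7 dataset.")
```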
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a COCO-pretrained YOLOv8n model |
||||
model = YOLO('yolov8n.pt') |
||||
|
||||
# Train the model on the Open Images V7 dataset |
||||
results = model.train(data='open-images-v7.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Train a COCO-pretrained YOLOv8n model on the Open Images V7 dataset |
||||
yolo detect train data=open-images-v7.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Data and Annotations |
||||
|
||||
Illustrations of the dataset help provide insights into its richness: |
||||
|
||||
 |
||||
|
||||
- **Open Images V7**: This image exemplifies the depth and detail of annotations available, including bounding boxes, relationships, and segmentation masks. |
||||
|
||||
Researchers can gain invaluable insights into the array of computer vision challenges that the dataset addresses, from basic object detection to intricate relationship identification. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
For those employing Open Images V7 in their work, it's prudent to cite the relevant papers and acknowledge the creators: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{OpenImages, |
||||
author = {Alina Kuznetsova and Hassan Rom and Neil Alldrin and Jasper Uijlings and Ivan Krasin and Jordi Pont-Tuset and Shahab Kamali and Stefan Popov and Matteo Malloci and Alexander Kolesnikov and Tom Duerig and Vittorio Ferrari}, |
||||
title = {The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale}, |
||||
year = {2020}, |
||||
journal = {IJCV} |
||||
} |
||||
``` |
||||
|
||||
A heartfelt acknowledgment goes out to the Google AI team for creating and maintaining the Open Images V7 dataset. For a deep dive into the dataset and its offerings, navigate to the [official Open Images V7 website](https://storage.googleapis.com/openimages/web/index.html). |
@ -0,0 +1,93 @@ |
||||
--- |
||||
comments: true |
||||
description: 'Explore the SKU-110k dataset: densely packed retail shelf images for object detection research. Learn how to use it with Ultralytics.' |
||||
keywords: SKU-110k dataset, object detection, retail shelf images, Ultralytics, YOLO, computer vision, deep learning models |
||||
--- |
||||
|
||||
# SKU-110k Dataset |
||||
|
||||
The [SKU-110k](https://github.com/eg4000/SKU110K_CVPR19) dataset is a collection of densely packed retail shelf images, designed to support research in object detection tasks. Developed by Eran Goldman et al., the dataset contains over 110,000 unique stock keeping unit (SKU) categories with densely packed objects, often looking similar or even identical, positioned in close proximity. |
||||
|
||||
 |
||||
|
||||
## Key Features |
||||
|
||||
- SKU-110k contains images of store shelves from around the world, featuring densely packed objects that pose challenges for state-of-the-art object detectors. |
||||
- The dataset includes over 110,000 unique SKU categories, providing a diverse range of object appearances. |
||||
- Annotations include bounding boxes for objects and SKU category labels. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The SKU-110k dataset is organized into three main subsets: |
||||
|
||||
1. **Training set**: This subset contains images and annotations used for training object detection models. |
||||
2. **Validation set**: This subset consists of images and annotations used for model validation during training. |
||||
3. **Test set**: This subset is designed for the final evaluation of trained object detection models. |
||||
|
||||
## Applications |
||||
|
||||
The SKU-110k dataset is widely used for training and evaluating deep learning models in object detection tasks, especially in densely packed scenes such as retail shelf displays. The dataset's diverse set of SKU categories and densely packed object arrangements make it a valuable resource for researchers and practitioners in the field of computer vision. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the SKU-110K dataset, the `SKU-110K.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/SKU-110K.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/SKU-110K.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/SKU-110K.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/SKU-110K.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n model on the SKU-110K dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='SKU-110K.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=SKU-110K.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
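Because retail shelves routinely contain far more instances per image than everyday scenes, it can help to raise the per-image detection cap at inference time. The snippet below is a sketch: the weights and image paths are placeholders for your own files, and `max_det` simply lifts the default limit on detections per image.

```python
from ultralytics import YOLO

# Placeholder paths: your own trained weights and a test image
model = YOLO('runs/detect/train/weights/best.pt')

# Raise the per-image detection cap for densely packed shelves
results = model.predict('path/to/shelf_image.jpg', max_det=1000, conf=0.25)
print(len(results[0].boxes))  # number of detected products
```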
||||
|
||||
## Sample Data and Annotations |
||||
|
||||
The SKU-110k dataset contains a diverse set of retail shelf images with densely packed objects, providing rich context for object detection tasks. Here are some examples of data from the dataset, along with their corresponding annotations: |
||||
|
||||
 |
||||
|
||||
- **Densely packed retail shelf image**: This image demonstrates an example of densely packed objects in a retail shelf setting. Objects are annotated with bounding boxes and SKU category labels. |
||||
|
||||
The example showcases the variety and complexity of the data in the SKU-110k dataset and highlights the importance of high-quality data for object detection tasks. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the SKU-110k dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@inproceedings{goldman2019dense, |
||||
author = {Eran Goldman and Roei Herzig and Aviv Eisenschtat and Jacob Goldberger and Tal Hassner}, |
||||
title = {Precise Detection in Densely Packed Scenes}, |
||||
booktitle = {Proc. Conf. Comput. Vision Pattern Recognition (CVPR)}, |
||||
year = {2019} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge Eran Goldman et al. for creating and maintaining the SKU-110k dataset as a valuable resource for the computer vision research community. For more information about the SKU-110k dataset and its creators, visit the [SKU-110k dataset GitHub repository](https://github.com/eg4000/SKU110K_CVPR19). |
@ -0,0 +1,92 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore the VisDrone Dataset, a large-scale benchmark for drone-based image analysis, and learn how to train a YOLO model using it. |
||||
keywords: VisDrone Dataset, Ultralytics, drone-based image analysis, YOLO model, object detection, object tracking, crowd counting |
||||
--- |
||||
|
||||
# VisDrone Dataset |
||||
|
||||
The [VisDrone Dataset](https://github.com/VisDrone/VisDrone-Dataset) is a large-scale benchmark created by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China. It contains carefully annotated ground truth data for various computer vision tasks related to drone-based image and video analysis. |
||||
|
||||
VisDrone is composed of 288 video clips with 261,908 frames and 10,209 static images, captured by various drone-mounted cameras. The dataset covers a wide range of aspects, including location (14 different cities across China), environment (urban and rural), objects (pedestrians, vehicles, bicycles, etc.), and density (sparse and crowded scenes). The dataset was collected using various drone platforms under different scenarios and weather and lighting conditions. These frames are manually annotated with over 2.6 million bounding boxes of targets such as pedestrians, cars, bicycles, and tricycles. Attributes like scene visibility, object class, and occlusion are also provided for better data utilization. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The VisDrone dataset is organized into five main subsets, each focusing on a specific task: |
||||
|
||||
1. **Task 1**: Object detection in images |
||||
2. **Task 2**: Object detection in videos |
||||
3. **Task 3**: Single-object tracking |
||||
4. **Task 4**: Multi-object tracking |
||||
5. **Task 5**: Crowd counting |
||||
|
||||
## Applications |
||||
|
||||
The VisDrone dataset is widely used for training and evaluating deep learning models in drone-based computer vision tasks such as object detection, object tracking, and crowd counting. The dataset's diverse set of sensor data, object annotations, and attributes make it a valuable resource for researchers and practitioners in the field of drone-based computer vision. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the VisDrone dataset, the `VisDrone.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VisDrone.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VisDrone.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/VisDrone.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/VisDrone.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n model on the VisDrone dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='VisDrone.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=VisDrone.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
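Since several VisDrone tasks involve tracking (Tasks 3 and 4), you may also want to run a trained detector in tracking mode on drone footage. This is a minimal sketch: the weights and video paths are placeholders, and `bytetrack.yaml` is one of the tracker configurations shipped with the package.

```python
from ultralytics import YOLO

# Placeholder paths: your trained detector and a drone video clip
model = YOLO('runs/detect/train/weights/best.pt')

# Run multi-object tracking with the ByteTrack tracker configuration
results = model.track(source='path/to/drone_video.mp4', tracker='bytetrack.yaml', show=False)
```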
||||
|
||||
## Sample Data and Annotations |
||||
|
||||
The VisDrone dataset contains a diverse set of images and videos captured by drone-mounted cameras. Here are some examples of data from the dataset, along with their corresponding annotations: |
||||
|
||||
 |
||||
|
||||
- **Task 1**: Object detection in images - This image demonstrates an example of object detection in images, where objects are annotated with bounding boxes. The dataset provides a wide variety of images taken from different locations, environments, and densities to facilitate the development of models for this task. |
||||
|
||||
The example showcases the variety and complexity of the data in the VisDrone dataset and highlights the importance of high-quality sensor data for drone-based computer vision tasks. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the VisDrone dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@ARTICLE{9573394, |
||||
author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, |
||||
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, |
||||
title={Detection and Tracking Meet Drones Challenge}, |
||||
year={2021}, |
||||
volume={}, |
||||
number={}, |
||||
pages={1-1}, |
||||
doi={10.1109/TPAMI.2021.3119563}} |
||||
``` |
||||
|
||||
We would like to acknowledge the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China, for creating and maintaining the VisDrone dataset as a valuable resource for the drone-based computer vision research community. For more information about the VisDrone dataset and its creators, visit the [VisDrone Dataset GitHub repository](https://github.com/VisDrone/VisDrone-Dataset). |
@ -0,0 +1,95 @@ |
||||
--- |
||||
comments: true |
||||
description: A complete guide to the PASCAL VOC dataset used for object detection, segmentation and classification tasks with relevance to YOLO model training. |
||||
keywords: Ultralytics, PASCAL VOC dataset, object detection, segmentation, image classification, YOLO, model training, VOC.yaml, deep learning |
||||
--- |
||||
|
||||
# VOC Dataset |
||||
|
||||
The [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) (Visual Object Classes) dataset is a well-known object detection, segmentation, and classification dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and classification tasks. |
||||
|
||||
## Key Features |
||||
|
||||
- The VOC dataset includes two main challenges: VOC2007 and VOC2012. |
||||
- The dataset comprises 20 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as boats, sofas, and dining tables. |
||||
- Annotations include object bounding boxes and class labels for object detection and classification tasks, and segmentation masks for segmentation tasks. |
||||
- VOC provides standardized evaluation metrics like mean Average Precision (mAP) for object detection and classification, making it suitable for comparing model performance. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The VOC dataset is split into three subsets: |
||||
|
||||
1. **Train**: This subset contains images for training object detection, segmentation, and classification models. |
||||
2. **Validation**: This subset has images used for validation purposes during model training. |
||||
3. **Test**: This subset consists of images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the [PASCAL VOC evaluation server](http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php) for performance evaluation. |
||||
|
||||
## Applications |
||||
|
||||
The VOC dataset is widely used for training and evaluating deep learning models in object detection (such as YOLO, Faster R-CNN, and SSD), instance segmentation (such as Mask R-CNN), and image classification. The dataset's diverse set of object categories, large number of annotated images, and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the VOC dataset, the `VOC.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VOC.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VOC.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/VOC.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/VOC.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n model on the VOC dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='VOC.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=VOC.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The VOC dataset contains a diverse set of images with various object categories and complex scenes. Here are some examples of images from the dataset, along with their corresponding annotations: |
||||
|
||||
 |
||||
|
||||
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts. |
||||
|
||||
The example showcases the variety and complexity of the images in the VOC dataset and the benefits of using mosaicing during the training process. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the VOC dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{everingham2010pascal, |
||||
title={The PASCAL Visual Object Classes (VOC) Challenge}, |
||||
author={Mark Everingham and Luc Van Gool and Christopher K. I. Williams and John Winn and Andrew Zisserman}, |
||||
year={2010}, |
||||
eprint={0909.5206}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the PASCAL VOC Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the VOC dataset and its creators, visit the [PASCAL VOC dataset website](http://host.robots.ox.ac.uk/pascal/VOC/). |
@ -0,0 +1,97 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore xView, a large-scale, high resolution satellite imagery dataset for object detection. Dive into dataset structure, usage examples & its potential applications. |
||||
keywords: Ultralytics, YOLO, computer vision, xView dataset, satellite imagery, object detection, overhead imagery, training, deep learning, dataset YAML |
||||
--- |
||||
|
||||
# xView Dataset |
||||
|
||||
The [xView](http://xviewdataset.org/) dataset is one of the largest publicly available datasets of overhead imagery, containing images from complex scenes around the world annotated using bounding boxes. The goal of the xView dataset is to accelerate progress in four computer vision frontiers: |
||||
|
||||
1. Reduce minimum resolution for detection. |
||||
2. Improve learning efficiency. |
||||
3. Enable discovery of more object classes. |
||||
4. Improve detection of fine-grained classes. |
||||
|
||||
xView builds on the success of challenges like Common Objects in Context (COCO) and aims to leverage computer vision to analyze the growing amount of available imagery from space in order to understand the visual world in new ways and address a range of important applications. |
||||
|
||||
## Key Features |
||||
|
||||
- xView contains over 1 million object instances across 60 classes. |
||||
- The dataset has a resolution of 0.3 meters, providing higher resolution imagery than most public satellite imagery datasets. |
||||
- xView features a diverse collection of small, rare, fine-grained, and multi-type objects with bounding box annotation. |
||||
- Comes with a pre-trained baseline model using the TensorFlow object detection API and an example for PyTorch. |
||||
|
||||
## Dataset Structure |
||||
|
||||
The xView dataset is composed of satellite images collected from WorldView-3 satellites at a 0.3m ground sample distance. It contains over 1 million objects across 60 classes in over 1,400 km² of imagery. |
||||
|
||||
## Applications |
||||
|
||||
The xView dataset is widely used for training and evaluating deep learning models for object detection in overhead imagery. The dataset's diverse set of object classes and high-resolution imagery make it a valuable resource for researchers and practitioners in the field of computer vision, especially for satellite imagery analysis. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains the dataset's paths, class names, and other relevant settings. In the case of the xView dataset, the `xView.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/xView.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/xView.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/xView.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/xView.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a model on the xView dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='xView.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=xView.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Data and Annotations |
||||
|
||||
The xView dataset contains high-resolution satellite images with a diverse set of objects annotated using bounding boxes. Here are some examples of data from the dataset, along with their corresponding annotations: |
||||
|
||||
 |
||||
|
||||
- **Overhead Imagery**: This image demonstrates an example of object detection in overhead imagery, where objects are annotated with bounding boxes. The dataset provides high-resolution satellite images to facilitate the development of models for this task. |
||||
|
||||
The example showcases the variety and complexity of the data in the xView dataset and highlights the importance of high-quality satellite imagery for object detection tasks. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the xView dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{lam2018xview, |
||||
title={xView: Objects in Context in Overhead Imagery}, |
||||
author={Darius Lam and Richard Kuzma and Kevin McGee and Samuel Dooley and Michael Laielli and Matthew Klaric and Yaroslav Bulatov and Brendan McCord}, |
||||
year={2018}, |
||||
eprint={1802.07856}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the [Defense Innovation Unit](https://www.diu.mil/) (DIU) and the creators of the xView dataset for their valuable contribution to the computer vision research community. For more information about the xView dataset and its creators, visit the [xView dataset website](http://xviewdataset.org/). |
@ -0,0 +1,66 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore various computer vision datasets supported by Ultralytics for object detection, segmentation, pose estimation, image classification, and multi-object tracking. |
||||
keywords: computer vision, datasets, Ultralytics, YOLO, object detection, instance segmentation, pose estimation, image classification, multi-object tracking |
||||
--- |
||||
|
||||
# Datasets Overview |
||||
|
||||
Ultralytics provides support for various datasets to facilitate computer vision tasks such as detection, instance segmentation, pose estimation, classification, and multi-object tracking. Below is a list of the main Ultralytics datasets, followed by a summary of each computer vision task and the respective datasets. |
||||
|
||||
## [Detection Datasets](detect/index.md) |
||||
|
||||
Bounding box object detection is a computer vision technique that involves detecting and localizing objects in an image by drawing a bounding box around each object. |
||||
|
||||
- [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations. |
||||
- [COCO](detect/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning with over 200K labeled images. |
||||
- [COCO8](detect/coco8.md): A small dataset containing the first 8 images from the COCO train 2017 set (4 for training and 4 for validation), suitable for quick tests. |
||||
- [Global Wheat 2020](detect/globalwheat2020.md): A dataset of wheat head images collected from around the world for object detection and localization tasks. |
||||
- [Objects365](detect/objects365.md): A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images. |
||||
- [OpenImagesV7](detect/open-images-v7.md): A comprehensive dataset by Google with 1.7M train images and 42k validation images. |
||||
- [SKU-110K](detect/sku-110k.md): A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes. |
||||
- [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences. |
||||
- [VOC](detect/voc.md): The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images. |
||||
- [xView](detect/xview.md): A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects. |
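Each detection dataset above ships with a ready-made YAML, so switching datasets is a one-argument change. The snippet below is a minimal sketch using the small COCO8 configuration as an example:

```python
from ultralytics import YOLO

# Any detection dataset listed above can be selected by its YAML name
model = YOLO('yolov8n.pt')
model.train(data='coco8.yaml', epochs=3, imgsz=640)  # e.g. swap in 'VOC.yaml' or 'xView.yaml'
```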
||||
|
||||
## [Instance Segmentation Datasets](segment/index.md) |
||||
|
||||
Instance segmentation is a computer vision technique that involves identifying and localizing objects in an image at the pixel level. |
||||
|
||||
- [COCO](segment/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images. |
||||
- [COCO8-seg](segment/coco8-seg.md): A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations. |
||||
|
||||
## [Pose Estimation](pose/index.md) |
||||
|
||||
Pose estimation is a technique used to determine the pose of the object relative to the camera or the world coordinate system. |
||||
|
||||
- [COCO](pose/coco.md): A large-scale dataset with human pose annotations designed for pose estimation tasks. |
||||
- [COCO8-pose](pose/coco8-pose.md): A smaller dataset for pose estimation tasks, containing a subset of 8 COCO images with human pose annotations. |
||||
|
||||
## [Classification](classify/index.md) |
||||
|
||||
Image classification is a computer vision task that involves categorizing an image into one or more predefined classes or categories based on its visual content. |
||||
|
||||
- [Caltech 101](classify/caltech101.md): A dataset containing images of 101 object categories for image classification tasks. |
||||
- [Caltech 256](classify/caltech256.md): An extended version of Caltech 101 with 256 object categories and more challenging images. |
||||
- [CIFAR-10](classify/cifar10.md): A dataset of 60K 32x32 color images in 10 classes, with 6K images per class. |
||||
- [CIFAR-100](classify/cifar100.md): An extended version of CIFAR-10 with 100 object categories and 600 images per class. |
||||
- [Fashion-MNIST](classify/fashion-mnist.md): A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks. |
||||
- [ImageNet](classify/imagenet.md): A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories. |
||||
- [ImageNet-10](classify/imagenet10.md): A smaller subset of ImageNet with 10 categories for faster experimentation and testing. |
||||
- [Imagenette](classify/imagenette.md): A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing. |
||||
- [Imagewoof](classify/imagewoof.md): A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks. |
||||
- [MNIST](classify/mnist.md): A dataset of 70,000 grayscale images of handwritten digits for image classification tasks. |
||||
|
||||
## [Oriented Bounding Boxes (OBB)](obb/index.md) |
||||
|
||||
Oriented Bounding Boxes (OBB) is a method in computer vision for detecting angled objects in images using rotated bounding boxes, often applied to aerial and satellite imagery. |
||||
|
||||
- [DOTAv2](obb/dota-v2.md): A popular OBB aerial imagery dataset with 1.7 million instances and 11,268 images. |
||||
|
||||
## [Multi-Object Tracking](track/index.md) |
||||
|
||||
Multi-object tracking is a computer vision technique that involves detecting and tracking multiple objects over time in a video sequence. |
||||
|
||||
- [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations for multi-object tracking tasks. |
||||
- [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences. |
@ -0,0 +1,129 @@ |
||||
--- |
||||
comments: true |
||||
description: Delve into DOTA v2, an Oriented Bounding Box (OBB) aerial imagery dataset with 1.7 million instances and 11,268 images. |
||||
keywords: DOTA v2, object detection, aerial images, computer vision, deep learning, annotations, oriented bounding boxes, OBB |
||||
--- |
||||
|
||||
# DOTA v2 Dataset with OBB |
||||
|
||||
[DOTA v2](https://captain-whu.github.io/DOTA/index.html) stands as a specialized dataset, emphasizing object detection in aerial images. Originating from the DOTA series of datasets, it offers annotated images capturing a diverse array of aerial scenes with Oriented Bounding Boxes (OBB). |
||||
|
||||
 |
||||
|
||||
## Key Features |
||||
|
||||
- Collection from various sensors and platforms, with image sizes ranging from 800 × 800 to 20,000 × 20,000 pixels. |
||||
- Features more than 1.7M Oriented Bounding Boxes across 18 categories. |
||||
- Encompasses multiscale object detection. |
||||
- Instances are annotated by experts using arbitrary (8 d.o.f.) quadrilaterals, capturing objects of different scales, orientations, and shapes. |
||||
|
||||
## Dataset Versions |
||||
|
||||
### DOTA-v1.0 |
||||
|
||||
- Contains 15 common categories. |
||||
- Comprises 2,806 images with 188,282 instances. |
||||
- Split ratios: 1/2 for training, 1/6 for validation, and 1/3 for testing. |
||||
|
||||
### DOTA-v1.5 |
||||
|
||||
- Incorporates the same images as DOTA-v1.0. |
||||
- Very small instances (less than 10 pixels) are also annotated. |
||||
- Addition of a new category: "container crane". |
||||
- A total of 403,318 instances. |
||||
- Released for the DOAI Challenge 2019 on Object Detection in Aerial Images. |
||||
|
||||
### DOTA-v2.0 |
||||
|
||||
- Collections from Google Earth, GF-2 Satellite, and other aerial images. |
||||
- Contains 18 common categories. |
||||
- Comprises 11,268 images with a whopping 1,793,658 instances. |
||||
- New categories introduced: "airport" and "helipad". |
||||
- Image splits: |
||||
- Training: 1,830 images with 268,627 instances. |
||||
- Validation: 593 images with 81,048 instances. |
||||
- Test-dev: 2,792 images with 353,346 instances. |
||||
- Test-challenge: 6,053 images with 1,090,637 instances. |
||||
|
||||
## Dataset Structure |
||||
|
||||
DOTA v2 exhibits a structured layout tailored for OBB object detection challenges: |
||||
|
||||
- **Images**: A vast collection of high-resolution aerial images capturing diverse terrains and structures. |
||||
- **Oriented Bounding Boxes**: Annotations in the form of rotated rectangles encapsulating objects irrespective of their orientation, ideal for capturing objects like airplanes, ships, and buildings. |
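To connect the annotated corner points with the "rotated rectangle" view, the sketch below fits a minimum-area rotated rectangle to four corners using OpenCV; it is an illustration only, not the DOTA toolkit's own conversion, and the corner values are made up.

```python
import cv2
import numpy as np

# Four corner points of one annotated instance (pixel coordinates, illustrative values)
corners = np.array([[1499, 897], [1502, 901], [1493, 907], [1490, 903]], dtype=np.float32)

# OpenCV returns the rotated-rectangle parameters: center, size and rotation angle in degrees
(cx, cy), (w, h), angle = cv2.minAreaRect(corners)
print(cx, cy, w, h, angle)
```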
||||
|
||||
## Applications |
||||
|
||||
DOTA v2 serves as a benchmark for training and evaluating models specifically tailored for aerial image analysis. With the inclusion of OBB annotations, it provides a unique challenge, enabling the development of specialized object detection models that cater to aerial imagery's nuances. |
||||
|
||||
## Dataset YAML |
||||
|
||||
Typically, datasets incorporate a YAML (YAML Ain't Markup Language) file detailing the dataset's configuration. For DOTA v2, a hypothetical `DOTAv2.yaml` could be used. For accurate paths and configurations, it's vital to consult the dataset's official repository or documentation. |
||||
|
||||
!!! example "DOTAv2.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/DOTAv2.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a model on the DOTA v2 dataset, you can utilize the following code snippets. Always refer to your model's documentation for a thorough list of available arguments. |
||||
|
||||
!!! warning |
||||
|
||||
Please note that all images and associated annotations in the DOTAv2 dataset can be used for academic purposes, but commercial use is prohibited. Your understanding and respect for the dataset creators' wishes are greatly appreciated! |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Create a new YOLOv8n-OBB model from scratch |
||||
model = YOLO('yolov8n-obb.yaml') |
||||
|
||||
# Train the model on the DOTAv2 dataset |
||||
results = model.train(data='DOTAv2.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Train a new YOLOv8n-OBB model on the DOTAv2 dataset |
||||
yolo obb train data=DOTAv2.yaml model=yolov8n-obb.yaml epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Data and Annotations |
||||
|
||||
Having a glance at the dataset illustrates its depth: |
||||
|
||||
 |
||||
|
||||
- **DOTA v2**: This snapshot underlines the complexity of aerial scenes and the significance of Oriented Bounding Box annotations, capturing objects in their natural orientation. |
||||
|
||||
The dataset's richness offers invaluable insights into object detection challenges exclusive to aerial imagery. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
For those leveraging DOTA v2 in their endeavors, it's pertinent to cite the relevant research papers: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{9560031, |
||||
author={Ding, Jian and Xue, Nan and Xia, Gui-Song and Bai, Xiang and Yang, Wen and Yang, Michael and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei}, |
||||
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, |
||||
title={Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges}, |
||||
year={2021}, |
||||
volume={}, |
||||
number={}, |
||||
pages={1-1}, |
||||
doi={10.1109/TPAMI.2021.3117983} |
||||
} |
||||
``` |
||||
|
||||
A special note of gratitude to the team behind DOTA v2 for their commendable effort in curating this dataset. For an exhaustive understanding of the dataset and its nuances, please visit the [official DOTA v2 website](https://captain-whu.github.io/DOTA/index.html). |
@ -0,0 +1,80 @@ |
||||
--- |
||||
comments: true |
||||
description: Dive deep into various oriented bounding box (OBB) dataset formats compatible with Ultralytics YOLO models. Grasp the nuances of using and converting datasets to this format. |
||||
keywords: Ultralytics, YOLO, oriented bounding boxes, OBB, dataset formats, label formats, DOTA v2, data conversion |
||||
--- |
||||
|
||||
# Oriented Bounding Box (OBB) Datasets Overview |
||||
|
||||
Training a precise object detection model with oriented bounding boxes (OBB) requires a thorough dataset. This guide explains the various OBB dataset formats compatible with Ultralytics YOLO models, offering insights into their structure, application, and methods for format conversions. |
||||
|
||||
## Supported OBB Dataset Formats |
||||
|
||||
### YOLO OBB Format |
||||
|
||||
The YOLO OBB format designates bounding boxes by their four corner points with coordinates normalized between 0 and 1. It follows this format: |
||||
|
||||
```bash |
||||
class_index x1 y1 x2 y2 x3 y3 x4 y4 |
||||
``` |
||||
|
||||
Internally, YOLO processes losses and outputs in the `xywhr` format, which represents the bounding box's center point (xy), width, height, and rotation. |
||||
|
||||
<p align="center"><img width="800" src="https://user-images.githubusercontent.com/26833433/259471881-59020fe2-09a4-4dcc-acce-9b0f7cfa40ee.png"></p> |
||||
|
||||
An example of a `*.txt` label file for the above image, which contains an object of class `0` in OBB format, could look like: |
||||
|
||||
```bash |
||||
0 0.780811 0.743961 0.782371 0.74686 0.777691 0.752174 0.776131 0.749758 |
||||
``` |
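As a minimal illustration of producing such a line, the hypothetical helper below normalizes four pixel-space corner points by the image size and formats them in the YOLO OBB layout; it is a sketch for clarity, not part of the Ultralytics API.

```python
import numpy as np


def corners_to_obb_line(class_index: int, corners_px: np.ndarray, img_w: int, img_h: int) -> str:
    """Format one YOLO OBB label line from a (4, 2) array of pixel-space (x, y) corners."""
    corners = corners_px.astype(float) / np.array([img_w, img_h])  # normalize to 0-1
    coords = " ".join(f"{v:.6f}" for v in corners.reshape(-1))
    return f"{class_index} {coords}"


# Four corners of a rotated box in a 1920x1208 image (illustrative values)
corners = np.array([[1499, 897], [1502, 901], [1493, 907], [1490, 903]])
print(corners_to_obb_line(0, corners, 1920, 1208))  # class index followed by eight normalized coordinates
```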
||||
|
||||
## Usage |
||||
|
||||
To train a model using these OBB formats: |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Create a new YOLOv8n-OBB model from scratch |
||||
model = YOLO('yolov8n-obb.yaml') |
||||
|
||||
# Train the model on the DOTAv2 dataset |
||||
results = model.train(data='DOTAv2.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Train a new YOLOv8n-OBB model on the DOTAv2 dataset |
||||
yolo obb train data=DOTAv2.yaml model=yolov8n-obb.yaml epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Supported Datasets |
||||
|
||||
Currently, the following datasets with Oriented Bounding Boxes are supported: |
||||
|
||||
- [**DOTA v2**](./dota-v2.md): DOTA (A Large-scale Dataset for Object Detection in Aerial Images) version 2, emphasizes detection from aerial perspectives and contains oriented bounding boxes with 1.7 million instances and 11,268 images. |
||||
|
||||
### Incorporating your own OBB dataset |
||||
|
||||
For those looking to introduce their own datasets with oriented bounding boxes, ensure compatibility with the "YOLO OBB format" mentioned above. Convert your annotations to this required format and detail the paths, classes, and class names in a corresponding YAML configuration file. |
||||
|
||||
## Convert Label Formats |
||||
|
||||
### DOTA Dataset Format to YOLO OBB Format |
||||
|
||||
Transitioning labels from the DOTA dataset format to the YOLO OBB format can be achieved with this script: |
||||
|
||||
```python |
||||
from ultralytics.data.converter import convert_dota_to_yolo_obb |
||||
|
||||
convert_dota_to_yolo_obb('path/to/DOTA') |
||||
``` |
||||
|
||||
This conversion mechanism is instrumental for datasets in the DOTA format, ensuring alignment with the Ultralytics YOLO OBB format. |
||||
|
||||
It's imperative to validate the compatibility of the dataset with your model and adhere to the necessary format conventions. Properly structured datasets are pivotal for training efficient object detection models with oriented bounding boxes. |
@ -0,0 +1,95 @@ |
||||
--- |
||||
comments: true |
||||
description: Detailed guide on the special COCO-Pose Dataset in Ultralytics. Learn about its key features, structure, and usage in pose estimation tasks with YOLO. |
||||
keywords: Ultralytics YOLO, COCO-Pose Dataset, Deep Learning, Pose Estimation, Training Models, Dataset YAML, openpose, YOLO |
||||
--- |
||||
|
||||
# COCO-Pose Dataset |
||||
|
||||
The [COCO-Pose](https://cocodataset.org/#keypoints-2017) dataset is a specialized version of the COCO (Common Objects in Context) dataset, designed for pose estimation tasks. It leverages the COCO Keypoints 2017 images and labels to enable the training of models like YOLO for pose estimation tasks. |
||||
|
||||
 |
||||
|
||||
## Key Features |
||||
|
||||
- COCO-Pose builds upon the COCO Keypoints 2017 dataset which contains 200K images labeled with keypoints for pose estimation tasks. |
||||
- The dataset supports 17 keypoints for human figures, facilitating detailed pose estimation. |
||||
- Like COCO, it provides standardized evaluation metrics, including Object Keypoint Similarity (OKS) for pose estimation tasks, making it suitable for comparing model performance. |
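For readers unfamiliar with OKS, the sketch below shows the core of the computation used by the COCO keypoint benchmark: a Gaussian similarity over the distance between predicted and ground-truth keypoints, scaled by the object area and a per-keypoint constant. It is an illustrative re-implementation, not the Ultralytics or pycocotools source, and the per-keypoint sigmas must be supplied by the caller (COCO publishes a standard table of 17 values).

```python
import numpy as np


def object_keypoint_similarity(pred, gt, visibility, area, sigmas):
    """pred, gt: (K, 2) keypoint arrays; visibility: (K,) ground-truth flags; area: ground-truth object area."""
    d2 = np.sum((pred - gt) ** 2, axis=1)     # squared distance per keypoint
    k2 = (2 * sigmas) ** 2                    # per-keypoint constants (COCO uses k = 2 * sigma)
    e = d2 / (2 * area * k2 + np.spacing(1))  # scale-normalized error
    visible = visibility > 0
    return float(np.exp(-e)[visible].mean()) if visible.any() else 0.0
```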
||||
|
||||
## Dataset Structure |
||||
|
||||
The COCO-Pose dataset is split into three subsets: |
||||
|
||||
1. **Train2017**: This subset contains a portion of the 118K images from the COCO dataset, annotated for training pose estimation models. |
||||
2. **Val2017**: This subset has a selection of images used for validation purposes during model training. |
||||
3. **Test2017**: This subset consists of images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the [COCO evaluation server](https://codalab.lisn.upsaclay.fr/competitions/7384) for performance evaluation. |
||||
|
||||
## Applications |
||||
|
||||
The COCO-Pose dataset is specifically used for training and evaluating deep learning models in keypoint detection and pose estimation tasks, such as OpenPose. The dataset's large number of annotated images and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners focused on pose estimation. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains the dataset's paths, class names, and other relevant settings. In the case of the COCO-Pose dataset, the `coco-pose.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco-pose.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco-pose.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/coco-pose.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/coco-pose.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n-pose model on the COCO-Pose dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-pose.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='coco-pose.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo pose train data=coco-pose.yaml model=yolov8n-pose.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
The COCO-Pose dataset contains a diverse set of images with human figures annotated with keypoints. Here are some examples of images from the dataset, along with their corresponding annotations: |
||||
|
||||
 |
||||
|
||||
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts. |
||||
|
||||
The example showcases the variety and complexity of the images in the COCO-Pose dataset and the benefits of using mosaicing during the training process. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the COCO-Pose dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{lin2015microsoft, |
||||
title={Microsoft COCO: Common Objects in Context}, |
||||
author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár}, |
||||
year={2015}, |
||||
eprint={1405.0312}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO-Pose dataset and its creators, visit the [COCO dataset website](https://cocodataset.org/#home). |
@ -0,0 +1,84 @@ |
||||
--- |
||||
comments: true |
||||
description: Discover the versatile COCO8-Pose dataset, perfect for testing and debugging pose detection models. Learn how to get started with YOLOv8-pose model training. |
||||
keywords: Ultralytics, YOLOv8, pose detection, COCO8-Pose dataset, dataset, model training, YAML |
||||
--- |
||||
|
||||
# COCO8-Pose Dataset |
||||
|
||||
## Introduction |
||||
|
||||
[Ultralytics](https://ultralytics.com) COCO8-Pose is a small, but versatile pose detection dataset composed of the first |
||||
8 images of the COCO train 2017 set, 4 for training and 4 for validation. This dataset is ideal for testing and |
||||
debugging pose estimation models, or for experimenting with new approaches. With 8 images, it is small enough |
||||
to be easily manageable, yet diverse enough to test training pipelines for errors and act as a sanity check before |
||||
training larger datasets. |
||||
|
||||
This dataset is intended for use with Ultralytics [HUB](https://hub.ultralytics.com) |
||||
and [YOLOv8](https://github.com/ultralytics/ultralytics). |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains the dataset's paths, class names, and other relevant settings. In the case of the COCO8-Pose dataset, the `coco8-pose.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8-pose.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8-pose.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/coco8-pose.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/coco8-pose.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n-pose model on the COCO8-Pose dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-pose.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='coco8-pose.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo pose train data=coco8-pose.yaml model=yolov8n-pose.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
Here are some examples of images from the COCO8-Pose dataset, along with their corresponding annotations: |
||||
|
||||
<img src="https://user-images.githubusercontent.com/26833433/236818283-52eecb96-fc6a-420d-8a26-d488b352dd4c.jpg" alt="Dataset sample image" width="800"> |
||||
|
||||
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts. |
||||
|
||||
The example showcases the variety and complexity of the images in the COCO8-Pose dataset and the benefits of using mosaicing during the training process. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the COCO dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{lin2015microsoft, |
||||
title={Microsoft COCO: Common Objects in Context}, |
||||
author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár}, |
||||
year={2015}, |
||||
eprint={1405.0312}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the [COCO dataset website](https://cocodataset.org/#home). |
@ -0,0 +1,128 @@ |
||||
--- |
||||
comments: true |
||||
description: Understand the YOLO pose dataset format and learn to use Ultralytics datasets to train your pose estimation models effectively. |
||||
keywords: Ultralytics, YOLO, pose estimation, datasets, training, YAML, keypoints, COCO-Pose, COCO8-Pose, data conversion |
||||
--- |
||||
|
||||
# Pose Estimation Datasets Overview |
||||
|
||||
## Supported Dataset Formats |
||||
|
||||
### Ultralytics YOLO format |
||||
|
||||
**Label Format** |
||||
|
||||
The dataset format used for training YOLO pose models is as follows: |
||||
|
||||
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension. |
||||
2. One row per object: Each row in the text file corresponds to one object instance in the image. |
||||
3. Object information per row: Each row contains the following information about the object instance: |
||||
- Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.). |
||||
- Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1. |
||||
- Object width and height: The width and height of the object, normalized to be between 0 and 1. |
||||
- Object keypoint coordinates: The keypoints of the object, normalized to be between 0 and 1. |
||||
|
||||
Here is an example of the label format for pose estimation task: |
||||
|
||||
Format with Dim = 2 |
||||
|
||||
``` |
||||
<class-index> <x> <y> <width> <height> <px1> <py1> <px2> <py2> ... <pxn> <pyn> |
||||
``` |
||||
|
||||
Format with Dim = 3 |
||||
|
||||
``` |
||||
<class-index> <x> <y> <width> <height> <px1> <py1> <p1-visibility> <px2> <py2> <p2-visibility> ... <pxn> <pyn> <pn-visibility> |
||||
``` |
||||
|
||||
In this format, `<class-index>` is the index of the class for the object, `<x> <y> <width> <height>` are the normalized coordinates of the bounding box, and `<px1> <py1> <px2> <py2> ... <pxn> <pyn>` are the normalized coordinates of the keypoints (with an optional visibility flag per keypoint when Dim = 3). The coordinates are separated by spaces. |
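A short parsing sketch may make the layout clearer. The helper below is hypothetical (not part of the Ultralytics API) and assumes the dataset's `kpt_shape` is known, e.g. `[17, 3]` for COCO-Pose:

```python
import numpy as np


def parse_pose_label(line: str, kpt_shape=(17, 3)):
    """Split one label row into class index, normalized box, and a (num_kpts, dim) keypoint array."""
    values = [float(v) for v in line.split()]
    cls = int(values[0])
    box = np.array(values[1:5])                     # x_center, y_center, width, height
    kpts = np.array(values[5:]).reshape(kpt_shape)  # keypoints, with a visibility flag if dim == 3
    return cls, box, kpts


# Toy example with two keypoints in <px> <py> <visibility> form
cls, box, kpts = parse_pose_label("0 0.5 0.5 0.2 0.4 0.45 0.30 2 0.55 0.30 2", kpt_shape=(2, 3))
```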
||||
|
||||
### Dataset YAML format |
||||
|
||||
The Ultralytics framework uses a YAML file format to define the dataset and model configuration for training pose estimation models. Here is an example of the YAML format used for defining a pose estimation dataset: |
||||
|
||||
```yaml |
||||
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..] |
||||
path: ../datasets/coco8-pose # dataset root dir |
||||
train: images/train # train images (relative to 'path') 4 images |
||||
val: images/val # val images (relative to 'path') 4 images |
||||
test: # test images (optional) |
||||
|
||||
# Keypoints |
||||
kpt_shape: [17, 3] # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible) |
||||
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15] |
||||
|
||||
# Classes dictionary |
||||
names: |
||||
0: person |
||||
``` |
||||
|
||||
The `train` and `val` fields specify the paths to the directories containing the training and validation images, respectively. |
||||
|
||||
`names` is a dictionary of class names. The order of the names should match the order of the object class indices in the YOLO dataset files. |
||||
|
||||
(Optional) If the keypoints are symmetric, such as the left/right sides of a person or face, a `flip_idx` entry is required so that horizontal flip augmentation can remap each keypoint to its mirrored counterpart. |
For example, assume five facial-landmark keypoints [left eye, right eye, nose, left mouth, right mouth] with original indices [0, 1, 2, 3, 4]; the corresponding `flip_idx` is [1, 0, 2, 4, 3] (swap only the left-right pairs, i.e. 0-1 and 3-4, and leave keypoints such as the nose unchanged). |
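The sketch below illustrates that behaviour; it is not the internal Ultralytics augmentation code. Mirroring an image reflects the normalized x-coordinates, and `flip_idx` restores the left/right keypoint ordering:

```python
import numpy as np

# Five facial-landmark keypoints: [left eye, right eye, nose, left mouth, right mouth] (made-up values)
kpts = np.array([[0.30, 0.40], [0.50, 0.41], [0.40, 0.50], [0.35, 0.60], [0.48, 0.61]])
flip_idx = [1, 0, 2, 4, 3]

flipped = kpts.copy()
flipped[:, 0] = 1.0 - flipped[:, 0]  # horizontal flip reflects the x-coordinates
flipped = flipped[flip_idx]          # swap left/right keypoints back into canonical order
```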
||||
|
||||
## Usage |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-pose.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='coco128-pose.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo pose train data=coco128-pose.yaml model=yolov8n-pose.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Supported Datasets |
||||
|
||||
This section outlines the datasets that are compatible with Ultralytics YOLO format and can be used for training pose estimation models: |
||||
|
||||
### COCO-Pose |
||||
|
||||
- **Description**: COCO-Pose is a large-scale object detection, segmentation, and pose estimation dataset. It is a subset of the popular COCO dataset and focuses on human pose estimation. COCO-Pose includes multiple keypoints for each human instance. |
||||
- **Label Format**: Same as Ultralytics YOLO format as described above, with keypoints for human poses. |
||||
- **Number of Classes**: 1 (Human). |
||||
- **Keypoints**: 17 keypoints including nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles. |
||||
- **Usage**: Suitable for training human pose estimation models. |
||||
- **Additional Notes**: The dataset is rich and diverse, containing over 200k labeled images. |
||||
- [Read more about COCO-Pose](./coco.md) |
||||
|
||||
### COCO8-Pose |
||||
|
||||
- **Description**: [Ultralytics](https://ultralytics.com) COCO8-Pose is a small, but versatile pose detection dataset composed of the first 8 images of the COCO train 2017 set, 4 for training and 4 for validation. |
||||
- **Label Format**: Same as Ultralytics YOLO format as described above, with keypoints for human poses. |
||||
- **Number of Classes**: 1 (Human). |
||||
- **Keypoints**: 17 keypoints including nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles. |
||||
- **Usage**: Suitable for testing and debugging pose estimation models, or for experimenting with new approaches. |
||||
- **Additional Notes**: COCO8-Pose is ideal for sanity checks and CI checks. |
||||
- [Read more about COCO8-Pose](./coco8-pose.md) |
||||
|
||||
### Adding your own dataset |
||||
|
||||
If you have your own dataset and would like to use it for training pose estimation models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file. |
||||
|
||||
### Conversion Tool |
||||
|
||||
Ultralytics provides a convenient conversion tool to convert labels from the popular COCO dataset format to YOLO format: |
||||
|
||||
```python |
||||
from ultralytics.data.converter import convert_coco |
||||
|
||||
convert_coco(labels_dir='../coco/annotations/', use_keypoints=True) |
||||
``` |
||||
|
||||
This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format. The `use_keypoints` parameter specifies whether to include keypoints (for pose estimation) in the converted labels. |
@ -0,0 +1,94 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore the possibilities of the COCO-Seg dataset, designed for object instance segmentation and YOLO model training. Discover key features, dataset structure, applications, and usage. |
||||
keywords: Ultralytics, YOLO, COCO-Seg, dataset, instance segmentation, model training, deep learning, computer vision |
||||
--- |
||||
|
||||
# COCO-Seg Dataset |
||||
|
||||
The [COCO-Seg](https://cocodataset.org/#home) dataset, an extension of the COCO (Common Objects in Context) dataset, is specially designed to aid research in object instance segmentation. It uses the same images as COCO but introduces more detailed segmentation annotations. This dataset is a crucial resource for researchers and developers working on instance segmentation tasks, especially for training YOLO models. |
||||
|
||||
## Key Features |
||||
|
||||
- COCO-Seg retains the original 330K images from COCO. |
||||
- The dataset consists of the same 80 object categories found in the original COCO dataset. |
||||
- Annotations now include more detailed instance segmentation masks for each object in the images. |
||||
- COCO-Seg provides standardized evaluation metrics like mean Average Precision (mAP) for object detection, and mean Average Recall (mAR) for instance segmentation tasks, enabling effective comparison of model performance. |
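As a quick illustration of reading these metrics with Ultralytics, the sketch below validates a trained segmentation checkpoint; the weights path is a placeholder and the `metrics.box` / `metrics.seg` attribute names are assumptions about the returned metrics object.

```python
from ultralytics import YOLO

# Validate a trained segmentation model (hypothetical weights path)
model = YOLO('path/to/best-seg.pt')
metrics = model.val(data='coco8-seg.yaml', imgsz=640)  # the small COCO8-seg split keeps this quick

print(metrics.box.map)  # box mAP 0.50:0.95
print(metrics.seg.map)  # mask mAP 0.50:0.95
```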
||||
|
||||
## Dataset Structure |
||||
|
||||
The COCO-Seg dataset is partitioned into three subsets: |
||||
|
||||
1. **Train2017**: This subset contains 118K images for training instance segmentation models. |
||||
2. **Val2017**: This subset includes 5K images used for validation purposes during model training. |
||||
3. **Test2017**: This subset encompasses 20K images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the [COCO evaluation server](https://codalab.lisn.upsaclay.fr/competitions/7383) for performance evaluation. |
||||
|
||||
## Applications |
||||
|
||||
COCO-Seg is widely used for training and evaluating deep learning models in instance segmentation, such as the YOLO models. The large number of annotated images, the diversity of object categories, and the standardized evaluation metrics make it an indispensable resource for computer vision researchers and practitioners. |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains the dataset's paths, class names, and other relevant settings. In the case of the COCO-Seg dataset, the `coco.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/coco.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/coco.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n-seg model on the COCO-Seg dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-seg.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='coco-seg.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo segment train data=coco-seg.yaml model=yolov8n-seg.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
COCO-Seg, like its predecessor COCO, contains a diverse set of images with various object categories and complex scenes. However, COCO-Seg introduces more detailed instance segmentation masks for each object in the images. Here are some examples of images from the dataset, along with their corresponding instance segmentation masks: |
||||
|
||||
 |
||||
|
||||
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This aids the model's ability to generalize to different object sizes, aspect ratios, and contexts. |
||||
|
||||
The example showcases the variety and complexity of the images in the COCO-Seg dataset and the benefits of using mosaicing during the training process. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the COCO-Seg dataset in your research or development work, please cite the original COCO paper and acknowledge the extension to COCO-Seg: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{lin2015microsoft, |
||||
title={Microsoft COCO: Common Objects in Context}, |
||||
author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár}, |
||||
year={2015}, |
||||
eprint={1405.0312}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We extend our thanks to the COCO Consortium for creating and maintaining this invaluable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the [COCO dataset website](https://cocodataset.org/#home). |
@ -0,0 +1,84 @@ |
||||
--- |
||||
comments: true |
||||
description: 'Discover the COCO8-Seg: a compact but versatile instance segmentation dataset ideal for testing Ultralytics YOLOv8 detection approaches. Complete usage guide included.' |
||||
keywords: COCO8-Seg dataset, Ultralytics, YOLOv8, instance segmentation, dataset configuration, YAML, YOLOv8n-seg model, mosaiced dataset images |
||||
--- |
||||
|
||||
# COCO8-Seg Dataset |
||||
|
||||
## Introduction |
||||
|
||||
[Ultralytics](https://ultralytics.com) COCO8-Seg is a small, but versatile instance segmentation dataset composed of the |
||||
first 8 images of the COCO train 2017 set, 4 for training and 4 for validation. This dataset is ideal for testing and |
||||
debugging segmentation models, or for experimenting with new detection approaches. With 8 images, it is small enough to |
||||
be easily manageable, yet diverse enough to test training pipelines for errors and act as a sanity check before training |
||||
larger datasets. |
||||
|
||||
This dataset is intended for use with Ultralytics [HUB](https://hub.ultralytics.com) |
||||
and [YOLOv8](https://github.com/ultralytics/ultralytics). |
||||
|
||||
## Dataset YAML |
||||
|
||||
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains the dataset's paths, class names, and other relevant settings. In the case of the COCO8-Seg dataset, the `coco8-seg.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8-seg.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8-seg.yaml). |
||||
|
||||
!!! example "ultralytics/cfg/datasets/coco8-seg.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/coco8-seg.yaml" |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
To train a YOLOv8n-seg model on the COCO8-Seg dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. |
||||
|
||||
!!! example "Train Example" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-seg.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='coco8-seg.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo segment train data=coco8-seg.yaml model=yolov8n-seg.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Sample Images and Annotations |
||||
|
||||
Here are some examples of images from the COCO8-Seg dataset, along with their corresponding annotations: |
||||
|
||||
<img src="https://user-images.githubusercontent.com/26833433/236818387-f7bde7df-caaa-46d1-8341-1f7504cd11a1.jpg" alt="Dataset sample image" width="800"> |
||||
|
||||
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts. |
||||
|
||||
The example showcases the variety and complexity of the images in the COCO8-Seg dataset and the benefits of using mosaicing during the training process. |
||||
|
||||
## Citations and Acknowledgments |
||||
|
||||
If you use the COCO dataset in your research or development work, please cite the following paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{lin2015microsoft, |
||||
title={Microsoft COCO: Common Objects in Context}, |
||||
author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár}, |
||||
year={2015}, |
||||
eprint={1405.0312}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the [COCO dataset website](https://cocodataset.org/#home). |
@ -0,0 +1,140 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn how Ultralytics YOLO supports various dataset formats for instance segmentation. This guide includes information on data conversions, auto-annotations, and dataset usage. |
||||
keywords: Ultralytics, YOLO, Instance Segmentation, Dataset, YAML, COCO, Auto-Annotation, Image Segmentation |
||||
--- |
||||
|
||||
# Instance Segmentation Datasets Overview |
||||
|
||||
## Supported Dataset Formats |
||||
|
||||
### Ultralytics YOLO format |
||||
|
||||
**Label Format** |
||||
|
||||
The dataset format used for training YOLO segmentation models is as follows: |
||||
|
||||
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension. |
||||
2. One row per object: Each row in the text file corresponds to one object instance in the image. |
||||
3. Object information per row: Each row contains the following information about the object instance: |
||||
- Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.). |
||||
- Object bounding coordinates: The bounding coordinates around the mask area, normalized to be between 0 and 1. |
||||
|
||||
The format for a single row in the segmentation dataset file is as follows: |
||||
|
||||
``` |
||||
<class-index> <x1> <y1> <x2> <y2> ... <xn> <yn> |
||||
``` |
||||
|
||||
In this format, `<class-index>` is the index of the class for the object, and `<x1> <y1> <x2> <y2> ... <xn> <yn>` are the bounding coordinates of the object's segmentation mask. The coordinates are separated by spaces. |
||||
|
||||
Here is an example of the YOLO dataset format for a single image with two object instances: |
||||
|
||||
``` |
||||
0 0.6812 0.48541 0.67 0.4875 0.67656 0.487 0.675 0.489 |
||||
1 0.5046 0.0 0.5015 0.004 0.4984 0.00416 0.4937 0.010 0.492 0.0104 |
||||
``` |
||||
|
||||
!!! tip "Tip" |
||||
|
||||
- The length of each row does not have to be equal; each object can have a different number of polygon points. |
||||
- Each segmentation label must have a **minimum of 3 xy points**: `<class-index> <x1> <y1> <x2> <y2> <x3> <y3>` |
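To make the row layout concrete, the hypothetical helper below converts a pixel-space polygon into one such label line; it is a sketch for clarity, not part of the Ultralytics API.

```python
import numpy as np


def polygon_to_seg_line(class_index: int, polygon_px: np.ndarray, img_w: int, img_h: int) -> str:
    """Format one YOLO segmentation label line from an (N, 2) array of pixel-space (x, y) points."""
    assert polygon_px.shape[0] >= 3, "each segment needs a minimum of 3 xy points"
    polygon = polygon_px.astype(float) / np.array([img_w, img_h])  # normalize to 0-1
    coords = " ".join(f"{v:.4f}" for v in polygon.reshape(-1))
    return f"{class_index} {coords}"


triangle = np.array([[120, 200], [180, 260], [100, 280]])
print(polygon_to_seg_line(0, triangle, 640, 480))  # -> "0 0.1875 0.4167 ..."
```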
||||
|
||||
### Dataset YAML format |
||||
|
||||
The Ultralytics framework uses a YAML file format to define the dataset and model configuration for training segmentation models. Here is an example of the YAML format used for defining a segmentation dataset: |
||||
|
||||
```yaml |
||||
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..] |
||||
path: ../datasets/coco8-seg # dataset root dir |
||||
train: images/train # train images (relative to 'path') 4 images |
||||
val: images/val # val images (relative to 'path') 4 images |
||||
test: # test images (optional) |
||||
|
||||
# Classes (80 COCO classes) |
||||
names: |
||||
0: person |
||||
1: bicycle |
||||
2: car |
||||
... |
||||
77: teddy bear |
||||
78: hair drier |
||||
79: toothbrush |
||||
``` |
||||
|
||||
The `train` and `val` fields specify the paths to the directories containing the training and validation images, respectively. |
||||
|
||||
`names` is a dictionary of class names. The order of the names should match the order of the object class indices in the YOLO dataset files. |
||||
|
||||
## Usage |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-seg.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
results = model.train(data='coco128-seg.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo segment train data=coco128-seg.yaml model=yolov8n-seg.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Supported Datasets |
||||
|
||||
* [COCO](coco.md): A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images. |
||||
* [COCO8-seg](coco8-seg.md): A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations. |
||||
|
||||
### Adding your own dataset |
||||
|
||||
If you have your own dataset and would like to use it for training segmentation models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file. |
||||
|
||||
## Port or Convert Label Formats |
||||
|
||||
### COCO Dataset Format to YOLO Format |
||||
|
||||
You can easily convert labels from the popular COCO dataset format to the YOLO format using the following code snippet: |
||||
|
||||
```python |
||||
from ultralytics.data.converter import convert_coco |
||||
|
||||
convert_coco(labels_dir='../coco/annotations/', use_segments=True) |
||||
``` |
||||
|
||||
This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format. |
||||
|
||||
Remember to double-check if the dataset you want to use is compatible with your model and follows the necessary format conventions. Properly formatted datasets are crucial for training successful object detection models. |
||||
|
||||
## Auto-Annotation |
||||
|
||||
Auto-annotation is an essential feature that allows you to generate a segmentation dataset using a pre-trained detection model. It enables you to quickly and accurately annotate a large number of images without the need for manual labeling, saving time and effort. |
||||
|
||||
### Generate Segmentation Dataset Using a Detection Model |
||||
|
||||
To auto-annotate your dataset using the Ultralytics framework, you can use the `auto_annotate` function as shown below: |
||||
|
||||
```python |
||||
from ultralytics.data.annotator import auto_annotate |
||||
|
||||
auto_annotate(data="path/to/images", det_model="yolov8x.pt", sam_model='sam_b.pt') |
||||
``` |
||||
|
||||
| Argument | Type | Description | Default | |
||||
|------------|---------------------|---------------------------------------------------------------------------------------------------------|--------------| |
||||
| data | str | Path to a folder containing images to be annotated. | | |
||||
| det_model | str, optional | Pre-trained YOLO detection model. Defaults to 'yolov8x.pt'. | 'yolov8x.pt' | |
||||
| sam_model | str, optional | Pre-trained SAM segmentation model. Defaults to 'sam_b.pt'. | 'sam_b.pt' | |
||||
| device | str, optional | Device to run the models on. Defaults to an empty string (CPU or GPU, if available). | | |
||||
| output_dir | str, None, optional | Directory to save the annotated results. Defaults to a 'labels' folder in the same directory as 'data'. | None | |
||||
|
||||
The `auto_annotate` function takes the path to your images, along with optional arguments for specifying the pre-trained detection and [SAM segmentation models](https://docs.ultralytics.com/models/sam), the device to run the models on, and the output directory for saving the annotated results. |
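For example, the optional arguments can be set explicitly (the paths below are placeholders, and `device=''` simply lets the framework choose between CPU and GPU):

```python
from ultralytics.data.annotator import auto_annotate

# Placeholder paths; device and output_dir are optional
auto_annotate(
    data='path/to/images',
    det_model='yolov8x.pt',
    sam_model='sam_b.pt',
    device='',                    # '' selects GPU if available, otherwise CPU
    output_dir='path/to/labels',  # when None, defaults to a 'labels' folder next to 'data'
)
```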
||||
|
||||
By leveraging the power of pre-trained models, auto-annotation can significantly reduce the time and effort required for creating high-quality segmentation datasets. This feature is particularly useful for researchers and developers working with large image collections, as it allows them to focus on model development and evaluation rather than manual annotation. |
@ -0,0 +1,30 @@ |
||||
--- |
||||
comments: true |
||||
description: Understand multi-object tracking datasets, upcoming features and how to use them with YOLO in Python and CLI. Dive in now! |
||||
keywords: Ultralytics, YOLO, multi-object tracking, datasets, detection, segmentation, pose models, Python, CLI |
||||
--- |
||||
|
||||
# Multi-object Tracking Datasets Overview |
||||
|
||||
## Dataset Format (Coming Soon) |
||||
|
||||
Multi-object tracking does not require standalone training; it works directly with pre-trained detection, segmentation, or pose models. |
||||
Support for training standalone trackers is coming soon. |
||||
|
||||
## Usage |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
model = YOLO('yolov8n.pt') |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", conf=0.3, iou=0.5, show=True) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" conf=0.3 iou=0.5 show |
||||
``` |
@ -0,0 +1,19 @@ |
||||
--- |
||||
comments: true |
||||
description: In-depth exploration of Ultralytics' YOLO. Learn about the YOLO object detection model, how to train it on custom data, multi-GPU training, exporting, predicting, deploying, and more. |
||||
keywords: Ultralytics, YOLO, Deep Learning, Object detection, PyTorch, Tutorial, Multi-GPU training, Custom data training |
||||
--- |
||||
|
||||
# Comprehensive Tutorials to Ultralytics YOLO |
||||
|
||||
Welcome to the Ultralytics YOLO 🚀 Guides! Our comprehensive tutorials cover various aspects of the YOLO object detection model, ranging from training and prediction to deployment. Built on PyTorch, YOLO stands out for its exceptional speed and accuracy in real-time object detection tasks. |
||||
|
||||
Whether you're a beginner or an expert in deep learning, our tutorials offer valuable insights into the implementation and optimization of YOLO for your computer vision projects. Let's dive in! |
||||
|
||||
## Guides |
||||
|
||||
Here's a compilation of in-depth guides to help you master different aspects of Ultralytics YOLO. |
||||
|
||||
* [K-Fold Cross Validation](kfold-cross-validation.md) 🚀 NEW: Learn how to improve model generalization using the K-Fold cross-validation technique. |
||||
|
||||
Note: More guides about training, exporting, predicting, and deploying with Ultralytics YOLO are coming soon. Stay tuned! |
@ -0,0 +1,265 @@ |
||||
--- |
||||
comments: true |
||||
description: An in-depth guide demonstrating the implementation of K-Fold Cross Validation with the Ultralytics ecosystem for object detection datasets, leveraging Python, YOLO, and sklearn. |
||||
keywords: K-Fold cross validation, Ultralytics, YOLO detection format, Python, sklearn, object detection |
||||
--- |
||||
|
||||
# K-Fold Cross Validation with Ultralytics |
||||
|
||||
## Introduction |
||||
|
||||
This comprehensive guide illustrates the implementation of K-Fold Cross Validation for object detection datasets within the Ultralytics ecosystem. We'll leverage the YOLO detection format and key Python libraries such as sklearn, pandas, and PyYAML to guide you through the necessary setup, the process of generating feature vectors, and the execution of a K-Fold dataset split. |
||||
|
||||
<p align="center"> |
||||
<img width="800" src="https://user-images.githubusercontent.com/26833433/258589390-8d815058-ece8-48b9-a94e-0e1ab53ea0f6.png" alt="K-Fold Cross Validation Overview"> |
||||
</p> |
||||
|
||||
Whether your project involves the Fruit Detection dataset or a custom data source, this tutorial aims to help you comprehend and apply K-Fold Cross Validation to bolster the reliability and robustness of your machine learning models. While we're applying `k=5` folds for this tutorial, keep in mind that the optimal number of folds can vary depending on your dataset and the specifics of your project. |
||||
|
||||
Without further ado, let's dive in! |
||||
|
||||
## Setup |
||||
|
||||
- Your annotations should be in the [YOLO detection format](https://docs.ultralytics.com/datasets/detect/). |
||||
|
||||
- This guide assumes that annotation files are locally available. |
||||
|
||||
- For our demonstration, we use the [Fruit Detection](https://www.kaggle.com/datasets/lakshaytyagi01/fruit-detection/code) dataset. |
||||
|
||||
- This dataset contains a total of 8479 images. |
||||
- It includes 6 class labels, each with its total instance counts listed below. |
||||
|
||||
| Class Label | Instance Count | |
||||
|:------------|:--------------:| |
||||
| Apple | 7049 | |
||||
| Grapes | 7202 | |
||||
| Pineapple | 1613 | |
||||
| Orange | 15549 | |
||||
| Banana | 3536 | |
||||
| Watermelon | 1976 | |
||||
|
||||
- Necessary Python packages include: |
||||
|
||||
- `ultralytics` |
||||
- `sklearn` |
||||
- `pandas` |
||||
- `pyyaml` |
||||
|
||||
- This tutorial operates with `k=5` folds. However, you should determine the best number of folds for your specific dataset. |
||||
|
||||
1. Initiate a new Python virtual environment (`venv`) for your project and activate it. Use `pip` (or your preferred package manager) to install: |
||||
|
||||
- The Ultralytics library: `pip install -U ultralytics`. Alternatively, you can clone the official [repo](https://github.com/ultralytics/ultralytics). |
||||
- Scikit-learn, pandas, and PyYAML: `pip install -U scikit-learn pandas pyyaml`. |
||||
|
||||
2. Verify that your annotations are in the [YOLO detection format](https://docs.ultralytics.com/datasets/detect/). |
||||
|
||||
- For this tutorial, all annotation files are found in the `Fruit-Detection/labels` directory. |
||||
|
||||
## Generating Feature Vectors for Object Detection Dataset |
||||
|
||||
1. Start by creating a new Python file and import the required libraries. |
||||
|
||||
```python |
||||
import datetime |
||||
import shutil |
||||
from pathlib import Path |
||||
from collections import Counter |
||||
|
||||
import yaml |
||||
import numpy as np |
||||
import pandas as pd |
||||
from ultralytics import YOLO |
||||
from sklearn.model_selection import KFold |
||||
``` |
||||
|
||||
2. Proceed to retrieve all label files for your dataset. |
||||
|
||||
```python |
||||
dataset_path = Path('./Fruit-detection') # replace with 'path/to/dataset' for your custom data |
||||
labels = sorted(dataset_path.rglob("*labels/*.txt")) # all data in 'labels' |
||||
``` |
||||
|
||||
3. Now, read the contents of the dataset YAML file and extract the indices of the class labels. |
||||
|
||||
```python |
||||
yaml_file = 'path/to/data.yaml'  # your dataset YAML with the data directories and names dictionary |
with open(yaml_file, 'r', encoding="utf8") as y: |
||||
classes = yaml.safe_load(y)['names'] |
||||
cls_idx = sorted(classes.keys()) |
||||
``` |
||||
|
||||
4. Initialize an empty `pandas` DataFrame. |
||||
|
||||
```python |
||||
indx = [l.stem for l in labels] # uses base filename as ID (no extension) |
||||
labels_df = pd.DataFrame([], columns=cls_idx, index=indx) |
||||
``` |
||||
|
||||
5. Count the instances of each class-label present in the annotation files. |
||||
|
||||
```python |
||||
for label in labels: |
||||
lbl_counter = Counter() |
||||
|
||||
with open(label,'r') as lf: |
||||
lines = lf.readlines() |
||||
|
||||
for l in lines: |
||||
# classes for YOLO label uses integer at first position of each line |
||||
lbl_counter[int(l.split(' ')[0])] += 1 |
||||
|
||||
labels_df.loc[label.stem] = lbl_counter |
||||
|
||||
labels_df = labels_df.fillna(0.0) # replace `nan` values with `0.0` |
||||
``` |
||||
|
||||
6. The following is a sample view of the populated DataFrame: |
||||
|
||||
```pandas |
||||
0 1 2 3 4 5 |
||||
'0000a16e4b057580_jpg.rf.00ab48988370f64f5ca8ea4...' 0.0 0.0 0.0 0.0 0.0 7.0 |
||||
'0000a16e4b057580_jpg.rf.7e6dce029fb67f01eb19aa7...' 0.0 0.0 0.0 0.0 0.0 7.0 |
||||
'0000a16e4b057580_jpg.rf.bc4d31cdcbe229dd022957a...' 0.0 0.0 0.0 0.0 0.0 7.0 |
||||
'00020ebf74c4881c_jpg.rf.508192a0a97aa6c4a3b6882...' 0.0 0.0 0.0 1.0 0.0 0.0 |
||||
'00020ebf74c4881c_jpg.rf.5af192a2254c8ecc4188a25...' 0.0 0.0 0.0 1.0 0.0 0.0 |
||||
... ... ... ... ... ... ... |
||||
'ff4cd45896de38be_jpg.rf.c4b5e967ca10c7ced3b9e97...' 0.0 0.0 0.0 0.0 0.0 2.0 |
||||
'ff4cd45896de38be_jpg.rf.ea4c1d37d2884b3e3cbce08...' 0.0 0.0 0.0 0.0 0.0 2.0 |
||||
'ff5fd9c3c624b7dc_jpg.rf.bb519feaa36fc4bf630a033...' 1.0 0.0 0.0 0.0 0.0 0.0 |
||||
'ff5fd9c3c624b7dc_jpg.rf.f0751c9c3aa4519ea3c9d6a...' 1.0 0.0 0.0 0.0 0.0 0.0 |
||||
'fffe28b31f2a70d4_jpg.rf.7ea16bd637ba0711c53b540...' 0.0 6.0 0.0 0.0 0.0 0.0 |
||||
``` |
||||
|
||||
The rows index the label files, each corresponding to an image in your dataset, and the columns correspond to your class-label indices. Each row represents a pseudo feature-vector, with the count of each class-label present in your dataset. This data structure enables the application of K-Fold Cross Validation to an object detection dataset. |
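As an optional sanity check, summing the columns of this DataFrame should reproduce the per-class instance counts listed in the Setup section:

```python
# Total instances per class across all label files; compare with the Setup table
print(labels_df.sum())
```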
||||
|
||||
## K-Fold Dataset Split |
||||
|
||||
1. Now we will use the `KFold` class from `sklearn.model_selection` to generate `k` splits of the dataset. |
||||
|
||||
- Important: |
||||
- Setting `shuffle=True` ensures a randomized distribution of classes in your splits. |
||||
- By setting `random_state=M` where `M` is a chosen integer, you can obtain repeatable results. |
||||
|
||||
```python |
||||
ksplit = 5 |
||||
kf = KFold(n_splits=ksplit, shuffle=True, random_state=20) # setting random_state for repeatable results |
||||
|
||||
kfolds = list(kf.split(labels_df)) |
||||
``` |
||||
|
||||
2. The dataset has now been split into `k` folds, each having a list of `train` and `val` indices. We will construct a DataFrame to display these results more clearly. |
||||
|
||||
```python |
||||
folds = [f'split_{n}' for n in range(1, ksplit + 1)] |
||||
folds_df = pd.DataFrame(index=indx, columns=folds) |
||||
|
||||
for idx, (train, val) in enumerate(kfolds, start=1): |
||||
folds_df[f'split_{idx}'].loc[labels_df.iloc[train].index] = 'train' |
||||
folds_df[f'split_{idx}'].loc[labels_df.iloc[val].index] = 'val' |
||||
``` |
||||
|
||||
3. Now we will calculate the distribution of class labels for each fold as a ratio of the classes present in `val` to those present in `train`. |
||||
|
||||
```python |
||||
fold_lbl_distrb = pd.DataFrame(index=folds, columns=cls_idx) |
||||
|
||||
for n, (train_indices, val_indices) in enumerate(kfolds, start=1): |
||||
train_totals = labels_df.iloc[train_indices].sum() |
||||
val_totals = labels_df.iloc[val_indices].sum() |
||||
|
||||
# To avoid division by zero, we add a small value (1E-7) to the denominator |
||||
ratio = val_totals / (train_totals + 1E-7) |
||||
fold_lbl_distrb.loc[f'split_{n}'] = ratio |
||||
``` |
||||
|
||||
The ideal scenario is for all class ratios to be reasonably similar for each split and across classes. This, however, will be subject to the specifics of your dataset. |
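To inspect the balance, simply print the distribution DataFrame; with `k=5`, each validation fold holds roughly one quarter as many images as its training set, so ratios near 0.25 suggest a well-balanced class:

```python
# With k=5, well-balanced classes show ratios near 1/4 = 0.25
print(fold_lbl_distrb)
```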
||||
|
||||
4. Next, we create the directories and dataset YAML files for each split. |
||||
|
||||
```python |
||||
save_path = Path(dataset_path / f'{datetime.date.today().isoformat()}_{ksplit}-Fold_Cross-val') |
||||
save_path.mkdir(parents=True, exist_ok=True) |
||||
|
||||
images = sorted((dataset_path / 'images').rglob("*.jpg")) # change file extension as needed |
||||
ds_yamls = [] |
||||
|
||||
for split in folds_df.columns: |
||||
# Create directories |
||||
split_dir = save_path / split |
||||
split_dir.mkdir(parents=True, exist_ok=True) |
||||
(split_dir / 'train' / 'images').mkdir(parents=True, exist_ok=True) |
||||
(split_dir / 'train' / 'labels').mkdir(parents=True, exist_ok=True) |
||||
(split_dir / 'val' / 'images').mkdir(parents=True, exist_ok=True) |
||||
(split_dir / 'val' / 'labels').mkdir(parents=True, exist_ok=True) |
||||
|
||||
# Create dataset YAML files |
||||
dataset_yaml = split_dir / f'{split}_dataset.yaml' |
||||
ds_yamls.append(dataset_yaml) |
||||
|
||||
with open(dataset_yaml, 'w') as ds_y: |
||||
yaml.safe_dump({ |
||||
'path': split_dir.as_posix(), |
||||
'train': 'train', |
||||
'val': 'val', |
||||
'names': classes |
||||
}, ds_y) |
||||
``` |
||||
|
||||
5. Lastly, copy images and labels into the respective directory ('train' or 'val') for each split. |
||||
|
||||
- __NOTE:__ The time required for this portion of the code will vary based on the size of your dataset and your system hardware. |
||||
|
||||
```python |
||||
for image, label in zip(images, labels): |
||||
for split, k_split in folds_df.loc[image.stem].items(): |
||||
# Destination directory |
||||
img_to_path = save_path / split / k_split / 'images' |
||||
lbl_to_path = save_path / split / k_split / 'labels' |
||||
|
||||
# Copy image and label files to new directory |
||||
# Might throw a SamefileError if file already exists |
||||
shutil.copy(image, img_to_path / image.name) |
||||
shutil.copy(label, lbl_to_path / label.name) |
||||
``` |
||||
|
||||
## Save Records (Optional) |
||||
|
||||
Optionally, you can save the records of the K-Fold split and label distribution DataFrames as CSV files for future reference. |
||||
|
||||
```python |
||||
folds_df.to_csv(save_path / "kfold_datasplit.csv") |
||||
fold_lbl_distrb.to_csv(save_path / "kfold_label_distribution.csv") |
||||
``` |
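These records can be reloaded later for comparison or reporting, assuming the files were written by the step above:

```python
import pandas as pd

# Reload the saved split assignments and label-distribution ratios
folds_df = pd.read_csv(save_path / 'kfold_datasplit.csv', index_col=0)
fold_lbl_distrb = pd.read_csv(save_path / 'kfold_label_distribution.csv', index_col=0)
```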
||||
|
||||
## Train YOLO using K-Fold Data Splits |
||||
|
||||
1. First, load the YOLO model. |
||||
|
||||
```python |
||||
weights_path = 'path/to/weights.pt' |
||||
model = YOLO(weights_path, task='detect') |
||||
``` |
||||
|
||||
2. Next, iterate over the dataset YAML files to run training. The results will be saved to a directory specified by the `project` and `name` arguments. By default, this directory is 'runs/detect/train#', where # is an auto-incrementing index. |
||||
|
||||
```python |
||||
results = {} |
||||
for k in range(ksplit): |
||||
dataset_yaml = ds_yamls[k] |
||||
model.train(data=dataset_yaml, epochs=100, imgsz=640)  # replace with any additional training arguments you need |
||||
results[k] = model.metrics # save output metrics for further analysis |
||||
``` |
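Once the loop finishes, you can aggregate a metric across folds. The metric key below follows the common Ultralytics `results_dict` naming for box mAP and should be treated as an assumption; adjust it to whatever your metrics objects expose.

```python
# Average box mAP50-95 across the k folds (key name may differ between versions)
mean_map = sum(m.results_dict['metrics/mAP50-95(B)'] for m in results.values()) / ksplit
print(f'Mean mAP50-95 over {ksplit} folds: {mean_map:.4f}')
```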
||||
|
||||
## Conclusion |
||||
|
||||
In this guide, we have explored the process of using K-Fold cross-validation for training the YOLO object detection model. We learned how to split our dataset into K partitions, ensuring a balanced class distribution across the different folds. |
||||
|
||||
We also explored the procedure for creating report DataFrames to visualize the data splits and label distributions across these splits, providing us a clear insight into the structure of our training and validation sets. |
||||
|
||||
Optionally, we saved our records for future reference, which could be particularly useful in large-scale projects or when troubleshooting model performance. |
||||
|
||||
Finally, we implemented the actual model training using each split in a loop, saving our training results for further analysis and comparison. |
||||
|
||||
This technique of K-Fold cross-validation is a robust way of making the most out of your available data, and it helps to ensure that your model performance is reliable and consistent across different data subsets. This results in a more generalizable and reliable model that is less likely to overfit to specific data patterns. |
||||
|
||||
Remember that although we used YOLO in this guide, these steps are mostly transferable to other machine learning models. Understanding these steps allows you to apply cross-validation effectively in your own machine learning projects. Happy coding! |
@ -0,0 +1,62 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn how Ultralytics leverages Continuous Integration (CI) for maintaining high-quality code. Explore our CI tests and the status of these tests for our repositories. |
||||
keywords: continuous integration, software development, CI tests, Ultralytics repositories, high-quality code, Docker Deployment, Broken Links, CodeQL, PyPi Publishing |
||||
--- |
||||
|
||||
# Continuous Integration (CI) |
||||
|
||||
Continuous Integration (CI) is an essential aspect of software development which involves integrating changes and testing them automatically. CI allows us to maintain high-quality code by catching issues early and often in the development process. At Ultralytics, we use various CI tests to ensure the quality and integrity of our codebase. |
||||
|
||||
## CI Actions |
||||
|
||||
Here's a brief description of our CI actions: |
||||
|
||||
- **CI:** This is our primary CI test that involves running unit tests, linting checks, and sometimes more comprehensive tests depending on the repository. |
||||
- **Docker Deployment:** This test checks the deployment of the project using Docker to ensure the Dockerfile and related scripts are working correctly. |
||||
- **Broken Links:** This test scans the codebase for any broken or dead links in our markdown or HTML files. |
||||
- **CodeQL:** CodeQL is a tool from GitHub that performs semantic analysis on our code, helping to find potential security vulnerabilities and maintain high-quality code. |
||||
- **PyPi Publishing:** This test checks if the project can be packaged and published to PyPi without any errors. |
||||
|
||||
### CI Results |
||||
|
||||
Below is the table showing the status of these CI tests for our main repositories: |
||||
|
||||
| Repository | CI | Docker Deployment | Broken Links | CodeQL | PyPi and Docs Publishing | |
||||
|-----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
||||
| [yolov3](https://github.com/ultralytics/yolov3) | [](https://github.com/ultralytics/yolov3/actions/workflows/ci-testing.yml) | [](https://github.com/ultralytics/yolov3/actions/workflows/docker.yml) | [](https://github.com/ultralytics/yolov3/actions/workflows/links.yml) | [](https://github.com/ultralytics/yolov3/actions/workflows/codeql-analysis.yml) | | |
||||
| [yolov5](https://github.com/ultralytics/yolov5) | [](https://github.com/ultralytics/yolov5/actions/workflows/ci-testing.yml) | [](https://github.com/ultralytics/yolov5/actions/workflows/docker.yml) | [](https://github.com/ultralytics/yolov5/actions/workflows/links.yml) | [](https://github.com/ultralytics/yolov5/actions/workflows/codeql-analysis.yml) | | |
||||
| [ultralytics](https://github.com/ultralytics/ultralytics) | [](https://github.com/ultralytics/ultralytics/actions/workflows/ci.yaml) | [](https://github.com/ultralytics/ultralytics/actions/workflows/docker.yaml) | [](https://github.com/ultralytics/ultralytics/actions/workflows/links.yml) | [](https://github.com/ultralytics/ultralytics/actions/workflows/codeql.yaml) | [](https://github.com/ultralytics/ultralytics/actions/workflows/publish.yml) | |
||||
| [hub](https://github.com/ultralytics/hub) | [](https://github.com/ultralytics/hub/actions/workflows/ci.yaml) | | [](https://github.com/ultralytics/hub/actions/workflows/links.yml) | | | |
||||
| [docs](https://github.com/ultralytics/docs) | | | | | [](https://github.com/ultralytics/docs/actions/workflows/pages/pages-build-deployment) | |
||||
|
||||
Each badge shows the status of the last run of the corresponding CI test on the `main` branch of the respective repository. If a test fails, the badge will display a "failing" status, and if it passes, it will display a "passing" status. |
||||
|
||||
If you notice a test failing, it would be a great help if you could report it through a GitHub issue in the respective repository. |
||||
|
||||
Remember, a successful CI test does not mean that everything is perfect. It is always recommended to manually review the code before deployment or merging changes. |
||||
|
||||
## Code Coverage |
||||
|
||||
Code coverage is a metric that represents the percentage of your codebase that is executed when your tests run. It provides insight into how well your tests exercise your code and can be crucial in identifying untested parts of your application. A high code coverage percentage is often associated with a lower likelihood of bugs. However, it's essential to understand that code coverage doesn't guarantee the absence of defects. It merely indicates which parts of the code have been executed by the tests. |
||||
|
||||
### Integration with [codecov.io](https://codecov.io/) |
||||
|
||||
At Ultralytics, we have integrated our repositories with [codecov.io](https://codecov.io/), a popular online platform for measuring and visualizing code coverage. Codecov provides detailed insights, coverage comparisons between commits, and visual overlays directly on your code, indicating which lines were covered. |
||||
|
||||
By integrating with Codecov, we aim to maintain and improve the quality of our code by focusing on areas that might be prone to errors or need further testing. |
||||
|
||||
### Coverage Results |
||||
|
||||
To quickly get a glimpse of the code coverage status of the `ultralytics` Python package, we have included a badge and a sunburst visual of the `ultralytics` coverage results. These images show the percentage of code covered by our tests, offering an at-a-glance metric of our testing efforts. For full details please see https://codecov.io/github/ultralytics/ultralytics. |
||||
|
||||
| Repository | Code Coverage | |
||||
|-----------------------------------------------------------|----------------------------------------------------------------------| |
||||
| [ultralytics](https://github.com/ultralytics/ultralytics) | [](https://codecov.io/gh/ultralytics/ultralytics) | |
||||
|
||||
In the sunburst graphic below, the innermost circle represents the entire project, the surrounding rings represent folders, and the outermost slices represent individual files. The size and color of each slice represent the number of statements and the coverage, respectively. |
||||
|
||||
<a href="https://codecov.io/github/ultralytics/ultralytics"> |
||||
<img src="https://codecov.io/gh/ultralytics/ultralytics/branch/main/graphs/sunburst.svg?token=HHW7IIVFVY" alt="Ultralytics Codecov Image"> |
||||
</a> |
||||
|
@ -0,0 +1,37 @@ |
||||
--- |
||||
comments: false |
||||
description: Discover Ultralytics’ EHS policy principles and implementation measures. Committed to safety, environment, and continuous improvement for a sustainable future. |
||||
keywords: Ultralytics policy, EHS, environment, health and safety, compliance, prevention, continuous improvement, risk management, emergency preparedness, resource allocation, communication |
||||
--- |
||||
|
||||
# Ultralytics Environmental, Health and Safety (EHS) Policy |
||||
|
||||
At Ultralytics, we recognize that the long-term success of our company relies not only on the products and services we offer, but also on the manner in which we conduct our business. We are committed to ensuring the safety and well-being of our employees, stakeholders, and the environment, and we will continuously strive to mitigate our impact on the environment while promoting health and safety. |
||||
|
||||
## Policy Principles |
||||
|
||||
1. **Compliance**: We will comply with all applicable laws, regulations, and standards related to EHS, and we will strive to exceed these standards where possible. |
||||
|
||||
2. **Prevention**: We will work to prevent accidents, injuries, and environmental harm by implementing risk management measures and ensuring all our operations and procedures are safe. |
||||
|
||||
3. **Continuous Improvement**: We will continuously improve our EHS performance by setting measurable objectives, monitoring our performance, auditing our operations, and revising our policies and procedures as needed. |
||||
|
||||
4. **Communication**: We will communicate openly about our EHS performance and will engage with stakeholders to understand and address their concerns and expectations. |
||||
|
||||
5. **Education and Training**: We will educate and train our employees and contractors in appropriate EHS procedures and practices. |
||||
|
||||
## Implementation Measures |
||||
|
||||
1. **Responsibility and Accountability**: Every employee and contractor working at or with Ultralytics is responsible for adhering to this policy. Managers and supervisors are accountable for ensuring this policy is implemented within their areas of control. |
||||
|
||||
2. **Risk Management**: We will identify, assess, and manage EHS risks associated with our operations and activities to prevent accidents, injuries, and environmental harm. |
||||
|
||||
3. **Resource Allocation**: We will allocate the necessary resources to ensure the effective implementation of our EHS policy, including the necessary equipment, personnel, and training. |
||||
|
||||
4. **Emergency Preparedness and Response**: We will develop, maintain, and test emergency preparedness and response plans to ensure we can respond effectively to EHS incidents. |
||||
|
||||
5. **Monitoring and Review**: We will monitor and review our EHS performance regularly to identify opportunities for improvement and ensure we are meeting our objectives. |
||||
|
||||
This policy reflects our commitment to minimizing our environmental footprint, ensuring the safety and well-being of our employees, and continuously improving our performance. |
||||
|
||||
Please remember that the implementation of an effective EHS policy requires the involvement and commitment of everyone working at or with Ultralytics. We encourage you to take personal responsibility for your safety and the safety of others, and to take care of the environment in which we live and work. |
@ -1,14 +1,18 @@ |
||||
--- |
||||
comments: true |
||||
description: Find comprehensive guides and documents on Ultralytics YOLO tasks. Includes FAQs, contributing guides, CI guide, CLA, MRE guide, code of conduct & more. |
||||
keywords: Ultralytics, YOLO, guides, documents, FAQ, contributing, CI guide, CLA, MRE guide, code of conduct, EHS policy, security policy |
||||
--- |
||||
|
||||
Welcome to the Ultralytics Help page! We are committed to providing you with comprehensive resources to make your experience with Ultralytics YOLO repositories as smooth and enjoyable as possible. On this page, you'll find essential links to guides and documents that will help you navigate through common tasks and address any questions you might have while using our repositories. |
||||
|
||||
- [Frequently Asked Questions (FAQ)](FAQ.md): Find answers to common questions and issues faced by users and contributors of Ultralytics YOLO repositories. |
||||
- [Contributing Guide](contributing.md): Learn the best practices for submitting pull requests, reporting bugs, and contributing to the development of our repositories. |
||||
- [Continuous Integration (CI) Guide](CI.md): Understand the CI tests we perform for each Ultralytics repository and see their current statuses. |
||||
- [Contributor License Agreement (CLA)](CLA.md): Familiarize yourself with our CLA to understand the terms and conditions for contributing to Ultralytics projects. |
||||
- [Minimum Reproducible Example (MRE) Guide](minimum_reproducible_example.md): Understand how to create an MRE when submitting bug reports to ensure that our team can quickly and efficiently address the issue. |
||||
- [Code of Conduct](code_of_conduct.md): Learn about our community guidelines and expectations to ensure a welcoming and inclusive environment for all participants. |
||||
- [Environmental, Health and Safety (EHS) Policy](environmental-health-safety.md): Explore Ultralytics' dedicated approach towards maintaining a sustainable, safe, and healthy work environment for all our stakeholders. |
||||
- [Security Policy](../SECURITY.md): Understand our security practices and how to report security vulnerabilities responsibly. |
||||
|
||||
We highly recommend going through these guides to make the most of your collaboration with the Ultralytics community. Our goal is to maintain a welcoming and supportive environment for all users and contributors. If you need further assistance, don't hesitate to reach out to us through GitHub Issues or the official discussion forum. Happy coding! |
||||
We highly recommend going through these guides to make the most of your collaboration with the Ultralytics community. Our goal is to maintain a welcoming and supportive environment for all users and contributors. If you need further assistance, don't hesitate to reach out to us through GitHub Issues or the official discussion forum. Happy coding! |
||||
|
@ -1,116 +0,0 @@ |
||||
--- |
||||
comments: true |
||||
--- |
||||
|
||||
# Ultralytics HUB |
||||
|
||||
<a href="https://bit.ly/ultralytics_hub" target="_blank"> |
||||
<img width="100%" src="https://github.com/ultralytics/assets/raw/main/im/ultralytics-hub.png"></a> |
||||
<br> |
||||
<div align="center"> |
||||
<a href="https://github.com/ultralytics" style="text-decoration:none;"> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-social-github.png" width="2%" alt="" /></a> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-transparent.png" width="2%" alt="" /> |
||||
<a href="https://www.linkedin.com/company/ultralytics" style="text-decoration:none;"> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-social-linkedin.png" width="2%" alt="" /></a> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-transparent.png" width="2%" alt="" /> |
||||
<a href="https://twitter.com/ultralytics" style="text-decoration:none;"> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-social-twitter.png" width="2%" alt="" /></a> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-transparent.png" width="2%" alt="" /> |
||||
<a href="https://youtube.com/ultralytics" style="text-decoration:none;"> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-social-youtube.png" width="2%" alt="" /></a> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-transparent.png" width="2%" alt="" /> |
||||
<a href="https://www.tiktok.com/@ultralytics" style="text-decoration:none;"> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-social-tiktok.png" width="2%" alt="" /></a> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-transparent.png" width="2%" alt="" /> |
||||
<a href="https://www.instagram.com/ultralytics/" style="text-decoration:none;"> |
||||
<img src="https://github.com/ultralytics/assets/raw/main/social/logo-social-instagram.png" width="2%" alt="" /></a> |
||||
<br> |
||||
<br> |
||||
<a href="https://github.com/ultralytics/hub/actions/workflows/ci.yaml"> |
||||
<img src="https://github.com/ultralytics/hub/actions/workflows/ci.yaml/badge.svg" alt="CI CPU"></a> |
||||
<a href="https://colab.research.google.com/github/ultralytics/hub/blob/master/hub.ipynb"> |
||||
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> |
||||
</div> |
||||
|
||||
|
||||
[Ultralytics HUB](https://hub.ultralytics.com) is a new no-code online tool developed |
||||
by [Ultralytics](https://ultralytics.com), the creators of the popular [YOLOv5](https://github.com/ultralytics/yolov5) |
||||
object detection and image segmentation models. With Ultralytics HUB, users can easily train and deploy YOLO models |
||||
without any coding or technical expertise. |
||||
|
||||
Ultralytics HUB is designed to be user-friendly and intuitive, with a drag-and-drop interface that allows users to |
||||
easily upload their data and select their model configurations. It also offers a range of pre-trained models and |
||||
templates to choose from, making it easy for users to get started with training their own models. Once a model is |
||||
trained, it can be easily deployed and used for real-time object detection and image segmentation tasks. Overall, |
||||
Ultralytics HUB is an essential tool for anyone looking to use YOLO for their object detection and image segmentation |
||||
projects. |
||||
|
||||
**[Get started now](https://hub.ultralytics.com)** and experience the power and simplicity of Ultralytics HUB for |
||||
yourself. Sign up for a free account and start building, training, and deploying YOLOv5 and YOLOv8 models today. |
||||
|
||||
## 1. Upload a Dataset |
||||
|
||||
Ultralytics HUB datasets are just like YOLOv5 🚀 datasets, they use the same structure and the same label formats to keep |
||||
everything simple. |
||||
|
||||
When you upload a dataset to Ultralytics HUB, make sure to **place your dataset YAML inside the dataset root directory** |
||||
as in the example shown below, and then zip for upload to https://hub.ultralytics.com/. Your **dataset YAML, directory |
||||
and zip** should all share the same name. For example, if your dataset is called 'coco6' as in our |
||||
example [ultralytics/hub/coco6.zip](https://github.com/ultralytics/hub/blob/master/coco6.zip), then you should have a |
||||
coco6.yaml inside your coco6/ directory, which should zip to create coco6.zip for upload: |
||||
|
||||
```bash |
||||
zip -r coco6.zip coco6 |
||||
``` |
||||
|
||||
The example [coco6.zip](https://github.com/ultralytics/hub/blob/master/coco6.zip) dataset in this repository can be |
||||
downloaded and unzipped to see exactly how to structure your custom dataset. |
||||
|
||||
<p align="center"> |
||||
<img width="80%" src="https://user-images.githubusercontent.com/26833433/201424843-20fa081b-ad4b-4d6c-a095-e810775908d8.png" title="COCO6" /> |
||||
</p> |
||||
|
||||
The dataset YAML is the same standard YOLOv5 YAML format. See |
||||
the [YOLOv5 Train Custom Data tutorial](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data) for full details. |
||||
|
||||
```yaml |
||||
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..] |
||||
path: # dataset root dir (leave empty for HUB) |
||||
train: images/train # train images (relative to 'path') 8 images |
||||
val: images/val # val images (relative to 'path') 8 images |
||||
test: # test images (optional) |
||||
|
||||
# Classes |
||||
names: |
||||
0: person |
||||
1: bicycle |
||||
2: car |
||||
3: motorcycle |
||||
... |
||||
``` |
||||
|
||||
After zipping your dataset, sign in to [Ultralytics HUB](https://bit.ly/ultralytics_hub) and click the Datasets tab. |
||||
Click 'Upload Dataset' to upload, scan and visualize your new dataset before training new YOLOv5 models on it! |
||||
|
||||
<img width="100%" alt="HUB Dataset Upload" src="https://user-images.githubusercontent.com/26833433/216763338-9a8812c8-a4e5-4362-8102-40dad7818396.png"> |
||||
|
||||
## 2. Train a Model |
||||
|
||||
Connect to the Ultralytics HUB notebook and use your model API key to begin training! |
||||
|
||||
<a href="https://colab.research.google.com/github/ultralytics/hub/blob/master/hub.ipynb" target="_blank"> |
||||
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> |
||||
|
||||
## 3. Deploy to Real World |
||||
|
||||
Export your model to 13 different formats, including TensorFlow, ONNX, OpenVINO, CoreML, Paddle and many others. Run |
||||
models directly on your [iOS](https://apps.apple.com/xk/app/ultralytics/id1583935240) or |
||||
[Android](https://play.google.com/store/apps/details?id=com.ultralytics.ultralytics_app) mobile device by downloading |
||||
the [Ultralytics App](https://ultralytics.com/app_install)! |
||||
|
||||
## ❓ Issues |
||||
|
||||
If you are a new [Ultralytics HUB](https://bit.ly/ultralytics_hub) user and have questions or comments, you are in the |
||||
right place! Please raise a [New Issue](https://github.com/ultralytics/hub/issues/new/choose) and let us know what we |
||||
can do to make your life better 😃! |
@ -0,0 +1,66 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn about the Ultralytics Android App, enabling real-time object detection using YOLO models. Discover in-app features, quantization methods, and delegate options for optimal performance. |
||||
keywords: Ultralytics, Android App, real-time object detection, YOLO models, TensorFlow Lite, FP16 quantization, INT8 quantization, CPU, GPU, Hexagon, NNAPI |
||||
--- |
||||
|
||||
# Ultralytics Android App: Real-time Object Detection with YOLO Models |
||||
|
||||
The Ultralytics Android App is a powerful tool that allows you to run YOLO models directly on your Android device for real-time object detection. This app utilizes TensorFlow Lite for model optimization and various hardware delegates for acceleration, enabling fast and efficient object detection. |
||||
|
||||
## Quantization and Acceleration |
||||
|
||||
To achieve real-time performance on your Android device, YOLO models are quantized to either FP16 or INT8 precision. Quantization is a process that reduces the numerical precision of the model's weights and biases, thus reducing the model's size and the amount of computation required. This results in faster inference times without significantly affecting the model's accuracy. |
||||
|
||||
### FP16 Quantization |
||||
|
||||
FP16 (or half-precision) quantization converts the model's 32-bit floating-point numbers to 16-bit floating-point numbers. This reduces the model's size by half and speeds up the inference process, while maintaining a good balance between accuracy and performance. |
||||
|
||||
### INT8 Quantization |
||||
|
||||
INT8 (or 8-bit integer) quantization further reduces the model's size and computation requirements by converting its 32-bit floating-point numbers to 8-bit integers. This quantization method can result in a significant speedup, but it may lead to a slight reduction in mean average precision (mAP) due to the lower numerical precision. |
||||
|
||||
!!! tip "mAP Reduction in INT8 Models" |
||||
|
||||
The reduced numerical precision in INT8 models can lead to some loss of information during the quantization process, which may result in a slight decrease in mAP. However, this trade-off is often acceptable considering the substantial performance gains offered by INT8 quantization. |
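For reference, quantized TFLite models can be produced with the Ultralytics export API. This is a minimal sketch; the INT8 path may additionally require calibration data depending on your installed version.

```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.export(format='tflite', half=True)  # FP16 TFLite export
model.export(format='tflite', int8=True)  # INT8 TFLite export (may require calibration data)
```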
||||
|
||||
## Delegates and Performance Variability |
||||
|
||||
Different delegates are available on Android devices to accelerate model inference. These delegates include CPU, [GPU](https://www.tensorflow.org/lite/android/delegates/gpu), [Hexagon](https://www.tensorflow.org/lite/android/delegates/hexagon) and [NNAPI](https://www.tensorflow.org/lite/android/delegates/nnapi). The performance of these delegates varies depending on the device's hardware vendor, product line, and specific chipsets used in the device. |
||||
|
||||
1. **CPU**: The default option, with reasonable performance on most devices. |
||||
2. **GPU**: Utilizes the device's GPU for faster inference. It can provide a significant performance boost on devices with powerful GPUs. |
||||
3. **Hexagon**: Leverages Qualcomm's Hexagon DSP for faster and more efficient processing. This option is available on devices with Qualcomm Snapdragon processors. |
||||
4. **NNAPI**: The Android Neural Networks API (NNAPI) serves as an abstraction layer for running ML models on Android devices. NNAPI can utilize various hardware accelerators, such as CPU, GPU, and dedicated AI chips (e.g., Google's Edge TPU, or the Pixel Neural Core). |
||||
|
||||
Here's a table showing the primary vendors, their product lines, popular devices, and supported delegates: |
||||
|
||||
| Vendor | Product Lines | Popular Devices | Delegates Supported | |
||||
|-----------------------------------------|---------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------| |
||||
| [Qualcomm](https://www.qualcomm.com/) | [Snapdragon (e.g., 800 series)](https://www.qualcomm.com/snapdragon) | [Samsung Galaxy S21](https://www.samsung.com/global/galaxy/galaxy-s21-5g/), [OnePlus 9](https://www.oneplus.com/9), [Google Pixel 6](https://store.google.com/product/pixel_6) | CPU, GPU, Hexagon, NNAPI | |
||||
| [Samsung](https://www.samsung.com/) | [Exynos (e.g., Exynos 2100)](https://www.samsung.com/semiconductor/minisite/exynos/) | [Samsung Galaxy S21 (Global version)](https://www.samsung.com/global/galaxy/galaxy-s21-5g/) | CPU, GPU, NNAPI | |
||||
| [MediaTek](https://www.mediatek.com/) | [Dimensity (e.g., Dimensity 1200)](https://www.mediatek.com/products/smartphones) | [Realme GT](https://www.realme.com/global/realme-gt), [Xiaomi Redmi Note](https://www.mi.com/en/phone/redmi/note-list) | CPU, GPU, NNAPI | |
||||
| [HiSilicon](https://www.hisilicon.com/) | [Kirin (e.g., Kirin 990)](https://www.hisilicon.com/en/products/Kirin) | [Huawei P40 Pro](https://consumer.huawei.com/en/phones/p40-pro/), [Huawei Mate 30 Pro](https://consumer.huawei.com/en/phones/mate30-pro/) | CPU, GPU, NNAPI | |
||||
| [NVIDIA](https://www.nvidia.com/) | [Tegra (e.g., Tegra X1)](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems-dev-kits-modules/) | [NVIDIA Shield TV](https://www.nvidia.com/en-us/shield/shield-tv/), [Nintendo Switch](https://www.nintendo.com/switch/) | CPU, GPU, NNAPI | |
||||
|
||||
Please note that the list of devices mentioned is not exhaustive and may vary depending on the specific chipsets and device models. Always test your models on your target devices to ensure compatibility and optimal performance. |
||||
|
||||
Keep in mind that the choice of delegate can affect performance and model compatibility. For example, some models may not work with certain delegates, or a delegate may not be available on a specific device. As such, it's essential to test your model and the chosen delegate on your target devices for the best results. |
||||
|
||||
## Getting Started with the Ultralytics Android App |
||||
|
||||
To get started with the Ultralytics Android App, follow these steps: |
||||
|
||||
1. Download the Ultralytics App from the [Google Play Store](https://play.google.com/store/apps/details?id=com.ultralytics.ultralytics_app). |
||||
|
||||
2. Launch the app on your Android device and sign in with your Ultralytics account. If you don't have an account yet, create one [here](https://hub.ultralytics.com/). |
||||
|
||||
3. Once signed in, you will see a list of your trained YOLO models. Select a model to use for object detection. |
||||
|
||||
4. Grant the app permission to access your device's camera. |
||||
|
||||
5. Point your device's camera at objects you want to detect. The app will display bounding boxes and class labels in real-time as it detects objects. |
||||
|
||||
6. Explore the app's settings to adjust the detection threshold, enable or disable specific object classes, and more. |
||||
|
||||
With the Ultralytics Android App, you now have the power of real-time object detection using YOLO models right at your fingertips. Enjoy exploring the app's features and optimizing its settings to suit your specific use cases. |
@ -0,0 +1,56 @@ |
||||
--- |
||||
comments: true |
||||
description: Execute object detection in real-time on your iOS devices utilizing YOLO models. Leverage the power of the Apple Neural Engine and Core ML for fast and efficient object detection. |
||||
keywords: Ultralytics, iOS app, object detection, YOLO models, real time, Apple Neural Engine, Core ML, FP16, INT8, quantization |
||||
--- |
||||
|
||||
# Ultralytics iOS App: Real-time Object Detection with YOLO Models |
||||
|
||||
The Ultralytics iOS App is a powerful tool that allows you to run YOLO models directly on your iPhone or iPad for real-time object detection. This app utilizes the Apple Neural Engine and Core ML for model optimization and acceleration, enabling fast and efficient object detection. |
||||
|
||||
## Quantization and Acceleration |
||||
|
||||
To achieve real-time performance on your iOS device, YOLO models are quantized to either FP16 or INT8 precision. Quantization is a process that reduces the numerical precision of the model's weights and biases, thus reducing the model's size and the amount of computation required. This results in faster inference times without significantly affecting the model's accuracy. |
||||
|
||||
### FP16 Quantization |
||||
|
||||
FP16 (or half-precision) quantization converts the model's 32-bit floating-point numbers to 16-bit floating-point numbers. This reduces the model's size by half and speeds up the inference process, while maintaining a good balance between accuracy and performance. |
||||
|
||||
### INT8 Quantization |
||||
|
||||
INT8 (or 8-bit integer) quantization further reduces the model's size and computation requirements by converting its 32-bit floating-point numbers to 8-bit integers. This quantization method can result in a significant speedup, but it may lead to a slight reduction in accuracy. |
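As a minimal sketch, a reduced-precision Core ML model can be exported with the Ultralytics Python API; verify the flags against your installed version.

```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.export(format='coreml', half=True)  # FP16 Core ML export
model.export(format='coreml', int8=True)  # INT8 Core ML export
```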
||||
|
||||
## Apple Neural Engine |
||||
|
||||
The Apple Neural Engine (ANE) is a dedicated hardware component integrated into Apple's A-series and M-series chips. It's designed to accelerate machine learning tasks, particularly for neural networks, allowing for faster and more efficient execution of your YOLO models. |
||||
|
||||
By combining quantized YOLO models with the Apple Neural Engine, the Ultralytics iOS App achieves real-time object detection on your iOS device without compromising on accuracy or performance. |
||||
|
||||
| Release Year | iPhone Name | Chipset Name | Node Size | ANE TOPs | |
||||
|--------------|------------------------------------------------------|-------------------------------------------------------|-----------|----------| |
||||
| 2017 | [iPhone X](https://en.wikipedia.org/wiki/IPhone_X) | [A11 Bionic](https://en.wikipedia.org/wiki/Apple_A11) | 10 nm | 0.6 | |
||||
| 2018 | [iPhone XS](https://en.wikipedia.org/wiki/IPhone_XS) | [A12 Bionic](https://en.wikipedia.org/wiki/Apple_A12) | 7 nm | 5 | |
||||
| 2019 | [iPhone 11](https://en.wikipedia.org/wiki/IPhone_11) | [A13 Bionic](https://en.wikipedia.org/wiki/Apple_A13) | 7 nm | 6 | |
||||
| 2020 | [iPhone 12](https://en.wikipedia.org/wiki/IPhone_12) | [A14 Bionic](https://en.wikipedia.org/wiki/Apple_A14) | 5 nm | 11 | |
||||
| 2021 | [iPhone 13](https://en.wikipedia.org/wiki/IPhone_13) | [A15 Bionic](https://en.wikipedia.org/wiki/Apple_A15) | 5 nm | 15.8 | |
||||
| 2022 | [iPhone 14](https://en.wikipedia.org/wiki/IPhone_14) | [A16 Bionic](https://en.wikipedia.org/wiki/Apple_A16) | 4 nm | 17.0 | |
||||
|
||||
Please note that this list only includes iPhone models from 2017 onwards, and the ANE TOPs values are approximate. |
||||
|
||||
## Getting Started with the Ultralytics iOS App |
||||
|
||||
To get started with the Ultralytics iOS App, follow these steps: |
||||
|
||||
1. Download the Ultralytics App from the [App Store](https://apps.apple.com/xk/app/ultralytics/id1583935240). |
||||
|
||||
2. Launch the app on your iOS device and sign in with your Ultralytics account. If you don't have an account yet, create one [here](https://hub.ultralytics.com/). |
||||
|
||||
3. Once signed in, you will see a list of your trained YOLO models. Select a model to use for object detection. |
||||
|
||||
4. Grant the app permission to access your device's camera. |
||||
|
||||
5. Point your device's camera at objects you want to detect. The app will display bounding boxes and class labels in real-time as it detects objects. |
||||
|
||||
6. Explore the app's settings to adjust the detection threshold, enable or disable specific object classes, and more. |
||||
|
||||
With the Ultralytics iOS App, you can now leverage the power of YOLO models for real-time object detection on your iPhone or iPad, powered by the Apple Neural Engine and optimized with FP16 or INT8 quantization. |
@ -0,0 +1,159 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn how Ultralytics HUB datasets streamline your ML workflow. Upload, format, validate, access, share, edit or delete datasets for Ultralytics YOLO model training. |
||||
keywords: Ultralytics, HUB datasets, YOLO model training, upload datasets, dataset validation, ML workflow, share datasets |
||||
--- |
||||
|
||||
# HUB Datasets |
||||
|
||||
Ultralytics HUB datasets are a practical solution for managing and leveraging your custom datasets. |
||||
|
||||
Once uploaded, datasets can be immediately utilized for model training. This integrated approach facilitates a seamless transition from dataset management to model training, significantly simplifying the entire process. |
||||
|
||||
## Upload Dataset |
||||
|
||||
Ultralytics HUB datasets are just like YOLOv5 and YOLOv8 🚀 datasets. They use the same structure and the same label formats to keep |
||||
everything simple. |
||||
|
||||
Before you upload a dataset to Ultralytics HUB, make sure to **place your dataset YAML file inside the dataset root directory** and that **your dataset YAML, directory and ZIP have the same name**, as shown in the example below, and then zip the dataset directory. |
||||
|
||||
For example, if your dataset is called "coco8", like our [COCO8](https://docs.ultralytics.com/datasets/detect/coco8) example dataset, then you should have a `coco8.yaml` inside your `coco8/` directory, which will create a `coco8.zip` when zipped: |
||||
|
||||
```bash |
||||
zip -r coco8.zip coco8 |
||||
``` |
||||
|
||||
You can download our [COCO8](https://github.com/ultralytics/hub/blob/master/example_datasets/coco8.zip) example dataset and unzip it to see exactly how to structure your dataset. |
||||
|
||||
<p align="center"> |
||||
<img src="https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_1.jpg" alt="COCO8 Dataset Structure" width="80%" /> |
||||
</p> |
||||
|
||||
The dataset YAML is the same standard YOLOv5 and YOLOv8 YAML format. |
||||
|
||||
!!! example "coco8.yaml" |
||||
|
||||
```yaml |
||||
--8<-- "ultralytics/cfg/datasets/coco8.yaml" |
||||
``` |
||||
|
||||
After zipping your dataset, you should validate it before uploading it to Ultralytics HUB. Ultralytics HUB conducts the dataset validation check post-upload, so by ensuring your dataset is correctly formatted and error-free ahead of time, you can forestall any setbacks due to dataset rejection. |
||||
|
||||
```py |
||||
from ultralytics.hub import check_dataset |
||||
check_dataset('path/to/coco8.zip') |
||||
``` |
||||
|
||||
Once your dataset ZIP is ready, navigate to the [Datasets](https://hub.ultralytics.com/datasets) page by clicking on the **Datasets** button in the sidebar. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also upload a dataset directly from the [Home](https://hub.ultralytics.com/home) page. |
||||
|
||||
 |
||||
|
||||
Click on the **Upload Dataset** button on the top right of the page. This action will trigger the **Upload Dataset** dialog. |
||||
|
||||
 |
||||
|
||||
Upload your dataset in the _Dataset .zip file_ field. |
||||
|
||||
You have the additional option to set a custom name and description for your Ultralytics HUB dataset. |
||||
|
||||
When you're happy with your dataset configuration, click **Upload**. |
||||
|
||||
 |
||||
|
||||
After your dataset is uploaded and processed, you will be able to access it from the Datasets page. |
||||
|
||||
 |
||||
|
||||
You can view the images in your dataset grouped by splits (Train, Validation, Test). |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
Each image can be enlarged for better visualization. |
||||
|
||||
 |
||||
|
||||
 |
||||
|
||||
You can also analyze your dataset by clicking on the **Overview** tab. |
||||
|
||||
 |
||||
|
||||
Next, [train a model](https://docs.ultralytics.com/hub/models/#train-model) on your dataset. |
||||
|
||||
 |
||||
|
||||
## Share Dataset |
||||
|
||||
!!! info "Info" |
||||
|
||||
Ultralytics HUB's sharing functionality provides a convenient way to share datasets with others. This feature is designed to accommodate both existing Ultralytics HUB users and those who have yet to create an account. |
||||
|
||||
??? note "Note" |
||||
|
||||
You have control over the general access of your datasets. |
||||
|
||||
You can choose to set the general access to "Private", in which case, only you will have access to it. Alternatively, you can set the general access to "Unlisted" which grants viewing access to anyone who has the direct link to the dataset, regardless of whether they have an Ultralytics HUB account or not. |
||||
|
||||
Navigate to the Dataset page of the dataset you want to share, open the dataset actions dropdown and click on the **Share** option. This action will trigger the **Share Dataset** dialog. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also share a dataset directly from the [Datasets](https://hub.ultralytics.com/datasets) page. |
||||
|
||||
 |
||||
|
||||
Set the general access to "Unlisted" and click **Save**. |
||||
|
||||
 |
||||
|
||||
Now, anyone who has the direct link to your dataset can view it. |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can easily click on the dataset's link shown in the **Share Dataset** dialog to copy it. |
||||
|
||||
 |
||||
|
||||
## Edit Dataset |
||||
|
||||
Navigate to the Dataset page of the dataset you want to edit, open the dataset actions dropdown and click on the **Edit** option. This action will trigger the **Update Dataset** dialog. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also edit a dataset directly from the [Datasets](https://hub.ultralytics.com/datasets) page. |
||||
|
||||
 |
||||
|
||||
Apply the desired modifications to your dataset and then confirm the changes by clicking **Save**. |
||||
|
||||
 |
||||
|
||||
## Delete Dataset |
||||
|
||||
Navigate to the Dataset page of the dataset you want to delete, open the dataset actions dropdown and click on the **Delete** option. This action will delete the dataset. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also delete a dataset directly from the [Datasets](https://hub.ultralytics.com/datasets) page. |
||||
|
||||
 |
||||
|
||||
??? note "Note" |
||||
|
||||
If you change your mind, you can restore the dataset from the [Trash](https://hub.ultralytics.com/trash) page. |
||||
|
||||
 |
@ -0,0 +1,42 @@ |
||||
--- |
||||
comments: true |
||||
description: Gain seamless experience in training and deploying your YOLOv5 and YOLOv8 models with Ultralytics HUB. Explore pre-trained models, templates and various integrations. |
||||
keywords: Ultralytics HUB, YOLOv5, YOLOv8, model training, model deployment, pretrained models, model integrations |
||||
--- |
||||
|
||||
# Ultralytics HUB |
||||
|
||||
<a href="https://bit.ly/ultralytics_hub" target="_blank"> |
||||
<img width="100%" src="https://github.com/ultralytics/assets/raw/main/im/ultralytics-hub.png"></a> |
||||
<br> |
||||
<br> |
||||
<div align="center"> |
||||
<a href="https://github.com/ultralytics/hub/actions/workflows/ci.yaml"> |
||||
<img src="https://github.com/ultralytics/hub/actions/workflows/ci.yaml/badge.svg" alt="CI CPU"></a> |
||||
<a href="https://colab.research.google.com/github/ultralytics/hub/blob/master/hub.ipynb"> |
||||
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> |
||||
</div> |
||||
<br> |
||||
|
||||
👋 Hello from the [Ultralytics](https://ultralytics.com/) Team! We've been working hard these last few months to |
||||
launch [Ultralytics HUB](https://bit.ly/ultralytics_hub), a new web tool for training and deploying all your YOLOv5 and YOLOv8 🚀 |
||||
models from one spot! |
||||
|
||||
## Introduction |
||||
|
||||
HUB is designed to be user-friendly and intuitive, with a drag-and-drop interface that allows users to |
||||
easily upload their data and train new models quickly. It offers a range of pre-trained models and |
||||
templates to choose from, making it easy for users to get started with training their own models. Once a model is |
||||
trained, it can be easily deployed and used for real-time object detection, instance segmentation and classification tasks. |
||||
|
||||
We hope that the resources here will help you get the most out of HUB. Please browse the HUB <a href="https://docs.ultralytics.com/hub">Docs</a> for details, raise an issue on <a href="https://github.com/ultralytics/hub/issues/new/choose">GitHub</a> for support, and join our <a href="https://ultralytics.com/discord">Discord</a> community for questions and discussions! |
||||
|
||||
- [**Quickstart**](./quickstart.md). Start training and deploying YOLO models with HUB in seconds. |
||||
- [**Datasets: Preparing and Uploading**](./datasets.md). Learn how to prepare and upload your datasets to HUB in YOLO format. |
||||
- [**Projects: Creating and Managing**](./projects.md). Group your models into projects for improved organization. |
||||
- [**Models: Training and Exporting**](./models.md). Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment. |
||||
- [**Integrations: Options**](./integrations.md). Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle. |
||||
- [**Ultralytics HUB App**](./app/index.md). Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device. |
||||
* [**iOS**](./app/ios.md). Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads. |
||||
* [**Android**](./app/android.md). Explore TFLite acceleration on mobile devices. |
||||
- [**Inference API**](./inference_api.md). Understand how to use the Inference API for running your trained models in the cloud to generate predictions. |
@ -0,0 +1,458 @@ |
||||
--- |
||||
comments: true |
||||
description: Access object detection capabilities of YOLOv8 via our RESTful API. Learn how to use the YOLO Inference API with Python or CLI for swift object detection. |
||||
keywords: Ultralytics, YOLOv8, Inference API, object detection, RESTful API, Python, CLI, Quickstart |
||||
--- |
||||
|
||||
# YOLO Inference API |
||||
|
||||
The YOLO Inference API allows you to access the YOLOv8 object detection capabilities via a RESTful API. This enables you to run object detection on images without the need to install and set up the YOLOv8 environment locally. |
||||
|
||||
 |
||||
Screenshot of the Inference API section in the trained model Preview tab. |
||||
|
||||
## API URL |
||||
|
||||
The API URL is the address used to access the YOLO Inference API. In this case, the base URL is: |
||||
|
||||
``` |
||||
https://api.ultralytics.com/v1/predict |
||||
``` |
||||
|
||||
## Example Usage in Python |
||||
|
||||
To access the YOLO Inference API with the specified model and API key using Python, you can use the following code: |
||||
|
||||
```python |
||||
import requests |
||||
|
||||
# API URL, use actual MODEL_ID |
||||
url = f"https://api.ultralytics.com/v1/predict/MODEL_ID" |
||||
|
||||
# Headers, use actual API_KEY |
||||
headers = {"x-api-key": "API_KEY"} |
||||
|
||||
# Inference arguments (optional) |
||||
data = {"size": 640, "confidence": 0.25, "iou": 0.45} |
||||
|
||||
# Load image and send request |
||||
with open("path/to/image.jpg", "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, headers=headers, files=files, data=data) |
||||
|
||||
print(response.json()) |
||||
``` |
||||
|
||||
In this example, replace `API_KEY` with your actual API key, `MODEL_ID` with the desired model ID, and `path/to/image.jpg` with the path to the image you want to analyze. |
||||
|
||||
## Example Usage with CLI |
||||
|
||||
You can use the YOLO Inference API with the command-line interface (CLI) by utilizing the `curl` command. Replace `API_KEY` with your actual API key, `MODEL_ID` with the desired model ID, and `image.jpg` with the path to the image you want to analyze: |
||||
|
||||
```bash |
||||
curl -X POST "https://api.ultralytics.com/v1/predict/MODEL_ID" \ |
||||
-H "x-api-key: API_KEY" \ |
||||
-F "image=@/path/to/image.jpg" \ |
||||
-F "size=640" \ |
||||
-F "confidence=0.25" \ |
||||
-F "iou=0.45" |
||||
``` |
||||
|
||||
## Passing Arguments |
||||
|
||||
The `curl` command above sends a POST request to the YOLO Inference API with the specified `MODEL_ID` in the URL and the `API_KEY` in the `x-api-key` request header, along with the image file specified by `@/path/to/image.jpg`. |
||||
|
||||
Here's an example of passing the `size`, `confidence`, and `iou` arguments via the API URL using the `requests` library in Python: |
||||
|
||||
```python |
||||
import requests |
||||
|
||||
# API URL, use actual MODEL_ID |
||||
url = f"https://api.ultralytics.com/v1/predict/MODEL_ID" |
||||
|
||||
# Headers, use actual API_KEY |
||||
headers = {"x-api-key": "API_KEY"} |
||||
|
||||
# Inference arguments (optional) |
||||
data = {"size": 640, "confidence": 0.25, "iou": 0.45} |
||||
|
||||
# Load image and send request |
||||
with open("path/to/image.jpg", "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, headers=headers, files=files, data=data) |
||||
|
||||
print(response.json()) |
||||
``` |
||||
|
||||
In this example, the `data` dictionary contains the inference arguments `size`, `confidence`, and `iou`, which tell the API to run inference at an image size of 640 with confidence and IoU thresholds of 0.25 and 0.45, respectively. |
||||
|
||||
This will send the arguments along with the image file in the POST request. See the table below for a full list of available inference arguments. |
||||
|
||||
| Inference Argument | Default | Type | Notes | |
||||
|--------------------|---------|---------|------------------------------------------------| |
||||
| `size` | `640` | `int` | valid range is `32` - `1280` pixels | |
||||
| `confidence` | `0.25` | `float` | valid range is `0.01` - `1.0` | |
||||
| `iou` | `0.45` | `float` | valid range is `0.0` - `0.95` | |
||||
| `url`              | `''`    | `str`   | optional image URL if no image file is passed  | |
||||
| `normalize` | `False` | `bool` | | |
||||
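
The `url` argument listed above lets you point the API at a hosted image instead of uploading a file. Below is a minimal sketch, assuming the endpoint accepts `url` as a regular form field alongside the other inference arguments (the example image URL is only a placeholder):

```python
import requests

# Hypothetical sketch: pass an image URL instead of uploading a file.
# Replace MODEL_ID and API_KEY with your actual values, as in the examples above.
url = "https://api.ultralytics.com/v1/predict/MODEL_ID"
headers = {"x-api-key": "API_KEY"}

# "url" replaces the "image" file upload; the remaining arguments are unchanged
data = {"url": "https://ultralytics.com/images/bus.jpg", "size": 640, "confidence": 0.25, "iou": 0.45}

response = requests.post(url, headers=headers, data=data)
print(response.json())
```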
|
||||
## Return JSON format |
||||
|
||||
The YOLO Inference API returns a JSON list with the detection results. The format of the JSON list will be the same as the one produced locally by the `results[0].tojson()` command. |
||||
|
||||
The JSON list contains information about the detected objects, their coordinates, classes, and confidence scores. |
||||
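
Once a response has been received, the detections in the `data` field can be processed like any other Python structure. The snippet below is a minimal sketch that prints one line per detected object, assuming a `response` obtained with `requests` as in the examples above and the `success`/`message`/`data` envelope shown in the format examples below:

```python
# Minimal sketch: iterate over the detections returned by the Inference API.
# Assumes "response" was obtained with requests.post() as shown above.
result = response.json()

if result.get("success"):
    for detection in result["data"]:
        name = detection["name"]        # class name, e.g. "person"
        conf = detection["confidence"]  # confidence score
        box = detection["box"]          # bounding box coordinates as returned in the "box" field
        print(f"{name} ({conf:.2f}): x1={box['x1']:.3f} y1={box['y1']:.3f} x2={box['x2']:.3f} y2={box['y2']:.3f}")
```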
|
||||
### Detect Model Format |
||||
|
||||
YOLO detection models, such as `yolov8n.pt`, can return JSON responses from local inference, CLI API inference, and Python API inference. All of these methods produce the same JSON response format. |
||||
|
||||
!!! example "Detect Model JSON Response" |
||||
|
||||
=== "Local" |
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load model |
||||
model = YOLO('yolov8n.pt') |
||||
|
||||
# Run inference |
||||
results = model('image.jpg') |
||||
|
||||
# Print image.jpg results in JSON format |
||||
print(results[0].tojson()) |
||||
``` |
||||
|
||||
=== "CLI API" |
||||
```bash |
||||
curl -X POST "https://api.ultralytics.com/v1/predict/MODEL_ID" \ |
||||
-H "x-api-key: API_KEY" \ |
||||
-F "image=@/path/to/image.jpg" \ |
||||
-F "size=640" \ |
||||
-F "confidence=0.25" \ |
||||
-F "iou=0.45" |
||||
``` |
||||
|
||||
=== "Python API" |
||||
```python |
||||
import requests |
||||
|
||||
# API URL, use actual MODEL_ID |
||||
url = f"https://api.ultralytics.com/v1/predict/MODEL_ID" |
||||
|
||||
# Headers, use actual API_KEY |
||||
headers = {"x-api-key": "API_KEY"} |
||||
|
||||
# Inference arguments (optional) |
||||
data = {"size": 640, "confidence": 0.25, "iou": 0.45} |
||||
|
||||
# Load image and send request |
||||
with open("path/to/image.jpg", "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, headers=headers, files=files, data=data) |
||||
|
||||
print(response.json()) |
||||
``` |
||||
|
||||
=== "JSON Response" |
||||
```json |
||||
{ |
||||
"success": True, |
||||
"message": "Inference complete.", |
||||
"data": [ |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.8359682559967041, |
||||
"box": { |
||||
"x1": 0.08974208831787109, |
||||
"y1": 0.27418340047200523, |
||||
"x2": 0.8706787109375, |
||||
"y2": 0.9887352837456598 |
||||
} |
||||
}, |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.8189555406570435, |
||||
"box": { |
||||
"x1": 0.5847355842590332, |
||||
"y1": 0.05813225640190972, |
||||
"x2": 0.8930277824401855, |
||||
"y2": 0.9903111775716146 |
||||
} |
||||
}, |
||||
{ |
||||
"name": "tie", |
||||
"class": 27, |
||||
"confidence": 0.2909725308418274, |
||||
"box": { |
||||
"x1": 0.3433395862579346, |
||||
"y1": 0.6070465511745877, |
||||
"x2": 0.40964522361755373, |
||||
"y2": 0.9849439832899306 |
||||
} |
||||
} |
||||
] |
||||
} |
||||
``` |
||||
|
||||
### Segment Model Format |
||||
|
||||
YOLO segmentation models, such as `yolov8n-seg.pt`, can return JSON responses from local inference, CLI API inference, and Python API inference. All of these methods produce the same JSON response format. |
||||
|
||||
!!! example "Segment Model JSON Response" |
||||
|
||||
=== "Local" |
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load model |
||||
model = YOLO('yolov8n-seg.pt') |
||||
|
||||
# Run inference |
||||
results = model('image.jpg') |
||||
|
||||
# Print image.jpg results in JSON format |
||||
print(results[0].tojson()) |
||||
``` |
||||
|
||||
=== "CLI API" |
||||
```bash |
||||
curl -X POST "https://api.ultralytics.com/v1/predict/MODEL_ID" \ |
||||
-H "x-api-key: API_KEY" \ |
||||
-F "image=@/path/to/image.jpg" \ |
||||
-F "size=640" \ |
||||
-F "confidence=0.25" \ |
||||
-F "iou=0.45" |
||||
``` |
||||
|
||||
=== "Python API" |
||||
```python |
||||
import requests |
||||
|
||||
# API URL, use actual MODEL_ID |
||||
url = f"https://api.ultralytics.com/v1/predict/MODEL_ID" |
||||
|
||||
# Headers, use actual API_KEY |
||||
headers = {"x-api-key": "API_KEY"} |
||||
|
||||
# Inference arguments (optional) |
||||
data = {"size": 640, "confidence": 0.25, "iou": 0.45} |
||||
|
||||
# Load image and send request |
||||
with open("path/to/image.jpg", "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, headers=headers, files=files, data=data) |
||||
|
||||
print(response.json()) |
||||
``` |
||||
|
||||
=== "JSON Response" |
||||
Note `segments` `x` and `y` lengths may vary from one object to another. Larger or more complex objects may have more segment points. |
||||
```json |
||||
{ |
||||
"success": True, |
||||
"message": "Inference complete.", |
||||
"data": [ |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.856913149356842, |
||||
"box": { |
||||
"x1": 0.1064866065979004, |
||||
"y1": 0.2798851860894097, |
||||
"x2": 0.8738358497619629, |
||||
"y2": 0.9894873725043403 |
||||
}, |
||||
"segments": { |
||||
"x": [ |
||||
0.421875, |
||||
0.4203124940395355, |
||||
0.41718751192092896 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.2888889014720917, |
||||
0.2916666567325592, |
||||
0.2916666567325592 |
||||
... |
||||
] |
||||
} |
||||
}, |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.8512625694274902, |
||||
"box": { |
||||
"x1": 0.5757311820983887, |
||||
"y1": 0.053943040635850696, |
||||
"x2": 0.8960096359252929, |
||||
"y2": 0.985154045952691 |
||||
}, |
||||
"segments": { |
||||
"x": [ |
||||
0.7515624761581421, |
||||
0.75, |
||||
0.7437499761581421 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.0555555559694767, |
||||
0.05833333358168602, |
||||
0.05833333358168602 |
||||
... |
||||
] |
||||
} |
||||
}, |
||||
{ |
||||
"name": "tie", |
||||
"class": 27, |
||||
"confidence": 0.6485961675643921, |
||||
"box": { |
||||
"x1": 0.33911995887756347, |
||||
"y1": 0.6057066175672743, |
||||
"x2": 0.4081430912017822, |
||||
"y2": 0.9916408962673611 |
||||
}, |
||||
"segments": { |
||||
"x": [ |
||||
0.37187498807907104, |
||||
0.37031251192092896, |
||||
0.3687500059604645 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.6111111044883728, |
||||
0.6138888597488403, |
||||
0.6138888597488403 |
||||
... |
||||
] |
||||
} |
||||
} |
||||
] |
||||
} |
||||
``` |
||||
|
||||
### Pose Model Format |
||||
|
||||
YOLO pose models, such as `yolov8n-pose.pt`, can return JSON responses from local inference, CLI API inference, and Python API inference. All of these methods produce the same JSON response format. |
||||
|
||||
!!! example "Pose Model JSON Response" |
||||
|
||||
=== "Local" |
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load model |
||||
model = YOLO('yolov8n-pose.pt') |
||||
|
||||
# Run inference |
||||
results = model('image.jpg') |
||||
|
||||
# Print image.jpg results in JSON format |
||||
print(results[0].tojson()) |
||||
``` |
||||
|
||||
=== "CLI API" |
||||
```bash |
||||
curl -X POST "https://api.ultralytics.com/v1/predict/MODEL_ID" \ |
||||
-H "x-api-key: API_KEY" \ |
||||
-F "image=@/path/to/image.jpg" \ |
||||
-F "size=640" \ |
||||
-F "confidence=0.25" \ |
||||
-F "iou=0.45" |
||||
``` |
||||
|
||||
=== "Python API" |
||||
```python |
||||
import requests |
||||
|
||||
# API URL, use actual MODEL_ID |
||||
url = f"https://api.ultralytics.com/v1/predict/MODEL_ID" |
||||
|
||||
# Headers, use actual API_KEY |
||||
headers = {"x-api-key": "API_KEY"} |
||||
|
||||
# Inference arguments (optional) |
||||
data = {"size": 640, "confidence": 0.25, "iou": 0.45} |
||||
|
||||
# Load image and send request |
||||
with open("path/to/image.jpg", "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, headers=headers, files=files, data=data) |
||||
|
||||
print(response.json()) |
||||
``` |
||||
|
||||
=== "JSON Response" |
||||
Note that COCO-keypoints pretrained models will have 17 human keypoints. The `visible` part of the keypoints indicates whether a keypoint is visible or obscured. Obscured keypoints may fall outside the image or may simply not be visible, e.g. a person's eyes when facing away from the camera. |
||||
```json |
||||
{ |
||||
"success": True, |
||||
"message": "Inference complete.", |
||||
"data": [ |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.8439509868621826, |
||||
"box": { |
||||
"x1": 0.1125, |
||||
"y1": 0.28194444444444444, |
||||
"x2": 0.7953125, |
||||
"y2": 0.9902777777777778 |
||||
}, |
||||
"keypoints": { |
||||
"x": [ |
||||
0.5058594942092896, |
||||
0.5103894472122192, |
||||
0.4920862317085266 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.48964157700538635, |
||||
0.4643048942089081, |
||||
0.4465252459049225 |
||||
... |
||||
], |
||||
"visible": [ |
||||
0.8726999163627625, |
||||
0.653947651386261, |
||||
0.9130823612213135 |
||||
... |
||||
] |
||||
} |
||||
}, |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.7474289536476135, |
||||
"box": { |
||||
"x1": 0.58125, |
||||
"y1": 0.0625, |
||||
"x2": 0.8859375, |
||||
"y2": 0.9888888888888889 |
||||
}, |
||||
"keypoints": { |
||||
"x": [ |
||||
0.778544008731842, |
||||
0.7976160049438477, |
||||
0.7530890107154846 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.27595141530036926, |
||||
0.2378823608160019, |
||||
0.23644638061523438 |
||||
... |
||||
], |
||||
"visible": [ |
||||
0.8900790810585022, |
||||
0.789978563785553, |
||||
0.8974530100822449 |
||||
... |
||||
] |
||||
} |
||||
} |
||||
] |
||||
} |
||||
``` |
@ -0,0 +1,213 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn how to use Ultralytics HUB models for efficient and user-friendly AI model training. For easy model creation, training, evaluation and deployment, follow our detailed guide. |
||||
keywords: Ultralytics, HUB Models, AI model training, model creation, model training, model evaluation, model deployment |
||||
--- |
||||
|
||||
# Ultralytics HUB Models |
||||
|
||||
Ultralytics HUB models provide a streamlined solution for training vision AI models on your custom datasets. |
||||
|
||||
The process is user-friendly and efficient, involving a simple three-step creation process and accelerated training powered by Ultralytics YOLOv8. During training, real-time updates on model metrics are available so that you can monitor each step of the progress. Once training is completed, you can preview your model and easily deploy it to real-world applications. Therefore, Ultralytics HUB offers a comprehensive yet straightforward system for model creation, training, evaluation, and deployment. |
||||
|
||||
## Train Model |
||||
|
||||
Navigate to the [Models](https://hub.ultralytics.com/models) page by clicking on the **Models** button in the sidebar. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also train a model directly from the [Home](https://hub.ultralytics.com/home) page. |
||||
|
||||
 |
||||
|
||||
Click on the **Train Model** button on the top right of the page. This action will trigger the **Train Model** dialog. |
||||
|
||||
 |
||||
|
||||
The **Train Model** dialog has three simple steps, explained below. |
||||
|
||||
### 1. Dataset |
||||
|
||||
In this step, you have to select the dataset you want to train your model on. Once you have selected a dataset, click **Continue**. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can skip this step if you train a model directly from the Dataset page. |
||||
|
||||
 |
||||
|
||||
### 2. Model |
||||
|
||||
In this step, you have to choose the project in which you want to create your model, the name of your model and your model's architecture. |
||||
|
||||
??? note "Note" |
||||
|
||||
Ultralytics HUB will try to pre-select the project. |
||||
|
||||
If you opened the **Train Model** dialog as described above, Ultralytics HUB will pre-select the last project you used. |
||||
|
||||
If you opened the **Train Model** dialog from the Project page, Ultralytics HUB will pre-select the project you were inside of. |
||||
|
||||
 |
||||
|
||||
If you don't have a project yet, you can set the name of your project in this step and it will be created together with your model. |
||||
|
||||
 |
||||
|
||||
!!! info "Info" |
||||
|
||||
You can read more about the available [YOLOv8](https://docs.ultralytics.com/models/yolov8) (and [YOLOv5](https://docs.ultralytics.com/models/yolov5)) architectures in our documentation. |
||||
|
||||
When you're happy with your model configuration, click **Continue**. |
||||
|
||||
 |
||||
|
||||
??? note "Note" |
||||
|
||||
By default, your model will use a pre-trained model (trained on the [COCO](https://docs.ultralytics.com/datasets/detect/coco) dataset) to reduce training time. |
||||
|
||||
You can change this behaviour by opening the **Advanced Options** accordion. |
||||
|
||||
### 3. Train |
||||
|
||||
In this step, you will start training your model. |
||||
|
||||
Ultralytics HUB offers three training options: |
||||
|
||||
- Ultralytics Cloud **(COMING SOON)** |
||||
- Google Colab |
||||
- Bring your own agent |
||||
|
||||
To start training your model, follow the instructions presented in this step. |
||||
|
||||
 |
||||
|
||||
??? note "Note" |
||||
|
||||
When you are on this step, before the training starts, you can change the default training configuration by opening the **Advanced Options** accordion. |
||||
|
||||
 |
||||
|
||||
??? note "Note" |
||||
|
||||
When you are on this step, you have the option to close the **Train Model** dialog and start training your model from the Model page later. |
||||
|
||||
 |
||||
|
||||
To start training your model using Google Colab, simply follow the instructions shown above or on the Google Colab notebook. |
||||
|
||||
<a href="https://colab.research.google.com/github/ultralytics/hub/blob/master/hub.ipynb" target="_blank"> |
||||
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"> |
||||
</a> |
||||
|
||||
When the training starts, you can click **Done** and monitor the training progress on the Model page. |
||||
|
||||
 |
||||
|
||||
 |
||||
|
||||
??? note "Note" |
||||
|
||||
In case the training stops and a checkpoint was saved, you can resume training your model from the Model page. |
||||
|
||||
 |
||||
|
||||
## Preview Model |
||||
|
||||
Ultralytics HUB offers a variety of ways to preview your trained model. |
||||
|
||||
You can preview your model by clicking on the **Preview** tab and uploading an image in the **Test** card. |
||||
|
||||
 |
||||
|
||||
You can also use our Ultralytics Cloud API to effortlessly [run inference](https://docs.ultralytics.com/hub/inference_api) with your custom model. |
||||
|
||||
 |
||||
|
||||
Furthermore, you can preview your model in real-time directly on your [iOS](https://apps.apple.com/xk/app/ultralytics/id1583935240) or [Android](https://play.google.com/store/apps/details?id=com.ultralytics.ultralytics_app) mobile device by [downloading](https://ultralytics.com/app_install) our [Ultralytics HUB Mobile Application](./app/index.md). |
||||
|
||||
 |
||||
|
||||
## Deploy Model |
||||
|
||||
You can export your model to 13 different formats, including ONNX, OpenVINO, CoreML, TensorFlow, Paddle and many others. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can customize the export options of each format if you open the export actions dropdown and click on the **Advanced** option. |
||||
|
||||
 |
||||
|
||||
## Share Model |
||||
|
||||
!!! info "Info" |
||||
|
||||
Ultralytics HUB's sharing functionality provides a convenient way to share models with others. This feature is designed to accommodate both existing Ultralytics HUB users and those who have yet to create an account. |
||||
|
||||
??? note "Note" |
||||
|
||||
You have control over the general access of your models. |
||||
|
||||
You can choose to set the general access to "Private", in which case, only you will have access to it. Alternatively, you can set the general access to "Unlisted" which grants viewing access to anyone who has the direct link to the model, regardless of whether they have an Ultralytics HUB account or not. |
||||
|
||||
Navigate to the Model page of the model you want to share, open the model actions dropdown and click on the **Share** option. This action will trigger the **Share Model** dialog. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also share a model directly from the [Models](https://hub.ultralytics.com/models) page or from the Project page of the project where your model is located. |
||||
|
||||
 |
||||
|
||||
Set the general access to "Unlisted" and click **Save**. |
||||
|
||||
 |
||||
|
||||
Now, anyone who has the direct link to your model can view it. |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can easily click on the model's link shown in the **Share Model** dialog to copy it. |
||||
|
||||
 |
||||
|
||||
## Edit Model |
||||
|
||||
Navigate to the Model page of the model you want to edit, open the model actions dropdown and click on the **Edit** option. This action will trigger the **Update Model** dialog. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also edit a model directly from the [Models](https://hub.ultralytics.com/models) page or from the Project page of the project where your model is located. |
||||
|
||||
 |
||||
|
||||
Apply the desired modifications to your model and then confirm the changes by clicking **Save**. |
||||
|
||||
 |
||||
|
||||
## Delete Model |
||||
|
||||
Navigate to the Model page of the model you want to delete, open the model actions dropdown and click on the **Delete** option. This action will delete the model. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also delete a model directly from the [Models](https://hub.ultralytics.com/models) page or from the Project page of the project where your model is located. |
||||
|
||||
 |
||||
|
||||
??? note "Note" |
||||
|
||||
If you change your mind, you can restore the model from the [Trash](https://hub.ultralytics.com/trash) page. |
||||
|
||||
 |
@ -0,0 +1,169 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn how to manage Ultralytics HUB projects. Understand effective strategies to create, share, edit, delete, and compare models in an organized workspace. |
||||
keywords: Ultralytics, HUB projects, Create project, Edit project, Share project, Delete project, Compare Models, Model Management |
||||
--- |
||||
|
||||
# Ultralytics HUB Projects |
||||
|
||||
Ultralytics HUB projects provide an effective solution for consolidating and managing your models. If you are working with several models that perform similar tasks or have related purposes, Ultralytics HUB projects allow you to group these models together. |
||||
|
||||
This creates a unified and organized workspace that facilitates easier model management, comparison and development. Having similar models or various iterations together can facilitate rapid benchmarking, as you can compare their effectiveness. This can lead to faster, more insightful iterative development and refinement of your models. |
||||
|
||||
## Create Project |
||||
|
||||
Navigate to the [Projects](https://hub.ultralytics.com/projects) page by clicking on the **Projects** button in the sidebar. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also create a project directly from the [Home](https://hub.ultralytics.com/home) page. |
||||
|
||||
 |
||||
|
||||
Click on the **Create Project** button on the top right of the page. This action will trigger the **Create Project** dialog, opening up a suite of options for tailoring your project to your needs. |
||||
|
||||
 |
||||
|
||||
Type the name of your project in the _Project name_ field or keep the default name and finalize the project creation with a single click. |
||||
|
||||
You have the additional option to enrich your project with a description and a unique image, enhancing its recognizability on the Projects page. |
||||
|
||||
When you're happy with your project configuration, click **Create**. |
||||
|
||||
 |
||||
|
||||
After your project is created, you will be able to access it from the Projects page. |
||||
|
||||
 |
||||
|
||||
Next, [train a model](https://docs.ultralytics.com/hub/models/#train-model) inside your project. |
||||
|
||||
 |
||||
|
||||
## Share Project |
||||
|
||||
!!! info "Info" |
||||
|
||||
Ultralytics HUB's sharing functionality provides a convenient way to share projects with others. This feature is designed to accommodate both existing Ultralytics HUB users and those who have yet to create an account. |
||||
|
||||
??? note "Note" |
||||
|
||||
You have control over the general access of your projects. |
||||
|
||||
You can choose to set the general access to "Private", in which case, only you will have access to it. Alternatively, you can set the general access to "Unlisted" which grants viewing access to anyone who has the direct link to the project, regardless of whether they have an Ultralytics HUB account or not. |
||||
|
||||
Navigate to the Project page of the project you want to share, open the project actions dropdown and click on the **Share** option. This action will trigger the **Share Project** dialog. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also share a project directly from the [Projects](https://hub.ultralytics.com/projects) page. |
||||
|
||||
 |
||||
|
||||
Set the general access to "Unlisted" and click **Save**. |
||||
|
||||
 |
||||
|
||||
!!! warning "Warning" |
||||
|
||||
When changing the general access of a project, the general access of the models inside the project will be changed as well. |
||||
|
||||
Now, anyone who has the direct link to your project can view it. |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can easily click on the project's link shown in the **Share Project** dialog to copy it. |
||||
|
||||
 |
||||
|
||||
## Edit Project |
||||
|
||||
Navigate to the Project page of the project you want to edit, open the project actions dropdown and click on the **Edit** option. This action will trigger the **Update Project** dialog. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also edit a project directly from the [Projects](https://hub.ultralytics.com/projects) page. |
||||
|
||||
 |
||||
|
||||
Apply the desired modifications to your project and then confirm the changes by clicking **Save**. |
||||
|
||||
 |
||||
|
||||
## Delete Project |
||||
|
||||
Navigate to the Project page of the project you want to delete, open the project actions dropdown and click on the **Delete** option. This action will delete the project. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also delete a project directly from the [Projects](https://hub.ultralytics.com/projects) page. |
||||
|
||||
 |
||||
|
||||
!!! warning "Warning" |
||||
|
||||
When deleting a project, the models inside the project will be deleted as well. |
||||
|
||||
??? note "Note" |
||||
|
||||
If you change your mind, you can restore the project from the [Trash](https://hub.ultralytics.com/trash) page. |
||||
|
||||
 |
||||
|
||||
## Compare Models |
||||
|
||||
Navigate to the Project page of the project where the models you want to compare are located. To use the model comparison feature, click on the **Charts** tab. |
||||
|
||||
 |
||||
|
||||
This will display all the relevant charts. Each chart corresponds to a different metric and contains the performance of each model for that metric. The models are represented by different colors and you can hover over each data point to get more information. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
Each chart can be enlarged for better visualization. |
||||
|
||||
 |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You have the flexibility to customize your view by selectively hiding certain models. This feature allows you to concentrate on the models of interest. |
||||
|
||||
 |
||||
|
||||
## Reorder Models |
||||
|
||||
??? note "Note" |
||||
|
||||
Ultralytics HUB's reordering functionality works only inside projects you own. |
||||
|
||||
Navigate to the Project page of the project where the models you want to reorder are located. Click on the designated reorder icon of the model you want to move and drag it to the desired location. |
||||
|
||||
 |
||||
|
||||
## Transfer Models |
||||
|
||||
Navigate to the Project page of the project where the model you want to move is located, open the project actions dropdown and click on the **Transfer** option. This action will trigger the **Transfer Model** dialog. |
||||
|
||||
 |
||||
|
||||
??? tip "Tip" |
||||
|
||||
You can also transfer a model directly from the [Models](https://hub.ultralytics.com/models) page. |
||||
|
||||
 |
||||
|
||||
Select the project you want to transfer the model to and click **Save**. |
||||
|
||||
 |
@ -1,429 +0,0 @@ |
||||
--- |
||||
comments: true |
||||
--- |
||||
|
||||
# YOLO Inference API (UNDER CONSTRUCTION) |
||||
|
||||
The YOLO Inference API allows you to access the YOLOv8 object detection capabilities via a RESTful API. This enables you to run object detection on images without the need to install and set up the YOLOv8 environment locally. |
||||
|
||||
## API URL |
||||
|
||||
The API URL is the address used to access the YOLO Inference API. In this case, the base URL is: |
||||
|
||||
``` |
||||
https://api.ultralytics.com/inference/v1 |
||||
``` |
||||
|
||||
To access the API with a specific model and your API key, you can include them as query parameters in the API URL. The `model` parameter refers to the `MODEL_ID` you want to use for inference, and the `key` parameter corresponds to your `API_KEY`. |
||||
|
||||
The complete API URL with the model and API key parameters would be: |
||||
|
||||
``` |
||||
https://api.ultralytics.com/inference/v1?model=MODEL_ID&key=API_KEY |
||||
``` |
||||
|
||||
Replace `MODEL_ID` with the ID of the model you want to use and `API_KEY` with your actual API key from [https://hub.ultralytics.com/settings?tab=api+keys](https://hub.ultralytics.com/settings?tab=api+keys). |
||||
|
||||
## Example Usage in Python |
||||
|
||||
To access the YOLO Inference API with the specified model and API key using Python, you can use the following code: |
||||
|
||||
```python |
||||
import requests |
||||
|
||||
api_key = "API_KEY" |
||||
model_id = "MODEL_ID" |
||||
url = f"https://api.ultralytics.com/inference/v1?model={model_id}&key={api_key}" |
||||
image_path = "image.jpg" |
||||
|
||||
with open(image_path, "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, files=files) |
||||
|
||||
print(response.text) |
||||
``` |
||||
|
||||
In this example, replace `API_KEY` with your actual API key, `MODEL_ID` with the desired model ID, and `image.jpg` with the path to the image you want to analyze. |
||||
|
||||
|
||||
## Example Usage with CLI |
||||
|
||||
You can use the YOLO Inference API with the command-line interface (CLI) by utilizing the `curl` command. Replace `API_KEY` with your actual API key, `MODEL_ID` with the desired model ID, and `image.jpg` with the path to the image you want to analyze: |
||||
|
||||
```commandline |
||||
curl -X POST -F image=@image.jpg "https://api.ultralytics.com/inference/v1?model=MODEL_ID&key=API_KEY" |
||||
``` |
||||
|
||||
## Passing Arguments |
||||
|
||||
This command sends a POST request to the YOLO Inference API with the specified `model` and `key` parameters in the URL, along with the image file specified by `@image.jpg`. |
||||
|
||||
Here's an example of passing the `model`, `key`, and `normalize` arguments via the API URL using the `requests` library in Python: |
||||
|
||||
```python |
||||
import requests |
||||
|
||||
api_key = "API_KEY" |
||||
model_id = "MODEL_ID" |
||||
url = "https://api.ultralytics.com/inference/v1" |
||||
|
||||
# Define your query parameters |
||||
params = { |
||||
"key": api_key, |
||||
"model": model_id, |
||||
"normalize": "True" |
||||
} |
||||
|
||||
image_path = "image.jpg" |
||||
|
||||
with open(image_path, "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, files=files, params=params) |
||||
|
||||
print(response.text) |
||||
``` |
||||
|
||||
In this example, the `params` dictionary contains the query parameters `key`, `model`, and `normalize`, which tells the API to return all values in normalized image coordinates from 0 to 1. The `normalize` parameter is set to `"True"` as a string since query parameters should be passed as strings. These query parameters are then passed to the `requests.post()` function. |
||||
|
||||
This will send the query parameters along with the file in the POST request. Make sure to consult the API documentation for the list of available arguments and their expected values. |
||||
|
||||
## Return JSON format |
||||
|
||||
The YOLO Inference API returns a JSON list with the detection results. The format of the JSON list will be the same as the one produced locally by the `results[0].tojson()` command. |
||||
|
||||
The JSON list contains information about the detected objects, their coordinates, classes, and confidence scores. |
||||
|
||||
### Detect Model Format |
||||
|
||||
YOLO detection models, such as `yolov8n.pt`, can return JSON responses from local inference, CLI API inference, and Python API inference. All of these methods produce the same JSON response format. |
||||
|
||||
!!! example "Detect Model JSON Response" |
||||
|
||||
=== "Local" |
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load model |
||||
model = YOLO('yolov8n.pt') |
||||
|
||||
# Run inference |
||||
results = model('image.jpg') |
||||
|
||||
# Print image.jpg results in JSON format |
||||
print(results[0].tojson()) |
||||
``` |
||||
|
||||
=== "CLI API" |
||||
```commandline |
||||
curl -X POST -F image=@image.jpg https://api.ultralytics.com/inference/v1?model=MODEL_ID,key=API_KEY |
||||
``` |
||||
|
||||
=== "Python API" |
||||
```python |
||||
import requests |
||||
|
||||
api_key = "API_KEY" |
||||
model_id = "MODEL_ID" |
||||
url = "https://api.ultralytics.com/inference/v1" |
||||
|
||||
# Define your query parameters |
||||
params = { |
||||
"key": api_key, |
||||
"model": model_id, |
||||
} |
||||
|
||||
image_path = "image.jpg" |
||||
|
||||
with open(image_path, "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, files=files, params=params) |
||||
|
||||
print(response.text) |
||||
``` |
||||
|
||||
=== "JSON Response" |
||||
```json |
||||
[ |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.8359682559967041, |
||||
"box": { |
||||
"x1": 0.08974208831787109, |
||||
"y1": 0.27418340047200523, |
||||
"x2": 0.8706787109375, |
||||
"y2": 0.9887352837456598 |
||||
} |
||||
}, |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.8189555406570435, |
||||
"box": { |
||||
"x1": 0.5847355842590332, |
||||
"y1": 0.05813225640190972, |
||||
"x2": 0.8930277824401855, |
||||
"y2": 0.9903111775716146 |
||||
} |
||||
}, |
||||
{ |
||||
"name": "tie", |
||||
"class": 27, |
||||
"confidence": 0.2909725308418274, |
||||
"box": { |
||||
"x1": 0.3433395862579346, |
||||
"y1": 0.6070465511745877, |
||||
"x2": 0.40964522361755373, |
||||
"y2": 0.9849439832899306 |
||||
} |
||||
} |
||||
] |
||||
``` |
||||
|
||||
### Segment Model Format |
||||
|
||||
YOLO segmentation models, such as `yolov8n-seg.pt`, can return JSON responses from local inference, CLI API inference, and Python API inference. All of these methods produce the same JSON response format. |
||||
|
||||
!!! example "Segment Model JSON Response" |
||||
|
||||
=== "Local" |
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load model |
||||
model = YOLO('yolov8n-seg.pt') |
||||
|
||||
# Run inference |
||||
results = model('image.jpg') |
||||
|
||||
# Print image.jpg results in JSON format |
||||
print(results[0].tojson()) |
||||
``` |
||||
|
||||
=== "CLI API" |
||||
```commandline |
||||
curl -X POST -F image=@image.jpg https://api.ultralytics.com/inference/v1?model=MODEL_ID,key=API_KEY |
||||
``` |
||||
|
||||
=== "Python API" |
||||
```python |
||||
import requests |
||||
|
||||
api_key = "API_KEY" |
||||
model_id = "MODEL_ID" |
||||
url = "https://api.ultralytics.com/inference/v1" |
||||
|
||||
# Define your query parameters |
||||
params = { |
||||
"key": api_key, |
||||
"model": model_id, |
||||
} |
||||
|
||||
image_path = "image.jpg" |
||||
|
||||
with open(image_path, "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, files=files, params=params) |
||||
|
||||
print(response.text) |
||||
``` |
||||
|
||||
=== "JSON Response" |
||||
Note `segments` `x` and `y` lengths may vary from one object to another. Larger or more complex objects may have more segment points. |
||||
```json |
||||
[ |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.856913149356842, |
||||
"box": { |
||||
"x1": 0.1064866065979004, |
||||
"y1": 0.2798851860894097, |
||||
"x2": 0.8738358497619629, |
||||
"y2": 0.9894873725043403 |
||||
}, |
||||
"segments": { |
||||
"x": [ |
||||
0.421875, |
||||
0.4203124940395355, |
||||
0.41718751192092896 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.2888889014720917, |
||||
0.2916666567325592, |
||||
0.2916666567325592 |
||||
... |
||||
] |
||||
} |
||||
}, |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.8512625694274902, |
||||
"box": { |
||||
"x1": 0.5757311820983887, |
||||
"y1": 0.053943040635850696, |
||||
"x2": 0.8960096359252929, |
||||
"y2": 0.985154045952691 |
||||
}, |
||||
"segments": { |
||||
"x": [ |
||||
0.7515624761581421, |
||||
0.75, |
||||
0.7437499761581421 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.0555555559694767, |
||||
0.05833333358168602, |
||||
0.05833333358168602 |
||||
... |
||||
] |
||||
} |
||||
}, |
||||
{ |
||||
"name": "tie", |
||||
"class": 27, |
||||
"confidence": 0.6485961675643921, |
||||
"box": { |
||||
"x1": 0.33911995887756347, |
||||
"y1": 0.6057066175672743, |
||||
"x2": 0.4081430912017822, |
||||
"y2": 0.9916408962673611 |
||||
}, |
||||
"segments": { |
||||
"x": [ |
||||
0.37187498807907104, |
||||
0.37031251192092896, |
||||
0.3687500059604645 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.6111111044883728, |
||||
0.6138888597488403, |
||||
0.6138888597488403 |
||||
... |
||||
] |
||||
} |
||||
} |
||||
] |
||||
``` |
||||
|
||||
|
||||
### Pose Model Format |
||||
|
||||
YOLO pose models, such as `yolov8n-pose.pt`, can return JSON responses from local inference, CLI API inference, and Python API inference. All of these methods produce the same JSON response format. |
||||
|
||||
!!! example "Pose Model JSON Response" |
||||
|
||||
=== "Local" |
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load model |
||||
model = YOLO('yolov8n-seg.pt') |
||||
|
||||
# Run inference |
||||
results = model('image.jpg') |
||||
|
||||
# Print image.jpg results in JSON format |
||||
print(results[0].tojson()) |
||||
``` |
||||
|
||||
=== "CLI API" |
||||
```commandline |
||||
curl -X POST -F image=@image.jpg https://api.ultralytics.com/inference/v1?model=MODEL_ID,key=API_KEY |
||||
``` |
||||
|
||||
=== "Python API" |
||||
```python |
||||
import requests |
||||
|
||||
api_key = "API_KEY" |
||||
model_id = "MODEL_ID" |
||||
url = "https://api.ultralytics.com/inference/v1" |
||||
|
||||
# Define your query parameters |
||||
params = { |
||||
"key": api_key, |
||||
"model": model_id, |
||||
} |
||||
|
||||
image_path = "image.jpg" |
||||
|
||||
with open(image_path, "rb") as image_file: |
||||
files = {"image": image_file} |
||||
response = requests.post(url, files=files, params=params) |
||||
|
||||
print(response.text) |
||||
``` |
||||
|
||||
=== "JSON Response" |
||||
Note COCO-keypoints pretrained models will have 17 human keypoints. The `visible` part of the keypoints indicates whether a keypoint is visible or obscured. Obscured keypoints may be outside the image or may not be visible, i.e. a person's eyes facing away from the camera. |
||||
```json |
||||
[ |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.8439509868621826, |
||||
"box": { |
||||
"x1": 0.1125, |
||||
"y1": 0.28194444444444444, |
||||
"x2": 0.7953125, |
||||
"y2": 0.9902777777777778 |
||||
}, |
||||
"keypoints": { |
||||
"x": [ |
||||
0.5058594942092896, |
||||
0.5103894472122192, |
||||
0.4920862317085266 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.48964157700538635, |
||||
0.4643048942089081, |
||||
0.4465252459049225 |
||||
... |
||||
], |
||||
"visible": [ |
||||
0.8726999163627625, |
||||
0.653947651386261, |
||||
0.9130823612213135 |
||||
... |
||||
] |
||||
} |
||||
}, |
||||
{ |
||||
"name": "person", |
||||
"class": 0, |
||||
"confidence": 0.7474289536476135, |
||||
"box": { |
||||
"x1": 0.58125, |
||||
"y1": 0.0625, |
||||
"x2": 0.8859375, |
||||
"y2": 0.9888888888888889 |
||||
}, |
||||
"keypoints": { |
||||
"x": [ |
||||
0.778544008731842, |
||||
0.7976160049438477, |
||||
0.7530890107154846 |
||||
... |
||||
], |
||||
"y": [ |
||||
0.27595141530036926, |
||||
0.2378823608160019, |
||||
0.23644638061523438 |
||||
... |
||||
], |
||||
"visible": [ |
||||
0.8900790810585022, |
||||
0.789978563785553, |
||||
0.8974530100822449 |
||||
... |
||||
] |
||||
} |
||||
} |
||||
] |
||||
``` |
@ -0,0 +1,61 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore Ultralytics integrations with tools for dataset management, model optimization, ML workflows automation, experiment tracking, version control, and more. Learn about our support for various model export formats for deployment. |
||||
keywords: Ultralytics integrations, Roboflow, Neural Magic, ClearML, Comet ML, DVC, Ultralytics HUB, MLFlow, Neptune, Ray Tune, TensorBoard, W&B, model export formats, PyTorch, TorchScript, ONNX, OpenVINO, TensorRT, CoreML, TF SavedModel, TF GraphDef, TF Lite, TF Edge TPU, TF.js, PaddlePaddle, NCNN |
||||
--- |
||||
|
||||
# Ultralytics Integrations |
||||
|
||||
Welcome to the Ultralytics Integrations page! This page provides an overview of our partnerships with various tools and platforms, designed to streamline your machine learning workflows, enhance dataset management, simplify model training, and facilitate efficient deployment. |
||||
|
||||
<img width="1024" src="https://github.com/ultralytics/assets/raw/main/yolov8/banner-integrations.png"> |
||||
|
||||
## Datasets Integrations |
||||
|
||||
- [Roboflow](https://roboflow.com/?ref=ultralytics): Facilitate seamless dataset management for Ultralytics models, offering robust annotation, preprocessing, and augmentation capabilities. |
||||
|
||||
## Training Integrations |
||||
|
||||
- [Comet ML](https://www.comet.ml/): Enhance your model development with Ultralytics by tracking, comparing, and optimizing your machine learning experiments. |
||||
|
||||
- [ClearML](https://clear.ml/): Automate your Ultralytics ML workflows, monitor experiments, and foster team collaboration. |
||||
|
||||
- [DVC](https://dvc.org/): Implement version control for your Ultralytics machine learning projects, synchronizing data, code, and models effectively. |
||||
|
||||
- [Ultralytics HUB](https://hub.ultralytics.com): Access and contribute to a community of pre-trained Ultralytics models. |
||||
|
||||
- [MLFlow](https://mlflow.org/): Streamline the entire ML lifecycle of Ultralytics models, from experimentation and reproducibility to deployment. |
||||
|
||||
- [Neptune](https://neptune.ai/): Maintain a comprehensive log of your ML experiments with Ultralytics in this metadata store designed for MLOps. |
||||
|
||||
- [Ray Tune](ray-tune.md): Optimize the hyperparameters of your Ultralytics models at any scale. |
||||
|
||||
- [TensorBoard](https://tensorboard.dev/): Visualize your Ultralytics ML workflows, monitor model metrics, and foster team collaboration. |
||||
|
||||
- [Weights & Biases (W&B)](https://wandb.ai/site): Monitor experiments, visualize metrics, and foster reproducibility and collaboration on Ultralytics projects. |
||||
|
||||
## Deployment Integrations |
||||
|
||||
- [Neural Magic](https://neuralmagic.com/): Leverage Quantization Aware Training (QAT) and pruning techniques to optimize Ultralytics models for superior performance and leaner size. |
||||
|
||||
### Export Formats |
||||
|
||||
We also support a variety of model export formats for deployment in different environments. Here are the available formats: |
||||
|
||||
| Format | `format` Argument | Model | Metadata | Arguments | |
||||
|--------------------------------------------------------------------|-------------------|---------------------------|----------|-----------------------------------------------------| |
||||
| [PyTorch](https://pytorch.org/) | - | `yolov8n.pt` | ✅ | - | |
||||
| [TorchScript](https://pytorch.org/docs/stable/jit.html) | `torchscript` | `yolov8n.torchscript` | ✅ | `imgsz`, `optimize` | |
||||
| [ONNX](https://onnx.ai/) | `onnx` | `yolov8n.onnx` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `opset` | |
||||
| [OpenVINO](openvino.md) | `openvino` | `yolov8n_openvino_model/` | ✅ | `imgsz`, `half` | |
||||
| [TensorRT](https://developer.nvidia.com/tensorrt) | `engine` | `yolov8n.engine` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `workspace` | |
||||
| [CoreML](https://github.com/apple/coremltools) | `coreml` | `yolov8n.mlpackage` | ✅ | `imgsz`, `half`, `int8`, `nms` | |
||||
| [TF SavedModel](https://www.tensorflow.org/guide/saved_model) | `saved_model` | `yolov8n_saved_model/` | ✅ | `imgsz`, `keras` | |
||||
| [TF GraphDef](https://www.tensorflow.org/api_docs/python/tf/Graph) | `pb` | `yolov8n.pb` | ❌ | `imgsz` | |
||||
| [TF Lite](https://www.tensorflow.org/lite) | `tflite` | `yolov8n.tflite` | ✅ | `imgsz`, `half`, `int8` | |
||||
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n_edgetpu.tflite` | ✅ | `imgsz` | |
||||
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n_web_model/` | ✅ | `imgsz` | |
||||
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n_paddle_model/` | ✅ | `imgsz` | |
||||
| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` | |
||||
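
As a quick illustration, any `format` value from the table above can be passed to the `export` method of a loaded model. The following is a minimal sketch using the Ultralytics Python API, with ONNX chosen only as an example:

```python
from ultralytics import YOLO

# Load an official or custom model, then export it to a format from the table above
model = YOLO("yolov8n.pt")
model.export(format="onnx")  # e.g. creates 'yolov8n.onnx'
```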
|
||||
Explore the links to learn more about each integration and how to get the most out of them with Ultralytics. |
@ -0,0 +1,271 @@ |
||||
--- |
||||
comments: true |
||||
description: Discover the power of deploying your Ultralytics YOLOv8 model using OpenVINO format for up to 10x speedup vs PyTorch. |
||||
keywords: ultralytics docs, YOLOv8, export YOLOv8, YOLOv8 model deployment, exporting YOLOv8, OpenVINO, OpenVINO format |
||||
--- |
||||
|
||||
<img width="1024" src="https://user-images.githubusercontent.com/26833433/252345644-0cf84257-4b34-404c-b7ce-eb73dfbcaff1.png" alt="OpenVINO Ecosystem"> |
||||
|
||||
**Export mode** is used for exporting a YOLOv8 model to a format that can be used for deployment. In this guide, we specifically cover exporting to OpenVINO, which can provide up to 3x [CPU](https://docs.openvino.ai/2023.0/openvino_docs_OV_UG_supported_plugins_CPU.html) speedup, as well as accelerate YOLOv8 inference on other Intel hardware ([iGPU](https://docs.openvino.ai/2023.0/openvino_docs_OV_UG_supported_plugins_GPU.html), [dGPU](https://docs.openvino.ai/2023.0/openvino_docs_OV_UG_supported_plugins_GPU.html), [VPU](https://docs.openvino.ai/2022.3/openvino_docs_OV_UG_supported_plugins_VPU.html), etc.). |
||||
|
||||
OpenVINO, short for Open Visual Inference & Neural Network Optimization toolkit, is a comprehensive toolkit for optimizing and deploying AI inference models. Even though the name contains Visual, OpenVINO also supports various additional tasks including language, audio, time series, etc. |
||||
|
||||
## Usage Examples |
||||
|
||||
Export a YOLOv8n model to OpenVINO format and run inference with the exported model. |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a YOLOv8n PyTorch model |
||||
model = YOLO('yolov8n.pt') |
||||
|
||||
# Export the model |
||||
model.export(format='openvino') # creates 'yolov8n_openvino_model/' |
||||
|
||||
# Load the exported OpenVINO model |
||||
ov_model = YOLO('yolov8n_openvino_model/') |
||||
|
||||
# Run inference |
||||
results = ov_model('https://ultralytics.com/images/bus.jpg') |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Export a YOLOv8n PyTorch model to OpenVINO format |
||||
yolo export model=yolov8n.pt format=openvino # creates 'yolov8n_openvino_model/' |
||||
|
||||
# Run inference with the exported model |
||||
yolo predict model=yolov8n_openvino_model source='https://ultralytics.com/images/bus.jpg' |
||||
``` |
||||
|
||||
## Arguments |
||||
|
||||
| Key | Value | Description | |
||||
|----------|--------------|------------------------------------------------------| |
||||
| `format` | `'openvino'` | format to export to | |
||||
| `imgsz`  | `640`        | image size as a scalar or (h, w) list, e.g. (640, 480) | |
||||
| `half` | `False` | FP16 quantization | |
||||
|
||||
## Benefits of OpenVINO |
||||
|
||||
1. **Performance**: OpenVINO delivers high-performance inference by utilizing the power of Intel CPUs, integrated and discrete GPUs, and FPGAs. |
||||
2. **Support for Heterogeneous Execution**: OpenVINO provides an API to write once and deploy on any supported Intel hardware (CPU, GPU, FPGA, VPU, etc.). |
||||
3. **Model Optimizer**: OpenVINO provides a Model Optimizer that imports, converts, and optimizes models from popular deep learning frameworks such as PyTorch, TensorFlow, TensorFlow Lite, Keras, ONNX, PaddlePaddle, and Caffe. |
||||
4. **Ease of Use**: The toolkit comes with more than [80 tutorial notebooks](https://github.com/openvinotoolkit/openvino_notebooks) (including [YOLOv8 optimization](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/230-yolov8-optimization)) teaching different aspects of the toolkit. |
||||
|
||||
## OpenVINO Export Structure |
||||
|
||||
When you export a model to OpenVINO format, it results in a directory containing the following: |
||||
|
||||
1. **XML file**: Describes the network topology. |
||||
2. **BIN file**: Contains the weights and biases binary data. |
||||
3. **Mapping file**: Holds mapping of original model output tensors to OpenVINO tensor names. |
||||
|
||||
You can use these files to run inference with the OpenVINO Inference Engine. |
||||
|
||||
## Using OpenVINO Export in Deployment |
||||
|
||||
Once you have the OpenVINO files, you can use the OpenVINO Runtime to run the model. The Runtime provides a unified API for inference across all supported Intel hardware. It also provides advanced capabilities like load balancing across Intel hardware and asynchronous execution. For more information on running inference, refer to the [Inference with OpenVINO Runtime Guide](https://docs.openvino.ai/2023.0/openvino_docs_OV_UG_OV_Runtime_User_Guide.html). |
||||
|
||||
Remember, you'll need the XML and BIN files as well as any application-specific settings like input size, scale factor for normalization, etc., to correctly set up and use the model with the Runtime. |
||||
|
||||
In your deployment application, you would typically take the following steps (see the sketch after this list): |
||||
|
||||
1. Initialize OpenVINO by creating `core = Core()`. |
||||
2. Load the model using the `core.read_model()` method. |
||||
3. Compile the model using the `core.compile_model()` function. |
||||
4. Prepare the input (image, text, audio, etc.). |
||||
5. Run inference using `compiled_model(input_data)`. |
||||
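
The snippet below is a minimal sketch of these steps using the OpenVINO Python API. The exported directory name, the `yolov8n.xml` file inside it, and the 640x640 input shape are assumptions based on the export example above:

```python
import numpy as np
from openvino.runtime import Core

core = Core()                                                    # 1. initialize OpenVINO
model = core.read_model('yolov8n_openvino_model/yolov8n.xml')    # 2. load the exported model
compiled_model = core.compile_model(model, 'CPU')                # 3. compile for a target device

input_data = np.random.rand(1, 3, 640, 640).astype(np.float32)  # 4. prepare an input (dummy image here)
results = compiled_model(input_data)                             # 5. run inference
print(results[compiled_model.output(0)].shape)                   # raw output tensor shape
```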
|
||||
For more detailed steps and code snippets, refer to the [OpenVINO documentation](https://docs.openvino.ai/) or [API tutorial](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/002-openvino-api/002-openvino-api.ipynb). |
||||
|
||||
## OpenVINO YOLOv8 Benchmarks |
||||
|
||||
The YOLOv8 benchmarks below were run by the Ultralytics team on four different model formats, measuring speed and accuracy: PyTorch, TorchScript, ONNX and OpenVINO. Benchmarks were run on Intel Flex and Arc GPUs, and on Intel Xeon CPUs at FP32 precision (with the `half=False` argument). |
||||
|
||||
!!! note |
||||
|
||||
The benchmarking results below are for reference and might vary based on the exact hardware and software configuration of a system, as well as the current workload of the system at the time the benchmarks are run. |
||||
|
||||
All benchmarks run with `openvino` python package version [2023.0.1](https://pypi.org/project/openvino/2023.0.1/). |
||||
|
||||
### Intel Flex GPU |
||||
|
||||
The Intel® Data Center GPU Flex Series is a versatile and robust solution designed for the intelligent visual cloud. This GPU supports a wide array of workloads including media streaming, cloud gaming, AI visual inference, and virtual desktop infrastructure workloads. It stands out for its open architecture and built-in support for AV1 encoding, providing a standards-based software stack for high-performance, cross-architecture applications. The Flex Series GPU is optimized for density and quality, offering high reliability, availability, and scalability. |
||||
|
||||
Benchmarks below run on Intel® Data Center GPU Flex 170 at FP32 precision. |
||||
|
||||
<div align="center"> |
||||
<img width="800" src="https://user-images.githubusercontent.com/26833433/253741543-62659bf8-1765-4d0b-b71c-8a4f9885506a.jpg"> |
||||
</div> |
||||
|
||||
| Model | Format | Status | Size (MB) | mAP50-95(B) | Inference time (ms/im) | |
||||
|---------|-------------|--------|-----------|-------------|------------------------| |
||||
| YOLOv8n | PyTorch | ✅ | 6.2 | 0.3709 | 21.79 | |
||||
| YOLOv8n | TorchScript | ✅ | 12.4 | 0.3704 | 23.24 | |
||||
| YOLOv8n | ONNX | ✅ | 12.2 | 0.3704 | 37.22 | |
||||
| YOLOv8n | OpenVINO | ✅ | 12.3 | 0.3703 | 3.29 | |
||||
| YOLOv8s | PyTorch | ✅ | 21.5 | 0.4471 | 31.89 | |
||||
| YOLOv8s | TorchScript | ✅ | 42.9 | 0.4472 | 32.71 | |
||||
| YOLOv8s | ONNX | ✅ | 42.8 | 0.4472 | 43.42 | |
||||
| YOLOv8s | OpenVINO | ✅ | 42.9 | 0.4470 | 3.92 | |
||||
| YOLOv8m | PyTorch | ✅ | 49.7 | 0.5013 | 50.75 | |
||||
| YOLOv8m | TorchScript | ✅ | 99.2 | 0.4999 | 47.90 | |
||||
| YOLOv8m | ONNX | ✅ | 99.0 | 0.4999 | 63.16 | |
||||
| YOLOv8m | OpenVINO | ✅ | 49.8 | 0.4997 | 7.11 | |
||||
| YOLOv8l | PyTorch | ✅ | 83.7 | 0.5293 | 77.45 | |
||||
| YOLOv8l | TorchScript | ✅ | 167.2 | 0.5268 | 85.71 | |
||||
| YOLOv8l | ONNX | ✅ | 166.8 | 0.5268 | 88.94 | |
||||
| YOLOv8l | OpenVINO | ✅ | 167.0 | 0.5264 | 9.37 | |
||||
| YOLOv8x | PyTorch | ✅ | 130.5 | 0.5404 | 100.09 | |
||||
| YOLOv8x | TorchScript | ✅ | 260.7 | 0.5371 | 114.64 | |
||||
| YOLOv8x | ONNX | ✅ | 260.4 | 0.5371 | 110.32 | |
||||
| YOLOv8x | OpenVINO | ✅ | 260.6 | 0.5367 | 15.02 | |
||||
|
||||
This table represents the benchmark results for five different models (YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, YOLOv8x) across four different formats (PyTorch, TorchScript, ONNX, OpenVINO), giving us the status, size, mAP50-95(B) metric, and inference time for each combination. |
||||
|
||||
### Intel Arc GPU |
||||
|
||||
Intel® Arc™ represents Intel's foray into the dedicated GPU market. The Arc™ series, designed to compete with leading GPU manufacturers like AMD and Nvidia, caters to both the laptop and desktop markets. The series includes mobile versions for compact devices like laptops, and larger, more powerful versions for desktop computers. |
||||
|
||||
The Arc™ series is divided into three categories: Arc™ 3, Arc™ 5, and Arc™ 7, with each number indicating the performance level. Each category includes several models, and the 'M' in the GPU model name signifies a mobile, integrated variant. |
||||
|
||||
Early reviews have praised the Arc™ series, particularly the integrated A770M GPU, for its impressive graphics performance. The availability of the Arc™ series varies by region, and additional models are expected to be released soon. Intel® Arc™ GPUs offer high-performance solutions for a range of computing needs, from gaming to content creation. |
||||
|
||||
Benchmarks below run on Intel® Arc 770 GPU at FP32 precision. |
||||
|
||||
<div align="center"> |
||||
<img width="800" src="https://user-images.githubusercontent.com/26833433/253741545-8530388f-8fd1-44f7-a4ae-f875d59dc282.jpg"> |
||||
</div> |
||||
|
||||
| Model | Format | Status | Size (MB) | metrics/mAP50-95(B) | Inference time (ms/im) | |
||||
|---------|-------------|--------|-----------|---------------------|------------------------| |
||||
| YOLOv8n | PyTorch | ✅ | 6.2 | 0.3709 | 88.79 | |
||||
| YOLOv8n | TorchScript | ✅ | 12.4 | 0.3704 | 102.66 | |
||||
| YOLOv8n | ONNX | ✅ | 12.2 | 0.3704 | 57.98 | |
||||
| YOLOv8n | OpenVINO | ✅ | 12.3 | 0.3703 | 8.52 | |
||||
| YOLOv8s | PyTorch | ✅ | 21.5 | 0.4471 | 189.83 | |
||||
| YOLOv8s | TorchScript | ✅ | 42.9 | 0.4472 | 227.58 | |
||||
| YOLOv8s | ONNX | ✅ | 42.7 | 0.4472 | 142.03 | |
||||
| YOLOv8s | OpenVINO | ✅ | 42.9 | 0.4469 | 9.19 | |
||||
| YOLOv8m | PyTorch | ✅ | 49.7 | 0.5013 | 411.64 | |
||||
| YOLOv8m | TorchScript | ✅ | 99.2 | 0.4999 | 517.12 | |
||||
| YOLOv8m | ONNX | ✅ | 98.9 | 0.4999 | 298.68 | |
||||
| YOLOv8m | OpenVINO | ✅ | 99.1 | 0.4996 | 12.55 | |
||||
| YOLOv8l | PyTorch | ✅ | 83.7 | 0.5293 | 725.73 | |
||||
| YOLOv8l | TorchScript | ✅ | 167.1 | 0.5268 | 892.83 | |
||||
| YOLOv8l | ONNX | ✅ | 166.8 | 0.5268 | 576.11 | |
||||
| YOLOv8l | OpenVINO | ✅ | 167.0 | 0.5262 | 17.62 | |
||||
| YOLOv8x | PyTorch | ✅ | 130.5 | 0.5404 | 988.92 | |
||||
| YOLOv8x | TorchScript | ✅ | 260.7 | 0.5371 | 1186.42 | |
||||
| YOLOv8x | ONNX | ✅ | 260.4 | 0.5371 | 768.90 | |
||||
| YOLOv8x | OpenVINO | ✅ | 260.6 | 0.5367 | 19 | |
||||
|
||||
### Intel Xeon CPU |
||||
|
||||
The Intel® Xeon® CPU is a high-performance, server-grade processor designed for complex and demanding workloads. From high-end cloud computing and virtualization to artificial intelligence and machine learning applications, Xeon® CPUs provide the power, reliability, and flexibility required for today's data centers. |
||||
|
||||
Notably, Xeon® CPUs deliver high compute density and scalability, making them ideal for both small businesses and large enterprises. By choosing Intel® Xeon® CPUs, organizations can confidently handle their most demanding computing tasks and foster innovation while maintaining cost-effectiveness and operational efficiency. |
||||
|
||||
Benchmarks below run on 4th Gen Intel® Xeon® Scalable CPU at FP32 precision. |
||||
|
||||
<div align="center"> |
||||
<img width="800" src="https://user-images.githubusercontent.com/26833433/253741546-dcd8e52a-fc38-424f-b87e-c8365b6f28dc.jpg"> |
||||
</div> |
||||
|
||||
| Model | Format | Status | Size (MB) | metrics/mAP50-95(B) | Inference time (ms/im) | |
||||
|---------|-------------|--------|-----------|---------------------|------------------------| |
||||
| YOLOv8n | PyTorch | ✅ | 6.2 | 0.3709 | 24.36 | |
||||
| YOLOv8n | TorchScript | ✅ | 12.4 | 0.3704 | 23.93 | |
||||
| YOLOv8n | ONNX | ✅ | 12.2 | 0.3704 | 39.86 | |
||||
| YOLOv8n | OpenVINO | ✅ | 12.3 | 0.3704 | 11.34 | |
||||
| YOLOv8s | PyTorch | ✅ | 21.5 | 0.4471 | 33.77 | |
||||
| YOLOv8s | TorchScript | ✅ | 42.9 | 0.4472 | 34.84 | |
||||
| YOLOv8s | ONNX | ✅ | 42.8 | 0.4472 | 43.23 | |
||||
| YOLOv8s | OpenVINO | ✅ | 42.9 | 0.4471 | 13.86 | |
||||
| YOLOv8m | PyTorch | ✅ | 49.7 | 0.5013 | 53.91 | |
||||
| YOLOv8m | TorchScript | ✅ | 99.2 | 0.4999 | 53.51 | |
||||
| YOLOv8m | ONNX | ✅ | 99.0 | 0.4999 | 64.16 | |
||||
| YOLOv8m | OpenVINO | ✅ | 99.1 | 0.4996 | 28.79 | |
||||
| YOLOv8l | PyTorch | ✅ | 83.7 | 0.5293 | 75.78 | |
||||
| YOLOv8l | TorchScript | ✅ | 167.2 | 0.5268 | 79.13 | |
||||
| YOLOv8l | ONNX | ✅ | 166.8 | 0.5268 | 88.45 | |
||||
| YOLOv8l | OpenVINO | ✅ | 167.0 | 0.5263 | 56.23 | |
||||
| YOLOv8x | PyTorch | ✅ | 130.5 | 0.5404 | 96.60 | |
||||
| YOLOv8x | TorchScript | ✅ | 260.7 | 0.5371 | 114.28 | |
||||
| YOLOv8x | ONNX | ✅ | 260.4 | 0.5371 | 111.02 | |
||||
| YOLOv8x | OpenVINO | ✅ | 260.6 | 0.5371 | 83.28 | |
||||
|
||||
### Intel Core CPU |
||||
|
||||
The Intel® Core® series is a range of high-performance processors by Intel. The lineup includes Core i3 (entry-level), Core i5 (mid-range), Core i7 (high-end), and Core i9 (extreme performance). Each series caters to different computing needs and budgets, from everyday tasks to demanding professional workloads. With each new generation, improvements are made to performance, energy efficiency, and features. |
||||
|
||||
Benchmarks below run on 13th Gen Intel® Core® i7-13700H CPU at FP32 precision. |
||||
|
||||
<div align="center"> |
||||
<img width="800" src="https://user-images.githubusercontent.com/26833433/254559985-727bfa43-93fa-4fec-a417-800f869f3f9e.jpg"> |
||||
</div> |
||||
|
||||
| Model | Format | Status | Size (MB) | metrics/mAP50-95(B) | Inference time (ms/im) | |
||||
|---------|-------------|--------|-----------|---------------------|------------------------| |
||||
| YOLOv8n | PyTorch | ✅ | 6.2 | 0.4478 | 104.61 | |
||||
| YOLOv8n | TorchScript | ✅ | 12.4 | 0.4525 | 112.39 | |
||||
| YOLOv8n | ONNX | ✅ | 12.2 | 0.4525 | 28.02 | |
||||
| YOLOv8n | OpenVINO | ✅ | 12.3 | 0.4504 | 23.53 | |
||||
| YOLOv8s | PyTorch | ✅ | 21.5 | 0.5885 | 194.83 | |
||||
| YOLOv8s | TorchScript | ✅ | 43.0 | 0.5962 | 202.01 | |
||||
| YOLOv8s | ONNX | ✅ | 42.8 | 0.5962 | 65.74 | |
||||
| YOLOv8s | OpenVINO | ✅ | 42.9 | 0.5966 | 38.66 | |
||||
| YOLOv8m | PyTorch | ✅ | 49.7 | 0.6101 | 355.23 | |
||||
| YOLOv8m | TorchScript | ✅ | 99.2 | 0.6120 | 424.78 | |
||||
| YOLOv8m | ONNX | ✅ | 99.0 | 0.6120 | 173.39 | |
||||
| YOLOv8m | OpenVINO | ✅ | 99.1 | 0.6091 | 69.80 | |
||||
| YOLOv8l | PyTorch | ✅ | 83.7 | 0.6591 | 593.00 | |
||||
| YOLOv8l | TorchScript | ✅ | 167.2 | 0.6580 | 697.54 | |
||||
| YOLOv8l | ONNX | ✅ | 166.8 | 0.6580 | 342.15 | |
||||
| YOLOv8l | OpenVINO | ✅ | 167.0 | 0.0708 | 117.69 | |
||||
| YOLOv8x | PyTorch | ✅ | 130.5 | 0.6651 | 804.65 | |
||||
| YOLOv8x | TorchScript | ✅ | 260.8 | 0.6650 | 921.46 | |
||||
| YOLOv8x | ONNX | ✅ | 260.4 | 0.6650 | 526.66 | |
||||
| YOLOv8x | OpenVINO | ✅ | 260.6 | 0.6619 | 158.73 | |
||||
|
||||
## Reproduce Our Results |
||||
|
||||
To reproduce the Ultralytics benchmarks above on all export [formats](../modes/export.md) run this code: |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a YOLOv8n PyTorch model |
||||
model = YOLO('yolov8n.pt') |
||||
|
||||
# Benchmark YOLOv8n speed and accuracy on the COCO128 dataset for all export formats
||||
results = model.benchmarks(data='coco128.yaml')
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Benchmark YOLOv8n speed and accuracy on the COCO128 dataset for all export formats
||||
yolo benchmark model=yolov8n.pt data=coco128.yaml |
||||
``` |
||||
|
||||
Note that benchmarking results might vary based on the exact hardware and software configuration of a system, as well as the current workload of the system at the time the benchmarks are run. For the most reliable results, use a dataset with a large number of images, e.g. `data='coco128.yaml'` (128 val images) or `data='coco.yaml'` (5000 val images).
||||
|
||||
## Conclusion |
||||
|
||||
The benchmarking results clearly demonstrate the benefits of exporting the YOLOv8 model to the OpenVINO format. Across different models and hardware platforms, the OpenVINO format consistently outperforms other formats in terms of inference speed while maintaining comparable accuracy. |
||||
|
||||
For the Intel® Data Center GPU Flex Series, the OpenVINO format delivered inference roughly 7 to 8 times faster than the original PyTorch format, and the gap on the Arc GPU was larger still. On the Xeon CPU, the OpenVINO format was up to twice as fast as the PyTorch format. The accuracy of the models remained nearly identical across the different formats.
||||
|
||||
The benchmarks underline the effectiveness of OpenVINO as a tool for deploying deep learning models. By converting models to the OpenVINO format, developers can achieve significant performance improvements, making it easier to deploy these models in real-world applications. |
||||
|
||||
For more detailed information and instructions on using OpenVINO, refer to the [official OpenVINO documentation](https://docs.openvinotoolkit.org/latest/index.html). |
@ -0,0 +1,174 @@ |
||||
--- |
||||
comments: true |
||||
description: Discover how to streamline hyperparameter tuning for YOLOv8 models with Ray Tune. Learn to accelerate tuning, integrate with Weights & Biases, and analyze results. |
||||
keywords: Ultralytics, YOLOv8, Ray Tune, hyperparameter tuning, machine learning optimization, Weights & Biases integration, result analysis |
||||
--- |
||||
|
||||
# Efficient Hyperparameter Tuning with Ray Tune and YOLOv8 |
||||
|
||||
Hyperparameter tuning is vital in achieving peak model performance by discovering the optimal set of hyperparameters. This involves running trials with different hyperparameters and evaluating each trial’s performance. |
||||
|
||||
## Accelerate Tuning with Ultralytics YOLOv8 and Ray Tune |
||||
|
||||
[Ultralytics YOLOv8](https://ultralytics.com) incorporates Ray Tune for hyperparameter tuning, streamlining the optimization of YOLOv8 model hyperparameters. With Ray Tune, you can utilize advanced search strategies, parallelism, and early stopping to expedite the tuning process. |
||||
|
||||
### Ray Tune |
||||
|
||||
<p align="center"> |
||||
<img width="640" src="https://docs.ray.io/en/latest/_images/tune_overview.png" alt="Ray Tune Overview"> |
||||
</p> |
||||
|
||||
[Ray Tune](https://docs.ray.io/en/latest/tune/index.html) is a hyperparameter tuning library designed for efficiency and flexibility. It supports various search strategies, parallelism, and early stopping strategies, and seamlessly integrates with popular machine learning frameworks, including Ultralytics YOLOv8. |
||||
|
||||
### Integration with Weights & Biases |
||||
|
||||
YOLOv8 also allows optional integration with [Weights & Biases](https://wandb.ai/site) for monitoring the tuning process. |
||||
|
||||
## Installation |
||||
|
||||
To install the required packages, run: |
||||
|
||||
!!! tip "Installation" |
||||
|
||||
```bash |
||||
# Install and update Ultralytics and Ray Tune packages |
||||
pip install -U ultralytics "ray[tune]" |
||||
|
||||
# Optionally install W&B for logging |
||||
pip install wandb |
||||
``` |
||||
|
||||
## Usage |
||||
|
||||
!!! example "Usage" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a YOLOv8n model |
||||
model = YOLO("yolov8n.pt") |
||||
|
||||
# Start tuning hyperparameters for YOLOv8n training on the COCO128 dataset |
||||
result_grid = model.tune(data="coco128.yaml") |
||||
``` |
||||
|
||||
## `tune()` Method Parameters |
||||
|
||||
The `tune()` method in YOLOv8 provides an easy-to-use interface for hyperparameter tuning with Ray Tune. It accepts several arguments that allow you to customize the tuning process. Below is a detailed explanation of each parameter: |
||||
|
||||
| Parameter | Type | Description | Default Value | |
||||
|-----------------|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------| |
||||
| `data` | `str` | The dataset configuration file (in YAML format) to run the tuner on. This file should specify the training and validation data paths, as well as other dataset-specific settings. | | |
||||
| `space` | `dict, optional` | A dictionary defining the hyperparameter search space for Ray Tune. Each key corresponds to a hyperparameter name, and the value specifies the range of values to explore during tuning. If not provided, YOLOv8 uses a default search space with various hyperparameters. | | |
||||
| `grace_period` | `int, optional` | The grace period in epochs for the [ASHA scheduler](https://docs.ray.io/en/latest/tune/api/schedulers.html) in Ray Tune. The scheduler will not terminate any trial before this number of epochs, allowing the model to have some minimum training before making a decision on early stopping. | 10 | |
||||
| `gpu_per_trial` | `int, optional` | The number of GPUs to allocate per trial during tuning. This helps manage GPU usage, particularly in multi-GPU environments. If not provided, the tuner will use all available GPUs. | None | |
||||
| `max_samples` | `int, optional` | The maximum number of trials to run during tuning. This parameter helps control the total number of hyperparameter combinations tested, ensuring the tuning process does not run indefinitely. | 10 | |
||||
| `**train_args` | `dict, optional` | Additional arguments to pass to the `train()` method during tuning. These arguments can include settings like the number of training epochs, batch size, and other training-specific configurations. | {} | |
||||
|
||||
By customizing these parameters, you can fine-tune the hyperparameter optimization process to suit your specific needs and available computational resources. |
||||
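As a hedged illustration of these parameters, the sketch below tunes YOLOv8n with an explicit trial budget, a per-trial GPU allocation and an extra training argument forwarded through `**train_args`. The specific values are arbitrary examples, not recommendations.

```python
from ultralytics import YOLO

# Load the model to tune
model = YOLO("yolov8n.pt")

# Run Ray Tune with explicit resource and budget settings
result_grid = model.tune(
    data="coco128.yaml",  # dataset configuration
    grace_period=10,      # minimum epochs before ASHA may stop a trial
    gpu_per_trial=1,      # GPUs allocated to each trial
    max_samples=20,       # total number of trials to run
    epochs=30,            # forwarded to train() via **train_args
)
```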
|
||||
## Default Search Space Description |
||||
|
||||
The following table lists the default search space parameters for hyperparameter tuning in YOLOv8 with Ray Tune. Each parameter has a specific value range defined by `tune.uniform()`. A sketch showing how a few of these ranges can be passed as a custom `space` follows the table.
||||
|
||||
| Parameter | Value Range | Description | |
||||
|-------------------|----------------------------|------------------------------------------| |
||||
| `lr0` | `tune.uniform(1e-5, 1e-1)` | Initial learning rate | |
||||
| `lrf` | `tune.uniform(0.01, 1.0)` | Final learning rate factor | |
||||
| `momentum` | `tune.uniform(0.6, 0.98)` | Momentum | |
||||
| `weight_decay` | `tune.uniform(0.0, 0.001)` | Weight decay | |
||||
| `warmup_epochs` | `tune.uniform(0.0, 5.0)` | Warmup epochs | |
||||
| `warmup_momentum` | `tune.uniform(0.0, 0.95)` | Warmup momentum | |
||||
| `box` | `tune.uniform(0.02, 0.2)` | Box loss weight | |
||||
| `cls` | `tune.uniform(0.2, 4.0)` | Class loss weight | |
||||
| `hsv_h` | `tune.uniform(0.0, 0.1)` | Hue augmentation range | |
||||
| `hsv_s` | `tune.uniform(0.0, 0.9)` | Saturation augmentation range | |
||||
| `hsv_v` | `tune.uniform(0.0, 0.9)` | Value (brightness) augmentation range | |
||||
| `degrees` | `tune.uniform(0.0, 45.0)` | Rotation augmentation range (degrees) | |
||||
| `translate` | `tune.uniform(0.0, 0.9)` | Translation augmentation range | |
||||
| `scale` | `tune.uniform(0.0, 0.9)` | Scaling augmentation range | |
||||
| `shear` | `tune.uniform(0.0, 10.0)` | Shear augmentation range (degrees) | |
||||
| `perspective` | `tune.uniform(0.0, 0.001)` | Perspective augmentation range | |
||||
| `flipud` | `tune.uniform(0.0, 1.0)` | Vertical flip augmentation probability | |
||||
| `fliplr` | `tune.uniform(0.0, 1.0)` | Horizontal flip augmentation probability | |
||||
| `mosaic` | `tune.uniform(0.0, 1.0)` | Mosaic augmentation probability | |
||||
| `mixup` | `tune.uniform(0.0, 1.0)` | Mixup augmentation probability | |
||||
| `copy_paste` | `tune.uniform(0.0, 1.0)` | Copy-paste augmentation probability | |
||||
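As a sketch, the snippet below builds a `space` dictionary from a few of the ranges listed above and passes it to `tune()`. Whether parameters omitted from a custom `space` fall back to these defaults depends on the Ultralytics version, so check the current implementation before relying on that behavior.

```python
from ray import tune

from ultralytics import YOLO

# Load the model to tune
model = YOLO("yolov8n.pt")

# Search space built from a subset of the default parameter ranges above
search_space = {
    "lr0": tune.uniform(1e-5, 1e-1),
    "momentum": tune.uniform(0.6, 0.98),
    "weight_decay": tune.uniform(0.0, 0.001),
    "mosaic": tune.uniform(0.0, 1.0),
}

# Run tuning with the custom search space
result_grid = model.tune(data="coco128.yaml", space=search_space, epochs=30)
```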
|
||||
## Custom Search Space Example |
||||
|
||||
In this example, we demonstrate how to use a custom search space for hyperparameter tuning with Ray Tune and YOLOv8. By providing a custom search space, you can focus the tuning process on specific hyperparameters of interest. |
||||
|
||||
!!! example "Usage" |
||||
|
||||
```python |
||||
from ray import tune

from ultralytics import YOLO
||||
|
||||
# Define a YOLO model |
||||
model = YOLO("yolov8n.pt") |
||||
|
||||
# Run Ray Tune on the model |
||||
result_grid = model.tune(data="coco128.yaml", |
||||
space={"lr0": tune.uniform(1e-5, 1e-1)}, |
||||
epochs=50) |
||||
``` |
||||
|
||||
In the code snippet above, we create a YOLO model with the "yolov8n.pt" pretrained weights. Then, we call the `tune()` method, specifying the dataset configuration with "coco128.yaml". We provide a custom search space for the initial learning rate `lr0` using a dictionary with the key "lr0" and the value `tune.uniform(1e-5, 1e-1)`. Finally, we pass additional training arguments, such as the number of epochs, directly to the `tune()` method as `epochs=50`.
||||
|
||||
# Processing Ray Tune Results |
||||
|
||||
After running a hyperparameter tuning experiment with Ray Tune, you might want to perform various analyses on the obtained results. This guide will take you through common workflows for processing and analyzing these results. |
||||
|
||||
## Loading Tune Experiment Results from a Directory |
||||
|
||||
After running the tuning experiment with `tuner.fit()`, you can load the results from a directory. This is useful, especially if you're performing the analysis after the initial training script has exited. |
||||
|
||||
```python |
||||
from ray import tune

# Placeholders: use the storage path, experiment name and trainable from your original run
experiment_path = f"{storage_path}/{exp_name}"
||||
print(f"Loading results from {experiment_path}...") |
||||
|
||||
restored_tuner = tune.Tuner.restore(experiment_path, trainable=train_mnist) |
||||
result_grid = restored_tuner.get_results() |
||||
``` |
||||
|
||||
## Basic Experiment-Level Analysis |
||||
|
||||
Get an overview of how trials performed. You can quickly check if there were any errors during the trials. |
||||
|
||||
```python |
||||
if result_grid.errors: |
||||
print("One or more trials failed!") |
||||
else: |
||||
print("No errors!") |
||||
``` |
||||
|
||||
## Basic Trial-Level Analysis |
||||
|
||||
Access individual trial hyperparameter configurations and the last reported metrics. |
||||
|
||||
```python |
||||
for i, result in enumerate(result_grid): |
||||
print(f"Trial #{i}: Configuration: {result.config}, Last Reported Metrics: {result.metrics}") |
||||
``` |
||||
|
||||
## Plotting the Entire History of Reported Metrics for a Trial |
||||
|
||||
You can plot the history of reported metrics for each trial to see how the metrics evolved over time. |
||||
|
||||
```python |
||||
import matplotlib.pyplot as plt |
||||
|
||||
for i, result in enumerate(result_grid):
||||
plt.plot(result.metrics_dataframe["training_iteration"], result.metrics_dataframe["mean_accuracy"], label=f"Trial {i}") |
||||
|
||||
plt.xlabel('Training Iterations') |
||||
plt.ylabel('Mean Accuracy') |
||||
plt.legend() |
||||
plt.show() |
||||
``` |
||||
|
||||
## Summary |
||||
|
||||
In this documentation, we covered common workflows to analyze the results of experiments run with Ray Tune using Ultralytics. The key steps include loading the experiment results from a directory, performing basic experiment-level and trial-level analysis and plotting metrics. |
||||
|
||||
Explore further by looking into Ray Tune’s [Analyze Results](https://docs.ray.io/en/latest/tune/examples/tune_analyze_results.html) docs page to get the most out of your hyperparameter tuning experiments. |
@ -0,0 +1,186 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore FastSAM, a CNN-based solution for real-time object segmentation in images. Enhanced user interaction, computational efficiency and adaptable across vision tasks. |
||||
keywords: FastSAM, machine learning, CNN-based solution, object segmentation, real-time solution, Ultralytics, vision tasks, image processing, industrial applications, user interaction |
||||
--- |
||||
|
||||
# Fast Segment Anything Model (FastSAM) |
||||
|
||||
The Fast Segment Anything Model (FastSAM) is a novel, real-time CNN-based solution for the Segment Anything task. This task is designed to segment any object within an image based on various possible user interaction prompts. FastSAM significantly reduces computational demands while maintaining competitive performance, making it a practical choice for a variety of vision tasks. |
||||
|
||||
 |
||||
|
||||
## Overview |
||||
|
||||
FastSAM is designed to address the limitations of the [Segment Anything Model (SAM)](sam.md), a heavy Transformer model with substantial computational resource requirements. FastSAM decouples the segment anything task into two sequential stages: all-instance segmentation and prompt-guided selection. The first stage uses [YOLOv8-seg](../tasks/segment.md) to produce the segmentation masks of all instances in the image. In the second stage, it outputs the region of interest corresponding to the prompt.
||||
|
||||
## Key Features |
||||
|
||||
1. **Real-time Solution:** By leveraging the computational efficiency of CNNs, FastSAM provides a real-time solution for the segment anything task, making it valuable for industrial applications that require quick results. |
||||
|
||||
2. **Efficiency and Performance:** FastSAM offers a significant reduction in computational and resource demands without compromising on performance quality. It achieves comparable performance to SAM but with drastically reduced computational resources, enabling real-time application. |
||||
|
||||
3. **Prompt-guided Segmentation:** FastSAM can segment any object within an image guided by various possible user interaction prompts, providing flexibility and adaptability in different scenarios. |
||||
|
||||
4. **Based on YOLOv8-seg:** FastSAM is based on [YOLOv8-seg](../tasks/segment.md), an object detector equipped with an instance segmentation branch. This allows it to effectively produce the segmentation masks of all instances in an image. |
||||
|
||||
5. **Competitive Results on Benchmarks:** On the object proposal task on MS COCO, FastSAM achieves high scores at a significantly faster speed than [SAM](sam.md) on a single NVIDIA RTX 3090, demonstrating its efficiency and capability. |
||||
|
||||
6. **Practical Applications:** The proposed approach provides a new, practical solution for a large number of vision tasks at very high speed, tens or hundreds of times faster than current methods.
||||
|
||||
7. **Model Compression Feasibility:** FastSAM demonstrates the feasibility of a path that can significantly reduce the computational effort by introducing an artificial prior to the structure, thus opening new possibilities for large model architecture for general vision tasks. |
||||
|
||||
## Usage |
||||
|
||||
### Python API |
||||
|
||||
The FastSAM models are easy to integrate into your Python applications. Ultralytics provides a user-friendly Python API to streamline the process. |
||||
|
||||
#### Predict Usage |
||||
|
||||
To run inference on an image, use the `predict` method as shown below:
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
```python |
||||
from ultralytics import FastSAM |
||||
from ultralytics.models.fastsam import FastSAMPrompt |
||||
|
||||
# Define an inference source |
||||
source = 'path/to/bus.jpg' |
||||
|
||||
# Create a FastSAM model |
||||
model = FastSAM('FastSAM-s.pt') # or FastSAM-x.pt |
||||
|
||||
# Run inference on an image |
||||
everything_results = model(source, device='cpu', retina_masks=True, imgsz=1024, conf=0.4, iou=0.9) |
||||
|
||||
# Prepare a Prompt Process object |
||||
prompt_process = FastSAMPrompt(source, everything_results, device='cpu') |
||||
|
||||
# Everything prompt |
||||
ann = prompt_process.everything_prompt() |
||||
|
||||
# Bbox default shape [0,0,0,0] -> [x1,y1,x2,y2] |
||||
ann = prompt_process.box_prompt(bbox=[200, 200, 300, 300]) |
||||
|
||||
# Text prompt |
||||
ann = prompt_process.text_prompt(text='a photo of a dog') |
||||
|
||||
# Point prompt |
||||
# points default [[0,0]] [[x1,y1],[x2,y2]] |
||||
# point_label default [0] [1,0] 0:background, 1:foreground |
||||
ann = prompt_process.point_prompt(points=[[200, 200]], pointlabel=[1]) |
||||
prompt_process.plot(annotations=ann, output='./') |
||||
``` |
||||
|
||||
=== "CLI" |
||||
```bash |
||||
# Load a FastSAM model and segment everything with it |
||||
yolo segment predict model=FastSAM-s.pt source=path/to/bus.jpg imgsz=640 |
||||
``` |
||||
|
||||
This snippet demonstrates the simplicity of loading a pre-trained model and running a prediction on an image. |
||||
|
||||
#### Val Usage |
||||
|
||||
Validation of the model on a dataset can be done as follows: |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
```python |
||||
from ultralytics import FastSAM |
||||
|
||||
# Create a FastSAM model |
||||
model = FastSAM('FastSAM-s.pt') # or FastSAM-x.pt |
||||
|
||||
# Validate the model |
||||
results = model.val(data='coco8-seg.yaml') |
||||
``` |
||||
|
||||
=== "CLI" |
||||
```bash |
||||
# Load a FastSAM model and validate it on the COCO8 example dataset at image size 640 |
||||
yolo segment val model=FastSAM-s.pt data=coco8-seg.yaml imgsz=640
||||
``` |
||||
|
||||
Please note that FastSAM only supports detection and segmentation of a single class of object. This means it will recognize and segment all objects as the same class. Therefore, when preparing the dataset, you need to convert all object category IDs to 0. |
||||
|
||||
### FastSAM Official Usage
||||
|
||||
FastSAM is also available directly from the [https://github.com/CASIA-IVA-Lab/FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) repository. Here is a brief overview of the typical steps you might take to use FastSAM: |
||||
|
||||
#### Installation |
||||
|
||||
1. Clone the FastSAM repository: |
||||
```shell |
||||
git clone https://github.com/CASIA-IVA-Lab/FastSAM.git |
||||
``` |
||||
|
||||
2. Create and activate a Conda environment with Python 3.9: |
||||
```shell |
||||
conda create -n FastSAM python=3.9 |
||||
conda activate FastSAM |
||||
``` |
||||
|
||||
3. Navigate to the cloned repository and install the required packages: |
||||
```shell |
||||
cd FastSAM |
||||
pip install -r requirements.txt |
||||
``` |
||||
|
||||
4. Install the CLIP model: |
||||
```shell |
||||
pip install git+https://github.com/openai/CLIP.git |
||||
``` |
||||
|
||||
#### Example Usage |
||||
|
||||
1. Download a [model checkpoint](https://drive.google.com/file/d/1m1sjY4ihXBU1fZXdQ-Xdj-mDltW-2Rqv/view?usp=sharing). |
||||
|
||||
2. Use FastSAM for inference. Example commands: |
||||
|
||||
- Segment everything in an image: |
||||
```shell |
||||
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg |
||||
``` |
||||
|
||||
- Segment specific objects using text prompt: |
||||
```shell |
||||
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --text_prompt "the yellow dog" |
||||
``` |
||||
|
||||
- Segment objects within a bounding box (provide box coordinates in xywh format): |
||||
```shell |
||||
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --box_prompt "[570,200,230,400]" |
||||
``` |
||||
|
||||
- Segment objects near specific points: |
||||
```shell |
||||
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --point_prompt "[[520,360],[620,300]]" --point_label "[1,0]" |
||||
``` |
||||
|
||||
Additionally, you can try FastSAM through a [Colab demo](https://colab.research.google.com/drive/1oX14f6IneGGw612WgVlAiy91UHwFAvr9?usp=sharing) or on the [HuggingFace web demo](https://huggingface.co/spaces/An-619/FastSAM) for a visual experience. |
||||
|
||||
## Citations and Acknowledgements |
||||
|
||||
We would like to acknowledge the FastSAM authors for their significant contributions in the field of real-time instance segmentation: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{zhao2023fast, |
||||
title={Fast Segment Anything}, |
||||
author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang}, |
||||
year={2023}, |
||||
eprint={2306.12156}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
The original FastSAM paper can be found on [arXiv](https://arxiv.org/abs/2306.12156). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/CASIA-IVA-Lab/FastSAM). We appreciate their efforts in advancing the field and making their work accessible to the broader community. |
@ -0,0 +1,109 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn more about MobileSAM, its implementation, comparison with the original SAM, and how to download and test it in the Ultralytics framework. Improve your mobile applications today. |
||||
keywords: MobileSAM, Ultralytics, SAM, mobile applications, Arxiv, GPU, API, image encoder, mask decoder, model download, testing method |
||||
--- |
||||
|
||||
 |
||||
|
||||
# Mobile Segment Anything (MobileSAM) |
||||
|
||||
The MobileSAM paper is now available on [arXiv](https://arxiv.org/pdf/2306.14289.pdf). |
||||
|
||||
A demonstration of MobileSAM running on a CPU can be accessed at this [demo link](https://huggingface.co/spaces/dhkim2810/MobileSAM). On a Mac i5 CPU, inference takes approximately 3 seconds. On the Hugging Face demo, the interface and lower-performance CPUs contribute to a slower response, but it continues to function effectively.
||||
|
||||
MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [Segment Anything in 3D](https://github.com/Jumpat/SegmentAnythingin3D). |
||||
|
||||
MobileSAM is trained on a single GPU with a 100k dataset (1% of the original images) in less than a day. The code for this training will be made available in the future. |
||||
|
||||
## Adapting from SAM to MobileSAM |
||||
|
||||
Since MobileSAM retains the same pipeline as the original SAM, we have incorporated the original's pre-processing, post-processing, and all other interfaces. Consequently, those currently using the original SAM can transition to MobileSAM with minimal effort. |
||||
|
||||
MobileSAM performs comparably to the original SAM and retains the same pipeline except for a change in the image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M) with a smaller Tiny-ViT (5M). On a single GPU, MobileSAM operates at about 12ms per image: 8ms on the image encoder and 4ms on the mask decoder. |
||||
|
||||
The following table provides a comparison of ViT-based image encoders: |
||||
|
||||
| Image Encoder | Original SAM | MobileSAM | |
||||
|---------------|--------------|-----------| |
||||
| Parameters | 611M | 5M | |
||||
| Speed | 452ms | 8ms | |
||||
|
||||
Both the original SAM and MobileSAM utilize the same prompt-guided mask decoder: |
||||
|
||||
| Mask Decoder | Original SAM | MobileSAM | |
||||
|--------------|--------------|-----------| |
||||
| Parameters | 3.876M | 3.876M | |
||||
| Speed | 4ms | 4ms | |
||||
|
||||
Here is the comparison of the whole pipeline: |
||||
|
||||
| Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM | |
||||
|--------------------------|--------------|-----------| |
||||
| Parameters | 615M | 9.66M | |
||||
| Speed | 456ms | 12ms | |
||||
|
||||
The performance of MobileSAM and the original SAM are demonstrated using both a point and a box as prompts. |
||||
|
||||
 |
||||
|
||||
 |
||||
|
||||
With its superior performance, MobileSAM is approximately 5 times smaller and 7 times faster than the current FastSAM. More details are available at the [MobileSAM project page](https://github.com/ChaoningZhang/MobileSAM). |
||||
|
||||
## Testing MobileSAM in Ultralytics |
||||
|
||||
Just like the original SAM, we offer a straightforward testing method in Ultralytics, including modes for both Point and Box prompts. |
||||
|
||||
### Model Download |
||||
|
||||
You can download the model [here](https://github.com/ChaoningZhang/MobileSAM/blob/master/weights/mobile_sam.pt). |
||||
|
||||
### Point Prompt |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
```python |
||||
from ultralytics import SAM |
||||
|
||||
# Load the model |
||||
model = SAM('mobile_sam.pt') |
||||
|
||||
# Predict a segment based on a point prompt |
||||
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1]) |
||||
``` |
||||
|
||||
### Box Prompt |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
```python |
||||
from ultralytics import SAM |
||||
|
||||
# Load the model |
||||
model = SAM('mobile_sam.pt') |
||||
|
||||
# Predict a segment based on a box prompt |
||||
model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709]) |
||||
``` |
||||
|
||||
We have implemented `MobileSAM` and `SAM` using the same API. For more usage information, please see the [SAM page](./sam.md). |
||||
|
||||
## Citations and Acknowledgements |
||||
|
||||
If you find MobileSAM useful in your research or development work, please consider citing our paper: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{mobile_sam, |
||||
title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications}, |
||||
author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon}, |
||||
journal={arXiv preprint arXiv:2306.14289}, |
||||
year={2023} |
||||
} |
||||
``` |
@ -0,0 +1,101 @@ |
||||
--- |
||||
comments: true |
||||
description: Discover the features and benefits of RT-DETR, Baidu’s efficient and adaptable real-time object detector powered by Vision Transformers, including pre-trained models. |
||||
keywords: RT-DETR, Baidu, Vision Transformers, object detection, real-time performance, CUDA, TensorRT, IoU-aware query selection, Ultralytics, Python API, PaddlePaddle |
||||
--- |
||||
|
||||
# Baidu's RT-DETR: A Vision Transformer-Based Real-Time Object Detector |
||||
|
||||
## Overview |
||||
|
||||
Real-Time Detection Transformer (RT-DETR), developed by Baidu, is a cutting-edge end-to-end object detector that provides real-time performance while maintaining high accuracy. It leverages the power of Vision Transformers (ViT) to efficiently process multiscale features by decoupling intra-scale interaction and cross-scale fusion. RT-DETR is highly adaptable, supporting flexible adjustment of inference speed using different decoder layers without retraining. The model excels on accelerated backends like CUDA with TensorRT, outperforming many other real-time object detectors. |
||||
|
||||
 |
||||
**Overview of Baidu's RT-DETR.** The RT-DETR model architecture diagram shows the last three stages of the backbone {S3, S4, S5} as the input to the encoder. The efficient hybrid encoder transforms multiscale features into a sequence of image features through intrascale feature interaction (AIFI) and cross-scale feature-fusion module (CCFM). The IoU-aware query selection is employed to select a fixed number of image features to serve as initial object queries for the decoder. Finally, the decoder with auxiliary prediction heads iteratively optimizes object queries to generate boxes and confidence scores ([source](https://arxiv.org/pdf/2304.08069.pdf)). |
||||
|
||||
### Key Features |
||||
|
||||
- **Efficient Hybrid Encoder:** Baidu's RT-DETR uses an efficient hybrid encoder that processes multiscale features by decoupling intra-scale interaction and cross-scale fusion. This unique Vision Transformers-based design reduces computational costs and allows for real-time object detection. |
||||
- **IoU-aware Query Selection:** Baidu's RT-DETR improves object query initialization by utilizing IoU-aware query selection. This allows the model to focus on the most relevant objects in the scene, enhancing the detection accuracy. |
||||
- **Adaptable Inference Speed:** Baidu's RT-DETR supports flexible adjustments of inference speed by using different decoder layers without the need for retraining. This adaptability facilitates practical application in various real-time object detection scenarios. |
||||
|
||||
## Pre-trained Models |
||||
|
||||
The Ultralytics Python API provides pre-trained PaddlePaddle RT-DETR models with different scales: |
||||
|
||||
- RT-DETR-L: 53.0% AP on COCO val2017, 114 FPS on T4 GPU |
||||
- RT-DETR-X: 54.8% AP on COCO val2017, 74 FPS on T4 GPU |
||||
|
||||
## Usage |
||||
|
||||
You can use RT-DETR for object detection tasks using the `ultralytics` pip package. The following is a sample code snippet showing how to use RT-DETR models for training and inference: |
||||
|
||||
!!! example "" |
||||
|
||||
This example provides simple inference code for RT-DETR. For more options including handling inference results see [Predict](../modes/predict.md) mode. For using RT-DETR with additional modes see [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md). |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import RTDETR |
||||
|
||||
# Load a COCO-pretrained RT-DETR-l model |
||||
model = RTDETR('rtdetr-l.pt') |
||||
|
||||
# Display model information (optional) |
||||
model.info() |
||||
|
||||
# Train the model on the COCO8 example dataset for 100 epochs |
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640) |
||||
|
||||
# Run inference with the RT-DETR-l model on the 'bus.jpg' image |
||||
results = model('path/to/bus.jpg') |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs |
||||
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640 |
||||
|
||||
# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image |
||||
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg |
||||
``` |
||||
|
||||
### Supported Tasks |
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | |
||||
|---------------------|---------------------|------------------| |
||||
| RT-DETR Large | `rtdetr-l.pt` | Object Detection | |
||||
| RT-DETR Extra-Large | `rtdetr-x.pt` | Object Detection | |
||||
|
||||
### Supported Modes |
||||
|
||||
| Mode | Supported | |
||||
|------------|--------------------| |
||||
| Inference | :heavy_check_mark: | |
||||
| Validation | :heavy_check_mark: | |
||||
| Training | :heavy_check_mark: | |
||||
|
||||
## Citations and Acknowledgements |
||||
|
||||
If you use Baidu's RT-DETR in your research or development work, please cite the [original paper](https://arxiv.org/abs/2304.08069): |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{lv2023detrs, |
||||
title={DETRs Beat YOLOs on Real-time Object Detection}, |
||||
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu}, |
||||
year={2023}, |
||||
eprint={2304.08069}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We would like to acknowledge Baidu and the [PaddlePaddle](https://github.com/PaddlePaddle/PaddleDetection) team for creating and maintaining this valuable resource for the computer vision community. Their contribution to the field with the development of the Vision Transformers-based real-time object detector, RT-DETR, is greatly appreciated. |
||||
|
||||
*Keywords: RT-DETR, Transformer, ViT, Vision Transformers, Baidu RT-DETR, PaddlePaddle, Paddle Paddle RT-DETR, real-time object detection, Vision Transformers-based object detection, pre-trained PaddlePaddle RT-DETR models, Baidu's RT-DETR usage, Ultralytics Python API* |
@ -1,36 +1,233 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore the cutting-edge Segment Anything Model (SAM) from Ultralytics that allows real-time image segmentation. Learn about its promptable segmentation, zero-shot performance, and how to use it. |
||||
keywords: Ultralytics, image segmentation, Segment Anything Model, SAM, SA-1B dataset, real-time performance, zero-shot transfer, object detection, image analysis, machine learning |
||||
--- |
||||
|
||||
# Segment Anything Model (SAM) |
||||
|
||||
Welcome to the frontier of image segmentation with the Segment Anything Model, or SAM. This revolutionary model has changed the game by introducing promptable image segmentation with real-time performance, setting new standards in the field. |
||||
|
||||
## Introduction to SAM: The Segment Anything Model |
||||
|
||||
The Segment Anything Model, or SAM, is a cutting-edge image segmentation model that allows for promptable segmentation, providing unparalleled versatility in image analysis tasks. SAM forms the heart of the Segment Anything initiative, a groundbreaking project that introduces a novel model, task, and dataset for image segmentation. |
||||
|
||||
SAM's advanced design allows it to adapt to new image distributions and tasks without prior knowledge, a feature known as zero-shot transfer. Trained on the expansive [SA-1B dataset](https://ai.facebook.com/datasets/segment-anything/), which contains more than 1 billion masks spread over 11 million carefully curated images, SAM has displayed impressive zero-shot performance, surpassing previous fully supervised results in many cases. |
||||
|
||||
 |
||||
Example images with overlaid masks from our newly introduced dataset, SA-1B. SA-1B contains 11M diverse, high-resolution, licensed, and privacy protecting images and 1.1B high-quality segmentation masks. These masks were annotated fully automatically by SAM, and as verified by human ratings and numerous experiments, are of high quality and diversity. Images are grouped by number of masks per image for visualization (there are ∼100 masks per image on average). |
||||
|
||||
## Key Features of the Segment Anything Model (SAM) |
||||
|
||||
- **Promptable Segmentation Task:** SAM was designed with a promptable segmentation task in mind, allowing it to generate valid segmentation masks from any given prompt, such as spatial or text clues identifying an object. |
||||
- **Advanced Architecture:** The Segment Anything Model employs a powerful image encoder, a prompt encoder, and a lightweight mask decoder. This unique architecture enables flexible prompting, real-time mask computation, and ambiguity awareness in segmentation tasks. |
||||
- **The SA-1B Dataset:** Introduced by the Segment Anything project, the SA-1B dataset features over 1 billion masks on 11 million images. As the largest segmentation dataset to date, it provides SAM with a diverse and large-scale training data source. |
||||
- **Zero-Shot Performance:** SAM displays outstanding zero-shot performance across various segmentation tasks, making it a ready-to-use tool for diverse applications with minimal need for prompt engineering. |
||||
|
||||
For an in-depth look at the Segment Anything Model and the SA-1B dataset, please visit the [Segment Anything website](https://segment-anything.com) and check out the research paper [Segment Anything](https://arxiv.org/abs/2304.02643). |
||||
|
||||
## How to Use SAM: Versatility and Power in Image Segmentation |
||||
|
||||
The Segment Anything Model can be employed for a multitude of downstream tasks that go beyond its training data. This includes edge detection, object proposal generation, instance segmentation, and preliminary text-to-mask prediction. With prompt engineering, SAM can swiftly adapt to new tasks and data distributions in a zero-shot manner, establishing it as a versatile and potent tool for all your image segmentation needs. |
||||
|
||||
### SAM prediction example |
||||
|
||||
!!! example "Segment with prompts" |
||||
|
||||
Segment image with given prompts. |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import SAM |
||||
|
||||
# Load a model |
||||
model = SAM('sam_b.pt') |
||||
|
||||
# Display model information (optional) |
||||
model.info() |
||||
|
||||
# Run inference with bboxes prompt |
||||
model('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709]) |
||||
|
||||
# Run inference with points prompt |
||||
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1]) |
||||
``` |
||||
|
||||
!!! example "Segment everything" |
||||
|
||||
Segment the whole image. |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import SAM |
||||
|
||||
# Load a model |
||||
model = SAM('sam_b.pt') |
||||
|
||||
# Display model information (optional) |
||||
model.info() |
||||
|
||||
# Run inference |
||||
model('path/to/image.jpg') |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Run inference with a SAM model |
||||
yolo predict model=sam_b.pt source=path/to/image.jpg |
||||
``` |
||||
|
||||
- The logic here is to segment the whole image if you don't pass any prompts (bboxes/points/masks).
||||
|
||||
!!! example "SAMPredictor example" |
||||
|
||||
This way you can set the image once and run prompt inference multiple times without running the image encoder multiple times.
||||
|
||||
=== "Prompt inference" |
||||
|
||||
```python |
||||
import cv2

from ultralytics.models.sam import Predictor as SAMPredictor
||||
|
||||
# Create SAMPredictor |
||||
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt") |
||||
predictor = SAMPredictor(overrides=overrides) |
||||
|
||||
# Set image |
||||
predictor.set_image("ultralytics/assets/zidane.jpg") # set with image file |
||||
predictor.set_image(cv2.imread("ultralytics/assets/zidane.jpg")) # set with np.ndarray |
||||
results = predictor(bboxes=[439, 437, 524, 709]) |
||||
results = predictor(points=[900, 370], labels=[1]) |
||||
|
||||
# Reset image |
||||
predictor.reset_image() |
||||
``` |
||||
|
||||
Segment everything with additional args. |
||||
|
||||
=== "Segment everything" |
||||
|
||||
```python |
||||
from ultralytics.models.sam import Predictor as SAMPredictor |
||||
|
||||
# Create SAMPredictor |
||||
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt") |
||||
predictor = SAMPredictor(overrides=overrides) |
||||
|
||||
# Segment with additional args |
||||
results = predictor(source="ultralytics/assets/zidane.jpg", crop_n_layers=1, points_stride=64) |
||||
``` |
||||
|
||||
- For additional args for `Segment everything`, see the [`Predictor/generate` Reference](../reference/models/sam/predict.md).
||||
|
||||
## Available Models and Supported Tasks |
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | |
||||
|------------|---------------------|-----------------------| |
||||
| SAM base | `sam_b.pt` | Instance Segmentation | |
||||
| SAM large | `sam_l.pt` | Instance Segmentation | |
||||
|
||||
## Operating Modes |
||||
|
||||
| Mode | Supported | |
||||
|------------|--------------------| |
||||
| Inference | :heavy_check_mark: | |
||||
| Validation | :x: | |
||||
| Training | :x: | |
||||
|
||||
## SAM comparison vs YOLOv8 |
||||
|
||||
Here we compare Meta's smallest SAM model, SAM-b, with Ultralytics' smallest segmentation model, [YOLOv8n-seg](../tasks/segment.md):
||||
|
||||
| Model | Size | Parameters | Speed (CPU) | |
||||
|------------------------------------------------|----------------------------|------------------------|----------------------------| |
||||
| Meta's SAM-b | 358 MB | 94.7 M | 51096 ms/im | |
||||
| [MobileSAM](mobile-sam.md) | 40.7 MB | 10.1 M | 46122 ms/im | |
||||
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone | 23.7 MB | 11.8 M | 115 ms/im | |
||||
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7 MB** (53.4x smaller) | **3.4 M** (27.9x less) | **59 ms/im** (866x faster) | |
||||
|
||||
This comparison shows the order-of-magnitude differences in model size and speed between the models. While SAM offers unique capabilities for automatic segmentation, it is not a direct competitor to YOLOv8 segment models, which are smaller, faster and more efficient.
||||
|
||||
Tests run on a 2023 Apple M2 MacBook with 16GB of RAM. To reproduce this test:
||||
|
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
```python |
||||
from ultralytics import FastSAM, SAM, YOLO |
||||
|
||||
# Profile SAM-b |
||||
model = SAM('sam_b.pt') |
||||
model.info() |
||||
model('ultralytics/assets') |
||||
|
||||
# Profile MobileSAM |
||||
model = SAM('mobile_sam.pt') |
||||
model.info() |
||||
model('ultralytics/assets') |
||||
|
||||
# Profile FastSAM-s |
||||
model = FastSAM('FastSAM-s.pt') |
||||
model.info() |
||||
model('ultralytics/assets') |
||||
|
||||
# Profile YOLOv8n-seg |
||||
model = YOLO('yolov8n-seg.pt') |
||||
model.info() |
||||
model('ultralytics/assets') |
||||
``` |
||||
|
||||
## Auto-Annotation: A Quick Path to Segmentation Datasets |
||||
|
||||
Auto-annotation is a key feature of SAM, allowing users to generate a [segmentation dataset](https://docs.ultralytics.com/datasets/segment) using a pre-trained detection model. This feature enables rapid and accurate annotation of a large number of images, bypassing the need for time-consuming manual labeling. |
||||
|
||||
### Generate Your Segmentation Dataset Using a Detection Model |
||||
|
||||
To auto-annotate your dataset with the Ultralytics framework, use the `auto_annotate` function as shown below: |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
```python |
||||
from ultralytics.data.annotator import auto_annotate |
||||
|
||||
auto_annotate(data="path/to/images", det_model="yolov8x.pt", sam_model='sam_b.pt') |
||||
``` |
||||
|
||||
| Argument | Type | Description | Default | |
||||
|------------|---------------------|---------------------------------------------------------------------------------------------------------|--------------| |
||||
| data | str | Path to a folder containing images to be annotated. | | |
||||
| det_model | str, optional | Pre-trained YOLO detection model. Defaults to 'yolov8x.pt'. | 'yolov8x.pt' | |
||||
| sam_model | str, optional | Pre-trained SAM segmentation model. Defaults to 'sam_b.pt'. | 'sam_b.pt' | |
||||
| device | str, optional | Device to run the models on. Defaults to an empty string (CPU or GPU, if available). | | |
||||
| output_dir | str, None, optional | Directory to save the annotated results. Defaults to a 'labels' folder in the same directory as 'data'. | None | |
||||
|
||||
The `auto_annotate` function takes the path to your images, with optional arguments for specifying the pre-trained detection and SAM segmentation models, the device to run the models on, and the output directory for saving the annotated results. |
||||
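For reference, here is a sketch that sets the optional arguments explicitly; the device string and output directory are illustrative placeholders.

```python
from ultralytics.data.annotator import auto_annotate

# Auto-annotate with the optional arguments spelled out (illustrative values)
auto_annotate(
    data="path/to/images",        # folder of images to annotate
    det_model="yolov8x.pt",       # pre-trained YOLO detection model
    sam_model="sam_b.pt",         # pre-trained SAM segmentation model
    device="cpu",                 # or e.g. "0" to use the first CUDA device
    output_dir="path/to/labels",  # where the generated label files are written
)
```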
|
||||
Auto-annotation with pre-trained models can dramatically cut down the time and effort required for creating high-quality segmentation datasets. This feature is especially beneficial for researchers and developers dealing with large image collections, as it allows them to focus on model development and evaluation rather than manual annotation. |
||||
|
||||
## Citations and Acknowledgements |
||||
|
||||
If you find SAM useful in your research or development work, please consider citing the paper:
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{kirillov2023segment, |
||||
title={Segment Anything}, |
||||
author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick}, |
||||
year={2023}, |
||||
eprint={2304.02643}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
We would like to express our gratitude to Meta AI for creating and maintaining this valuable resource for the computer vision community. |
||||
|
||||
*keywords: Segment Anything, Segment Anything Model, SAM, Meta SAM, image segmentation, promptable segmentation, zero-shot performance, SA-1B dataset, advanced architecture, auto-annotation, Ultralytics, pre-trained models, SAM base, SAM large, instance segmentation, computer vision, AI, artificial intelligence, machine learning, data annotation, segmentation masks, detection model, YOLO detection model, bibtex, Meta AI.* |
||||
|
@ -0,0 +1,127 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore detailed documentation of YOLO-NAS, a superior object detection model. Learn about its features, pre-trained models, usage with Ultralytics Python API, and more. |
||||
keywords: YOLO-NAS, Deci AI, object detection, deep learning, neural architecture search, Ultralytics Python API, YOLO model, pre-trained models, quantization, optimization, COCO, Objects365, Roboflow 100 |
||||
--- |
||||
|
||||
# YOLO-NAS |
||||
|
||||
## Overview |
||||
|
||||
Developed by Deci AI, YOLO-NAS is a groundbreaking object detection foundational model. It is the product of advanced Neural Architecture Search technology, meticulously designed to address the limitations of previous YOLO models. With significant improvements in quantization support and accuracy-latency trade-offs, YOLO-NAS represents a major leap in object detection. |
||||
|
||||
 |
||||
**Overview of YOLO-NAS.** YOLO-NAS employs quantization-aware blocks and selective quantization for optimal performance. The model, when converted to its INT8 quantized version, experiences a minimal precision drop, a significant improvement over other models. These advancements culminate in a superior architecture with unprecedented object detection capabilities and outstanding performance. |
||||
|
||||
### Key Features |
||||
|
||||
- **Quantization-Friendly Basic Block:** YOLO-NAS introduces a new basic block that is friendly to quantization, addressing one of the significant limitations of previous YOLO models. |
||||
- **Sophisticated Training and Quantization:** YOLO-NAS leverages advanced training schemes and post-training quantization to enhance performance. |
||||
- **AutoNAC Optimization and Pre-training:** YOLO-NAS utilizes AutoNAC optimization and is pre-trained on prominent datasets such as COCO, Objects365, and Roboflow 100. This pre-training makes it extremely suitable for downstream object detection tasks in production environments. |
||||
|
||||
## Pre-trained Models |
||||
|
||||
Experience the power of next-generation object detection with the pre-trained YOLO-NAS models provided by Ultralytics. These models are designed to deliver top-notch performance in terms of both speed and accuracy. Choose from a variety of options tailored to your specific needs: |
||||
|
||||
| Model | mAP | Latency (ms) | |
||||
|------------------|-------|--------------| |
||||
| YOLO-NAS S | 47.5 | 3.21 | |
||||
| YOLO-NAS M | 51.55 | 5.85 | |
||||
| YOLO-NAS L | 52.22 | 7.87 | |
||||
| YOLO-NAS S INT-8 | 47.03 | 2.36 | |
||||
| YOLO-NAS M INT-8 | 51.0 | 3.78 | |
||||
| YOLO-NAS L INT-8 | 52.1 | 4.78 | |
||||
|
||||
Each model variant is designed to offer a balance between Mean Average Precision (mAP) and latency, helping you optimize your object detection tasks for both performance and speed. |
||||
|
||||
## Usage |
||||
|
||||
Ultralytics has made YOLO-NAS models easy to integrate into your Python applications via our `ultralytics` python package. The package provides a user-friendly Python API to streamline the process. |
||||
|
||||
The following examples show how to use YOLO-NAS models with the `ultralytics` package for inference and validation: |
||||
|
||||
### Inference and Validation Examples |
||||
|
||||
In this example we validate YOLO-NAS-s on the COCO8 dataset. |
||||
|
||||
!!! example "" |
||||
|
||||
This example provides simple inference and validation code for YOLO-NAS. For handling inference results, see [Predict](../modes/predict.md) mode. For using YOLO-NAS with additional modes, see [Val](../modes/val.md) and [Export](../modes/export.md). Note that YOLO-NAS in the `ultralytics` package does not support training.
||||
|
||||
=== "Python" |
||||
|
||||
Pretrained PyTorch `*.pt` model files can be passed to the `NAS()` class to create a model instance in Python:
||||
|
||||
```python |
||||
from ultralytics import NAS |
||||
|
||||
# Load a COCO-pretrained YOLO-NAS-s model |
||||
model = NAS('yolo_nas_s.pt') |
||||
|
||||
# Display model information (optional) |
||||
model.info() |
||||
|
||||
# Validate the model on the COCO8 example dataset |
||||
results = model.val(data='coco8.yaml') |
||||
|
||||
# Run inference with the YOLO-NAS-s model on the 'bus.jpg' image |
||||
results = model('path/to/bus.jpg') |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
CLI commands are available to directly run the models: |
||||
|
||||
```bash |
||||
# Load a COCO-pretrained YOLO-NAS-s model and validate its performance on the COCO8 example dataset
||||
yolo val model=yolo_nas_s.pt data=coco8.yaml |
||||
|
||||
# Load a COCO-pretrained YOLO-NAS-s model and run inference on the 'bus.jpg' image |
||||
yolo predict model=yolo_nas_s.pt source=path/to/bus.jpg |
||||
``` |
||||
|
||||
### Supported Tasks |
||||
|
||||
The YOLO-NAS models are primarily designed for object detection tasks. You can download the pre-trained weights for each variant of the model as follows: |
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | |
||||
|------------|-----------------------------------------------------------------------------------------------|------------------| |
||||
| YOLO-NAS-s | [yolo_nas_s.pt](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolo_nas_s.pt) | Object Detection | |
||||
| YOLO-NAS-m | [yolo_nas_m.pt](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolo_nas_m.pt) | Object Detection | |
||||
| YOLO-NAS-l | [yolo_nas_l.pt](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolo_nas_l.pt) | Object Detection | |
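
If you prefer to fetch the weights ahead of time, a minimal sketch using `curl` (any download tool works) might look like the following; the URLs are the release links from the table above:

```bash
# Download the YOLO-NAS small weights from the Ultralytics assets release (pick the variant you need)
curl -L -o yolo_nas_s.pt https://github.com/ultralytics/assets/releases/download/v0.0.0/yolo_nas_s.pt
```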
||||
|
||||
### Supported Modes |
||||
|
||||
The YOLO-NAS models support both inference and validation modes, allowing you to predict and validate results with ease. Training mode, however, is currently not supported. |
||||
|
||||
| Mode | Supported | |
||||
|------------|--------------------| |
||||
| Inference | :heavy_check_mark: | |
||||
| Validation | :heavy_check_mark: | |
||||
| Training | :x: | |
||||
|
||||
Harness the power of the YOLO-NAS models to drive your object detection tasks to new heights of performance and speed. |
||||
|
||||
## Citations and Acknowledgements |
||||
|
||||
If you employ YOLO-NAS in your research or development work, please cite SuperGradients: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{supergradients, |
||||
doi = {10.5281/ZENODO.7789328}, |
||||
url = {https://zenodo.org/record/7789328}, |
||||
author = {Aharon, Shay and {Louis-Dupont} and {Ofri Masad} and Yurkova, Kate and {Lotem Fridman} and {Lkdci} and Khvedchenya, Eugene and Rubin, Ran and Bagrov, Natan and Tymchenko, Borys and Keren, Tomer and Zhilko, Alexander and {Eran-Deci}}, |
||||
title = {Super-Gradients}, |
||||
publisher = {GitHub}, |
||||
journal = {GitHub repository}, |
||||
year = {2021}, |
||||
} |
||||
``` |
||||
|
||||
We express our gratitude to Deci AI's [SuperGradients](https://github.com/Deci-AI/super-gradients/) team for their efforts in creating and maintaining this valuable resource for the computer vision community. We believe YOLO-NAS, with its innovative architecture and superior object detection capabilities, will become a critical tool for developers and researchers alike. |
||||
|
||||
*Keywords: YOLO-NAS, Deci AI, object detection, deep learning, neural architecture search, Ultralytics Python API, YOLO model, SuperGradients, pre-trained models, quantization-friendly basic block, advanced training schemes, post-training quantization, AutoNAC optimization, COCO, Objects365, Roboflow 100* |
@ -0,0 +1,71 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore our detailed guide on YOLOv4, a state-of-the-art real-time object detector. Understand its architectural highlights, innovative features, and application examples. |
||||
keywords: ultralytics, YOLOv4, object detection, neural network, real-time detection, object detector, machine learning |
||||
--- |
||||
|
||||
# YOLOv4: High-Speed and Precise Object Detection |
||||
|
||||
Welcome to the Ultralytics documentation page for YOLOv4, a state-of-the-art, real-time object detector launched in 2020 by Alexey Bochkovskiy at [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet). YOLOv4 is designed to provide the optimal balance between speed and accuracy, making it an excellent choice for many applications. |
||||
|
||||
 |
||||
**YOLOv4 architecture diagram**. Showcasing the intricate network design of YOLOv4, including the backbone, neck, and head components, and their interconnected layers for optimal real-time object detection. |
||||
|
||||
## Introduction |
||||
|
||||
YOLOv4 stands for You Only Look Once version 4. It is a real-time object detection model developed to address the limitations of previous YOLO versions like [YOLOv3](./yolov3.md) and other object detection models. Unlike other convolutional neural network (CNN) based object detectors, YOLOv4 is applicable not only to recommendation systems but also to standalone process management and human input reduction. Its operation on conventional graphics processing units (GPUs) allows for mass usage at an affordable price, and it is designed to work in real time on a conventional GPU while requiring only one such GPU for training.
||||
|
||||
## Architecture |
||||
|
||||
YOLOv4 makes use of several innovative features that work together to optimize its performance. These include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT), Mish-activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss. These features are combined to achieve state-of-the-art results. |
||||
|
||||
A typical object detector is composed of several parts, including the input, the backbone, the neck, and the head. The backbone of YOLOv4 is pre-trained on ImageNet, while the head is used to predict the classes and bounding boxes of objects. The backbone could be from several models, including VGG, ResNet, ResNeXt, or DenseNet. The neck part of the detector is used to collect feature maps from different stages and usually includes several bottom-up paths and several top-down paths. The head part is what is used to make the final object detections and classifications.
||||
|
||||
## Bag of Freebies |
||||
|
||||
YOLOv4 also makes use of methods known as "bag of freebies," which are techniques that improve the accuracy of the model during training without increasing the cost of inference. Data augmentation is a common bag of freebies technique used in object detection, which increases the variability of the input images to improve the robustness of the model. Some examples of data augmentation include photometric distortions (adjusting the brightness, contrast, hue, saturation, and noise of an image) and geometric distortions (adding random scaling, cropping, flipping, and rotating). These techniques help the model to generalize better to different types of images. |
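
To make the idea concrete, below is a minimal, illustrative sketch of a photometric and a geometric distortion using OpenCV and NumPy. It is not YOLOv4's actual augmentation pipeline (which also transforms bounding box labels and includes techniques such as Mosaic), and the file path is a placeholder:

```python
import cv2
import numpy as np


def photometric_distort(image, max_brightness=32, max_contrast=0.2):
    """Randomly shift brightness and scale contrast (photometric distortion)."""
    img = image.astype(np.float32)
    img += np.random.uniform(-max_brightness, max_brightness)     # brightness shift
    img *= np.random.uniform(1 - max_contrast, 1 + max_contrast)  # contrast scaling
    return np.clip(img, 0, 255).astype(np.uint8)


def geometric_distort(image):
    """Randomly flip and rescale the image (geometric distortion)."""
    if np.random.rand() < 0.5:
        image = cv2.flip(image, 1)  # horizontal flip
    scale = np.random.uniform(0.75, 1.25)
    h, w = image.shape[:2]
    return cv2.resize(image, (int(w * scale), int(h * scale)))


image = cv2.imread('path/to/image.jpg')  # placeholder path
augmented = geometric_distort(photometric_distort(image))
```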
||||
|
||||
## Features and Performance |
||||
|
||||
YOLOv4 is designed for optimal speed and accuracy in object detection. The architecture of YOLOv4 includes CSPDarknet53 as the backbone, PANet as the neck, and YOLOv3 as the detection head. This design allows YOLOv4 to perform object detection at an impressive speed, making it suitable for real-time applications. YOLOv4 also excels in accuracy, achieving state-of-the-art results in object detection benchmarks. |
||||
|
||||
## Usage Examples |
||||
|
||||
As of the time of writing, Ultralytics does not currently support YOLOv4 models. Therefore, any users interested in using YOLOv4 will need to refer directly to the YOLOv4 GitHub repository for installation and usage instructions. |
||||
|
||||
Here is a brief overview of the typical steps you might take to use YOLOv4: |
||||
|
||||
1. Visit the YOLOv4 GitHub repository: [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet). |
||||
|
||||
2. Follow the instructions provided in the README file for installation. This typically involves cloning the repository, installing necessary dependencies, and setting up any necessary environment variables. |
||||
|
||||
3. Once installation is complete, you can train and use the model as per the usage instructions provided in the repository. This usually involves preparing your dataset, configuring the model parameters, training the model, and then using the trained model to perform object detection. |
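
As a rough sketch of these steps (the exact commands, build flags, and file names may differ; the darknet README is authoritative):

```bash
# Clone and build darknet (enable GPU=1, CUDNN=1, OPENCV=1 in the Makefile if needed)
git clone https://github.com/AlexeyAB/darknet
cd darknet
make

# Download yolov4.weights from the repository's releases page, then run detection on a sample image
./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg
```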
||||
|
||||
Please note that the specific steps may vary depending on your use case and the current state of the YOLOv4 repository. Therefore, it is strongly recommended to refer directly to the instructions provided in the YOLOv4 GitHub repository.
||||
|
||||
We regret any inconvenience this may cause and will strive to update this document with usage examples for Ultralytics once support for YOLOv4 is implemented. |
||||
|
||||
## Conclusion |
||||
|
||||
YOLOv4 is a powerful and efficient object detection model that strikes a balance between speed and accuracy. Its use of unique features and bag of freebies techniques during training allows it to perform excellently in real-time object detection tasks. YOLOv4 can be trained and used by anyone with a conventional GPU, making it accessible and practical for a wide range of applications. |
||||
|
||||
## Citations and Acknowledgements |
||||
|
||||
We would like to acknowledge the YOLOv4 authors for their significant contributions in the field of real-time object detection: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{bochkovskiy2020yolov4, |
||||
title={YOLOv4: Optimal Speed and Accuracy of Object Detection}, |
||||
author={Alexey Bochkovskiy and Chien-Yao Wang and Hong-Yuan Mark Liao}, |
||||
year={2020}, |
||||
eprint={2004.10934}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
The original YOLOv4 paper can be found on [arXiv](https://arxiv.org/pdf/2004.10934.pdf). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/AlexeyAB/darknet). We appreciate their efforts in advancing the field and making their work accessible to the broader community. |
@ -0,0 +1,114 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore Meituan YOLOv6, a state-of-the-art object detection model striking a balance between speed and accuracy. Dive into features, pre-trained models, and Python usage. |
||||
keywords: Meituan YOLOv6, object detection, Ultralytics, YOLOv6 docs, Bi-directional Concatenation, Anchor-Aided Training, pretrained models, real-time applications |
||||
--- |
||||
|
||||
# Meituan YOLOv6 |
||||
|
||||
## Overview |
||||
|
||||
[Meituan](https://about.meituan.com/) YOLOv6 is a cutting-edge object detector that offers remarkable balance between speed and accuracy, making it a popular choice for real-time applications. This model introduces several notable enhancements on its architecture and training scheme, including the implementation of a Bi-directional Concatenation (BiC) module, an anchor-aided training (AAT) strategy, and an improved backbone and neck design for state-of-the-art accuracy on the COCO dataset. |
||||
|
||||
 |
||||
**Overview of YOLOv6.** Model architecture diagram showing the redesigned network components and training strategies that have led to significant performance improvements. (a) The neck of YOLOv6 (N and S are shown). Note for M/L, RepBlocks is replaced with CSPStackRep. (b) The |
||||
structure of a BiC module. (c) A SimCSPSPPF block. ([source](https://arxiv.org/pdf/2301.05586.pdf)). |
||||
|
||||
### Key Features |
||||
|
||||
- **Bidirectional Concatenation (BiC) Module:** YOLOv6 introduces a BiC module in the neck of the detector, enhancing localization signals and delivering performance gains with negligible speed degradation. |
||||
- **Anchor-Aided Training (AAT) Strategy:** This model proposes AAT to enjoy the benefits of both anchor-based and anchor-free paradigms without compromising inference efficiency. |
||||
- **Enhanced Backbone and Neck Design:** By deepening YOLOv6 to include another stage in the backbone and neck, this model achieves state-of-the-art performance on the COCO dataset at high-resolution input. |
||||
- **Self-Distillation Strategy:** A new self-distillation strategy is implemented to boost the performance of smaller models of YOLOv6, enhancing the auxiliary regression branch during training and removing it at inference to avoid a marked speed decline. |
||||
|
||||
## Pre-trained Models |
||||
|
||||
YOLOv6 provides various pre-trained models with different scales: |
||||
|
||||
- YOLOv6-N: 37.5% AP on COCO val2017 at 1187 FPS with NVIDIA Tesla T4 GPU. |
||||
- YOLOv6-S: 45.0% AP at 484 FPS. |
||||
- YOLOv6-M: 50.0% AP at 226 FPS. |
||||
- YOLOv6-L: 52.8% AP at 116 FPS. |
||||
- YOLOv6-L6: State-of-the-art accuracy in real-time. |
||||
|
||||
YOLOv6 also provides quantized models for different precisions and models optimized for mobile platforms. |
||||
|
||||
## Usage |
||||
|
||||
You can use YOLOv6 for object detection tasks with the Ultralytics pip package. The following is a sample code snippet showing how to use YOLOv6 models for training:
||||
|
||||
!!! example "" |
||||
|
||||
This example provides simple training code for YOLOv6. For more options including training settings see [Train](../modes/train.md) mode. For using YOLOv6 with additional modes see [Predict](../modes/predict.md), [Val](../modes/val.md) and [Export](../modes/export.md). |
||||
|
||||
=== "Python" |
||||
|
||||
Pretrained PyTorch `*.pt` models, as well as configuration `*.yaml` files, can be passed to the `YOLO()` class to create a model instance in Python:
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Build a YOLOv6n model from scratch |
||||
model = YOLO('yolov6n.yaml') |
||||
|
||||
# Display model information (optional) |
||||
model.info() |
||||
|
||||
# Train the model on the COCO8 example dataset for 100 epochs |
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640) |
||||
|
||||
# Run inference with the YOLOv6n model on the 'bus.jpg' image |
||||
results = model('path/to/bus.jpg') |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
CLI commands are available to directly run the models: |
||||
|
||||
```bash |
||||
# Build a YOLOv6n model from scratch and train it on the COCO8 example dataset for 100 epochs |
||||
yolo train model=yolov6n.yaml data=coco8.yaml epochs=100 imgsz=640 |
||||
|
||||
# Build a YOLOv6n model from scratch and run inference on the 'bus.jpg' image |
||||
yolo predict model=yolov6n.yaml source=path/to/bus.jpg |
||||
``` |
||||
|
||||
### Supported Tasks |
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | |
||||
|------------|---------------------|------------------| |
||||
| YOLOv6-N | `yolov6-n.pt` | Object Detection | |
||||
| YOLOv6-S | `yolov6-s.pt` | Object Detection | |
||||
| YOLOv6-M | `yolov6-m.pt` | Object Detection | |
||||
| YOLOv6-L | `yolov6-l.pt` | Object Detection | |
||||
| YOLOv6-L6 | `yolov6-l6.pt` | Object Detection | |
||||
|
||||
## Supported Modes |
||||
|
||||
| Mode | Supported | |
||||
|------------|--------------------| |
||||
| Inference | :heavy_check_mark: | |
||||
| Validation | :heavy_check_mark: | |
||||
| Training | :heavy_check_mark: | |
||||
|
||||
## Citations and Acknowledgements |
||||
|
||||
We would like to acknowledge the authors for their significant contributions in the field of real-time object detection: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@misc{li2023yolov6, |
||||
title={YOLOv6 v3.0: A Full-Scale Reloading}, |
||||
author={Chuyi Li and Lulu Li and Yifei Geng and Hongliang Jiang and Meng Cheng and Bo Zhang and Zaidan Ke and Xiaoming Xu and Xiangxiang Chu}, |
||||
year={2023}, |
||||
eprint={2301.05586}, |
||||
archivePrefix={arXiv}, |
||||
primaryClass={cs.CV} |
||||
} |
||||
``` |
||||
|
||||
The original YOLOv6 paper can be found on [arXiv](https://arxiv.org/abs/2301.05586). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/meituan/YOLOv6). We appreciate their efforts in advancing the field and making their work accessible to the broader community. |
@ -0,0 +1,65 @@ |
||||
--- |
||||
comments: true |
||||
description: Explore the YOLOv7, a real-time object detector. Understand its superior speed, impressive accuracy, and unique trainable bag-of-freebies optimization focus. |
||||
keywords: YOLOv7, real-time object detector, state-of-the-art, Ultralytics, MS COCO dataset, model re-parameterization, dynamic label assignment, extended scaling, compound scaling |
||||
--- |
||||
|
||||
# YOLOv7: Trainable Bag-of-Freebies |
||||
|
||||
YOLOv7 is a state-of-the-art real-time object detector that surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS. It has the highest accuracy (56.8% AP) among all known real-time object detectors with 30 FPS or higher on GPU V100. Moreover, YOLOv7 outperforms other object detectors such as YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, and many others in speed and accuracy. The model is trained on the MS COCO dataset from scratch without using any other datasets or pre-trained weights. Source code for YOLOv7 is available on GitHub. |
||||
|
||||
 |
||||
**Comparison of state-of-the-art object detectors.** From the results in Table 2 we know that the proposed method has the best speed-accuracy trade-off overall. Comparing YOLOv7-tiny-SiLU with YOLOv5-N (r6.1), our method is 127 fps faster and 10.7% more accurate on AP. In addition, YOLOv7 achieves 51.4% AP at a frame rate of 161 fps, while PPYOLOE-L with the same AP reaches only 78 fps. In terms of parameter usage, YOLOv7 uses 41% fewer parameters than PPYOLOE-L. Comparing YOLOv7-X at 114 fps inference speed with YOLOv5-L (r6.1) at 99 fps, YOLOv7-X improves AP by 3.9%. Compared with YOLOv5-X (r6.1) of similar scale, the inference speed of YOLOv7-X is 31 fps faster. In addition, in terms of the amount of parameters and computation, YOLOv7-X reduces parameters by 22% and computation by 8% compared to YOLOv5-X (r6.1), while improving AP by 2.2% ([Source](https://arxiv.org/pdf/2207.02696.pdf)).
||||
|
||||
## Overview |
||||
|
||||
Real-time object detection is an important component in many computer vision systems, including multi-object tracking, autonomous driving, robotics, and medical image analysis. In recent years, real-time object detection development has focused on designing efficient architectures and improving the inference speed of various CPUs, GPUs, and neural processing units (NPUs). YOLOv7 supports both mobile GPU and GPU devices, from the edge to the cloud. |
||||
|
||||
Unlike traditional real-time object detectors that focus on architecture optimization, YOLOv7 introduces a focus on the optimization of the training process. This includes modules and optimization methods designed to improve the accuracy of object detection without increasing the inference cost, a concept known as the "trainable bag-of-freebies". |
||||
|
||||
## Key Features |
||||
|
||||
YOLOv7 introduces several key features: |
||||
|
||||
1. **Model Re-parameterization**: YOLOv7 proposes a planned re-parameterized model, which is a strategy applicable to layers in different networks with the concept of gradient propagation path. |
||||
|
||||
2. **Dynamic Label Assignment**: The training of the model with multiple output layers presents a new issue: "How to assign dynamic targets for the outputs of different branches?" To solve this problem, YOLOv7 introduces a new label assignment method called coarse-to-fine lead guided label assignment. |
||||
|
||||
3. **Extended and Compound Scaling**: YOLOv7 proposes "extend" and "compound scaling" methods for the real-time object detector that can effectively utilize parameters and computation. |
||||
|
||||
4. **Efficiency**: The methods proposed by YOLOv7 can effectively reduce the parameters of a state-of-the-art real-time object detector by about 40% and its computation by about 50%, while delivering faster inference speed and higher detection accuracy.
||||
|
||||
## Usage Examples |
||||
|
||||
As of the time of writing, Ultralytics does not currently support YOLOv7 models. Therefore, any users interested in using YOLOv7 will need to refer directly to the YOLOv7 GitHub repository for installation and usage instructions. |
||||
|
||||
Here is a brief overview of the typical steps you might take to use YOLOv7: |
||||
|
||||
1. Visit the YOLOv7 GitHub repository: [https://github.com/WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7). |
||||
|
||||
2. Follow the instructions provided in the README file for installation. This typically involves cloning the repository, installing necessary dependencies, and setting up any necessary environment variables. |
||||
|
||||
3. Once installation is complete, you can train and use the model as per the usage instructions provided in the repository. This usually involves preparing your dataset, configuring the model parameters, training the model, and then using the trained model to perform object detection. |
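
As a rough sketch of these steps (commands and file names may differ; the YOLOv7 README is authoritative):

```bash
# Clone the repository and install its dependencies
git clone https://github.com/WongKinYiu/yolov7
cd yolov7
pip install -r requirements.txt

# Run inference with released weights (weight and image paths are illustrative)
python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source path/to/image.jpg
```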
||||
|
||||
Please note that the specific steps may vary depending on your use case and the current state of the YOLOv7 repository. Therefore, it is strongly recommended to refer directly to the instructions provided in the YOLOv7 GitHub repository.
||||
|
||||
We regret any inconvenience this may cause and will strive to update this document with usage examples for Ultralytics once support for YOLOv7 is implemented. |
||||
|
||||
## Citations and Acknowledgements |
||||
|
||||
We would like to acknowledge the YOLOv7 authors for their significant contributions in the field of real-time object detection: |
||||
|
||||
!!! note "" |
||||
|
||||
=== "BibTeX" |
||||
|
||||
```bibtex |
||||
@article{wang2022yolov7, |
||||
title={{YOLOv7}: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors}, |
||||
author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark}, |
||||
journal={arXiv preprint arXiv:2207.02696}, |
||||
year={2022} |
||||
} |
||||
``` |
||||
|
||||
The original YOLOv7 paper can be found on [arXiv](https://arxiv.org/pdf/2207.02696.pdf). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/WongKinYiu/yolov7). We appreciate their efforts in advancing the field and making their work accessible to the broader community. |
@ -1,100 +1,286 @@ |
||||
--- |
||||
comments: true |
||||
description: Learn how to use Ultralytics YOLO for object tracking in video streams. Guides to use different trackers and customise tracker configurations. |
||||
keywords: Ultralytics, YOLO, object tracking, video streams, BoT-SORT, ByteTrack, Python guide, CLI guide |
||||
--- |
||||
|
||||
<img width="1024" src="https://github.com/ultralytics/assets/raw/main/yolov8/banner-integrations.png"> |
||||
<img width="1024" src="https://user-images.githubusercontent.com/26833433/243418637-1d6250fd-1515-4c10-a844-a32818ae6d46.png"> |
||||
|
||||
Object tracking is a task that involves identifying the location and class of objects, then assigning a unique ID to |
||||
that detection in video streams. |
||||
Object tracking is a task that involves identifying the location and class of objects, then assigning a unique ID to that detection in video streams. |
||||
|
||||
The output of the tracker is the same as detection, with an added object ID.
||||
|
||||
## Available Trackers |
||||
|
||||
The following tracking algorithms have been implemented and can be enabled by passing `tracker=tracker_type.yaml` |
||||
Ultralytics YOLO supports the following tracking algorithms. They can be enabled by passing the relevant YAML configuration file such as `tracker=tracker_type.yaml`: |
||||
|
||||
* [BoT-SORT](https://github.com/NirAharon/BoT-SORT) - `botsort.yaml` |
||||
* [ByteTrack](https://github.com/ifzhang/ByteTrack) - `bytetrack.yaml` |
||||
* [BoT-SORT](https://github.com/NirAharon/BoT-SORT) - Use `botsort.yaml` to enable this tracker. |
||||
* [ByteTrack](https://github.com/ifzhang/ByteTrack) - Use `bytetrack.yaml` to enable this tracker. |
||||
|
||||
The default tracker is BoT-SORT. |
||||
|
||||
## Tracking |
||||
|
||||
Use a trained YOLOv8n/YOLOv8n-seg model to run tracker on video streams. |
||||
To run the tracker on video streams, use a trained Detect, Segment or Pose model such as YOLOv8n, YOLOv8n-seg and YOLOv8n-pose. |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load an official detection model |
||||
model = YOLO('yolov8n-seg.pt') # load an official segmentation model |
||||
model = YOLO('path/to/best.pt') # load a custom model |
||||
|
||||
# Track with the model |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", show=True) |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", show=True, tracker="bytetrack.yaml") |
||||
|
||||
# Load an official or custom model |
||||
model = YOLO('yolov8n.pt') # Load an official Detect model |
||||
model = YOLO('yolov8n-seg.pt') # Load an official Segment model |
||||
model = YOLO('yolov8n-pose.pt') # Load an official Pose model |
||||
model = YOLO('path/to/best.pt') # Load a custom trained model |
||||
|
||||
# Perform tracking with the model |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", show=True) # Tracking with default tracker |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", show=True, tracker="bytetrack.yaml") # Tracking with ByteTrack tracker |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
|
||||
```bash |
||||
yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" # official detection model |
||||
yolo track model=yolov8n-seg.pt source=... # official segmentation model |
||||
yolo track model=path/to/best.pt source=... # custom model |
||||
yolo track model=path/to/best.pt tracker="bytetrack.yaml" # bytetrack tracker |
||||
# Perform tracking with various models using the command line interface |
||||
yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" # Official Detect model |
||||
yolo track model=yolov8n-seg.pt source="https://youtu.be/Zgi9g1ksQHc" # Official Segment model |
||||
yolo track model=yolov8n-pose.pt source="https://youtu.be/Zgi9g1ksQHc" # Official Pose model |
||||
yolo track model=path/to/best.pt source="https://youtu.be/Zgi9g1ksQHc" # Custom trained model |
||||
|
||||
# Track using ByteTrack tracker |
||||
yolo track model=path/to/best.pt tracker="bytetrack.yaml" |
||||
``` |
||||
|
||||
As in the above usage, we support both the detection and segmentation models for tracking and the only thing you need to |
||||
do is loading the corresponding (detection or segmentation) model. |
||||
As can be seen in the above usage, tracking is available for all Detect, Segment and Pose models run on videos or streaming sources. |
||||
|
||||
## Configuration |
||||
|
||||
### Tracking |
||||
### Tracking Arguments |
||||
|
||||
Tracking configuration shares properties with Predict mode, such as `conf`, `iou`, and `show`. For further configurations, refer to the [Predict](https://docs.ultralytics.com/modes/predict/) model page. |
||||
|
||||
Tracking shares the configuration with predict, i.e `conf`, `iou`, `show`. More configurations please refer |
||||
to [predict page](https://docs.ultralytics.com/modes/predict/). |
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
|
||||
# Configure the tracking parameters and run the tracker |
||||
model = YOLO('yolov8n.pt') |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", conf=0.3, iou=0.5, show=True) |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", conf=0.3, iou=0.5, show=True) |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
|
||||
```bash |
||||
# Configure tracking parameters and run the tracker using the command line interface |
||||
yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" conf=0.3 iou=0.5 show
||||
|
||||
``` |
||||
|
||||
### Tracker |
||||
### Tracker Selection |
||||
|
||||
Ultralytics also allows you to use a modified tracker configuration file. To do this, simply make a copy of a tracker config file (for example, `custom_tracker.yaml`) from [ultralytics/cfg/trackers](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/cfg/trackers) and modify any configurations (except the `tracker_type`) as per your needs. |
||||
|
||||
We also support using a modified tracker config file, just copy a config file i.e `custom_tracker.yaml` |
||||
from [ultralytics/tracker/cfg](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/tracker/cfg) and modify |
||||
any configurations(expect the `tracker_type`) you need to. |
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
|
||||
# Load the model and run the tracker with a custom configuration file |
||||
model = YOLO('yolov8n.pt') |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", tracker='custom_tracker.yaml') |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", tracker='custom_tracker.yaml') |
||||
``` |
||||
|
||||
=== "CLI" |
||||
|
||||
|
||||
```bash |
||||
# Load the model and run the tracker with a custom configuration file using the command line interface |
||||
yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" tracker='custom_tracker.yaml' |
||||
``` |
||||
|
||||
Please refer to [ultralytics/tracker/cfg](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/tracker/cfg) |
||||
page |
||||
For a comprehensive list of tracking arguments, refer to the [ultralytics/cfg/trackers](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/cfg/trackers) page. |
||||
|
||||
## Python Examples |
||||
|
||||
### Persisting Tracks Loop |
||||
|
||||
Here is a Python script using OpenCV (`cv2`) and YOLOv8 to run object tracking on video frames. This script assumes you have already installed the necessary packages (`opencv-python` and `ultralytics`).
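
If either package is missing, it can be installed with pip, for example:

```bash
pip install opencv-python ultralytics
```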
||||
|
||||
!!! example "Streaming for-loop with tracking" |
||||
|
||||
```python |
||||
import cv2 |
||||
from ultralytics import YOLO |
||||
|
||||
# Load the YOLOv8 model |
||||
model = YOLO('yolov8n.pt') |
||||
|
||||
# Open the video file |
||||
video_path = "path/to/video.mp4" |
||||
cap = cv2.VideoCapture(video_path) |
||||
|
||||
# Loop through the video frames |
||||
while cap.isOpened(): |
||||
# Read a frame from the video |
||||
success, frame = cap.read() |
||||
|
||||
if success: |
||||
# Run YOLOv8 tracking on the frame, persisting tracks between frames |
||||
results = model.track(frame, persist=True) |
||||
|
||||
# Visualize the results on the frame |
||||
annotated_frame = results[0].plot() |
||||
|
||||
# Display the annotated frame |
||||
cv2.imshow("YOLOv8 Tracking", annotated_frame) |
||||
|
||||
# Break the loop if 'q' is pressed |
||||
if cv2.waitKey(1) & 0xFF == ord("q"): |
||||
break |
||||
else: |
||||
# Break the loop if the end of the video is reached |
||||
break |
||||
|
||||
# Release the video capture object and close the display window |
||||
cap.release() |
||||
cv2.destroyAllWindows() |
||||
``` |
||||
|
||||
Please note the change from `model(frame)` to `model.track(frame)`, which enables object tracking instead of simple detection. This modified script will run the tracker on each frame of the video, visualize the results, and display them in a window. The loop can be exited by pressing 'q'. |
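
In other words, inside the loop above the only change relative to plain prediction is the call itself:

```python
results = model(frame)                      # detection only
results = model.track(frame, persist=True)  # detection plus tracking IDs persisted across frames
```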
||||
|
||||
### Plotting Tracks Over Time |
||||
|
||||
Visualizing object tracks over consecutive frames can provide valuable insights into the movement patterns and behavior of detected objects within a video. With Ultralytics YOLOv8, plotting these tracks is a seamless and efficient process. |
||||
|
||||
In the following example, we demonstrate how to utilize YOLOv8's tracking capabilities to plot the movement of detected objects across multiple video frames. This script involves opening a video file, reading it frame by frame, and utilizing the YOLO model to identify and track various objects. By retaining the center points of the detected bounding boxes and connecting them, we can draw lines that represent the paths followed by the tracked objects. |
||||
|
||||
!!! example "Plotting tracks over multiple video frames" |
||||
|
||||
```python |
||||
from collections import defaultdict |
||||
|
||||
import cv2 |
||||
import numpy as np |
||||
|
||||
from ultralytics import YOLO |
||||
|
||||
# Load the YOLOv8 model |
||||
model = YOLO('yolov8n.pt') |
||||
|
||||
# Open the video file |
||||
video_path = "path/to/video.mp4" |
||||
cap = cv2.VideoCapture(video_path) |
||||
|
||||
# Store the track history |
||||
track_history = defaultdict(lambda: []) |
||||
|
||||
# Loop through the video frames |
||||
while cap.isOpened(): |
||||
# Read a frame from the video |
||||
success, frame = cap.read() |
||||
|
||||
if success: |
||||
# Run YOLOv8 tracking on the frame, persisting tracks between frames |
||||
results = model.track(frame, persist=True) |
||||
|
||||
# Get the boxes and track IDs |
||||
boxes = results[0].boxes.xywh.cpu() |
||||
track_ids = results[0].boxes.id.int().cpu().tolist() |
||||
|
||||
# Visualize the results on the frame |
||||
annotated_frame = results[0].plot() |
||||
|
||||
# Plot the tracks |
||||
for box, track_id in zip(boxes, track_ids): |
||||
x, y, w, h = box |
||||
track = track_history[track_id] |
||||
track.append((float(x), float(y))) # x, y center point |
||||
if len(track) > 30:  # keep only the most recent 30 track points (~30 frames of history)
||||
track.pop(0) |
||||
|
||||
# Draw the tracking lines |
||||
points = np.hstack(track).astype(np.int32).reshape((-1, 1, 2)) |
||||
cv2.polylines(annotated_frame, [points], isClosed=False, color=(230, 230, 230), thickness=10) |
||||
|
||||
# Display the annotated frame |
||||
cv2.imshow("YOLOv8 Tracking", annotated_frame) |
||||
|
||||
# Break the loop if 'q' is pressed |
||||
if cv2.waitKey(1) & 0xFF == ord("q"): |
||||
break |
||||
else: |
||||
# Break the loop if the end of the video is reached |
||||
break |
||||
|
||||
# Release the video capture object and close the display window |
||||
cap.release() |
||||
cv2.destroyAllWindows() |
||||
``` |
||||
|
||||
### Multithreaded Tracking |
||||
|
||||
Multithreaded tracking provides the capability to run object tracking on multiple video streams simultaneously. This is particularly useful when handling multiple video inputs, such as from multiple surveillance cameras, where concurrent processing can greatly enhance efficiency and performance. |
||||
|
||||
In the provided Python script, we make use of Python's `threading` module to run multiple instances of the tracker concurrently. Each thread is responsible for running the tracker on one video file, and all the threads run simultaneously in the background. |
||||
|
||||
To ensure that each thread receives the correct parameters (the video file and the model to use), we define a function `run_tracker_in_thread` that accepts these parameters and contains the main tracking loop. This function reads the video frame by frame, runs the tracker, and displays the results. |
||||
|
||||
Two different models are used in this example: `yolov8n.pt` and `yolov8n-seg.pt`, each tracking objects in a different video file. The video files are specified in `video_file1` and `video_file2`. |
||||
|
||||
The `daemon=True` parameter in `threading.Thread` means that these threads will be closed as soon as the main program finishes. We then start the threads with `start()` and use `join()` to make the main thread wait until both tracker threads have finished. |
||||
|
||||
Finally, after all threads have completed their task, the windows displaying the results are closed using `cv2.destroyAllWindows()`. |
||||
|
||||
!!! example "Streaming for-loop with tracking" |
||||
|
||||
```python |
||||
import threading |
||||
|
||||
import cv2 |
||||
from ultralytics import YOLO |
||||
|
||||
|
||||
def run_tracker_in_thread(filename, model): |
||||
video = cv2.VideoCapture(filename) |
||||
frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT)) |
||||
for _ in range(frames): |
||||
ret, frame = video.read() |
||||
if ret: |
||||
results = model.track(source=frame, persist=True) |
||||
res_plotted = results[0].plot() |
||||
cv2.imshow('p', res_plotted) |
||||
if cv2.waitKey(1) == ord('q'): |
||||
break |
||||
|
||||
|
||||
# Load the models |
||||
model1 = YOLO('yolov8n.pt') |
||||
model2 = YOLO('yolov8n-seg.pt') |
||||
|
||||
# Define the video files for the trackers |
||||
video_file1 = 'path/to/video1.mp4' |
||||
video_file2 = 'path/to/video2.mp4' |
||||
|
||||
# Create the tracker threads |
||||
tracker_thread1 = threading.Thread(target=run_tracker_in_thread, args=(video_file1, model1), daemon=True) |
||||
tracker_thread2 = threading.Thread(target=run_tracker_in_thread, args=(video_file2, model2), daemon=True) |
||||
|
||||
# Start the tracker threads |
||||
tracker_thread1.start() |
||||
tracker_thread2.start() |
||||
|
||||
# Wait for the tracker threads to finish |
||||
tracker_thread1.join() |
||||
tracker_thread2.join() |
||||
|
||||
# Clean up and close windows |
||||
cv2.destroyAllWindows() |
||||
``` |
||||
|
||||
This example can easily be extended to handle more video files and models by creating more threads and applying the same methodology. |
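
For instance, here is a minimal sketch (the video paths and model weights below are placeholders) that scales the same pattern to an arbitrary list of sources:

```python
import threading

import cv2
from ultralytics import YOLO


def run_tracker_in_thread(filename, model):
    """Run the per-video tracking loop from the example above."""
    video = cv2.VideoCapture(filename)
    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break
        results = model.track(source=frame, persist=True)
        cv2.imshow(filename, results[0].plot())
        if cv2.waitKey(1) == ord('q'):
            break
    video.release()


# Placeholder (video, weights) pairs; add as many as needed
sources = [
    ('path/to/video1.mp4', 'yolov8n.pt'),
    ('path/to/video2.mp4', 'yolov8n-seg.pt'),
    ('path/to/video3.mp4', 'yolov8n-pose.pt'),
]

# One daemon thread per video stream
threads = [
    threading.Thread(target=run_tracker_in_thread, args=(path, YOLO(weights)), daemon=True)
    for path, weights in sources
]

for t in threads:
    t.start()
for t in threads:
    t.join()

# Close all display windows once every tracker has finished
cv2.destroyAllWindows()
```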
||||
|
@ -0,0 +1,26 @@ |
||||
{% import "partials/language.html" as lang with context %} |
||||
|
||||
<!-- taken from |
||||
https://github.com/squidfunk/mkdocs-material/blob/master/src/partials/source-file.html --> |
||||
|
||||
<br> |
||||
<div class="md-source-file"> |
||||
<small> |
||||
|
||||
<!-- mkdocs-git-revision-date-localized-plugin --> |
||||
{% if page.meta.git_revision_date_localized %} |
||||
📅 {{ lang.t("source.file.date.updated") }}: |
||||
{{ page.meta.git_revision_date_localized }} |
||||
{% if page.meta.git_creation_date_localized %} |
||||
<br/> |
||||
🎂 {{ lang.t("source.file.date.created") }}: |
||||
{{ page.meta.git_creation_date_localized }} |
||||
{% endif %} |
||||
|
||||
<!-- mkdocs-git-revision-date-plugin --> |
||||
{% elif page.meta.revision_date %} |
||||
📅 {{ lang.t("source.file.date.updated") }}: |
||||
{{ page.meta.revision_date }} |
||||
{% endif %} |
||||
</small> |
||||
</div> |