`ultralytics 8.0.94` HUBDatasetStats() Segment and Pose support (#2450)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: JF Chen <k-2feng@hotmail.com> Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com> Co-authored-by: Laughing-q <1185102784@qq.com>pull/2482/head v8.0.94
parent
af49a85cf3
commit
e21428ca4e
51 changed files with 948 additions and 81 deletions
@ -0,0 +1,101 @@ |
||||
--- |
||||
comments: true |
||||
--- |
||||
|
||||
# Image Classification Datasets Overview |
||||
|
||||
## Dataset format |
||||
|
||||
The folder structure for classification datasets in torchvision typically follows a standard format: |
||||
|
||||
``` |
||||
root/ |
||||
|-- class1/ |
||||
| |-- img1.jpg |
||||
| |-- img2.jpg |
||||
| |-- ... |
||||
| |
||||
|-- class2/ |
||||
| |-- img1.jpg |
||||
| |-- img2.jpg |
||||
| |-- ... |
||||
| |
||||
|-- class3/ |
||||
| |-- img1.jpg |
||||
| |-- img2.jpg |
||||
| |-- ... |
||||
| |
||||
|-- ... |
||||
``` |
||||
|
||||
In this folder structure, the `root` directory contains one subdirectory for each class in the dataset. Each subdirectory is named after the corresponding class and contains all the images for that class. Each image file is named uniquely and is typically in a common image file format such as JPEG or PNG. |
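As a rough illustration of how this layout maps to labels, the sketch below walks a `root` directory and gives each class folder an integer index in sorted order. The helper name `index_image_folder` and the extension set are assumptions made for this example; they are not part of torchvision or Ultralytics.

```python
from pathlib import Path

# Assumed set of image extensions for this sketch; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_image_folder(root):
    """Return (class_names, samples) for a root/<class>/<image> layout.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    # Each subdirectory of root is treated as one class, sorted by name.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for class_index, class_name in enumerate(class_names):
        for image_path in sorted((root / class_name).iterdir()):
            # Skip anything that does not look like an image file.
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((image_path, class_index))
    return class_names, samples
```

Calling `index_image_folder('path/to/dataset/train')` would return the sorted class names together with `(image_path, class_index)` pairs, which is the information a classification trainer needs from the folder structure.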
||||
|
||||
**Example** |
||||
|
||||
For example, in the CIFAR10 dataset, the folder structure would look like this: |
||||
|
||||
``` |
||||
cifar-10-/ |
||||
| |
||||
|-- train/ |
||||
| |-- airplane/ |
||||
| | |-- 10008_airplane.png |
||||
| | |-- 10009_airplane.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- automobile/ |
||||
| | |-- 1000_automobile.png |
||||
| | |-- 1001_automobile.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- bird/ |
||||
| | |-- 10014_bird.png |
||||
| | |-- 10015_bird.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- ... |
||||
| |
||||
|-- test/ |
||||
| |-- airplane/ |
||||
| | |-- 10_airplane.png |
||||
| | |-- 11_airplane.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- automobile/ |
||||
| | |-- 100_automobile.png |
||||
| | |-- 101_automobile.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- bird/ |
||||
| | |-- 1000_bird.png |
||||
| | |-- 1001_bird.png |
||||
| | |-- ... |
||||
| | |
||||
| |-- ... |
||||
``` |
||||
|
||||
In this example, the `train` directory contains subdirectories for each class in the dataset, and each class subdirectory contains all the images for that class. The `test` directory has a similar structure. The `root` directory also contains other files that are part of the CIFAR10 dataset. |
||||
|
||||
## Usage |
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-cls.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
model.train(data='path/to/dataset', epochs=100, imgsz=640) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo classify train data=path/to/dataset model=yolov8n-cls.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Supported Datasets |
||||
TODO |
@ -0,0 +1,106 @@ |
||||
--- |
||||
comments: true |
||||
--- |
||||
|
||||
# Object Detection Datasets Overview |
||||
|
||||
## Supported Dataset Formats |
||||
|
||||
### Ultralytics YOLO format |
||||
|
||||
**Label Format** |
||||
|
||||
The dataset format used for training YOLO detection models is as follows: |
||||
|
||||
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension. |
||||
2. One row per object: Each row in the text file corresponds to one object instance in the image. |
||||
3. Object information per row: Each row contains the following information about the object instance: |
||||
- Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.). |
||||
- Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1. |
||||
- Object width and height: The width and height of the object, normalized to be between 0 and 1. |
||||
|
||||
The format for a single row in the detection dataset file is as follows: |
||||
``` |
||||
<object-class> <x> <y> <width> <height> |
||||
``` |
||||
|
||||
Here is an example of the YOLO dataset format for a single image with two object instances: |
||||
|
||||
``` |
||||
0 0.5 0.4 0.3 0.6 |
||||
1 0.3 0.7 0.4 0.2 |
||||
``` |
||||
|
||||
In this example, the first object is of class 0 (person), with its center at (0.5, 0.4), width of 0.3, and height of 0.6. The second object is of class 1 (car), with its center at (0.3, 0.7), width of 0.4, and height of 0.2. |
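To make the normalization concrete, here is a minimal sketch that parses one label row and converts the normalized center-format box to pixel corner coordinates. The function names and the 640x480 image size are assumptions chosen for illustration.

```python
def parse_detection_row(row):
    """Split one label row into (class_index, x_center, y_center, width, height)."""
    fields = row.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {row!r}")
    class_index = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:])
    return class_index, x_center, y_center, width, height


def to_pixel_corners(box, image_width, image_height):
    """Convert a normalized center-format box to pixel (x1, y1, x2, y2)."""
    x_center, y_center, width, height = box
    x1 = (x_center - width / 2) * image_width
    y1 = (y_center - height / 2) * image_height
    x2 = (x_center + width / 2) * image_width
    y2 = (y_center + height / 2) * image_height
    return x1, y1, x2, y2


# First row of the example above, on an assumed 640x480 image.
class_index, *box = parse_detection_row("0 0.5 0.4 0.3 0.6")
corners = to_pixel_corners(box, image_width=640, image_height=480)
print(class_index, [round(v) for v in corners])  # 0 [224, 48, 416, 336]
```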
||||
|
||||
**Dataset file format** |
||||
|
||||
The Ultralytics framework uses a YAML file format to define the dataset and model configuration for training Detection Models. Here is an example of the YAML format used for defining a detection dataset: |
||||
|
||||
```yaml |
||||
train: <path-to-training-images> |
||||
val: <path-to-validation-images> |
||||
|
||||
nc: <number-of-classes> |
||||
names: [<class-1>, <class-2>, ..., <class-n>] |
||||
|
||||
``` |
||||
|
||||
The `train` and `val` fields specify the paths to the directories containing the training and validation images, respectively. |
||||
|
||||
The `nc` field specifies the number of object classes in the dataset. |
||||
|
||||
The `names` field is a list of the names of the object classes. The order of the names should match the order of the object class indices in the YOLO dataset files. |
||||
|
||||
NOTE: Either `nc` or `names` must be defined. Defining both is not mandatory. |
||||
|
||||
Alternatively, you can directly define class names like this: |
||||
```yaml |
||||
names: |
||||
0: person |
||||
1: bicycle |
||||
``` |
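The relationship between `nc` and the two ways of writing `names` can be sketched as follows. This is an illustrative helper operating on an already-loaded config dictionary, not the Ultralytics implementation; the function name and the `class_<i>` placeholder naming are assumptions.

```python
def resolve_class_names(config):
    """Return {class_index: class_name} from a dataset config dictionary.

    Illustrative helper, not the Ultralytics implementation.
    """
    names = config.get("names")
    nc = config.get("nc")
    if names is None and nc is None:
        raise ValueError("either 'nc' or 'names' must be defined")
    if names is None:
        # Only nc given: fall back to placeholder names (an assumption of this sketch).
        names = [f"class_{i}" for i in range(nc)]
    if isinstance(names, dict):
        # Mapping form: explicit index -> name pairs.
        resolved = {int(k): str(v) for k, v in names.items()}
    else:
        # List form: the position in the list is the class index.
        resolved = dict(enumerate(names))
    if nc is not None and nc != len(resolved):
        raise ValueError(f"nc={nc} does not match {len(resolved)} class names")
    return resolved


print(resolve_class_names({"nc": 2, "names": ["person", "car"]}))
print(resolve_class_names({"names": {0: "person", 1: "bicycle"}}))
```

Both forms resolve to the same kind of index-to-name mapping, which is why the order of a `names` list has to match the class indices used in the label files.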
||||
|
||||
**Example** |
||||
|
||||
```yaml |
||||
train: data/train/ |
||||
val: data/val/ |
||||
|
||||
nc: 2 |
||||
names: ['person', 'car'] |
||||
``` |
||||
|
||||
## Usage |
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
model.train(data='coco128.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo detect train data=coco128.yaml model=yolov8n.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Supported Datasets |
||||
TODO |
||||
|
||||
## Port or Convert label formats |
||||
|
||||
### COCO dataset format to YOLO format |
||||
|
||||
```python |
||||
from ultralytics.yolo.data.converter import convert_coco |
||||
|
||||
convert_coco(labels_dir='../coco/annotations/') |
||||
``` |
@ -0,0 +1,57 @@ |
||||
--- |
||||
comments: true |
||||
--- |
||||
|
||||
# Datasets Overview |
||||
|
||||
Ultralytics provides support for various datasets to facilitate computer vision tasks such as detection, instance segmentation, pose estimation, classification, and multi-object tracking. Below is a list of the main Ultralytics datasets, grouped by computer vision task, with a short summary of each task. |
||||
|
||||
## [Detection Datasets](detect/index.md) |
||||
|
||||
Bounding box object detection is a computer vision technique that involves detecting and localizing objects in an image by drawing a bounding box around each object. |
||||
|
||||
* [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations. |
||||
* [COCO](detect/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning with over 200K labeled images. |
||||
* [COCO8](detect/coco8.md): Contains the first 4 images from COCO train and COCO val, suitable for quick tests. |
||||
* [Global Wheat 2020](detect/globalwheat2020.md): A dataset of wheat head images collected from around the world for object detection and localization tasks. |
||||
* [Objects365](detect/objects365.md): A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images. |
||||
* [SKU-110K](detect/sku-110k.md): A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes. |
||||
* [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences. |
||||
* [VOC](detect/voc.md): The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images. |
||||
* [xView](detect/xview.md): A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects. |
||||
|
||||
## [Instance Segmentation Datasets](segment/index.md) |
||||
|
||||
Instance segmentation is a computer vision technique that involves identifying and localizing objects in an image at the pixel level. |
||||
|
||||
* [COCO](segment/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images. |
||||
* [COCO8-seg](segment/coco8-seg.md): A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations. |
||||
|
||||
## [Pose Estimation](pose/index.md) |
||||
|
||||
Pose estimation is a technique used to determine the pose of the object relative to the camera or the world coordinate system. |
||||
|
||||
* [COCO](pose/coco.md): A large-scale dataset with human pose annotations designed for pose estimation tasks. |
||||
* [COCO8-pose](pose/coco8-pose.md): A smaller dataset for pose estimation tasks, containing a subset of 8 COCO images with human pose annotations. |
||||
|
||||
## [Classification](classify/index.md) |
||||
|
||||
Image classification is a computer vision task that involves categorizing an image into one or more predefined classes or categories based on its visual content. |
||||
|
||||
* [Caltech 101](classify/caltech101.md): A dataset containing images of 101 object categories for image classification tasks. |
||||
* [Caltech 256](classify/caltech256.md): An extended version of Caltech 101 with 256 object categories and more challenging images. |
||||
* [CIFAR-10](classify/cifar10.md): A dataset of 60K 32x32 color images in 10 classes, with 6K images per class. |
||||
* [CIFAR-100](classify/cifar100.md): An extended version of CIFAR-10 with 100 object categories and 600 images per class. |
||||
* [Fashion-MNIST](classify/fashion-mnist.md): A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks. |
||||
* [ImageNet](classify/imagenet.md): A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories. |
||||
* [ImageNet-10](classify/imagenet10.md): A smaller subset of ImageNet with 10 categories for faster experimentation and testing. |
||||
* [Imagenette](classify/imagenette.md): A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing. |
||||
* [Imagewoof](classify/imagewoof.md): A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks. |
||||
* [MNIST](classify/mnist.md): A dataset of 70,000 grayscale images of handwritten digits for image classification tasks. |
||||
|
||||
## [Multi-Object Tracking](track/index.md) |
||||
|
||||
Multi-object tracking is a computer vision technique that involves detecting and tracking multiple objects over time in a video sequence. |
||||
|
||||
* [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations for multi-object tracking tasks. |
||||
* [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences. |
@ -0,0 +1,120 @@ |
||||
--- |
||||
comments: true |
||||
--- |
||||
|
||||
# Pose Estimation Datasets Overview |
||||
|
||||
## Supported Dataset Formats |
||||
|
||||
### Ultralytics YOLO format |
||||
|
||||
**Label Format** |
||||
|
||||
The dataset format used for training YOLO pose models is as follows: |
||||
|
||||
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension. |
||||
2. One row per object: Each row in the text file corresponds to one object instance in the image. |
||||
3. Object information per row: Each row contains the following information about the object instance: |
||||
- Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.). |
||||
- Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1. |
||||
- Object width and height: The width and height of the object, normalized to be between 0 and 1. |
||||
- Object keypoint coordinates: The keypoints of the object, normalized to be between 0 and 1. |
||||
|
||||
Here is an example of the label format for pose estimation task: |
||||
|
||||
Format with Dim = 2 |
||||
|
||||
``` |
||||
<class-index> <x> <y> <width> <height> <px1> <py1> <px2> <py2> ... <pxn> <pyn> |
||||
``` |
||||
Format with Dim = 3 |
||||
|
||||
``` |
||||
<class-index> <x> <y> <width> <height> <px1> <py1> <p1-visibility> <px2> <py2> <p2-visibility> ... <pxn> <pyn> <pn-visibility> |
||||
``` |
||||
|
||||
In this format, `<class-index>` is the index of the class for the object, `<x> <y> <width> <height>` are the coordinates of the bounding box, and `<px1> <py1> <px2> <py2> ... <pxn> <pyn>` are the coordinates of the keypoints. With Dim = 3, each keypoint is followed by its visibility value. The fields are separated by spaces. |
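A short sketch of how such a row could be split into its parts is shown below. The helper name `parse_pose_row` and the sample row are made up for illustration; `num_kpts` and `dim` simply mirror the number of keypoints and the Dim value described above.

```python
def parse_pose_row(row, num_kpts, dim):
    """Split a pose label row into (class_index, box, keypoints).

    Hypothetical helper; num_kpts and dim mirror the label layout above.
    """
    if dim not in (2, 3):
        raise ValueError("dim must be 2 (x, y) or 3 (x, y, visibility)")
    fields = row.split()
    expected = 5 + num_kpts * dim
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    class_index = int(fields[0])
    values = [float(v) for v in fields[1:]]
    box = tuple(values[:4])
    # Group the remaining values into one tuple per keypoint.
    keypoints = [tuple(values[4 + i * dim: 4 + (i + 1) * dim]) for i in range(num_kpts)]
    return class_index, box, keypoints


# Made-up row with two keypoints and a visibility value per keypoint (Dim = 3).
row = "0 0.5 0.5 0.4 0.8 0.45 0.3 2 0.55 0.3 1"
class_index, box, keypoints = parse_pose_row(row, num_kpts=2, dim=3)
print(class_index, box, keypoints)
```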
||||
|
||||
|
||||
**Dataset file format** |
||||
|
||||
The Ultralytics framework uses a YAML file format to define the dataset and model configuration for training Pose Models. Here is an example of the YAML format used for defining a pose dataset: |
||||
|
||||
```yaml |
||||
train: <path-to-training-images> |
||||
val: <path-to-validation-images> |
||||
|
||||
nc: <number-of-classes> |
||||
names: [<class-1>, <class-2>, ..., <class-n>] |
||||
|
||||
# Keypoints |
||||
kpt_shape: [num_kpts, dim] # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible) |
||||
flip_idx: [n1, n2 ... , n(num_kpts)] |
||||
|
||||
``` |
||||
|
||||
The `train` and `val` fields specify the paths to the directories containing the training and validation images, respectively. |
||||
|
||||
The `nc` field specifies the number of object classes in the dataset. |
||||
|
||||
The `names` field is a list of the names of the object classes. The order of the names should match the order of the object class indices in the YOLO dataset files. |
||||
|
||||
NOTE: Either `nc` or `names` must be defined. Defining both is not mandatory. |
||||
|
||||
Alternatively, you can directly define class names like this: |
||||
```yaml |
||||
names: |
||||
0: person |
||||
1: bicycle |
||||
``` |
||||
|
||||
(Optional) If the keypoints are symmetric, such as the left and right sides of a human body or face, `flip_idx` is also needed. |
||||
For example, suppose there are five facial landmark keypoints, [left eye, right eye, nose, left point of mouth, right point of mouth], with original indices [0, 1, 2, 3, 4]. Then `flip_idx` is [1, 0, 2, 4, 3]: only the left-right pairs are exchanged (0 with 1 and 3 with 4), while unpaired points such as the nose keep their index. |
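The role of `flip_idx` during a horizontal flip can be sketched like this. It is a simplified, pure-Python illustration using made-up normalized coordinates, not the augmentation code from the library; the function name is an assumption.

```python
def flip_keypoints_horizontally(keypoints, flip_idx):
    """Mirror normalized (x, y) keypoints and restore left/right ordering."""
    if sorted(flip_idx) != list(range(len(keypoints))):
        raise ValueError("flip_idx must be a permutation of the keypoint indices")
    # Mirror x around the vertical centre line of the image.
    mirrored = [(1.0 - x, y) for x, y in keypoints]
    # After mirroring, slot i must hold what used to be its left/right partner.
    return [mirrored[j] for j in flip_idx]


# Made-up coordinates in the order:
# left eye, right eye, nose, left point of mouth, right point of mouth.
face = [(0.25, 0.5), (0.75, 0.5), (0.5, 0.625), (0.375, 0.75), (0.625, 0.75)]
flipped = flip_keypoints_horizontally(face, flip_idx=[1, 0, 2, 4, 3])
print(flipped == face)  # True: a symmetric face maps onto itself
```

Without the reordering step, the mirrored point stored in the "left eye" slot would actually sit on the other side of the face, so the labels would no longer match their semantic meaning.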
||||
|
||||
**Example** |
||||
|
||||
```yaml |
||||
train: data/train/ |
||||
val: data/val/ |
||||
|
||||
nc: 2 |
||||
names: ['person', 'car'] |
||||
|
||||
# Keypoints |
||||
kpt_shape: [17, 3] # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible) |
||||
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15] |
||||
``` |
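Before training, it can be useful to sanity-check that `kpt_shape` and `flip_idx` agree with each other. The check below is a sketch under the assumption that `flip_idx` lists every keypoint index exactly once and only swaps left-right pairs; `check_keypoint_config` is a hypothetical helper, not a library function.

```python
def check_keypoint_config(kpt_shape, flip_idx=None):
    """Raise ValueError if kpt_shape and flip_idx are inconsistent."""
    num_kpts, dim = kpt_shape
    if dim not in (2, 3):
        raise ValueError("dim must be 2 (x, y) or 3 (x, y, visible)")
    if flip_idx is None:
        # flip_idx is optional, so there is nothing more to check.
        return
    if sorted(flip_idx) != list(range(num_kpts)):
        raise ValueError("flip_idx must contain each keypoint index exactly once")
    # Swapping left and right twice should give back the original order.
    if any(flip_idx[flip_idx[i]] != i for i in range(num_kpts)):
        raise ValueError("flip_idx should pair indices symmetrically")


# Values taken from the example YAML above; passes without raising.
check_keypoint_config([17, 3], [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15])
```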
||||
|
||||
## Usage |
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-pose.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
model.train(data='coco8-pose.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo pose train data=coco8-pose.yaml model=yolov8n-pose.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Supported Datasets |
||||
TODO |
||||
|
||||
## Port or Convert label formats |
||||
|
||||
### COCO dataset format to YOLO format |
||||
|
||||
```python |
||||
from ultralytics.yolo.data.converter import convert_coco |
||||
|
||||
convert_coco(labels_dir='../coco/annotations/', use_keypoints=True) |
||||
``` |
@ -0,0 +1,106 @@ |
||||
--- |
||||
comments: true |
||||
--- |
||||
|
||||
# Instance Segmentation Datasets Overview |
||||
|
||||
## Supported Dataset Formats |
||||
|
||||
### Ultralytics YOLO format |
||||
|
||||
**Label Format** |
||||
|
||||
The dataset format used for training YOLO segmentation models is as follows: |
||||
|
||||
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension. |
||||
2. One row per object: Each row in the text file corresponds to one object instance in the image. |
||||
3. Object information per row: Each row contains the following information about the object instance: |
||||
- Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.). |
||||
- Object boundary coordinates: The coordinates of the points along the boundary of the mask area, normalized to be between 0 and 1. |
||||
|
||||
The format for a single row in the segmentation dataset file is as follows: |
||||
|
||||
``` |
||||
<class-index> <x1> <y1> <x2> <y2> ... <xn> <yn> |
||||
``` |
||||
|
||||
In this format, `<class-index>` is the index of the class for the object, and `<x1> <y1> <x2> <y2> ... <xn> <yn>` are the boundary coordinates of the object's segmentation mask, listed as x, y pairs. The coordinates are separated by spaces. |
||||
|
||||
Here is an example of the YOLO dataset format for a single image with two object instances: |
||||
|
||||
``` |
||||
0 0.6812 0.48541 0.67 0.4875 0.67656 0.487 0.675 0.489 |
||||
1 0.5046 0.0 0.5015 0.004 0.4984 0.00416 0.4937 0.010 0.492 0.0104 |
||||
``` |
||||
Note: Rows do not need to have the same length, because each polygon can have a different number of points. |
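As an illustration of the row layout, the sketch below splits a segmentation row into polygon points and derives the enclosing box in normalized center format. Both function names are assumptions for this example; the row is the second one from the example above.

```python
def parse_segment_row(row):
    """Split a segmentation label row into (class_index, [(x, y), ...])."""
    fields = row.split()
    class_index = int(fields[0])
    coords = [float(v) for v in fields[1:]]
    if len(coords) % 2 != 0:
        raise ValueError("polygon coordinates must come in x, y pairs")
    # Pair up alternating x and y values.
    points = list(zip(coords[0::2], coords[1::2]))
    return class_index, points


def polygon_to_box(points):
    """Return the normalized (x_center, y_center, width, height) enclosing the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return min(xs) + width / 2, min(ys) + height / 2, width, height


class_index, points = parse_segment_row(
    "1 0.5046 0.0 0.5015 0.004 0.4984 0.00416 0.4937 0.010 0.492 0.0104"
)
print(class_index, len(points), polygon_to_box(points))
```

Because each row carries its own number of points, the parser checks only that the coordinates come in pairs rather than expecting a fixed row length.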
||||
|
||||
**Dataset file format** |
||||
|
||||
The Ultralytics framework uses a YAML file format to define the dataset and model configuration for training Segmentation Models. Here is an example of the YAML format used for defining a segmentation dataset: |
||||
|
||||
```yaml |
||||
train: <path-to-training-images> |
||||
val: <path-to-validation-images> |
||||
|
||||
nc: <number-of-classes> |
||||
names: [<class-1>, <class-2>, ..., <class-n>] |
||||
|
||||
``` |
||||
|
||||
The `train` and `val` fields specify the paths to the directories containing the training and validation images, respectively. |
||||
|
||||
The `nc` field specifies the number of object classes in the dataset. |
||||
|
||||
The `names` field is a list of the names of the object classes. The order of the names should match the order of the object class indices in the YOLO dataset files. |
||||
|
||||
NOTE: Either `nc` or `names` must be defined. Defining both is not mandatory. |
||||
|
||||
Alternatively, you can directly define class names like this: |
||||
```yaml |
||||
names: |
||||
0: person |
||||
1: bicycle |
||||
``` |
||||
|
||||
**Example** |
||||
|
||||
```yaml |
||||
train: data/train/ |
||||
val: data/val/ |
||||
|
||||
nc: 2 |
||||
names: ['person', 'car'] |
||||
``` |
||||
|
||||
## Usage |
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a model |
||||
model = YOLO('yolov8n-seg.pt') # load a pretrained model (recommended for training) |
||||
|
||||
# Train the model |
||||
model.train(data='coco128-seg.yaml', epochs=100, imgsz=640) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
# Start training from a pretrained *.pt model |
||||
yolo segment train data=coco128-seg.yaml model=yolov8n-seg.pt epochs=100 imgsz=640 |
||||
``` |
||||
|
||||
## Supported Datasets |
||||
|
||||
## Port or Convert label formats |
||||
|
||||
### COCO dataset format to YOLO format |
||||
|
||||
```python |
||||
from ultralytics.yolo.data.converter import convert_coco |
||||
|
||||
convert_coco(labels_dir='../coco/annotations/', use_segments=True) |
||||
``` |
@ -0,0 +1,29 @@ |
||||
--- |
||||
comments: true |
||||
--- |
||||
|
||||
# Multi-object Tracking Datasets Overview |
||||
|
||||
## Dataset Format (Coming Soon) |
||||
|
||||
The multi-object tracker doesn't need standalone training and directly supports pre-trained detection, segmentation, or pose models. |
||||
Support for training trackers alone is coming soon. |
||||
|
||||
## Usage |
||||
|
||||
!!! example "" |
||||
|
||||
=== "Python" |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
model = YOLO('yolov8n.pt') |
||||
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", conf=0.3, iou=0.5, show=True) |
||||
``` |
||||
=== "CLI" |
||||
|
||||
```bash |
||||
yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" conf=0.3 iou=0.5 show |
||||
``` |
||||
|