`py-cpuinfo` Exception context manager fix (#14814)

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
pull/14817/head
Glenn Jocher committed via GitHub
parent f955fedb7f
commit 7ecab94b29
Files changed (5):

1. .github/workflows/publish.yml (1 change)
2. docs/en/models/index.md (2 changes)
3. docs/en/models/sam-2.md (128 changes)
4. mkdocs.yml (3 changes)
5. ultralytics/utils/torch_utils.py (15 changes)

.github/workflows/publish.yml

@@ -168,6 +168,7 @@ jobs:
           PERSONAL_ACCESS_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
           INDEXNOW_KEY: ${{ secrets.INDEXNOW_KEY_DOCS }}
         run: |
+          pip install black
           export JUPYTER_PLATFORM_DIRS=1
           python docs/build_docs.py
           git clone https://github.com/ultralytics/docs.git docs-repo

docs/en/models/index.md

@@ -21,7 +21,7 @@ Here are some of the key models supported:
 7. **[YOLOv9](yolov9.md)**: An experimental model trained on the Ultralytics [YOLOv5](yolov5.md) codebase implementing Programmable Gradient Information (PGI).
 8. **[YOLOv10](yolov10.md)**: By Tsinghua University, featuring NMS-free training and efficiency-accuracy driven architecture, delivering state-of-the-art performance and latency.
 9. **[Segment Anything Model (SAM)](sam.md)**: Meta's original Segment Anything Model (SAM).
-10. **[Segment Anything Model 2 (SAM2)](sam2.md)**: The next generation of Meta's Segment Anything Model (SAM) for videos and images.
+10. **[Segment Anything Model 2 (SAM2)](sam-2.md)**: The next generation of Meta's Segment Anything Model (SAM) for videos and images.
 11. **[Mobile Segment Anything Model (MobileSAM)](mobile-sam.md)**: MobileSAM for mobile applications, by Kyung Hee University.
 12. **[Fast Segment Anything Model (FastSAM)](fast-sam.md)**: FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
 13. **[YOLO-NAS](yolo-nas.md)**: YOLO Neural Architecture Search (NAS) Models.

docs/en/models/sam-2.md

@@ -1,32 +1,32 @@
 ---
 comments: true
-description: Discover SAM2, the next generation of Meta's Segment Anything Model, supporting real-time promptable segmentation in both images and videos with state-of-the-art performance. Learn about its key features, datasets, and how to use it.
+description: Discover SAM 2, the next generation of Meta's Segment Anything Model, supporting real-time promptable segmentation in both images and videos with state-of-the-art performance. Learn about its key features, datasets, and how to use it.
-keywords: SAM2, Segment Anything, video segmentation, image segmentation, promptable segmentation, zero-shot performance, SA-V dataset, Ultralytics, real-time segmentation, AI, machine learning
+keywords: SAM 2, Segment Anything, video segmentation, image segmentation, promptable segmentation, zero-shot performance, SA-V dataset, Ultralytics, real-time segmentation, AI, machine learning
 ---
-# SAM2: Segment Anything Model 2
+# SAM 2: Segment Anything Model 2
-!!! Note "🚧 SAM2 Integration In Progress 🚧"
+!!! Note "🚧 SAM 2 Integration In Progress 🚧"
-    The SAM2 features described in this documentation are currently not enabled in the `ultralytics` package. The Ultralytics team is actively working on integrating SAM2, and these capabilities should be available soon. We appreciate your patience as we work to implement this exciting new model.
+    The SAM 2 features described in this documentation are currently not enabled in the `ultralytics` package. The Ultralytics team is actively working on integrating SAM 2, and these capabilities should be available soon. We appreciate your patience as we work to implement this exciting new model.
-SAM2, the successor to Meta's [Segment Anything Model (SAM)](sam.md), is a cutting-edge tool designed for comprehensive object segmentation in both images and videos. It excels in handling complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization.
+SAM 2, the successor to Meta's [Segment Anything Model (SAM)](sam.md), is a cutting-edge tool designed for comprehensive object segmentation in both images and videos. It excels in handling complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization.
-![SAM2 Example Results](https://github.com/facebookresearch/segment-anything-2/raw/main/assets/sa_v_dataset.jpg?raw=true)
+![SAM 2 Example Results](https://github.com/facebookresearch/segment-anything-2/raw/main/assets/sa_v_dataset.jpg?raw=true)
 ## Key Features
 ### Unified Model Architecture
-SAM2 combines the capabilities of image and video segmentation in a single model. This unification simplifies deployment and allows for consistent performance across different media types. It leverages a flexible prompt-based interface, enabling users to specify objects of interest through various prompt types, such as points, bounding boxes, or masks.
+SAM 2 combines the capabilities of image and video segmentation in a single model. This unification simplifies deployment and allows for consistent performance across different media types. It leverages a flexible prompt-based interface, enabling users to specify objects of interest through various prompt types, such as points, bounding boxes, or masks.
 ### Real-Time Performance
-The model achieves real-time inference speeds, processing approximately 44 frames per second. This makes SAM2 suitable for applications requiring immediate feedback, such as video editing and augmented reality.
+The model achieves real-time inference speeds, processing approximately 44 frames per second. This makes SAM 2 suitable for applications requiring immediate feedback, such as video editing and augmented reality.
 ### Zero-Shot Generalization
-SAM2 can segment objects it has never encountered before, demonstrating strong zero-shot generalization. This is particularly useful in diverse or evolving visual domains where pre-defined categories may not cover all possible objects.
+SAM 2 can segment objects it has never encountered before, demonstrating strong zero-shot generalization. This is particularly useful in diverse or evolving visual domains where pre-defined categories may not cover all possible objects.
 ### Interactive Refinement
@@ -34,15 +34,15 @@ Users can iteratively refine the segmentation results by providing additional pr
 ### Advanced Handling of Visual Challenges
-SAM2 includes mechanisms to manage common video segmentation challenges, such as object occlusion and reappearance. It uses a sophisticated memory mechanism to keep track of objects across frames, ensuring continuity even when objects are temporarily obscured or exit and re-enter the scene.
+SAM 2 includes mechanisms to manage common video segmentation challenges, such as object occlusion and reappearance. It uses a sophisticated memory mechanism to keep track of objects across frames, ensuring continuity even when objects are temporarily obscured or exit and re-enter the scene.
-For a deeper understanding of SAM2's architecture and capabilities, explore the [SAM2 research paper](https://arxiv.org/abs/2401.12741).
+For a deeper understanding of SAM 2's architecture and capabilities, explore the [SAM 2 research paper](https://arxiv.org/abs/2401.12741).
 ## Performance and Technical Details
-SAM2 sets a new benchmark in the field, outperforming previous models on various metrics:
+SAM 2 sets a new benchmark in the field, outperforming previous models on various metrics:
-| Metric | SAM2 | Previous SOTA |
+| Metric | SAM 2 | Previous SOTA |
 | ---------------------------------- | ------------- | ------------- |
 | **Interactive Video Segmentation** | **Best** | - |
 | **Human Interactions Required** | **3x fewer** | Baseline |
@@ -54,23 +54,23 @@ SAM2 sets a new benchmark in the field, outperforming previous models on various
 ### Core Components
 - **Image and Video Encoder**: Utilizes a transformer-based architecture to extract high-level features from both images and video frames. This component is responsible for understanding the visual content at each timestep.
-- **Prompt Encoder**: Processes user-provided prompts (points, boxes, masks) to guide the segmentation task. This allows SAM2 to adapt to user input and target specific objects within a scene.
+- **Prompt Encoder**: Processes user-provided prompts (points, boxes, masks) to guide the segmentation task. This allows SAM 2 to adapt to user input and target specific objects within a scene.
 - **Memory Mechanism**: Includes a memory encoder, memory bank, and memory attention module. These components collectively store and utilize information from past frames, enabling the model to maintain consistent object tracking over time.
 - **Mask Decoder**: Generates the final segmentation masks based on the encoded image features and prompts. In video, it also uses memory context to ensure accurate tracking across frames.
-![SAM2 Architecture Diagram](https://github.com/facebookresearch/segment-anything-2/blob/main/assets/model_diagram.png?raw=true)
+![SAM 2 Architecture Diagram](https://github.com/facebookresearch/segment-anything-2/blob/main/assets/model_diagram.png?raw=true)
 ### Memory Mechanism and Occlusion Handling
-The memory mechanism allows SAM2 to handle temporal dependencies and occlusions in video data. As objects move and interact, SAM2 records their features in a memory bank. When an object becomes occluded, the model can rely on this memory to predict its position and appearance when it reappears. The occlusion head specifically handles scenarios where objects are not visible, predicting the likelihood of an object being occluded.
+The memory mechanism allows SAM 2 to handle temporal dependencies and occlusions in video data. As objects move and interact, SAM 2 records their features in a memory bank. When an object becomes occluded, the model can rely on this memory to predict its position and appearance when it reappears. The occlusion head specifically handles scenarios where objects are not visible, predicting the likelihood of an object being occluded.
 ### Multi-Mask Ambiguity Resolution
-In situations with ambiguity (e.g., overlapping objects), SAM2 can generate multiple mask predictions. This feature is crucial for accurately representing complex scenes where a single mask might not sufficiently describe the scene's nuances.
+In situations with ambiguity (e.g., overlapping objects), SAM 2 can generate multiple mask predictions. This feature is crucial for accurately representing complex scenes where a single mask might not sufficiently describe the scene's nuances.
 ## SA-V Dataset
-The SA-V dataset, developed for SAM2's training, is one of the largest and most diverse video segmentation datasets available. It includes:
+The SA-V dataset, developed for SAM 2's training, is one of the largest and most diverse video segmentation datasets available. It includes:
 - **51,000+ Videos**: Captured across 47 countries, providing a wide range of real-world scenarios.
 - **600,000+ Mask Annotations**: Detailed spatio-temporal mask annotations, referred to as "masklets," covering whole objects and parts.
@@ -80,7 +80,7 @@ The SA-V dataset, developed for SAM2's training, is one of the largest and most
 ### Video Object Segmentation
-SAM2 has demonstrated superior performance across major video segmentation benchmarks:
+SAM 2 has demonstrated superior performance across major video segmentation benchmarks:
 | Dataset | J&F | J | F |
 | --------------- | ---- | ---- | ---- |
@@ -89,7 +89,7 @@ SAM2 has demonstrated superior performance across major video segmentation bench
 ### Interactive Segmentation
-In interactive segmentation tasks, SAM2 shows significant efficiency and accuracy:
+In interactive segmentation tasks, SAM 2 shows significant efficiency and accuracy:
 | Dataset | NoC@90 | AUC |
 | --------------------- | ------ | ----- |
@@ -97,28 +97,28 @@ In interactive segmentation tasks, SAM2 shows significant efficiency and accurac
 ## Installation
-To install SAM2, use the following command. All SAM2 models will automatically download on first use.
+To install SAM 2, use the following command. All SAM 2 models will automatically download on first use.
 ```bash
 pip install ultralytics
 ```
-## How to Use SAM2: Versatility in Image and Video Segmentation
+## How to Use SAM 2: Versatility in Image and Video Segmentation
-!!! Note "🚧 SAM2 Integration In Progress 🚧"
+!!! Note "🚧 SAM 2 Integration In Progress 🚧"
-    The SAM2 features described in this documentation are currently not enabled in the `ultralytics` package. The Ultralytics team is actively working on integrating SAM2, and these capabilities should be available soon. We appreciate your patience as we work to implement this exciting new model.
+    The SAM 2 features described in this documentation are currently not enabled in the `ultralytics` package. The Ultralytics team is actively working on integrating SAM 2, and these capabilities should be available soon. We appreciate your patience as we work to implement this exciting new model.
-The following table details the available SAM2 models, their pre-trained weights, supported tasks, and compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md).
+The following table details the available SAM 2 models, their pre-trained weights, supported tasks, and compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md).
 | Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
-| ---------- | ------------------------------------------------------------------------------------- | -------------------------------------------- | --------- | ---------- | -------- | ------ |
+| ----------- | ------------------------------------------------------------------------------------- | -------------------------------------------- | --------- | ---------- | -------- | ------ |
-| SAM2 base | [sam2_b.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/sam2_b.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
+| SAM 2 base | [sam2_b.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/sam2_b.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
-| SAM2 large | [sam2_l.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/sam2_l.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
+| SAM 2 large | [sam2_l.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/sam2_l.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
-### SAM2 Prediction Examples
+### SAM 2 Prediction Examples
-SAM2 can be utilized across a broad spectrum of tasks, including real-time video editing, medical imaging, and autonomous systems. Its ability to segment both static and dynamic visual data makes it a versatile tool for researchers and developers.
+SAM 2 can be utilized across a broad spectrum of tasks, including real-time video editing, medical imaging, and autonomous systems. Its ability to segment both static and dynamic visual data makes it a versatile tool for researchers and developers.
 #### Segment with Prompts
@@ -129,10 +129,10 @@ SAM2 can be utilized across a broad spectrum of tasks, including real-time video
     === "Python"
         ```python
-        from ultralytics import SAM2
+        from ultralytics import SAM
         # Load a model
-        model = SAM2("sam2_b.pt")
+        model = SAM("sam2_b.pt")
         # Display model information (optional)
         model.info()
@@ -153,10 +153,10 @@ SAM2 can be utilized across a broad spectrum of tasks, including real-time video
     === "Python"
         ```python
-        from ultralytics import SAM2
+        from ultralytics import SAM
         # Load a model
-        model = SAM2("sam2_b.pt")
+        model = SAM("sam2_b.pt")
         # Display model information (optional)
         model.info()
@@ -168,11 +168,11 @@ SAM2 can be utilized across a broad spectrum of tasks, including real-time video
     === "CLI"
         ```bash
-        # Run inference with a SAM2 model
+        # Run inference with a SAM 2 model
         yolo predict model=sam2_b.pt source=path/to/video.mp4
         ```
-- This example demonstrates how SAM2 can be used to segment the entire content of an image or video if no prompts (bboxes/points/masks) are provided.
+- This example demonstrates how SAM 2 can be used to segment the entire content of an image or video if no prompts (bboxes/points/masks) are provided.
 ## SAM comparison vs YOLOv8
@@ -219,11 +219,11 @@ Tests run on a 2023 Apple M2 Macbook with 16GB of RAM. To reproduce this test:
 ## Auto-Annotation: Efficient Dataset Creation
-Auto-annotation is a powerful feature of SAM2, enabling users to generate segmentation datasets quickly and accurately by leveraging pre-trained models. This capability is particularly useful for creating large, high-quality datasets without extensive manual effort.
+Auto-annotation is a powerful feature of SAM 2, enabling users to generate segmentation datasets quickly and accurately by leveraging pre-trained models. This capability is particularly useful for creating large, high-quality datasets without extensive manual effort.
-### How to Auto-Annotate with SAM2
+### How to Auto-Annotate with SAM 2
-To auto-annotate your dataset using SAM2, follow this example:
+To auto-annotate your dataset using SAM 2, follow this example:
 !!! Example "Auto-Annotation Example"
@@ -237,7 +237,7 @@ To auto-annotate your dataset using SAM2, follow this example:
 | ------------ | ----------------------- | ------------------------------------------------------------------------------------------------------- | -------------- |
 | `data` | `str` | Path to a folder containing images to be annotated. | |
 | `det_model` | `str`, optional | Pre-trained YOLO detection model. Defaults to 'yolov8x.pt'. | `'yolov8x.pt'` |
-| `sam_model` | `str`, optional | Pre-trained SAM2 segmentation model. Defaults to 'sam2_b.pt'. | `'sam2_b.pt'` |
+| `sam_model` | `str`, optional | Pre-trained SAM 2 segmentation model. Defaults to 'sam2_b.pt'. | `'sam2_b.pt'` |
 | `device` | `str`, optional | Device to run the models on. Defaults to an empty string (CPU or GPU, if available). | |
 | `output_dir` | `str`, `None`, optional | Directory to save the annotated results. Defaults to a 'labels' folder in the same directory as 'data'. | `None` |
@@ -245,26 +245,26 @@ This function facilitates the rapid creation of high-quality segmentation datase
 ## Limitations
-Despite its strengths, SAM2 has certain limitations:
+Despite its strengths, SAM 2 has certain limitations:
-- **Tracking Stability**: SAM2 may lose track of objects during extended sequences or significant viewpoint changes.
+- **Tracking Stability**: SAM 2 may lose track of objects during extended sequences or significant viewpoint changes.
 - **Object Confusion**: The model can sometimes confuse similar-looking objects, particularly in crowded scenes.
 - **Efficiency with Multiple Objects**: Segmentation efficiency decreases when processing multiple objects simultaneously due to the lack of inter-object communication.
 - **Detail Accuracy**: May miss fine details, especially with fast-moving objects. Additional prompts can partially address this issue, but temporal smoothness is not guaranteed.
 ## Citations and Acknowledgements
-If SAM2 is a crucial part of your research or development work, please cite it using the following reference:
+If SAM 2 is a crucial part of your research or development work, please cite it using the following reference:
 !!! Quote ""
     === "BibTeX"
         ```bibtex
-        @article{kirillov2024sam2,
+        @article{ravi2024sam2,
-          title={SAM2: Segment Anything Model 2},
+          title={SAM 2: Segment Anything in Images and Videos},
-          author={Alexander Kirillov and others},
+          author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
-          journal={arXiv preprint arXiv:2401.12741},
+          journal={arXiv preprint},
           year={2024}
         }
         ```
@@ -273,9 +273,9 @@ We extend our gratitude to Meta AI for their contributions to the AI community w
 ## FAQ
-### What is SAM2 and how does it improve upon the original Segment Anything Model (SAM)?
+### What is SAM 2 and how does it improve upon the original Segment Anything Model (SAM)?
-SAM2, the successor to Meta's [Segment Anything Model (SAM)](sam.md), is a cutting-edge tool designed for comprehensive object segmentation in both images and videos. It excels in handling complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization. SAM2 offers several improvements over the original SAM, including:
+SAM 2, the successor to Meta's [Segment Anything Model (SAM)](sam.md), is a cutting-edge tool designed for comprehensive object segmentation in both images and videos. It excels in handling complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization. SAM 2 offers several improvements over the original SAM, including:
 - **Unified Model Architecture**: Combines image and video segmentation capabilities in a single model.
 - **Real-Time Performance**: Processes approximately 44 frames per second, making it suitable for applications requiring immediate feedback.
@@ -283,11 +283,11 @@ SAM2, the successor to Meta's [Segment Anything Model (SAM)](sam.md), is a cutti
 - **Interactive Refinement**: Allows users to iteratively refine segmentation results by providing additional prompts.
 - **Advanced Handling of Visual Challenges**: Manages common video segmentation challenges like object occlusion and reappearance.
-For more details on SAM2's architecture and capabilities, explore the [SAM2 research paper](https://arxiv.org/abs/2401.12741).
+For more details on SAM 2's architecture and capabilities, explore the [SAM 2 research paper](https://arxiv.org/abs/2401.12741).
-### How can I use SAM2 for real-time video segmentation?
+### How can I use SAM 2 for real-time video segmentation?
-SAM2 can be utilized for real-time video segmentation by leveraging its promptable interface and real-time inference capabilities. Here's a basic example:
+SAM 2 can be utilized for real-time video segmentation by leveraging its promptable interface and real-time inference capabilities. Here's a basic example:
 !!! Example "Segment with Prompts"
@@ -296,10 +296,10 @@ SAM2 can be utilized for real-time video segmentation by leveraging its promptab
     === "Python"
        ```python
-        from ultralytics import SAM2
+        from ultralytics import SAM
         # Load a model
-        model = SAM2("sam2_b.pt")
+        model = SAM("sam2_b.pt")
         # Display model information (optional)
         model.info()
@@ -311,21 +311,21 @@ SAM2 can be utilized for real-time video segmentation by leveraging its promptab
         results = model("path/to/image.jpg", points=[150, 150], labels=[1])
         ```
-For more comprehensive usage, refer to the [How to Use SAM2](#how-to-use-sam2-versatility-in-image-and-video-segmentation) section.
+For more comprehensive usage, refer to the [How to Use SAM 2](#how-to-use-sam-2-versatility-in-image-and-video-segmentation) section.
-### What datasets are used to train SAM2, and how do they enhance its performance?
+### What datasets are used to train SAM 2, and how do they enhance its performance?
-SAM2 is trained on the SA-V dataset, one of the largest and most diverse video segmentation datasets available. The SA-V dataset includes:
+SAM 2 is trained on the SA-V dataset, one of the largest and most diverse video segmentation datasets available. The SA-V dataset includes:
 - **51,000+ Videos**: Captured across 47 countries, providing a wide range of real-world scenarios.
 - **600,000+ Mask Annotations**: Detailed spatio-temporal mask annotations, referred to as "masklets," covering whole objects and parts.
 - **Dataset Scale**: Features 4.5 times more videos and 53 times more annotations than previous largest datasets, offering unprecedented diversity and complexity.
-This extensive dataset allows SAM2 to achieve superior performance across major video segmentation benchmarks and enhances its zero-shot generalization capabilities. For more information, see the [SA-V Dataset](#sa-v-dataset) section.
+This extensive dataset allows SAM 2 to achieve superior performance across major video segmentation benchmarks and enhances its zero-shot generalization capabilities. For more information, see the [SA-V Dataset](#sa-v-dataset) section.
-### How does SAM2 handle occlusions and object reappearances in video segmentation?
+### How does SAM 2 handle occlusions and object reappearances in video segmentation?
-SAM2 includes a sophisticated memory mechanism to manage temporal dependencies and occlusions in video data. The memory mechanism consists of:
+SAM 2 includes a sophisticated memory mechanism to manage temporal dependencies and occlusions in video data. The memory mechanism consists of:
 - **Memory Encoder and Memory Bank**: Stores features from past frames.
 - **Memory Attention Module**: Utilizes stored information to maintain consistent object tracking over time.
@@ -333,9 +333,9 @@ SAM2 includes a sophisticated memory mechanism to manage temporal dependencies a
 This mechanism ensures continuity even when objects are temporarily obscured or exit and re-enter the scene. For more details, refer to the [Memory Mechanism and Occlusion Handling](#memory-mechanism-and-occlusion-handling) section.
-### How does SAM2 compare to other segmentation models like YOLOv8?
+### How does SAM 2 compare to other segmentation models like YOLOv8?
-SAM2 and Ultralytics YOLOv8 serve different purposes and excel in different areas. While SAM2 is designed for comprehensive object segmentation with advanced features like zero-shot generalization and real-time performance, YOLOv8 is optimized for speed and efficiency in object detection and segmentation tasks. Here's a comparison:
+SAM 2 and Ultralytics YOLOv8 serve different purposes and excel in different areas. While SAM 2 is designed for comprehensive object segmentation with advanced features like zero-shot generalization and real-time performance, YOLOv8 is optimized for speed and efficiency in object detection and segmentation tasks. Here's a comparison:
 | Model | Size | Parameters | Speed (CPU) |
 | ---------------------------------------------- | -------------------------- | ---------------------- | -------------------------- |

mkdocs.yml

@@ -239,7 +239,7 @@ nav:
       - YOLOv9: models/yolov9.md
       - YOLOv10: models/yolov10.md
       - SAM (Segment Anything Model): models/sam.md
-      - SAM2 (Segment Anything Model 2): models/sam2.md
+      - SAM2 (Segment Anything Model 2): models/sam-2.md
       - MobileSAM (Mobile Segment Anything Model): models/mobile-sam.md
       - FastSAM (Fast Segment Anything Model): models/fast-sam.md
       - YOLO-NAS (Neural Architecture Search): models/yolo-nas.md
@@ -659,6 +659,7 @@ plugins:
       sdk.md: index.md
       hub/inference_api.md: hub/inference-api.md
       usage/hyperparameter_tuning.md: integrations/ray-tune.md
+      models/sam2.md: models/sam-2.md
       reference/base_pred.md: reference/engine/predictor.md
       reference/base_trainer.md: reference/engine/trainer.md
       reference/exporter.md: reference/engine/exporter.md

ultralytics/utils/torch_utils.py

@@ -1,5 +1,5 @@
 # Ultralytics YOLO 🚀, AGPL-3.0 license
+import contextlib
 import gc
 import math
 import os
@@ -101,12 +101,15 @@ def autocast(enabled: bool, device: str = "cuda"):
 def get_cpu_info():
     """Return a string with system CPU information, i.e. 'Apple M2'."""
-    import cpuinfo  # pip install py-cpuinfo
-    k = "brand_raw", "hardware_raw", "arch_string_raw"  # info keys sorted by preference (not all keys always available)
-    info = cpuinfo.get_cpu_info()  # info dict
-    string = info.get(k[0] if k[0] in info else k[1] if k[1] in info else k[2], "unknown")
-    return string.replace("(R)", "").replace("CPU ", "").replace("@ ", "")
+    with contextlib.suppress(Exception):
+        import cpuinfo  # pip install py-cpuinfo
+        k = "brand_raw", "hardware_raw", "arch_string_raw"  # keys sorted by preference (not all keys always available)
+        info = cpuinfo.get_cpu_info()  # info dict
+        string = info.get(k[0] if k[0] in info else k[1] if k[1] in info else k[2], "unknown")
+        return string.replace("(R)", "").replace("CPU ", "").replace("@ ", "")
+    return "unknown"
 def select_device(device="", batch=0, newline=False, verbose=True):
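The net effect of this hunk is that `get_cpu_info()` no longer propagates exceptions from `py-cpuinfo`: if the import or any key lookup fails, execution falls out of the `contextlib.suppress(Exception)` block and the function returns `"unknown"`. Below is a minimal, self-contained sketch of the same pattern for reference; the `_cpu_description` helper name is illustrative and not part of the patched module.

```python
import contextlib


def _cpu_description() -> str:
    """Sketch of the patched pattern: return a CPU string, or 'unknown' if py-cpuinfo is missing or fails."""
    with contextlib.suppress(Exception):  # any ImportError/KeyError/etc. falls through to the fallback below
        import cpuinfo  # pip install py-cpuinfo

        info = cpuinfo.get_cpu_info()
        for key in ("brand_raw", "hardware_raw", "arch_string_raw"):  # keys in order of preference
            if key in info:
                return str(info[key]).replace("(R)", "").replace("CPU ", "").replace("@ ", "")
    return "unknown"  # reached when py-cpuinfo is unavailable or any lookup above raised


if __name__ == "__main__":
    print(_cpu_description())  # e.g. 'Apple M2', or 'unknown' when py-cpuinfo is not installed
```

The sketch iterates over the preferred keys rather than using the chained conditional from the diff; the fallback behavior is the same.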
