New Meta Segment Anything Model 2 (SAM2) Docs page (#14794)

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Files changed:

1. docs/en/guides/model-monitoring-and-maintenance.md (35 lines changed)
2. docs/en/hub/index.md (49 lines changed)
3. docs/en/integrations/ibm-watsonx.md (87 lines changed)
4. docs/en/integrations/jupyterlab.md (99 lines changed)
5. docs/en/integrations/kaggle.md (47 lines changed)
6. docs/en/models/index.md (13 lines changed)
7. docs/en/models/sam.md (18 lines changed)
8. docs/en/models/sam2.md (347 lines changed)
9. mkdocs.yml (1 line changed)

@@ -135,3 +135,38 @@ Using these resources will help you solve challenges and stay up-to-date with th
## Key Takeaways
We covered key tips for monitoring, maintaining, and documenting your computer vision models. Regular updates and re-training help the model adapt to new data patterns. Detecting and fixing data drift helps your model stay accurate. Continuous monitoring catches issues early, and good documentation makes collaboration and future updates easier. Following these steps will help your computer vision project stay successful and effective over time.
## FAQ
### How do I monitor the performance of my deployed computer vision model?
Monitoring the performance of your deployed computer vision model is crucial to ensure its accuracy and reliability over time. You can use tools like [Prometheus](https://prometheus.io/), [Grafana](https://grafana.com/), and [Evidently AI](https://www.evidentlyai.com/) to track key metrics, detect anomalies, and identify data drift. Regularly monitor inputs and outputs, set up alerts for unusual behavior, and use diverse data sources to get a comprehensive view of your model's performance. For more details, check out our section on [Model Monitoring](#model-monitoring-is-key).
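As a minimal sketch (assuming the Evidently Python API and two pandas DataFrames of model inputs; the CSV paths here are placeholders), a drift report could be generated like this:
```python
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Placeholder CSVs: a reference window of inputs and a recent production window
reference_df = pd.read_csv("reference_inputs.csv")
current_df = pd.read_csv("recent_inputs.csv")

# Build and save an HTML drift report comparing the two windows
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)
report.save_html("drift_report.html")
```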
### What are the best practices for maintaining computer vision models after deployment?
Maintaining computer vision models involves regular updates, retraining, and monitoring to ensure continued accuracy and relevance. Best practices include:
- **Continuous Monitoring**: Track performance metrics and data quality regularly.
- **Data Drift Detection**: Use statistical techniques to identify changes in data distributions.
- **Regular Updates and Retraining**: Implement incremental learning or periodic full retraining based on data changes.
- **Documentation**: Maintain detailed documentation of model architecture, training processes, and evaluation metrics. For more insights, visit our [Model Maintenance](#model-maintenance) section.
### Why is data drift detection important for AI models?
Data drift detection is essential because it helps identify when the statistical properties of the input data change over time, which can degrade model performance. Techniques like continuous monitoring, statistical tests (e.g., Kolmogorov-Smirnov test), and feature drift analysis can help spot issues early. Addressing data drift ensures that your model remains accurate and relevant in changing environments. Learn more about data drift detection in our [Data Drift Detection](#data-drift-detection) section.
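For example, a minimal Kolmogorov-Smirnov check on a single feature (the sample arrays below are placeholders) might look like this:
```python
from scipy.stats import ks_2samp

# Placeholder samples of one input feature: training-time vs. production
reference_feature = [0.10, 0.40, 0.35, 0.80, 0.60]
production_feature = [0.90, 0.70, 0.85, 0.95, 0.88]

statistic, p_value = ks_2samp(reference_feature, production_feature)
if p_value < 0.05:  # low p-value: the two samples likely come from different distributions
    print(f"Possible data drift (KS statistic={statistic:.3f}, p={p_value:.4f})")
```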
### What tools can I use for anomaly detection in computer vision models?
For anomaly detection in computer vision models, tools like [Prometheus](https://prometheus.io/), [Grafana](https://grafana.com/), and [Evidently AI](https://www.evidentlyai.com/) are highly effective. These tools can help you set up alert systems to detect unusual data points or patterns that deviate from expected behavior. Configurable alerts and standardized messages can help you respond quickly to potential issues. Explore more in our [Anomaly Detection and Alert Systems](#anomaly-detection-and-alert-systems) section.
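As a simple illustration of threshold-based alerting (a toy sketch, not a substitute for the tools above), you could flag metric values that deviate strongly from recent history:
```python
import numpy as np


def is_anomalous(value: float, history: list[float], z_threshold: float = 3.0) -> bool:
    """Flag a metric value more than z_threshold standard deviations from its history."""
    mean, std = np.mean(history), np.std(history)
    return std > 0 and abs(value - mean) > z_threshold * std


# Hypothetical daily mAP readings for a deployed detector
map_history = [0.71, 0.72, 0.70, 0.73, 0.71, 0.72]
print(is_anomalous(0.45, map_history))  # True -> trigger an alert
```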
### How can I document my computer vision project effectively?
Effective documentation of a computer vision project should include:
- **Project Overview**: High-level summary, problem statement, and solution approach.
- **Model Architecture**: Details of the model structure, components, and hyperparameters.
- **Data Preparation**: Information on data sources, preprocessing steps, and transformations.
- **Training Process**: Description of the training procedure, datasets used, and challenges encountered.
- **Evaluation Metrics**: Metrics used for performance evaluation and analysis.
- **Deployment Steps**: Steps taken for model deployment and any specific challenges.
- **Monitoring and Maintenance Procedure**: Plan for ongoing monitoring and maintenance. For more comprehensive guidelines, refer to our [Documentation](#documentation) section.

@@ -76,3 +76,52 @@ We hope that the resources here will help you get the most out of HUB. Please br
- [**Ultralytics HUB App**](app/index.md): Learn about the Ultralytics HUB App, which allows you to run models directly on your mobile device.
- [**iOS**](app/ios.md): Explore CoreML acceleration on iPhones and iPads.
- [**Android**](app/android.md): Explore TFLite acceleration on Android devices.
## FAQ
### How do I get started with Ultralytics HUB for training YOLO models?
To get started with [Ultralytics HUB](https://ultralytics.com/hub), follow these steps:
1. **Sign Up:** Create an account on [Ultralytics HUB](https://ultralytics.com/hub).
2. **Upload Dataset:** Navigate to the [Datasets](datasets.md) section to upload your custom dataset.
3. **Train Model:** Go to the [Models](models.md) section and select a pre-trained YOLOv5 or YOLOv8 model to start training.
4. **Deploy Model:** Once trained, preview and deploy your model using the [Ultralytics HUB App](app/index.md) for real-time tasks.
For a detailed guide, refer to the [Quickstart](quickstart.md) page.
### What are the benefits of using Ultralytics HUB over other AI platforms?
[Ultralytics HUB](https://ultralytics.com/hub) offers several unique benefits:
- **User-Friendly Interface:** Intuitive design for easy dataset uploads and model training.
- **Pre-Trained Models:** Access to a variety of pre-trained YOLOv5 and YOLOv8 models.
- **Cloud Training:** Seamless cloud training capabilities, detailed on the [Cloud Training](cloud-training.md) page.
- **Real-Time Deployment:** Effortlessly deploy models for real-time applications using the [Ultralytics HUB App](app/index.md).
- **Team Collaboration:** Collaborate with your team efficiently through the [Teams](teams.md) feature.
Explore more about the advantages in our [Ultralytics HUB Blog](https://www.ultralytics.com/blog/ultralytics-hub-a-game-changer-for-computer-vision).
### Can I use Ultralytics HUB for object detection on mobile devices?
Yes, Ultralytics HUB supports object detection on mobile devices. You can run YOLOv5 and YOLOv8 models on both iOS and Android devices using the Ultralytics HUB App. For more details:
- **iOS:** Learn about CoreML acceleration on iPhones and iPads in the [iOS](app/ios.md) section.
- **Android:** Explore TFLite acceleration on Android devices in the [Android](app/android.md) section.
### How do I manage and organize my projects in Ultralytics HUB?
Ultralytics HUB allows you to manage and organize your projects efficiently. You can group your models into projects for better organization. To learn more:
- Visit the [Projects](projects.md) page for detailed instructions on creating, editing, and managing projects.
- Use the [Teams](teams.md) feature to collaborate with team members and share resources.
### What integrations are available with Ultralytics HUB?
Ultralytics HUB offers seamless integrations with various platforms to enhance your machine learning workflows. Some key integrations include:
- **Roboflow:** For dataset management and model training. Learn more on the [Integrations](integrations.md) page.
- **Google Colab:** Efficiently train models using Google Colab's cloud-based environment. Detailed steps are available in the [Google Colab](https://docs.ultralytics.com/integrations/google-colab) section.
- **Weights & Biases:** For enhanced experiment tracking and visualization. Explore the [Weights & Biases](https://docs.ultralytics.com/integrations/weights-biases) integration.
For a complete list of integrations, refer to the [Integrations](integrations.md) page.

@@ -321,3 +321,90 @@ We explored IBM Watsonx key features, and how to train a YOLOv8 model using IBM
For further details on usage, visit [IBM Watsonx official documentation](https://www.ibm.com/watsonx).
Also, be sure to check out the [Ultralytics integration guide page](./index.md) to learn more about other exciting integrations.
## FAQ
### How do I train a YOLOv8 model using IBM Watsonx?
To train a YOLOv8 model using IBM Watsonx, follow these steps:
1. **Set Up Your Environment**: Create an IBM Cloud account and set up a Watsonx.ai project. Use a Jupyter Notebook for your coding environment.
2. **Install Libraries**: Install necessary libraries like `torch`, `opencv`, and `ultralytics`.
3. **Load Data**: Use the Kaggle API to load your dataset into Watsonx.
4. **Preprocess Data**: Organize your dataset into the required directory structure and update the `.yaml` configuration file.
5. **Train the Model**: Use the YOLO command-line interface to train your model with arguments such as `epochs`, `batch`, and `lr0` (the initial learning rate); see the example below.
6. **Test and Evaluate**: Run inference to test the model and evaluate its performance using metrics like precision and recall.
For detailed instructions, refer to our [YOLOv8 Model Training guide](../modes/train.md).
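As a rough sketch (the dataset path and hyperparameter values are placeholders), a CLI training run might look like this:
```bash
# Hypothetical paths/values; adjust the dataset YAML, epochs, batch size, and learning rate
yolo detect train data=path/to/data.yaml model=yolov8n.pt epochs=100 batch=16 lr0=0.01 imgsz=640
```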
### What are the key features of IBM Watsonx for AI model training?
IBM Watsonx offers several key features for AI model training:
- **Watsonx.ai**: Provides tools for AI development, including access to IBM-supported custom models and third-party models like Llama 3. It includes the Prompt Lab, Tuning Studio, and Flows Engine for comprehensive AI lifecycle management.
- **Watsonx.data**: Supports cloud and on-premises deployments, offering centralized data access, efficient query engines like Presto and Spark, and an AI-powered semantic layer.
- **Watsonx.governance**: Automates compliance, manages risk with alerts, and provides tools for detecting issues like bias and drift. It also includes dashboards and reporting tools for collaboration.
For more information, visit the [IBM Watsonx official documentation](https://www.ibm.com/watsonx).
### Why should I use IBM Watsonx for training Ultralytics YOLOv8 models?
IBM Watsonx is an excellent choice for training Ultralytics YOLOv8 models due to its comprehensive suite of tools that streamline the AI lifecycle. Key benefits include:
- **Scalability**: Easily scale your model training with IBM Cloud services.
- **Integration**: Seamlessly integrate with various data sources and APIs.
- **User-Friendly Interface**: Simplifies the development process with a collaborative and intuitive interface.
- **Advanced Tools**: Access to powerful tools like the Prompt Lab, Tuning Studio, and Flows Engine for enhancing model performance.
Learn more about [Ultralytics YOLOv8](https://github.com/ultralytics/ultralytics) and how to train models using IBM Watsonx in our [integration guide](./index.md).
### How can I preprocess my dataset for YOLOv8 training on IBM Watsonx?
To preprocess your dataset for YOLOv8 training on IBM Watsonx:
1. **Organize Directories**: Ensure your dataset follows the YOLO directory structure with separate subdirectories for images and labels within the train/val/test split.
2. **Update .yaml File**: Modify the `.yaml` configuration file to reflect the new directory structure and class names.
3. **Run Preprocessing Script**: Use a Python script to reorganize your dataset and update the `.yaml` file accordingly.
Here's a sample script to organize your dataset:
```python
import os
import shutil


def organize_files(directory):
    """Move images and YOLO label files into `images/` and `labels/` subdirectories."""
    for subdir in ["train", "test", "val"]:
        subdir_path = os.path.join(directory, subdir)
        if not os.path.exists(subdir_path):
            continue

        images_dir = os.path.join(subdir_path, "images")
        labels_dir = os.path.join(subdir_path, "labels")
        os.makedirs(images_dir, exist_ok=True)
        os.makedirs(labels_dir, exist_ok=True)

        for filename in os.listdir(subdir_path):
            if filename.endswith(".txt"):
                shutil.move(os.path.join(subdir_path, filename), os.path.join(labels_dir, filename))
            elif filename.endswith((".jpg", ".png", ".jpeg")):
                shutil.move(os.path.join(subdir_path, filename), os.path.join(images_dir, filename))


if __name__ == "__main__":
    directory = f"{work_dir}/trash_ICRA19/dataset"  # work_dir is defined earlier in the guide
    organize_files(directory)
```
For more details, refer to our [data preprocessing guide](../guides/preprocessing_annotated_data.md).
### What are the prerequisites for training a YOLOv8 model on IBM Watsonx?
Before you start training a YOLOv8 model on IBM Watsonx, ensure you have the following prerequisites:
- **IBM Cloud Account**: Create an account on IBM Cloud to access Watsonx.ai.
- **Kaggle Account**: For loading datasets, you'll need a Kaggle account and an API key (see the sketch below).
- **Jupyter Notebook**: Set up a Jupyter Notebook environment within Watsonx.ai for coding and model training.
For more information on setting up your environment, visit our [Ultralytics Installation guide](../quickstart.md).
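For example, once you have your Kaggle API key, authenticating and pulling a dataset from a notebook could look like this (a sketch; the credentials and dataset slug are placeholders):
```python
import os

# Placeholder credentials; in practice, store these as secrets rather than in code
os.environ["KAGGLE_USERNAME"] = "your_username"
os.environ["KAGGLE_KEY"] = "your_api_key"

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads credentials from the environment or ~/.kaggle/kaggle.json
api.dataset_download_files("owner/dataset-name", path="data", unzip=True)  # placeholder slug
```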

@@ -108,3 +108,102 @@ We've explored how JupyterLab can be a powerful tool for experimenting with Ultr
For more details, visit the [JupyterLab FAQ Page](https://jupyterlab.readthedocs.io/en/stable/getting_started/faq.html).
Interested in more YOLOv8 integrations? Check out the [Ultralytics integration guide](./index.md) to explore additional tools and capabilities for your machine learning projects.
## FAQ
### How do I use JupyterLab to train a YOLOv8 model?
To train a YOLOv8 model using JupyterLab:
1. Install JupyterLab and the Ultralytics package:
```bash
pip install jupyterlab ultralytics
```
2. Launch JupyterLab and open a new notebook.
3. Import the YOLO model and load a pretrained model:
```python
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
```
4. Train the model on your custom dataset:
```python
results = model.train(data="path/to/your/data.yaml", epochs=100, imgsz=640)
```
5. Visualize training results using JupyterLab's built-in plotting capabilities:
```python
%matplotlib inline
from ultralytics.utils.plotting import plot_results

plot_results("runs/detect/train/results.csv")  # plot_results reads the results CSV saved during training
```
JupyterLab's interactive environment allows you to easily modify parameters, visualize results, and iterate on your model training process.
### What are the key features of JupyterLab that make it suitable for YOLOv8 projects?
JupyterLab offers several features that make it ideal for YOLOv8 projects:
1. Interactive code execution: Test and debug YOLOv8 code snippets in real-time.
2. Integrated file browser: Easily manage datasets, model weights, and configuration files.
3. Flexible layout: Arrange multiple notebooks, terminals, and output windows side-by-side for efficient workflow.
4. Rich output display: Visualize YOLOv8 detection results, training curves, and model performance metrics inline.
5. Markdown support: Document your YOLOv8 experiments and findings with rich text and images.
6. Extension ecosystem: Enhance functionality with extensions for version control, [remote computing](google-colab.md), and more.
These features allow for a seamless development experience when working with YOLOv8 models, from data preparation to model deployment.
### How can I optimize YOLOv8 model performance using JupyterLab?
To optimize YOLOv8 model performance in JupyterLab:
1. Use the autobatch feature to determine the optimal batch size:
```python
from ultralytics.utils.autobatch import autobatch
optimal_batch_size = autobatch(model)
```
2. Implement [hyperparameter tuning](../guides/hyperparameter-tuning.md) using libraries like Ray Tune:
```python
from ultralytics.utils.tuner import run_ray_tune
best_results = run_ray_tune(model, data="path/to/data.yaml")
```
3. Visualize and analyze model metrics using JupyterLab's plotting capabilities:
```python
from ultralytics.utils.plotting import plot_results

plot_results("runs/detect/train/results.csv")  # pass the results CSV saved during training, not a dict
```
4. Experiment with different model architectures and [export formats](../modes/export.md) to find the best balance of speed and accuracy for your specific use case; see the export sketch after this list.
JupyterLab's interactive environment allows for quick iterations and real-time feedback, making it easier to optimize your YOLOv8 models efficiently.
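For example, exporting a trained model to ONNX takes a single call (a minimal sketch; choose whichever [export format](../modes/export.md) fits your deployment target):
```python
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")  # a trained or pretrained segmentation model
model.export(format="onnx")  # other formats include "engine" (TensorRT) and "coreml"
```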
### How do I handle common issues when working with JupyterLab and YOLOv8?
When working with JupyterLab and YOLOv8, you might encounter some common issues. Here's how to handle them:
1. GPU memory issues:
- Use `torch.cuda.empty_cache()` to clear GPU memory between runs (see the snippet after this list).
- Adjust batch size or image size to fit your GPU memory.
2. Package conflicts:
- Create a separate conda environment for your YOLOv8 projects to avoid conflicts.
- Use `!pip install package_name` in a notebook cell to install missing packages.
3. Kernel crashes:
- Restart the kernel and run cells one by one to identify the problematic code.
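For the GPU memory tip above, a minimal snippet to free cached memory between runs looks like this:
```python
import gc

import torch

if torch.cuda.is_available():
    gc.collect()  # drop Python-side references first
    torch.cuda.empty_cache()  # then release cached GPU memory back to the driver
    print(f"GPU memory reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
```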

@@ -90,3 +90,50 @@ We've seen how Kaggle can boost your YOLOv8 projects by providing free access to
For more details, visit [Kaggle's documentation](https://www.kaggle.com/docs).
Interested in more YOLOv8 integrations? Check out the [Ultralytics integration guide](https://docs.ultralytics.com/integrations/) to explore additional tools and capabilities for your machine learning projects.
## FAQ
### How do I train a YOLOv8 model on Kaggle?
Training a YOLOv8 model on Kaggle is straightforward. First, access the [Kaggle YOLOv8 Notebook](https://www.kaggle.com/ultralytics/yolov8). Sign in to your Kaggle account, copy and edit the notebook, and select a GPU under the accelerator settings. Run the notebook cells to start training. For more detailed steps, refer to our [YOLOv8 Model Training guide](../modes/train.md).
### What are the benefits of using Kaggle for YOLOv8 model training?
Kaggle offers several advantages for training YOLOv8 models:
- **Free GPU Access**: Utilize powerful GPUs like Nvidia Tesla P100 or T4 x2 for up to 30 hours per week.
- **Pre-installed Libraries**: Libraries like TensorFlow and PyTorch are pre-installed, simplifying the setup.
- **Community Collaboration**: Engage with a vast community of data scientists and machine learning enthusiasts.
- **Version Control**: Easily manage different versions of your notebooks and revert to previous versions if needed.
For more details, visit our [Ultralytics integration guide](https://docs.ultralytics.com/integrations/).
### What common issues might I encounter when using Kaggle for YOLOv8, and how can I resolve them?
Common issues include:
- **Access to GPUs**: Ensure you activate a GPU in your notebook settings. Kaggle allows up to 30 hours of GPU usage per week.
- **Dataset Licenses**: Check the license of each dataset to understand usage restrictions.
- **Saving and Committing Notebooks**: Click "Save Version" to save your notebook's state and access output files from the Output tab.
- **Collaboration**: Kaggle supports asynchronous collaboration; multiple users cannot edit a notebook simultaneously.
For more troubleshooting tips, see our [Common Issues guide](../guides/yolo-common-issues.md).
### Why should I choose Kaggle over other platforms like Google Colab for training YOLOv8 models?
Kaggle offers unique features that make it an excellent choice:
- **Public Notebooks**: Share your work with the community for feedback and collaboration.
- **Free Access to TPUs**: Speed up training with powerful TPUs without extra costs.
- **Comprehensive History**: Track changes over time with a detailed history of notebook commits.
- **Resource Availability**: Significant resources are provided for each notebook session, including 12 hours of execution time for CPU and GPU sessions.
For a comparison with Google Colab, refer to our [Google Colab guide](./google-colab.md).
### How can I revert to a previous version of my Kaggle notebook?
To revert to a previous version:
1. Open the notebook and click on the three vertical dots in the top right corner.
2. Select "View Versions."
3. Find the version you want to revert to, click on the "..." menu next to it, and select "Revert to Version."
4. Click "Save Version" to commit the changes.

@@ -20,12 +20,13 @@ Here are some of the key models supported:
6. **[YOLOv8](yolov8.md) NEW 🚀**: The latest version of the YOLO family, featuring enhanced capabilities such as instance segmentation, pose/keypoints estimation, and classification.
7. **[YOLOv9](yolov9.md)**: An experimental model trained on the Ultralytics [YOLOv5](yolov5.md) codebase implementing Programmable Gradient Information (PGI).
8. **[YOLOv10](yolov10.md)**: By Tsinghua University, featuring NMS-free training and efficiency-accuracy driven architecture, delivering state-of-the-art performance and latency.
9. **[Segment Anything Model (SAM)](sam.md)**: Meta's original Segment Anything Model (SAM).
10. **[Segment Anything Model 2 (SAM2)](sam2.md)**: The next generation of Meta's Segment Anything Model (SAM) for videos and images.
11. **[Mobile Segment Anything Model (MobileSAM)](mobile-sam.md)**: MobileSAM for mobile applications, by Kyung Hee University.
12. **[Fast Segment Anything Model (FastSAM)](fast-sam.md)**: FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
13. **[YOLO-NAS](yolo-nas.md)**: YOLO Neural Architecture Search (NAS) Models.
14. **[Realtime Detection Transformers (RT-DETR)](rtdetr.md)**: Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
15. **[YOLO-World](yolo-world.md)**: Real-time Open Vocabulary Object Detection models from Tencent AI Lab.
<p align="center">
<br>

@@ -195,13 +195,13 @@ To auto-annotate your dataset with the Ultralytics framework, use the `auto_anno
auto_annotate(data="path/to/images", det_model="yolov8x.pt", sam_model="sam_b.pt")
```
| Argument | Type | Description | Default |
| ------------ | --------------------- | ------------------------------------------------------------------------------------------------------- | -------------- |
| `data` | `str` | Path to a folder containing images to be annotated. | |
| `det_model` | `str`, optional | Pre-trained YOLO detection model. Defaults to 'yolov8x.pt'. | `'yolov8x.pt'` |
| `sam_model` | `str`, optional | Pre-trained SAM segmentation model. Defaults to 'sam_b.pt'. | `'sam_b.pt'` |
| `device` | `str`, optional | Device to run the models on. Defaults to an empty string (CPU or GPU, if available). | |
| `output_dir` | `str`, None, optional | Directory to save the annotated results. Defaults to a 'labels' folder in the same directory as 'data'. | `None` |
The `auto_annotate` function takes the path to your images, with optional arguments for specifying the pre-trained detection and SAM segmentation models, the device to run the models on, and the output directory for saving the annotated results.
@@ -278,7 +278,3 @@ This function takes the path to your images and optional arguments for pre-train
### What datasets are used to train the Segment Anything Model (SAM)?
SAM is trained on the extensive [SA-1B dataset](https://ai.facebook.com/datasets/segment-anything/) which comprises over 1 billion masks across 11 million images. SA-1B is the largest segmentation dataset to date, providing high-quality and diverse training data, ensuring impressive zero-shot performance in varied segmentation tasks. For more details, visit the [Dataset section](#key-features-of-the-segment-anything-model-sam).

@@ -0,0 +1,347 @@
---
comments: true
description: Discover SAM2, the next generation of Meta's Segment Anything Model, supporting real-time promptable segmentation in both images and videos with state-of-the-art performance. Learn about its key features, datasets, and how to use it.
keywords: SAM2, Segment Anything, video segmentation, image segmentation, promptable segmentation, zero-shot performance, SA-V dataset, Ultralytics, real-time segmentation, AI, machine learning
---
!!! Note "🚧 SAM2 Integration In Progress 🚧"
The SAM2 features described in this documentation are currently not enabled in the `ultralytics` package. The Ultralytics team is actively working on integrating SAM2, and these capabilities should be available soon. We appreciate your patience as we work to implement this exciting new model.
# SAM2: Segment Anything Model 2
SAM2, the successor to Meta's [Segment Anything Model (SAM)](sam.md), is a cutting-edge tool designed for comprehensive object segmentation in both images and videos. It excels in handling complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization.
![SAM2 Example Results](https://github.com/facebookresearch/segment-anything-2/raw/main/assets/sa_v_dataset.jpg?raw=true)
## Key Features
### Unified Model Architecture
SAM2 combines the capabilities of image and video segmentation in a single model. This unification simplifies deployment and allows for consistent performance across different media types. It leverages a flexible prompt-based interface, enabling users to specify objects of interest through various prompt types, such as points, bounding boxes, or masks.
### Real-Time Performance
The model achieves real-time inference speeds, processing approximately 44 frames per second. This makes SAM2 suitable for applications requiring immediate feedback, such as video editing and augmented reality.
### Zero-Shot Generalization
SAM2 can segment objects it has never encountered before, demonstrating strong zero-shot generalization. This is particularly useful in diverse or evolving visual domains where pre-defined categories may not cover all possible objects.
### Interactive Refinement
Users can iteratively refine the segmentation results by providing additional prompts, allowing for precise control over the output. This interactivity is essential for fine-tuning results in applications like video annotation or medical imaging.
### Advanced Handling of Visual Challenges
SAM2 includes mechanisms to manage common video segmentation challenges, such as object occlusion and reappearance. It uses a sophisticated memory mechanism to keep track of objects across frames, ensuring continuity even when objects are temporarily obscured or exit and re-enter the scene.
For a deeper understanding of SAM2's architecture and capabilities, explore the [SAM2 research paper](https://arxiv.org/abs/2408.00714).
## Performance and Technical Details
SAM2 sets a new benchmark in the field, outperforming previous models on various metrics:
| Metric | SAM2 | Previous SOTA |
| ---------------------------------- | ------------- | ------------- |
| **Interactive Video Segmentation** | **Best** | - |
| **Human Interactions Required** | **3x fewer** | Baseline |
| **Image Segmentation Accuracy** | **Improved** | SAM |
| **Inference Speed** | **6x faster** | SAM |
## Model Architecture
### Core Components
- **Image and Video Encoder**: Utilizes a transformer-based architecture to extract high-level features from both images and video frames. This component is responsible for understanding the visual content at each timestep.
- **Prompt Encoder**: Processes user-provided prompts (points, boxes, masks) to guide the segmentation task. This allows SAM2 to adapt to user input and target specific objects within a scene.
- **Memory Mechanism**: Includes a memory encoder, memory bank, and memory attention module. These components collectively store and utilize information from past frames, enabling the model to maintain consistent object tracking over time.
- **Mask Decoder**: Generates the final segmentation masks based on the encoded image features and prompts. In video, it also uses memory context to ensure accurate tracking across frames.
![SAM2 Architecture Diagram](https://github.com/facebookresearch/segment-anything-2/blob/main/assets/model_diagram.png?raw=true)
### Memory Mechanism and Occlusion Handling
The memory mechanism allows SAM2 to handle temporal dependencies and occlusions in video data. As objects move and interact, SAM2 records their features in a memory bank. When an object becomes occluded, the model can rely on this memory to predict its position and appearance when it reappears. The occlusion head specifically handles scenarios where objects are not visible, predicting the likelihood of an object being occluded.
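To make this data flow concrete, here is a toy sketch of memory-conditioned per-frame segmentation. Every function below is an illustrative stand-in, not the actual SAM2 implementation or API:
```python
import numpy as np


def image_encoder(frame):  # stand-in for the transformer encoder: frame -> feature map
    return frame.mean(axis=-1)


def memory_attention(feats, memory_bank):  # condition current features on stored memories
    return feats if not memory_bank else feats + 0.5 * np.mean(memory_bank, axis=0)


def mask_decoder(feats, threshold=0.5):  # decode conditioned features into a binary mask
    return feats > threshold


def memory_encoder(feats, mask):  # compress features and mask into a memory entry
    return feats * mask


video = np.random.rand(10, 64, 64, 3)  # dummy 10-frame clip
memory_bank = []
for frame in video:
    feats = image_encoder(frame)
    conditioned = memory_attention(feats, memory_bank)
    mask = mask_decoder(conditioned)
    memory_bank.append(memory_encoder(conditioned, mask))  # memory is what lets tracking survive occlusion
```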
### Multi-Mask Ambiguity Resolution
In situations with ambiguity (e.g., overlapping objects), SAM2 can generate multiple mask predictions. This feature is crucial for accurately representing complex scenes where a single mask might not sufficiently describe the scene's nuances.
## SA-V Dataset
The SA-V dataset, developed for SAM2's training, is one of the largest and most diverse video segmentation datasets available. It includes:
- **51,000+ Videos**: Captured across 47 countries, providing a wide range of real-world scenarios.
- **600,000+ Mask Annotations**: Detailed spatio-temporal mask annotations, referred to as "masklets," covering whole objects and parts.
- **Dataset Scale**: It features 4.5 times more videos and 53 times more annotations than the previous largest video segmentation dataset, offering unprecedented diversity and complexity.
## Benchmarks
### Video Object Segmentation
SAM2 has demonstrated superior performance across major video segmentation benchmarks:
| Dataset | J&F | J | F |
| --------------- | ---- | ---- | ---- |
| **DAVIS 2017** | 82.5 | 79.8 | 85.2 |
| **YouTube-VOS** | 81.2 | 78.9 | 83.5 |
### Interactive Segmentation
In interactive segmentation tasks, SAM2 shows significant efficiency and accuracy:
| Dataset | NoC@90 | AUC |
| --------------------- | ------ | ----- |
| **DAVIS Interactive** | 1.54 | 0.872 |
## Installation
To install SAM2, use the following command. All SAM2 models will automatically download on first use.
```bash
pip install ultralytics
```
## How to Use SAM2: Versatility in Image and Video Segmentation
!!! Note "🚧 SAM2 Integration In Progress 🚧"
The SAM2 features described in this documentation are currently not enabled in the `ultralytics` package. The Ultralytics team is actively working on integrating SAM2, and these capabilities should be available soon. We appreciate your patience as we work to implement this exciting new model.
The following table details the available SAM2 models, their pre-trained weights, supported tasks, and compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md).
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| ---------- | ------------------------------------------------------------------------------------- | -------------------------------------------- | --------- | ---------- | -------- | ------ |
| SAM2 base | [sam2_b.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/sam2_b.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
| SAM2 large | [sam2_l.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/sam2_l.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
### SAM2 Prediction Examples
SAM2 can be utilized across a broad spectrum of tasks, including real-time video editing, medical imaging, and autonomous systems. Its ability to segment both static and dynamic visual data makes it a versatile tool for researchers and developers.
#### Segment with Prompts
!!! Example "Segment with Prompts"
Use prompts to segment specific objects in images or videos.
=== "Python"
```python
from ultralytics import SAM2
# Load a model
model = SAM2("sam2_b.pt")
# Display model information (optional)
model.info()
# Segment with bounding box prompt
results = model("path/to/image.jpg", bboxes=[100, 100, 200, 200])
# Segment with point prompt
results = model("path/to/image.jpg", points=[150, 150], labels=[1])
```
#### Segment Everything
!!! Example "Segment Everything"
Segment the entire image or video content without specific prompts.
=== "Python"
```python
from ultralytics import SAM2
# Load a model
model = SAM2("sam2_b.pt")
# Display model information (optional)
model.info()
# Run inference
model("path/to/video.mp4")
```
=== "CLI"
```bash
# Run inference with a SAM2 model
yolo predict model=sam2_b.pt source=path/to/video.mp4
```
- This example demonstrates how SAM2 can be used to segment the entire content of an image or video if no prompts (bboxes/points/masks) are provided.
## SAM comparison vs YOLOv8
Here we compare Meta's smallest SAM model, SAM-b, with Ultralytics' smallest segmentation model, [YOLOv8n-seg](../tasks/segment.md):
| Model | Size | Parameters | Speed (CPU) |
| ---------------------------------------------- | -------------------------- | ---------------------- | -------------------------- |
| Meta's SAM-b | 358 MB | 94.7 M | 51096 ms/im |
| [MobileSAM](mobile-sam.md) | 40.7 MB | 10.1 M | 46122 ms/im |
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone | 23.7 MB | 11.8 M | 115 ms/im |
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7 MB** (53.4x smaller) | **3.4 M** (27.9x less) | **59 ms/im** (866x faster) |
This comparison shows the order-of-magnitude differences in model size and speed. While SAM offers unique capabilities for automatic segmentation, it is not a direct competitor to YOLOv8 segmentation models, which are smaller, faster, and more efficient.
Tests run on a 2023 Apple M2 MacBook with 16GB of RAM. To reproduce this test:
!!! Example
=== "Python"
```python
from ultralytics import SAM, YOLO, FastSAM
# Profile SAM-b
model = SAM("sam_b.pt")
model.info()
model("ultralytics/assets")
# Profile MobileSAM
model = SAM("mobile_sam.pt")
model.info()
model("ultralytics/assets")
# Profile FastSAM-s
model = FastSAM("FastSAM-s.pt")
model.info()
model("ultralytics/assets")
# Profile YOLOv8n-seg
model = YOLO("yolov8n-seg.pt")
model.info()
model("ultralytics/assets")
```
## Auto-Annotation: Efficient Dataset Creation
Auto-annotation is a powerful feature of SAM2, enabling users to generate segmentation datasets quickly and accurately by leveraging pre-trained models. This capability is particularly useful for creating large, high-quality datasets without extensive manual effort.
### How to Auto-Annotate with SAM2
To auto-annotate your dataset using SAM2, follow this example:
!!! Example "Auto-Annotation Example"
```python
from ultralytics.data.annotator import auto_annotate
auto_annotate(data="path/to/images", det_model="yolov8x.pt", sam_model="sam2_b.pt")
```
| Argument | Type | Description | Default |
| ------------ | ----------------------- | ------------------------------------------------------------------------------------------------------- | -------------- |
| `data` | `str` | Path to a folder containing images to be annotated. | |
| `det_model` | `str`, optional | Pre-trained YOLO detection model. Defaults to 'yolov8x.pt'. | `'yolov8x.pt'` |
| `sam_model` | `str`, optional | Pre-trained SAM2 segmentation model. Defaults to 'sam2_b.pt'. | `'sam2_b.pt'` |
| `device` | `str`, optional | Device to run the models on. Defaults to an empty string (CPU or GPU, if available). | |
| `output_dir` | `str`, `None`, optional | Directory to save the annotated results. Defaults to a 'labels' folder in the same directory as 'data'. | `None` |
This function facilitates the rapid creation of high-quality segmentation datasets, ideal for researchers and developers aiming to accelerate their projects.
## Limitations
Despite its strengths, SAM2 has certain limitations:
- **Tracking Stability**: SAM2 may lose track of objects during extended sequences or significant viewpoint changes.
- **Object Confusion**: The model can sometimes confuse similar-looking objects, particularly in crowded scenes.
- **Efficiency with Multiple Objects**: Segmentation efficiency decreases when processing multiple objects simultaneously due to the lack of inter-object communication.
- **Detail Accuracy**: May miss fine details, especially with fast-moving objects. Additional prompts can partially address this issue, but temporal smoothness is not guaranteed.
## Citations and Acknowledgements
If SAM2 is a crucial part of your research or development work, please cite it using the following reference:
!!! Quote ""
=== "BibTeX"
```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}
```
We extend our gratitude to Meta AI for their contributions to the AI community with this groundbreaking model and dataset.
## FAQ
### What is SAM2 and how does it improve upon the original Segment Anything Model (SAM)?
SAM2, the successor to Meta's [Segment Anything Model (SAM)](sam.md), is a cutting-edge tool designed for comprehensive object segmentation in both images and videos. It excels in handling complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization. SAM2 offers several improvements over the original SAM, including:
- **Unified Model Architecture**: Combines image and video segmentation capabilities in a single model.
- **Real-Time Performance**: Processes approximately 44 frames per second, making it suitable for applications requiring immediate feedback.
- **Zero-Shot Generalization**: Segments objects it has never encountered before, useful in diverse visual domains.
- **Interactive Refinement**: Allows users to iteratively refine segmentation results by providing additional prompts.
- **Advanced Handling of Visual Challenges**: Manages common video segmentation challenges like object occlusion and reappearance.
For more details on SAM2's architecture and capabilities, explore the [SAM2 research paper](https://arxiv.org/abs/2408.00714).
### How can I use SAM2 for real-time video segmentation?
SAM2 can be utilized for real-time video segmentation by leveraging its promptable interface and real-time inference capabilities. Here's a basic example:
!!! Example "Segment with Prompts"
Use prompts to segment specific objects in images or videos.
=== "Python"
```python
from ultralytics import SAM2
# Load a model
model = SAM2("sam2_b.pt")
# Display model information (optional)
model.info()
# Segment with bounding box prompt
results = model("path/to/image.jpg", bboxes=[100, 100, 200, 200])
# Segment with point prompt
results = model("path/to/image.jpg", points=[150, 150], labels=[1])
```
For more comprehensive usage, refer to the [How to Use SAM2](#how-to-use-sam2-versatility-in-image-and-video-segmentation) section.
### What datasets are used to train SAM2, and how do they enhance its performance?
SAM2 is trained on the SA-V dataset, one of the largest and most diverse video segmentation datasets available. The SA-V dataset includes:
- **51,000+ Videos**: Captured across 47 countries, providing a wide range of real-world scenarios.
- **600,000+ Mask Annotations**: Detailed spatio-temporal mask annotations, referred to as "masklets," covering whole objects and parts.
- **Dataset Scale**: Features 4.5 times more videos and 53 times more annotations than the previous largest video segmentation dataset, offering unprecedented diversity and complexity.
This extensive dataset allows SAM2 to achieve superior performance across major video segmentation benchmarks and enhances its zero-shot generalization capabilities. For more information, see the [SA-V Dataset](#sa-v-dataset) section.
### How does SAM2 handle occlusions and object reappearances in video segmentation?
SAM2 includes a sophisticated memory mechanism to manage temporal dependencies and occlusions in video data. The memory mechanism consists of:
- **Memory Encoder and Memory Bank**: Stores features from past frames.
- **Memory Attention Module**: Utilizes stored information to maintain consistent object tracking over time.
- **Occlusion Head**: Specifically handles scenarios where objects are not visible, predicting the likelihood of an object being occluded.
This mechanism ensures continuity even when objects are temporarily obscured or exit and re-enter the scene. For more details, refer to the [Memory Mechanism and Occlusion Handling](#memory-mechanism-and-occlusion-handling) section.
### How does SAM2 compare to other segmentation models like YOLOv8?
SAM2 and Ultralytics YOLOv8 serve different purposes and excel in different areas. While SAM2 is designed for comprehensive object segmentation with advanced features like zero-shot generalization and real-time performance, YOLOv8 is optimized for speed and efficiency in object detection and segmentation tasks. Here's a comparison:
| Model | Size | Parameters | Speed (CPU) |
| ---------------------------------------------- | -------------------------- | ---------------------- | -------------------------- |
| Meta's SAM-b | 358 MB | 94.7 M | 51096 ms/im |
| [MobileSAM](mobile-sam.md) | 40.7 MB | 10.1 M | 46122 ms/im |
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone | 23.7 MB | 11.8 M | 115 ms/im |
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7 MB** (53.4x smaller) | **3.4 M** (27.9x less) | **59 ms/im** (866x faster) |
For more details, see the [SAM comparison vs YOLOv8](#sam-comparison-vs-yolov8) section.

@@ -239,6 +239,7 @@ nav:
- YOLOv9: models/yolov9.md
- YOLOv10: models/yolov10.md
- SAM (Segment Anything Model): models/sam.md
- SAM2 (Segment Anything Model 2): models/sam2.md
- MobileSAM (Mobile Segment Anything Model): models/mobile-sam.md
- FastSAM (Fast Segment Anything Model): models/fast-sam.md
- YOLO-NAS (Neural Architecture Search): models/yolo-nas.md
