PyCharm Code and Docs Inspect fixes v1 (#18461)

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Ultralytics Assistant <135830346+UltralyticsAssistant@users.noreply.github.com>
Co-authored-by: Laughing <61612323+Laughing-q@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
pull/18449/head
Muhammad Rizwan Munawar 3 months ago committed by GitHub
parent 126867e355
commit 7f1a50e893
26 changed files:

1. docker/Dockerfile (2 changed lines)
2. docs/en/datasets/detect/sku-110k.md (4 changed lines)
3. docs/en/datasets/explorer/explorer.md (4 changed lines)
4. docs/en/guides/nvidia-jetson.md (2 changed lines)
5. docs/en/guides/sahi-tiled-inference.md (2 changed lines)
6. docs/en/hub/cloud-training.md (2 changed lines)
7. docs/en/integrations/albumentations.md (6 changed lines)
8. docs/en/integrations/paddlepaddle.md (2 changed lines)
9. docs/en/integrations/tf-savedmodel.md (2 changed lines)
10. docs/en/integrations/vscode.md (6 changed lines)
11. docs/en/reference/utils/metrics.md (2 changed lines)
12. docs/en/yolov5/tutorials/clearml_logging_integration.md (2 changed lines)
13. docs/en/yolov5/tutorials/comet_logging_integration.md (2 changed lines)
14. docs/en/yolov5/tutorials/tips_for_best_training_results.md (2 changed lines)
15. examples/YOLOv8-ONNXRuntime-Rust/README.md (4 changed lines)
16. examples/YOLOv8-Region-Counter/readme.md (2 changed lines)
17. examples/YOLOv8-Region-Counter/yolov8_region_counter.py (4 changed lines)
18. ultralytics/data/augment.py (14 changed lines)
19. ultralytics/data/converter.py (2 changed lines)
20. ultralytics/data/split_dota.py (2 changed lines)
21. ultralytics/models/sam/amg.py (2 changed lines)
22. ultralytics/models/sam/modules/blocks.py (22 changed lines)
23. ultralytics/models/sam/modules/sam.py (4 changed lines)
24. ultralytics/models/sam/predict.py (59 changed lines)
25. ultralytics/trackers/utils/gmc.py (24 changed lines)
26. ultralytics/utils/metrics.py (2 changed lines)

@ -41,7 +41,7 @@ ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Install pip packages
RUN pip install uv
# Note -cu12 must be used with tensorrt)
# Note -cu12 must be used with tensorrt
RUN uv pip install --system -e ".[export]" tensorrt-cu12 "albumentations>=1.4.6" comet pycocotools
# Run exports to AutoInstall packages

@ -6,7 +6,7 @@ keywords: SKU-110k, dataset, object detection, retail shelf images, deep learnin
# SKU-110k Dataset
The [SKU-110k](https://github.com/eg4000/SKU110K_CVPR19) dataset is a collection of densely packed retail shelf images, designed to support research in [object detection](https://www.ultralytics.com/glossary/object-detection) tasks. Developed by Eran Goldman et al., the dataset contains over 110,000 unique store keeping unit (SKU) categories with densely packed objects, often looking similar or even identical, positioned in close proximity.
The [SKU-110k](https://github.com/eg4000/SKU110K_CVPR19) dataset is a collection of densely packed retail shelf images, designed to support research in [object detection](https://www.ultralytics.com/glossary/object-detection) tasks. Developed by Eran Goldman et al., the dataset contains over 110,000 unique store keeping unit (SKU) categories with densely packed objects, often looking similar or even identical, positioned in proximity.
<p align="center">
<br>
@ -107,7 +107,7 @@ We would like to acknowledge Eran Goldman et al. for creating and maintaining th
### What is the SKU-110k dataset and why is it important for object detection?
The SKU-110k dataset consists of densely packed retail shelf images designed to aid research in object detection tasks. Developed by Eran Goldman et al., it includes over 110,000 unique SKU categories. Its importance lies in its ability to challenge state-of-the-art object detectors with diverse object appearances and close proximity, making it an invaluable resource for researchers and practitioners in computer vision. Learn more about the dataset's structure and applications in our [SKU-110k Dataset](#sku-110k-dataset) section.
The SKU-110k dataset consists of densely packed retail shelf images designed to aid research in object detection tasks. Developed by Eran Goldman et al., it includes over 110,000 unique SKU categories. Its importance lies in its ability to challenge state-of-the-art object detectors with diverse object appearances and proximity, making it an invaluable resource for researchers and practitioners in computer vision. Learn more about the dataset's structure and applications in our [SKU-110k Dataset](#sku-110k-dataset) section.
### How do I train a YOLO11 model using the SKU-110k dataset?
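For illustration only (not part of this change), a minimal training sketch for this dataset with the standard Ultralytics API might look like the following; the `SKU-110K.yaml` dataset name and the hyperparameters are assumptions, not taken from this page.

```python
from ultralytics import YOLO

# Load a pretrained YOLO11 detection model (weights download on first use)
model = YOLO("yolo11n.pt")

# Train on the SKU-110k dataset; SKU-110K.yaml is assumed to be the dataset config shipped with Ultralytics
results = model.train(data="SKU-110K.yaml", epochs=100, imgsz=640)
```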

@ -43,9 +43,9 @@ Try `yolo explorer` powered by Explorer API
Simply `pip install ultralytics` and run `yolo explorer` in your terminal to run custom queries and semantic search on your datasets right inside your browser!
## Ultralytics Explorer support deprecated ⚠
!!! warning "Community Note ⚠"
As of **`ultralytics>=8.3.10`**, Ultralytics explorer support has been deprecated. But don't worry! You can now access similar and even enhanced functionality through [Ultralytics HUB](https://hub.ultralytics.com/), our intuitive no-code platform designed to streamline your workflow. With Ultralytics HUB, you can continue exploring, visualizing, and managing your data effortlessly, all without writing a single line of code. Make sure to check it out and take advantage of its powerful features!🚀
As of **`ultralytics>=8.3.10`**, Ultralytics explorer support has been deprecated. But don't worry! You can now access similar and even enhanced functionality through [Ultralytics HUB](https://hub.ultralytics.com/), our intuitive no-code platform designed to streamline your workflow. With Ultralytics HUB, you can continue exploring, visualizing, and managing your data effortlessly, all without writing a single line of code. Make sure to check it out and take advantage of its powerful features!🚀
## Setup

@ -628,7 +628,7 @@ TensorRT is highly recommended for deploying YOLO11 models on NVIDIA Jetson due
### How can I install PyTorch and Torchvision on NVIDIA Jetson?
To install PyTorch and Torchvision on NVIDIA Jetson, first uninstall any existing versions that may have been installed via pip. Then, manually install the compatible PyTorch and Torchvision versions for the Jetson's ARM64 architecture. Detailed instructions for this process are provided in the [Install PyTorch and Torchvision](#install-pytorch-and-torchvision) section.
To install PyTorch and Torchvision on NVIDIA Jetson, first uninstall any existing versions that may have been installed via pip. Then, manually install the compatible PyTorch and Torchvision versions for the Jetson's ARM64 architecture. Detailed instructions for this process are provided in the [Installation of PyTorch and Torchvision](#install-pytorch-and-torchvision) section.
### What are the best practices for maximizing performance on NVIDIA Jetson when using YOLO11?

@ -254,7 +254,7 @@ result.export_visuals(export_dir="demo_data/")
Image("demo_data/prediction_visual.png")
```
This command will save the visualized predictions to the specified directory and you can then load the image to view it in your notebook or application. For a detailed guide, check out the [Standard Inference section](#visualize-results).
This command will save the visualized predictions to the specified directory, and you can then load the image to view it in your notebook or application. For a detailed guide, check out the [Standard Inference section](#visualize-results).
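For context, a hedged end-to-end sketch of SAHI sliced inference feeding the visualization call above; the `model_type` string, image path, and slice sizes are assumptions rather than values from this page.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap a YOLO checkpoint for SAHI; the model_type string varies by SAHI version ("yolov8" here is an assumption)
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8", model_path="yolo11n.pt", confidence_threshold=0.4, device="cpu"
)

# Run sliced (tiled) inference over a placeholder image, then export visuals as shown above
result = get_sliced_prediction(
    "demo_data/image.jpg",
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
result.export_visuals(export_dir="demo_data/")
```

The exported image can then be displayed with the `Image("demo_data/prediction_visual.png")` call shown above.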
### What features does SAHI offer for improving YOLO11 object detection?

@ -34,7 +34,7 @@ Follow the [Train Model](./models.md#train-model) instructions from the [Models]
![Ultralytics HUB screenshot of the Model page with an arrow pointing to the Start Training card](https://github.com/ultralytics/docs/releases/download/0/hub-cloud-training-model-page-start-training.avif)
Most of the times, you will use the Epochs training. The number of epochs can be adjusted on this step (if the training didn't start yet) and represents the number of times your dataset needs to go through the cycle of train, label, and test. The exact pricing based on the number of epochs is hard to determine, reason why we only allow the [Account Balance](./pro.md#account-balance) payment method.
Most of the time, you will use the Epochs training. The number of epochs can be adjusted on this step (if the training didn't start yet) and represents the number of times your dataset needs to go through the cycle of train, label, and test. The exact pricing based on the number of epochs is hard to determine, reason why we only allow the [Account Balance](./pro.md#account-balance) payment method.
!!! note

@ -87,7 +87,7 @@ Next, let's take look a closer look at the specific augmentations that are appli
### Blur
The Blur transformation in Albumentations applies a simple blur effect to the image by averaging pixel values within a small square area, or kernel. This is done using OpenCV's `cv2.blur` function, which helps reduce noise in the image, though it also slightly reduces image details.
The Blur transformation in Albumentations applies a simple blur effect to the image by averaging pixel values within a small square area, or kernel. This is done using OpenCV `cv2.blur` function, which helps reduce noise in the image, though it also slightly reduces image details.
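As a rough illustration of the kernel averaging described above (a sketch, not the integration's exact parameters):

```python
import cv2
import numpy as np

image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # stand-in for a training image

# cv2.blur replaces each pixel with the mean of a small square neighborhood (here 3x3),
# which is the same averaging operation the Blur transform applies
blurred = cv2.blur(image, (3, 3))
```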
Here are the parameters and values used in this integration:
@ -117,7 +117,7 @@ The ToGray transformation in Albumentations converts an image to grayscale, redu
Here are the parameters and values used in this integration:
- **num_output_channels**: Sets the number of channels in the output image. If this value is more than 1, the single grayscale channel will be replicated to create a multi-channel grayscale image. By default, it's set to 3, giving a grayscale image with three identical channels.
- **num_output_channels**: Sets the number of channels in the output image. If this value is more than 1, the single grayscale channel will be replicated to create a multichannel grayscale image. By default, it's set to 3, giving a grayscale image with three identical channels.
- **method**: Defines the grayscale conversion method. The default method, "weighted_average", applies a formula (0.299R + 0.587G + 0.114B) that closely aligns with human perception, providing a natural-looking grayscale effect. Other options, like "from_lab", "desaturation", "average", "max", and "pca", offer alternative ways to create grayscale images based on various needs for speed, brightness emphasis, or detail preservation.
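A small sketch of the default "weighted_average" conversion and channel replication described in the bullets above (illustrative only):

```python
import numpy as np

rgb = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)  # stand-in RGB image

# Weighted-average grayscale: 0.299*R + 0.587*G + 0.114*B
gray = (rgb @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

# Replicate the single channel to num_output_channels=3 identical channels
gray_3ch = np.repeat(gray[..., None], 3, axis=-1)
```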
@ -135,7 +135,7 @@ Here are the parameters and values used in this integration:
- **clip_limit**: Controls the contrast enhancement range. Set to a default range of (1, 4), it determines the maximum contrast allowed in each tile. Higher values are used for more contrast but may also introduce noise.
- **tile_grid_size**: Defines the size of the grid of tiles, typically as (rows, columns). The default value is (8, 8), meaning the image is divided into an 8x8 grid. Smaller tile sizes provide more localized adjustments, while larger ones create effects closer to global equalization.
- **tile_grid_size**: Defines the size of the grid of tiles, typically as (rows, columns). The default value is (8, 8), meaning the image is divided into a 8x8 grid. Smaller tile sizes provide more localized adjustments, while larger ones create effects closer to global equalization.
- **p**: The probability of applying CLAHE. Here, p=0.01 introduces the enhancement effect only 1% of the time, ensuring that contrast adjustments are applied sparingly for occasional variation in training images.
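For reference, the equivalent direct OpenCV call with the defaults listed above looks roughly like this (a sketch; in the actual pipeline CLAHE is applied probabilistically with p=0.01):

```python
import cv2
import numpy as np

gray = np.random.randint(0, 255, (480, 640), dtype=np.uint8)  # CLAHE operates on single-channel images

# clipLimit mirrors the upper end of clip_limit=(1, 4); tileGridSize mirrors tile_grid_size=(8, 8)
clahe = cv2.createCLAHE(clipLimit=4.0, tileGridSize=(8, 8))
equalized = clahe.apply(gray)
```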

@ -114,7 +114,7 @@ For more details about supported export options, visit the [Ultralytics document
## Deploying Exported YOLO11 PaddlePaddle Models
After successfully exporting your Ultralytics YOLO11 models to PaddlePaddle format, you can now deploy them. The primary and recommended first step for running a PaddlePaddle model is to use the YOLO("./model_paddle_model") method, as outlined in the previous usage code snippet.
After successfully exporting your Ultralytics YOLO11 models to PaddlePaddle format, you can now deploy them. The primary and recommended first step for running a PaddlePaddle model is to use the YOLO("yolo11n_paddle_model/") method, as outlined in the previous usage code snippet.
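A minimal export-then-load sketch consistent with the snippet referenced above (the image path is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.export(format="paddle")  # writes a PaddlePaddle model directory, e.g. yolo11n_paddle_model/

paddle_model = YOLO("yolo11n_paddle_model/")  # load the exported PaddlePaddle model
results = paddle_model("path/to/image.jpg")  # run inference with the exported model
```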
However, for in-depth instructions on deploying your PaddlePaddle models in various other settings, take a look at the following resources:

@ -101,7 +101,7 @@ For more details about supported export options, visit the [Ultralytics document
## Deploying Exported YOLO11 TF SavedModel Models
Now that you have exported your YOLO11 model to the TF SavedModel format, the next step is to deploy it. The primary and recommended first step for running a TF GraphDef model is to use the YOLO("./yolo11n_saved_model") method, as previously shown in the usage code snippet.
Now that you have exported your YOLO11 model to the TF SavedModel format, the next step is to deploy it. The primary and recommended first step for running a TF GraphDef model is to use the YOLO("yolo11n_saved_model/") method, as previously shown in the usage code snippet.
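Similarly, a hedged sketch of exporting and reloading the TF SavedModel directory mentioned above (image path is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.export(format="saved_model")  # writes a TF SavedModel directory, e.g. yolo11n_saved_model/

tf_model = YOLO("yolo11n_saved_model/")  # load the exported SavedModel
results = tf_model("path/to/image.jpg")  # run inference
```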
However, for in-depth instructions on deploying your TF SavedModel models, take a look at the following resources:

@ -125,7 +125,7 @@ These are the current snippet categories available to the Ultralytics-snippets e
### Learning with Examples
The `ultra.examples` snippets are to useful for anyone looking to learn how to get started with the basics of working with Ultralytics YOLO. Example snippets are intended to run once inserted (some have dropdown options as well). An example of this is shown at the animation at the [top] of this page, where after the snippet is inserted, all code is selected and run interactively using <kbd>Shift ⇑</kbd>+<kbd>Enter ↵</kbd>.
The `ultra.examples` snippets are very useful for anyone looking to learn how to get started with the basics of working with Ultralytics YOLO. Example snippets are intended to run once inserted (some have dropdown options as well). An example of this is shown at the animation at the [top] of this page, where after the snippet is inserted, all code is selected and run interactively using <kbd>Shift ⇑</kbd>+<kbd>Enter ↵</kbd>.
!!! example
@ -168,7 +168,7 @@ However, since Ultralytics supports numerous [tasks], when [working with inferen
### Keywords Arguments
There are over 💯 keyword arguments for all of the various Ultralytics [tasks] and [modes]! That's a lot to remember and it can be easy to forget if the argument is `save_frame` or `save_frames` (it's definitely `save_frames` by the way). This is where the `ultra.kwargs` snippets can help out!
There are over 💯 keyword arguments for all the various Ultralytics [tasks] and [modes]! That's a lot to remember, and it can be easy to forget if the argument is `save_frame` or `save_frames` (it's definitely `save_frames` by the way). This is where the `ultra.kwargs` snippets can help out!
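For example, an illustrative sketch of the keyword argument in question (not a snippet from the extension; the video path is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# It's `save_frames` (plural): saves the individual predicted video frames in addition to results
model.predict(source="path/to/video.mp4", save=True, save_frames=True)
```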
!!! example
@ -229,7 +229,7 @@ If you use VS Code and have started to see a message prompting you to install th
1. Install Ultralytics-snippets and the message will no longer be shown 😆!
2. You can using `yolo settings vscode_msg False` to disable the message from showing without having to install the extension. You can learn more about the [Ultralytics Settings] on the [quickstart] page if you're unfamiliar.
2. You can be using `yolo settings vscode_msg False` to disable the message from showing without having to install the extension. You can learn more about the [Ultralytics Settings] on the [quickstart] page if you're unfamiliar.
### I have an idea for a new Ultralytics code snippet, how can I get one added?

@ -71,7 +71,7 @@ keywords: Ultralytics, metrics, model validation, performance analysis, IoU, con
<br><br><hr><br>
## ::: ultralytics.utils.metrics.smooth_BCE
## ::: ultralytics.utils.metrics.smooth_bce
<br><br><hr><br>

@ -102,7 +102,7 @@ Versioning your data separately from your code is generally a good idea and make
### Prepare Your Dataset
The YOLOv5 repository supports a number of different datasets by using YAML files containing their information. By default datasets are downloaded to the `../datasets` folder in relation to the repository root folder. So if you downloaded the `coco128` dataset using the link in the YAML or with the scripts provided by yolov5, you get this folder structure:
The YOLOv5 repository supports a number of different datasets by using YAML files containing their information. By default, datasets are downloaded to the `../datasets` folder in relation to the repository root folder. So if you downloaded the `coco128` dataset using the link in the YAML or with the scripts provided by yolov5, you get this folder structure:
```
..

@ -138,7 +138,7 @@ python train.py \
### Controlling the number of Prediction Images logged to Comet
When logging predictions from YOLOv5, Comet will log the images associated with each set of predictions. By default a maximum of 100 validation images are logged. You can increase or decrease this number using the `COMET_MAX_IMAGE_UPLOADS` environment variable.
When logging predictions from YOLOv5, Comet will log the images associated with each set of predictions. By default, a maximum of 100 validation images are logged. You can increase or decrease this number using the `COMET_MAX_IMAGE_UPLOADS` environment variable.
```shell
env COMET_MAX_IMAGE_UPLOADS=200 python train.py \

@ -18,7 +18,7 @@ We've put together a full guide for users looking to get the best results on the
- **Instances per class.** ≥ 10000 instances (labeled objects) per class recommended
- **Image variety.** Must be representative of deployed environment. For real-world use cases we recommend images from different times of day, different seasons, different weather, different lighting, different angles, different sources (scraped online, collected locally, different cameras) etc.
- **Label consistency.** All instances of all classes in all images must be labelled. Partial labelling will not work.
- **Label [accuracy](https://www.ultralytics.com/glossary/accuracy).** Labels must closely enclose each object. No space should exist between an object and it's [bounding box](https://www.ultralytics.com/glossary/bounding-box). No objects should be missing a label.
- **Label [accuracy](https://www.ultralytics.com/glossary/accuracy).** Labels must closely enclose each object. No space should exist between an object, and it's [bounding box](https://www.ultralytics.com/glossary/bounding-box). No objects should be missing a label.
- **Label verification.** View `train_batch*.jpg` on train start to verify your labels appear correct, i.e. see [example](./train_custom_data.md#local-logging) mosaic.
- **Background images.** Background images are images with no objects that are added to a dataset to reduce False Positives (FP). We recommend about 0-10% background images to help reduce FPs (COCO has 1000 background images for reference, 1% of the total). No labels are required for background images.

@ -87,13 +87,13 @@ cargo run --release -- --cuda --device_id 0 --model <MODEL> --source <SOURCE>
Set `--batch` to do multi-batch-size inference.
If you're using `--trt`, you can also set `--batch-min` and `--batch-max` to explicitly specify min/max/opt batch for dynamic batch input.(https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#explicit-shape-range-for-dynamic-shape-input).(Note that the ONNX model should exported with dynamic shapes)
If you're using `--trt`, you can also set `--batch-min` and `--batch-max` to explicitly specify min/max/opt batch for dynamic batch input.(https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#explicit-shape-range-for-dynamic-shape-input).(Note that the ONNX model should be exported with dynamic shapes.)
```bash
cargo run --release -- --cuda --batch 2 --model <MODEL> --source <SOURCE>
```
Set `--height` and `--width` to do dynamic image size inference. (Note that the ONNX model should exported with dynamic shapes)
Set `--height` and `--width` to do dynamic image size inference. (Note that the ONNX model should be exported with dynamic shapes.)
```bash
cargo run --release -- --cuda --width 480 --height 640 --model <MODEL> --source <SOURCE>

@ -80,7 +80,7 @@ Region counting is a computational method utilized to ascertain the quantity of
**2. Is Friendly Region Plotting Supported by the Region Counter?**
The Region Counter offers the capability to create regions in various formats, such as polygons and rectangles. You have the flexibility to modify region attributes, including coordinates, colors, and other details, as demonstrated in the following code:
The Region Counting offers the capability to create regions in various formats, such as polygons and rectangles. You have the flexibility to modify region attributes, including coordinates, colors, and other details, as demonstrated in the following code:
```python
from shapely.geometry import Polygon
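# Hedged illustration (not verbatim README code): one region entry carrying the attributes
# described above. The "polygon", "region_color" and "text_color" keys appear in the counter
# script's diff below; the remaining keys and values here are assumptions for this sketch.
counting_regions = [
    {
        "name": "Rectangle Region",
        "polygon": Polygon([(50, 80), (250, 80), (250, 180), (50, 180)]),  # 4-point rectangle
        "counts": 0,  # running count of objects inside the region
        "region_color": (37, 255, 225),  # BGR outline color
        "text_color": (0, 0, 0),  # BGR label color
    },
]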

@ -185,7 +185,7 @@ def run(
region_color = region["region_color"]
region_text_color = region["text_color"]
polygon_coords = np.array(region["polygon"].exterior.coords, dtype=np.int32)
polygon_coordinates = np.array(region["polygon"].exterior.coords, dtype=np.int32)
centroid_x, centroid_y = int(region["polygon"].centroid.x), int(region["polygon"].centroid.y)
text_size, _ = cv2.getTextSize(
@ -203,7 +203,7 @@ def run(
cv2.putText(
frame, region_label, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 0.7, region_text_color, line_thickness
)
cv2.polylines(frame, [polygon_coords], isClosed=True, color=region_color, thickness=region_thickness)
cv2.polylines(frame, [polygon_coordinates], isClosed=True, color=region_color, thickness=region_thickness)
if view_img:
if vid_frame_count == 1:

@ -642,7 +642,7 @@ class Mosaic(BaseMixTransform):
c = s - w, s + h0 - h, s, s + h0
padw, padh = c[:2]
x1, y1, x2, y2 = (max(x, 0) for x in c) # allocate coords
x1, y1, x2, y2 = (max(x, 0) for x in c) # allocate coordinates
img3[y1:y2, x1:x2] = img[y1 - padh :, x1 - padw :] # img3[ymin:ymax, xmin:xmax]
# hp, wp = h, w # height, width previous for next iteration
@ -771,7 +771,7 @@ class Mosaic(BaseMixTransform):
c = s - w, s + h0 - hp - h, s, s + h0 - hp
padw, padh = c[:2]
x1, y1, x2, y2 = (max(x, 0) for x in c) # allocate coords
x1, y1, x2, y2 = (max(x, 0) for x in c) # allocate coordinates
# Image
img9[y1:y2, x1:x2] = img[y1 - padh :, x1 - padw :] # img9[ymin:ymax, xmin:xmax]
@ -1283,7 +1283,7 @@ class RandomPerspective:
eps (float): Small epsilon value to prevent division by zero.
Returns:
(numpy.ndarray): Boolean array of shape (n,) indicating which boxes are candidates.
(numpy.ndarray): Boolean array of shape (n) indicating which boxes are candidates.
True values correspond to boxes that meet all criteria.
Examples:
@ -1320,7 +1320,7 @@ class RandomHSV:
>>> augmenter = RandomHSV(hgain=0.5, sgain=0.5, vgain=0.5)
>>> image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
>>> labels = {"img": image}
>>> augmented_labels = augmenter(labels)
>>> augmenter(labels)
>>> augmented_image = augmented_labels["img"]
"""
@ -1337,7 +1337,7 @@ class RandomHSV:
Examples:
>>> hsv_aug = RandomHSV(hgain=0.5, sgain=0.5, vgain=0.5)
>>> augmented_image = hsv_aug(image)
>>> hsv_aug(image)
"""
self.hgain = hgain
self.sgain = sgain
@ -1419,7 +1419,7 @@ class RandomFlip:
Examples:
>>> flip = RandomFlip(p=0.5, direction="horizontal")
>>> flip = RandomFlip(p=0.7, direction="vertical", flip_idx=[1, 0, 3, 2, 5, 4])
>>> flip_with_idx = RandomFlip(p=0.7, direction="vertical", flip_idx=[1, 0, 3, 2, 5, 4])
"""
assert direction in {"horizontal", "vertical"}, f"Support direction `horizontal` or `vertical`, got {direction}"
assert 0 <= p <= 1.0, f"The probability should be in range [0, 1], but got {p}."
@ -2022,7 +2022,7 @@ class Format:
Returns:
(Dict): A dictionary with formatted data, including:
- 'img': Formatted image tensor.
- 'cls': Class labels tensor.
- 'cls': Class label's tensor.
- 'bboxes': Bounding boxes tensor in the specified format.
- 'masks': Instance masks tensor (if return_mask is True).
- 'keypoints': Keypoints tensor (if return_keypoint is True).

@ -241,7 +241,7 @@ def convert_coco(
```python
from ultralytics.data.converter import convert_coco
convert_coco("../datasets/coco/annotations/", use_segments=True, use_keypoints=False, cls91to80=True)
convert_coco("../datasets/coco/annotations/", use_segments=True, use_keypoints=False, cls91to80=False)
convert_coco("../datasets/lvis/annotations/", use_segments=True, use_keypoints=False, cls91to80=False, lvis=True)
```

@ -67,7 +67,7 @@ def load_yolo_dota(data_root, split="train"):
Args:
data_root (str): Data root.
split (str): The split data set, could be train or val.
split (str): The split data set, could be `train` or `val`.
Notes:
The directory structure assumed for the DOTA dataset:

@ -76,7 +76,7 @@ def build_all_layer_point_grids(n_per_side: int, n_layers: int, scale_per_layer:
def generate_crop_boxes(
im_size: Tuple[int, ...], n_layers: int, overlap_ratio: float
) -> Tuple[List[List[int]], List[int]]:
"""Generates crop boxes of varying sizes for multi-scale image processing, with layered overlapping regions."""
"""Generates crop boxes of varying sizes for multiscale image processing, with layered overlapping regions."""
crop_boxes, layer_idxs = [], []
im_h, im_w = im_size
short_side = min(im_h, im_w)

@ -502,11 +502,11 @@ def do_pool(x: torch.Tensor, pool: nn.Module, norm: nn.Module = None) -> torch.T
class MultiScaleAttention(nn.Module):
"""
Implements multi-scale self-attention with optional query pooling for efficient feature extraction.
Implements multiscale self-attention with optional query pooling for efficient feature extraction.
This class provides a flexible implementation of multi-scale attention, allowing for optional
This class provides a flexible implementation of multiscale attention, allowing for optional
downsampling of query features through pooling. It's designed to enhance the model's ability to
capture multi-scale information in visual tasks.
capture multiscale information in visual tasks.
Attributes:
dim (int): Input dimension of the feature map.
@ -518,7 +518,7 @@ class MultiScaleAttention(nn.Module):
proj (nn.Linear): Output projection.
Methods:
forward: Applies multi-scale attention to the input tensor.
forward: Applies multiscale attention to the input tensor.
Examples:
>>> import torch
@ -537,7 +537,7 @@ class MultiScaleAttention(nn.Module):
num_heads: int,
q_pool: nn.Module = None,
):
"""Initializes multi-scale attention with optional query pooling for efficient feature extraction."""
"""Initializes multiscale attention with optional query pooling for efficient feature extraction."""
super().__init__()
self.dim = dim
@ -552,7 +552,7 @@ class MultiScaleAttention(nn.Module):
self.proj = nn.Linear(dim_out, dim_out)
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Applies multi-scale attention with optional query pooling to extract multi-scale features."""
"""Applies multiscale attention with optional query pooling to extract multiscale features."""
B, H, W, _ = x.shape
# qkv with shape (B, H * W, 3, nHead, C)
qkv = self.qkv(x).reshape(B, H * W, 3, self.num_heads, -1)
@ -582,9 +582,9 @@ class MultiScaleAttention(nn.Module):
class MultiScaleBlock(nn.Module):
"""
A multi-scale attention block with window partitioning and query pooling for efficient vision transformers.
A multiscale attention block with window partitioning and query pooling for efficient vision transformers.
This class implements a multi-scale attention mechanism with optional window partitioning and downsampling,
This class implements a multiscale attention mechanism with optional window partitioning and downsampling,
designed for use in vision transformer architectures.
Attributes:
@ -601,7 +601,7 @@ class MultiScaleBlock(nn.Module):
proj (nn.Linear | None): Projection layer for dimension mismatch.
Methods:
forward: Processes input tensor through the multi-scale block.
forward: Processes input tensor through the multiscale block.
Examples:
>>> block = MultiScaleBlock(dim=256, dim_out=512, num_heads=8, window_size=7)
@ -623,7 +623,7 @@ class MultiScaleBlock(nn.Module):
act_layer: nn.Module = nn.GELU,
window_size: int = 0,
):
"""Initializes a multi-scale attention block with window partitioning and optional query pooling."""
"""Initializes a multiscale attention block with window partitioning and optional query pooling."""
super().__init__()
if isinstance(norm_layer, str):
@ -660,7 +660,7 @@ class MultiScaleBlock(nn.Module):
self.proj = nn.Linear(dim, dim_out)
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Processes input through multi-scale attention and MLP, with optional windowing and downsampling."""
"""Processes input through multiscale attention and MLP, with optional windowing and downsampling."""
shortcut = x # B, H, W, C
x = self.norm1(x)
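Rounding out the Examples quoted in the docstrings above, a hedged usage sketch for the multiscale block; the input is channels-last `(B, H, W, C)` as in `forward`, and the output shape depends on pooling settings, so it is only printed here.

```python
import torch

from ultralytics.models.sam.modules.blocks import MultiScaleBlock

# Constructor arguments mirror the docstring example above
block = MultiScaleBlock(dim=256, dim_out=512, num_heads=8, window_size=7)

x = torch.randn(1, 28, 28, 256)  # (B, H, W, C), with H and W divisible by window_size
out = block(x)
print(out.shape)
```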

@ -425,7 +425,7 @@ class SAM2Model(torch.nn.Module):
low_res_masks: Tensor of shape (B, 1, H*4, W*4) with the best low-resolution mask.
high_res_masks: Tensor of shape (B, 1, H*16, W*16) with the best high-resolution mask.
obj_ptr: Tensor of shape (B, C) with object pointer vector for the output mask.
object_score_logits: Tensor of shape (B,) with object score logits.
object_score_logits: Tensor of shape (B) with object score logits.
Where M is 3 if multimask_output=True, and 1 if multimask_output=False.
@ -643,7 +643,7 @@ class SAM2Model(torch.nn.Module):
if not is_init_cond_frame:
# Retrieve the memories encoded with the maskmem backbone
to_cat_memory, to_cat_memory_pos_embed = [], []
# Add conditioning frames's output first (all cond frames have t_pos=0 for
# Add conditioning frame's output first (all cond frames have t_pos=0 for
# when getting temporal positional embedding below)
assert len(output_dict["cond_frame_outputs"]) > 0
# Select a maximum number of temporally closest cond frames for cross attention

@ -1096,7 +1096,7 @@ class SAM2VideoPredictor(SAM2Predictor):
# to `propagate_in_video_preflight`).
consolidated_frame_inds = self.inference_state["consolidated_frame_inds"]
for is_cond in {False, True}:
# Separately consolidate conditioning and non-conditioning temp outptus
# Separately consolidate conditioning and non-conditioning temp outputs
storage_key = "cond_frame_outputs" if is_cond else "non_cond_frame_outputs"
# Find all the frames that contain temporary outputs for any objects
# (these should be the frames that have just received clicks for mask inputs
@ -1161,36 +1161,35 @@ class SAM2VideoPredictor(SAM2Predictor):
assert predictor.dataset is not None
assert predictor.dataset.mode == "video"
inference_state = {}
inference_state["num_frames"] = predictor.dataset.frames
# inputs on each frame
inference_state["point_inputs_per_obj"] = {}
inference_state["mask_inputs_per_obj"] = {}
# values that don't change across frames (so we only need to hold one copy of them)
inference_state["constants"] = {}
# mapping between client-side object id and model-side object index
inference_state["obj_id_to_idx"] = OrderedDict()
inference_state["obj_idx_to_id"] = OrderedDict()
inference_state["obj_ids"] = []
# A storage to hold the model's tracking results and states on each frame
inference_state["output_dict"] = {
"cond_frame_outputs": {}, # dict containing {frame_idx: <out>}
"non_cond_frame_outputs": {}, # dict containing {frame_idx: <out>}
}
# Slice (view) of each object tracking results, sharing the same memory with "output_dict"
inference_state["output_dict_per_obj"] = {}
# A temporary storage to hold new outputs when user interact with a frame
# to add clicks or mask (it's merged into "output_dict" before propagation starts)
inference_state["temp_output_dict_per_obj"] = {}
# Frames that already holds consolidated outputs from click or mask inputs
# (we directly use their consolidated outputs during tracking)
inference_state["consolidated_frame_inds"] = {
"cond_frame_outputs": set(), # set containing frame indices
"non_cond_frame_outputs": set(), # set containing frame indices
inference_state = {
"num_frames": predictor.dataset.frames,
"point_inputs_per_obj": {}, # inputs points on each frame
"mask_inputs_per_obj": {}, # inputs mask on each frame
"constants": {}, # values that don't change across frames (so we only need to hold one copy of them)
# mapping between client-side object id and model-side object index
"obj_id_to_idx": OrderedDict(),
"obj_idx_to_id": OrderedDict(),
"obj_ids": [],
# A storage to hold the model's tracking results and states on each frame
"output_dict": {
"cond_frame_outputs": {}, # dict containing {frame_idx: <out>}
"non_cond_frame_outputs": {}, # dict containing {frame_idx: <out>}
},
# Slice (view) of each object tracking results, sharing the same memory with "output_dict"
"output_dict_per_obj": {},
# A temporary storage to hold new outputs when user interact with a frame
# to add clicks or mask (it's merged into "output_dict" before propagation starts)
"temp_output_dict_per_obj": {},
# Frames that already holds consolidated outputs from click or mask inputs
# (we directly use their consolidated outputs during tracking)
"consolidated_frame_inds": {
"cond_frame_outputs": set(), # set containing frame indices
"non_cond_frame_outputs": set(), # set containing frame indices
},
# metadata for each tracking frame (e.g. which direction it's tracked)
"tracking_has_started": False,
"frames_already_tracked": [],
}
# metadata for each tracking frame (e.g. which direction it's tracked)
inference_state["tracking_has_started"] = False
inference_state["frames_already_tracked"] = []
predictor.inference_state = inference_state
def get_im_features(self, im, batch=1):

@ -26,9 +26,9 @@ class GMC:
Methods:
__init__: Initializes a GMC object with the specified method and downscale factor.
apply: Applies the chosen method to a raw frame and optionally uses provided detections.
applyEcc: Applies the ECC algorithm to a raw frame.
applyFeatures: Applies feature-based methods like ORB or SIFT to a raw frame.
applySparseOptFlow: Applies the Sparse Optical Flow method to a raw frame.
apply_ecc: Applies the ECC algorithm to a raw frame.
apply_features: Applies feature-based methods like ORB or SIFT to a raw frame.
apply_sparseoptflow: Applies the Sparse Optical Flow method to a raw frame.
reset_params: Resets the internal parameters of the GMC object.
Examples:
@ -108,15 +108,15 @@ class GMC:
(480, 640, 3)
"""
if self.method in {"orb", "sift"}:
return self.applyFeatures(raw_frame, detections)
return self.apply_features(raw_frame, detections)
elif self.method == "ecc":
return self.applyEcc(raw_frame)
return self.apply_ecc(raw_frame)
elif self.method == "sparseOptFlow":
return self.applySparseOptFlow(raw_frame)
return self.apply_sparseoptflow(raw_frame)
else:
return np.eye(2, 3)
def applyEcc(self, raw_frame: np.array) -> np.array:
def apply_ecc(self, raw_frame: np.array) -> np.array:
"""
Apply the ECC (Enhanced Correlation Coefficient) algorithm to a raw frame for motion compensation.
@ -128,7 +128,7 @@ class GMC:
Examples:
>>> gmc = GMC(method="ecc")
>>> processed_frame = gmc.applyEcc(np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]))
>>> processed_frame = gmc.apply_ecc(np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]))
>>> print(processed_frame)
[[1. 0. 0.]
[0. 1. 0.]]
@ -161,7 +161,7 @@ class GMC:
return H
def applyFeatures(self, raw_frame: np.array, detections: list = None) -> np.array:
def apply_features(self, raw_frame: np.array, detections: list = None) -> np.array:
"""
Apply feature-based methods like ORB or SIFT to a raw frame.
@ -175,7 +175,7 @@ class GMC:
Examples:
>>> gmc = GMC(method="orb")
>>> raw_frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
>>> processed_frame = gmc.applyFeatures(raw_frame)
>>> processed_frame = gmc.apply_features(raw_frame)
>>> print(processed_frame.shape)
(2, 3)
"""
@ -304,7 +304,7 @@ class GMC:
return H
def applySparseOptFlow(self, raw_frame: np.array) -> np.array:
def apply_sparseoptflow(self, raw_frame: np.array) -> np.array:
"""
Apply Sparse Optical Flow method to a raw frame.
@ -316,7 +316,7 @@ class GMC:
Examples:
>>> gmc = GMC()
>>> result = gmc.applySparseOptFlow(np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]))
>>> result = gmc.apply_sparseoptflow(np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]))
>>> print(result)
[[1. 0. 0.]
[0. 1. 0.]]
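Putting the renamed methods together, a small hedged usage sketch (the constructor arguments are assumptions consistent with the docstring above):

```python
import numpy as np

from ultralytics.trackers.utils.gmc import GMC

# Estimate global (camera) motion compensation for a frame
gmc = GMC(method="sparseOptFlow", downscale=2)

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
warp = gmc.apply(frame)  # dispatches to apply_sparseoptflow() and returns a 2x3 warp matrix
print(warp.shape)  # (2, 3)
```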

@ -270,7 +270,7 @@ def batch_probiou(obb1, obb2, eps=1e-7):
return 1 - hd
def smooth_BCE(eps=0.1):
def smooth_bce(eps=0.1):
"""
Computes smoothed positive and negative Binary Cross-Entropy targets.
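For reference, label-smoothed BCE targets of this kind reduce to a tiny function; a sketch consistent with the docstring (the actual body is not shown in this diff):

```python
def smooth_bce(eps: float = 0.1):
    """Return (positive, negative) label-smoothing BCE targets for a given epsilon."""
    # Positive targets are pulled slightly below 1.0 and negative targets slightly above 0.0
    return 1.0 - 0.5 * eps, 0.5 * eps
```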
