Merge branch 'yolov9' into exp

exp-a
Laughing-q 11 months ago
commit 9ef317fbf3
  1. .github/workflows/ci.yaml (6 changes)
  2. docs/build_docs.py (1 change)
  3. docs/en/datasets/detect/roboflow-100.md (6 changes)
  4. docs/en/datasets/explorer/api.md (2 changes)
  5. docs/en/datasets/explorer/dashboard.md (2 changes)
  6. docs/en/datasets/index.md (2 changes)
  7. docs/en/datasets/obb/dota-v2.md (2 changes)
  8. docs/en/guides/conda-quickstart.md (2 changes)
  9. docs/en/guides/coral-edge-tpu-on-raspberry-pi.md (22 changes)
  10. docs/en/guides/distance-calculation.md (11 changes)
  11. docs/en/guides/hyperparameter-tuning.md (2 changes)
  12. docs/en/guides/instance-segmentation-and-tracking.md (13 changes)
  13. docs/en/guides/isolating-segmentation-objects.md (42 changes)
  14. docs/en/guides/model-deployment-options.md (6 changes)
  15. docs/en/guides/object-counting.md (4 changes)
  16. docs/en/guides/object-cropping.md (2 changes)
  17. docs/en/guides/speed-estimation.md (11 changes)
  18. docs/en/guides/vision-eye.md (65 changes)
  19. docs/en/guides/workouts-monitoring.md (4 changes)
  20. docs/en/guides/yolo-performance-metrics.md (2 changes)
  21. docs/en/hub/app/android.md (11 changes)
  22. docs/en/hub/app/ios.md (11 changes)
  23. docs/en/hub/cloud-training.md (13 changes)
  24. docs/en/integrations/amazon-sagemaker.md (14 changes)
  25. docs/en/integrations/coreml.md (3 changes)
  26. docs/en/integrations/index.md (10 changes)
  27. docs/en/integrations/ncnn.md (120 changes)
  28. docs/en/integrations/neural-magic.md (2 changes)
  29. docs/en/integrations/tensorboard.md (4 changes)
  30. docs/en/integrations/tensorrt.md (6 changes)
  31. docs/en/integrations/tflite.md (122 changes)
  32. docs/en/integrations/torchscript.md (126 changes)
  33. docs/en/models/rtdetr.md (2 changes)
  34. docs/en/models/yolo-world.md (30 changes)
  35. docs/en/models/yolov9.md (71 changes)
  36. docs/en/modes/benchmark.md (2 changes)
  37. docs/en/modes/export.md (2 changes)
  38. docs/en/modes/predict.md (2 changes)
  39. docs/en/modes/val.md (2 changes)
  40. docs/en/reference/models/yolo/model.md (4 changes)
  41. docs/en/reference/nn/modules/block.md (52 changes)
  42. docs/en/reference/nn/modules/head.md (4 changes)
  43. docs/en/reference/nn/tasks.md (4 changes)
  44. docs/en/reference/utils/metrics.md (4 changes)
  45. docs/en/tasks/classify.md (2 changes)
  46. docs/en/tasks/detect.md (2 changes)
  47. docs/en/tasks/obb.md (4 changes)
  48. docs/en/tasks/pose.md (2 changes)
  49. docs/en/tasks/segment.md (2 changes)
  50. docs/en/usage/cfg.md (2 changes)
  51. docs/en/usage/cli.md (2 changes)
  52. docs/en/usage/python.md (2 changes)
  53. docs/en/usage/simple-utilities.md (6 changes)
  54. examples/tutorial.ipynb (2 changes)
  55. mkdocs.yml (7 changes)
  56. ultralytics/__init__.py (2 changes)
  57. ultralytics/cfg/models/v8/yolov8-world.yaml (2 changes)
  58. ultralytics/cfg/models/v8/yolov8-worldv2.yaml (10 changes)
  59. ultralytics/cfg/models/v9/yolov9c.yaml (0 changes)
  60. ultralytics/cfg/models/v9/yolov9e.yaml (0 changes)
  61. ultralytics/data/explorer/gui/dash.py (6 changes)
  62. ultralytics/engine/exporter.py (72 changes)
  63. ultralytics/engine/model.py (14 changes)
  64. ultralytics/engine/predictor.py (2 changes)
  65. ultralytics/engine/trainer.py (2 changes)
  66. ultralytics/engine/tuner.py (1 change)
  67. ultralytics/engine/validator.py (2 changes)
  68. ultralytics/hub/__init__.py (2 changes)
  69. ultralytics/hub/auth.py (2 changes)
  70. ultralytics/hub/session.py (8 changes)
  71. ultralytics/models/rtdetr/model.py (4 changes)
  72. ultralytics/models/yolo/detect/val.py (2 changes)
  73. ultralytics/models/yolo/model.py (8 changes)
  74. ultralytics/nn/autobackend.py (10 changes)
  75. ultralytics/nn/modules/block.py (23 changes)
  76. ultralytics/nn/tasks.py (4 changes)
  77. ultralytics/trackers/byte_tracker.py (6 changes)
  78. ultralytics/utils/benchmarks.py (38 changes)
  79. ultralytics/utils/callbacks/hub.py (5 changes)
  80. ultralytics/utils/downloads.py (4 changes)
  81. ultralytics/utils/loss.py (4 changes)
  82. ultralytics/utils/metrics.py (76 changes)
  83. ultralytics/utils/ops.py (20 changes)
  84. ultralytics/utils/patches.py (4 changes)
  85. ultralytics/utils/plotting.py (4 changes)
  86. ultralytics/utils/tal.py (6 changes)
  87. ultralytics/utils/torch_utils.py (2 changes)

@ -118,9 +118,9 @@ jobs:
run: |
yolo checks
pip list
# - name: Benchmark DetectionModel
# shell: bash
# run: coverage run -a --source=ultralytics -m ultralytics.cfg.__init__ benchmark model='path with spaces/${{ matrix.model }}.pt' imgsz=160 verbose=0.318
- name: Benchmark World DetectionModel
shell: bash
run: coverage run -a --source=ultralytics -m ultralytics.cfg.__init__ benchmark model='path with spaces/yolov8s-worldv2.pt' imgsz=160 verbose=0.318
- name: Benchmark SegmentationModel
shell: bash
run: coverage run -a --source=ultralytics -m ultralytics.cfg.__init__ benchmark model='path with spaces/${{ matrix.model }}-seg.pt' imgsz=160 verbose=0.281

@ -23,6 +23,7 @@ Usage:
Note:
- This script is built to be run in an environment where Python and MkDocs are installed and properly configured.
"""
import os
import re
import shutil

@ -14,7 +14,7 @@ Roboflow 100, developed by [Roboflow](https://roboflow.com/?ref=ultralytics) and
## Key Features
- Includes 100 datasets across seven domains: Aerial, Videogames, Microscopic, Underwater, Documents, Electromagnetic, and Real World.
- Includes 100 datasets across seven domains: Aerial, Video games, Microscopic, Underwater, Documents, Electromagnetic, and Real World.
- The benchmark comprises 224,714 images across 805 classes, thanks to over 11,170 hours of labeling efforts.
- All images are resized to 640x640 pixels, with a focus on eliminating class ambiguity and filtering out underrepresented classes.
- Annotations include bounding boxes for objects, making it suitable for [training](../../modes/train.md) and evaluating object detection models.
@ -24,7 +24,7 @@ Roboflow 100, developed by [Roboflow](https://roboflow.com/?ref=ultralytics) and
The Roboflow 100 dataset is organized into seven categories, each with a distinct set of datasets, images, and classes:
- **Aerial**: Consists of 7 datasets with a total of 9,683 images, covering 24 distinct classes.
- **Videogames**: Includes 7 datasets, featuring 11,579 images across 88 classes.
- **Video Games**: Includes 7 datasets, featuring 11,579 images across 88 classes.
- **Microscopic**: Comprises 11 datasets with 13,378 images, spanning 28 classes.
- **Underwater**: Contains 5 datasets, encompassing 18,003 images in 39 classes.
- **Documents**: Consists of 8 datasets with 24,813 images, divided into 90 classes.
@ -45,7 +45,7 @@ For more ideas and inspiration on real-world applications, be sure to check out
## Usage
The Roboflow 100 dataset is available on both [GitHub](https://github.com/roboflow/roboflow-100-benchmark) and [Roboflow Universe](https://universe.roboflow.com/roboflow-100).
You can access it directly from the Roboflow 100 GitHub repository. In addition, on Roboflow Universe, you have the flexibility to download individual datasets by simply clicking the export button within each dataset.

@ -227,7 +227,7 @@ Here are some examples of what you can do with the table:
print(embeddings)
```
### Advanced Querying with pre and post filters
### Advanced Querying with pre- and post-filters
!!! Example

@ -39,7 +39,7 @@ pip install ultralytics[explorer]
Semantic search is a technique for finding images similar to a given image. It is based on the idea that similar images will have similar embeddings. In the UI, you can select one or more images and search for images similar to them. This can be useful when you want to find images similar to a given image or a set of images that don't perform as expected.
For example:
In this VOC Exploration dashboard, user selects a couple aeroplane images like this:
In this VOC Exploration dashboard, the user selects a couple of airplane images like this:
<p>
<img width="1710" alt="Explorer Dashboard Screenshot 2" src="https://github.com/RizwanMunawar/RizwanMunawar/assets/62513924/3becdc1d-45dc-43b7-88ff-84ff0b443894">
</p>

@ -34,7 +34,7 @@ Bounding box object detection is a computer vision technique that involves detec
- [VOC](detect/voc.md): The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images.
- [xView](detect/xview.md): A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects.
- [Roboflow 100](detect/roboflow-100.md): A diverse object detection benchmark with 100 datasets spanning seven imagery domains for comprehensive model evaluation.
## [Instance Segmentation Datasets](segment/index.md)
Instance segmentation is a computer vision technique that involves identifying and localizing objects in an image at the pixel level.

@ -68,7 +68,7 @@ Typically, datasets incorporate a YAML (Yet Another Markup Language) file detail
## Split DOTA images
To train DOTA dataset, We split original DOTA images with high-resolution into images with 1024x1024 resolution in multi-scale way.
To train the DOTA dataset, we split the original high-resolution DOTA images into 1024x1024 images in a multiscale way.
!!! Example "Split images"

@ -72,7 +72,7 @@ from ultralytics import YOLO
model = YOLO('yolov8n.pt') # initialize model
results = model('path/to/image.jpg') # perform inference
results.show() # display results
results[0].show() # display results for the first image
```
---

@ -12,7 +12,7 @@ keywords: Ultralytics, YOLOv8, Object Detection, Coral, Edge TPU, Raspberry Pi,
## What is a Coral Edge TPU?
The Coral Edge TPU is a compact device that adds an Edge TPU coprocessor to your system. It enables low-power, high-performance ML inferencing for TensorFlow Lite models. Read more at the [Coral Edge TPU home page](https://coral.ai/products/accelerator).
The Coral Edge TPU is a compact device that adds an Edge TPU coprocessor to your system. It enables low-power, high-performance ML inference for TensorFlow Lite models. Read more at the [Coral Edge TPU home page](https://coral.ai/products/accelerator).
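As a quick point of reference, Ultralytics can export a YOLOv8 model to an Edge TPU compatible file with a single call; the snippet below is a minimal sketch assuming the documented `format="edgetpu"` option:

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 model
model = YOLO("yolov8n.pt")

# Export for the Edge TPU; this produces a fully integer-quantized *_edgetpu.tflite file
# (the export typically needs to run on a Linux host with the Edge TPU compiler available)
model.export(format="edgetpu")
```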
## Boost Raspberry Pi Model Performance with Coral Edge TPU
@ -37,16 +37,16 @@ This guide assumes that you already have a working Raspberry Pi OS install and h
First, we need to install the Edge TPU runtime. There are many different versions available, so you need to choose the right version for your operating system.
| Raspberry Pi OS | High frequency mode | Version to download |
|-----------------|:-------------------:|------------------------------------------|
| Bullseye 32bit | No | libedgetpu1-std_ ... .bullseye_armhf.deb |
| Bullseye 64bit | No | libedgetpu1-std_ ... .bullseye_arm64.deb |
| Bullseye 32bit | Yes | libedgetpu1-max_ ... .bullseye_armhf.deb |
| Bullseye 64bit | Yes | libedgetpu1-max_ ... .bullseye_arm64.deb |
| Bookworm 32bit | No | libedgetpu1-std_ ... .bookworm_armhf.deb |
| Bookworm 64bit | No | libedgetpu1-std_ ... .bookworm_arm64.deb |
| Bookworm 32bit | Yes | libedgetpu1-max_ ... .bookworm_armhf.deb |
| Bookworm 64bit | Yes | libedgetpu1-max_ ... .bookworm_arm64.deb |
| Raspberry Pi OS | High frequency mode | Version to download |
|-----------------|:-------------------:|--------------------------------------------|
| Bullseye 32bit | No | `libedgetpu1-std_ ... .bullseye_armhf.deb` |
| Bullseye 64bit | No | `libedgetpu1-std_ ... .bullseye_arm64.deb` |
| Bullseye 32bit | Yes | `libedgetpu1-max_ ... .bullseye_armhf.deb` |
| Bullseye 64bit | Yes | `libedgetpu1-max_ ... .bullseye_arm64.deb` |
| Bookworm 32bit | No | `libedgetpu1-std_ ... .bookworm_armhf.deb` |
| Bookworm 64bit | No | `libedgetpu1-std_ ... .bookworm_arm64.deb` |
| Bookworm 32bit | Yes | `libedgetpu1-max_ ... .bookworm_armhf.deb` |
| Bookworm 64bit | Yes | `libedgetpu1-max_ ... .bookworm_arm64.deb` |
[Download the latest version from here](https://github.com/feranick/libedgetpu/releases).

@ -10,6 +10,17 @@ keywords: Ultralytics, YOLOv8, Object Detection, Distance Calculation, Object Tr
Measuring the gap between two objects is known as distance calculation within a specified space. In the case of [Ultralytics YOLOv8](https://github.com/ultralytics/ultralytics), the bounding box centroid is employed to calculate the distance between the bounding boxes highlighted by the user.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/LE8am1QoVn4"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Distance Calculation using Ultralytics YOLOv8
</p>
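As a rough illustration of the idea (not the library's implementation), the distance between two boxes can be derived from their centroids and a scene-specific pixels-per-meter factor; all names and values below are hypothetical:

```python
import math


def centroid(box):
    """Return the (x, y) center of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2


def centroid_distance(box_a, box_b, pixels_per_meter=10):
    """Approximate real-world distance between two boxes from their centroids."""
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    return math.hypot(ax - bx, ay - by) / pixels_per_meter


print(centroid_distance((100, 120, 200, 260), (400, 300, 520, 460)))  # approximate distance in meters
```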
## Visuals
| Distance Calculation using Ultralytics YOLOv8 |

@ -23,7 +23,7 @@ Hyperparameters are high-level, structural settings for the algorithm. They are
<img width="640" src="https://user-images.githubusercontent.com/26833433/263858934-4f109a2f-82d9-4d08-8bd6-6fd1ff520bcd.png" alt="Hyperparameter Tuning Visual">
</p>
For a full list of augmentation hyperparameters used in YOLOv8 please refer to the [configurations page](../usage/cfg.md#augmentation).
For a full list of augmentation hyperparameters used in YOLOv8 please refer to the [configurations page](../usage/cfg.md#augmentation-settings).
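Before getting into genetic evolution, note that a tuning run is started from the Python API with `model.tune()`; the sketch below follows the documented Tuner arguments, with the dataset and iteration counts as placeholders:

```python
from ultralytics import YOLO

# Initialize a model
model = YOLO("yolov8n.pt")

# Run a tuning session; each iteration mutates hyperparameters and trains a fresh model
model.tune(data="coco8.yaml", epochs=30, iterations=300, optimizer="AdamW", plots=False, save=False, val=False)
```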
### Genetic Evolution and Mutation

@ -1,7 +1,7 @@
---
comments: true
description: Instance Segmentation with Object Tracking using Ultralytics YOLOv8
keywords: Ultralytics, YOLOv8, Instance Segmentation, Object Detection, Object Tracking, Segbbox, Computer Vision, Notebook, IPython Kernel, CLI, Python SDK
keywords: Ultralytics, YOLOv8, Instance Segmentation, Object Detection, Object Tracking, Bounding Box, Computer Vision, Notebook, IPython Kernel, CLI, Python SDK
---
# Instance Segmentation and Tracking using Ultralytics YOLOv8 🚀
@ -16,6 +16,17 @@ There are two types of instance segmentation tracking available in the Ultralyti
- **Instance Segmentation with Object Tracks:** Every track is represented by a distinct color, facilitating easy identification and tracking.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/75G_S1Ngji8"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Instance Segmentation with Object Tracking using Ultralytics YOLOv8
</p>
## Samples
| Instance Segmentation | Instance Segmentation + Object Tracking |

@ -14,12 +14,12 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
## Recipe Walk Through
1. Begin with the necessary imports
```py
```python
from pathlib import Path
import cv2 as cv
import cv2
import numpy as np
from ultralytics import YOLO
```
@ -30,19 +30,19 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
***
2. Load a model and run `predict()` method on a source.
```py
```python
from ultralytics import YOLO
# Load a model
model = YOLO('yolov8n-seg.pt')
# Run inference
result = model.predict()
results = model.predict()
```
??? question "No Prediction Arguments?"
!!! question "No Prediction Arguments?"
Without specifying a source, the example images from the library will be used:
@ -57,7 +57,7 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
***
3. Now iterate over the results and the contours. For workflows that want to save an image to file, the source image `base-name` and the detection `class-label` are retrieved for later use (optional).
```{ .py .annotate }
# (2) Iterate detection results (helpful for multiple images)
@ -81,7 +81,7 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
***
4. Start with generating a binary mask from the source image and then draw a filled contour onto the mask. This will allow the object to be isolated from the other parts of the image. An example from `bus.jpg` for one of the detected `person` class objects is shown on the right.
![Binary Mask Image](https://github.com/ultralytics/ultralytics/assets/62214284/59bce684-fdda-4b17-8104-0b4b51149aca){ width="240", align="right" }
@ -98,11 +98,11 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
# Draw contour onto mask
_ = cv.drawContours(b_mask,
_ = cv2.drawContours(b_mask,
[contour],
-1,
(255, 255, 255),
cv.FILLED)
cv2.FILLED)
```
@ -136,7 +136,7 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
- The `tuple` `(255, 255, 255)` represents the color white, which is the desired color for drawing the contour in this binary mask.
- The addition of `cv.FILLED` will color all pixels enclosed by the contour boundary the same, in this case, all enclosed pixels will be white.
- The addition of `cv2.FILLED` will color all pixels enclosed by the contour boundary the same, in this case, all enclosed pixels will be white.
- See [OpenCV Documentation on `drawContours()`](https://docs.opencv.org/4.8.0/d6/d6e/group__imgproc__draw.html#ga746c0625f1781f1ffc9056259103edbc) for more information.
@ -145,7 +145,7 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
***
5. Next, there are 2 options for how to move forward with the image from this point, and a subsequent option for each.
### Object Isolation Options
@ -155,10 +155,10 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
```py
# Create 3-channel mask
mask3ch = cv.cvtColor(b_mask, cv.COLOR_GRAY2BGR)
mask3ch = cv2.cvtColor(b_mask, cv2.COLOR_GRAY2BGR)
# Isolate object with binary mask
isolated = cv.bitwise_and(mask3ch, img)
isolated = cv2.bitwise_and(mask3ch, img)
```
@ -258,7 +258,7 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
***
6. <u>What to do next is entirely left to you as the developer.</u> A basic example of one possible next step (saving the image to file for future use) is shown.
- **NOTE:** this step is optional and can be skipped if not required for your specific use case.
@ -266,7 +266,7 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
```py
# Save isolated object to file
_ = cv.imwrite(f'{img_name}_{label}-{ci}.png', iso_crop)
_ = cv2.imwrite(f'{img_name}_{label}-{ci}.png', iso_crop)
```
- In this example, the `img_name` is the base-name of the source image file, `label` is the detected class-name, and `ci` is the index of the object detection (in case of multiple instances with the same class name).
@ -278,7 +278,7 @@ Here, all steps from the previous section are combined into a single block of co
```{ .py .annotate }
from pathlib import Path
import cv2 as cv
import cv2
import numpy as np
from ultralytics import YOLO
@ -298,13 +298,13 @@ for r in res:
# Create contour mask (1)
contour = c.masks.xy.pop().astype(np.int32).reshape(-1, 1, 2)
_ = cv.drawContours(b_mask, [contour], -1, (255, 255, 255), cv.FILLED)
_ = cv2.drawContours(b_mask, [contour], -1, (255, 255, 255), cv2.FILLED)
# Choose one:
# OPTION-1: Isolate object with black background
mask3ch = cv.cvtColor(b_mask, cv.COLOR_GRAY2BGR)
isolated = cv.bitwise_and(mask3ch, img)
mask3ch = cv2.cvtColor(b_mask, cv2.COLOR_GRAY2BGR)
isolated = cv2.bitwise_and(mask3ch, img)
# OPTION-2: Isolate object with transparent background (when saved as PNG)
isolated = np.dstack([img, b_mask])

@ -240,9 +240,9 @@ PaddlePaddle is an open-source deep learning framework developed by Baidu. It is
- **Hardware Acceleration**: Supports various hardware accelerations, including Baidu's own Kunlun chips.
#### ncnn
#### NCNN
ncnn is a high-performance neural network inference framework optimized for the mobile platform. It stands out for its lightweight nature and efficiency, making it particularly well-suited for mobile and embedded devices where resources are limited.
NCNN is a high-performance neural network inference framework optimized for the mobile platform. It stands out for its lightweight nature and efficiency, making it particularly well-suited for mobile and embedded devices where resources are limited.
- **Performance Benchmarks**: Highly optimized for mobile platforms, offering efficient inference on ARM-based devices.
@ -276,7 +276,7 @@ The following table provides a snapshot of the various deployment options availa
| TF Edge TPU | Optimized for Google's Edge TPU hardware | Exclusive to Edge TPU devices | Growing with Google and third-party resources | IoT devices requiring real-time processing | Improvements for new Edge TPU hardware | Google's robust IoT security | Custom-designed for Google Coral |
| TF.js | Reasonable in-browser performance | High with web technologies | Web and Node.js developers support | Interactive web applications | TensorFlow team and community contributions | Web platform security model | Enhanced with WebGL and other APIs |
| PaddlePaddle | Competitive, easy to use and scalable | Baidu ecosystem, wide application support | Rapidly growing, especially in China | Chinese market and language processing | Focus on Chinese AI applications | Emphasizes data privacy and security | Including Baidu's Kunlun chips |
| ncnn | Optimized for mobile ARM-based devices | Mobile and embedded ARM systems | Niche but active mobile/embedded ML community | Android and ARM systems efficiency | High performance maintenance on ARM | On-device security advantages | ARM CPUs and GPUs optimizations |
| NCNN | Optimized for mobile ARM-based devices | Mobile and embedded ARM systems | Niche but active mobile/embedded ML community | Android and ARM systems efficiency | High performance maintenance on ARM | On-device security advantages | ARM CPUs and GPUs optimizations |
This comparative analysis gives you a high-level overview. For deployment, it's essential to consider the specific requirements and constraints of your project, and consult the detailed documentation and resources available for each option.

@ -175,8 +175,8 @@ Object counting with [Ultralytics YOLOv8](https://github.com/ultralytics/ultraly
| Name | Type | Default | Description |
|-----------------------|-------------|----------------------------|-----------------------------------------------|
| `view_img` | `bool` | `False` | Display frames with counts |
| `view_in_counts` | `bool` | `True` | Display incounts only on video frame |
| `view_out_counts` | `bool` | `True` | Display outcounts only on video frame |
| `view_in_counts` | `bool` | `True` | Display in-counts only on video frame |
| `view_out_counts` | `bool` | `True` | Display out-counts only on video frame |
| `line_thickness` | `int` | `2` | Increase bounding boxes thickness |
| `reg_pts` | `list` | `[(20, 400), (1260, 400)]` | Points defining the Region Area |
| `classes_names` | `dict` | `model.model.names` | Dictionary of Class Names |

@ -16,7 +16,6 @@ Object cropping with [Ultralytics YOLOv8](https://github.com/ultralytics/ultraly
- **Reduced Data Volume**: By extracting only relevant objects, object cropping helps in minimizing data size, making it efficient for storage, transmission, or subsequent computational tasks.
- **Enhanced Precision**: YOLOv8's object detection accuracy ensures that the cropped objects maintain their spatial relationships, preserving the integrity of the visual information for detailed analysis.
## Visuals
| Airport Luggage |
@ -24,7 +23,6 @@ Object cropping with [Ultralytics YOLOv8](https://github.com/ultralytics/ultraly
| ![Conveyor Belt at Airport Suitcases Cropping using Ultralytics YOLOv8](https://github.com/RizwanMunawar/RizwanMunawar/assets/62513924/648f46be-f233-4307-a8e5-046eea38d2e4) |
| Suitcases Cropping at airport conveyor belt using Ultralytics YOLOv8 |
!!! Example "Object Cropping using YOLOv8 Example"
=== "Object Cropping"

@ -10,6 +10,17 @@ keywords: Ultralytics, YOLOv8, Object Detection, Speed Estimation, Object Tracki
Speed estimation is the process of calculating the rate of movement of an object within a given context, often employed in computer vision applications. Using [Ultralytics YOLOv8](https://github.com/ultralytics/ultralytics/) you can now calculate the speed of objects using [object tracking](https://docs.ultralytics.com/modes/track/) alongside distance and time data, which is crucial for tasks like traffic monitoring and surveillance. The accuracy of speed estimation directly influences the efficiency and reliability of various applications, making it a key component in the advancement of intelligent systems and real-time decision-making processes.
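Conceptually, the estimate reduces to displacement over time; the sketch below shows that relationship with hypothetical calibration values and is not the library's implementation:

```python
# Speed from pixel displacement between two consecutive frames, given a known scale and frame rate
pixels_moved = 4        # centroid displacement between consecutive detections (hypothetical)
pixels_per_meter = 10   # scene-specific calibration factor (hypothetical)
fps = 30                # video frame rate

meters_per_frame = pixels_moved / pixels_per_meter
speed_kmh = meters_per_frame * fps * 3.6
print(f"Estimated speed: {speed_kmh:.1f} km/h")  # 43.2 km/h with the values above
```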
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/rCggzXRRSRo"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Speed Estimation using Ultralytics YOLOv8
</p>
## Advantages of Speed Estimation?
- **Efficient Traffic Control:** Accurate speed estimation aids in managing traffic flow, enhancing safety, and reducing congestion on roadways.

@ -12,10 +12,10 @@ keywords: Ultralytics, YOLOv8, Object Detection, Object Tracking, IDetection, Vi
## Samples
| VisionEye View | VisionEye View With Object Tracking |
|:------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| ![VisionEye View Object Mapping using Ultralytics YOLOv8](https://github.com/RizwanMunawar/ultralytics/assets/62513924/7d593acc-2e37-41b0-ad0e-92b4ffae6647) | ![VisionEye View Object Mapping with Object Tracking using Ultralytics YOLOv8](https://github.com/RizwanMunawar/ultralytics/assets/62513924/fcd85952-390f-451e-8fb0-b82e943af89c) |
| VisionEye View Object Mapping using Ultralytics YOLOv8 | VisionEye View Object Mapping with Object Tracking using Ultralytics YOLOv8 |
| VisionEye View | VisionEye View With Object Tracking | VisionEye View With Distance Calculation |
|:------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| ![VisionEye View Object Mapping using Ultralytics YOLOv8](https://github.com/RizwanMunawar/ultralytics/assets/62513924/7d593acc-2e37-41b0-ad0e-92b4ffae6647) | ![VisionEye View Object Mapping with Object Tracking using Ultralytics YOLOv8](https://github.com/RizwanMunawar/ultralytics/assets/62513924/fcd85952-390f-451e-8fb0-b82e943af89c) | ![VisionEye View with Distance Calculation using Ultralytics YOLOv8](https://github.com/RizwanMunawar/RizwanMunawar/assets/62513924/18c4dafe-a22e-4fa9-a7d4-2bb293562a95) |
| VisionEye View Object Mapping using Ultralytics YOLOv8 | VisionEye View Object Mapping with Object Tracking using Ultralytics YOLOv8 | VisionEye View with Distance Calculation using Ultralytics YOLOv8 |
!!! Example "VisionEye Object Mapping using YOLOv8"
@ -105,6 +105,63 @@ keywords: Ultralytics, YOLOv8, Object Detection, Object Tracking, IDetection, Vi
cap.release()
cv2.destroyAllWindows()
```
=== "VisionEye with Distance Calculation"
```python
import cv2
import math
from ultralytics import YOLO
from ultralytics.utils.plotting import Annotator, colors
model = YOLO("yolov8s.pt")
cap = cv2.VideoCapture("Path/to/video/file.mp4")
w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))
out = cv2.VideoWriter('visioneye-distance-calculation.avi', cv2.VideoWriter_fourcc(*'MJPG'), fps, (w, h))
center_point = (0, h)
pixel_per_meter = 10
txt_color, txt_background, bbox_clr = ((0, 0, 0), (255, 255, 255), (255, 0, 255))
while True:
ret, im0 = cap.read()
if not ret:
print("Video frame is empty or video processing has been successfully completed.")
break
annotator = Annotator(im0, line_width=2)
results = model.track(im0, persist=True)
boxes = results[0].boxes.xyxy.cpu()
if results[0].boxes.id is not None:
track_ids = results[0].boxes.id.int().cpu().tolist()
for box, track_id in zip(boxes, track_ids):
annotator.box_label(box, label=str(track_id), color=bbox_clr)
annotator.visioneye(box, center_point)
x1, y1 = int((box[0] + box[2]) // 2), int((box[1] + box[3]) // 2) # Bounding box centroid
distance = math.sqrt((x1 - center_point[0]) ** 2 + (y1 - center_point[1]) ** 2) / pixel_per_meter
text_size, _ = cv2.getTextSize(f"Distance: {distance:.2f} m", cv2.FONT_HERSHEY_SIMPLEX, 1.2, 3)
cv2.rectangle(im0, (x1, y1 - text_size[1] - 10), (x1 + text_size[0] + 10, y1), txt_background, -1)
cv2.putText(im0, f"Distance: {distance:.2f} m", (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 1.2, txt_color, 3)
out.write(im0)
cv2.imshow("visioneye-distance-calculation", im0)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
out.release()
cap.release()
cv2.destroyAllWindows()
```
### `visioneye` Arguments

@ -13,7 +13,7 @@ Monitoring workouts through pose estimation with [Ultralytics YOLOv8](https://gi
- **Optimized Performance:** Tailoring workouts based on monitoring data for better results.
- **Goal Achievement:** Track and adjust fitness goals for measurable progress.
- **Personalization:** Customized workout plans based on individual data for effectiveness.
- **Health Awareness:** Early detection of patterns indicating health issues or overtraining.
- **Health Awareness:** Early detection of patterns indicating health issues or over-training.
- **Informed Decisions:** Data-driven decisions for adjusting routines and setting realistic goals.
## Real World Applications
@ -109,7 +109,7 @@ Monitoring workouts through pose estimation with [Ultralytics YOLOv8](https://gi
| `kpts_to_check` | `list` | `None` | List of three keypoints index, for counting specific workout, followed by keypoint Map |
| `view_img` | `bool` | `False` | Display the frame with counts |
| `line_thickness` | `int` | `2` | Increase the thickness of count value |
| `pose_type` | `str` | `pushup` | Pose that need to be monitored, "pullup" and "abworkout" also supported |
| `pose_type` | `str` | `pushup` | Pose that needs to be monitored, `pullup` and `abworkout` also supported |
| `pose_up_angle` | `int` | `145` | Pose Up Angle value |
| `pose_down_angle` | `int` | `90` | Pose Down Angle value |

@ -18,7 +18,7 @@ Performance metrics are key tools to evaluate the accuracy and efficiency of obj
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Ultralytics YOLOv8 Performance Metrics | MAP, F1 Score, Precision, IOU & Accuracy
<strong>Watch:</strong> Ultralytics YOLOv8 Performance Metrics | MAP, F1 Score, Precision, IoU & Accuracy
</p>
## Object Detection Metrics
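One metric referenced above, Intersection over Union (IoU), can be illustrated in a few lines of generic Python; this is a conceptual sketch rather than the Ultralytics implementation:

```python
def iou(box_a, box_b):
    """Intersection over Union for two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143
```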

@ -31,6 +31,17 @@ keywords: Ultralytics, Android App, real-time object detection, YOLO models, Ten
The Ultralytics Android App is a powerful tool that allows you to run YOLO models directly on your Android device for real-time object detection. This app utilizes TensorFlow Lite for model optimization and various hardware delegates for acceleration, enabling fast and efficient object detection.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/AIvrQ7y0aLo"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Getting Started with the Ultralytics HUB App (IOS & Android)
</p>
## Quantization and Acceleration
To achieve real-time performance on your Android device, YOLO models are quantized to either FP16 or INT8 precision. Quantization is a process that reduces the numerical precision of the model's weights and biases, thus reducing the model's size and the amount of computation required. This results in faster inference times without significantly affecting the model's accuracy.
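For reference, both precisions correspond to standard flags of the Ultralytics `export()` API; the snippet below is a minimal sketch assuming the documented `half` and `int8` arguments (an INT8 export may additionally need a representative calibration dataset):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# FP16 (half-precision) TFLite export
model.export(format="tflite", half=True)

# INT8 quantized TFLite export
model.export(format="tflite", int8=True)
```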

@ -31,6 +31,17 @@ keywords: Ultralytics, iOS app, object detection, YOLO models, real time, Apple
The Ultralytics iOS App is a powerful tool that allows you to run YOLO models directly on your iPhone or iPad for real-time object detection. This app utilizes the Apple Neural Engine and Core ML for model optimization and acceleration, enabling fast and efficient object detection.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/AIvrQ7y0aLo"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Getting Started with the Ultralytics HUB App (IOS & Android)
</p>
## Quantization and Acceleration
To achieve real-time performance on your iOS device, YOLO models are quantized to either FP16 or INT8 precision. Quantization is a process that reduces the numerical precision of the model's weights and biases, thus reducing the model's size and the amount of computation required. This results in faster inference times without significantly affecting the model's accuracy.

@ -6,12 +6,23 @@ keywords: Ultralytics, HUB Models, AI model training, model creation, model trai
# Cloud Training
[Ultralytics HUB](https://hub.ultralytics.com/) provides a powerful and user-friendly cloud platform to train custom object detection models. Easily select your dataset and the desired training method, then kick off the process with just a few clicks. Ultralytics HUB offers pre-built options and various model architectures to streamline your workflow.
![cloud training cover](https://github.com/ultralytics/ultralytics/assets/19519529/cbfdb3b8-ad35-44a6-afe6-61ec0b8e8b8d)
Read more about creating a model and other details on our [HUB Models page](models.md).
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/ie3vLUDNYZo"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> New Feature 🌟 Introducing Ultralytics HUB Cloud Training
</p>
## Selecting an Instance
For details on picking a model and the instances available for it, please read our [Instances guide page](models.md).

@ -38,7 +38,7 @@ First, ensure you have the following prerequisites in place:
- AWS CDK: If not already installed, install the AWS Cloud Development Kit (CDK), which will be used for scripting the deployment. Follow [the AWS CDK instructions](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_install) for installation.
- Adequate Service Quota: Confirm that you have sufficient quotas for two separate resources in Amazon SageMaker: one for ml.m5.4xlarge for endpoint usage and another for ml.m5.4xlarge for notebook instance usage. Each of these requires a minimum of one quota value. If your current quotas are below this requirement, it's important to request an increase for each. You can request a quota increase by following the detailed instructions in the [AWS Service Quotas documentation](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html#quota-console-increase).
- Adequate Service Quota: Confirm that you have sufficient quotas for two separate resources in Amazon SageMaker: one for `ml.m5.4xlarge` for endpoint usage and another for `ml.m5.4xlarge` for notebook instance usage. Each of these requires a minimum of one quota value. If your current quotas are below this requirement, it's important to request an increase for each. You can request a quota increase by following the detailed instructions in the [AWS Service Quotas documentation](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html#quota-console-increase).
### Step 2: Clone the YOLOv8 SageMaker Repository
@ -115,17 +115,21 @@ After creating the AWS CloudFormation Stack, the next step is to deploy YOLOv8.
- Access and Modify inference.py: After opening the SageMaker notebook instance in Jupyter, locate the inference.py file. Edit the output_fn function in inference.py as shown below and save your changes to the script, ensuring that there are no syntax errors.
```python
import json
def output_fn(prediction_output, content_type):
print("Executing output_fn from inference.py ...")
infer = {}
for result in prediction_output:
if 'boxes' in result._keys and result.boxes is not None:
if result.boxes is not None:
infer['boxes'] = result.boxes.numpy().data.tolist()
if 'masks' in result._keys and result.masks is not None:
if result.masks is not None:
infer['masks'] = result.masks.numpy().data.tolist()
if 'keypoints' in result._keys and result.keypoints is not None:
if result.keypoints is not None:
infer['keypoints'] = result.keypoints.numpy().data.tolist()
if 'probs' in result._keys and result.probs is not None:
if result.obb is not None:
infer['obb'] = result.obb.numpy().data.tolist()
if result.probs is not None:
infer['probs'] = result.probs.numpy().data.tolist()
return json.dumps(infer)
```

@ -18,9 +18,10 @@ The CoreML export format allows you to optimize your [Ultralytics YOLOv8](https:
[CoreML](https://developer.apple.com/documentation/coreml) is Apple's foundational machine learning framework that builds upon Accelerate, BNNS, and Metal Performance Shaders. It provides a machine-learning model format that seamlessly integrates into iOS applications and supports tasks such as image analysis, natural language processing, audio-to-text conversion, and sound analysis.
Applications can take advantage of Core ML without the need to have a network connection or API calls because the Core ML framework works using on-device computing. This means model inferencing can be performed locally on the user's device.
Applications can take advantage of Core ML without the need to have a network connection or API calls because the Core ML framework works using on-device computing. This means model inference can be performed locally on the user's device.
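As a quick reference, exporting to Core ML follows the same pattern as other Ultralytics export formats; the snippet below is a minimal sketch and the output package name is an assumption:

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 model
model = YOLO("yolov8n.pt")

# Export to Core ML format for on-device inference on Apple hardware
model.export(format="coreml")  # typically produces 'yolov8n.mlpackage'
```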
## Key Features of CoreML Models
Apple's CoreML framework offers robust features for on-device machine learning. Here are the key features that make CoreML a powerful tool for developers:
- **Comprehensive Model Support**: Converts and runs models from popular frameworks like TensorFlow, PyTorch, scikit-learn, XGBoost, and LibSVM.

@ -40,16 +40,22 @@ Welcome to the Ultralytics Integrations page! This page provides an overview of
- [Neural Magic](neural-magic.md): Leverage Quantization Aware Training (QAT) and pruning techniques to optimize Ultralytics models for superior performance and leaner size.
- [Gradio](../integrations/gradio.md) 🚀 NEW: Deploy Ultralytics models with Gradio for real-time, interactive object detection demos.
- [Gradio](gradio.md) 🚀 NEW: Deploy Ultralytics models with Gradio for real-time, interactive object detection demos.
- [OpenVINO](openvino.md): Intel's toolkit for optimizing and deploying computer vision models efficiently across various Intel CPU and GPU platforms.
- [TorchScript](torchscript.md): Developed as part of the [PyTorch](https://pytorch.org/) framework, TorchScript enables efficient execution and deployment of machine learning models in various production environments without the need for Python dependencies.
- [ONNX](onnx.md): An open-source format created by [Microsoft](https://www.microsoft.com) for facilitating the transfer of AI models between various frameworks, enhancing the versatility and deployment flexibility of Ultralytics models.
- [OpenVINO](openvino.md): Intel's toolkit for optimizing and deploying computer vision models efficiently across various Intel CPU and GPU platforms.
- [TensorRT](tensorrt.md): Developed by [NVIDIA](https://www.nvidia.com/), this high-performance deep learning inference framework and model format optimizes AI models for accelerated speed and efficiency on NVIDIA GPUs, ensuring streamlined deployment.
- [CoreML](coreml.md): CoreML, developed by [Apple](https://www.apple.com/), is a framework designed for efficiently integrating machine learning models into applications across iOS, macOS, watchOS, and tvOS, using Apple's hardware for effective and secure model deployment.
- [TFLite](tflite.md): Developed by [Google](https://www.google.com), TFLite is a lightweight framework for deploying machine learning models on mobile and edge devices, ensuring fast, efficient inference with minimal memory footprint.
- [NCNN](ncnn.md): Developed by [Tencent](http://www.tencent.com/), NCNN is an efficient neural network inference framework tailored for mobile devices. It enables direct deployment of AI models into apps, optimizing performance across various mobile platforms.
### Export Formats
We also support a variety of model export formats for deployment in different environments. Here are the available formats:

@ -0,0 +1,120 @@
---
comments: true
description: Uncover how to improve your Ultralytics YOLOv8 model's performance using the NCNN export format that is suitable for devices with limited computation resources.
keywords: Ultralytics, YOLOv8, NCNN Export, Export YOLOv8, Model Deployment
---
# How to Export to NCNN from YOLOv8 for Smooth Deployment
Deploying computer vision models on devices with limited computational power, such as mobile or embedded systems, can be tricky. You need to use a format that is optimized for performance on such hardware, so that even devices with limited processing power can handle advanced computer vision tasks well.
The export to NCNN format feature allows you to optimize your [Ultralytics YOLOv8](https://github.com/ultralytics/ultralytics) models for lightweight device-based applications. In this guide, we'll walk you through how to convert your models to the NCNN format, making it easier for your models to perform well on various mobile and embedded devices.
## Why should you export to NCNN?
<p align="center">
<img width="100%" src="https://repository-images.githubusercontent.com/494294418/207a2e12-dc16-41a6-a39e-eae26e662638" alt="NCNN overview">
</p>
The [NCNN](https://github.com/Tencent/ncnn) framework, developed by Tencent, is a high-performance neural network inference computing framework optimized specifically for mobile platforms, including mobile phones, embedded devices, and IoT devices. NCNN is compatible with a wide range of platforms, including Linux, Android, iOS, and macOS.
NCNN is known for its fast processing speed on mobile CPUs and enables rapid deployment of deep learning models to mobile platforms. This makes it easier to build smart apps, putting the power of AI right at your fingertips.
## Key Features of NCNN Models
NCNN models offer a wide range of key features that enable on-device machine learning by helping developers run their models on mobile, embedded, and edge devices:
- **Efficient and High-Performance**: NCNN models are made to be efficient and lightweight, optimized for running on mobile and embedded devices like Raspberry Pi with limited resources. They can also achieve high performance with high accuracy on various computer vision-based tasks.
- **Quantization**: NCNN models often support quantization which is a technique that reduces the precision of the model's weights and activations. This leads to further improvements in performance and reduces memory footprint.
- **Compatibility**: NCNN models are compatible with popular deep learning frameworks like [TensorFlow](https://www.tensorflow.org/), [Caffe](https://caffe.berkeleyvision.org/), and [ONNX](https://onnx.ai/). This compatibility allows developers to use existing models and workflows easily.
- **Easy to Use**: NCNN models are designed for easy integration into various applications, thanks to their compatibility with popular deep learning frameworks. Additionally, NCNN offers user-friendly tools for converting models between different formats, ensuring smooth interoperability across the development landscape.
## Deployment Options with NCNN
Before we look at the code for exporting YOLOv8 models to the NCNN format, let’s understand how NCNN models are normally used.
NCNN models, designed for efficiency and performance, are compatible with a variety of deployment platforms:
- **Mobile Deployment**: Specifically optimized for Android and iOS, allowing for seamless integration into mobile applications for efficient on-device inference.
- **Embedded Systems and IoT Devices**: If you find that running inference on a Raspberry Pi with the [Ultralytics Guide](../guides/raspberry-pi.md) isn't fast enough, switching to an NCNN exported model could help speed things up. NCNN is great for devices like Raspberry Pi and NVIDIA Jetson, especially in situations where you need quick processing right on the device.
- **Desktop and Server Deployment**: Capable of being deployed in desktop and server environments across Linux, Windows, and macOS, supporting development, training, and evaluation with higher computational capacities.
## Export to NCNN: Converting Your YOLOv8 Model
You can expand model compatibility and deployment flexibility by converting YOLOv8 models to NCNN format.
### Installation
To install the required packages, run:
!!! Tip "Installation"
=== "CLI"
```bash
# Install the required package for YOLOv8
pip install ultralytics
```
For detailed instructions and best practices related to the installation process, check our [Ultralytics Installation guide](../quickstart.md). While installing the required packages for YOLOv8, if you encounter any difficulties, consult our [Common Issues guide](../guides/yolo-common-issues.md) for solutions and tips.
### Usage
Before diving into the usage instructions, it's important to note that while all [Ultralytics YOLOv8 models](../models/index.md) are available for exporting, you can check whether the model you select supports export functionality [here](../modes/export.md).
!!! Example "Usage"
=== "Python"
```python
from ultralytics import YOLO
# Load the YOLOv8 model
model = YOLO('yolov8n.pt')
# Export the model to NCNN format
model.export(format='ncnn') # creates '/yolov8n_ncnn_model'
# Load the exported NCNN model
ncnn_model = YOLO('./yolov8n_ncnn_model')
# Run inference
results = ncnn_model('https://ultralytics.com/images/bus.jpg')
```
=== "CLI"
```bash
# Export a YOLOv8n PyTorch model to NCNN format
yolo export model=yolov8n.pt format=ncnn # creates '/yolov8n_ncnn_model'
# Run inference with the exported model
yolo predict model='./yolov8n_ncnn_model' source='https://ultralytics.com/images/bus.jpg'
```
For more details about supported export options, visit the [Ultralytics documentation page on deployment options](../guides/model-deployment-options.md).
## Deploying Exported YOLOv8 NCNN Models
After successfully exporting your Ultralytics YOLOv8 models to NCNN format, you can now deploy them. The primary and recommended first step for running an NCNN model is to utilize the `YOLO("./model_ncnn_model")` method, as outlined in the previous usage code snippet. However, for in-depth instructions on deploying your NCNN models in various other settings, take a look at the following resources:
- **[Android](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-android)**: This blog explains how to use NCNN models for performing tasks like object detection through Android applications.
- **[macOS](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-macos)**: Understand how to use NCNN models for performing tasks through macOS.
- **[Linux](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-linux)**: Explore this page to learn how to deploy NCNN models on limited resource devices like Raspberry Pi and other similar devices.
- **[Windows x64 using VS2017](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-windows-x64-using-visual-studio-community-2017)**: Explore this blog to learn how to deploy NCNN models on Windows x64 using Visual Studio Community 2017.
## Summary
In this guide, we've gone over exporting Ultralytics YOLOv8 models to the NCNN format. This conversion step is crucial for improving the efficiency and speed of YOLOv8 models, making them more effective and suitable for limited-resource computing environments.
For detailed instructions on usage, please refer to the [official NCNN documentation](https://ncnn.readthedocs.io/en/latest/index.html).
Also, if you're interested in exploring other integration options for Ultralytics YOLOv8, be sure to visit our [integration guide page](index.md) for further insights and information.

@ -6,7 +6,7 @@ keywords: YOLOv8, DeepSparse Engine, Ultralytics, CPU Inference, Neural Network
# Optimizing YOLOv8 Inferences with Neural Magic’s DeepSparse Engine
When deploying object detection models like [Ultralytics YOLOv8](https://ultralytics.com) on various hardware, you can bump into unique issues like optimization. This is where YOLOv8’s integration with Neural Magic’s DeepSparse Engine steps in. It transforms the way YOLOv8 models are executed and enables GPU-level performance directly on CPUs.
This guide shows you how to deploy YOLOv8 using Neural Magic's DeepSparse, how to run inferences, and also how to benchmark performance to ensure it is optimized.

@ -102,7 +102,7 @@ The Time Series feature in the TensorBoard offers a dynamic and detailed perspec
#### Importance of Time Series in YOLOv8 Training
The Time Series section is essential for a thorough analysis of the YOLOv8 model's training progress. It lets you track the metrics in real time so you can promptly identify and solve issues. It also offers a detailed view of each metric's progression, which is crucial for fine-tuning the model and enhancing its performance.
The Time Series section is essential for a thorough analysis of the YOLOv8 model's training progress. It lets you track the metrics in real time to promptly identify and solve issues. It also offers a detailed view of each metric's progression, which is crucial for fine-tuning the model and enhancing its performance.
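For context, Ultralytics writes TensorBoard event files automatically during training once TensorBoard logging is enabled in the package settings; the sketch below assumes the documented `settings` interface and the default `runs/detect/train` output directory:

```python
from ultralytics import YOLO, settings

# Enable TensorBoard logging in the Ultralytics settings (persisted between runs)
settings.update({"tensorboard": True})

# Train a model; metrics are written as TensorBoard event files in the run directory
model = YOLO("yolov8n.pt")
model.train(data="coco8.yaml", epochs=3, imgsz=640)
```

TensorBoard can then be pointed at that directory, for example with `tensorboard --logdir runs/detect/train`.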
### Scalars
@ -146,7 +146,7 @@ Graphs are particularly useful for debugging the model, especially in complex ar
## Summary
This guide aims to help you use TensorBoard with YOLOv8 for visualization and analysis of machine learning model training. It focuses on explaining how key TensorBoard features can provides insights into training metrics and model performance during YOLOv8 training sessions.
This guide aims to help you use TensorBoard with YOLOv8 for visualization and analysis of machine learning model training. It focuses on explaining how key TensorBoard features can provide insights into training metrics and model performance during YOLOv8 training sessions.
For a more detailed exploration of these features and effective utilization strategies, you can refer to TensorFlow’s official [TensorBoard documentation](https://www.tensorflow.org/tensorboard/get_started) and their [GitHub repository](https://github.com/tensorflow/tensorboard).

@ -16,7 +16,7 @@ By using the TensorRT export format, you can enhance your [Ultralytics YOLOv8](h
<img width="100%" src="https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-601/tensorrt-developer-guide/graphics/whatistrt2.png" alt="TensorRT Overview">
</p>
[TensorRT](https://developer.nvidia.com/tensorrt#:~:text=NVIDIA%20TensorRT%2DLLM%20is%20an,knowledge%20of%20C%2B%2B%20or%20CUDA.), developed by NVIDIA, is an advanced software development kit (SDK) designed for high-speed deep learning inference. It’s well-suited for real-time applications like object detection.
[TensorRT](https://developer.nvidia.com/tensorrt), developed by NVIDIA, is an advanced software development kit (SDK) designed for high-speed deep learning inference. It’s well-suited for real-time applications like object detection.
This toolkit optimizes deep learning models for NVIDIA GPUs and results in faster and more efficient operations. TensorRT models undergo TensorRT optimization, which includes techniques like layer fusion, precision calibration (INT8 and FP16), dynamic tensor memory management, and kernel auto-tuning. Converting deep learning models into the TensorRT format allows developers to realize the potential of NVIDIA GPUs fully.
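For orientation, the corresponding Ultralytics export format is `engine`; the snippet below is a minimal sketch, with the FP16 flag and output file name as assumptions:

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 model
model = YOLO("yolov8n.pt")

# Export to a TensorRT engine with FP16 precision (requires an NVIDIA GPU with TensorRT installed)
model.export(format="engine", half=True)  # typically produces 'yolov8n.engine'

# Run inference with the exported engine
trt_model = YOLO("yolov8n.engine")
results = trt_model("https://ultralytics.com/images/bus.jpg")
```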
@ -40,7 +40,7 @@ TensorRT models offer a range of key features that contribute to their efficienc
## Deployment Options in TensorRT
Before we look at the code for exporting YOLOv8 models to the TensorRT format, let’s understand where TensorRT models are normally used.
TensorRT offers several deployment options, and each option balances ease of integration, performance optimization, and flexibility differently:
@ -52,7 +52,7 @@ TensorRT offers several deployment options, and each option balances ease of int
- **Standalone TensorRT Runtime API**: Offers granular control, ideal for performance-critical applications. It's more complex but allows for custom implementation of unsupported operators.
- **NVIDIA Triton Inference Server**: An option that supports models from various frameworks. Particularly suited for cloud or edge inferencing, it provides features like concurrent model execution and model analysis.
- **NVIDIA Triton Inference Server**: An option that supports models from various frameworks. Particularly suited for cloud or edge inference, it provides features like concurrent model execution and model analysis.
## Exporting YOLOv8 Models to TensorRT

@ -0,0 +1,122 @@
---
comments: true
description: Explore how to improve your Ultralytics YOLOv8 model's performance and interoperability using the TFLite export format suitable for edge computing environments.
keywords: Ultralytics, YOLOv8, TFLite Export, Export YOLOv8, Model Deployment
---
# A Guide on YOLOv8 Model Export to TFLite for Deployment
<p align="center">
<img width="75%" src="https://github.com/ultralytics/ultralytics/assets/26833433/6ecf34b9-9187-4d6f-815c-72394290a4d3" alt="TFLite Logo">
</p>
Deploying computer vision models on edge devices or embedded devices requires a format that can ensure seamless performance.
The TensorFlow Lite or TFLite export format allows you to optimize your [Ultralytics YOLOv8](https://github.com/ultralytics/ultralytics) models for tasks like object detection and image classification in edge device-based applications. In this guide, we'll walk through the steps for converting your models to the TFLite format, making it easier for your models to perform well on various edge devices.
## Why should you export to TFLite?
Introduced by Google in May 2017 as part of their TensorFlow framework, [TensorFlow Lite](https://www.tensorflow.org/lite/guide), or TFLite for short, is an open-source deep learning framework designed for on-device inference, also known as edge computing. It gives developers the necessary tools to execute their trained models on mobile, embedded, and IoT devices, as well as traditional computers.
TensorFlow Lite is compatible with a wide range of platforms, including embedded Linux, Android, iOS, and MCU. Exporting your model to TFLite makes your applications faster, more reliable, and capable of running offline.
## Key Features of TFLite Models
TFLite models offer a wide range of key features that enable on-device machine learning by helping developers run their models on mobile, embedded, and edge devices:
- **On-device Optimization**: TFLite optimizes for on-device ML, reducing latency by processing data locally, enhancing privacy by not transmitting personal data, and minimizing model size to save space.
- **Multiple Platform Support**: TFLite offers extensive platform compatibility, supporting Android, iOS, embedded Linux, and microcontrollers.
- **Diverse Language Support**: TFLite is compatible with various programming languages, including Java, Swift, Objective-C, C++, and Python.
- **High Performance**: Achieves superior performance through hardware acceleration and model optimization.
## Deployment Options in TFLite
Before we look at the code for exporting YOLOv8 models to the TFLite format, let’s understand how TFLite models are normally used.
TFLite offers various on-device deployment options for machine learning models, including:
- **Deploying with Android and iOS**: Both Android and iOS applications with TFLite can analyze edge-based camera feeds and sensors to detect and identify objects. TFLite also offers native iOS libraries written in [Swift](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/swift) and [Objective-C](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/objc). The architecture diagram below shows the process of deploying a trained model onto Android and iOS platforms using TensorFlow Lite.
<p align="center">
<img width="75%" src="https://1.bp.blogspot.com/-6fS9FD8KD7g/XhJ1l8y2S4I/AAAAAAAACKw/MW9MQZ8gtiYmUe0naRdN0n2FwkT1l4trACLcBGAsYHQ/s1600/architecture.png" alt="Architecture">
</p>
- **Implementing with Embedded Linux**: If running inferences on a [Raspberry Pi](https://www.raspberrypi.org/) using the [Ultralytics Guide](../guides/raspberry-pi.md) does not meet the speed requirements for your use case, you can use an exported TFLite model to accelerate inference times. Additionally, it's possible to further improve performance by utilizing a [Coral Edge TPU device](https://coral.withgoogle.com/).
- **Deploying with Microcontrollers**: TFLite models can also be deployed on microcontrollers and other devices with only a few kilobytes of memory. The core runtime just fits in 16 KB on an Arm Cortex M3 and can run many basic models. It doesn't require operating system support, any standard C or C++ libraries, or dynamic memory allocation.
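To make the on-device workflow above concrete, the sketch below loads an exported YOLOv8 TFLite file with the standard TensorFlow Lite Python interpreter. The file name and dummy input are assumptions based on a default `yolov8n` export, and the raw output tensor still needs YOLO post-processing; the `YOLO()` loader shown later handles that for you.

```python
import numpy as np
import tensorflow as tf

# Load the exported TFLite model (file name assumed from a default YOLOv8n export)
interpreter = tf.lite.Interpreter(model_path='yolov8n_float32.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input matching the model's expected shape, e.g. (1, 640, 640, 3) float32
dummy = np.random.rand(*input_details[0]['shape']).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy)

# Run inference and inspect the raw output; decoding boxes is left to YOLO post-processing
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
print(output.shape)
```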
## Export to TFLite: Converting Your YOLOv8 Model
You can improve on-device model execution efficiency and optimize performance by converting your YOLOv8 models to the TFLite format.
### Installation
To install the required packages, run:
!!! Tip "Installation"
=== "CLI"
```bash
# Install the required package for YOLOv8
pip install ultralytics
```
For detailed instructions and best practices related to the installation process, check our [Ultralytics Installation guide](../quickstart.md). While installing the required packages for YOLOv8, if you encounter any difficulties, consult our [Common Issues guide](../guides/yolo-common-issues.md) for solutions and tips.
### Usage
Before diving into the usage instructions, it's important to note that while all [Ultralytics YOLOv8 models](../models/index.md) are available for exporting, you can check whether the model you select supports export functionality [here](../modes/export.md).
!!! Example "Usage"
=== "Python"
```python
from ultralytics import YOLO
# Load the YOLOv8 model
model = YOLO('yolov8n.pt')
# Export the model to TFLite format
model.export(format='tflite') # creates 'yolov8n_float32.tflite'
# Load the exported TFLite model
tflite_model = YOLO('yolov8n_float32.tflite')
# Run inference
results = tflite_model('https://ultralytics.com/images/bus.jpg')
```
=== "CLI"
```bash
# Export a YOLOv8n PyTorch model to TFLite format
yolo export model=yolov8n.pt format=tflite # creates 'yolov8n_float32.tflite'
# Run inference with the exported model
yolo predict model='yolov8n_float32.tflite' source='https://ultralytics.com/images/bus.jpg'
```
For more details about the export process, visit the [Ultralytics documentation page on exporting](../modes/export.md).
## Deploying Exported YOLOv8 TFLite Models
After successfully exporting your Ultralytics YOLOv8 models to TFLite format, you can now deploy them. The primary and recommended first step for running a TFLite model is to use the `YOLO("model.tflite")` method, as outlined in the previous usage code snippet. However, for in-depth instructions on deploying your TFLite models in various other settings, take a look at the following resources:
- **[Android](https://www.tensorflow.org/lite/android/quickstart)**: A quick start guide for integrating TensorFlow Lite into Android applications, providing easy-to-follow steps for setting up and running machine learning models.
- **[iOS](https://www.tensorflow.org/lite/guide/ios)**: Check out this detailed guide for developers on integrating and deploying TensorFlow Lite models in iOS applications, offering step-by-step instructions and resources.
- **[End-To-End Examples](https://www.tensorflow.org/lite/examples)**: This page provides an overview of various TensorFlow Lite examples, showcasing practical applications and tutorials designed to help developers implement TensorFlow Lite in their machine learning projects on mobile and edge devices.
## Summary
In this guide, we focused on how to export to the TFLite format. By converting your Ultralytics YOLOv8 models to TFLite, you can improve their efficiency and speed, making them more effective and suitable for edge computing environments.
For further details on usage, visit [TFLite’s official documentation](https://www.tensorflow.org/lite/guide).
Also, if you're curious about other Ultralytics YOLOv8 integrations, make sure to check out our [integration guide page](../integrations/index.md). You'll find tons of helpful info and insights waiting for you there.

@ -0,0 +1,126 @@
---
comments: true
description: Learn to export your Ultralytics YOLOv8 models to TorchScript format for deployment through platforms like embedded systems, web browsers, and C++ applications.
keywords: Ultralytics, YOLOv8, Export to Torchscript, Model Optimization, Deployment, PyTorch, C++, Faster Inference
---
# YOLOv8 Model Export to TorchScript for Quick Deployment
Deploying computer vision models across different environments, including embedded systems, web browsers, or platforms with limited Python support, requires a flexible and portable solution. TorchScript focuses on portability and the ability to run models in environments where the entire Python framework is unavailable. This makes it ideal for scenarios where you need to deploy your computer vision capabilities across various devices or platforms.
Export to TorchScript to serialize your [Ultralytics YOLOv8](https://github.com/ultralytics/ultralytics) models for cross-platform compatibility and streamlined deployment. In this guide, we'll show you how to export your YOLOv8 models to the TorchScript format, making it easier for you to use them across a wider range of applications.
## Why should you export to TorchScript?
![Torchscript Overview](https://github.com/ultralytics/ultralytics/assets/26833433/6873349d-c2f6-4620-b3cc-7b26b0698d0b)
Developed by the creators of PyTorch, TorchScript is a powerful tool for optimizing and deploying PyTorch models across a variety of platforms. Exporting YOLOv8 models to [TorchScript](https://pytorch.org/docs/stable/jit.html) is crucial for moving from research to real-world applications. TorchScript, part of the PyTorch framework, helps make this transition smoother by allowing PyTorch models to be used in environments that don't support Python.
The process involves two techniques: tracing and scripting. Tracing records operations during model execution, while scripting allows for the definition of models using a subset of Python. These techniques ensure that models like YOLOv8 can still work their magic even outside their usual Python environment.
![TorchScript Script and Trace](https://github.com/ultralytics/ultralytics/assets/26833433/ea9ea24f-a3a9-44bb-aca7-9c358d7490d7)
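As a rough illustration of that difference, the sketch below uses plain PyTorch (the Ultralytics exporter handles this internally) to trace and script a toy module; the module and input sizes are placeholders, not part of YOLOv8:

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """A toy module standing in for a real detection model."""

    def forward(self, x):
        return torch.relu(x) * 2


model = TinyNet().eval()
example = torch.rand(1, 3, 64, 64)

# Tracing: runs the model once on an example input and records the executed operations
traced = torch.jit.trace(model, example)

# Scripting: compiles the forward() source directly, preserving Python control flow
scripted = torch.jit.script(model)

print(scripted.code)  # inspect the generated TorchScript
```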
TorchScript models can also be optimized through techniques such as operator fusion and refinements in memory usage, ensuring efficient execution. Another advantage of exporting to TorchScript is its potential to accelerate model execution across various hardware platforms. It creates a standalone, production-ready representation of your PyTorch model that can be integrated into C++ environments, embedded systems, or deployed in web or mobile applications.
## Key Features of TorchScript Models
TorchScript, a key part of the PyTorch ecosystem, provides powerful features for optimizing and deploying deep learning models.
![TorchScript Features](https://github.com/ultralytics/ultralytics/assets/26833433/44c7c5e3-1146-42db-952a-9060f070fead)
Here are the key features that make TorchScript a valuable tool for developers:
- **Static Graph Execution**: TorchScript uses a static graph representation of the model’s computation, which is different from PyTorch’s dynamic graph execution. In static graph execution, the computational graph is defined and compiled once before the actual execution, resulting in improved performance during inference.
- **Model Serialization**: TorchScript allows you to serialize PyTorch models into a platform-independent format. Serialized models can be loaded without requiring the original Python code, enabling deployment in different runtime environments.
- **JIT Compilation**: TorchScript uses Just-In-Time (JIT) compilation to convert PyTorch models into an optimized intermediate representation. JIT compiles the model’s computational graph, enabling efficient execution on target devices.
- **Cross-Language Integration**: With TorchScript, you can export PyTorch models to other languages such as C++, Java, and JavaScript. This makes it easier to integrate PyTorch models into existing software systems written in different languages.
- **Gradual Conversion**: TorchScript provides a gradual conversion approach, allowing you to incrementally convert parts of your PyTorch model into TorchScript. This flexibility is particularly useful when dealing with complex models or when you want to optimize specific portions of the code.
## Deployment Options in TorchScript
Before we look at the code for exporting YOLOv8 models to the TorchScript format, let’s understand where TorchScript models are normally used.
TorchScript offers various deployment options for machine learning models, such as:
- **C++ API**: The most common use case for TorchScript is its C++ API, which allows you to load and execute optimized TorchScript models directly within C++ applications. This is ideal for production environments where Python may not be suitable or available. The C++ API offers low-overhead and efficient execution of TorchScript models, maximizing performance potential.
- **Mobile Deployment**: TorchScript offers tools for converting models into formats readily deployable on mobile devices. PyTorch Mobile provides a runtime for executing these models within iOS and Android apps. This enables low-latency, offline inference capabilities, enhancing user experience and data privacy.
- **Cloud Deployment**: TorchScript models can be deployed to cloud-based servers using solutions like TorchServe. It provides features like model versioning, batching, and metrics monitoring for scalable deployment in production environments. Cloud deployment with TorchScript can make your models accessible via APIs or other web services.
## Export to TorchScript: Converting Your YOLOv8 Model
Exporting YOLOv8 models to TorchScript makes it easier to use them across different platforms and helps them run faster and more efficiently. This is great for anyone looking to use deep learning models more effectively in real-world applications.
### Installation
To install the required package, run:
!!! Tip "Installation"
=== "CLI"
```bash
# Install the required package for YOLOv8
pip install ultralytics
```
For detailed instructions and best practices related to the installation process, check our [Ultralytics Installation guide](../quickstart.md). While installing the required packages for YOLOv8, if you encounter any difficulties, consult our [Common Issues guide](../guides/yolo-common-issues.md) for solutions and tips.
### Usage
Before diving into the usage instructions, it's important to note that while all [Ultralytics YOLOv8 models](../models/index.md) are available for exporting, you can check whether the model you select supports export functionality [here](../modes/export.md).
!!! Example "Usage"
=== "Python"
```python
from ultralytics import YOLO
# Load the YOLOv8 model
model = YOLO('yolov8n.pt')
# Export the model to TorchScript format
model.export(format='torchscript') # creates 'yolov8n.torchscript'
# Load the exported TorchScript model
torchscript_model = YOLO('yolov8n.torchscript')
# Run inference
results = torchscript_model('https://ultralytics.com/images/bus.jpg')
```
=== "CLI"
```bash
# Export a YOLOv8n PyTorch model to TorchScript format
yolo export model=yolov8n.pt format=torchscript # creates 'yolov8n.torchscript'
# Run inference with the exported model
yolo predict model=yolov8n.torchscript source='https://ultralytics.com/images/bus.jpg'
```
For more details about the export process, visit the [Ultralytics documentation page on exporting](../modes/export.md).
## Deploying Exported YOLOv8 TorchScript Models
After successfully exporting your Ultralytics YOLOv8 models to TorchScript format, you can now deploy them. The primary and recommended first step for running a TorchScript model is to use the `YOLO("model.torchscript")` method, as outlined in the previous usage code snippet. However, for in-depth instructions on deploying your TorchScript models in various other settings, take a look at the following resources:
- **[Explore Mobile Deployment](https://pytorch.org/mobile/home/)**: The PyTorch Mobile Documentation provides comprehensive guidelines for deploying models on mobile devices, ensuring your applications are efficient and responsive.
- **[Master Server-Side Deployment](https://pytorch.org/serve/getting_started.html)**: Learn how to deploy models server-side with TorchServe, offering a step-by-step tutorial for scalable, efficient model serving.
- **[Implement C++ Deployment](https://pytorch.org/tutorials/advanced/cpp_export.html)**: Dive into the Tutorial on Loading a TorchScript Model in C++, facilitating the integration of your TorchScript models into C++ applications for enhanced performance and versatility.
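If you need the exported model outside the Ultralytics API, for instance in a custom Python service that mirrors the C++ workflow above, the archive can also be loaded directly with `torch.jit.load`. This is a minimal sketch only; the output is a raw prediction tensor, so box decoding and NMS remain your responsibility:

```python
import torch

# Load the archive produced by 'yolo export model=yolov8n.pt format=torchscript'
model = torch.jit.load('yolov8n.torchscript')
model.eval()

# Dummy input in the exporter's default layout: batch x channels x height x width
dummy = torch.rand(1, 3, 640, 640)

with torch.no_grad():
    output = model(dummy)

# Raw predictions; decoding boxes and applying NMS is left to the caller
print(output.shape if isinstance(output, torch.Tensor) else type(output))
```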
## Summary
In this guide, we explored the process of exporting Ultralytics YOLOv8 models to the TorchScript format. By following the provided instructions, you can optimize YOLOv8 models for performance and gain the flexibility to deploy them across various platforms and environments.
For further details on usage, visit [TorchScript’s official documentation](https://pytorch.org/docs/stable/jit.html).
Also, if you’d like to know more about other Ultralytics YOLOv8 integrations, visit our [integration guide page](../integrations/index.md). You'll find plenty of useful resources and insights there.

@ -27,7 +27,7 @@ The Ultralytics Python API provides pre-trained PaddlePaddle RT-DETR models with
## Usage Examples
This example provides simple RT-DETRR training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
This example provides simple RT-DETR training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
!!! Example

@ -36,19 +36,29 @@ This section details the models available with their specific pre-trained weight
All the YOLOv8-World weights have been directly migrated from the official [YOLO-World](https://github.com/AILab-CVC/YOLO-World) repository, highlighting their excellent contributions.
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|---------------|-----------------------------------------------------------------------------------------------------|----------------------------------------|-----------|------------|----------|--------|
| YOLOv8s-world | [yolov8s-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
| YOLOv8m-world | [yolov8m-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
| YOLOv8l-world | [yolov8l-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|-----------------|-------------------------------------------------------------------------------------------------------|----------------------------------------|-----------|------------|----------|--------|
| YOLOv8s-world | [yolov8s-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
| YOLOv8s-worldv2 | [yolov8s-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
| YOLOv8m-world | [yolov8m-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
| YOLOv8m-worldv2 | [yolov8m-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
| YOLOv8l-world | [yolov8l-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
| YOLOv8l-worldv2 | [yolov8l-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
| YOLOv8x-world | [yolov8x-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
| YOLOv8x-worldv2 | [yolov8x-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
## Zero-shot Transfer on COCO Dataset
| Model Type | mAP | mAP50 | mAP75 |
|---------------|------|-------|-------|
| yolov8s-world | 37.4 | 52.0 | 40.6 |
| yolov8m-world | 42.0 | 57.0 | 45.6 |
| yolov8l-world | 45.7 | 61.3 | 49.8 |
| Model Type | mAP | mAP50 | mAP75 |
|-----------------|------|-------|-------|
| yolov8s-world | 37.4 | 52.0 | 40.6 |
| yolov8s-worldv2 | 37.7 | 52.2 | 41.0 |
| yolov8m-world | 42.0 | 57.0 | 45.6 |
| yolov8m-worldv2 | 43.0 | 58.4 | 46.8 |
| yolov8l-world | 45.7 | 61.3 | 49.8 |
| yolov8l-worldv2 | 45.8 | 61.3 | 49.8 |
| yolov8x-world | 47.0 | 63.0 | 51.2 |
| yolov8x-worldv2 | 47.1 | 62.8 | 51.4 |
## Usage Examples

@ -58,12 +58,12 @@ The performance of YOLOv9 on the [COCO dataset](../datasets/detect/coco.md) exem
**Table 1. Comparison of State-of-the-Art Real-Time Object Detectors**
| Model | Parameters (M) | FLOPs (G) | APval 50:95 (%) | APval 50 (%) | APval 75 (%) | APval S (%) | APval M (%) | APval L (%) |
|----------|----------------|-----------|-----------------|--------------|--------------|-------------|-------------|-------------|
| YOLOv9-S | 7.2 | 26.7 | 46.8 | 63.4 | 50.7 | 26.6 | 56.0 | 64.5 |
| YOLOv9-M | 20.1 | 76.8 | 51.4 | 68.1 | 56.1 | 33.6 | 57.0 | 68.0 |
| YOLOv9-C | 25.5 | 102.8 | 53.0 | 70.2 | 57.8 | 36.2 | 58.5 | 69.3 |
| YOLOv9-E | 58.1 | 192.5 | 55.6 | 72.8 | 60.6 | 40.2 | 61.0 | 71.4 |
| Model | size<br><sup>(pixels) | AP<sup>val<br>50-95 | AP<sup>val<br>50 | AP<sup>val<br>75 | params<br><sup>(M) | FLOPs<br><sup>(B) |
|----------|-----------------------|---------------------|------------------|------------------|--------------------|-------------------|
| YOLOv9-S | 640 | 46.8 | 63.4 | 50.7 | 7.2 | 26.7 |
| YOLOv9-M | 640 | 51.4 | 68.1 | 56.1 | 20.1 | 76.8 |
| YOLOv9-C | 640 | 53.0 | 70.2 | 57.8 | 25.5 | 102.8 |
| YOLOv9-E | 640 | 55.6 | 72.8 | 60.6 | 58.1 | 192.5 |
YOLOv9's iterations, ranging from the smaller S variant to the extensive E model, demonstrate improvements not only in accuracy (AP metrics) but also in efficiency with a reduced number of parameters and computational needs (FLOPs). This table underscores YOLOv9's ability to deliver high precision while maintaining or reducing the computational overhead compared to prior versions and competing models.
@ -72,19 +72,66 @@ Comparatively, YOLOv9 exhibits remarkable gains:
- **Lightweight Models**: YOLOv9-S surpasses the YOLO MS-S in parameter efficiency and computational load while achieving an improvement of 0.4∼0.6% in AP.
- **Medium to Large Models**: YOLOv9-M and YOLOv9-E show notable advancements in balancing the trade-off between model complexity and detection performance, offering significant reductions in parameters and computations against the backdrop of improved accuracy.
The YOLOv9-C model, in particular, highlights the effectiveness of the architecture's optimizations. It operates with 42% fewer parameters and 21% less computational demand than YOLOv7 AF, yet it achieves comparable accuracy, demonstrating YOLOv9's significant efficiency improvements. Furthermore, the YOLOv9-E model sets a new standard for large models, with 15% fewer parameters and 25% less computational need than YOLOv8-X, alongside a substantial 1.7% improvement in AP.
The YOLOv9-C model, in particular, highlights the effectiveness of the architecture's optimizations. It operates with 42% fewer parameters and 21% less computational demand than YOLOv7 AF, yet it achieves comparable accuracy, demonstrating YOLOv9's significant efficiency improvements. Furthermore, the YOLOv9-E model sets a new standard for large models, with 15% fewer parameters and 25% less computational need than [YOLOv8x](yolov8.md), alongside a substantial 1.7% improvement in AP.
These results showcase YOLOv9's strategic advancements in model design, emphasizing its enhanced efficiency without compromising on the precision essential for real-time object detection tasks. The model not only pushes the boundaries of performance metrics but also emphasizes the importance of computational efficiency, making it a pivotal development in the field of computer vision.
## Integration and Future Directions
YOLOv9 embodies the spirit of open-source collaboration that is central to the advancement of AI technology. Its integration into the Ultralytics package makes YOLOv9 an accessible tool for researchers and practitioners alike, further enhancing its impact on the field of computer vision.
## Conclusion
YOLOv9 represents a pivotal development in real-time object detection, offering significant improvements in terms of efficiency, accuracy, and adaptability. By addressing critical challenges through innovative solutions like PGI and GELAN, YOLOv9 sets a new precedent for future research and application in the field. As the AI community continues to evolve, YOLOv9 stands as a testament to the power of collaboration and innovation in driving technological progress.
See the usage examples below to explore the possibilities that YOLOv9 brings to the realm of computer vision.
## Usage Examples
This example provides simple YOLOv9 training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
!!! Example
=== "Python"
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()` class to create a model instance in python:
```python
from ultralytics import YOLO
# Build a YOLOv9c model from scratch
model = YOLO('yolov9c.yaml')
# Build a YOLOv9c model from pretrained weight
model = YOLO('yolov9c.pt')
# Display model information (optional)
model.info()
# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)
# Run inference with the YOLOv9c model on the 'bus.jpg' image
results = model('path/to/bus.jpg')
```
=== "CLI"
CLI commands are available to directly run the models:
```bash
# Build a YOLOv9c model from scratch and train it on the COCO8 example dataset for 100 epochs
yolo train model=yolov9c.yaml data=coco8.yaml epochs=100 imgsz=640
# Build a YOLOv9c model from scratch and run inference on the 'bus.jpg' image
yolo predict model=yolov9c.yaml source=path/to/bus.jpg
```
## Supported Tasks and Modes
The YOLOv9 series offers a range of models, each optimized for high-performance [Object Detection](../tasks/detect.md). These models cater to varying computational needs and accuracy requirements, making them versatile for a wide array of applications.
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|------------|-----------------------------------------------------------------------------------------|----------------------------------------|-----------|------------|----------|--------|
| YOLOv9-C | [yolov9c.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9c.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv9-E | [yolov9e.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9e.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
This table provides a detailed overview of the YOLOv9 model variants, highlighting their capabilities in object detection tasks and their compatibility with various operational modes such as [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md). This comprehensive support ensures that users can fully leverage the capabilities of YOLOv9 models in a broad range of object detection scenarios.
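Validation and export follow the same Ultralytics API pattern as the training and inference examples above. The sketch below is illustrative; the dataset and export format arguments are assumptions, not requirements:

```python
from ultralytics import YOLO

# Load a pretrained YOLOv9c model
model = YOLO('yolov9c.pt')

# Validate on the COCO8 example dataset (arguments are illustrative)
metrics = model.val(data='coco8.yaml', imgsz=640)
print(metrics.box.map)  # mAP50-95

# Export to ONNX, one of the formats listed on the Export page
model.export(format='onnx')  # creates 'yolov9c.onnx'
```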
## Citations and Acknowledgements

@ -101,6 +101,6 @@ Benchmarks will attempt to run automatically on all possible export formats belo
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |
| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |
See full `export` details in the [Export](https://docs.ultralytics.com/modes/export/) page.

@ -108,4 +108,4 @@ Available YOLOv8 export formats are in the table below. You can export to any fo
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |
| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |

@ -683,7 +683,7 @@ The `plot()` method in `Results` objects facilitates visualization of prediction
for i, r in enumerate(results):
# Plot results image
im_bgr = r.plot() # BGR-order numpy array
im_rgb = Image.fromarray(im_array[..., ::-1]) # RGB-order PIL image
im_rgb = Image.fromarray(im_bgr[..., ::-1]) # RGB-order PIL image
# Show results to screen (in supported environments)
r.show()

@ -91,7 +91,7 @@ When validating YOLO models, several arguments can be fine-tuned to optimize the
| `max_det` | `int` | `300` | Limits the maximum number of detections per image. Useful in dense scenes to prevent excessive detections. |
| `half` | `bool` | `True` | Enables half-precision (FP16) computation, reducing memory usage and potentially increasing speed with minimal impact on accuracy. |
| `device` | `str` | `None` | Specifies the device for validation (`cpu`, `cuda:0`, etc.). Allows flexibility in utilizing CPU or GPU resources. |
| `dnn` | `bool` | `False` | If `True`, uses OpenCV's DNN module for ONNX model inference, offering an alternative to PyTorch inference methods. |
| `dnn` | `bool` | `False` | If `True`, uses the OpenCV DNN module for ONNX model inference, offering an alternative to PyTorch inference methods. |
| `plots` | `bool` | `False` | When set to `True`, generates and saves plots of predictions versus ground truth for visual evaluation of the model's performance. |
| `rect` | `bool` | `False` | If `True`, uses rectangular inference for batching, reducing padding and potentially increasing speed and efficiency. |
| `split` | `str` | `val` | Determines the dataset split to use for validation (`val`, `test`, or `train`). Allows flexibility in choosing the data segment for performance evaluation. |

@ -14,3 +14,7 @@ keywords: Ultralytics YOLO, YOLO, YOLO model, Model Training, Machine Learning,
## ::: ultralytics.models.yolo.model.YOLO
<br><br>
## ::: ultralytics.models.yolo.model.YOLOWorld
<br><br>

@ -86,3 +86,55 @@ keywords: YOLO, Ultralytics, neural network, nn.modules.block, Proto, HGBlock, S
## ::: ultralytics.nn.modules.block.ResNetLayer
<br><br>
## ::: ultralytics.nn.modules.block.MaxSigmoidAttnBlock
<br><br>
## ::: ultralytics.nn.modules.block.C2fAttn
<br><br>
## ::: ultralytics.nn.modules.block.ImagePoolingAttn
<br><br>
## ::: ultralytics.nn.modules.block.ContrastiveHead
<br><br>
## ::: ultralytics.nn.modules.block.BNContrastiveHead
<br><br>
## ::: ultralytics.nn.modules.block.RepBottleneck
<br><br>
## ::: ultralytics.nn.modules.block.RepCSP
<br><br>
## ::: ultralytics.nn.modules.block.RepNCSPELAN4
<br><br>
## ::: ultralytics.nn.modules.block.ADown
<br><br>
## ::: ultralytics.nn.modules.block.SPPELAN
<br><br>
## ::: ultralytics.nn.modules.block.Silence
<br><br>
## ::: ultralytics.nn.modules.block.CBLinear
<br><br>
## ::: ultralytics.nn.modules.block.CBFuse
<br><br>

@ -31,6 +31,10 @@ keywords: Ultralytics, YOLO, Detection, Pose, RTDETRDecoder, nn modules, guides
<br><br>
## ::: ultralytics.nn.modules.head.WorldDetect
<br><br>
## ::: ultralytics.nn.modules.head.RTDETRDecoder
<br><br>

@ -39,6 +39,10 @@ keywords: Ultralytics, YOLO, nn tasks, DetectionModel, PoseModel, RTDETRDetectio
<br><br>
## ::: ultralytics.nn.tasks.WorldModel
<br><br>
## ::: ultralytics.nn.tasks.Ensemble
<br><br>

@ -1,6 +1,6 @@
---
description: Explore Ultralytics YOLO metrics tools - from confusion matrix, detection metrics, pose metrics to box IOU. Learn how to compute and plot precision-recall curves.
keywords: Ultralytics, YOLO, YOLOv3, YOLOv4, metrics, confusion matrix, detection metrics, pose metrics, box IOU, mask IOU, plot precision-recall curves, compute average precision
description: Explore Ultralytics YOLO metrics tools - from confusion matrix, detection metrics, pose metrics to box IoU. Learn how to compute and plot precision-recall curves.
keywords: Ultralytics, YOLO, YOLOv3, YOLOv4, metrics, confusion matrix, detection metrics, pose metrics, box IoU, mask IoU, plot precision-recall curves, compute average precision
---
# Reference for `ultralytics/utils/metrics.py`

@ -176,6 +176,6 @@ Available YOLOv8-cls export formats are in the table below. You can predict or v
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n-cls_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n-cls_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n-cls_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-cls_ncnn_model/` | ✅ | `imgsz`, `half` |
| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-cls_ncnn_model/` | ✅ | `imgsz`, `half` |
See full `export` details in the [Export](https://docs.ultralytics.com/modes/export/) page.

@ -177,6 +177,6 @@ Available YOLOv8 export formats are in the table below. You can predict or valid
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |
| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |
See full `export` details in the [Export](https://docs.ultralytics.com/modes/export/) page.

@ -49,7 +49,7 @@ YOLOv8 pretrained OBB models are shown here, which are pretrained on the [DOTAv1
| [YOLOv8l-obb](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-obb.pt) | 1024 | 80.7 | 1278.42 | 11.83 | 44.5 | 433.8 |
| [YOLOv8x-obb](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-obb.pt) | 1024 | 81.36 | 1759.10 | 13.23 | 69.5 | 676.7 |
- **mAP<sup>test</sup>** values are for single-model multi-scale on [DOTAv1 test](https://captain-whu.github.io/DOTA/index.html) dataset. <br>Reproduce by `yolo val obb data=DOTAv1.yaml device=0 split=test` and submit merged results to [DOTA evaluation](https://captain-whu.github.io/DOTA/evaluation.html).
- **mAP<sup>test</sup>** values are for single-model multiscale on [DOTAv1 test](https://captain-whu.github.io/DOTA/index.html) dataset. <br>Reproduce by `yolo val obb data=DOTAv1.yaml device=0 split=test` and submit merged results to [DOTA evaluation](https://captain-whu.github.io/DOTA/evaluation.html).
- **Speed** averaged over DOTAv1 val images using an [Amazon EC2 P4d](https://aws.amazon.com/ec2/instance-types/p4/) instance. <br>Reproduce by `yolo val obb data=DOTAv1.yaml batch=1 device=0|cpu`
## Train
@ -186,6 +186,6 @@ Available YOLOv8-obb export formats are in the table below. You can predict or v
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n-obb_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n-obb_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n-obb_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-obb_ncnn_model/` | ✅ | `imgsz`, `half` |
| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-obb_ncnn_model/` | ✅ | `imgsz`, `half` |
See full `export` details in the [Export](https://docs.ultralytics.com/modes/export/) page.

@ -180,6 +180,6 @@ Available YOLOv8-pose export formats are in the table below. You can predict or
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n-pose_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n-pose_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n-pose_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-pose_ncnn_model/` | ✅ | `imgsz`, `half` |
| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-pose_ncnn_model/` | ✅ | `imgsz`, `half` |
See full `export` details in the [Export](https://docs.ultralytics.com/modes/export/) page.

@ -182,6 +182,6 @@ Available YOLOv8-seg export formats are in the table below. You can predict or v
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n-seg_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n-seg_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n-seg_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-seg_ncnn_model/` | ✅ | `imgsz`, `half` |
| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-seg_ncnn_model/` | ✅ | `imgsz`, `half` |
See full `export` details in the [Export](https://docs.ultralytics.com/modes/export/) page.

@ -194,7 +194,7 @@ The val (validation) settings for YOLO models involve various hyperparameters an
| `max_det` | `int` | `300` | Limits the maximum number of detections per image. Useful in dense scenes to prevent excessive detections. |
| `half` | `bool` | `True` | Enables half-precision (FP16) computation, reducing memory usage and potentially increasing speed with minimal impact on accuracy. |
| `device` | `str` | `None` | Specifies the device for validation (`cpu`, `cuda:0`, etc.). Allows flexibility in utilizing CPU or GPU resources. |
| `dnn` | `bool` | `False` | If `True`, uses OpenCV's DNN module for ONNX model inference, offering an alternative to PyTorch inference methods. |
| `dnn` | `bool` | `False` | If `True`, uses the OpenCV DNN module for ONNX model inference, offering an alternative to PyTorch inference methods. |
| `plots` | `bool` | `False` | When set to `True`, generates and saves plots of predictions versus ground truth for visual evaluation of the model's performance. |
| `rect` | `bool` | `False` | If `True`, uses rectangular inference for batching, reducing padding and potentially increasing speed and efficiency. |
| `split` | `str` | `val` | Determines the dataset split to use for validation (`val`, `test`, or `train`). Allows flexibility in choosing the data segment for performance evaluation. |

@ -184,7 +184,7 @@ Available YOLOv8 export formats are in the table below. You can export to any fo
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |
| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |
## Overriding default arguments

@ -249,7 +249,7 @@ Benchmark mode is used to profile the speed and accuracy of various export forma
## Explorer
Explorer API can be used to explore datasets with advanced semantic, vector-similarity and SQL search among other features. It also searching for images based on their content using natural language by utilizing the power of LLMs. The Explorer API allows you to write your own dataset exploration notebooks or scripts to get insights into your datasets.
Explorer API can be used to explore datasets with advanced semantic, vector-similarity and SQL search among other features. It also enables searching for images based on their content using natural language by utilizing the power of LLMs. The Explorer API allows you to write your own dataset exploration notebooks or scripts to get insights into your datasets.
!!! Example "Semantic Search Using Explorer"

@ -20,7 +20,7 @@ The `ultralytics` package comes with a myriad of utilities that can support, enh
### Auto Labeling / Annotations
Dataset annotation is an _extremely_ resource heavy and time consuming process. If you have a YOLO object detection model trained on a reasonable amount of data, you can use it and [SAM](../models/sam.md) to auto-annotate additional data (segmentation format).
Dataset annotation is a very resource-intensive and time-consuming process. If you have a YOLO object detection model trained on a reasonable amount of data, you can use it and [SAM](../models/sam.md) to auto-annotate additional data (segmentation format).
```{ .py .annotate }
from ultralytics.data.annotator import auto_annotate
@ -211,7 +211,7 @@ boxes.bboxes
See the [`Bboxes` reference section](../reference/utils/instance.md#ultralytics.utils.instance.Bboxes) for more attributes and methods available.
!!! tip
Many of the following functions (and more) can be accessed using the [`Bboxes` class](#bounding-box-horizontal-instances) but if you prefer to work with the functions directly, see the next sub-sections on how to import these independently.
Many of the following functions (and more) can be accessed using the [`Bboxes` class](#bounding-box-horizontal-instances) but if you prefer to work with the functions directly, see the next subsections on how to import these independently.
### Scaling Boxes
@ -385,7 +385,7 @@ for obb in obb_boxes:
image_with_obb = ann.result()
```
See the [`Annotator` Reference Page](../reference/utils/plotting.md#ultralytics.utils.plotting.Annotator) page for additional insight.
See the [`Annotator` Reference Page](../reference/utils/plotting.md#ultralytics.utils.plotting.Annotator) for additional insight.
## Miscellaneous

@ -357,7 +357,7 @@
"| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n_edgetpu.tflite` | ✅ | `imgsz` |\n",
"| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n_web_model/` | ✅ | `imgsz` |\n",
"| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n_paddle_model/` | ✅ | `imgsz` |\n",
"| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |\n"
"| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |\n"
],
"metadata": {
"id": "nPZZeNrLCQG6"

@ -339,11 +339,14 @@ nav:
- Clearml Logging: yolov5/tutorials/clearml_logging_integration.md
- Integrations:
- integrations/index.md
- Comet ML: integrations/comet.md
- OpenVINO: integrations/openvino.md
- TorchScript: integrations/torchscript.md
- ONNX: integrations/onnx.md
- OpenVINO: integrations/openvino.md
- TensorRT: integrations/tensorrt.md
- CoreML: integrations/coreml.md
- TFLite: integrations/tflite.md
- NCNN: integrations/ncnn.md
- Comet ML: integrations/comet.md
- Ray Tune: integrations/ray-tune.md
- Roboflow: integrations/roboflow.md
- MLflow: integrations/mlflow.md

@ -1,6 +1,6 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
__version__ = "8.1.19"
__version__ = "8.1.23"
from ultralytics.data.explorer.explorer import Explorer
from ultralytics.models import RTDETR, SAM, YOLO, YOLOWorld

@ -1,5 +1,5 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# YOLOv8-World object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes

@ -1,5 +1,5 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# YOLOv8-World-v2 object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
@ -29,18 +29,18 @@ backbone:
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 2, C2fAttn, [512, 256, 8]] # 12
- [-1, 3, C2fAttn, [512, 256, 8]] # 12
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 2, C2fAttn, [256, 128, 4]] # 15 (P3/8-small)
- [-1, 3, C2fAttn, [256, 128, 4]] # 15 (P3/8-small)
- [15, 1, Conv, [256, 3, 2]]
- [[-1, 12], 1, Concat, [1]] # cat head P4
- [-1, 2, C2fAttn, [512, 256, 8]] # 18 (P4/16-medium)
- [-1, 3, C2fAttn, [512, 256, 8]] # 18 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 9], 1, Concat, [1]] # cat head P5
- [-1, 2, C2fAttn, [1024, 512, 16]] # 21 (P5/32-large)
- [-1, 3, C2fAttn, [1024, 512, 16]] # 21 (P5/32-large)
- [[15, 18, 21], 1, WorldDetect, [nc, 512, True]] # Detect(P3, P4, P5)

@ -144,9 +144,9 @@ def run_sql_query():
def run_ai_query():
"""Execute SQL query and update session state with query results."""
if not SETTINGS["openai_api_key"]:
st.session_state[
"error"
] = 'OpenAI API key not found in settings. Please run yolo settings openai_api_key="..."'
st.session_state["error"] = (
'OpenAI API key not found in settings. Please run yolo settings openai_api_key="..."'
)
return
st.session_state["error"] = None
query = st.session_state.get("ai_query")

@ -16,7 +16,7 @@ TensorFlow Lite | `tflite` | yolov8n.tflite
TensorFlow Edge TPU | `edgetpu` | yolov8n_edgetpu.tflite
TensorFlow.js | `tfjs` | yolov8n_web_model/
PaddlePaddle | `paddle` | yolov8n_paddle_model/
ncnn | `ncnn` | yolov8n_ncnn_model/
NCNN | `ncnn` | yolov8n_ncnn_model/
Requirements:
$ pip install "ultralytics[export]"
@ -41,6 +41,7 @@ Inference:
yolov8n.tflite # TensorFlow Lite
yolov8n_edgetpu.tflite # TensorFlow Edge TPU
yolov8n_paddle_model # PaddlePaddle
yolov8n_ncnn_model # NCNN
TensorFlow.js:
$ cd .. && git clone https://github.com/zldrobit/tfjs-yolov5-example.git && cd tfjs-yolov5-example
@ -48,6 +49,7 @@ TensorFlow.js:
$ ln -s ../../yolov5/yolov8n_web_model public/yolov8n_web_model
$ npm start
"""
import json
import os
import shutil
@ -66,7 +68,7 @@ from ultralytics.data.dataset import YOLODataset
from ultralytics.data.utils import check_det_dataset
from ultralytics.nn.autobackend import check_class_names, default_class_names
from ultralytics.nn.modules import C2f, Detect, RTDETRDecoder
from ultralytics.nn.tasks import DetectionModel, SegmentationModel
from ultralytics.nn.tasks import DetectionModel, SegmentationModel, WorldModel
from ultralytics.utils import (
ARM64,
DEFAULT_CFG,
@ -105,7 +107,7 @@ def export_formats():
["TensorFlow Edge TPU", "edgetpu", "_edgetpu.tflite", True, False],
["TensorFlow.js", "tfjs", "_web_model", True, False],
["PaddlePaddle", "paddle", "_paddle_model", True, True],
["ncnn", "ncnn", "_ncnn_model", True, True],
["NCNN", "ncnn", "_ncnn_model", True, True],
]
return pandas.DataFrame(x, columns=["Format", "Argument", "Suffix", "CPU", "GPU"])
@ -199,6 +201,13 @@ class Exporter:
assert self.device.type == "cpu", "optimize=True not compatible with cuda devices, i.e. use device='cpu'"
if edgetpu and not LINUX:
raise SystemError("Edge TPU export only supported on Linux. See https://coral.ai/docs/edgetpu/compiler/")
if isinstance(model, WorldModel):
LOGGER.warning(
"WARNING ⚠ YOLOWorld (original version) export is not supported to any format.\n"
"WARNING ⚠ YOLOWorldv2 models (i.e. 'yolov8s-worldv2.pt') only support export to "
"(torchscript, onnx, openvino, engine, coreml) formats. "
"See https://docs.ultralytics.com/models/yolo-world for details."
)
# Input
im = torch.zeros(self.args.batch, 3, *self.imgsz).to(self.device)
@ -250,9 +259,10 @@ class Exporter:
self.metadata = {
"description": description,
"author": "Ultralytics",
"license": "AGPL-3.0 https://ultralytics.com/license",
"date": datetime.now().isoformat(),
"version": __version__,
"license": "AGPL-3.0 License (https://ultralytics.com/license)",
"docs": "https://docs.ultralytics.com",
"stride": int(max(model.stride)),
"task": model.task,
"batch": self.args.batch,
@ -292,7 +302,7 @@ class Exporter:
f[9], _ = self.export_tfjs()
if paddle: # PaddlePaddle
f[10], _ = self.export_paddle()
if ncnn: # ncnn
if ncnn: # NCNN
f[11], _ = self.export_ncnn()
# Finish
@ -343,8 +353,8 @@ class Exporter:
requirements = ["onnx>=1.12.0"]
if self.args.simplify:
requirements += ["onnxsim>=0.4.33", "onnxruntime-gpu" if torch.cuda.is_available() else "onnxruntime"]
if ARM64:
check_requirements("cmake") # 'cmake' is needed to build onnxsim on aarch64
if ARM64:
check_requirements("cmake") # 'cmake' is needed to build onnxsim on aarch64
check_requirements(requirements)
import onnx # noqa
@ -495,14 +505,14 @@ class Exporter:
return f, None
@try_export
def export_ncnn(self, prefix=colorstr("ncnn:")):
def export_ncnn(self, prefix=colorstr("NCNN:")):
"""
YOLOv8 ncnn export using PNNX https://github.com/pnnx/pnnx.
YOLOv8 NCNN export using PNNX https://github.com/pnnx/pnnx.
"""
check_requirements("ncnn")
import ncnn # noqa
LOGGER.info(f"\n{prefix} starting export with ncnn {ncnn.__version__}...")
LOGGER.info(f"\n{prefix} starting export with NCNN {ncnn.__version__}...")
f = Path(str(self.file).replace(self.file.suffix, f"_ncnn_model{os.sep}"))
f_ts = self.file.with_suffix(".torchscript")
@ -742,10 +752,10 @@ class Exporter:
verbose=True,
msg="https://github.com/ultralytics/ultralytics/issues/5161",
)
import onnx2tf
f = Path(str(self.file).replace(self.file.suffix, "_saved_model"))
if f.is_dir():
import shutil
shutil.rmtree(f) # delete output folder
# Pre-download calibration file to fix https://github.com/PINTO0309/onnx2tf/issues/545
@ -759,8 +769,9 @@ class Exporter:
# Export to TF
tmp_file = f / "tmp_tflite_int8_calibration_images.npy" # int8 calibration images file
np_data = None
if self.args.int8:
verbosity = "--verbosity info"
verbosity = "info"
if self.args.data:
# Generate calibration data for integer quantization
LOGGER.info(f"{prefix} collecting INT8 calibration images from 'data={self.args.data}'")
@ -777,16 +788,20 @@ class Exporter:
# mean = images.view(-1, 3).mean(0) # imagenet mean [123.675, 116.28, 103.53]
# std = images.view(-1, 3).std(0) # imagenet std [58.395, 57.12, 57.375]
np.save(str(tmp_file), images.numpy()) # BHWC
int8 = f'-oiqt -qt per-tensor -cind images "{tmp_file}" "[[[[0, 0, 0]]]]" "[[[[255, 255, 255]]]]"'
else:
int8 = "-oiqt -qt per-tensor"
np_data = [["images", tmp_file, [[[[0, 0, 0]]]], [[[[255, 255, 255]]]]]]
else:
verbosity = "--non_verbose"
int8 = ""
cmd = f'onnx2tf -i "{f_onnx}" -o "{f}" -nuo {verbosity} {int8}'.strip()
LOGGER.info(f"{prefix} running '{cmd}'")
subprocess.run(cmd, shell=True)
verbosity = "error"
LOGGER.info(f"{prefix} starting TFLite export with onnx2tf {onnx2tf.__version__}...")
onnx2tf.convert(
input_onnx_file_path=f_onnx,
output_folder_path=str(f),
not_use_onnxsim=True,
verbosity=verbosity,
output_integer_quantized_tflite=self.args.int8,
quant_type="per-tensor", # "per-tensor" (faster) or "per-channel" (slower but more accurate)
custom_input_op_name_np_data_path=np_data,
)
yaml_save(f / "metadata.yaml", self.metadata) # add metadata.yaml
# Remove/rename TFLite models
@ -883,7 +898,10 @@ class Exporter:
quantization = "--quantize_float16" if self.args.half else "--quantize_uint8" if self.args.int8 else ""
with spaces_in_path(f_pb) as fpb_, spaces_in_path(f) as f_: # exporter can not handle spaces in path
cmd = f'tensorflowjs_converter --input_format=tf_frozen_model {quantization} --output_node_names={outputs} "{fpb_}" "{f_}"'
cmd = (
"tensorflowjs_converter "
f'--input_format=tf_frozen_model {quantization} --output_node_names={outputs} "{fpb_}" "{f_}"'
)
LOGGER.info(f"{prefix} running '{cmd}'")
subprocess.run(cmd, shell=True)
@ -1078,10 +1096,10 @@ class Exporter:
# Save the model
model = ct.models.MLModel(pipeline.spec, weights_dir=weights_dir)
model.input_description["image"] = "Input image"
model.input_description["iouThreshold"] = f"(optional) IOU threshold override (default: {nms.iouThreshold})"
model.input_description[
"confidenceThreshold"
] = f"(optional) Confidence threshold override (default: {nms.confidenceThreshold})"
model.input_description["iouThreshold"] = f"(optional) IoU threshold override (default: {nms.iouThreshold})"
model.input_description["confidenceThreshold"] = (
f"(optional) Confidence threshold override (default: {nms.confidenceThreshold})"
)
model.output_description["confidence"] = 'Boxes × Class confidence (see user-defined metadata "classes")'
model.output_description["coordinates"] = "Boxes × [x, y, width, height] (relative to image size)"
LOGGER.info(f"{prefix} pipeline success")

@ -295,7 +295,7 @@ class Model(nn.Module):
self.model.load(weights)
return self
def save(self, filename: Union[str, Path] = "saved_model.pt") -> None:
def save(self, filename: Union[str, Path] = "saved_model.pt", use_dill=True) -> None:
"""
Saves the current model state to a file.
@ -303,12 +303,22 @@ class Model(nn.Module):
Args:
filename (str | Path): The name of the file to save the model to. Defaults to 'saved_model.pt'.
use_dill (bool): Whether to try using dill for serialization if available. Defaults to True.
Raises:
AssertionError: If the model is not a PyTorch model.
"""
self._check_is_pytorch_model()
torch.save(self.ckpt, filename)
from ultralytics import __version__
from datetime import datetime
updates = {
"date": datetime.now().isoformat(),
"version": __version__,
"license": "AGPL-3.0 License (https://ultralytics.com/license)",
"docs": "https://docs.ultralytics.com",
}
torch.save({**self.ckpt, **updates}, filename, use_dill=use_dill)
def info(self, detailed: bool = False, verbose: bool = True):
"""

@ -26,7 +26,9 @@ Usage - formats:
yolov8n.tflite # TensorFlow Lite
yolov8n_edgetpu.tflite # TensorFlow Edge TPU
yolov8n_paddle_model # PaddlePaddle
yolov8n_ncnn_model # NCNN
"""
import platform
import threading
from pathlib import Path

@ -489,6 +489,8 @@ class BaseTrainer:
"train_results": results,
"date": datetime.now().isoformat(),
"version": __version__,
"license": "AGPL-3.0 (https://ultralytics.com/license)",
"docs": "https://docs.ultralytics.com",
}
if self.args.close_mosaic and self.epoch == (self.epochs - self.args.close_mosaic - 1):

@ -16,6 +16,7 @@ Example:
model.tune(data='coco8.yaml', epochs=10, iterations=300, optimizer='AdamW', plots=False, save=False, val=False)
```
"""
import random
import shutil
import subprocess

@ -17,7 +17,9 @@ Usage - formats:
yolov8n.tflite # TensorFlow Lite
yolov8n_edgetpu.tflite # TensorFlow Edge TPU
yolov8n_paddle_model # PaddlePaddle
yolov8n_ncnn_model # NCNN
"""
import json
import time
from pathlib import Path

@ -48,7 +48,7 @@ def login(api_key: str = None, save=True) -> bool:
return True
else:
# Failed to authenticate with HUB
LOGGER.info(f"{PREFIX}Retrieve API key from {api_key_url}")
LOGGER.info(f"{PREFIX}Get API key from {api_key_url} and then run 'yolo hub login API_KEY'")
return False

@ -64,7 +64,7 @@ class Auth:
if verbose:
LOGGER.info(f"{PREFIX}New authentication successful ✅")
elif verbose:
LOGGER.info(f"{PREFIX}Retrieve API key from {API_KEY_URL}")
LOGGER.info(f"{PREFIX}Get API key from {API_KEY_URL} and then run 'yolo hub login API_KEY'")
def request_api_key(self, max_attempts=3):
"""

@ -52,6 +52,7 @@ class HUBTrainingSession:
"heartbeat": 300.0,
} # rate limits (seconds)
self.metrics_queue = {} # holds metrics for each epoch until upload
self.metrics_upload_failed_queue = {} # holds metrics for each epoch if upload failed
self.timers = {} # holds timers in ultralytics/utils/callbacks/hub.py
# Parse input
@ -234,6 +235,9 @@ class HUBTrainingSession:
self._show_upload_progress(progress_total, response)
if HTTPStatus.OK <= response.status_code < HTTPStatus.MULTIPLE_CHOICES:
# if request related to metrics upload
if kwargs.get("metrics"):
self.metrics_upload_failed_queue = {}
return response # Success, no need to retry
if i == 0:
@ -249,6 +253,10 @@ class HUBTrainingSession:
time.sleep(2**i) # Exponential backoff for retries
# if request related to metrics upload and exceed retries
if response is None and kwargs.get("metrics"):
self.metrics_upload_failed_queue.update(kwargs.get("metrics", None))
return response
if thread:

@ -7,6 +7,8 @@ hybrid encoder and IoU-aware query selection for enhanced detection accuracy.
For more information on RT-DETR, visit: https://arxiv.org/pdf/2304.08069.pdf
"""
from pathlib import Path
from ultralytics.engine.model import Model
from ultralytics.nn.tasks import RTDETRDetectionModel
@ -34,7 +36,7 @@ class RTDETR(Model):
Raises:
NotImplementedError: If the model file extension is not 'pt', 'yaml', or 'yml'.
"""
if model and model.split(".")[-1] not in ("pt", "yaml", "yml"):
if model and Path(model).suffix not in (".pt", ".yaml", ".yml"):
raise NotImplementedError("RT-DETR only supports creating from *.pt, *.yaml, or *.yml files.")
super().__init__(model=model, task="detect")

@ -36,7 +36,7 @@ class DetectionValidator(BaseValidator):
self.class_map = None
self.args.task = "detect"
self.metrics = DetMetrics(save_dir=self.save_dir, on_plot=self.on_plot)
self.iouv = torch.linspace(0.5, 0.95, 10) # iou vector for mAP@0.5:0.95
self.iouv = torch.linspace(0.5, 0.95, 10) # IoU vector for mAP@0.5:0.95
self.niou = self.iouv.numel()
self.lb = [] # for autolabelling

@ -13,9 +13,9 @@ class YOLO(Model):
def __init__(self, model="yolov8n.pt", task=None, verbose=False):
"""Initialize YOLO model, switching to YOLOWorld if model filename contains '-world'."""
stem = Path(model).stem # filename stem without suffix, i.e. "yolov8n"
if "-world" in stem:
new_instance = YOLOWorld(model)
path = Path(model)
if "-world" in path.stem and path.suffix in {".pt", ".yaml", ".yml"}: # if YOLOWorld PyTorch model
new_instance = YOLOWorld(path)
self.__class__ = type(new_instance)
self.__dict__ = new_instance.__dict__
else:
@ -67,7 +67,7 @@ class YOLOWorld(Model):
Initializes the YOLOv8-World model with the given pre-trained model file. Supports *.pt and *.yaml formats.
Args:
model (str): Path to the pre-trained model. Defaults to 'yolov8s-world.pt'.
model (str | Path): Path to the pre-trained model. Defaults to 'yolov8s-world.pt'.
"""
super().__init__(model=model, task="detect")
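With this dispatch in place, constructing `YOLO` from a '-world' weight file transparently returns a `YOLOWorld` instance; a hedged usage sketch (weights and image URL as used elsewhere in these docs):

```python
from ultralytics import YOLO

model = YOLO("yolov8s-world.pt")      # '-world' stem + '.pt' suffix -> YOLOWorld instance
model.set_classes(["person", "bus"])  # define an open-vocabulary prompt
results = model.predict("https://ultralytics.com/images/bus.jpg")
```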

@ -72,7 +72,7 @@ class AutoBackend(nn.Module):
| TensorFlow Lite | *.tflite |
| TensorFlow Edge TPU | *_edgetpu.tflite |
| PaddlePaddle | *_paddle_model |
| ncnn | *_ncnn_model |
| NCNN | *_ncnn_model |
This class offers dynamic backend switching capabilities based on the input model format, making it easier to deploy
models across various platforms.
@ -304,9 +304,9 @@ class AutoBackend(nn.Module):
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
output_names = predictor.get_output_names()
metadata = w.parents[1] / "metadata.yaml"
elif ncnn: # ncnn
LOGGER.info(f"Loading {w} for ncnn inference...")
check_requirements("git+https://github.com/Tencent/ncnn.git" if ARM64 else "ncnn") # requires ncnn
elif ncnn: # NCNN
LOGGER.info(f"Loading {w} for NCNN inference...")
check_requirements("git+https://github.com/Tencent/ncnn.git" if ARM64 else "ncnn") # requires NCNN
import ncnn as pyncnn
net = pyncnn.Net()
@ -431,7 +431,7 @@ class AutoBackend(nn.Module):
self.input_handle.copy_from_cpu(im)
self.predictor.run()
y = [self.predictor.get_output_handle(x).copy_to_cpu() for x in self.output_names]
elif self.ncnn: # ncnn
elif self.ncnn: # NCNN
mat_in = self.pyncnn.Mat(im[0].cpu().numpy())
ex = self.net.create_extractor()
input_names, output_names = self.net.input_names(), self.net.output_names()
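The renamed NCNN branch loads `*_ncnn_model` directories produced by the exporter; a hedged round-trip sketch:

```python
from ultralytics import YOLO

YOLO("yolov8n.pt").export(format="ncnn")  # writes 'yolov8n_ncnn_model/'
ncnn_model = YOLO("yolov8n_ncnn_model")   # AutoBackend takes the NCNN path shown above
ncnn_model.predict("https://ultralytics.com/images/bus.jpg")
```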

@ -537,7 +537,6 @@ class BNContrastiveHead(nn.Module):
Args:
embed_dims (int): Embed dimensions of text and image features.
norm_cfg (dict): Normalization parameters.
"""
def __init__(self, embed_dims: int):
@ -559,7 +558,10 @@ class BNContrastiveHead(nn.Module):
class RepBottleneck(nn.Module):
"""Rep bottleneck."""
def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5): # ch_in, ch_out, shortcut, kernels, groups, expand
def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
"""Initializes a RepBottleneck module with customizable in/out channels, shortcut option, groups and expansion
ratio.
"""
super().__init__()
c_ = int(c2 * e) # hidden channels
self.cv1 = Conv(c1, c_, k[0], 1)
@ -574,7 +576,8 @@ class RepBottleneck(nn.Module):
class RepCSP(nn.Module):
"""Rep CSP Bottleneck with 3 convolutions."""
def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5): # ch_in, ch_out, number, shortcut, groups, expansion
def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
"""Initializes RepCSP layer with given channels, repetitions, shortcut, groups and expansion ratio."""
super().__init__()
c_ = int(c2 * e) # hidden channels
self.cv1 = Conv(c1, c_, 1, 1)
@ -590,7 +593,8 @@ class RepCSP(nn.Module):
class RepNCSPELAN4(nn.Module):
"""CSP-ELAN."""
def __init__(self, c1, c2, c3, c4, n=1): # ch_in, ch_out, number, shortcut, groups, expansion
def __init__(self, c1, c2, c3, c4, n=1):
"""Initializes CSP-ELAN layer with specified channel sizes, repetitions, and convolutions."""
super().__init__()
self.c = c3 // 2
self.cv1 = Conv(c1, c3, 1, 1)
@ -614,7 +618,8 @@ class RepNCSPELAN4(nn.Module):
class ADown(nn.Module):
"""ADown."""
def __init__(self, c1, c2): # ch_in, ch_out, shortcut, kernels, groups, expand
def __init__(self, c1, c2):
"""Initializes ADown module with convolution layers to downsample input from channels c1 to c2."""
super().__init__()
self.c = c2 // 2
self.cv1 = Conv(c1 // 2, self.c, 3, 2, 1)
@ -633,7 +638,8 @@ class ADown(nn.Module):
class SPPELAN(nn.Module):
"""SPP-ELAN."""
def __init__(self, c1, c2, c3, k=5): # ch_in, ch_out, number, shortcut, groups, expansion
def __init__(self, c1, c2, c3, k=5):
"""Initializes SPP-ELAN block with convolution and max pooling layers for spatial pyramid pooling."""
super().__init__()
self.c = c3
self.cv1 = Conv(c1, c3, 1, 1)
@ -653,6 +659,7 @@ class Silence(nn.Module):
"""Silence."""
def __init__(self):
"""Initializes the Silence module."""
super(Silence, self).__init__()
def forward(self, x):
@ -663,7 +670,8 @@ class Silence(nn.Module):
class CBLinear(nn.Module):
"""CBLinear."""
def __init__(self, c1, c2s, k=1, s=1, p=None, g=1): # ch_in, ch_outs, kernel, stride, padding, groups
def __init__(self, c1, c2s, k=1, s=1, p=None, g=1):
"""Initializes the CBLinear module, passing inputs unchanged."""
super(CBLinear, self).__init__()
self.c2s = c2s
self.conv = nn.Conv2d(c1, sum(c2s), k, s, autopad(k, p), groups=g, bias=True)
@ -678,6 +686,7 @@ class CBFuse(nn.Module):
"""CBFuse."""
def __init__(self, idx):
"""Initializes CBFuse module with layer index for selective feature fusion."""
super(CBFuse, self).__init__()
self.idx = idx
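A hedged shape check for the new YOLOv9 blocks (channel sizes chosen arbitrarily; classes imported from the block.py module shown above):

```python
import torch
from ultralytics.nn.modules.block import ADown, RepNCSPELAN4, SPPELAN

x = torch.randn(1, 64, 80, 80)
y = RepNCSPELAN4(64, 128, 128, 64, n=1)(x)  # (1, 128, 80, 80): CSP-ELAN keeps spatial size
y = ADown(128, 256)(y)                       # (1, 256, 40, 40): stride-2 downsample
y = SPPELAN(256, 256, 128)(y)                # (1, 256, 40, 40): spatial pyramid pooling
print(y.shape)
```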

@ -48,7 +48,7 @@ from ultralytics.nn.modules import (
SPPELAN,
CBFuse,
CBLinear,
Silence
Silence,
)
from ultralytics.utils import DEFAULT_CFG_DICT, DEFAULT_CFG_KEYS, LOGGER, colorstr, emojis, yaml_load
from ultralytics.utils.checks import check_requirements, check_suffix, check_yaml
@ -576,7 +576,7 @@ class WorldModel(DetectionModel):
text_token = clip.tokenize(text).to(device)
txt_feats = model.encode_text(text_token).to(dtype=torch.float32)
txt_feats = txt_feats / txt_feats.norm(p=2, dim=-1, keepdim=True)
self.txt_feats = txt_feats.reshape(-1, len(text), txt_feats.shape[-1])
self.txt_feats = txt_feats.reshape(-1, len(text), txt_feats.shape[-1]).detach()
self.model[-1].nc = len(text)
def init_criterion(self):
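The added `.detach()` keeps the CLIP text embeddings out of the autograd graph, so they act as fixed class prompts rather than trainable parameters; a minimal illustration of the effect (tensors are stand-ins):

```python
import torch

txt_feats = torch.randn(2, 512, requires_grad=True)  # stand-in for encoded text features
img_feats = torch.randn(2, 512, requires_grad=True)  # stand-in for image features
(txt_feats.detach() * img_feats).sum().backward()
print(txt_feats.grad, img_feats.grad is not None)     # None True: no gradient reaches the text branch
```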

@ -235,7 +235,7 @@ class BYTETracker:
reset_id(): Resets the ID counter of STrack.
joint_stracks(tlista, tlistb): Combines two lists of stracks.
sub_stracks(tlista, tlistb): Filters out the stracks present in the second list from the first list.
remove_duplicate_stracks(stracksa, stracksb): Removes duplicate stracks based on IOU.
remove_duplicate_stracks(stracksa, stracksb): Removes duplicate stracks based on IoU.
"""
def __init__(self, args, frame_rate=30):
@ -373,7 +373,7 @@ class BYTETracker:
return [STrack(xyxy, s, c) for (xyxy, s, c) in zip(dets, scores, cls)] if len(dets) else [] # detections
def get_dists(self, tracks, detections):
"""Calculates the distance between tracks and detections using IOU and fuses scores."""
"""Calculates the distance between tracks and detections using IoU and fuses scores."""
dists = matching.iou_distance(tracks, detections)
# TODO: mot20
# if not self.args.mot20:
@ -428,7 +428,7 @@ class BYTETracker:
@staticmethod
def remove_duplicate_stracks(stracksa, stracksb):
"""Remove duplicate stracks with non-maximum IOU distance."""
"""Remove duplicate stracks with non-maximum IoU distance."""
pdist = matching.iou_distance(stracksa, stracksb)
pairs = np.where(pdist < 0.15)
dupa, dupb = [], []

@ -21,7 +21,7 @@ TensorFlow Lite | `tflite` | yolov8n.tflite
TensorFlow Edge TPU | `edgetpu` | yolov8n_edgetpu.tflite
TensorFlow.js | `tfjs` | yolov8n_web_model/
PaddlePaddle | `paddle` | yolov8n_paddle_model/
ncnn | `ncnn` | yolov8n_ncnn_model/
NCNN | `ncnn` | yolov8n_ncnn_model/
"""
import glob
@ -32,7 +32,7 @@ from pathlib import Path
import numpy as np
import torch.cuda
from ultralytics import YOLO
from ultralytics import YOLO, YOLOWorld
from ultralytics.cfg import TASK2DATA, TASK2METRIC
from ultralytics.engine.exporter import export_formats
from ultralytics.utils import ASSETS, LINUX, LOGGER, MACOS, TQDM, WEIGHTS_DIR
@ -84,14 +84,20 @@ def benchmark(
emoji, filename = "", None # export defaults
try:
# Checks
if i == 9:
if i == 9: # Edge TPU
assert LINUX, "Edge TPU export only supported on Linux"
elif i == 7:
elif i == 7: # TF GraphDef
assert model.task != "obb", "TensorFlow GraphDef not supported for OBB task"
elif i in {5, 10}: # CoreML and TF.js
assert MACOS or LINUX, "export only supported on macOS and Linux"
if i in {3, 5}: # CoreML and OpenVINO
assert not IS_PYTHON_3_12, "CoreML and OpenVINO not supported on Python 3.12"
if i in {6, 7, 8, 9, 10}: # All TF formats
assert not isinstance(model, YOLOWorld), "YOLOWorldv2 TensorFlow exports not supported by onnx2tf yet"
if i in {11}: # Paddle
assert not isinstance(model, YOLOWorld), "YOLOWorldv2 Paddle exports not supported yet"
if i in {12}: # NCNN
assert not isinstance(model, YOLOWorld), "YOLOWorldv2 NCNN exports not supported yet"
if "cpu" in device.type:
assert cpu, "inference not supported on CPU"
if "cuda" in device.type:
@ -261,7 +267,8 @@ class ProfileModels:
"""
return 0.0, 0.0, 0.0, 0.0 # return (num_layers, num_params, num_gradients, num_flops)
def iterative_sigma_clipping(self, data, sigma=2, max_iters=3):
@staticmethod
def iterative_sigma_clipping(data, sigma=2, max_iters=3):
"""Applies an iterative sigma clipping algorithm to the given data times number of iterations."""
data = np.array(data)
for _ in range(max_iters):
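Iterative sigma clipping discards timings that fall more than `sigma` standard deviations from the running mean, repeating until the sample stabilizes; a hedged standalone sketch of the idea:

```python
import numpy as np

def sigma_clip(data, sigma=2, max_iters=3):
    """Repeatedly keep only samples within `sigma` standard deviations of the running mean."""
    data = np.asarray(data, dtype=float)
    for _ in range(max_iters):
        mean, std = data.mean(), data.std()
        clipped = data[(data > mean - sigma * std) & (data < mean + sigma * std)]
        if len(clipped) == len(data):
            break
        data = clipped
    return data

print(sigma_clip([10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 9.9, 10.0, 30.0]))
# the 30.0 outlier timing is discarded, the ~10 ms samples remain
```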
@ -359,9 +366,13 @@ class ProfileModels:
def generate_table_row(self, model_name, t_onnx, t_engine, model_info):
"""Generates a formatted string for a table row that includes model performance and metric details."""
layers, params, gradients, flops = model_info
return f"| {model_name:18s} | {self.imgsz} | - | {t_onnx[0]:.2f} ± {t_onnx[1]:.2f} ms | {t_engine[0]:.2f} ± {t_engine[1]:.2f} ms | {params / 1e6:.1f} | {flops:.1f} |"
return (
f"| {model_name:18s} | {self.imgsz} | - | {t_onnx[0]:.2f} ± {t_onnx[1]:.2f} ms | {t_engine[0]:.2f} ± "
f"{t_engine[1]:.2f} ms | {params / 1e6:.1f} | {flops:.1f} |"
)
def generate_results_dict(self, model_name, t_onnx, t_engine, model_info):
@staticmethod
def generate_results_dict(model_name, t_onnx, t_engine, model_info):
"""Generates a dictionary of model details including name, parameters, GFLOPS and speed metrics."""
layers, params, gradients, flops = model_info
return {
@ -372,11 +383,18 @@ class ProfileModels:
"model/speed_TensorRT(ms)": round(t_engine[0], 3),
}
def print_table(self, table_rows):
@staticmethod
def print_table(table_rows):
"""Formats and prints a comparison table for different models with given statistics and performance data."""
gpu = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "GPU"
header = f"| Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>{gpu} TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |"
separator = "|-------------|---------------------|--------------------|------------------------------|-----------------------------------|------------------|-----------------|"
header = (
f"| Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | "
f"Speed<br><sup>{gpu} TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |"
)
separator = (
"|-------------|---------------------|--------------------|------------------------------|"
"-----------------------------------|------------------|-----------------|"
)
print(f"\n\n{header}")
print(separator)

@ -33,6 +33,11 @@ def on_fit_epoch_end(trainer):
all_plots = {**all_plots, **model_info_for_loggers(trainer)}
session.metrics_queue[trainer.epoch] = json.dumps(all_plots)
# If any metrics fail to upload, add them to the queue to attempt uploading again.
if session.metrics_upload_failed_queue:
session.metrics_queue.update(session.metrics_upload_failed_queue)
if time() - session.timers["metrics"] > session.rate_limits["metrics"]:
session.upload_metrics()
session.timers["metrics"] = time() # reset timer

@ -20,7 +20,9 @@ GITHUB_ASSETS_NAMES = (
[f"yolov8{k}{suffix}.pt" for k in "nsmlx" for suffix in ("", "-cls", "-seg", "-pose", "-obb")]
+ [f"yolov5{k}{resolution}u.pt" for k in "nsmlx" for resolution in ("", "6")]
+ [f"yolov3{k}u.pt" for k in ("", "-spp", "-tiny")]
+ [f"yolov8{k}-world.pt" for k in "sml"]
+ [f"yolov8{k}-world.pt" for k in "smlx"]
+ [f"yolov8{k}-worldv2.pt" for k in "smlx"]
+ [f"yolov9{k}.pt" for k in "ce"]
+ [f"yolo_nas_{k}.pt" for k in "sml"]
+ [f"sam_{k}.pt" for k in "bl"]
+ [f"FastSAM-{k}.pt" for k in "sx"]

@ -137,10 +137,10 @@ class KeypointLoss(nn.Module):
def forward(self, pred_kpts, gt_kpts, kpt_mask, area):
"""Calculates keypoint loss factor and Euclidean distance loss for predicted and actual keypoints."""
d = (pred_kpts[..., 0] - gt_kpts[..., 0]) ** 2 + (pred_kpts[..., 1] - gt_kpts[..., 1]) ** 2
d = (pred_kpts[..., 0] - gt_kpts[..., 0]).pow(2) + (pred_kpts[..., 1] - gt_kpts[..., 1]).pow(2)
kpt_loss_factor = kpt_mask.shape[1] / (torch.sum(kpt_mask != 0, dim=1) + 1e-9)
# e = d / (2 * (area * self.sigmas) ** 2 + 1e-9) # from formula
e = d / (2 * self.sigmas) ** 2 / (area + 1e-9) / 2 # from cocoeval
e = d / (2 * self.sigmas).pow(2) / (area + 1e-9) / 2 # from cocoeval
return (kpt_loss_factor.view(-1, 1) * ((1 - torch.exp(-e)) * kpt_mask)).mean()
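For reference, both keypoint-loss expressions implement the exponent used by COCO's OKS metric, with $k_i = 2\sigma_i$ and $s^2$ the object area (hence the `(2 * sigmas)` square, the division by `area`, and the trailing `/ 2`):

$$e_i = \frac{d_i^2}{2\,s^2\,k_i^2}, \qquad k_i = 2\sigma_i, \qquad \mathrm{OKS} = \frac{\sum_i e^{-e_i}\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$

The loss then averages $1 - e^{-e_i}$ over visible keypoints, scaled by `kpt_loss_factor`.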

@ -24,7 +24,7 @@ def bbox_ioa(box1, box2, iou=False, eps=1e-7):
Args:
box1 (np.ndarray): A numpy array of shape (n, 4) representing n bounding boxes.
box2 (np.ndarray): A numpy array of shape (m, 4) representing m bounding boxes.
iou (bool): Calculate the standard iou if True else return inter_area/box2_area.
iou (bool): Calculate the standard IoU if True else return inter_area/box2_area.
eps (float, optional): A small value to avoid division by zero. Defaults to 1e-7.
Returns:
@ -116,10 +116,12 @@ def bbox_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7
cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1) # convex (smallest enclosing box) width
ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1) # convex height
if CIoU or DIoU: # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
c2 = cw**2 + ch**2 + eps # convex diagonal squared
rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4 # center dist ** 2
c2 = cw.pow(2) + ch.pow(2) + eps # convex diagonal squared
rho2 = (
(b2_x1 + b2_x2 - b1_x1 - b1_x2).pow(2) + (b2_y1 + b2_y2 - b1_y1 - b1_y2).pow(2)
) / 4 # center dist**2
if CIoU: # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
v = (4 / math.pi**2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
v = (4 / math.pi**2) * ((w2 / h2).atan() - (w1 / h1).atan()).pow(2)
with torch.no_grad():
alpha = v / (v - iou + (1 + eps))
return iou - (rho2 / c2 + v * alpha) # CIoU
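For reference, the rewritten expressions compute the CIoU penalty from the cited paper (arXiv:1911.08287):

$$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^2}{c^2} - \alpha v, \qquad v = \frac{4}{\pi^2}\Big(\arctan\frac{w_2}{h_2} - \arctan\frac{w_1}{h_1}\Big)^2, \qquad \alpha = \frac{v}{(1 - \mathrm{IoU}) + v}$$

where $\rho^2$ is the squared center distance (`rho2`) and $c^2$ the squared diagonal of the smallest enclosing box (`c2`).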
@ -162,12 +164,12 @@ def kpt_iou(kpt1, kpt2, area, sigma, eps=1e-7):
Returns:
(torch.Tensor): A tensor of shape (N, M) representing keypoint similarities.
"""
d = (kpt1[:, None, :, 0] - kpt2[..., 0]) ** 2 + (kpt1[:, None, :, 1] - kpt2[..., 1]) ** 2 # (N, M, 17)
d = (kpt1[:, None, :, 0] - kpt2[..., 0]).pow(2) + (kpt1[:, None, :, 1] - kpt2[..., 1]).pow(2) # (N, M, 17)
sigma = torch.tensor(sigma, device=kpt1.device, dtype=kpt1.dtype) # (17, )
kpt_mask = kpt1[..., 2] != 0 # (N, 17)
e = d / (2 * sigma) ** 2 / (area[:, None, None] + eps) / 2 # from cocoeval
e = d / (2 * sigma).pow(2) / (area[:, None, None] + eps) / 2 # from cocoeval
# e = d / ((area[None, :, None] + eps) * sigma) ** 2 / 2 # from formula
return (torch.exp(-e) * kpt_mask[:, None]).sum(-1) / (kpt_mask.sum(-1)[:, None] + eps)
return ((-e).exp() * kpt_mask[:, None]).sum(-1) / (kpt_mask.sum(-1)[:, None] + eps)
def _get_covariance_matrix(boxes):
@ -181,18 +183,18 @@ def _get_covariance_matrix(boxes):
(torch.Tensor): Covariance matrices corresponding to the original rotated bounding boxes.
"""
# Gaussian bounding boxes, ignore the center points (the first two columns) because they are not needed here.
gbbs = torch.cat((torch.pow(boxes[:, 2:4], 2) / 12, boxes[:, 4:]), dim=-1)
gbbs = torch.cat((boxes[:, 2:4].pow(2) / 12, boxes[:, 4:]), dim=-1)
a, b, c = gbbs.split(1, dim=-1)
return (
a * torch.cos(c) ** 2 + b * torch.sin(c) ** 2,
a * torch.sin(c) ** 2 + b * torch.cos(c) ** 2,
a * torch.cos(c) * torch.sin(c) - b * torch.sin(c) * torch.cos(c),
)
cos = c.cos()
sin = c.sin()
cos2 = cos.pow(2)
sin2 = sin.pow(2)
return a * cos2 + b * sin2, a * sin2 + b * cos2, (a - b) * cos * sin
def probiou(obb1, obb2, CIoU=False, eps=1e-7):
"""
Calculate the prob iou between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
Calculate the prob IoU between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
Args:
obb1 (torch.Tensor): A tensor of shape (N, 5) representing ground truth obbs, with xywhr format.
@ -208,26 +210,21 @@ def probiou(obb1, obb2, CIoU=False, eps=1e-7):
a2, b2, c2 = _get_covariance_matrix(obb2)
t1 = (
((a1 + a2) * (torch.pow(y1 - y2, 2)) + (b1 + b2) * (torch.pow(x1 - x2, 2)))
/ ((a1 + a2) * (b1 + b2) - (torch.pow(c1 + c2, 2)) + eps)
((a1 + a2) * (y1 - y2).pow(2) + (b1 + b2) * (x1 - x2).pow(2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2).pow(2) + eps)
) * 0.25
t2 = (((c1 + c2) * (x2 - x1) * (y1 - y2)) / ((a1 + a2) * (b1 + b2) - (torch.pow(c1 + c2, 2)) + eps)) * 0.5
t2 = (((c1 + c2) * (x2 - x1) * (y1 - y2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2).pow(2) + eps)) * 0.5
t3 = (
torch.log(
((a1 + a2) * (b1 + b2) - (torch.pow(c1 + c2, 2)))
/ (4 * torch.sqrt((a1 * b1 - torch.pow(c1, 2)).clamp_(0) * (a2 * b2 - torch.pow(c2, 2)).clamp_(0)) + eps)
+ eps
)
* 0.5
)
bd = t1 + t2 + t3
bd = torch.clamp(bd, eps, 100.0)
hd = torch.sqrt(1.0 - torch.exp(-bd) + eps)
((a1 + a2) * (b1 + b2) - (c1 + c2).pow(2))
/ (4 * ((a1 * b1 - c1.pow(2)).clamp_(0) * (a2 * b2 - c2.pow(2)).clamp_(0)).sqrt() + eps)
+ eps
).log() * 0.5
bd = (t1 + t2 + t3).clamp(eps, 100.0)
hd = (1.0 - (-bd).exp() + eps).sqrt()
iou = 1 - hd
if CIoU: # only include the wh aspect ratio part
w1, h1 = obb1[..., 2:4].split(1, dim=-1)
w2, h2 = obb2[..., 2:4].split(1, dim=-1)
v = (4 / math.pi**2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
v = (4 / math.pi**2) * ((w2 / h2).atan() - (w1 / h1).atan()).pow(2)
with torch.no_grad():
alpha = v / (v - iou + (1 + eps))
return iou - v * alpha # CIoU
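For reference, both `probiou` versions evaluate the Bhattacharyya distance $B_D = t_1 + t_2 + t_3$ between the Gaussians fitted to the two rotated boxes (arXiv:2106.06072), then

$$H_D = \sqrt{1 - e^{-B_D}}, \qquad \mathrm{ProbIoU} = 1 - H_D,$$

with $B_D$ clamped to $[\epsilon, 100]$ so that $e^{-B_D}$ stays numerically well behaved; the refactor only swaps `torch.pow`/`torch.log`/`torch.sqrt` calls for their tensor-method equivalents.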
@ -236,7 +233,7 @@ def probiou(obb1, obb2, CIoU=False, eps=1e-7):
def batch_probiou(obb1, obb2, eps=1e-7):
"""
Calculate the prob iou between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
Calculate the prob IoU between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
Args:
obb1 (torch.Tensor | np.ndarray): A tensor of shape (N, 5) representing ground truth obbs, with xywhr format.
@ -255,21 +252,16 @@ def batch_probiou(obb1, obb2, eps=1e-7):
a2, b2, c2 = (x.squeeze(-1)[None] for x in _get_covariance_matrix(obb2))
t1 = (
((a1 + a2) * (torch.pow(y1 - y2, 2)) + (b1 + b2) * (torch.pow(x1 - x2, 2)))
/ ((a1 + a2) * (b1 + b2) - (torch.pow(c1 + c2, 2)) + eps)
((a1 + a2) * (y1 - y2).pow(2) + (b1 + b2) * (x1 - x2).pow(2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2).pow(2) + eps)
) * 0.25
t2 = (((c1 + c2) * (x2 - x1) * (y1 - y2)) / ((a1 + a2) * (b1 + b2) - (torch.pow(c1 + c2, 2)) + eps)) * 0.5
t2 = (((c1 + c2) * (x2 - x1) * (y1 - y2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2).pow(2) + eps)) * 0.5
t3 = (
torch.log(
((a1 + a2) * (b1 + b2) - (torch.pow(c1 + c2, 2)))
/ (4 * torch.sqrt((a1 * b1 - torch.pow(c1, 2)).clamp_(0) * (a2 * b2 - torch.pow(c2, 2)).clamp_(0)) + eps)
+ eps
)
* 0.5
)
bd = t1 + t2 + t3
bd = torch.clamp(bd, eps, 100.0)
hd = torch.sqrt(1.0 - torch.exp(-bd) + eps)
((a1 + a2) * (b1 + b2) - (c1 + c2).pow(2))
/ (4 * ((a1 * b1 - c1.pow(2)).clamp_(0) * (a2 * b2 - c2.pow(2)).clamp_(0)).sqrt() + eps)
+ eps
).log() * 0.5
bd = (t1 + t2 + t3).clamp(eps, 100.0)
hd = (1.0 - (-bd).exp() + eps).sqrt()
return 1 - hd

@ -147,7 +147,7 @@ def nms_rotated(boxes, scores, threshold=0.45):
Args:
boxes (torch.Tensor): (N, 5), xywhr.
scores (torch.Tensor): (N, ).
threshold (float): Iou threshold.
threshold (float): IoU threshold.
Returns:
"""
@ -287,7 +287,7 @@ def non_max_suppression(
# if merge and (1 < n < 3E3): # Merge NMS (boxes merged using weighted mean)
# # Update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
# from .metrics import box_iou
# iou = box_iou(boxes[i], boxes) > iou_thres # iou matrix
# iou = box_iou(boxes[i], boxes) > iou_thres # IoU matrix
# weights = iou * scores[None] # box weights
# x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True) # merged boxes
# redundant = True # require redundant detections

@ -60,27 +60,29 @@ def imshow(winname: str, mat: np.ndarray):
_torch_save = torch.save # copy to avoid recursion errors
def torch_save(*args, **kwargs):
def torch_save(*args, use_dill=True, **kwargs):
"""
Use dill (if exists) to serialize the lambda functions where pickle does not do this. Also adds 3 retries with
exponential standoff in case of save failure to improve robustness to transient issues.
Optionally use dill to serialize lambda functions where pickle does not, adding robustness with 3 retries and
exponential standoff in case of save failure.
Args:
*args (tuple): Positional arguments to pass to torch.save.
use_dill (bool): Whether to try using dill for serialization if available. Defaults to True.
**kwargs (dict): Keyword arguments to pass to torch.save.
"""
try:
import dill as pickle # noqa
except ImportError:
assert use_dill
import dill as pickle
except (AssertionError, ImportError):
import pickle
if "pickle_module" not in kwargs:
kwargs["pickle_module"] = pickle # noqa
kwargs["pickle_module"] = pickle
for i in range(4): # 3 retries
try:
return _torch_save(*args, **kwargs)
except RuntimeError: # unable to save, possibly waiting for device to flush or anti-virus to finish scanning
except RuntimeError as e: # unable to save, possibly waiting for device to flush or antivirus scan
if i == 3:
raise
time.sleep((2**i) / 2) # exponential standoff 0.5s, 1.0s, 2.0s
raise e
time.sleep((2**i) / 2) # exponential standoff: 0.5s, 1.0s, 2.0s
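A hedged sketch of what the `use_dill` path enables (assumes the dill package is installed; file names are illustrative):

```python
from ultralytics.utils.patches import torch_save

ckpt = {"epoch": 1, "fn": lambda x: x * 2}        # lambdas are not serializable with plain pickle

torch_save(ckpt, "ckpt_with_lambda.pt")            # dill handles the lambda when use_dill=True (default)
torch_save(ckpt, "ckpt_plain.pt", use_dill=False)  # falls back to pickle and typically raises a PicklingError here
```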

@ -1028,13 +1028,13 @@ def feature_visualization(x, module_type, stage, n=32, save_dir=Path("runs/detec
for m in ["Detect", "Pose", "Segment"]:
if m in module_type:
return
batch, channels, height, width = x.shape # batch, channels, height, width
_, channels, height, width = x.shape # batch, channels, height, width
if height > 1 and width > 1:
f = save_dir / f"stage{stage}_{module_type.split('.')[-1]}_features.png" # filename
blocks = torch.chunk(x[0].cpu(), channels, dim=0) # select batch index 0, block by channels
n = min(n, channels) # number of plots
fig, ax = plt.subplots(math.ceil(n / 8), 8, tight_layout=True) # 8 rows x n/8 cols
_, ax = plt.subplots(math.ceil(n / 8), 8, tight_layout=True) # 8 rows x n/8 cols
ax = ax.ravel()
plt.subplots_adjust(wspace=0.05, hspace=0.05)
for i in range(n):

@ -121,7 +121,7 @@ class TaskAlignedAssigner(nn.Module):
return align_metric, overlaps
def iou_calculation(self, gt_bboxes, pd_bboxes):
"""Iou calculation for horizontal bounding boxes."""
"""IoU calculation for horizontal bounding boxes."""
return bbox_iou(gt_bboxes, pd_bboxes, xywh=False, CIoU=True).squeeze(-1).clamp_(0)
def select_topk_candidates(self, metrics, largest=True, topk_mask=None):
@ -231,7 +231,7 @@ class TaskAlignedAssigner(nn.Module):
@staticmethod
def select_highest_overlaps(mask_pos, overlaps, n_max_boxes):
"""
If an anchor box is assigned to multiple gts, the one with the highest IoI will be selected.
If an anchor box is assigned to multiple gts, the one with the highest IoU will be selected.
Args:
mask_pos (Tensor): shape(b, n_max_boxes, h*w)
@ -260,7 +260,7 @@ class TaskAlignedAssigner(nn.Module):
class RotatedTaskAlignedAssigner(TaskAlignedAssigner):
def iou_calculation(self, gt_bboxes, pd_bboxes):
"""Iou calculation for rotated bounding boxes."""
"""IoU calculation for rotated bounding boxes."""
return probiou(gt_bboxes, pd_bboxes).squeeze(-1).clamp_(0)
@staticmethod

@ -115,7 +115,7 @@ def select_device(device="", batch=0, newline=False, verbose=True):
device = "0"
visible = os.environ.get("CUDA_VISIBLE_DEVICES", None)
os.environ["CUDA_VISIBLE_DEVICES"] = device # set environment variable - must be before assert is_available()
if not (torch.cuda.is_available() and torch.cuda.device_count() >= len(device.replace(",", ""))):
if not (torch.cuda.is_available() and torch.cuda.device_count() >= len(device.split(","))):
LOGGER.info(s)
install = (
"See https://pytorch.org/get-started/locally/ for up-to-date torch install instructions if no "
