commit
b396ee85d2
59 changed files with 1855 additions and 315 deletions
@ -0,0 +1,54 @@ |
||||
# YOLOv8 OnnxRuntime C++ |
||||
|
||||
This example demonstrates how to perform inference using YOLOv8 in C++ with ONNX Runtime and OpenCV's API. |
||||
|
||||
We recommend using Visual Studio to build the project. |
||||
|
||||
## Benefits |
||||
|
||||
- Friendly for deployment in the industrial sector. |
||||
- Faster than OpenCV's DNN inference on both CPU and GPU. |
||||
- Supports CUDA acceleration. |
||||
- Easy to add FP16 inference (using template functions). |
||||
|
||||
## Exporting YOLOv8 Models |
||||
|
||||
To export YOLOv8 models, use the following Python script: |
||||
|
||||
```python |
||||
from ultralytics import YOLO |
||||
|
||||
# Load a YOLOv8 model |
||||
model = YOLO("yolov8n.pt") |
||||
|
||||
# Export the model |
||||
model.export(format="onnx", opset=12, simplify=True, dynamic=False, imgsz=640) |
||||
``` |
||||
|
||||
## Dependencies |
||||
|
||||
| Dependency | Version | |
||||
| ----------------------- | -------- | |
||||
| Onnxruntime-win-x64-gpu | >=1.14.1 | |
||||
| OpenCV | >=4.0.0 | |
||||
| C++ | >=17 | |
||||
|
||||
Note: The dependency on C++17 is due to the usage of the C++17 filesystem feature. |
||||
|
||||
## Usage |
||||
|
||||
```c++ |
||||
// CPU inference |
||||
DCSP_INIT_PARAM params{ model_path, YOLO_ORIGIN_V8, {imgsz_w, imgsz_h}, class_num, 0.1, 0.5, false}; |
||||
// GPU inference |
||||
DCSP_INIT_PARAM params{ model_path, YOLO_ORIGIN_V8, {imgsz_w, imgsz_h}, class_num, 0.1, 0.5, true}; |
||||
|
||||
// Load your image |
||||
cv::Mat img = cv::imread(img_path); |
||||
|
||||
char* ret = p1->CreateSession(params); |
||||
|
||||
ret = p1->RunSession(img, res);
||||
``` |
||||
|
||||
This repository should also work for YOLOv5; doing so requires adding a permute operator for the YOLOv5 model's output, which has not been implemented yet.
@ -0,0 +1,271 @@ |
||||
#include "inference.h" |
||||
#include <regex> |
||||
|
||||
#define benchmark |
||||
#define ELOG |
||||
|
||||
// Construct an inference core with safe defaults so that destroying an
// instance before CreateSession() has run (or after it failed) is
// well-defined.
DCSP_CORE::DCSP_CORE()
{
    // BUGFIX: `session` is deleted unconditionally in the destructor and
    // `cudaEnable` is read in the benchmark branch; both were previously
    // left uninitialized (undefined behavior / garbage reads).
    session = nullptr;
    cudaEnable = false;
}
||||
|
||||
|
||||
// Release the ONNX Runtime session and the node-name buffers.
DCSP_CORE::~DCSP_CORE()
{
    delete session;
    // BUGFIX: the node names are heap-allocated char arrays copied in
    // CreateSession(); they were previously leaked.
    for (const char* name : inputNodeNames)
        delete[] name;
    for (const char* name : outputNodeNames)
        delete[] name;
}
||||
|
||||
|
||||
template<typename T> |
||||
char* BlobFromImage(cv::Mat& iImg, T& iBlob) |
||||
{ |
||||
int channels = iImg.channels(); |
||||
int imgHeight = iImg.rows; |
||||
int imgWidth = iImg.cols; |
||||
|
||||
for (int c = 0; c < channels; c++) |
||||
{ |
||||
for (int h = 0; h < imgHeight; h++) |
||||
{ |
||||
for (int w = 0; w < imgWidth; w++) |
||||
{ |
||||
iBlob[c * imgWidth * imgHeight + h * imgWidth + w] = (std::remove_pointer<T>::type)((iImg.at<cv::Vec3b>(h, w)[c]) / 255.0f); |
||||
} |
||||
} |
||||
} |
||||
return RET_OK; |
||||
} |
||||
|
||||
|
||||
// Pre-processing step (the name is historical): resize iImg to the network
// input size {width, height} and convert it to 3-channel RGB in oImg.
// Returns RET_OK (nullptr).
char* PostProcess(cv::Mat& iImg, std::vector<int> iImgSize, cv::Mat& oImg)
{
    // PERF: record the channel count up front instead of cloning the entire
    // image just to query it afterwards (the clone was otherwise unused).
    int srcChannels = iImg.channels();
    cv::resize(iImg, oImg, cv::Size(iImgSize.at(0), iImgSize.at(1)));
    if (srcChannels == 1)
    {
        cv::cvtColor(oImg, oImg, cv::COLOR_GRAY2BGR);
    }
    cv::cvtColor(oImg, oImg, cv::COLOR_BGR2RGB);
    return RET_OK;
}
||||
|
||||
|
||||
// Create the ONNX Runtime session described by iParams and cache the
// per-model settings used later by RunSession()/TensorProcess().
// Returns RET_OK (nullptr) on success, or a static error message.
char* DCSP_CORE::CreateSession(DCSP_INIT_PARAM &iParams)
{
    char* Ret = RET_OK;
    // The wide-char path conversion below is not validated for CJK input;
    // reject such paths early with a clear message.
    std::regex pattern("[\u4e00-\u9fa5]");
    bool result = std::regex_search(iParams.ModelPath, pattern);
    if (result)
    {
        Ret = "[DCSP_ONNX]:model path error.change your model path without chinese characters.";
        std::cout << Ret << std::endl;
        return Ret;
    }
    try
    {
        rectConfidenceThreshold = iParams.RectConfidenceThreshold;
        iouThreshold = iParams.iouThreshold;
        imgSize = iParams.imgSize;
        modelType = iParams.ModelType;
        // BUGFIX: classesNum was never copied from iParams, so TensorProcess()
        // later read an uninitialized member.
        classesNum = iParams.classesNum;
        env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "Yolo");
        Ort::SessionOptions sessionOption;
        if (iParams.CudaEnable)
        {
            cudaEnable = iParams.CudaEnable;
            OrtCUDAProviderOptions cudaOption;
            cudaOption.device_id = 0;
            sessionOption.AppendExecutionProvider_CUDA(cudaOption);
        }
        sessionOption.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
        sessionOption.SetIntraOpNumThreads(iParams.IntraOpNumThreads);
        sessionOption.SetLogSeverityLevel(iParams.LogSeverityLevel);
        // ORT on Windows takes a wide-char model path; convert from UTF-8.
        // BUGFIX: use std::wstring (RAII) — the raw new[] buffer was leaked.
        int ModelPathSize = MultiByteToWideChar(CP_UTF8, 0, iParams.ModelPath.c_str(), static_cast<int>(iParams.ModelPath.length()), nullptr, 0);
        std::wstring modelPath(ModelPathSize, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, iParams.ModelPath.c_str(), static_cast<int>(iParams.ModelPath.length()), &modelPath[0], ModelPathSize);
        session = new Ort::Session(env, modelPath.c_str(), sessionOption);
        Ort::AllocatorWithDefaultOptions allocator;
        // Copy node names into stable heap buffers: AllocatedStringPtr frees
        // its storage at scope exit, but inputNodeNames/outputNodeNames must
        // outlive this function.
        size_t inputNodesNum = session->GetInputCount();
        for (size_t i = 0; i < inputNodesNum; i++)
        {
            Ort::AllocatedStringPtr input_node_name = session->GetInputNameAllocated(i, allocator);
            // BUGFIX: size the buffer from the actual name length — strcpy
            // into a fixed 50-byte buffer overflows on long node names.
            size_t len = strlen(input_node_name.get()) + 1;
            char* temp_buf = new char[len];
            strcpy(temp_buf, input_node_name.get());
            inputNodeNames.push_back(temp_buf);
        }

        size_t OutputNodesNum = session->GetOutputCount();
        for (size_t i = 0; i < OutputNodesNum; i++)
        {
            Ort::AllocatedStringPtr output_node_name = session->GetOutputNameAllocated(i, allocator);
            // BUGFIX: ditto — the old fixed 10-byte buffer was even tighter.
            size_t len = strlen(output_node_name.get()) + 1;
            char* temp_buf = new char[len];
            strcpy(temp_buf, output_node_name.get());
            outputNodeNames.push_back(temp_buf);
        }
        options = Ort::RunOptions{ nullptr };
        WarmUpSession();
        Ret = RET_OK;
        return Ret;
    }
    catch (const std::exception& e)
    {
        // Log the detailed error, then return a static message: returning
        // heap-built text would leak or dangle at the call site.
        std::string message = std::string("[DCSP_ONNX]:") + e.what();
        std::cout << message << std::endl;
        return "[DCSP_ONNX]:Create session failed.";
    }

}
||||
|
||||
|
||||
// Run detection on iImg, appending decoded results to oResult.
// Returns RET_OK (nullptr); errors inside ORT propagate as exceptions.
char* DCSP_CORE::RunSession(cv::Mat &iImg, std::vector<DCSP_RESULT>& oResult)
{
    // BUGFIX: starttime_1 was declared only under `#ifdef benchmark` yet
    // passed to TensorProcess() unconditionally — the non-benchmark build
    // did not compile. Declare it always; it is only printed when
    // benchmarking is enabled.
    clock_t starttime_1 = clock();

    char* Ret = RET_OK;
    cv::Mat processedImg;
    // Despite its name, PostProcess() performs pre-processing
    // (resize + RGB conversion).
    PostProcess(iImg, imgSize, processedImg);
    if (modelType < 4)  // model types 0-3 are the FP32 models (see MODEL_TYPE)
    {
        float* blob = new float[processedImg.total() * 3];
        BlobFromImage(processedImg, blob);
        std::vector<int64_t> inputNodeDims = { 1,3,imgSize.at(0),imgSize.at(1) };
        // TensorProcess takes ownership of `blob` and releases it.
        TensorProcess(starttime_1, iImg, blob, inputNodeDims, oResult);
    }

    return Ret;
}
||||
|
||||
|
||||
template<typename N> |
||||
char* DCSP_CORE::TensorProcess(clock_t& starttime_1, cv::Mat& iImg, N& blob, std::vector<int64_t>& inputNodeDims, std::vector<DCSP_RESULT>& oResult) |
||||
{ |
||||
Ort::Value inputTensor = Ort::Value::CreateTensor<std::remove_pointer<N>::type>(Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1), inputNodeDims.data(), inputNodeDims.size()); |
||||
#ifdef benchmark |
||||
clock_t starttime_2 = clock(); |
||||
#endif // benchmark
|
||||
auto outputTensor = session->Run(options, inputNodeNames.data(), &inputTensor, 1, outputNodeNames.data(), outputNodeNames.size()); |
||||
#ifdef benchmark |
||||
clock_t starttime_3 = clock(); |
||||
#endif // benchmark
|
||||
Ort::TypeInfo typeInfo = outputTensor.front().GetTypeInfo(); |
||||
auto tensor_info = typeInfo.GetTensorTypeAndShapeInfo(); |
||||
std::vector<int64_t>outputNodeDims = tensor_info.GetShape(); |
||||
std::remove_pointer<N>::type* output = outputTensor.front().GetTensorMutableData<std::remove_pointer<N>::type>(); |
||||
delete blob; |
||||
switch (modelType) |
||||
{ |
||||
case 1: |
||||
{ |
||||
int strideNum = outputNodeDims[2]; |
||||
int signalResultNum = outputNodeDims[1]; |
||||
std::vector<int> class_ids; |
||||
std::vector<float> confidences; |
||||
std::vector<cv::Rect> boxes; |
||||
cv::Mat rowData(signalResultNum, strideNum, CV_32F, output); |
||||
rowData = rowData.t(); |
||||
|
||||
float* data = (float*)rowData.data; |
||||
|
||||
float x_factor = iImg.cols / 640.; |
||||
float y_factor = iImg.rows / 640.; |
||||
for (int i = 0; i < strideNum; ++i) |
||||
{ |
||||
float* classesScores = data + 4; |
||||
cv::Mat scores(1, classesNum, CV_32FC1, classesScores); |
||||
cv::Point class_id; |
||||
double maxClassScore; |
||||
cv::minMaxLoc(scores, 0, &maxClassScore, 0, &class_id); |
||||
if (maxClassScore > rectConfidenceThreshold) |
||||
{ |
||||
confidences.push_back(maxClassScore); |
||||
class_ids.push_back(class_id.x); |
||||
|
||||
float x = data[0]; |
||||
float y = data[1]; |
||||
float w = data[2]; |
||||
float h = data[3]; |
||||
|
||||
int left = int((x - 0.5 * w) * x_factor); |
||||
int top = int((y - 0.5 * h) * y_factor); |
||||
|
||||
int width = int(w * x_factor); |
||||
int height = int(h * y_factor); |
||||
|
||||
boxes.push_back(cv::Rect(left, top, width, height)); |
||||
} |
||||
data += signalResultNum; |
||||
} |
||||
|
||||
std::vector<int> nmsResult; |
||||
cv::dnn::NMSBoxes(boxes, confidences, rectConfidenceThreshold, iouThreshold, nmsResult); |
||||
for (int i = 0; i < nmsResult.size(); ++i) |
||||
{ |
||||
int idx = nmsResult[i]; |
||||
DCSP_RESULT result; |
||||
result.classId = class_ids[idx]; |
||||
result.confidence = confidences[idx]; |
||||
result.box = boxes[idx]; |
||||
oResult.push_back(result); |
||||
} |
||||
|
||||
|
||||
#ifdef benchmark |
||||
clock_t starttime_4 = clock(); |
||||
double pre_process_time = (double)(starttime_2 - starttime_1) / CLOCKS_PER_SEC * 1000; |
||||
double process_time = (double)(starttime_3 - starttime_2) / CLOCKS_PER_SEC * 1000; |
||||
double post_process_time = (double)(starttime_4 - starttime_3) / CLOCKS_PER_SEC * 1000; |
||||
if (cudaEnable) |
||||
{ |
||||
std::cout << "[DCSP_ONNX(CUDA)]: " << pre_process_time << "ms pre-process, " << process_time << "ms inference, " << post_process_time << "ms post-process." << std::endl; |
||||
} |
||||
else |
||||
{ |
||||
std::cout << "[DCSP_ONNX(CPU)]: " << pre_process_time << "ms pre-process, " << process_time << "ms inference, " << post_process_time << "ms post-process." << std::endl; |
||||
} |
||||
#endif // benchmark
|
||||
|
||||
break; |
||||
} |
||||
} |
||||
char* Ret = RET_OK; |
||||
return Ret; |
||||
} |
||||
|
||||
|
||||
char* DCSP_CORE::WarmUpSession() |
||||
{ |
||||
clock_t starttime_1 = clock(); |
||||
char* Ret = RET_OK; |
||||
cv::Mat iImg = cv::Mat(cv::Size(imgSize.at(0), imgSize.at(1)), CV_8UC3); |
||||
cv::Mat processedImg; |
||||
PostProcess(iImg, imgSize, processedImg); |
||||
if (modelType < 4) |
||||
{ |
||||
float* blob = new float[iImg.total() * 3]; |
||||
BlobFromImage(processedImg, blob); |
||||
std::vector<int64_t> YOLO_input_node_dims = { 1,3,imgSize.at(0),imgSize.at(1) }; |
||||
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1), YOLO_input_node_dims.data(), YOLO_input_node_dims.size()); |
||||
auto output_tensors = session->Run(options, inputNodeNames.data(), &input_tensor, 1, outputNodeNames.data(), outputNodeNames.size()); |
||||
delete[] blob; |
||||
clock_t starttime_4 = clock(); |
||||
double post_process_time = (double)(starttime_4 - starttime_1) / CLOCKS_PER_SEC * 1000; |
||||
if (cudaEnable) |
||||
{ |
||||
std::cout << "[DCSP_ONNX(CUDA)]: " << "Cuda warm-up cost " << post_process_time << " ms. " << std::endl; |
||||
} |
||||
} |
||||
|
||||
return Ret; |
||||
} |
@ -0,0 +1,83 @@ |
||||
#pragma once |
||||
|
||||
#define _CRT_SECURE_NO_WARNINGS |
||||
#define RET_OK nullptr |
||||
|
||||
#include <string> |
||||
#include <vector> |
||||
#include <stdio.h> |
||||
#include "io.h" |
||||
#include "direct.h" |
||||
#include "opencv.hpp" |
||||
#include <Windows.h> |
||||
#include "onnxruntime_cxx_api.h" |
||||
|
||||
|
||||
// Supported model flavors. Values < 4 are FP32 models; the inference code
// currently decodes only YOLO_ORIGIN_V8 detector output.
enum MODEL_TYPE
{
    //FLOAT32 MODEL
    YOLO_ORIGIN_V5 = 0,
    YOLO_ORIGIN_V8 = 1,//only support v8 detector currently
    YOLO_POSE_V8 = 2,
    YOLO_CLS_V8 = 3
};
||||
|
||||
|
||||
// Configuration passed to DCSP_CORE::CreateSession().
typedef struct _DCSP_INIT_PARAM
{
    std::string ModelPath;                 // path to the .onnx file (non-CJK characters only)
    MODEL_TYPE ModelType = YOLO_ORIGIN_V8;
    std::vector<int> imgSize={640, 640};   // network input {width, height}

    int classesNum=80;                     // number of classes the model predicts
    float RectConfidenceThreshold = 0.6;   // minimum class score to keep a box
    float iouThreshold = 0.5;              // NMS IoU threshold
    bool CudaEnable = false;               // true -> append the CUDA execution provider
    int LogSeverityLevel = 3;              // ORT log severity passed to SetLogSeverityLevel
    int IntraOpNumThreads = 1;             // ORT intra-op thread count
}DCSP_INIT_PARAM;
||||
|
||||
|
||||
// One detection: class index, confidence score and box in original-image
// pixel coordinates.
typedef struct _DCSP_RESULT
{
    int classId;
    float confidence;
    cv::Rect box;
}DCSP_RESULT;
||||
|
||||
|
||||
class DCSP_CORE |
||||
{ |
||||
public: |
||||
DCSP_CORE(); |
||||
~DCSP_CORE(); |
||||
|
||||
public: |
||||
char* CreateSession(DCSP_INIT_PARAM &iParams); |
||||
|
||||
|
||||
char* RunSession(cv::Mat &iImg, std::vector<DCSP_RESULT>& oResult); |
||||
|
||||
|
||||
char* WarmUpSession(); |
||||
|
||||
|
||||
template<typename N> |
||||
char* TensorProcess(clock_t& starttime_1, cv::Mat& iImg, N& blob, std::vector<int64_t>& inputNodeDims, std::vector<DCSP_RESULT>& oResult); |
||||
|
||||
|
||||
private: |
||||
Ort::Env env; |
||||
Ort::Session* session; |
||||
bool cudaEnable; |
||||
Ort::RunOptions options; |
||||
std::vector<const char*> inputNodeNames; |
||||
std::vector<const char*> outputNodeNames; |
||||
|
||||
|
||||
int classesNum; |
||||
MODEL_TYPE modelType; |
||||
std::vector<int> imgSize; |
||||
float rectConfidenceThreshold; |
||||
float iouThreshold; |
||||
}; |
@ -0,0 +1,44 @@ |
||||
#include <iostream> |
||||
#include <stdio.h> |
||||
#include "inference.h" |
||||
#include <filesystem> |
||||
|
||||
|
||||
|
||||
// Iterate over every .jpg in a hard-coded test directory, run detection on
// each image and display the resulting boxes. `p` must already hold a
// successfully created session.
void file_iterator(DCSP_CORE*& p)
{
    std::filesystem::path img_dir = R"(E:\project\Project_C++\DCPS_ONNX\TEST_ORIGIN)";
    int k = 0;
    // BUGFIX: the original reused `i` for both the directory entry and the
    // inner loop index, and shadowed `img_path`; distinct names avoid the
    // shadowing bugs.
    for (auto& entry : std::filesystem::directory_iterator(img_dir))
    {
        if (entry.path().extension() == ".jpg")
        {
            std::string img_path = entry.path().string();
            cv::Mat img = cv::imread(img_path);
            std::vector<DCSP_RESULT> res;
            char* ret = p->RunSession(img, res);
            // NOTE(review): `ret` is currently unchecked (RET_OK is nullptr).
            for (size_t i = 0; i < res.size(); i++)  // size_t: no signed/unsigned mix
            {
                cv::rectangle(img, res.at(i).box, cv::Scalar(125, 123, 0), 3);
            }

            k++;
            cv::imshow("TEST_ORIGIN", img);
            cv::waitKey(0);
            cv::destroyAllWindows();
            //cv::imwrite("E:\\output\\" + std::to_string(k) + ".png", img);
        }
    }
}
||||
|
||||
|
||||
|
||||
int main() |
||||
{ |
||||
DCSP_CORE* p1 = new DCSP_CORE; |
||||
std::string model_path = "yolov8n.onnx"; |
||||
DCSP_INIT_PARAM params{ model_path, YOLO_ORIGIN_V8, {640, 640}, 80, 0.1, 0.5, false }; |
||||
char* ret = p1->CreateSession(params); |
||||
file_iterator(p1); |
||||
} |
@ -1,13 +1,14 @@ |
||||
# Ultralytics YOLO 🚀, AGPL-3.0 license

# Package version. (Diff flattening left duplicate conflicting __version__
# and __all__ assignments; this resolves to the post-commit state.)
__version__ = '8.0.131'

from ultralytics.hub import start
from ultralytics.vit.rtdetr import RTDETR
from ultralytics.vit.sam import SAM
from ultralytics.yolo.engine.model import YOLO
from ultralytics.yolo.fastsam import FastSAM
from ultralytics.yolo.nas import NAS
from ultralytics.yolo.utils.checks import check_yolo as checks
from ultralytics.yolo.utils.downloads import download

__all__ = '__version__', 'YOLO', 'NAS', 'SAM', 'FastSAM', 'RTDETR', 'checks', 'download', 'start'  # allow simpler import
@ -0,0 +1,8 @@ |
||||
# Ultralytics YOLO 🚀, AGPL-3.0 license

# Public FastSAM API: model wrapper, predictor, prompt helper and validator.
from .model import FastSAM
from .predict import FastSAMPredictor
from .prompt import FastSAMPrompt
from .val import FastSAMValidator

__all__ = 'FastSAMPredictor', 'FastSAM', 'FastSAMPrompt', 'FastSAMValidator'
@ -0,0 +1,53 @@ |
||||
# Ultralytics YOLO 🚀, AGPL-3.0 license |
||||
|
||||
import torch |
||||
|
||||
from ultralytics.yolo.engine.results import Results |
||||
from ultralytics.yolo.fastsam.utils import bbox_iou |
||||
from ultralytics.yolo.utils import DEFAULT_CFG, ops |
||||
from ultralytics.yolo.v8.detect.predict import DetectionPredictor |
||||
|
||||
|
||||
class FastSAMPredictor(DetectionPredictor):
    """FastSAM predictor: YOLOv8-style segmentation postprocessing that also
    injects a whole-image candidate box into the NMS results."""

    def __init__(self, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
        """Initialize like a DetectionPredictor but force the segment task."""
        super().__init__(cfg, overrides, _callbacks)
        self.args.task = 'segment'

    def postprocess(self, preds, img, orig_imgs):
        """TODO: filter by classes."""
        p = ops.non_max_suppression(preds[0],
                                    self.args.conf,
                                    self.args.iou,
                                    agnostic=self.args.agnostic_nms,
                                    max_det=self.args.max_det,
                                    nc=len(self.model.names),
                                    classes=self.args.classes)
        # Build one candidate row covering the entire input image:
        # layout is (x, y, w, h, conf, cls, masks...) -> set w, h, conf and
        # all mask coefficients to full-image values.
        full_box = torch.zeros_like(p[0][0])
        full_box[2], full_box[3], full_box[4], full_box[6:] = img.shape[3], img.shape[2], 1.0, 1.0
        full_box = full_box.view(1, -1)
        # If an existing detection nearly matches the full image (IoU > 0.9),
        # replace that detection with the full-image box.
        critical_iou_index = bbox_iou(full_box[0][:4], p[0][:, :4], iou_thres=0.9, image_shape=img.shape[2:])
        if critical_iou_index.numel() != 0:
            full_box[0][4] = p[0][critical_iou_index][:, 4]
            full_box[0][6:] = p[0][critical_iou_index][:, 6:]
            p[0][critical_iou_index] = full_box
        results = []
        proto = preds[1][-1] if len(preds[1]) == 3 else preds[1]  # second output is len 3 if pt, but only 1 if exported
        for i, pred in enumerate(p):
            orig_img = orig_imgs[i] if isinstance(orig_imgs, list) else orig_imgs
            path = self.batch[0]
            img_path = path[i] if isinstance(path, list) else path
            if not len(pred):  # save empty boxes
                results.append(Results(orig_img=orig_img, path=img_path, names=self.model.names, boxes=pred[:, :6]))
                continue
            if self.args.retina_masks:
                if not isinstance(orig_imgs, torch.Tensor):
                    pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
                masks = ops.process_mask_native(proto[i], pred[:, 6:], pred[:, :4], orig_img.shape[:2])  # HWC
            else:
                masks = ops.process_mask(proto[i], pred[:, 6:], pred[:, :4], img.shape[2:], upsample=True)  # HWC
                if not isinstance(orig_imgs, torch.Tensor):
                    pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
            results.append(
                Results(orig_img=orig_img, path=img_path, names=self.model.names, boxes=pred[:, :6], masks=masks))
        return results
@ -0,0 +1,406 @@ |
||||
# Ultralytics YOLO 🚀, AGPL-3.0 license |
||||
|
||||
import os |
||||
|
||||
import cv2 |
||||
import matplotlib.pyplot as plt |
||||
import numpy as np |
||||
import torch |
||||
from PIL import Image |
||||
|
||||
|
||||
class FastSAMPrompt: |
||||
|
||||
    def __init__(self, img_path, results, device='cuda') -> None:
        """Store the source image, model results and device; lazily install
        and import OpenAI CLIP (needed for text-prompt retrieval)."""
        # self.img_path = img_path
        self.device = device
        self.results = results
        self.img_path = img_path
        self.ori_img = cv2.imread(img_path)

        # Import and assign clip
        try:
            import clip  # for linear_assignment
        except ImportError:
            from ultralytics.yolo.utils.checks import check_requirements
            check_requirements('git+https://github.com/openai/CLIP.git')  # required before installing lap from source
            import clip
        self.clip = clip
||||
|
||||
    @staticmethod
    def _segment_image(image, bbox):
        """Return a copy of `image` with everything outside `bbox`
        (x1, y1, x2, y2) blanked out.

        NOTE(review): despite the name, `black_image` is filled with white
        (255, 255, 255) — confirm which background is intended.
        """
        image_array = np.array(image)
        segmented_image_array = np.zeros_like(image_array)
        x1, y1, x2, y2 = bbox
        segmented_image_array[y1:y2, x1:x2] = image_array[y1:y2, x1:x2]
        segmented_image = Image.fromarray(segmented_image_array)
        black_image = Image.new('RGB', image.size, (255, 255, 255))
        # transparency_mask = np.zeros_like((), dtype=np.uint8)
        transparency_mask = np.zeros((image_array.shape[0], image_array.shape[1]), dtype=np.uint8)
        transparency_mask[y1:y2, x1:x2] = 255
        transparency_mask_image = Image.fromarray(transparency_mask, mode='L')
        # Paste the cropped region onto the background using the box as a mask
        black_image.paste(segmented_image, mask=transparency_mask_image)
        return black_image
||||
|
||||
    @staticmethod
    def _format_results(result, filter=0):
        """Convert a Results object into a list of annotation dicts with keys
        id / segmentation / bbox / score / area, dropping masks with fewer
        than `filter` pixels."""
        annotations = []
        n = len(result.masks.data)
        for i in range(n):
            mask = result.masks.data[i] == 1.0

            if torch.sum(mask) < filter:
                continue
            annotation = {
                'id': i,
                'segmentation': mask.cpu().numpy(),
                'bbox': result.boxes.data[i],
                'score': result.boxes.conf[i]}
            annotation['area'] = annotation['segmentation'].sum()
            annotations.append(annotation)
        return annotations
||||
|
||||
    @staticmethod
    def filter_masks(annotations):  # filter the overlap mask
        """Drop every mask that is >80% covered by a larger mask.

        Sorts `annotations` in place by area (largest first) and returns the
        kept annotations plus the set of removed indices.
        """
        annotations.sort(key=lambda x: x['area'], reverse=True)
        to_remove = set()
        for i in range(len(annotations)):
            a = annotations[i]
            for j in range(i + 1, len(annotations)):
                b = annotations[j]
                # b is removed when most of it lies inside the larger mask a
                if i != j and j not in to_remove and b['area'] < a['area'] and \
                        (a['segmentation'] & b['segmentation']).sum() / b['segmentation'].sum() > 0.8:
                    to_remove.add(j)

        return [a for i, a in enumerate(annotations) if i not in to_remove], to_remove
||||
|
||||
    @staticmethod
    def _get_bbox_from_mask(mask):
        """Bounding box [x1, y1, x2, y2] covering every contour of a binary
        mask.

        NOTE(review): assumes the mask has at least one contour — an empty
        mask raises IndexError on contours[0].
        """
        mask = mask.astype(np.uint8)
        contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        x1, y1, w, h = cv2.boundingRect(contours[0])
        x2, y2 = x1 + w, y1 + h
        if len(contours) > 1:
            for b in contours:
                x_t, y_t, w_t, h_t = cv2.boundingRect(b)
                # Merge the per-contour boxes into one enclosing box
                x1 = min(x1, x_t)
                y1 = min(y1, y_t)
                x2 = max(x2, x_t + w_t)
                y2 = max(y2, y_t + h_t)
            h = y2 - y1
            w = x2 - x1
        return [x1, y1, x2, y2]
||||
|
||||
    def plot(self,
             annotations,
             output,
             bbox=None,
             points=None,
             point_label=None,
             mask_random_color=True,
             better_quality=True,
             retina=False,
             withContours=True):
        """Render `annotations` (masks) over the source image and save the
        figure into the `output` directory under the source image's basename.

        `bbox`/`points`/`point_label` optionally overlay the prompts used;
        `better_quality` smooths mask edges morphologically; `retina` keeps
        masks at model resolution instead of resizing to the original image.
        """
        if isinstance(annotations[0], dict):
            annotations = [annotation['segmentation'] for annotation in annotations]
        result_name = os.path.basename(self.img_path)
        image = self.ori_img
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        original_h = image.shape[0]
        original_w = image.shape[1]
        # for macOS only
        # plt.switch_backend('TkAgg')
        plt.figure(figsize=(original_w / 100, original_h / 100))
        # Add subplot with no margin.
        plt.subplots_adjust(top=1, bottom=0, right=1, left=0, hspace=0, wspace=0)
        plt.margins(0, 0)
        plt.gca().xaxis.set_major_locator(plt.NullLocator())
        plt.gca().yaxis.set_major_locator(plt.NullLocator())

        plt.imshow(image)
        if better_quality:
            if isinstance(annotations[0], torch.Tensor):
                annotations = np.array(annotations.cpu())
            # Morphological close + open to smooth ragged mask edges
            for i, mask in enumerate(annotations):
                mask = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
                annotations[i] = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_OPEN, np.ones((8, 8), np.uint8))
        if self.device == 'cpu':
            annotations = np.array(annotations)
            self.fast_show_mask(
                annotations,
                plt.gca(),
                random_color=mask_random_color,
                bbox=bbox,
                points=points,
                pointlabel=point_label,
                retinamask=retina,
                target_height=original_h,
                target_width=original_w,
            )
        else:
            if isinstance(annotations[0], np.ndarray):
                annotations = torch.from_numpy(annotations)
            self.fast_show_mask_gpu(
                annotations,
                plt.gca(),
                random_color=mask_random_color,
                bbox=bbox,
                points=points,
                pointlabel=point_label,
                retinamask=retina,
                target_height=original_h,
                target_width=original_w,
            )
        if isinstance(annotations, torch.Tensor):
            annotations = annotations.cpu().numpy()
        if withContours:
            # Draw all mask contours in a single semi-transparent overlay
            contour_all = []
            temp = np.zeros((original_h, original_w, 1))
            for i, mask in enumerate(annotations):
                if type(mask) == dict:
                    mask = mask['segmentation']
                annotation = mask.astype(np.uint8)
                if not retina:
                    annotation = cv2.resize(
                        annotation,
                        (original_w, original_h),
                        interpolation=cv2.INTER_NEAREST,
                    )
                contours, hierarchy = cv2.findContours(annotation, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
                contour_all.extend(iter(contours))
            cv2.drawContours(temp, contour_all, -1, (255, 255, 255), 2)
            color = np.array([0 / 255, 0 / 255, 1.0, 0.8])
            contour_mask = temp / 255 * color.reshape(1, 1, -1)
            plt.imshow(contour_mask)

        save_path = output
        if not os.path.exists(save_path):
            os.makedirs(save_path)
        plt.axis('off')
        fig = plt.gcf()
        plt.draw()

        # Rasterize the figure and write it with OpenCV (RGB -> BGR).
        # Some backends need an explicit draw() before tostring_rgb().
        try:
            buf = fig.canvas.tostring_rgb()
        except AttributeError:
            fig.canvas.draw()
            buf = fig.canvas.tostring_rgb()
        cols, rows = fig.canvas.get_width_height()
        img_array = np.frombuffer(buf, dtype=np.uint8).reshape(rows, cols, 3)
        cv2.imwrite(os.path.join(save_path, result_name), cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR))
||||
|
||||
# CPU post process |
||||
    # CPU post process
    def fast_show_mask(
        self,
        annotation,
        ax,
        random_color=False,
        bbox=None,
        points=None,
        pointlabel=None,
        retinamask=True,
        target_height=960,
        target_width=960,
    ):
        """Draw a stack of masks (N, H, W) onto a matplotlib axis as a single
        RGBA overlay, optionally with a prompt bbox and points."""
        msak_sum = annotation.shape[0]
        height = annotation.shape[1]
        weight = annotation.shape[2]
        # Sort annotations by mask area (ascending) so that, at each pixel,
        # smaller masks win in the argmax-based compositing below.
        areas = np.sum(annotation, axis=(1, 2))
        sorted_indices = np.argsort(areas)
        annotation = annotation[sorted_indices]

        # Per-pixel index of the first non-zero mask
        index = (annotation != 0).argmax(axis=0)
        if random_color:
            color = np.random.random((msak_sum, 1, 1, 3))
        else:
            color = np.ones((msak_sum, 1, 1, 3)) * np.array([30 / 255, 144 / 255, 1.0])
        transparency = np.ones((msak_sum, 1, 1, 1)) * 0.6
        visual = np.concatenate([color, transparency], axis=-1)
        mask_image = np.expand_dims(annotation, -1) * visual

        show = np.zeros((height, weight, 4))
        h_indices, w_indices = np.meshgrid(np.arange(height), np.arange(weight), indexing='ij')
        indices = (index[h_indices, w_indices], h_indices, w_indices, slice(None))
        # Vectorized indexed update of `show`
        show[h_indices, w_indices, :] = mask_image[indices]
        if bbox is not None:
            x1, y1, x2, y2 = bbox
            ax.add_patch(plt.Rectangle((x1, y1), x2 - x1, y2 - y1, fill=False, edgecolor='b', linewidth=1))
        # draw point
        if points is not None:
            # Foreground prompt points in yellow, background in magenta
            plt.scatter(
                [point[0] for i, point in enumerate(points) if pointlabel[i] == 1],
                [point[1] for i, point in enumerate(points) if pointlabel[i] == 1],
                s=20,
                c='y',
            )
            plt.scatter(
                [point[0] for i, point in enumerate(points) if pointlabel[i] == 0],
                [point[1] for i, point in enumerate(points) if pointlabel[i] == 0],
                s=20,
                c='m',
            )

        if not retinamask:
            show = cv2.resize(show, (target_width, target_height), interpolation=cv2.INTER_NEAREST)
        ax.imshow(show)
||||
|
||||
    def fast_show_mask_gpu(
        self,
        annotation,
        ax,
        random_color=False,
        bbox=None,
        points=None,
        pointlabel=None,
        retinamask=True,
        target_height=960,
        target_width=960,
    ):
        """GPU variant of fast_show_mask: composite a (N, H, W) mask tensor
        into one RGBA overlay with torch ops, then draw it on `ax`."""
        msak_sum = annotation.shape[0]
        height = annotation.shape[1]
        weight = annotation.shape[2]
        areas = torch.sum(annotation, dim=(1, 2))
        sorted_indices = torch.argsort(areas, descending=False)
        annotation = annotation[sorted_indices]
        # Per-pixel index of the first non-zero mask
        index = (annotation != 0).to(torch.long).argmax(dim=0)
        if random_color:
            color = torch.rand((msak_sum, 1, 1, 3)).to(annotation.device)
        else:
            color = torch.ones((msak_sum, 1, 1, 3)).to(annotation.device) * torch.tensor([30 / 255, 144 / 255, 1.0]).to(
                annotation.device)
        transparency = torch.ones((msak_sum, 1, 1, 1)).to(annotation.device) * 0.6
        visual = torch.cat([color, transparency], dim=-1)
        mask_image = torch.unsqueeze(annotation, -1) * visual
        # Gather by `index`: for every pixel, pick the chosen mask's color so
        # mask_image collapses into a single-image overlay.
        show = torch.zeros((height, weight, 4)).to(annotation.device)
        h_indices, w_indices = torch.meshgrid(torch.arange(height), torch.arange(weight), indexing='ij')
        indices = (index[h_indices, w_indices], h_indices, w_indices, slice(None))
        # Vectorized indexed update of `show`
        show[h_indices, w_indices, :] = mask_image[indices]
        show_cpu = show.cpu().numpy()
        if bbox is not None:
            x1, y1, x2, y2 = bbox
            ax.add_patch(plt.Rectangle((x1, y1), x2 - x1, y2 - y1, fill=False, edgecolor='b', linewidth=1))
        # draw point
        if points is not None:
            # Foreground prompt points in yellow, background in magenta
            plt.scatter(
                [point[0] for i, point in enumerate(points) if pointlabel[i] == 1],
                [point[1] for i, point in enumerate(points) if pointlabel[i] == 1],
                s=20,
                c='y',
            )
            plt.scatter(
                [point[0] for i, point in enumerate(points) if pointlabel[i] == 0],
                [point[1] for i, point in enumerate(points) if pointlabel[i] == 0],
                s=20,
                c='m',
            )
        if not retinamask:
            show_cpu = cv2.resize(show_cpu, (target_width, target_height), interpolation=cv2.INTER_NEAREST)
        ax.imshow(show_cpu)
||||
|
||||
# clip |
||||
    @torch.no_grad()
    def retrieve(self, model, preprocess, elements, search_text: str, device) -> int:
        """Score each image in `elements` against `search_text` with CLIP.

        Returns a softmax distribution over the images (higher = better
        match to the text prompt).
        """
        preprocessed_images = [preprocess(image).to(device) for image in elements]
        tokenized_text = self.clip.tokenize([search_text]).to(device)
        stacked_images = torch.stack(preprocessed_images)
        image_features = model.encode_image(stacked_images)
        text_features = model.encode_text(tokenized_text)
        # Cosine similarity via L2-normalized dot product
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        probs = 100.0 * image_features @ text_features.T
        return probs[:, 0].softmax(dim=0)
||||
|
||||
    def _crop_image(self, format_results):
        """Crop the source image to each annotation's bounding box.

        Returns (cropped_boxes, cropped_images, not_crop, filter_id,
        annotations); tiny masks (<= 100 px) are skipped and their indices
        collected in filter_id.
        """
        image = Image.fromarray(cv2.cvtColor(self.ori_img, cv2.COLOR_BGR2RGB))
        ori_w, ori_h = image.size
        annotations = format_results
        mask_h, mask_w = annotations[0]['segmentation'].shape
        if ori_w != mask_w or ori_h != mask_h:
            image = image.resize((mask_w, mask_h))
        cropped_boxes = []
        cropped_images = []
        not_crop = []
        filter_id = []
        # annotations, _ = filter_masks(annotations)
        # filter_id = list(_)
        for _, mask in enumerate(annotations):
            if np.sum(mask['segmentation']) <= 100:
                filter_id.append(_)
                continue
            bbox = self._get_bbox_from_mask(mask['segmentation'])  # bbox of the mask
            cropped_boxes.append(self._segment_image(image, bbox))  # save the cropped image
            # cropped_boxes.append(segment_image(image,mask["segmentation"]))
            cropped_images.append(bbox)  # save the cropped image's bbox

        return cropped_boxes, cropped_images, not_crop, filter_id, annotations
||||
|
||||
def box_prompt(self, bbox):
    """Pick the predicted mask that best matches a user-supplied box.

    The box is given in original-image pixel coordinates (x1, y1, x2, y2);
    the mask with the highest IoU against it is returned.

    Returns:
        np.ndarray: array of shape (1, h, w) holding the best mask.
    """
    assert (bbox[2] != 0 and bbox[3] != 0)
    masks = self.results[0].masks.data
    target_height = self.ori_img.shape[0]
    target_width = self.ori_img.shape[1]
    h, w = masks.shape[1], masks.shape[2]
    # Rescale the prompt box into mask coordinates when resolutions differ.
    if h != target_height or w != target_width:
        scale_x = w / target_width
        scale_y = h / target_height
        bbox = [
            int(bbox[0] * scale_x),
            int(bbox[1] * scale_y),
            int(bbox[2] * scale_x),
            int(bbox[3] * scale_y), ]
    bbox[0] = max(round(bbox[0]), 0)
    bbox[1] = max(round(bbox[1]), 0)
    bbox[2] = min(round(bbox[2]), w)
    bbox[3] = min(round(bbox[3]), h)

    # IoU of the prompt box against every candidate mask: the mask pixels
    # inside the box are the intersection, the union is box + mask areas
    # minus that overlap.
    box_area = (bbox[3] - bbox[1]) * (bbox[2] - bbox[0])
    overlap = torch.sum(masks[:, bbox[1]:bbox[3], bbox[0]:bbox[2]], dim=(1, 2))
    mask_areas = torch.sum(masks, dim=(1, 2))
    ious = overlap / (box_area + mask_areas - overlap)
    best = torch.argmax(ious)

    return np.array([masks[best].cpu().numpy()])
||||
|
||||
def point_prompt(self, points, pointlabel):
    """Build a single mask from point prompts.

    Fix: the original used ``i`` as the loop variable for BOTH the
    annotation loop and the inner point loop, shadowing the outer index;
    the outer loop no longer binds an index it never used.

    Args:
        points: list of [x, y] points in original-image coordinates.
        pointlabel: parallel list of labels; 1 adds the mask under the
            point (foreground), 0 subtracts it (background).

    Returns:
        np.ndarray: boolean mask of shape (1, h, w) at mask resolution.
    """
    masks = self._format_results(self.results[0], 0)
    target_height = self.ori_img.shape[0]
    target_width = self.ori_img.shape[1]
    h = masks[0]['segmentation'].shape[0]
    w = masks[0]['segmentation'].shape[1]
    # Rescale points into mask coordinates when resolutions differ.
    if h != target_height or w != target_width:
        points = [[int(point[0] * w / target_width), int(point[1] * h / target_height)] for point in points]
    onemask = np.zeros((h, w))
    for annotation in masks:
        mask = annotation['segmentation'] if type(annotation) == dict else annotation
        for i, point in enumerate(points):
            # Foreground point inside this mask: add it; background: subtract.
            if mask[point[1], point[0]] == 1 and pointlabel[i] == 1:
                onemask += mask
            if mask[point[1], point[0]] == 1 and pointlabel[i] == 0:
                onemask -= mask
    # Pixels covered by at least one net-positive mask become True.
    onemask = onemask >= 1
    return np.array([onemask])
||||
|
||||
def text_prompt(self, text):
    """Return the mask whose cropped region best matches a text query.

    Crops the image around every mask, scores each crop against ``text``
    with CLIP, and returns the highest-scoring mask.
    """
    format_results = self._format_results(self.results[0], 0)
    cropped_boxes, cropped_images, not_crop, filter_id, annotations = self._crop_image(format_results)
    clip_model, preprocess = self.clip.load('ViT-B/32', device=self.device)
    scores = self.retrieve(clip_model, preprocess, cropped_boxes, text, device=self.device)
    # Index of the top-scoring crop (last element of the ascending sort).
    max_idx = scores.argsort()[-1]
    # Tiny masks were filtered out before scoring; shift the index back
    # into the full annotation list to compensate for each skipped entry.
    max_idx += sum(np.array(filter_id) <= int(max_idx))
    return np.array([annotations[max_idx]['segmentation']])
||||
|
||||
def everything_prompt(self):
    """Return the raw mask tensor for every detected object (no filtering)."""
    return self.results[0].masks.data
@ -0,0 +1,64 @@ |
||||
# Ultralytics YOLO 🚀, AGPL-3.0 license |
||||
|
||||
import torch |
||||
|
||||
|
||||
def adjust_bboxes_to_image_border(boxes, image_shape, threshold=20):
    """Snap near-border box edges onto the image border (in place).

    Any edge lying within ``threshold`` pixels of the corresponding image
    border is moved exactly onto that border; other edges are untouched.

    Args:
        boxes (torch.Tensor): (n, 4) boxes as (x1, y1, x2, y2).
        image_shape (tuple): (height, width) of the image.
        threshold (int): pixel distance within which an edge snaps.

    Returns:
        torch.Tensor: the same tensor, modified in place.
    """
    height, width = image_shape

    boxes[boxes[:, 0] < threshold, 0] = 0              # x1 -> left edge
    boxes[boxes[:, 1] < threshold, 1] = 0              # y1 -> top edge
    boxes[boxes[:, 2] > width - threshold, 2] = width  # x2 -> right edge
    boxes[boxes[:, 3] > height - threshold, 3] = height  # y2 -> bottom edge
    return boxes
||||
|
||||
|
||||
def bbox_iou(box1, boxes, iou_thres=0.9, image_shape=(640, 640), raw_output=False):
    """IoU of one bounding box against an array of bounding boxes.

    NOTE: ``boxes`` is first border-snapped via
    adjust_bboxes_to_image_border, which mutates the caller's tensor.

    Args:
        box1 (torch.Tensor): (4, ) box as (x1, y1, x2, y2).
        boxes (torch.Tensor): (n, 4) candidate boxes.
        iou_thres (float): IoU threshold used when selecting indices.
        image_shape (tuple): (height, width) used for border snapping.
        raw_output (bool): if True, return the raw IoU tensor (or 0 when
            there are no boxes) instead of indices.

    Returns:
        torch.Tensor: indices of boxes with IoU > iou_thres, or the raw
        IoU values when raw_output is True.
    """
    boxes = adjust_bboxes_to_image_border(boxes, image_shape)

    # Intersection rectangle of box1 with every candidate box.
    left = torch.max(box1[0], boxes[:, 0])
    top = torch.max(box1[1], boxes[:, 1])
    right = torch.min(box1[2], boxes[:, 2])
    bottom = torch.min(box1[3], boxes[:, 3])

    # clamp(0) zeroes out non-overlapping pairs.
    intersection = (right - left).clamp(0) * (bottom - top).clamp(0)

    # Union = sum of the individual areas minus the intersection.
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = intersection / (area1 + areas - intersection)  # shape (n, )

    if raw_output:
        return 0 if iou.numel() == 0 else iou

    # Indices of candidates whose IoU exceeds the threshold.
    return torch.nonzero(iou > iou_thres).flatten()
@ -0,0 +1,244 @@ |
||||
# Ultralytics YOLO 🚀, AGPL-3.0 license |
||||
|
||||
from multiprocessing.pool import ThreadPool |
||||
from pathlib import Path |
||||
|
||||
import numpy as np |
||||
import torch |
||||
import torch.nn.functional as F |
||||
|
||||
from ultralytics.yolo.utils import LOGGER, NUM_THREADS, ops |
||||
from ultralytics.yolo.utils.checks import check_requirements |
||||
from ultralytics.yolo.utils.metrics import SegmentMetrics, box_iou, mask_iou |
||||
from ultralytics.yolo.utils.plotting import output_to_target, plot_images |
||||
from ultralytics.yolo.v8.detect import DetectionValidator |
||||
|
||||
|
||||
class FastSAMValidator(DetectionValidator):
    """Segmentation-style validator used for FastSAM models.

    Extends DetectionValidator with mask handling: it processes prototype
    masks from the model output, scores boxes AND masks against ground
    truth, and can export COCO-style JSON (RLE-encoded masks) for
    pycocotools evaluation.
    """

    def __init__(self, dataloader=None, save_dir=None, pbar=None, args=None, _callbacks=None):
        """Initialize SegmentationValidator and set task to 'segment', metrics to SegmentMetrics."""
        super().__init__(dataloader, save_dir, pbar, args, _callbacks)
        self.args.task = 'segment'
        self.metrics = SegmentMetrics(save_dir=self.save_dir, on_plot=self.on_plot)

    def preprocess(self, batch):
        """Preprocesses batch by converting masks to float and sending to device."""
        batch = super().preprocess(batch)
        batch['masks'] = batch['masks'].to(self.device).float()
        return batch

    def init_metrics(self, model):
        """Initialize metrics and select mask processing function based on save_json flag."""
        super().init_metrics(model)
        self.plot_masks = []
        if self.args.save_json:
            check_requirements('pycocotools>=2.0.6')
            self.process = ops.process_mask_upsample  # more accurate
        else:
            self.process = ops.process_mask  # faster

    def get_desc(self):
        """Return a formatted description of evaluation metrics."""
        return ('%22s' + '%11s' * 10) % ('Class', 'Images', 'Instances', 'Box(P', 'R', 'mAP50', 'mAP50-95)', 'Mask(P',
                                         'R', 'mAP50', 'mAP50-95)')

    def postprocess(self, preds):
        """Postprocesses YOLO predictions and returns output detections with proto."""
        p = ops.non_max_suppression(preds[0],
                                    self.args.conf,
                                    self.args.iou,
                                    labels=self.lb,
                                    multi_label=True,
                                    agnostic=self.args.single_cls,
                                    max_det=self.args.max_det,
                                    nc=self.nc)
        proto = preds[1][-1] if len(preds[1]) == 3 else preds[1]  # second output is len 3 if pt, but only 1 if exported
        return p, proto

    def update_metrics(self, preds, batch):
        """Accumulate per-image box and mask correctness stats for one batch."""
        for si, (pred, proto) in enumerate(zip(preds[0], preds[1])):
            idx = batch['batch_idx'] == si
            cls = batch['cls'][idx]
            bbox = batch['bboxes'][idx]
            nl, npr = cls.shape[0], pred.shape[0]  # number of labels, predictions
            shape = batch['ori_shape'][si]
            correct_masks = torch.zeros(npr, self.niou, dtype=torch.bool, device=self.device)  # init
            correct_bboxes = torch.zeros(npr, self.niou, dtype=torch.bool, device=self.device)  # init
            self.seen += 1

            # No predictions for this image: record all-false stats if there
            # are labels (so recall is penalized), then move on.
            if npr == 0:
                if nl:
                    self.stats.append((correct_bboxes, correct_masks, *torch.zeros(
                        (2, 0), device=self.device), cls.squeeze(-1)))
                    if self.args.plots:
                        self.confusion_matrix.process_batch(detections=None, labels=cls.squeeze(-1))
                continue

            # Masks
            midx = [si] if self.args.overlap_mask else idx
            gt_masks = batch['masks'][midx]
            pred_masks = self.process(proto, pred[:, 6:], pred[:, :4], shape=batch['img'][si].shape[1:])

            # Predictions
            if self.args.single_cls:
                pred[:, 5] = 0
            predn = pred.clone()
            ops.scale_boxes(batch['img'][si].shape[1:], predn[:, :4], shape,
                            ratio_pad=batch['ratio_pad'][si])  # native-space pred

            # Evaluate
            if nl:
                height, width = batch['img'].shape[2:]
                # Targets come in normalized xywh; scale to pixels, then map
                # back to native (original-image) space like the predictions.
                tbox = ops.xywh2xyxy(bbox) * torch.tensor(
                    (width, height, width, height), device=self.device)  # target boxes
                ops.scale_boxes(batch['img'][si].shape[1:], tbox, shape,
                                ratio_pad=batch['ratio_pad'][si])  # native-space labels
                labelsn = torch.cat((cls, tbox), 1)  # native-space labels
                correct_bboxes = self._process_batch(predn, labelsn)
                # TODO: maybe remove these `self.` arguments as they already are member variable
                correct_masks = self._process_batch(predn,
                                                    labelsn,
                                                    pred_masks,
                                                    gt_masks,
                                                    overlap=self.args.overlap_mask,
                                                    masks=True)
                if self.args.plots:
                    self.confusion_matrix.process_batch(predn, labelsn)

            # Append correct_masks, correct_boxes, pconf, pcls, tcls
            self.stats.append((correct_bboxes, correct_masks, pred[:, 4], pred[:, 5], cls.squeeze(-1)))

            pred_masks = torch.as_tensor(pred_masks, dtype=torch.uint8)
            if self.args.plots and self.batch_i < 3:
                self.plot_masks.append(pred_masks[:15].cpu())  # filter top 15 to plot

            # Save
            if self.args.save_json:
                pred_masks = ops.scale_image(pred_masks.permute(1, 2, 0).contiguous().cpu().numpy(),
                                             shape,
                                             ratio_pad=batch['ratio_pad'][si])
                self.pred_to_json(predn, batch['im_file'][si], pred_masks)
            # if self.args.save_txt:
            #    save_one_txt(predn, save_conf, shape, file=save_dir / 'labels' / f'{path.stem}.txt')

    def finalize_metrics(self, *args, **kwargs):
        """Sets speed and confusion matrix for evaluation metrics."""
        self.metrics.speed = self.speed
        self.metrics.confusion_matrix = self.confusion_matrix

    def _process_batch(self, detections, labels, pred_masks=None, gt_masks=None, overlap=False, masks=False):
        """
        Return correct prediction matrix
        Arguments:
            detections (array[N, 6]), x1, y1, x2, y2, conf, class
            labels (array[M, 5]), class, x1, y1, x2, y2
        Returns:
            correct (array[N, 10]), for 10 IoU levels
        """
        if masks:
            if overlap:
                # Overlapping masks are encoded as one index map; expand to
                # one binary mask per label (index k+1 marks label k).
                nl = len(labels)
                index = torch.arange(nl, device=gt_masks.device).view(nl, 1, 1) + 1
                gt_masks = gt_masks.repeat(nl, 1, 1)  # shape(1,640,640) -> (n,640,640)
                gt_masks = torch.where(gt_masks == index, 1.0, 0.0)
            if gt_masks.shape[1:] != pred_masks.shape[1:]:
                gt_masks = F.interpolate(gt_masks[None], pred_masks.shape[1:], mode='bilinear', align_corners=False)[0]
                gt_masks = gt_masks.gt_(0.5)
            iou = mask_iou(gt_masks.view(gt_masks.shape[0], -1), pred_masks.view(pred_masks.shape[0], -1))
        else:  # boxes
            iou = box_iou(labels[:, 1:], detections[:, :4])

        correct = np.zeros((detections.shape[0], self.iouv.shape[0])).astype(bool)
        correct_class = labels[:, 0:1] == detections[:, 5]
        for i in range(len(self.iouv)):
            x = torch.where((iou >= self.iouv[i]) & correct_class)  # IoU > threshold and classes match
            if x[0].shape[0]:
                matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]),
                                    1).cpu().numpy()  # [label, detect, iou]
                # Greedy one-to-one matching: keep only the highest-IoU pair
                # for each detection and each label.
                if x[0].shape[0] > 1:
                    matches = matches[matches[:, 2].argsort()[::-1]]
                    matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
                    # matches = matches[matches[:, 2].argsort()[::-1]]
                    matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
                correct[matches[:, 1].astype(int), i] = True
        return torch.tensor(correct, dtype=torch.bool, device=detections.device)

    def plot_val_samples(self, batch, ni):
        """Plots validation samples with bounding box labels."""
        plot_images(batch['img'],
                    batch['batch_idx'],
                    batch['cls'].squeeze(-1),
                    batch['bboxes'],
                    batch['masks'],
                    paths=batch['im_file'],
                    fname=self.save_dir / f'val_batch{ni}_labels.jpg',
                    names=self.names,
                    on_plot=self.on_plot)

    def plot_predictions(self, batch, preds, ni):
        """Plots batch predictions with masks and bounding boxes."""
        plot_images(
            batch['img'],
            *output_to_target(preds[0], max_det=15),  # not set to self.args.max_det due to slow plotting speed
            torch.cat(self.plot_masks, dim=0) if len(self.plot_masks) else self.plot_masks,
            paths=batch['im_file'],
            fname=self.save_dir / f'val_batch{ni}_pred.jpg',
            names=self.names,
            on_plot=self.on_plot)  # pred
        self.plot_masks.clear()

    def pred_to_json(self, predn, filename, pred_masks):
        """Save one JSON result."""
        # Example result = {"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}
        from pycocotools.mask import encode  # noqa

        def single_encode(x):
            """Encode predicted masks as RLE and append results to jdict."""
            rle = encode(np.asarray(x[:, :, None], order='F', dtype='uint8'))[0]
            rle['counts'] = rle['counts'].decode('utf-8')
            return rle

        stem = Path(filename).stem
        image_id = int(stem) if stem.isnumeric() else stem
        box = ops.xyxy2xywh(predn[:, :4])  # xywh
        box[:, :2] -= box[:, 2:] / 2  # xy center to top-left corner
        pred_masks = np.transpose(pred_masks, (2, 0, 1))
        # RLE-encode masks in parallel; encoding is CPU-bound per mask.
        with ThreadPool(NUM_THREADS) as pool:
            rles = pool.map(single_encode, pred_masks)
        for i, (p, b) in enumerate(zip(predn.tolist(), box.tolist())):
            self.jdict.append({
                'image_id': image_id,
                'category_id': self.class_map[int(p[5])],
                'bbox': [round(x, 3) for x in b],
                'score': round(p[4], 5),
                'segmentation': rles[i]})

    def eval_json(self, stats):
        """Return COCO-style object detection evaluation metrics."""
        if self.args.save_json and self.is_coco and len(self.jdict):
            anno_json = self.data['path'] / 'annotations/instances_val2017.json'  # annotations
            pred_json = self.save_dir / 'predictions.json'  # predictions
            LOGGER.info(f'\nEvaluating pycocotools mAP using {pred_json} and {anno_json}...')
            try:  # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
                check_requirements('pycocotools>=2.0.6')
                from pycocotools.coco import COCO  # noqa
                from pycocotools.cocoeval import COCOeval  # noqa

                for x in anno_json, pred_json:
                    assert x.is_file(), f'{x} file not found'
                anno = COCO(str(anno_json))  # init annotations api
                pred = anno.loadRes(str(pred_json))  # init predictions api (must pass string, not Path)
                for i, eval in enumerate([COCOeval(anno, pred, 'bbox'), COCOeval(anno, pred, 'segm')]):
                    if self.is_coco:
                        eval.params.imgIds = [int(Path(x).stem) for x in self.dataloader.dataset.im_files]  # im to eval
                    eval.evaluate()
                    eval.accumulate()
                    eval.summarize()
                    idx = i * 4 + 2
                    stats[self.metrics.keys[idx + 1]], stats[
                        self.metrics.keys[idx]] = eval.stats[:2]  # update mAP50-95 and mAP50
            except Exception as e:
                LOGGER.warning(f'pycocotools unable to run: {e}')
        return stats
Loading…
Reference in new issue