`ultralytics 8.0.236` dataset semantic & SQL search API (#7136)

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: Laughing-q <1182102784@qq.com>
pull/7363/head^2 v8.0.236
Ayush Chaurasia 1 year ago committed by GitHub
parent 40a5c0abe7
commit aca8eb1fd4
27 changed files (lines changed):
  1. docs/ar/models/yolov5.md (2)
  2. docs/en/datasets/explorer/api.md (297)
  3. docs/en/datasets/explorer/dash.md (0)
  4. docs/en/datasets/explorer/explorer.ipynb (457)
  5. docs/en/datasets/explorer/index.md (31)
  6. docs/en/datasets/index.md (7)
  7. docs/en/guides/distance-calculation.md (1)
  8. docs/en/guides/heatmaps.md (5)
  9. docs/en/guides/object-counting.md (1)
  10. docs/en/guides/speed-estimation.md (17)
  11. docs/en/index.md (7)
  12. docs/en/integrations/comet.md (4)
  13. docs/en/models/yolov8.md (10)
  14. docs/en/tasks/obb.md (29)
  15. docs/en/usage/cfg.md (34)
  16. docs/hi/models/rtdetr.md (3)
  17. docs/mkdocs.yml (269)
  18. pyproject.toml (8)
  19. tests/test_explorer.py (50)
  20. ultralytics/__init__.py (5)
  21. ultralytics/cfg/__init__.py (13)
  22. ultralytics/data/explorer/__init__.py (0)
  23. ultralytics/data/explorer/explorer.py (403)
  24. ultralytics/data/explorer/gui/__init__.py (0)
  25. ultralytics/data/explorer/gui/dash.py (178)
  26. ultralytics/data/explorer/utils.py (103)
  27. ultralytics/utils/plotting.py (7)

@ -59,7 +59,7 @@ keywords: YOLOv5u، كشف الكائنات، النماذج المدربة مس
```python
from ultralytics import YOLO
#Load a YOLOv5n model pretrained on the COCO dataset
# Load a YOLOv5n model pretrained on the COCO dataset
model = YOLO('yolov5n.pt')
# Display model information (optional)

@ -0,0 +1,297 @@
---
comments: true
description: Explore and analyze CV datasets with Ultralytics Explorer API, offering SQL, vector similarity, and semantic searches for efficient dataset insights.
keywords: Ultralytics Explorer API, Dataset Exploration, SQL Queries, Vector Similarity Search, Semantic Search, Embeddings Table, Image Similarity, Python API for Datasets, CV Dataset Analysis, LanceDB Integration
---
# Ultralytics Explorer API
## Introduction
The Explorer API is a Python API for exploring your datasets. It supports filtering and searching your dataset using SQL queries, vector similarity search and semantic search.
## Installation
Explorer depends on external libraries for some of its functionality. These are automatically installed on usage. To manually install these dependencies, use the following command:
```bash
pip install ultralytics[explorer]
```
## Usage
```python
from ultralytics import Explorer
# Create an Explorer object
explorer = Explorer(data='coco128.yaml', model='yolov8n.pt')
# Create embeddings for your dataset
explorer.create_embeddings_table()
# Search for similar images to a given image/images
dataframe = explorer.get_similar(img='path/to/image.jpg')
# Or search for similar images to a given index/indices
dataframe = explorer.get_similar(idx=0)
```
## 1. Similarity Search
Similarity search is a technique for finding similar images to a given image. It is based on the idea that similar images will have similar embeddings.
Once the embeddings table is built, you can run semantic search in any of the following ways:
- On a given index or list of indices in the dataset, e.g. `exp.get_similar(idx=[1,10], limit=10)`
- On any image or list of images not in the dataset, e.g. `exp.get_similar(img=["path/to/img1", "path/to/img2"], limit=10)`
In case of multiple inputs, the aggregate of their embeddings is used.
You get a pandas dataframe with the `limit` number of most similar data points to the input, along with their distance in the embedding space. You can use this dataframe to perform further filtering.
!!! Example "Semantic Search"
=== "Using Images"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data='coco128.yaml', model='yolov8n.pt')
exp.create_embeddings_table()
similar = exp.get_similar(img='https://ultralytics.com/images/bus.jpg', limit=10)
print(similar.head())
# Search using multiple images
similar = exp.get_similar(
img=['https://ultralytics.com/images/bus.jpg',
'https://ultralytics.com/images/bus.jpg'],
limit=10
)
print(similar.head())
```
=== "Using Dataset Indices"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data='coco128.yaml', model='yolov8n.pt')
exp.create_embeddings_table()
similar = exp.get_similar(idx=1, limit=10)
print(similar.head())
# Search using multiple indices
similar = exp.get_similar(idx=[1,10], limit=10)
print(similar.head())
```
### Plotting Similar Images
You can also plot the similar images using the `plot_similar` method. This method takes the same arguments as `get_similar` and plots the similar images in a grid.
!!! Example "Plotting Similar Images"
=== "Using Images"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data='coco128.yaml', model='yolov8n.pt')
exp.create_embeddings_table()
plt = exp.plot_similar(img='https://ultralytics.com/images/bus.jpg', limit=10)
plt.show()
```
=== "Using Dataset Indices"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data='coco128.yaml', model='yolov8n.pt')
exp.create_embeddings_table()
plt = exp.plot_similar(idx=1, limit=10)
plt.show()
```
## 2. SQL Querying
You can run SQL queries on your dataset using the `sql_query` method. This method takes a SQL query as input and returns a pandas dataframe with the results.
!!! Example "SQL Query"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data='coco128.yaml', model='yolov8n.pt')
exp.create_embeddings_table()
df = exp.sql_query("WHERE labels LIKE '%person%' AND labels LIKE '%dog%'")
print(df.head())
```
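In addition to the `WHERE` shorthand shown above, you can also write full queries that select specific columns. A minimal sketch, assuming the embeddings table is addressed as `'table'` inside the query:
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data='coco128.yaml', model='yolov8n.pt')
exp.create_embeddings_table()
# Full query selecting specific columns ('table' refers to the embeddings table)
df = exp.sql_query("SELECT im_file, labels FROM 'table' WHERE labels LIKE '%person%' LIMIT 10")
print(df.head())
```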
### Plotting SQL Query Results
You can also plot the results of a SQL query using the `plot_sql_query` method. This method takes the same arguments as `sql_query` and plots the results in a grid.
!!! Example "Plotting SQL Query Results"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data='coco128.yaml', model='yolov8n.pt')
exp.create_embeddings_table()
plt = exp.plot_sql_query("WHERE labels LIKE '%person%' AND labels LIKE '%dog%'", labels=True)
plt.show()
```
## 3. Working with Embeddings Table (Advanced)
You can also work with the embeddings table directly. Once the embeddings table is created, you can access it using `Explorer.table`.
!!! Tip
    Explorer works on [LanceDB](https://lancedb.github.io/lancedb/) tables internally. You can access this table directly, using the `Explorer.table` object, and run raw queries, push down pre- and post-filters, etc.
```python
from ultralytics import Explorer
exp = Explorer()
exp.create_embeddings_table()
table = exp.table
```
Here are some examples of what you can do with the table:
### Get raw Embeddings
!!! Example
```python
from ultralytics import Explorer
exp = Explorer()
exp.create_embeddings_table()
table = exp.table
embeddings = table.to_pandas()["vector"]
print(embeddings)
```
### Advanced Querying with pre- and post-filters
!!! Example
```python
from ultralytics import Explorer
exp = Explorer(model="yolov8n.pt")
exp.create_embeddings_table()
table = exp.table
# Dummy embedding
embedding = [i for i in range(256)]
rs = table.search(embedding).metric("cosine").where("labels LIKE '%person%'").limit(10)
```
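To materialize the results of the raw query as a pandas DataFrame, call `to_pandas()` on the query builder, as the example notebook does:
```python
# Materialize the raw query results as a DataFrame
df = rs.to_pandas()
print(df.head())
```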
### Create Vector Index
When using large datasets, you can also create a dedicated vector index for faster querying. This is done using the `create_index` method on the LanceDB table.
```python
table.create_index(num_partitions=..., num_sub_vectors=...)
```
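For illustration, here is a sketch with assumed placeholder values; appropriate values depend on your dataset size (see the LanceDB docs linked below):
```python
# Example values only; tune num_partitions and num_sub_vectors for your dataset
table.create_index(num_partitions=256, num_sub_vectors=96)
```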
Find more details on the types of vector indices available and their parameters [here](https://lancedb.github.io/lancedb/ann_indexes/#types-of-index).
In the future, we will add support for creating vector indices directly from the Explorer API.
## 4. Embeddings Applications
You can use the embeddings table to perform a variety of exploratory analyses. Here are some examples:
### Similarity Index
Explorer comes with a `similarity_index` operation:
* It tries to estimate how similar each data point is with the rest of the dataset.
* It does that by counting how many image embeddings lie closer than `max_dist` to the current image in the generated embedding space, considering `top_k` similar images at a time.
It returns a pandas dataframe with the following columns:
* `idx`: Index of the image in the dataset
* `im_file`: Path to the image file
* `count`: Number of images in the dataset that are closer than `max_dist` to the current image
* `sim_im_files`: List of paths to the `count` similar images
!!! Tip
For a given dataset, model, `max_dist` & `top_k`, the similarity index, once generated, will be reused. In case your dataset has changed, or you simply need to regenerate the similarity index, you can pass `force=True`.
!!! Example "Similarity Index"
```python
from ultralytics import Explorer
exp = Explorer()
exp.create_embeddings_table()
sim_idx = exp.similarity_index()
```
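The `max_dist`, `top_k` and `force` arguments can also be passed explicitly, as in the example notebook:
```python
sim_idx = exp.similarity_index(max_dist=0.2, top_k=0.01, force=False)
```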
You can use the similarity index to build custom conditions for filtering the dataset. For example, you can select images that are close to more than 30 other images in the dataset using the following code:
```python
import numpy as np
sim_count = np.array(sim_idx["count"])
sim_idx['im_file'][sim_count > 30]
```
### Visualize Embedding Space
You can also visualize the embedding space using the plotting tool of your choice. For example, here is a simple snippet using matplotlib:
```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Get the raw embeddings from the table built above
embeddings = np.array(table.to_pandas()['vector'].tolist())
# Reduce dimensions using PCA to 3 components for visualization in 3D
pca = PCA(n_components=3)
reduced_data = pca.fit_transform(embeddings)
# Create a 3D scatter plot using Matplotlib Axes3D
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
# Scatter plot
ax.scatter(reduced_data[:, 0], reduced_data[:, 1], reduced_data[:, 2], alpha=0.5)
ax.set_title('3D Scatter Plot of Reduced 256-Dimensional Data (PCA)')
ax.set_xlabel('Component 1')
ax.set_ylabel('Component 2')
ax.set_zlabel('Component 3')
plt.show()
```
Start creating your own CV dataset exploration reports using the Explorer API. For inspiration, check out the apps built using it below.
## Apps Built Using Ultralytics Explorer
Try our GUI Demo based on the Explorer API.
## Coming Soon
- [ ] Merge specific labels from datasets. Example - Import all `person` labels from COCO and `car` labels from Cityscapes
- [ ] Remove images that have a higher similarity index than the given threshold
- [ ] Automatically persist new datasets after merging/removing entries
- [ ] Advanced Dataset Visualizations

@ -0,0 +1,457 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "aa923c26-81c8-4565-9277-1cb686e3702e",
"metadata": {},
"source": [
"# VOC Exploration Example \n",
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हि](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://console.paperspace.com/github/ultralytics/ultralytics\"><img src=\"https://assets.paperspace.io/img/gradient-badge.svg\" alt=\"Run on Gradient\"/></a>\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/tutorial.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
" <a href=\"https://www.kaggle.com/ultralytics/yolov8\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" alt=\"Open In Kaggle\"></a>\n",
"\n",
"Welcome to the Ultralytics Explorer API notebook! This notebook serves as the starting point for exploring the various resources available to help you get started with using Ultralytics to explore your datasets using with the power of semantic search. You can utilities out of the box that allow you to examine specific types of labels using vector search or even SQL queries.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of Ultralytics. Please browse the Explorer <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"Try `yolo explorer` powered by Exlorer API\n",
"\n",
"Simply `pip install ultralytics` and run `yolo explorer` in your terminal to run custom queries and semantic search on your datasets right inside your browser!\n",
"\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "2454d9ba-9db4-4b37-98e8-201ba285c92f",
"metadata": {},
"source": [
"## Setup\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "433f3a4d-a914-42cb-b0b6-be84a84e5e41",
"metadata": {},
"outputs": [],
"source": [
"%pip install ultralytics\n",
"ultralytics.checks()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae602549-3419-4909-9f82-35cba515483f",
"metadata": {},
"outputs": [],
"source": [
"from ultralytics import Explorer"
]
},
{
"cell_type": "markdown",
"id": "d8c06350-be8e-45cf-b3a6-b5017bbd943c",
"metadata": {},
"source": [
"# Similarity search\n",
"Utilize the power of vector similarity search to find the similar data points in your dataset along with their distance in the embedding space. Simply create an embeddings table for the given dataset-model pair. It is only needed once and it is reused automatically.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "334619da-6deb-4b32-9fe0-74e0a79cee20",
"metadata": {},
"outputs": [],
"source": [
"exp = Explorer(\"VOC.yaml\", model=\"yolov8n.pt\")\n",
"exp.create_embeddings_table()"
]
},
{
"cell_type": "markdown",
"id": "b6c5e42d-bc7e-4b4c-bde0-643072a2165d",
"metadata": {},
"source": [
"One the embeddings table is built, you can get run semantic search in any of the following ways:\n",
"- On a given index / list of indices in the dataset like - `exp.get_similar(idx=[1,10], limit=10)`\n",
"- On any image/ list of images not in the dataset - `exp.get_similar(img=[\"path/to/img1\", \"path/to/img2\"], limit=10)`\n",
"In case of multiple inputs, the aggregade of their embeddings is used.\n",
"\n",
"You get a pandas dataframe with the `limit` number of most similar data points to the input, along with their distance in the embedding space. You can use this dataset to perform further filtering\n",
"<img width=\"1120\" alt=\"Screenshot 2024-01-06 at 9 45 42PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/7742ac57-e22a-4cea-a0f9-2b2a257483c5\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b485f05b-d92d-42bc-8da7-5e361667b341",
"metadata": {},
"outputs": [],
"source": [
"similar = exp.get_similar(idx=1, limit=10)\n",
"similar.head()"
]
},
{
"cell_type": "markdown",
"id": "acf4b489-2161-4176-a1fe-d1d067d8083d",
"metadata": {},
"source": [
"You can use the also plot the similar samples directly using the `plot_similar` util\n",
"<img width=\"689\" alt=\"Screenshot 2024-01-06 at 9 46 48PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/70e1a4c4-6c67-4664-b77a-ad27b1fba8f8\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9dbfe7d0-8613-4529-adb6-6e0632d7cce7",
"metadata": {},
"outputs": [],
"source": [
"exp.plot_similar(idx=6500, limit=20)\n",
"#exp.plot_similar(idx=[100,101], limit=10) # Can also pass list of idxs or imgs\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "260e09bf-4960-4089-a676-cb0e76ff3c0d",
"metadata": {},
"outputs": [],
"source": [
"exp.plot_similar(img=\"https://ultralytics.com/images/bus.jpg\", limit=10, labels=False) # Can also pass any external images\n"
]
},
{
"cell_type": "markdown",
"id": "faa0b7a7-6318-40e4-b0f4-45a8113bdc3a",
"metadata": {},
"source": [
"<p>\n",
"<img width=\"766\" alt=\"Screenshot 2024-01-06 at 10 05 10PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/faa9c544-d96b-4528-a2ea-95c5d8856744\">\n",
"\n",
"</p>"
]
},
{
"cell_type": "markdown",
"id": "35315ae6-d827-40e4-8813-279f97a83b34",
"metadata": {},
"source": [
"## 2. Run SQL queries on your Dataset!\n",
"Sometimes you might want to investigate a certain type of entries in your dataset. For this Explorer allows you to execute SQL queries.\n",
"It accepts either of the formats:\n",
"- Queries beginning with \"WHERE\" will automatically select all columns. This can be thought of as a short-hand query\n",
"- You can also write full queries where you can specify which columns to select\n",
"\n",
"This can be used to investigate model performance and specific data points. For example:\n",
"- let's say your model struggles on images that have humans and dogs. You can write a query like this to select the points that have at least 2 humans AND at least one dog.\n",
"\n",
"You can combine SQL query and semantic search to filter down to specific type of results\n",
"<img width=\"994\" alt=\"Screenshot 2024-01-06 at 9 47 30PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/92bc3178-c151-4cd5-8007-c76178deb113\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8cd1072f-3100-4331-a0e3-4e2f6b1005bf",
"metadata": {},
"outputs": [],
"source": [
"table = exp.sql_query(\"WHERE labels LIKE '%person, person%' AND labels LIKE '%dog%' LIMIT 10\")\n",
"table"
]
},
{
"cell_type": "markdown",
"id": "debf8a00-c9f6-448b-bd3b-454cf62f39ab",
"metadata": {},
"source": [
"Just like similarity search, you also get a util to directly plot the sql queries using `exp.plot_sql_query`\n",
"<img width=\"771\" alt=\"Screenshot 2024-01-06 at 9 48 08PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/332f5acd-3a4e-462d-a281-5d5effd1886e\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18b977e7-d048-4b22-b8c4-084a03b04f23",
"metadata": {},
"outputs": [],
"source": [
"exp.plot_sql_query(\"WHERE labels LIKE '%person, person%' AND labels LIKE '%dog%' LIMIT 10\", labels=True)"
]
},
{
"cell_type": "markdown",
"id": "f26804c5-840b-4fd1-987f-e362f29e3e06",
"metadata": {},
"source": [
"## 3. Working with embeddings Table (Advanced)\n",
"Explorer works on [LanceDB](https://lancedb.github.io/lancedb/) tables internally. You can access this table directly, using `Explorer.table` object and run raw queries, push down pre and post filters, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea69260a-3407-40c9-9f42-8b34a6e6af7a",
"metadata": {},
"outputs": [],
"source": [
"table = exp.table\n",
"table.schema"
]
},
{
"cell_type": "markdown",
"id": "238db292-8610-40b3-9af7-dfd6be174892",
"metadata": {},
"source": [
"### Run raw queries\n",
"Vector Search finds the nearest vectors from the database. In a recommendation system or search engine, you can find similar products from the one you searched. In LLM and other AI applications, each data point can be presented by the embeddings generated from some models, it returns the most relevant features.\n",
"\n",
"A search in high-dimensional vector space, is to find K-Nearest-Neighbors (KNN) of the query vector.\n",
"\n",
"Metric\n",
"In LanceDB, a Metric is the way to describe the distance between a pair of vectors. Currently, it supports the following metrics:\n",
"- L2\n",
"- Cosine\n",
"- Dot\n",
"Explorer's similarity search uses L2 by default. You can run queries on tables directly, or use the lance format to build custom utilities to manage datasets. More details on available LanceDB table ops in the [docs](https://lancedb.github.io/lancedb/)\n",
"\n",
"<img width=\"1015\" alt=\"Screenshot 2024-01-06 at 9 48 35PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/a2ccdaf3-8877-4f70-bf47-8a9bd2bb20c0\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d74430fe-5aee-45a1-8863-3f2c31338792",
"metadata": {},
"outputs": [],
"source": [
"dummy_img_embedding = [i for i in range(256)] \n",
"table.search(dummy_img_embedding).limit(5).to_pandas()"
]
},
{
"cell_type": "markdown",
"id": "587486b4-0d19-4214-b994-f032fb2e8eb5",
"metadata": {},
"source": [
"### Inter-conversion to popular data formats"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb2876ea-999b-4eba-96bc-c196ba02c41c",
"metadata": {},
"outputs": [],
"source": [
"df = table.to_pandas()\n",
"pa_table = table.to_arrow()\n"
]
},
{
"cell_type": "markdown",
"id": "42659d63-ad76-49d6-8dfc-78d77278db72",
"metadata": {},
"source": [
"### Work with Embeddings\n",
"You can access the raw embedding from lancedb Table and analyse it. The image embeddings are stored in column `vector`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66d69e9b-046e-41c8-80d7-c0ee40be3bca",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"embeddings = table.to_pandas()[\"vector\"].tolist()\n",
"embeddings = np.array(embeddings)"
]
},
{
"cell_type": "markdown",
"id": "e8df0a49-9596-4399-954b-b8ae1fd7a602",
"metadata": {},
"source": [
"### Scatterplot\n",
"One of the preliminary steps in analysing embeddings is by plotting them in 2D space via dimensionality reduction. Let's try an example\n",
"\n",
"<img width=\"646\" alt=\"Screenshot 2024-01-06 at 9 48 58PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/9e1da25c-face-4426-abc0-2f64a4e4952c\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9a150e8-8092-41b3-82f8-2247f8187fc8",
"metadata": {},
"outputs": [],
"source": [
"!pip install scikit-learn --q"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "196079c3-45a9-4325-81ab-af79a881e37a",
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"import matplotlib.pyplot as plt\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"\n",
"# Reduce dimensions using PCA to 3 components for visualization in 3D\n",
"pca = PCA(n_components=3)\n",
"reduced_data = pca.fit_transform(embeddings)\n",
"\n",
"# Create a 3D scatter plot using Matplotlib's Axes3D\n",
"fig = plt.figure(figsize=(8, 6))\n",
"ax = fig.add_subplot(111, projection='3d')\n",
"\n",
"# Scatter plot\n",
"ax.scatter(reduced_data[:, 0], reduced_data[:, 1], reduced_data[:, 2], alpha=0.5)\n",
"ax.set_title('3D Scatter Plot of Reduced 256-Dimensional Data (PCA)')\n",
"ax.set_xlabel('Component 1')\n",
"ax.set_ylabel('Component 2')\n",
"ax.set_zlabel('Component 3')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "1c843c23-e3f2-490e-8d6c-212fa038a149",
"metadata": {},
"source": [
"## 4. Similarity Index\n",
"Here's a simple example of an operation powered by the embeddings table. Explorer comes with a `similarity_index` operation-\n",
"* It tries to estimate how similar each data point is with the rest of the dataset.\n",
"* It does that by counting how many image embeddings lie closer than `max_dist` to the current image in the generated embedding space, considering `top_k` similar images at a time.\n",
"\n",
"For a given dataset, model, `max_dist` & `top_k` the similarity index once generated will be reused. In case, your dataset has changed, or you simply need to regenerate the similarity index, you can pass `force=True`.\n",
"Similar to vector and SQL search, this also comes with a util to directly plot it. Let's look at the plot first\n",
"<img width=\"633\" alt=\"Screenshot 2024-01-06 at 9 49 36PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/96a9d984-4a72-4784-ace1-428676ee2bdd\">\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "953c2a5f-1b61-4acf-a8e4-ed08547dbafc",
"metadata": {},
"outputs": [],
"source": [
"exp.plot_similarity_index(max_dist=0.2, top_k=0.01)"
]
},
{
"cell_type": "markdown",
"id": "28228a9a-b727-45b5-8ca7-8db662c0b937",
"metadata": {},
"source": [
"Now let's look at the output of the operation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4161aaa-20e6-4df0-8e87-d2293ee0530a",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"sim_idx = exp.similarity_index(max_dist=0.2, top_k=0.01, force=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b01d5b1a-9adb-4c3c-a873-217c71527c8d",
"metadata": {},
"outputs": [],
"source": [
"sim_idx"
]
},
{
"cell_type": "markdown",
"id": "22b28e54-4fbb-400e-ad8c-7068cbba11c4",
"metadata": {},
"source": [
"Let's create a query to see what data points have similarity count of more than 30 and plot images similar to them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58d2557b-d401-43cf-937d-4f554c7bc808",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"sim_count = np.array(sim_idx[\"count\"])\n",
"sim_idx['im_file'][sim_count > 30]"
]
},
{
"cell_type": "markdown",
"id": "a5ec8d76-271a-41ab-ac74-cf8c0084ba5e",
"metadata": {},
"source": [
"You should see something like this\n",
"<img width=\"897\" alt=\"Screenshot 2024-01-06 at 9 50 48PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/5d3f0e35-2ad4-4a67-8df7-3a4c17867b72\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a7b2ee3-9f35-48a2-9c38-38379516f4d2",
"metadata": {},
"outputs": [],
"source": [
"exp.plot_similar(idx=[7146, 14035]) # Using avg embeddings of 2 images"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,31 @@
---
comments: true
description: Discover the Ultralytics Explorer, a versatile tool and Python API for CV dataset exploration, enabling semantic search, SQL queries, and vector similarity searches.
keywords: Ultralytics Explorer, CV Dataset Tools, Semantic Search, SQL Dataset Queries, Vector Similarity, Python API, GUI Explorer, Dataset Analysis, YOLO Explorer, Data Insights
---
# Ultralytics Explorer
Ultralytics Explorer is a tool for exploring CV datasets using semantic search, SQL queries and vector similarity search. It also provides a Python API for accessing the same functionality.
### Installation of optional dependencies
Explorer depends on external libraries for some of its functionality. These are automatically installed on usage. To manually install these dependencies, use the following command:
```bash
pip install ultralytics[explorer]
```
## GUI Explorer Usage
The GUI demo runs in your browser, allowing you to create embeddings for your dataset, search for similar images, run SQL queries and perform semantic search. It can be run using the following command:
```bash
yolo explorer
```
### Explorer API
This is a Python API for exploring your datasets. It also powers the GUI Explorer. You can use it to create your own exploratory notebooks or scripts to get insights into your datasets.
Learn more about the Explorer API [here](api.md).
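A minimal sketch of what the API looks like:
```python
from ultralytics import Explorer
# Create an Explorer object and build the embeddings table
exp = Explorer(data='coco128.yaml', model='yolov8n.pt')
exp.create_embeddings_table()
# Semantic search for images similar to a given dataset index
similar = exp.get_similar(idx=0, limit=10)
print(similar.head())
```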

@ -8,6 +8,13 @@ keywords: computer vision, datasets, Ultralytics, YOLO, object detection, instan
Ultralytics provides support for various datasets to facilitate computer vision tasks such as detection, instance segmentation, pose estimation, classification, and multi-object tracking. Below is a list of the main Ultralytics datasets, followed by a summary of each computer vision task and the respective datasets.
## 🌟 New: Ultralytics Explorer 🌟
Create embeddings for your dataset, search for similar images, run SQL queries and perform semantic search. You can get started with our GUI app or build your own using the API. Learn more [here](explorer/index.md).
- Try the [GUI Demo](explorer/index.md)
- Learn more about the [Explorer API](explorer/index.md)
## [Detection Datasets](detect/index.md)
Bounding box object detection is a computer vision technique that involves detecting and localizing objects in an image by drawing a bounding box around each object.

@ -65,7 +65,6 @@ Measuring the gap between two objects is known as distance calculation within a
- Mouse Right Click will delete all drawn points
- Mouse Left Click can be used to draw points
### Optional Arguments `set_args`
| Name | Type | Default | Description |

@ -25,7 +25,7 @@ A heatmap generated with [Ultralytics YOLOv8](https://github.com/ultralytics/ult
- **Intuitive Data Distribution Visualization:** Heatmaps simplify the comprehension of data concentration and distribution, converting complex datasets into easy-to-understand visual formats.
- **Efficient Pattern Detection:** By visualizing data in heatmap format, it becomes easier to spot trends, clusters, and outliers, facilitating quicker analysis and insights.
- **Enhanced Spatial Analysis and Decision Making:** Heatmaps are instrumental in illustrating spatial relationships, aiding in decision-making processes in sectors such as business intelligence, environmental studies, and urban planning.
- **Enhanced Spatial Analysis and Decision-Making:** Heatmaps are instrumental in illustrating spatial relationships, aiding in decision-making processes in sectors such as business intelligence, environmental studies, and urban planning.
## Real World Applications
@ -34,12 +34,11 @@ A heatmap generated with [Ultralytics YOLOv8](https://github.com/ultralytics/ult
| ![Ultralytics YOLOv8 Transportation Heatmap](https://github.com/RizwanMunawar/ultralytics/assets/62513924/288d7053-622b-4452-b4e4-1f41aeb764aa) | ![Ultralytics YOLOv8 Retail Heatmap](https://github.com/RizwanMunawar/ultralytics/assets/62513924/edef75ad-50a7-4c0a-be4a-a66cdfc12802) |
| Ultralytics YOLOv8 Transportation Heatmap | Ultralytics YOLOv8 Retail Heatmap |
!!! tip "Heatmap Configuration"
???+ tip "Heatmap Configuration"
- `heatmap_alpha`: Ensure this value is within the range (0.0 - 1.0).
- `decay_factor`: Used for removing heatmap after an object is no longer in the frame, its value should also be in the range (0.0 - 1.0).
!!! Example "Heatmaps using Ultralytics YOLOv8 Example"
=== "Heatmap"

@ -167,7 +167,6 @@ Object counting with [Ultralytics YOLOv8](https://github.com/ultralytics/ultraly
### Optional Arguments `set_args`
| Name | Type | Default | Description |
|---------------------|-------------|----------------------------|-----------------------------------------------|
| view_img | `bool` | `False` | Display frames with counts |

@ -73,17 +73,16 @@ Speed estimation is the process of calculating the rate of movement of an object
Speed will be an estimate and may not be completely accurate. Additionally, the estimation can vary depending on GPU speed.
### Optional Arguments `set_args`
| Name | Type | Default | Description |
|---------------------|-------------|----------------------------|---------------------------------------------------|
| reg_pts | `list` | `[(20, 400), (1260, 400)]` | Points defining the Region Area |
| names | `dict` | `None` | Classes names |
| view_img | `bool` | `False` | Display frames with counts |
| line_thickness | `int` | `2` | Increase bounding boxes thickness |
| region_thickness | `int` | `5` | Thickness for object counter region or line |
| spdl_dist_thresh | `int` | `10` | Euclidean Distance threshold for speed check line |
| Name | Type | Default | Description |
|------------------|--------|----------------------------|---------------------------------------------------|
| reg_pts | `list` | `[(20, 400), (1260, 400)]` | Points defining the Region Area |
| names | `dict` | `None` | Classes names |
| view_img | `bool` | `False` | Display frames with counts |
| line_thickness | `int` | `2` | Increase bounding boxes thickness |
| region_thickness | `int` | `5` | Thickness for object counter region or line |
| spdl_dist_thresh | `int` | `10` | Euclidean Distance threshold for speed check line |
### Arguments `model.track`

@ -39,12 +39,17 @@ Introducing [Ultralytics](https://ultralytics.com) [YOLOv8](https://github.com/u
Explore the YOLOv8 Docs, a comprehensive resource designed to help you understand and utilize its features and capabilities. Whether you are a seasoned machine learning practitioner or new to the field, this hub aims to maximize YOLOv8's potential in your projects
# 🌟 New: Ultralytics Explorer 🌟
Create embeddings for your dataset, search for similar images, run SQL queries and perform semantic search. You can get started with our GUI app or build your own using the API. Learn more [here](datasets/explorer/index.md).
## Where to Start
- **Install** `ultralytics` with pip and get up and running in minutes &nbsp; [:material-clock-fast: Get Started](quickstart.md){ .md-button }
- **Predict** new images and videos with YOLOv8 &nbsp; [:octicons-image-16: Predict on Images](modes/predict.md){ .md-button }
- **Train** a new YOLOv8 model on your own custom dataset &nbsp; [:fontawesome-solid-brain: Train a Model](modes/train.md){ .md-button }
- **Explore** YOLOv8 tasks like segment, classify, pose and track &nbsp; [:material-magnify-expand: Explore Tasks](tasks/index.md){ .md-button }
- **Tasks** YOLOv8 tasks like segment, classify, pose and track &nbsp; [:material-magnify-expand: Explore Tasks](tasks/index.md){ .md-button }
- **Explore** datasets with advanced semantic and SQL search &nbsp; [:material-magnify-expand: Run Explorer](datasets/explorer/index.md){ .md-button }
<p align="center">
<br>

@ -133,6 +133,7 @@ You can control the number of image predictions that Comet ML logs during your e
```python
import os
os.environ["COMET_MAX_IMAGE_PREDICTIONS"] = "200"
```
@ -142,6 +143,7 @@ Comet ML allows you to specify how often batches of image predictions are logged
```python
import os
os.environ['COMET_EVAL_BATCH_LOGGING_INTERVAL'] = "4"
```
@ -151,6 +153,7 @@ In some cases, you may not want to log the confusion matrix from your validation
```python
import os
os.environ["COMET_EVAL_LOG_CONFUSION_MATRIX"] = "false"
```
@ -160,6 +163,7 @@ If you find yourself in a situation where internet access is limited, Comet ML p
```python
import os
os.environ["COMET_MODE"] = "offline"
```

@ -38,11 +38,11 @@ Each variant of the YOLOv8 series is optimized for its respective task, ensuring
| Model | Filenames | Task | Inference | Validation | Training | Export |
|-------------|----------------------------------------------------------------------------------------------------------------|----------------------------------------------|-----------|------------|----------|--------|
| YOLOv8 | `yolov8n.pt` `yolov8s.pt` `yolov8m.pt` `yolov8l.pt` `yolov8x.pt` | [Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8-seg | `yolov8n-seg.pt` `yolov8s-seg.pt` `yolov8m-seg.pt` `yolov8l-seg.pt` `yolov8x-seg.pt` | [Instance Segmentation](../tasks/segment.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8-pose | `yolov8n-pose.pt` `yolov8s-pose.pt` `yolov8m-pose.pt` `yolov8l-pose.pt` `yolov8x-pose.pt` `yolov8x-pose-p6.pt` | [Pose/Keypoints](../tasks/pose.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8-obb | `yolov8n-obb.pt` `yolov8s-obb.pt` `yolov8m-obb.pt` `yolov8l-obb.pt` `yolov8x-obb.pt` | [Oriented Detection](../tasks/obb.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8-cls | `yolov8n-cls.pt` `yolov8s-cls.pt` `yolov8m-cls.pt` `yolov8l-cls.pt` `yolov8x-cls.pt` | [Classification](../tasks/classify.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8 | `yolov8n.pt` `yolov8s.pt` `yolov8m.pt` `yolov8l.pt` `yolov8x.pt` | [Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8-seg | `yolov8n-seg.pt` `yolov8s-seg.pt` `yolov8m-seg.pt` `yolov8l-seg.pt` `yolov8x-seg.pt` | [Instance Segmentation](../tasks/segment.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8-pose | `yolov8n-pose.pt` `yolov8s-pose.pt` `yolov8m-pose.pt` `yolov8l-pose.pt` `yolov8x-pose.pt` `yolov8x-pose-p6.pt` | [Pose/Keypoints](../tasks/pose.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8-obb | `yolov8n-obb.pt` `yolov8s-obb.pt` `yolov8m-obb.pt` `yolov8l-obb.pt` `yolov8x-obb.pt` | [Oriented Detection](../tasks/obb.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8-cls | `yolov8n-cls.pt` `yolov8s-cls.pt` `yolov8m-cls.pt` `yolov8l-cls.pt` `yolov8x-cls.pt` | [Classification](../tasks/classify.md) | ✅ | ✅ | ✅ | ✅ |
This table provides an overview of the YOLOv8 model variants, highlighting their applicability in specific tasks and their compatibility with various operational modes such as Inference, Validation, Training, and Export. It showcases the versatility and robustness of the YOLOv8 series, making them suitable for a variety of applications in computer vision.

@ -35,6 +35,7 @@ YOLOv8 pretrained Obb models are shown here, which are pretrained on the [DOTAv1
| [YOLOv8x-obb](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8x-obb.pt) | 1024 | <++> | <++> | <++> | 69.5 | 676.7 |
<!-- TODO: should we report multi-scale results only as they're better or both multi-scale and single-scale. -->
- **mAP<sup>val</sup>** values are for single-model single-scale on [DOTAv1 test](http://cocodataset.org) dataset.
<br>Reproduce by `yolo val obb data=DOTAv1.yaml device=0`
- **Speed** averaged over DOTAv1 val images using an [Amazon EC2 P4d](https://aws.amazon.com/ec2/instance-types/p4/)
@ -76,7 +77,7 @@ Train YOLOv8n-obb on the dota128.yaml dataset for 100 epochs at image size 640.
### Dataset format
yolo obb dataset format can be found in detail in the [Dataset Guide](../datasets/obb/index.md)..
OBB dataset format can be found in detail in the [Dataset Guide](../datasets/obb/index.md).
## Val
@ -164,18 +165,18 @@ Available YOLOv8-obb export formats are in the table below. You can predict or v
| Format | `format` Argument | Model | Metadata | Arguments |
|--------------------------------------------------------------------|-------------------|-------------------------------|----------|-----------------------------------------------------|
| [PyTorch](https://pytorch.org/) | - | `yolov8n-obb.pt` | ✅ | - |
| [TorchScript](https://pytorch.org/docs/stable/jit.html) | `torchscript` | `yolov8n-obb.torchscript` | ✅ | `imgsz`, `optimize` |
| [ONNX](https://onnx.ai/) | `onnx` | `yolov8n-obb.onnx` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `opset` |
| [OpenVINO](https://docs.openvino.ai/latest/index.html) | `openvino` | `yolov8n-obb_openvino_model/` | ✅ | `imgsz`, `half` |
| [TensorRT](https://developer.nvidia.com/tensorrt) | `engine` | `yolov8n-obb.engine` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `workspace` |
| [CoreML](https://github.com/apple/coremltools) | `coreml` | `yolov8n-obb.mlpackage` | ✅ | `imgsz`, `half`, `int8`, `nms` |
| [TF SavedModel](https://www.tensorflow.org/guide/saved_model) | `saved_model` | `yolov8n-obb_saved_model/` | ✅ | `imgsz`, `keras` |
| [TF GraphDef](https://www.tensorflow.org/api_docs/python/tf/Graph) | `pb` | `yolov8n-obb.pb` | ❌ | `imgsz` |
| [TF Lite](https://www.tensorflow.org/lite) | `tflite` | `yolov8n-obb.tflite` | ✅ | `imgsz`, `half`, `int8` |
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n-obb_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n-obb_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n-obb_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-obb_ncnn_model/` | ✅ | `imgsz`, `half` |
| [PyTorch](https://pytorch.org/) | - | `yolov8n-obb.pt` | ✅ | - |
| [TorchScript](https://pytorch.org/docs/stable/jit.html) | `torchscript` | `yolov8n-obb.torchscript` | ✅ | `imgsz`, `optimize` |
| [ONNX](https://onnx.ai/) | `onnx` | `yolov8n-obb.onnx` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `opset` |
| [OpenVINO](https://docs.openvino.ai/latest/index.html) | `openvino` | `yolov8n-obb_openvino_model/` | ✅ | `imgsz`, `half` |
| [TensorRT](https://developer.nvidia.com/tensorrt) | `engine` | `yolov8n-obb.engine` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `workspace` |
| [CoreML](https://github.com/apple/coremltools) | `coreml` | `yolov8n-obb.mlpackage` | ✅ | `imgsz`, `half`, `int8`, `nms` |
| [TF SavedModel](https://www.tensorflow.org/guide/saved_model) | `saved_model` | `yolov8n-obb_saved_model/` | ✅ | `imgsz`, `keras` |
| [TF GraphDef](https://www.tensorflow.org/api_docs/python/tf/Graph) | `pb` | `yolov8n-obb.pb` | ❌ | `imgsz` |
| [TF Lite](https://www.tensorflow.org/lite) | `tflite` | `yolov8n-obb.tflite` | ✅ | `imgsz`, `half`, `int8` |
| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n-obb_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n-obb_web_model/` | ✅ | `imgsz`, `half`, `int8` |
| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n-obb_paddle_model/` | ✅ | `imgsz` |
| [ncnn](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n-obb_ncnn_model/` | ✅ | `imgsz`, `half` |
See full `export` details in the [Export](https://docs.ultralytics.com/modes/export/) page.
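For example, an ONNX export of the pretrained OBB model might look like this (a minimal sketch using the standard `export` API):
```python
from ultralytics import YOLO
# Load a pretrained YOLOv8n-obb model and export it to ONNX
model = YOLO('yolov8n-obb.pt')
model.export(format='onnx')
```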

@ -224,23 +224,23 @@ Export settings for YOLO models encompass configurations and options related to
Augmentation settings for YOLO models refer to the various transformations and modifications applied to the training data to increase the diversity and size of the dataset. These settings can affect the model's performance, speed, and accuracy. Some common YOLO augmentation settings include the type and intensity of the transformations applied (e.g. random flips, rotations, cropping, color changes), the probability with which each transformation is applied, and the presence of additional features such as masks or multiple labels per box. Other factors that may affect the augmentation process include the size and composition of the original dataset and the specific task the model is being used for. It is important to carefully tune and experiment with these settings to ensure that the augmented dataset is diverse and representative enough to train a high-performing model.
| Key | Value | Description |
|-----------------|-----------------|--------------------------------------------------------------------------------|
| `hsv_h` | `0.015` | image HSV-Hue augmentation (fraction) |
| `hsv_s` | `0.7` | image HSV-Saturation augmentation (fraction) |
| `hsv_v` | `0.4` | image HSV-Value augmentation (fraction) |
| `degrees` | `0.0` | image rotation (+/- deg) |
| `translate` | `0.1` | image translation (+/- fraction) |
| `scale` | `0.5` | image scale (+/- gain) |
| `shear` | `0.0` | image shear (+/- deg) |
| `perspective` | `0.0` | image perspective (+/- fraction), range 0-0.001 |
| `flipud` | `0.0` | image flip up-down (probability) |
| `fliplr` | `0.5` | image flip left-right (probability) |
| `mosaic` | `1.0` | image mosaic (probability) |
| `mixup` | `0.0` | image mixup (probability) |
| `copy_paste` | `0.0` | segment copy-paste (probability) |
| `auto_augment` | `'randaugment'` | auto augmentation policy for classification (randaugment, autoaugment, augmix) |
| `erasing` | `0.4` | probability o random erasing during classification training (0-1) training |
| Key | Value | Description |
|----------------|-----------------|--------------------------------------------------------------------------------|
| `hsv_h` | `0.015` | image HSV-Hue augmentation (fraction) |
| `hsv_s` | `0.7` | image HSV-Saturation augmentation (fraction) |
| `hsv_v` | `0.4` | image HSV-Value augmentation (fraction) |
| `degrees` | `0.0` | image rotation (+/- deg) |
| `translate` | `0.1` | image translation (+/- fraction) |
| `scale` | `0.5` | image scale (+/- gain) |
| `shear` | `0.0` | image shear (+/- deg) |
| `perspective` | `0.0` | image perspective (+/- fraction), range 0-0.001 |
| `flipud` | `0.0` | image flip up-down (probability) |
| `fliplr` | `0.5` | image flip left-right (probability) |
| `mosaic` | `1.0` | image mosaic (probability) |
| `mixup` | `0.0` | image mixup (probability) |
| `copy_paste` | `0.0` | segment copy-paste (probability) |
| `auto_augment` | `'randaugment'` | auto augmentation policy for classification (randaugment, autoaugment, augmix) |
| `erasing`      | `0.4`           | probability of random erasing during classification training (0-1)             |
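For instance, these settings can be passed as overrides at train time; a minimal sketch using the default values from the table above:
```python
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
# Augmentation settings are passed as train-time overrides
model.train(data='coco128.yaml', epochs=3, hsv_h=0.015, fliplr=0.5, mosaic=1.0)
```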
## Logging, checkpoints, plotting and file management

@ -1,7 +1,6 @@
---
comments: true
description:
Overview of Baidu's RT-DETR: a Vision Transformer-based, advanced and adaptable real-time object detector, including pretrained models.
description: Overview of Baidu's RT-DETR: a Vision Transformer-based, advanced and adaptable real-time object detector, including pretrained models.
keywords: RT-DETR, Baidu, Vision Transformers, object detection, real-time performance, CUDA, TensorRT, IoU-aware query selection, Ultralytics, Python API, PaddlePaddle
---

@ -65,8 +65,7 @@ theme:
# Customization
copyright: <a href="https://ultralytics.com" target="_blank">© 2023 Ultralytics Inc.</a> All rights reserved.
extra:
# version:
extra: # version:
# provider: mike # version drop-down menu
robots: robots.txt
analytics:
@ -220,6 +219,11 @@ nav:
- RT-DETR (Realtime Detection Transformer): models/rtdetr.md
- Datasets:
- datasets/index.md
- Explorer:
- datasets/explorer/index.md
- Explorer API: datasets/explorer/api.md
- GUI Dashboard Demo: datasets/explorer/dash.md
- VOC Exploration Example: datasets/explorer/explorer.ipynb
- Detection:
- datasets/detect/index.md
- Argoverse: datasets/detect/argoverse.md
@ -341,137 +345,137 @@ nav:
- Inference API: hub/inference_api.md
- Reference:
- cfg:
- __init__: reference/cfg/__init__.md
- data:
- annotator: reference/data/annotator.md
- augment: reference/data/augment.md
- base: reference/data/base.md
- build: reference/data/build.md
- converter: reference/data/converter.md
- dataset: reference/data/dataset.md
- loaders: reference/data/loaders.md
- split_dota: reference/data/split_dota.md
- utils: reference/data/utils.md
- engine:
- exporter: reference/engine/exporter.md
- model: reference/engine/model.md
- predictor: reference/engine/predictor.md
- results: reference/engine/results.md
- trainer: reference/engine/trainer.md
- tuner: reference/engine/tuner.md
- validator: reference/engine/validator.md
- hub:
- __init__: reference/hub/__init__.md
- auth: reference/hub/auth.md
- session: reference/hub/session.md
- utils: reference/hub/utils.md
- models:
- fastsam:
- model: reference/models/fastsam/model.md
- predict: reference/models/fastsam/predict.md
- prompt: reference/models/fastsam/prompt.md
- utils: reference/models/fastsam/utils.md
- val: reference/models/fastsam/val.md
- nas:
- model: reference/models/nas/model.md
- predict: reference/models/nas/predict.md
- val: reference/models/nas/val.md
- rtdetr:
- model: reference/models/rtdetr/model.md
- predict: reference/models/rtdetr/predict.md
- train: reference/models/rtdetr/train.md
- val: reference/models/rtdetr/val.md
- sam:
- amg: reference/models/sam/amg.md
- build: reference/models/sam/build.md
- model: reference/models/sam/model.md
- modules:
- decoders: reference/models/sam/modules/decoders.md
- encoders: reference/models/sam/modules/encoders.md
- sam: reference/models/sam/modules/sam.md
- tiny_encoder: reference/models/sam/modules/tiny_encoder.md
- transformer: reference/models/sam/modules/transformer.md
- predict: reference/models/sam/predict.md
- cfg:
- __init__: reference/cfg/__init__.md
- data:
- annotator: reference/data/annotator.md
- augment: reference/data/augment.md
- base: reference/data/base.md
- build: reference/data/build.md
- converter: reference/data/converter.md
- dataset: reference/data/dataset.md
- loaders: reference/data/loaders.md
- split_dota: reference/data/split_dota.md
- utils: reference/data/utils.md
- engine:
- exporter: reference/engine/exporter.md
- model: reference/engine/model.md
- predictor: reference/engine/predictor.md
- results: reference/engine/results.md
- trainer: reference/engine/trainer.md
- tuner: reference/engine/tuner.md
- validator: reference/engine/validator.md
- hub:
- __init__: reference/hub/__init__.md
- auth: reference/hub/auth.md
- session: reference/hub/session.md
- utils: reference/hub/utils.md
- models:
- fastsam:
- model: reference/models/fastsam/model.md
- predict: reference/models/fastsam/predict.md
- prompt: reference/models/fastsam/prompt.md
- utils: reference/models/fastsam/utils.md
- val: reference/models/fastsam/val.md
- nas:
- model: reference/models/nas/model.md
- predict: reference/models/nas/predict.md
- val: reference/models/nas/val.md
- rtdetr:
- model: reference/models/rtdetr/model.md
- predict: reference/models/rtdetr/predict.md
- train: reference/models/rtdetr/train.md
- val: reference/models/rtdetr/val.md
- sam:
- amg: reference/models/sam/amg.md
- build: reference/models/sam/build.md
- model: reference/models/sam/model.md
- modules:
- decoders: reference/models/sam/modules/decoders.md
- encoders: reference/models/sam/modules/encoders.md
- sam: reference/models/sam/modules/sam.md
- tiny_encoder: reference/models/sam/modules/tiny_encoder.md
- transformer: reference/models/sam/modules/transformer.md
- predict: reference/models/sam/predict.md
- utils:
- loss: reference/models/utils/loss.md
- ops: reference/models/utils/ops.md
- yolo:
- classify:
- predict: reference/models/yolo/classify/predict.md
- train: reference/models/yolo/classify/train.md
- val: reference/models/yolo/classify/val.md
- detect:
- predict: reference/models/yolo/detect/predict.md
- train: reference/models/yolo/detect/train.md
- val: reference/models/yolo/detect/val.md
- model: reference/models/yolo/model.md
- obb:
- predict: reference/models/yolo/obb/predict.md
- train: reference/models/yolo/obb/train.md
- val: reference/models/yolo/obb/val.md
- pose:
- predict: reference/models/yolo/pose/predict.md
- train: reference/models/yolo/pose/train.md
- val: reference/models/yolo/pose/val.md
- segment:
- predict: reference/models/yolo/segment/predict.md
- train: reference/models/yolo/segment/train.md
- val: reference/models/yolo/segment/val.md
- nn:
- autobackend: reference/nn/autobackend.md
- modules:
- block: reference/nn/modules/block.md
- conv: reference/nn/modules/conv.md
- head: reference/nn/modules/head.md
- transformer: reference/nn/modules/transformer.md
- utils: reference/nn/modules/utils.md
- tasks: reference/nn/tasks.md
- solutions:
- ai_gym: reference/solutions/ai_gym.md
- heatmap: reference/solutions/heatmap.md
- object_counter: reference/solutions/object_counter.md
- speed_estimation: reference/solutions/speed_estimation.md
- distance_calculation: reference/solutions/distance_calculation.md
- trackers:
- basetrack: reference/trackers/basetrack.md
- bot_sort: reference/trackers/bot_sort.md
- byte_tracker: reference/trackers/byte_tracker.md
- track: reference/trackers/track.md
- utils:
- gmc: reference/trackers/utils/gmc.md
- kalman_filter: reference/trackers/utils/kalman_filter.md
- matching: reference/trackers/utils/matching.md
- utils:
- loss: reference/models/utils/loss.md
- ops: reference/models/utils/ops.md
- yolo:
- classify:
- predict: reference/models/yolo/classify/predict.md
- train: reference/models/yolo/classify/train.md
- val: reference/models/yolo/classify/val.md
- detect:
- predict: reference/models/yolo/detect/predict.md
- train: reference/models/yolo/detect/train.md
- val: reference/models/yolo/detect/val.md
- model: reference/models/yolo/model.md
- obb:
- predict: reference/models/yolo/obb/predict.md
- train: reference/models/yolo/obb/train.md
- val: reference/models/yolo/obb/val.md
- pose:
- predict: reference/models/yolo/pose/predict.md
- train: reference/models/yolo/pose/train.md
- val: reference/models/yolo/pose/val.md
- segment:
- predict: reference/models/yolo/segment/predict.md
- train: reference/models/yolo/segment/train.md
- val: reference/models/yolo/segment/val.md
- nn:
- autobackend: reference/nn/autobackend.md
- modules:
- block: reference/nn/modules/block.md
- conv: reference/nn/modules/conv.md
- head: reference/nn/modules/head.md
- transformer: reference/nn/modules/transformer.md
- utils: reference/nn/modules/utils.md
- tasks: reference/nn/tasks.md
- solutions:
- ai_gym: reference/solutions/ai_gym.md
- heatmap: reference/solutions/heatmap.md
- object_counter: reference/solutions/object_counter.md
- speed_estimation: reference/solutions/speed_estimation.md
- distance_calculation: reference/solutions/distance_calculation.md
- trackers:
- basetrack: reference/trackers/basetrack.md
- bot_sort: reference/trackers/bot_sort.md
- byte_tracker: reference/trackers/byte_tracker.md
- track: reference/trackers/track.md
- utils:
- gmc: reference/trackers/utils/gmc.md
- kalman_filter: reference/trackers/utils/kalman_filter.md
- matching: reference/trackers/utils/matching.md
- utils:
- __init__: reference/utils/__init__.md
- autobatch: reference/utils/autobatch.md
- benchmarks: reference/utils/benchmarks.md
- callbacks:
- base: reference/utils/callbacks/base.md
- clearml: reference/utils/callbacks/clearml.md
- comet: reference/utils/callbacks/comet.md
- dvc: reference/utils/callbacks/dvc.md
- hub: reference/utils/callbacks/hub.md
- mlflow: reference/utils/callbacks/mlflow.md
- neptune: reference/utils/callbacks/neptune.md
- raytune: reference/utils/callbacks/raytune.md
- tensorboard: reference/utils/callbacks/tensorboard.md
- wb: reference/utils/callbacks/wb.md
- checks: reference/utils/checks.md
- dist: reference/utils/dist.md
- downloads: reference/utils/downloads.md
- errors: reference/utils/errors.md
- files: reference/utils/files.md
- instance: reference/utils/instance.md
- loss: reference/utils/loss.md
- metrics: reference/utils/metrics.md
- ops: reference/utils/ops.md
- patches: reference/utils/patches.md
- plotting: reference/utils/plotting.md
- tal: reference/utils/tal.md
- torch_utils: reference/utils/torch_utils.md
- triton: reference/utils/triton.md
- tuner: reference/utils/tuner.md
- __init__: reference/utils/__init__.md
- autobatch: reference/utils/autobatch.md
- benchmarks: reference/utils/benchmarks.md
- callbacks:
- base: reference/utils/callbacks/base.md
- clearml: reference/utils/callbacks/clearml.md
- comet: reference/utils/callbacks/comet.md
- dvc: reference/utils/callbacks/dvc.md
- hub: reference/utils/callbacks/hub.md
- mlflow: reference/utils/callbacks/mlflow.md
- neptune: reference/utils/callbacks/neptune.md
- raytune: reference/utils/callbacks/raytune.md
- tensorboard: reference/utils/callbacks/tensorboard.md
- wb: reference/utils/callbacks/wb.md
- checks: reference/utils/checks.md
- dist: reference/utils/dist.md
- downloads: reference/utils/downloads.md
- errors: reference/utils/errors.md
- files: reference/utils/files.md
- instance: reference/utils/instance.md
- loss: reference/utils/loss.md
- metrics: reference/utils/metrics.md
- ops: reference/utils/ops.md
- patches: reference/utils/patches.md
- plotting: reference/utils/plotting.md
- tal: reference/utils/tal.md
- torch_utils: reference/utils/torch_utils.md
- triton: reference/utils/triton.md
- tuner: reference/utils/tuner.md
- Help:
- Help: help/index.md
@ -503,6 +507,7 @@ plugins:
add_image: True
add_share_buttons: True
default_image: https://github.com/ultralytics/ultralytics/assets/26833433/6d09221c-c52a-4234-9a5d-b862e93c6529
- mkdocs-jupyter
- redirects:
redirect_maps:
callbacks.md: usage/callbacks.md

@ -91,6 +91,7 @@ dev = [
"coverage[toml]",
"mkdocs-material",
"mkdocstrings[python]",
"mkdocs-jupyter", # for notebooks
"mkdocs-redirects", # for 301 redirects
"mkdocs-ultralytics-plugin>=0.0.34", # for meta descriptions and images, dates and authors
]
@ -103,6 +104,13 @@ export = [
"jaxlib<=0.4.21", # tensorflowjs bug https://github.com/google/jax/issues/18978
"tensorflowjs>=3.9.0", # TF.js export, automatically installs tensorflow
]
explorer = [
"lancedb", # vector search
"duckdb", # SQL queries, supports lancedb tables
"streamlit", # visualizing with GUI
]
# tensorflow>=2.4.1,<=2.13.1 # TF exports (-cpu, -aarch64, -macos)
# tflite-support # for TFLite model metadata
# scikit-learn==0.19.2 # CoreML quantization

@ -0,0 +1,50 @@
from ultralytics import Explorer
def test_similarity():
exp = Explorer()
exp.create_embeddings_table()
similar = exp.get_similar(idx=1)
assert len(similar) == 25
similar = exp.get_similar(img='https://ultralytics.com/images/zidane.jpg')
assert len(similar) == 25
similar = exp.get_similar(idx=[1, 2], limit=10)
assert len(similar) == 10
sim_idx = exp.similarity_index()
assert len(sim_idx) > 0
sql = exp.sql_query("WHERE labels LIKE '%person%'")
assert len(sql) > 0
def test_det():
exp = Explorer(data='coco8.yaml', model='yolov8n.pt')
exp.create_embeddings_table(force=True)
assert len(exp.table.head()['bboxes']) > 0
similar = exp.get_similar(idx=[1, 2], limit=10)
assert len(similar) > 0
# This is a loose test that only checks for errors, not correctness
similar = exp.plot_similar(idx=[1, 2], limit=10)
assert similar is not None
similar.show()
def test_seg():
exp = Explorer(data='coco8-seg.yaml', model='yolov8n-seg.pt')
exp.create_embeddings_table(force=True)
assert len(exp.table.head()['masks']) > 0
similar = exp.get_similar(idx=[1, 2], limit=10)
assert len(similar) > 0
similar = exp.plot_similar(idx=[1, 2], limit=10)
assert similar is not None
similar.show()
def test_pose():
exp = Explorer(data='coco8-pose.yaml', model='yolov8n-pose.pt')
exp.create_embeddings_table(force=True)
assert len(exp.table.head()['keypoints']) > 0
similar = exp.get_similar(idx=[1, 2], limit=10)
assert len(similar) > 0
similar = exp.plot_similar(idx=[1, 2], limit=10)
assert similar is not None
similar.show()
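
These tests double as a smoke test of the whole new API surface. A minimal sketch for running just this file locally (assumes `pytest` and the optional `explorer` extras are installed; models and datasets auto-download on first use):

```python
# Minimal sketch: run only the new Explorer tests.
# Assumes pytest and the `ultralytics[explorer]` extras are installed.
import pytest

pytest.main(['tests/test_explorer.py', '-v'])
```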

@@ -1,7 +1,8 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
__version__ = '8.0.235'
__version__ = '8.0.236'
from ultralytics.data.explorer.explorer import Explorer
from ultralytics.models import RTDETR, SAM, YOLO
from ultralytics.models.fastsam import FastSAM
from ultralytics.models.nas import NAS
@@ -9,4 +10,4 @@ from ultralytics.utils import SETTINGS as settings
from ultralytics.utils.checks import check_yolo as checks
from ultralytics.utils.downloads import download
__all__ = '__version__', 'YOLO', 'NAS', 'SAM', 'FastSAM', 'RTDETR', 'checks', 'download', 'settings'
__all__ = '__version__', 'YOLO', 'NAS', 'SAM', 'FastSAM', 'RTDETR', 'checks', 'download', 'settings', 'Explorer'

@@ -2,6 +2,7 @@
import contextlib
import shutil
import subprocess
import sys
from pathlib import Path
from types import SimpleNamespace
@@ -56,6 +57,9 @@ CLI_HELP_MSG = \
4. Export a YOLOv8n classification model to ONNX format at image size 224 by 128 (no TASK required)
yolo export model=yolov8n-cls.pt format=onnx imgsz=224,128
5. Explore your datasets using semantic search and SQL with a simple GUI powered by Ultralytics Explorer API
yolo explorer
6. Run special commands:
yolo help
yolo checks
@@ -297,6 +301,12 @@ def handle_yolo_settings(args: List[str]) -> None:
LOGGER.warning(f"WARNING ⚠ settings error: '{e}'. Please see {url} for help.")
def handle_explorer():
"""Open the Ultralytics Explorer GUI."""
checks.check_requirements('streamlit')
subprocess.run(['streamlit', 'run', ROOT / 'data/explorer/gui/dash.py', '--server.maxMessageSize', '2048'])
def parse_key_value_pair(pair):
"""Parse one 'key=value' pair and return key and value."""
k, v = pair.split('=', 1) # split on first '=' sign
@@ -348,7 +358,8 @@ def entrypoint(debug=''):
'cfg': lambda: yaml_print(DEFAULT_CFG_PATH),
'hub': lambda: handle_yolo_hub(args[1:]),
'login': lambda: handle_yolo_hub(args),
'copy-cfg': copy_default_cfg}
'copy-cfg': copy_default_cfg,
'explorer': lambda: handle_explorer()}
full_args_dict = {**DEFAULT_CFG_DICT, **{k: None for k in TASKS}, **{k: None for k in MODES}, **special}
# Define common misuses of special commands, i.e. -h, -help, --help
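
The new `explorer` entry in the `special` dict simply delegates to `handle_explorer()`. A sketch of the equivalent programmatic launch, using only the calls visible in the diff above:

```python
# Sketch: programmatic equivalent of the new `yolo explorer` CLI command,
# mirroring handle_explorer() above.
import subprocess

from ultralytics.utils import ROOT
from ultralytics.utils.checks import check_requirements

check_requirements('streamlit')  # auto-installs streamlit if missing
subprocess.run(['streamlit', 'run', str(ROOT / 'data/explorer/gui/dash.py'), '--server.maxMessageSize', '2048'])
```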

@@ -0,0 +1,403 @@
from io import BytesIO
from pathlib import Path
from typing import List
import cv2
import numpy as np
import torch
from matplotlib import pyplot as plt
from PIL import Image
from tqdm import tqdm
from ultralytics.data.augment import Format
from ultralytics.data.dataset import YOLODataset
from ultralytics.data.utils import check_det_dataset
from ultralytics.models.yolo.model import YOLO
from ultralytics.utils import LOGGER, checks
from .utils import get_sim_index_schema, get_table_schema, plot_similar_images, sanitize_batch
class ExplorerDataset(YOLODataset):
def __init__(self, *args, data=None, **kwargs):
super().__init__(*args, data=data, **kwargs)
# NOTE: Load the image directly without any resize operations.
def load_image(self, i):
"""Loads 1 image from dataset index 'i', returns (im, resized hw)."""
im, f, fn = self.ims[i], self.im_files[i], self.npy_files[i]
if im is None: # not cached in RAM
if fn.exists(): # load npy
im = np.load(fn)
else: # read image
im = cv2.imread(f) # BGR
if im is None:
raise FileNotFoundError(f'Image Not Found {f}')
h0, w0 = im.shape[:2] # orig hw
return im, (h0, w0), im.shape[:2]
return self.ims[i], self.im_hw0[i], self.im_hw[i]
def build_transforms(self, hyp=None):
transforms = Format(
bbox_format='xyxy',
normalize=False,
return_mask=self.use_segments,
return_keypoint=self.use_keypoints,
batch_idx=True,
mask_ratio=hyp.mask_ratio,
mask_overlap=hyp.overlap_mask,
)
return transforms
class Explorer:
def __init__(self, data='coco128.yaml', model='yolov8n.pt', uri='~/ultralytics/explorer') -> None:
checks.check_requirements(['lancedb', 'duckdb'])
import lancedb
self.connection = lancedb.connect(uri)
self.table_name = Path(data).name.lower() + '_' + model.lower()
self.sim_idx_base_name = f'{self.table_name}_sim_idx'.lower()  # append thres and top_k to this name to reuse the table
self.model = YOLO(model)
self.data = data
self.choice_set = None
self.table = None
self.progress = 0
def create_embeddings_table(self, force=False, split='train'):
"""
Create LanceDB table containing the embeddings of the images in the dataset. The table will be reused if it
already exists. Pass force=True to overwrite the existing table.
Args:
force (bool): Whether to overwrite the existing table or not. Defaults to False.
split (str): Split of the dataset to use. Defaults to 'train'.
Example:
```python
exp = Explorer()
exp.create_embeddings_table()
```
"""
if self.table is not None and not force:
LOGGER.info('Table already exists. Reusing it. Pass force=True to overwrite it.')
return
if self.table_name in self.connection.table_names() and not force:
LOGGER.info(f'Table {self.table_name} already exists. Reusing it. Pass force=True to overwrite it.')
self.table = self.connection.open_table(self.table_name)
self.progress = 1
return
if self.data is None:
raise ValueError('Data must be provided to create embeddings table')
data_info = check_det_dataset(self.data)
if split not in data_info:
raise ValueError(
f'Split {split} is not found in the dataset. Available keys in the dataset are {list(data_info.keys())}'
)
choice_set = data_info[split]
choice_set = choice_set if isinstance(choice_set, list) else [choice_set]
self.choice_set = choice_set
dataset = ExplorerDataset(img_path=choice_set, data=data_info, augment=False, cache=False, task=self.model.task)
# Create the table schema
batch = dataset[0]
vector_size = self.model.embed(batch['im_file'], verbose=False)[0].shape[0]
Schema = get_table_schema(vector_size)
table = self.connection.create_table(self.table_name, schema=Schema, mode='overwrite')
table.add(
self._yield_batches(dataset,
data_info,
self.model,
exclude_keys=['img', 'ratio_pad', 'resized_shape', 'ori_shape', 'batch_idx']))
self.table = table
def _yield_batches(self, dataset, data_info, model, exclude_keys: List):
# TODO: implement batching
for i in tqdm(range(len(dataset))):
self.progress = float(i + 1) / len(dataset)
batch = dataset[i]
for k in exclude_keys:
batch.pop(k, None)
batch = sanitize_batch(batch, data_info)
batch['vector'] = model.embed(batch['im_file'], verbose=False)[0].detach().tolist()
yield [batch]
def query(self, imgs=None, limit=25):
"""
Query the table for similar images. Accepts a single image or a list of images.
Args:
imgs (str or list): Path to the image or a list of paths to the images.
limit (int): Number of results to return.
Returns:
An arrow table containing the results. Supports converting to:
- pandas dataframe: `result.to_pandas()`
- dict of lists: `result.to_pydict()`
Example:
```python
exp = Explorer()
exp.create_embeddings_table()
similar = exp.query(imgs='https://ultralytics.com/images/zidane.jpg')
```
"""
if self.table is None:
raise ValueError('Table is not created. Please create the table first.')
if isinstance(imgs, str):
imgs = [imgs]
elif isinstance(imgs, list):
pass
else:
raise ValueError(f'imgs must be a string or a list of strings. Got {type(imgs)}')
embeds = self.model.embed(imgs)
# Get avg if multiple images are passed (len > 1)
embeds = torch.mean(torch.stack(embeds), 0).cpu().numpy() if len(embeds) > 1 else embeds[0].cpu().numpy()
query = self.table.search(embeds).limit(limit).to_arrow()
return query
def sql_query(self, query, return_type='pandas'):
"""
Run a SQL-like query on the table (currently executed via duckdb; LanceDB predicate pushdown is a planned improvement).
Args:
query (str): SQL query to run.
return_type (str): Type of the result to return. Can be either 'pandas' or 'arrow'. Defaults to 'pandas'.
Returns:
A pandas dataframe or an arrow table containing the results, depending on return_type.
Example:
```python
exp = Explorer()
exp.create_embeddings_table()
query = 'SELECT * FROM table WHERE labels LIKE "%person%"'
result = exp.sql_query(query)
```
"""
import duckdb
if self.table is None:
raise ValueError('Table is not created. Please create the table first.')
# Note: using filter pushdown would be a better long term solution. Temporarily using duckdb for this.
table = self.table.to_arrow() # noqa
if not query.startswith('SELECT') and not query.startswith('WHERE'):
raise ValueError(
'Query must start with SELECT or WHERE. You can either pass the entire query or just the WHERE clause.')
if query.startswith('WHERE'):
query = f"SELECT * FROM 'table' {query}"
LOGGER.info(f'Running query: {query}')
rs = duckdb.sql(query)
if return_type == 'pandas':
return rs.df()
elif return_type == 'arrow':
return rs.arrow()
def plot_sql_query(self, query, labels=True):
"""
Plot the results of a SQL-like query on the table.
Args:
query (str): SQL query to run.
labels (bool): Whether to plot the labels or not.
Returns:
PIL Image containing the plot.
Example:
```python
exp = Explorer()
exp.create_embeddings_table()
query = 'SELECT * FROM table WHERE labels LIKE "%person%"'
result = exp.plot_sql_query(query)
```
"""
result = self.sql_query(query, return_type='arrow')
img = plot_similar_images(result, plot_labels=labels)
img = Image.fromarray(img)
return img
def get_similar(self, img=None, idx=None, limit=25, return_type='pandas'):
"""
Query the table for similar images. Accepts a single image or a list of images.
Args:
img (str or list): Path to the image or a list of paths to the images.
idx (int or list): Index of the image in the table or a list of indexes.
limit (int): Number of results to return. Defaults to 25.
return_type (str): Type of the result to return. Can be either 'pandas' or 'arrow'. Defaults to 'pandas'.
Returns:
A table or pandas dataframe containing the results.
Example:
```python
exp = Explorer()
exp.create_embeddings_table()
similar = exp.get_similar(img='https://ultralytics.com/images/zidane.jpg')
```
"""
img = self._check_imgs_or_idxs(img, idx)
similar = self.query(img, limit=limit)
if return_type == 'pandas':
return similar.to_pandas()
elif return_type == 'arrow':
return similar
def plot_similar(self, img=None, idx=None, limit=25, labels=True):
"""
Plot the similar images. Accepts images or indexes.
Args:
img (str or list): Path to the image or a list of paths to the images.
idx (int or list): Index of the image in the table or a list of indexes.
labels (bool): Whether to plot the labels or not.
limit (int): Number of results to return. Defaults to 25.
Returns:
PIL Image containing the plot.
Example:
```python
exp = Explorer()
exp.create_embeddings_table()
similar = exp.plot_similar(img='https://ultralytics.com/images/zidane.jpg')
```
"""
similar = self.get_similar(img, idx, limit, return_type='arrow')
img = plot_similar_images(similar, plot_labels=labels)
img = Image.fromarray(img)
return img
def similarity_index(self, max_dist=0.2, top_k=None, force=False):
"""
Calculate the similarity index of all the images in the table. Here, the index will contain the data points that
are max_dist or closer to the image in the embedding space at a given index.
Args:
max_dist (float): maximum L2 distance between the embeddings to consider. Defaults to 0.2.
top_k (float): Fraction of the closest data points to consider when counting. Used to apply a limit when running
vector search. Defaults to None (consider the whole table).
force (bool): Whether to overwrite the existing similarity index or not. Defaults to False.
Returns:
A pandas dataframe containing the similarity index.
Example:
```python
exp = Explorer()
exp.create_embeddings_table()
sim_idx = exp.similarity_index()
```
"""
if self.table is None:
raise ValueError('Table is not created. Please create the table first.')
sim_idx_table_name = f'{self.sim_idx_base_name}_thres_{max_dist}_top_{top_k}'.lower()
if sim_idx_table_name in self.connection.table_names() and not force:
LOGGER.info('Similarity index already exists. Reusing it. Pass force=True to overwrite it.')
return self.connection.open_table(sim_idx_table_name).to_pandas()
if top_k and not (1.0 >= top_k >= 0.0):
raise ValueError(f'top_k must be between 0.0 and 1.0. Got {top_k}')
if max_dist < 0.0:
raise ValueError(f'max_dist must be greater than or equal to 0. Got {max_dist}')
top_k = int(top_k * len(self.table)) if top_k else len(self.table)
top_k = max(top_k, 1)
features = self.table.to_lance().to_table(columns=['vector', 'im_file']).to_pydict()
im_files = features['im_file']
embeddings = features['vector']
sim_table = self.connection.create_table(sim_idx_table_name, schema=get_sim_index_schema(), mode='overwrite')
def _yield_sim_idx():
for i in tqdm(range(len(embeddings))):
sim_idx = self.table.search(embeddings[i]).limit(top_k).to_pandas().query(f'_distance <= {max_dist}')
yield [{
'idx': i,
'im_file': im_files[i],
'count': len(sim_idx),
'sim_im_files': sim_idx['im_file'].tolist()}]
sim_table.add(_yield_sim_idx())
self.sim_index = sim_table
return sim_table.to_pandas()
def plot_similarity_index(self, max_dist=0.2, top_k=None, force=False):
"""
Plot the similarity index of all the images in the table. Here, the index will contain the data points that are
max_dist or closer to the image in the embedding space at a given index.
Args:
max_dist (float): maximum L2 distance between the embeddings to consider. Defaults to 0.2.
top_k (float): Fraction of the closest data points to consider when counting. Used to apply a limit when
running vector search. Defaults to None (consider the whole table).
force (bool): Whether to overwrite the existing similarity index or not. Defaults to False.
Returns:
PIL Image containing the plot.
Example:
```python
exp = Explorer()
exp.create_embeddings_table()
exp.plot_similarity_index()
```
"""
sim_idx = self.similarity_index(max_dist=max_dist, top_k=top_k, force=force)
sim_count = sim_idx['count'].tolist()
sim_count = np.array(sim_count)
indices = np.arange(len(sim_count))
# Create the bar plot
plt.bar(indices, sim_count)
# Customize the plot (optional)
plt.xlabel('data idx')
plt.ylabel('Count')
plt.title('Similarity Count')
buffer = BytesIO()
plt.savefig(buffer, format='png')
buffer.seek(0)
# Use Pillow to open the image from the buffer
image = Image.open(buffer)
return image
def _check_imgs_or_idxs(self, img, idx):
if img is None and idx is None:
raise ValueError('Either img or idx must be provided.')
if img is not None and idx is not None:
raise ValueError('Only one of img or idx must be provided.')
if idx is not None:
idx = idx if isinstance(idx, list) else [idx]
img = self.table.to_lance().take(idx, columns=['im_file']).to_pydict()['im_file']
img = img if isinstance(img, list) else [img]
return img
def visualize(self, result):
"""
Visualize the results of a query.
Args:
result (arrow table): Arrow table containing the results of a query.
"""
# TODO:
pass
def generate_report(self, result):
"""Generate a report of the dataset."""
pass
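
Taken together, the methods above support a compact end-to-end workflow. A sketch under the default settings (`coco128.yaml` + `yolov8n.pt`, per `__init__`); no calls here go beyond the API defined above:

```python
# Sketch: combined SQL + similarity-index workflow with the Explorer class above.
from ultralytics import Explorer

exp = Explorer(data='coco128.yaml', model='yolov8n.pt')
exp.create_embeddings_table()

# SQL-like filtering (duckdb under the hood); a bare WHERE clause is accepted
df = exp.sql_query("WHERE labels LIKE '%person%' LIMIT 10")

# Count, for each image, how many neighbours sit within max_dist in embedding space
sim_idx = exp.similarity_index(max_dist=0.2)
exp.plot_similarity_index(max_dist=0.2).show()  # bar plot as a PIL Image
```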

@@ -0,0 +1,178 @@
import time
from threading import Thread
from ultralytics import Explorer
from ultralytics.utils import ROOT
from ultralytics.utils.checks import check_requirements
check_requirements('streamlit')
check_requirements('streamlit-select>=0.2')
import streamlit as st
from streamlit_select import image_select
def _get_explorer():
exp = Explorer(data=st.session_state.get('dataset'), model=st.session_state.get('model'))
thread = Thread(target=exp.create_embeddings_table,
kwargs={'force': st.session_state.get('force_recreate_embeddings')})
thread.start()
progress_bar = st.progress(0, text='Creating embeddings table...')
while exp.progress < 1:
time.sleep(0.1)
progress_bar.progress(exp.progress, text=f'Progress: {exp.progress * 100}%')
thread.join()
st.session_state['explorer'] = exp
progress_bar.empty()
def init_explorer_form():
datasets = ROOT / 'cfg' / 'datasets'
ds = [d.name for d in datasets.glob('*.yaml')]
models = [
'yolov8n.pt', 'yolov8s.pt', 'yolov8m.pt', 'yolov8l.pt', 'yolov8x.pt', 'yolov8n-seg.pt', 'yolov8s-seg.pt',
'yolov8m-seg.pt', 'yolov8l-seg.pt', 'yolov8x-seg.pt', 'yolov8n-pose.pt', 'yolov8s-pose.pt', 'yolov8m-pose.pt',
'yolov8l-pose.pt', 'yolov8x-pose.pt']
with st.form(key='explorer_init_form'):
col1, col2 = st.columns(2)
with col1:
dataset = st.selectbox('Select dataset', ds, key='dataset', index=ds.index('coco128.yaml'))
with col2:
model = st.selectbox('Select model', models, key='model')
st.checkbox('Force recreate embeddings', key='force_recreate_embeddings')
st.form_submit_button('Explore', on_click=_get_explorer)
def query_form():
with st.form('query_form'):
col1, col2 = st.columns([0.8, 0.2])
with col1:
query = st.text_input('Query', '', label_visibility='collapsed', key='query')
with col2:
st.form_submit_button('Query', on_click=run_sql_query)
def find_similar_imgs(imgs):
exp = st.session_state['explorer']
similar = exp.get_similar(img=imgs, limit=st.session_state.get('limit'), return_type='arrow')
paths = similar.to_pydict()['im_file']
st.session_state['imgs'] = paths
def similarity_form(selected_imgs):
st.write('Similarity Search')
with st.form('similarity_form'):
subcol1, subcol2 = st.columns([1, 1])
with subcol1:
limit = st.number_input('limit',
min_value=None,
max_value=None,
value=25,
label_visibility='collapsed',
key='limit')
with subcol2:
disabled = not len(selected_imgs)
st.write('Selected: ', len(selected_imgs))
st.form_submit_button(
'Search',
disabled=disabled,
on_click=find_similar_imgs,
args=(selected_imgs, ),
)
if disabled:
st.error('Select at least one image to search.')
# def persist_reset_form():
# with st.form("persist_reset"):
# col1, col2 = st.columns([1, 1])
# with col1:
# st.form_submit_button("Reset", on_click=reset)
#
# with col2:
# st.form_submit_button("Persist", on_click=update_state, args=("PERSISTING", True))
def run_sql_query():
query = st.session_state.get('query')
if query.strip():
exp = st.session_state['explorer']
res = exp.sql_query(query, return_type='arrow')
st.session_state['imgs'] = res.to_pydict()['im_file']
def reset_explorer():
st.session_state['explorer'] = None
st.session_state['imgs'] = None
def ultralytics_explorer_docs_callback():
with st.container(border=True):
st.image('https://raw.githubusercontent.com/ultralytics/assets/main/logo/Ultralytics_Logotype_Original.svg',
width=100)
st.markdown(
"<p>This demo is built using Ultralytics Explorer API. Visit <a href=''>API docs</a> to try examples & learn more</p>",
unsafe_allow_html=True,
help=None)
st.link_button('Ultralytics Explorer API', 'https://docs.ultralytics.com/')
def layout():
st.set_page_config(layout='wide', initial_sidebar_state='collapsed')
st.markdown("<h1 style='text-align: center;'>Ultralytics Explorer Demo</h1>", unsafe_allow_html=True)
if st.session_state.get('explorer') is None:
init_explorer_form()
return
st.button(':arrow_backward: Select Dataset', on_click=reset_explorer)
exp = st.session_state.get('explorer')
col1, col2 = st.columns([0.75, 0.25], gap='small')
imgs = st.session_state.get('imgs') or exp.table.to_lance().to_table(columns=['im_file']).to_pydict()['im_file']
total_imgs = len(imgs)
with col1:
subcol1, subcol2, subcol3, subcol4, subcol5 = st.columns(5)
with subcol1:
st.write('Max Images Displayed:')
with subcol2:
num = st.number_input('Max Images Displayed',
min_value=0,
max_value=total_imgs,
value=min(500, total_imgs),
key='num_imgs_displayed',
label_visibility='collapsed')
with subcol3:
st.write('Start Index:')
with subcol4:
start_idx = st.number_input('Start Index',
min_value=0,
max_value=total_imgs,
value=0,
key='start_index',
label_visibility='collapsed')
with subcol5:
reset = st.button('Reset', use_container_width=False, key='reset')
if reset:
st.session_state['imgs'] = None
st.experimental_rerun()
query_form()
if total_imgs:
imgs_displayed = imgs[start_idx:start_idx + num]
selected_imgs = image_select(
f'Total samples: {total_imgs}',
images=imgs_displayed,
use_container_width=False,
# indices=[i for i in range(num)] if select_all else None,
)
with col2:
similarity_form(selected_imgs)
# display_labels = st.checkbox("Labels", value=False, key="display_labels")
ultralytics_explorer_docs_callback()
if __name__ == '__main__':
layout()

@@ -0,0 +1,103 @@
from pathlib import Path
from typing import List
import cv2
import numpy as np
from ultralytics.data.augment import LetterBox
from ultralytics.utils.ops import xyxy2xywh
from ultralytics.utils.plotting import plot_images
def get_table_schema(vector_size):
from lancedb.pydantic import LanceModel, Vector
class Schema(LanceModel):
im_file: str
labels: List[str]
cls: List[int]
bboxes: List[List[float]]
masks: List[List[List[int]]]
keypoints: List[List[List[float]]]
vector: Vector(vector_size)
return Schema
def get_sim_index_schema():
from lancedb.pydantic import LanceModel
class Schema(LanceModel):
idx: int
im_file: str
count: int
sim_im_files: List[str]
return Schema
def sanitize_batch(batch, dataset_info):
batch['cls'] = batch['cls'].flatten().int().tolist()
box_cls_pair = sorted(zip(batch['bboxes'].tolist(), batch['cls']), key=lambda x: x[1])
batch['bboxes'] = [box for box, _ in box_cls_pair]
batch['cls'] = [cls for _, cls in box_cls_pair]
batch['labels'] = [dataset_info['names'][i] for i in batch['cls']]
batch['masks'] = batch['masks'].tolist() if 'masks' in batch else [[[]]]
batch['keypoints'] = batch['keypoints'].tolist() if 'keypoints' in batch else [[[]]]
return batch
def plot_similar_images(similar_set, plot_labels=True):
"""
Plot images from the similar set.
Args:
similar_set (pyarrow.Table): Pyarrow table containing the similar data points
plot_labels (bool): Whether to plot labels or not
"""
similar_set = similar_set.to_pydict()
empty_masks = [[[]]]
empty_boxes = [[]]
images = similar_set.get('im_file', [])
bboxes = similar_set.get('bboxes', []) if similar_set.get('bboxes') != empty_boxes else []
masks = similar_set.get('masks') if similar_set.get('masks')[0] != empty_masks else []
kpts = similar_set.get('keypoints') if similar_set.get('keypoints')[0] != empty_masks else []
cls = similar_set.get('cls', [])
plot_size = 640
imgs, batch_idx, plot_boxes, plot_masks, plot_kpts = [], [], [], [], []
for i, imf in enumerate(images):
im = cv2.imread(imf)
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
h, w = im.shape[:2]
r = min(plot_size / h, plot_size / w)
imgs.append(LetterBox(plot_size, center=False)(image=im).transpose(2, 0, 1))
if plot_labels:
if len(bboxes) > i and len(bboxes[i]) > 0:
box = np.array(bboxes[i], dtype=np.float32)
box[:, [0, 2]] *= r
box[:, [1, 3]] *= r
plot_boxes.append(box)
if len(masks) > i and len(masks[i]) > 0:
mask = np.array(masks[i], dtype=np.uint8)[0]
plot_masks.append(LetterBox(plot_size, center=False)(image=mask))
if len(kpts) > i and kpts[i] is not None:
kpt = np.array(kpts[i], dtype=np.float32)
kpt[:, :, :2] *= r
plot_kpts.append(kpt)
batch_idx.append(np.ones(len(np.array(bboxes[i], dtype=np.float32))) * i)
imgs = np.stack(imgs, axis=0)
masks = np.stack(plot_masks, axis=0) if len(plot_masks) > 0 else np.zeros(0, dtype=np.uint8)
kpts = np.concatenate(plot_kpts, axis=0) if len(plot_kpts) > 0 else np.zeros((0, 51), dtype=np.float32)
boxes = xyxy2xywh(np.concatenate(plot_boxes, axis=0)) if len(plot_boxes) > 0 else np.zeros(0, dtype=np.float32)
batch_idx = np.concatenate(batch_idx, axis=0)
cls = np.concatenate([np.array(c, dtype=np.int32) for c in cls], axis=0)
fname = 'temp_exp_grid.jpg'
plot_images(imgs, batch_idx, cls, bboxes=boxes, masks=masks, kpts=kpts, fname=fname,
max_subplots=len(images)).join()
img = cv2.imread(fname, cv2.IMREAD_COLOR)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
Path(fname).unlink()
return img_rgb
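
`plot_similar_images` expects an arrow table with the schema produced by `get_table_schema`. A sketch of how the Explorer methods feed it, mirroring what `Explorer.plot_similar()` does internally:

```python
# Sketch: feeding an arrow query result into plot_similar_images(),
# mirroring Explorer.plot_similar() above.
from PIL import Image

from ultralytics import Explorer
from ultralytics.data.explorer.utils import plot_similar_images

exp = Explorer()  # defaults: data='coco128.yaml', model='yolov8n.pt'
exp.create_embeddings_table()
similar = exp.get_similar(idx=0, limit=9, return_type='arrow')
grid = plot_similar_images(similar, plot_labels=True)  # RGB numpy array
Image.fromarray(grid).show()
```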

@@ -579,7 +579,8 @@ def plot_images(images,
paths=None,
fname='images.jpg',
names=None,
on_plot=None):
on_plot=None,
max_subplots=16):
"""Plot image grid with labels."""
if isinstance(images, torch.Tensor):
images = images.cpu().float().numpy()
@@ -595,7 +596,7 @@ def plot_images(images,
batch_idx = batch_idx.cpu().numpy()
max_size = 1920 # max image size
max_subplots = 16 # max image subplots, i.e. 4x4
max_subplots = max_subplots # max image subplots, i.e. 4x4
bs, _, h, w = images.shape # batch size, _, height, width
bs = min(bs, max_subplots) # limit plot images
ns = np.ceil(bs ** 0.5) # number of subplots (square)
@@ -685,7 +686,7 @@ def plot_images(images,
image_masks = np.where(image_masks == index, 1.0, 0.0)
im = np.asarray(annotator.im).copy()
for j, box in enumerate(boxes.T.tolist()):
for j in range(len(image_masks)):
if labels or conf[j] > 0.25: # 0.25 conf thresh
color = colors(classes[j])
mh, mw = image_masks[j].shape
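
The effect of the new `max_subplots` parameter is that callers such as `plot_similar_images()` above can request grids larger than the previous hard-coded 4x4 (16 images). A hedged sketch with dummy data; array shapes follow the call in `utils.py`, and names/confidences are omitted:

```python
# Sketch: exercising the new max_subplots parameter of plot_images.
# All arrays are dummies; shapes mirror the plot_similar_images() call above.
import numpy as np

from ultralytics.utils.plotting import plot_images

n = 25  # 5x5 grid, beyond the old hard-coded cap of 16
imgs = np.zeros((n, 3, 640, 640), dtype=np.uint8)  # batch of blank RGB images
batch_idx = np.arange(n)  # one label row per image
cls = np.zeros(n, dtype=np.int32)
boxes = np.zeros((n, 4), dtype=np.float32)  # xywh boxes

# plot_images is threaded, so join() to wait for the file to be written
plot_images(imgs, batch_idx, cls, bboxes=boxes, fname='grid.jpg', max_subplots=n).join()
```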
