comments	description	keywords
true	Learn how to deploy Ultralytics YOLOv8 on NVIDIA Jetson devices using TensorRT and DeepStream SDK. Explore performance benchmarks and maximize AI capabilities.	Ultralytics, YOLOv8, NVIDIA Jetson, JetPack, AI deployment, embedded systems, deep learning, TensorRT, DeepStream SDK, computer vision

Ultralytics YOLOv8 on NVIDIA Jetson using DeepStream SDK and TensorRT

This guide walks through deploying Ultralytics YOLOv8 on NVIDIA Jetson devices using the DeepStream SDK. TensorRT is used to maximize inference performance on the Jetson platform.

!!! Note

This guide has been tested with both the [Seeed Studio reComputer J4012](https://www.seeedstudio.com/reComputer-J4012-p-5586.html), which is based on the NVIDIA Jetson Orin NX 16GB running JetPack release [JP5.1.3](https://developer.nvidia.com/embedded/jetpack-sdk-513), and the [Seeed Studio reComputer J1020 v2](https://www.seeedstudio.com/reComputer-J1020-v2-p-5498.html), which is based on the NVIDIA Jetson Nano 4GB running JetPack release [JP4.6.4](https://developer.nvidia.com/jetpack-sdk-464). It is expected to work across the entire NVIDIA Jetson hardware lineup, including both the latest and legacy devices.

What is NVIDIA DeepStream?

NVIDIA's DeepStream SDK is a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio, and image understanding. It's ideal for vision AI developers, software partners, startups, and OEMs building IVA (Intelligent Video Analytics) apps and services. You can now create stream-processing pipelines that incorporate neural networks and other complex processing tasks like tracking, video encoding/decoding, and video rendering. These pipelines enable real-time analytics on video, image, and sensor data. DeepStream's multi-platform support gives you a faster, easier way to develop vision AI applications and services on-premise, at the edge, and in the cloud.

Prerequisites

Before you start following this guide:

- Visit our documentation, Quick Start Guide: NVIDIA Jetson with Ultralytics YOLOv8, to set up your NVIDIA Jetson device with Ultralytics YOLOv8.
- Install the DeepStream SDK version that matches your JetPack version:
    - For JetPack 4.6.4, install DeepStream 6.0.1
    - For JetPack 5.1.3, install DeepStream 6.3

!!! Tip

In this guide, we installed the DeepStream SDK on the Jetson device using the Debian package method. You can also visit the [DeepStream SDK on Jetson (Archived)](https://developer.nvidia.com/embedded/deepstream-on-jetson-downloads-archived) page to access legacy versions of DeepStream.

DeepStream Configuration for YOLOv8

Here we use the marcoslucianops/DeepStream-Yolo GitHub repository, which adds NVIDIA DeepStream SDK support for YOLO models. We appreciate marcoslucianops's contributions!

Install dependencies
```
pip install cmake
pip install onnxsim
```

Clone the following repository

git clone https://github.com/marcoslucianops/DeepStream-Yolo
cd DeepStream-Yolo

Download the Ultralytics YOLOv8 detection model (.pt) of your choice from the YOLOv8 releases. Here we use yolov8s.pt.

wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s.pt

!!! Note

You can also use a [custom trained YOLOv8 model](https://docs.ultralytics.com/modes/train/).

Convert model to ONNX

python3 utils/export_yoloV8.py -w yolov8s.pt

!!! Note "Pass the below arguments to the above command"

For DeepStream 6.0.1, use opset 12 or lower. The default opset is 16.

```bash
--opset 12
```

To change the inference size (default: 640)

```bash
-s SIZE
--size SIZE
-s HEIGHT WIDTH
--size HEIGHT WIDTH
```

Example for 1280:

```bash
-s 1280
# or
-s 1280 1280
```

To simplify the ONNX model (DeepStream >= 6.0)

```bash
--simplify
```

To use dynamic batch-size (DeepStream >= 6.1)

```bash
--dynamic
```

To use static batch-size (example for batch-size = 4)

```bash
--batch 4
```
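
As a rough illustration of how the flags above combine, the following sketch assembles the export command line as a string. The helper name `build_export_command` and its parameters are hypothetical; only the flags themselves (`-w`, `--opset`, `-s`, `--simplify`, `--dynamic`, `--batch`) come from this guide. Treating `--dynamic` and `--batch` as alternatives is an assumption.

```python
import shlex


def build_export_command(weights, deepstream_version, size=None,
                         simplify=False, dynamic=False, batch=None):
    """Assemble an export_yoloV8.py command line (hypothetical helper)."""
    version = tuple(int(part) for part in deepstream_version.split("."))
    cmd = ["python3", "utils/export_yoloV8.py", "-w", weights]
    if version == (6, 0, 1):
        # DeepStream 6.0.1 needs opset 12 or lower (the default is 16).
        cmd += ["--opset", "12"]
    if size is not None:
        # Accept either a single SIZE or a (HEIGHT, WIDTH) pair.
        dims = [size] if isinstance(size, int) else list(size)
        cmd += ["-s", *map(str, dims)]
    if simplify:
        cmd.append("--simplify")
    if dynamic:
        if version < (6, 1):
            raise ValueError("--dynamic needs DeepStream 6.1 or newer")
        cmd.append("--dynamic")
    elif batch is not None:
        # Assumption: a static batch size is only used without --dynamic.
        cmd += ["--batch", str(batch)]
    return shlex.join(cmd)


print(build_export_command("yolov8s.pt", "6.0.1", size=1280, simplify=True))
```

The function only builds the string; running it is still a manual step inside the DeepStream-Yolo folder.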

Set the CUDA version according to the JetPack version installed

For JetPack 4.6.4:
```
export CUDA_VER=10.2
```
For JetPack 5.1.3:
```
export CUDA_VER=11.4
```
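
The JetPack, DeepStream, and CUDA pairings used in this guide can be kept in one small lookup table, which makes it harder to mix versions by accident. This is a minimal sketch: the names `JETPACK_MATRIX` and `cuda_ver_export` are hypothetical, and the table covers only the two JetPack releases tested here.

```python
# JetPack release -> DeepStream and CUDA versions, as listed in this guide.
JETPACK_MATRIX = {
    "4.6.4": {"deepstream": "6.0.1", "cuda": "10.2"},
    "5.1.3": {"deepstream": "6.3", "cuda": "11.4"},
}


def cuda_ver_export(jetpack):
    """Return the `export CUDA_VER=...` line for a JetPack release in the table."""
    try:
        entry = JETPACK_MATRIX[jetpack]
    except KeyError:
        raise ValueError(f"JetPack {jetpack} is not covered by this guide") from None
    return f"export CUDA_VER={entry['cuda']}"


print(cuda_ver_export("5.1.3"))
```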

Compile the library

make -C nvdsinfer_custom_impl_Yolo clean && make -C nvdsinfer_custom_impl_Yolo

Edit the config_infer_primary_yoloV8.txt file according to your model (for YOLOv8s with 80 classes)
```
[property]
...
onnx-file=yolov8s.onnx
...
num-detected-classes=80
...
```
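
If you prefer to script this edit rather than make it by hand, a line-based approach keeps the comments and ordering of the config file intact. The sketch below works on the file contents as a string; `set_config_keys` is a hypothetical helper, not part of DeepStream or the DeepStream-Yolo repository, and it assumes simple `key=value` lines grouped under `[section]` headers.

```python
def set_config_keys(text, section, updates):
    """Set key=value pairs inside one [section] of a DeepStream-style config.

    Existing keys (including commented-out `#key=` lines) are rewritten in
    place; keys that are missing are appended at the end of the section.
    """
    out, pending = [], dict(updates)
    in_section = found = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_section:
                # Leaving the target section: append keys that were not found.
                out += [f"{key}={value}" for key, value in pending.items()]
                pending.clear()
            in_section = stripped == f"[{section}]"
            found = found or in_section
        elif in_section and "=" in stripped:
            key = stripped.lstrip("#").split("=", 1)[0].strip()
            if key in pending:
                line = f"{key}={pending.pop(key)}"
        out.append(line)
    if not found:
        raise KeyError(section)
    if in_section:
        # The target section was the last one in the file.
        out += [f"{key}={value}" for key, value in pending.items()]
    return "\n".join(out) + "\n"


sample = "[property]\nonnx-file=model.onnx\nnum-detected-classes=1\n"
print(set_config_keys(sample, "property",
                      {"onnx-file": "yolov8s.onnx", "num-detected-classes": 80}))
```

To apply it to a real file, read `config_infer_primary_yoloV8.txt`, pass its contents through the function, and write the result back.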

Edit the deepstream_app_config.txt file

...
[primary-gie]
...
config-file=config_infer_primary_yoloV8.txt

You can also change the video source in the deepstream_app_config.txt file. Here, a default video file is loaded:

...
[source0]
...
uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4

Run Inference

deepstream-app -c deepstream_app_config.txt

!!! Note

Generating the TensorRT engine file takes a long time, and inference starts only after it finishes, so please be patient.

!!! Tip

If you want to convert the model to FP16 precision, simply set `model-engine-file=model_b1_gpu0_fp16.engine` and `network-mode=2` inside `config_infer_primary_yoloV8.txt`

INT8 Calibration

If you want to use INT8 precision for inference, you need to follow the steps below

Set OPENCV environment variable
```
export OPENCV=1
```

Compile the library

make -C nvdsinfer_custom_impl_Yolo clean && make -C nvdsinfer_custom_impl_Yolo

For the COCO dataset, download val2017, extract it, and move it to the DeepStream-Yolo folder
Make a new directory for calibration images
```
mkdir calibration
```

Run the following to select 1000 random images from the COCO dataset for calibration

for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
    cp ${jpg} calibration/; \
done

!!! Note

NVIDIA recommends at least 500 images for good accuracy. In this example, 1000 images are used because more images generally improve accuracy. You can change the number with **head -1000**. For example, use **head -2000** for 2000 images. This process can take a long time.

Create the calibration.txt file with all selected images
```
realpath calibration/*jpg > calibration.txt
```
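
The selection, copy, and listing steps above can also be done in a single pass from Python, which avoids parsing `ls` output and allows a reproducible random seed. This is a sketch only: `build_calibration_set` and its parameters are hypothetical names, and the folder layout (`val2017`, `calibration`, `calibration.txt`) follows the commands above.

```python
import random
import shutil
from pathlib import Path


def build_calibration_set(source_dir, calib_dir, list_path, count=1000, seed=None):
    """Copy `count` random .jpg files into calib_dir and list their absolute paths."""
    images = sorted(Path(source_dir).glob("*.jpg"))
    # Never ask for more images than are available.
    chosen = random.Random(seed).sample(images, min(count, len(images)))
    calib_dir = Path(calib_dir)
    calib_dir.mkdir(parents=True, exist_ok=True)
    copied = [shutil.copy(src, calib_dir / src.name) for src in chosen]
    # One absolute path per line, like `realpath calibration/*jpg`.
    lines = sorted(str(Path(path).resolve()) for path in copied)
    Path(list_path).write_text("".join(f"{line}\n" for line in lines))
    return len(lines)
```

From inside the DeepStream-Yolo folder, a call such as `build_calibration_set("val2017", "calibration", "calibration.txt", count=1000)` would mirror the shell steps.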

Set environment variables

export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1

!!! Note

Higher INT8_CALIB_BATCH_SIZE values result in better accuracy and faster calibration. Set it according to your GPU memory.

Update the config_infer_primary_yoloV8.txt file

From

...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...

To

...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
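
The three precision settings used in this guide differ only in the engine file name, the `network-mode` value, and whether a calibration table is set. The sketch below collects them in one place. `precision_settings` is a hypothetical helper, and generalizing the `model_b1_gpu0_<precision>.engine` file names to other batch sizes and GPU IDs is an assumption based on the names shown in this guide.

```python
# network-mode values used in this guide: 0 = FP32, 1 = INT8, 2 = FP16.
NETWORK_MODE = {"fp32": 0, "int8": 1, "fp16": 2}


def precision_settings(precision, batch=1, gpu=0):
    """Return the [property] keys to set for a given TensorRT precision."""
    precision = precision.lower()
    if precision not in NETWORK_MODE:
        raise ValueError(f"unknown precision: {precision}")
    settings = {
        # Assumption: the b<batch>_gpu<id> naming pattern generalizes.
        "model-engine-file": f"model_b{batch}_gpu{gpu}_{precision}.engine",
        "network-mode": NETWORK_MODE[precision],
    }
    if precision == "int8":
        # INT8 additionally needs the calibration table enabled.
        settings["int8-calib-file"] = "calib.table"
    return settings


for name in ("fp32", "fp16", "int8"):
    print(name, precision_settings(name))
```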

Run Inference

deepstream-app -c deepstream_app_config.txt

MultiStream Setup

To set up multiple streams under a single DeepStream application, make the following changes to the deepstream_app_config.txt file

Change the rows and columns to build a grid display according to the number of streams you want to have. For example, for 4 streams, we can add 2 rows and 2 columns.
```
[tiled-display]
rows=2
columns=2
```
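
For other stream counts, one simple way to pick the grid is a near-square layout: take the ceiling of the square root for the columns, then as many rows as needed. This layout rule is an assumption for illustration, not a DeepStream requirement, and the helper names are hypothetical.

```python
import math


def tiled_grid(num_streams):
    """Return (rows, columns) for a near-square grid that fits all streams."""
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    columns = math.ceil(math.sqrt(num_streams))
    rows = math.ceil(num_streams / columns)
    return rows, columns


def tiled_display_section(num_streams):
    """Render the [tiled-display] rows/columns keys for the given stream count."""
    rows, columns = tiled_grid(num_streams)
    return f"[tiled-display]\nrows={rows}\ncolumns={columns}\n"


print(tiled_display_section(4))
```

With 4 streams this yields the same 2 by 2 grid as the example above; 6 streams would give 2 rows and 3 columns.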

Set num-sources=4 and add the uri of each of the 4 streams

[source0]
enable=1
type=3
uri=<path_to_video>
uri=<path_to_video>
uri=<path_to_video>
uri=<path_to_video>
num-sources=4

Run Inference

deepstream-app -c deepstream_app_config.txt

Benchmark Results

The following table summarizes how YOLOv8s models perform at different TensorRT precision levels with an input size of 640x640 on NVIDIA Jetson Orin NX 16GB.

| Model Name | Precision | Inference Time (ms/im) | FPS |
| ---------- | --------- | ---------------------- | --- |
| YOLOv8s    | FP32      | 15.63                  | 64  |
|            | FP16      | 7.94                   | 126 |
|            | INT8      | 5.53                   | 181 |
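
The FPS column is consistent with 1000 divided by the per-image inference time in milliseconds. The short sketch below reproduces that arithmetic and adds the speedup relative to FP32, which is derived here and not part of the original measurements. The function name is hypothetical.

```python
# Per-image inference times (ms) from the table above.
LATENCY_MS = {"FP32": 15.63, "FP16": 7.94, "INT8": 5.53}


def fps_from_latency(latency_ms):
    """Convert per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


for precision, ms in LATENCY_MS.items():
    speedup = LATENCY_MS["FP32"] / ms
    print(f"{precision}: {fps_from_latency(ms):.0f} FPS, {speedup:.2f}x vs FP32")
```

On these numbers, FP16 is roughly 2 times and INT8 roughly 2.8 times faster than FP32.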

Acknowledgements

This guide was initially created by our friends at Seeed Studio, Lakshantha and Elaine.
