@ -145,6 +145,8 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration i
!!! example
=== "Python"
```{ .py .annotate }
from ultralytics import YOLO
@ -158,15 +160,29 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration i
data="coco.yaml", #(4)!
)
model = YOLO("yolov8n.engine", task="detect") # load the model
# Load the exported TensorRT INT8 model
model = YOLO("yolov8n.engine", task="detect")
# Run inference
result = model.predict("https://ultralytics.com/images/bus.jpg")
```
1. Exports with dynamic axes, this will be enabled by default when exporting with `int8=True` even when not explicitly set. See [export arguments](../modes/export.md#arguments) for additional information.
2. Sets max batch size of 8 for exported model, which calibrates with `batch = 2 *×* 8` to avoid scaling errors during calibration.
2. Sets max batch size of 8 for exported model, which calibrates with `batch = 2 * 8` to avoid scaling errors during calibration.
3. Allocates 4 GiB of memory instead of allocating the entire device for conversion process.
4. Uses [COCO dataset](../datasets/detect/coco.md) for calibration, specifically the images used for [validation](../modes/val.md) (5,000 total).
=== "CLI"
```bash
# Export a YOLOv8n PyTorch model to TensorRT format with INT8 quantization
TensorRT will generate a calibration `.cache` which can be re-used to speed up export of future model weights using the same data, but this may result in poor calibration when the data is vastly different or if the `batch` value is changed drastically. In these circumstances, the existing `.cache` should be renamed and moved to a different directory or deleted entirely.
@ -240,12 +256,12 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration i
| Precision | Eval test | mean<br>(ms) | min \| max<br>(ms) | top-1 | top-5 | `batch` | size<br><sup>(pixels) |