One command.
Your best export config.

Stop guessing between ONNX, CoreML, and PyTorch. exportrace benchmarks every backend and precision on your machine — and tells you which one to ship.

See a sample report below

MIT licensed · No accounts · Runs offline

Benchmarks the runtimes you already use

PyTorch · ONNX Runtime · CoreML · CUDA · Apple MPS · Ultralytics · TensorRT · OpenVINO

The problem

"Should I export to ONNX or CoreML? Does FP16 actually help on my machine?"

Every YOLO user asks this. Every project. Answered by hand — badly — every time.

Proposed Shell API (Concept)

One command. Every backend measured.

~/projects/yolo-detector · Simulated Output Concept

$ exportrace run yolov8n.pt
▸ detecting environment... macOS 14.5, Apple M2, 16GB
▸ available backends: PyTorch (MPS), PyTorch (CPU), ONNX (CPU), CoreML
▸ exporting yolov8n.pt → onnx, coreml ... done
▸ benchmarking (100 iterations, imgsz=640) ...

┌──────────────┬───────────┬───────────┬────────┬──────────┬──────────┐
│ backend      │ format    │ precision │  FPS   │ latency  │  Δ acc   │
├──────────────┼───────────┼───────────┼────────┼──────────┼──────────┤
│ PyTorch      │ .pt       │ FP32      │  42.1  │  23.7 ms │  0.0000  │
│ PyTorch/MPS  │ .pt       │ FP16      │  118.6 │   8.4 ms │  0.0021  │
│ ONNX Runtime │ .onnx     │ FP32      │  61.3  │  16.3 ms │  0.0004  │
│ CoreML       │ .mlpackage│ FP16      │  184.2 │   5.4 ms │  0.0018  │ ★
└──────────────┴───────────┴───────────┴────────┴──────────┴──────────┘

✓ recommendation: CoreML FP16 — 4.4× faster than baseline, Δ acc negligible
▸ report written to benchmark_report.md

4.4×

faster than FP32 baseline

5.4 ms

per-frame latency

0.0018

accuracy delta (cosine)
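The three headline figures can be re-derived from the simulated table. Here is a minimal sketch of that arithmetic, assuming the recommendation is simply the highest-FPS row. The `rows` list and the `pick_best` helper are illustrative names, not part of any published exportrace API.

```python
# Rows copied from the simulated output: (backend, precision, fps, delta_acc)
rows = [
    ("PyTorch", "FP32", 42.1, 0.0000),
    ("PyTorch/MPS", "FP16", 118.6, 0.0021),
    ("ONNX Runtime", "FP32", 61.3, 0.0004),
    ("CoreML", "FP16", 184.2, 0.0018),
]


def pick_best(rows):
    """Hypothetical ranking rule: the highest FPS wins."""
    return max(rows, key=lambda row: row[2])


baseline_fps = rows[0][2]          # FP32 PyTorch is the baseline row
best = pick_best(rows)
speedup = best[2] / baseline_fps   # 184.2 / 42.1
latency_ms = 1000.0 / best[2]      # per-frame latency implied by FPS

print(f"{best[0]} {best[1]}: {speedup:.1f}x, {latency_ms:.1f} ms, delta {best[3]:.4f}")
# prints: CoreML FP16: 4.4x, 5.4 ms, delta 0.0018
```

A real ranking would probably also cap the accepted accuracy delta before picking the fastest row; that threshold is left out here because the page does not specify one.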

What it does

A trustworthy answer
in one command.

Detect your machine

OS, chip, RAM, and every backend actually available — MPS, CUDA, ONNX Runtime execution providers, CoreML. No config.
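A rough idea of how zero-config detection could work, using only the Python standard library. This is a sketch under assumptions: the `CANDIDATES` mapping and the `detect_environment` function are hypothetical names, and checking that a package is importable is only the first step.

```python
import importlib.util
import platform

# Import names to probe; this list is an assumption made for illustration.
CANDIDATES = {
    "PyTorch": "torch",
    "ONNX Runtime": "onnxruntime",
    "CoreML": "coremltools",
    "OpenVINO": "openvino",
    "TensorRT": "tensorrt",
}


def detect_environment():
    """Return basic machine info plus which candidate packages are importable."""
    installed = {
        name: importlib.util.find_spec(module) is not None
        for name, module in CANDIDATES.items()
    }
    return {
        "os": f"{platform.system()} {platform.release()}",
        "arch": platform.machine(),
        "python": platform.python_version(),
        "installed": installed,
    }


if __name__ == "__main__":
    env = detect_environment()
    print(env["os"], env["arch"])
    for name, present in env["installed"].items():
        print(f"{name}: {'found' if present else 'missing'}")
```

An installed package does not guarantee a usable device. A fuller check would likely also ask each runtime directly, for example `torch.cuda.is_available()`, `torch.backends.mps.is_available()`, or `onnxruntime.get_available_providers()`.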

Export & benchmark

Uses Ultralytics' built-in exporter. Warm-up runs excluded. Std deviation included.
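The timing rule described here (untimed warm-up, then mean and standard deviation over the measured runs) can be sketched with the standard library alone. `time_inference` and its parameters are hypothetical, and the lambda is a stand-in workload rather than a real model call.

```python
import statistics
import time


def time_inference(run_once, iters=100, warmup=10):
    """Time `run_once` over `iters` calls after `warmup` untimed calls."""
    for _ in range(warmup):
        run_once()  # warm-up: lazy init and caches settle, nothing is recorded

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    std_ms = statistics.stdev(latencies_ms) if iters > 1 else 0.0
    fps = 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return {"latency_ms": mean_ms, "latency_std_ms": std_ms, "fps": fps}


# Stand-in workload; a real run would call the model's predict function here.
stats = time_inference(lambda: sum(i * i for i in range(10_000)), iters=50, warmup=5)
print(f"{stats['fps']:.1f} FPS, {stats['latency_ms']:.3f} ms "
      f"(std {stats['latency_std_ms']:.3f} ms)")
```

Backends that execute asynchronously on a GPU usually need a synchronize call before the clock stops, otherwise the loop measures launch time instead of inference time.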

Accuracy delta, not vibes

Cosine similarity of each backend's outputs against the FP32 baseline, so you know what lower precision actually costs you.
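One plausible reading of the cosine comparison, given that the baseline row scores 0.0000, is that the delta is one minus the cosine similarity of the flattened outputs. That definition is an assumption, as are the function names and the made-up vectors below.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def accuracy_delta(baseline_out, candidate_out):
    """Assumed definition: 1 - cosine similarity, clamped so the baseline scores 0."""
    return max(0.0, 1.0 - cosine_similarity(baseline_out, candidate_out))


fp32 = [0.12, 0.80, 0.05, 0.33]
fp16 = [0.12, 0.79, 0.06, 0.33]  # made-up values mimicking rounding drift

print(f"baseline vs itself: {accuracy_delta(fp32, fp32):.4f}")
print(f"baseline vs FP16:   {accuracy_delta(fp32, fp16):.4f}")
```

Comparing raw outputs this way needs no labeled dataset, which fits a tool that runs offline on whatever model file it is given. It measures drift from the baseline, not task accuracy such as mAP.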

One markdown report

Machine info + full table in benchmark_report.md. Paste into a PR, an issue, or your notes.

Fails gracefully

No CUDA? No CoreML? Rows are skipped or marked unavailable. Never crashes on a missing backend.
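The skip-instead-of-crash behavior could look roughly like this: each backend runs inside its own guard, and a failure becomes a row rather than an exception. The `run_all` helper, the row fields, and the two stub benchmarks are all hypothetical.

```python
def run_all(backends):
    """Run each backend's benchmark callable; one failure never aborts the rest."""
    rows = []
    for name, bench in backends.items():
        try:
            fps = bench()
            rows.append({"backend": name, "status": "ok", "fps": fps})
        except (ImportError, RuntimeError, OSError) as exc:
            # Missing package, missing device, or missing shared library.
            rows.append({"backend": name, "status": "unavailable", "reason": str(exc)})
    return rows


def bench_pytorch_cpu():
    return 42.1  # placeholder figure taken from the sample table


def bench_tensorrt():
    raise ImportError("No module named 'tensorrt'")  # simulated missing backend


rows = run_all({"PyTorch": bench_pytorch_cpu, "TensorRT": bench_tensorrt})
for row in rows:
    print(row)
```

Catching a narrow set of exception types keeps genuine bugs visible while still treating an absent runtime as an expected outcome.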

Consumer hardware first

Your laptop, your desktop, your dev box. No Jetson, no Pi — the machine you actually work on.

Backends covered

Every runtime worth trying.

Apple

CoreML

PyTorch/MPS

NVIDIA

CUDA

TensorRT

Cross-platform

ONNX Runtime

PyTorch/CPU

Intel

OpenVINO

ONNX Runtime (DirectML)

The report

Markdown you can
paste anywhere.

Machine info at the top, full comparison table below. Drop it into a PR description, a GitHub issue, your lab notebook, or your README — it renders everywhere.

Mean FPS + std deviation
Latency in ms per iteration
Accuracy delta vs FP32 baseline
Ranked recommendation

# benchmark_report.md

## Machine

- OS: macOS 14.5 (Sonoma)
- Chip: Apple M2 (8-core)
- RAM: 16 GB

## Results — yolov8n, imgsz=640

| backend | format | precision | FPS | latency (ms) | Δ acc |
|---|---|---|---|---|---|
| PyTorch | .pt | FP32 | 42.1 | 23.7 | 0.0000 |
| PyTorch/MPS | .pt | FP16 | 118.6 | 8.4 | 0.0021 |
| ONNX | .onnx | FP32 | 61.3 | 16.3 | 0.0004 |
| CoreML | .mlpackage | FP16 | 184.2 | 5.4 | 0.0018 |
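Rendering a results table like the one in the sample report takes very little code. The sketch below is illustrative only: `render_table`, `HEADERS`, and the tuple layout are assumptions, and the two rows are copied from the sample figures.

```python
HEADERS = ["backend", "format", "precision", "FPS", "latency (ms)", "delta acc"]


def render_table(rows):
    """Render benchmark rows as a pipe-delimited markdown table."""
    lines = [
        "| " + " | ".join(HEADERS) + " |",
        "|" + "|".join("---" for _ in HEADERS) + "|",
    ]
    for backend, fmt, precision, fps, latency_ms, delta in rows:
        lines.append(
            f"| {backend} | {fmt} | {precision} | "
            f"{fps:.1f} | {latency_ms:.1f} | {delta:.4f} |"
        )
    return "\n".join(lines)


rows = [
    ("PyTorch", ".pt", "FP32", 42.1, 23.7, 0.0),
    ("CoreML", ".mlpackage", "FP16", 184.2, 5.4, 0.0018),
]
print(render_table(rows))
```

Fixed decimal places per column keep the report stable between runs, so a diff in a PR shows real changes rather than formatting noise.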

Python API (Concept)

Or wire it into CI. Three lines.

bench.py · Concept Mockup

from exportrace import benchmark

report = benchmark("yolov8n.pt", imgsz=640, iters=100)
report.save("benchmark_report.md")

# Fail CI if the best backend can't hit 60 FPS
assert report.best.fps > 60, report.summary()

Get early access.
Join the waitlist.

Be the first to benchmark your model exports and find the fastest configuration for your machine automatically. Launching soon.
