One command.
Your best export config.

Stop guessing between ONNX, CoreML, and PyTorch. exportrace benchmarks every backend and precision on your machine — and tells you which one to ship.

See a sample report below
MIT licensed · No accounts · Runs offline

Benchmarks the runtimes you already use

PyTorch · ONNX Runtime · CoreML · CUDA · Apple MPS · Ultralytics · TensorRT · OpenVINO

The problem

"Should I export to ONNX or CoreML? Does FP16 actually help on my machine?"

Every YOLO user asks this. Every project. Answered by hand — badly — every time.

Proposed Shell API (Concept)

One command. Every backend measured.

~/projects/yolo-detector (Simulated Output Concept)
$ exportrace run yolov8n.pt
▸ detecting environment... macOS 14.5, Apple M2, 16GB
▸ available backends: PyTorch (MPS), PyTorch (CPU), ONNX (CPU), CoreML
▸ exporting yolov8n.pt → onnx, coreml ... done
▸ benchmarking (100 iterations, imgsz=640) ...

┌──────────────┬───────────┬───────────┬────────┬──────────┬──────────┐
│ backend      │ format    │ precision │  FPS   │ latency  │  Δ acc   │
├──────────────┼───────────┼───────────┼────────┼──────────┼──────────┤
│ PyTorch      │ .pt       │ FP32      │  42.1  │  23.7 ms │  0.0000  │
│ PyTorch/MPS  │ .pt       │ FP16      │  118.6 │   8.4 ms │  0.0021  │
│ ONNX Runtime │ .onnx     │ FP32      │  61.3  │  16.3 ms │  0.0004  │
│ CoreML       │ .mlpackage│ FP16      │  184.2 │   5.4 ms │  0.0018  │ ★
└──────────────┴───────────┴───────────┴────────┴──────────┴──────────┘

✓ recommendation: CoreML FP16 — 4.4× faster than baseline, Δ acc negligible
▸ report written to benchmark_report.md
4.4× faster than FP32 baseline
5.4 ms per-frame latency
0.0018 accuracy delta (cosine)
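
How a recommendation like the one above could be derived, as a rough sketch. The rows are copied from the simulated table. The `Row` class, the `recommend` function, and the `max_delta` threshold are hypothetical names chosen for illustration, not exportrace's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Row:
    backend: str
    precision: str
    fps: float
    delta_acc: float


# Rows copied from the simulated table above.
ROWS = [
    Row("PyTorch", "FP32", 42.1, 0.0000),
    Row("PyTorch/MPS", "FP16", 118.6, 0.0021),
    Row("ONNX Runtime", "FP32", 61.3, 0.0004),
    Row("CoreML", "FP16", 184.2, 0.0018),
]


def recommend(rows, baseline, max_delta=0.01):
    """Pick the fastest row whose accuracy delta stays within max_delta.

    max_delta is an assumed tolerance; the page does not state one.
    """
    candidates = [r for r in rows if r.delta_acc <= max_delta]
    best = max(candidates, key=lambda r: r.fps)
    return best, best.fps / baseline.fps


best, speedup = recommend(ROWS, baseline=ROWS[0])
print(f"{best.backend} {best.precision}: {speedup:.1f}x faster than baseline")
```

With the sample numbers, the speedup is 184.2 / 42.1, which rounds to the 4.4× shown above.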

What it does

A trustworthy answer
in one command.

01

Detect your machine

OS, chip, RAM, and every backend actually available: MPS, CUDA, ONNX Runtime execution providers, CoreML. No config.
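
A minimal sketch of what environment detection might look like, using only the Python standard library. The `CANDIDATE_MODULES` mapping and the `detect_environment` function are assumptions made for illustration. This version only checks which packages are importable, not whether the hardware behind them works.

```python
import importlib.util
import platform

# Hypothetical mapping from a display name to the Python package that provides it.
CANDIDATE_MODULES = {
    "PyTorch": "torch",
    "ONNX Runtime": "onnxruntime",
    "CoreML": "coremltools",
    "OpenVINO": "openvino",
    "TensorRT": "tensorrt",
}


def detect_environment():
    """Collect OS and chip info plus the backends whose packages are installed."""
    installed = [
        name
        for name, module in CANDIDATE_MODULES.items()
        if importlib.util.find_spec(module) is not None
    ]
    return {
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        "python": platform.python_version(),
        "backends": installed,
    }


env = detect_environment()
print(env)
```

A fuller check would also ask the runtime itself. With PyTorch installed, for example, `torch.cuda.is_available()` and `torch.backends.mps.is_available()` report whether those devices are usable.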

02

Export & benchmark

Uses Ultralytics' built-in exporter. Warm-up runs excluded. Std deviation included.
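
To make "warm-up runs excluded, std deviation included" concrete, here is a small timing harness. It is a sketch under assumptions: `time_backend` and its parameters are hypothetical names, and the workload is a stand-in for a real inference call.

```python
import statistics
import time


def time_backend(fn, iters=100, warmup=10):
    """Call fn() warmup times untimed, then time it iters times."""
    for _ in range(warmup):
        fn()  # warm-up: caches, JIT, lazy initialisation; not measured

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "latency_ms": mean_ms,
        "std_ms": statistics.stdev(latencies_ms),
        "fps": 1000.0 / mean_ms,
    }


# Stand-in workload; a real run would call the model's inference function here.
result = time_backend(lambda: sum(i * i for i in range(10_000)), iters=50, warmup=5)
print(result)
```

On GPU backends the call may return before the work finishes, so a real harness would synchronize the device before stopping the clock (for CUDA under PyTorch, `torch.cuda.synchronize()`).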

03

Accuracy delta, not vibes

Cosine similarity of each backend's outputs vs. the FP32 baseline, reported as a delta where 0.0000 means identical, so you know what precision actually costs you.
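
One way the delta could be computed, assuming it is defined as 1 minus the cosine similarity between the baseline's output vector and the candidate's. That definition is an assumption, inferred from the baseline row showing 0.0000. `cosine_delta` and the sample vectors are made up for illustration.

```python
import math


def cosine_delta(baseline, candidate):
    """Return 1 - cosine similarity of two equal-length output vectors."""
    dot = sum(a * b for a, b in zip(baseline, candidate))
    norm_a = math.sqrt(sum(a * a for a in baseline))
    norm_b = math.sqrt(sum(b * b for b in candidate))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up flattened outputs: FP32 baseline vs. a slightly perturbed copy.
fp32_out = [0.12, 0.87, 0.05, 0.33]
fp16_out = [0.12, 0.86, 0.06, 0.33]

print(f"same outputs:      {cosine_delta(fp32_out, fp32_out):.4f}")
print(f"perturbed outputs: {cosine_delta(fp32_out, fp16_out):.4f}")
```

Cosine similarity ignores overall scale, so a backend that multiplies every output by a constant would still score a delta of zero. A real comparison might pair it with a task metric such as mAP.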

04

One markdown report

Machine info + full table in benchmark_report.md. Paste into a PR, an issue, or your notes.

05

Fails gracefully

No CUDA? No CoreML? Rows are skipped or marked unavailable. Never crashes on a missing backend.
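
The skip-or-mark behaviour might look like this in outline. `run_all` and the two fake backends are hypothetical. The point is only that a failing backend becomes a row rather than an exception.

```python
def run_all(backends):
    """Run each backend callable; record failures as rows instead of raising."""
    rows = []
    for name, run in backends.items():
        try:
            rows.append({"backend": name, "status": "ok", "fps": run()})
        except (ImportError, RuntimeError) as exc:
            # Missing package or missing device: keep going with the rest.
            rows.append({"backend": name, "status": "unavailable", "reason": str(exc)})
    return rows


def fake_cpu():
    return 42.1  # FPS figure borrowed from the sample table


def fake_cuda():
    raise RuntimeError("CUDA not available on this machine")


rows = run_all({"PyTorch/CPU": fake_cpu, "CUDA": fake_cuda})
for row in rows:
    print(row)
```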

06

Consumer hardware first

Your laptop, your desktop, your dev box. No Jetson, no Pi — the machine you actually work on.

Backends covered

Every runtime worth trying.

Apple
CoreML
PyTorch/MPS
NVIDIA
CUDA
TensorRT
Cross-platform
ONNX Runtime
PyTorch/CPU
Intel
OpenVINO
ONNX Runtime (DirectML)

The report

Markdown you can
paste anywhere.

Machine info at the top, full comparison table below. Drop it into a PR description, a GitHub issue, your lab notebook, or your README — it renders everywhere.

  • Mean FPS + std deviation
  • Latency in ms per iteration
  • Accuracy delta vs FP32 baseline
  • Ranked recommendation
# benchmark_report.md
## Machine
- OS: macOS 14.5 (Sonoma)
- Chip: Apple M2 (8-core)
- RAM: 16 GB
## Results — yolov8n, imgsz=640
| backend | format | precision | FPS | ms | Δ acc |
|---|---|---|---|---|---|
| PyTorch | .pt | FP32 | 42.1 | 23.7 | 0.0000 |
| PyTorch/MPS | .pt | FP16 | 118.6 | 8.4 | 0.0021 |
| ONNX Runtime | .onnx | FP32 | 61.3 | 16.3 | 0.0004 |
| CoreML | .mlpackage | FP16 | 184.2 | 5.4 | 0.0018 |
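
A report in this shape takes very little code to produce. The sketch below assumes results arrive as plain tuples. `render_report` and its arguments are hypothetical, and the section headings are illustrative rather than the tool's exact output.

```python
HEADER = ["backend", "format", "precision", "FPS", "ms", "Δ acc"]


def render_report(machine, title, rows):
    """Render machine info and a results table as Markdown text."""
    lines = ["## Machine"]
    lines += [f"- {key}: {value}" for key, value in machine.items()]
    lines += ["", f"## Results: {title}", ""]
    lines.append("| " + " | ".join(HEADER) + " |")
    lines.append("|" + "---|" * len(HEADER))
    for backend, fmt, precision, fps, ms, delta in rows:
        lines.append(
            f"| {backend} | {fmt} | {precision} | {fps:.1f} | {ms:.1f} | {delta:.4f} |"
        )
    return "\n".join(lines) + "\n"


report = render_report(
    machine={"OS": "macOS 14.5 (Sonoma)", "Chip": "Apple M2 (8-core)", "RAM": "16 GB"},
    title="yolov8n, imgsz=640",
    rows=[
        ("PyTorch", ".pt", "FP32", 42.1, 23.7, 0.0),
        ("CoreML", ".mlpackage", "FP16", 184.2, 5.4, 0.0018),
    ],
)
print(report)
```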

Python API (Concept)

Or wire it into CI. Four lines.

bench.py (Concept Mockup)
from exportrace import benchmark

report = benchmark("yolov8n.pt", imgsz=640, iters=100)
report.save("benchmark_report.md")

# Fail CI if the best backend can't hit 60 FPS
assert report.best.fps >= 60, report.summary()

Get early access.
Join the waitlist.

Be the first to benchmark your model exports and automatically find the fastest configuration for your machine. Launching soon.