City Sample Vehicle Keypoints - 24-point (synthetic-only)

[Figure: cinematic reveal of the synthetic City Sample vehicle dataset with 24-point keypoint overlay]

A YOLO-pose model trained entirely on synthetic data - the City Sample 24-point vehicle-keypoint dataset rendered in Unreal Engine 5. It predicts a 24-point anatomical keypoint schema (wheels, head/tail lights, exhaust, roof corners, center, mirrors, bumper and window corners) plus a bounding box per vehicle.

Generated by kiselyovd/ue5-vehicle-synth.

What this model is for

This is a research / proof-of-concept model that demonstrates the synthetic dataset is clean and learnable: a model trained only on it localises vehicles and their keypoints well on held-out synthetic frames. For real-world 14-point vehicle keypoints, see the production model kiselyovd/vehicle-keypoints.

In-domain results (held-out synthetic val)

Metric      Box     Pose
mAP@50      0.859   0.331
mAP@50-95   0.523   0.194

Pose mAP is likely understated: Ultralytics scores keypoints with its default OKS sigmas, which are not tuned for this 24-point vehicle schema.

Trained from yolo26n-pose on 1,296 synthetic frames for 100 epochs at imgsz 480.
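To make the sigma remark concrete, here is a minimal sketch of the COCO-style object keypoint similarity (OKS) in NumPy. It is not the Ultralytics implementation, and the sigma values, the 3-pixel error and the object area are hypothetical numbers chosen only for illustration. It shows that the same pixel error scores very differently depending on the per-keypoint sigma.

```python
import numpy as np

def oks(pred, gt, sigmas, area, eps=1e-9):
    """COCO-style OKS between one predicted and one ground-truth instance.

    pred, gt: (K, 2) pixel coordinates; sigmas: (K,) per-keypoint constants;
    area: object area in pixels squared (the scale term s^2).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    d2 = np.sum((pred - gt) ** 2, axis=-1)   # squared distance per keypoint
    k2 = (2.0 * sigmas) ** 2                 # COCO convention: k = 2 * sigma
    return float(np.mean(np.exp(-d2 / (2.0 * area * k2 + eps))))

# Hypothetical setup: 24 keypoints, each predicted 3 px off, on a 100 x 50 px vehicle.
gt = np.zeros((24, 2))
pred = gt + np.array([3.0, 0.0])
area = 100.0 * 50.0

tight = oks(pred, gt, np.full(24, 0.025), area)  # assumed strict sigma
loose = oks(pred, gt, np.full(24, 0.100), area)  # assumed lenient sigma
print(f"OKS with sigma=0.025: {tight:.3f}")
print(f"OKS with sigma=0.100: {loose:.3f}")
```

Because pose mAP thresholds on OKS, sigmas that are stricter than the labelling noise of a given keypoint type push otherwise reasonable predictions below the match threshold.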

Visualizations

Multi-vehicle 24-point predictions (left) and the pixel-exact synthetic label quality the dataset is built on (right):

Left: multi-vehicle 24-point keypoint predictions on a synthetic street scene. Right: pixel-exact synthetic keypoint labels rendered in Unreal Engine 5.

Usage

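from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the trained weights from the Hugging Face Hub (cached locally after the first call)
weights_path = hf_hub_download("kiselyovd/citysample-vehicle-keypoints-24pt", "best.pt")
model = YOLO(weights_path)

# Run inference; one Results object is returned per input image
results = model.predict("your_street_scene.jpg")
# results[0].boxes.xyxy   -> (N, 4) bounding box per detected vehicle
# results[0].keypoints.xy -> (N, 24, 2) keypoints per detected vehicle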
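A possible post-processing step, sketched under assumptions: the helper below masks low-confidence keypoints before they are drawn or measured. The function name `confident_keypoints` and the 0.5 threshold are hypothetical, and it assumes the model exposes per-keypoint confidences through `results[0].keypoints.conf` (this attribute is `None` when a pose model was trained without a visibility channel). The demo runs on synthetic arrays so it does not need the weights.

```python
import numpy as np

def confident_keypoints(xy, conf, min_conf=0.5):
    """Return a copy of xy with keypoints below min_conf set to NaN.

    xy:   (N, K, 2) pixel coordinates per detected vehicle
    conf: (N, K) per-keypoint confidence scores
    """
    xy = np.asarray(xy, dtype=float).copy()
    conf = np.asarray(conf, dtype=float)
    if xy.shape[:2] != conf.shape:
        raise ValueError("xy must be (N, K, 2) and conf must be (N, K)")
    xy[conf < min_conf] = np.nan  # boolean mask over (N, K) blanks both x and y
    return xy

# Synthetic stand-in for two detected vehicles with 24 keypoints each.
demo_xy = np.ones((2, 24, 2))
demo_conf = np.ones((2, 24))
demo_conf[0, 3] = 0.1  # pretend one keypoint on the first vehicle is unreliable
filtered = confident_keypoints(demo_xy, demo_conf)

# With real predictions (assuming keypoints.conf is not None):
# kpts = results[0].keypoints
# filtered = confident_keypoints(kpts.xy.cpu().numpy(), kpts.conf.cpu().numpy())
```

Using NaN rather than dropping rows keeps the (N, 24, 2) shape, so keypoint indices still line up with the schema.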

Honest caveats

  • Synthetic domain. Trained only on rendered frames; expect a sim-to-real gap on real photos (no real images were used).
  • Evaluation. Cross-evaluating against the real CarFusion dataset is confounded by CarFusion's own noisy, sparse 14-point labels - it conflates transfer quality with label-convention mismatch and is not a fair judge of this model. The in-domain numbers above and the dataset's pixel-exact construction are the honest signal of label quality.

License

MIT for the weights. Rendered training frames come from Epic's City Sample under the UE EULA (non-interactive renders are distributable; no Epic assets are shipped).
