City Sample Vehicle Keypoints - 24-point (synthetic-only)
A YOLO-pose model trained entirely on synthetic data - the City Sample 24-point vehicle-keypoint dataset rendered in Unreal Engine 5. It predicts a 24-point anatomical keypoint schema (wheels, head/tail lights, exhaust, roof corners, center, mirrors, bumper and window corners) plus a bounding box per vehicle.
Generated by kiselyovd/ue5-vehicle-synth.
What this model is for
This is a research / proof-of-concept model that demonstrates the synthetic dataset is clean and learnable: a model trained only on it localises vehicles and their keypoints well on held-out synthetic frames. For real-world 14-point vehicle keypoints, see the production model kiselyovd/vehicle-keypoints.
In-domain results (held-out synthetic val)
| Metric | Box | Pose |
|---|---|---|
| mAP@50 | 0.859 | 0.331 |
| mAP@50-95 | 0.523 | 0.194 |
Pose mAP is understated: ultralytics evaluates with its default OKS sigmas, which are tuned for 17-point human pose rather than this 24-point vehicle schema.
Training started from yolo26n-pose and ran for 100 epochs on 1,296 synthetic frames at imgsz 480.
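To make the note on OKS sigmas concrete, here is a minimal sketch of COCO-style Object Keypoint Similarity in plain Python. It is not the ultralytics implementation. The function name `oks`, the toy 200x100 box, the 4 px error, and both sigma values are assumptions chosen only to show how strongly the score depends on sigma.

```python
import math

def oks(pred, gt, visible, area, sigmas):
    """COCO-style Object Keypoint Similarity for a single object.

    pred, gt : lists of (x, y) pixel coordinates, one pair per keypoint
    visible  : list of bools, True where the ground-truth keypoint is labelled
    area     : object area in pixels^2, used as the scale normaliser
    sigmas   : per-keypoint tolerance constants
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), vis, sigma in zip(pred, gt, visible, sigmas):
        if not vis:
            continue  # unlabelled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma  # COCO convention: k_i = 2 * sigma_i
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    return total / count if count else 0.0

# Toy data (made up): 24 keypoints, every prediction 4 px to the right.
num_kpts = 24
gt = [(10.0 * i, 50.0) for i in range(num_kpts)]
pred = [(x + 4.0, y) for x, y in gt]
visible = [True] * num_kpts
area = 200.0 * 100.0  # hypothetical 200x100 px vehicle box

tight = oks(pred, gt, visible, area, [0.025] * num_kpts)  # strict tolerance
loose = oks(pred, gt, visible, area, [0.1] * num_kpts)    # lenient tolerance
print(f"tight sigma: {tight:.3f}, loose sigma: {loose:.3f}")
```

The same 4 px error scores lower under the smaller sigma, so a sigma set that does not match the keypoint schema shifts pose mAP without any change in the predictions.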
Visualizations
Multi-vehicle 24-point predictions (left) and the pixel-exact synthetic label quality the dataset is built on (right):
Usage
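from huggingface_hub import hf_hub_download
from ultralytics import YOLO
# Download the checkpoint from the Hub (cached locally after the first call).
weights_path = hf_hub_download("kiselyovd/citysample-vehicle-keypoints-24pt", "best.pt")
model = YOLO(weights_path)
# Run inference on one image; predict() returns one result per input image.
results = model.predict("your_street_scene.jpg")
# results[0].keypoints.xy -> (N, 24, 2) keypoints per detected vehicle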
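Once predictions are available, a common next step is to drop low-confidence keypoints. The sketch below works on plain nested lists, which could be obtained with `results[0].keypoints.xy.tolist()` and `results[0].keypoints.conf.tolist()`, assuming the model outputs a per-keypoint confidence (otherwise `conf` is `None`). The helper name `confident_keypoints` and the 0.5 threshold are illustrative assumptions. Keypoints are identified by index because this card does not list the index-to-name order of the 24-point schema.

```python
def confident_keypoints(xy, conf, threshold=0.5):
    """Keep keypoints whose confidence is at least `threshold`.

    xy   : nested list shaped (N, K, 2) of pixel coordinates
    conf : nested list shaped (N, K) of per-keypoint confidences
    Returns one dict per vehicle mapping keypoint index -> (x, y).
    """
    vehicles = []
    for points, scores in zip(xy, conf):
        kept = {
            index: (x, y)
            for index, ((x, y), score) in enumerate(zip(points, scores))
            if score >= threshold
        }
        vehicles.append(kept)
    return vehicles

# Toy input (made up): one vehicle, three keypoints instead of 24 for brevity.
toy_xy = [[[10.0, 20.0], [15.0, 25.0], [30.0, 40.0]]]
toy_conf = [[0.9, 0.2, 0.7]]

filtered = confident_keypoints(toy_xy, toy_conf)
print(filtered)
```

The threshold is worth tuning per application, particularly on real photos where the sim-to-real gap may lower keypoint confidences.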
Honest caveats
- Synthetic domain. Trained only on rendered frames; expect a sim-to-real gap on real photos (no real images were used).
- Evaluation. Cross-evaluating against the real CarFusion dataset is confounded by CarFusion's own noisy, sparse 14-point labels - it conflates transfer quality with label-convention mismatch and is not a fair judge of this model. The in-domain numbers above and the dataset's pixel-exact construction are the honest signal of label quality.
License
MIT for the weights. Rendered training frames come from Epic's City Sample under the UE EULA (non-interactive renders are distributable; no Epic assets are shipped).