DriveBench: General-Purpose Driving Scene Encoder

Author: Nikhil Upadhyay | MSc Business Analytics | Dublin Business School
Project: PRECOG-AV

Overview

DriveBench is the first general-purpose driving scene encoder trained with safety-focused multi-task supervision across 25 countries and 298,326 real driving clips, the largest geographic scale in driving representation learning.

Each clip is encoded into a 256-dimensional DriveBench embedding that simultaneously captures danger context, geographic driving patterns, time-of-day risk, radar sensor health, and traffic density. Use these embeddings like ImageNet features, but for driving scenes.
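One common way to use frozen embeddings "like ImageNet features" is a linear probe. The sketch below fits a ridge-regularized least-squares classifier with NumPy on synthetic stand-in vectors. The helper names `fit_linear_probe` and `predict`, the regularization strength, and the synthetic data are illustrative assumptions, not part of the DriveBench release.

```python
import numpy as np

def fit_linear_probe(X, y, l2=1.0):
    """Fit a ridge-regularized least-squares probe on frozen embeddings.

    X: (n, d) embeddings, y: (n,) binary labels in {0, 1}.
    Returns a weight vector of shape (d + 1,) with the bias last.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    targets = 2.0 * y - 1.0                          # map {0, 1} to {-1, +1}
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ targets)

def predict(w, X):
    """Predict binary labels from the sign of the linear score."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ w > 0).astype(int)

# Synthetic stand-in for real DriveBench embeddings (256-dim, as in the card).
rng = np.random.default_rng(0)
n, d = 2500, 256
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, 0] += 3.0 * (2 * y - 1)     # one direction carries the label signal

w = fit_linear_probe(X[:2000], y[:2000])
acc = (predict(w, X[2000:]) == y[2000:]).mean()
print(f"held-out accuracy on synthetic data: {acc:.3f}")
```

With real data, `X` would be rows of the embedding matrix and `y` the labels for the downstream task. The encoder stays frozen and only the probe is trained.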

Results

| Task | Metric | Score | Random Baseline |
|---|---|---|---|
| Danger Anticipation | AUC | 0.8385 | 0.500 |
| Geographic Region | Accuracy | 0.4438 | 0.167 (6 classes) |
| Time of Day | Accuracy | 0.5168 | 0.250 (4 classes) |
| Radar Health | AUC | 1.0000 | 0.500 |
| TTC Regression | Pearson r | 0.3009 | 0.000 |

Tested on Greece and Bulgaria, countries never seen during training.
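For readers less familiar with these metrics, the sketch below computes a rank-based AUC and Pearson r with NumPy and shows where the random baselines come from: uniform guessing over 6 or 4 classes gives 1/6 ≈ 0.167 or 1/4 = 0.250. The helper names and toy scores are illustrative. This is not the evaluation script behind the reported numbers.

```python
import numpy as np

def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney U formulation). The pairwise
    comparison uses O(n_pos * n_neg) memory, so it suits small test sets.
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def pearson_r(a, b):
    """Pearson correlation between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing."""
    return 1.0 / n_classes

toy_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)                        # 0.75
print(round(chance_accuracy(6), 3))   # 0.167
print(round(chance_accuracy(4), 3))   # 0.25
```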

What makes this different

All existing driving pre-training (DriveWorld, DriveTok, GASP) uses geometric proxy tasks such as depth prediction, occupancy, and reconstruction, on 1 to 3 cities.

DriveBench uses safety-relevant supervision signals across 25 countries:

  • Danger labels from physics-based TTC analysis (not manual annotation)
  • Radar sensor health as a training signal
  • Geographic region (6 regions, 25 countries)
  • Time-of-day risk patterns (peak danger 13:00-15:00 confirmed)
  • Traffic density
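The first bullet refers to time-to-collision (TTC): for an object at range d closing at relative speed v, TTC is d / v. The sketch below shows one way a danger label could be derived from it. The 2.0-second threshold, the function names, and the constant-velocity assumption are illustrative and not taken from the PRECOG-AV labeling pipeline.

```python
import math

def time_to_collision(range_m, closing_speed_mps):
    """Constant-velocity TTC in seconds; infinite when the gap is not closing."""
    if closing_speed_mps <= 0.0:
        return math.inf
    return range_m / closing_speed_mps

def danger_label(range_m, closing_speed_mps, ttc_threshold_s=2.0):
    """Return 1 if TTC falls below the (assumed) threshold, else 0."""
    return int(time_to_collision(range_m, closing_speed_mps) < ttc_threshold_s)

print(time_to_collision(30.0, 10.0))  # 3.0
print(danger_label(30.0, 10.0))       # 0
print(danger_label(15.0, 10.0))       # 1 (TTC = 1.5 s)
print(danger_label(15.0, -2.0))       # 0 (gap is opening)
```

Labels of this kind come from measured kinematics, so they need no manual annotation.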

Architecture

ViT-B/16 features (5 frames × 768-dim)

↓

TransformerEncoder (3 layers, 8 heads, 2048 FFN)

↓

DriveBench Embedding (256-dim) ← use this downstream

↓

5 multi-task heads:

Danger head → AUC 0.84

Region head → Acc 0.44 (6 regions)

Time-of-day → Acc 0.52 (4 buckets)

Radar head → AUC 1.00

TTC regression → r = 0.30
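The five heads listed above could be attached to the 256-dim embedding roughly as follows. This is a minimal sketch under stated assumptions: single linear heads, an unweighted sum of losses, and the class and function names `MultiTaskHeads` and `multitask_loss` are all hypothetical. The card does not specify the actual head architecture or loss weighting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHeads(nn.Module):
    """Hypothetical linear heads on top of a 256-dim scene embedding."""

    def __init__(self, embed_dim=256, n_regions=6, n_time_buckets=4):
        super().__init__()
        self.danger = nn.Linear(embed_dim, 1)               # binary logit
        self.region = nn.Linear(embed_dim, n_regions)       # 6-way logits
        self.time_of_day = nn.Linear(embed_dim, n_time_buckets)
        self.radar = nn.Linear(embed_dim, 1)                # binary logit
        self.ttc = nn.Linear(embed_dim, 1)                  # scalar regression

    def forward(self, z):
        return {
            "danger": self.danger(z).squeeze(-1),
            "region": self.region(z),
            "time_of_day": self.time_of_day(z),
            "radar": self.radar(z).squeeze(-1),
            "ttc": self.ttc(z).squeeze(-1),
        }

def multitask_loss(out, tgt):
    """Unweighted sum of per-task losses (the weighting is an assumption)."""
    return (
        F.binary_cross_entropy_with_logits(out["danger"], tgt["danger"])
        + F.cross_entropy(out["region"], tgt["region"])
        + F.cross_entropy(out["time_of_day"], tgt["time_of_day"])
        + F.binary_cross_entropy_with_logits(out["radar"], tgt["radar"])
        + F.mse_loss(out["ttc"], tgt["ttc"])
    )

# Random stand-ins for a batch of embeddings and targets.
torch.manual_seed(0)
z = torch.randn(8, 256)
heads = MultiTaskHeads()
out = heads(z)
tgt = {
    "danger": torch.randint(0, 2, (8,)).float(),
    "region": torch.randint(0, 6, (8,)),
    "time_of_day": torch.randint(0, 4, (8,)),
    "radar": torch.randint(0, 2, (8,)).float(),
    "ttc": torch.rand(8) * 10.0,
}
loss = multitask_loss(out, tgt)
print(out["region"].shape, loss.item())
```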

Usage

import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

class DriveBenchModel(nn.Module):
    def __init__(self, embed_dim=256, n_frames=5, n_regions=6):
        super().__init__()
        self.cls_token = nn.Parameter(torch.randn(1,1,768))
        self.pos_embed = nn.Embedding(n_frames+1, 768)
        layer = nn.TransformerEncoderLayer(
            d_model=768, nhead=8, dim_feedforward=2048,
            dropout=0.1, batch_first=True, norm_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=3)
        self.norm = nn.LayerNorm(768)
        self.projector = nn.Sequential(
            nn.Linear(768,512), nn.GELU(), nn.Dropout(0.15),
            nn.Linear(512,embed_dim), nn.LayerNorm(embed_dim))

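    def encode(self, x):
        # x: (batch, n_frames, 768) per-frame ViT-B/16 features
        B = x.shape[0]
        cls = self.cls_token.expand(B,-1,-1)      # one learnable CLS token per clip
        x = torch.cat([cls,x],dim=1)              # (batch, n_frames+1, 768)
        pos = torch.arange(x.shape[1], device=x.device)
        x = x + self.pos_embed(pos)               # add learned positional embeddings
        x = self.norm(self.transformer(x))
        return self.projector(x[:,0])             # project the CLS output to embed_dim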

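# Download the checkpoint and load the encoder weights
path = hf_hub_download("Trazemag/DriveBench", "drivebench_best.pt")
model = DriveBenchModel()
# weights_only=False unpickles arbitrary Python objects; use it only with checkpoints you trust
ckpt = torch.load(path, map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model_state"])
model.eval()

# Input:  (batch, 5, 768) ViT-B/16 features from 5 consecutive frames
# Output: (batch, 256) DriveBench embedding
# Use as features for any downstream driving task
features = torch.randn(2, 5, 768)  # placeholder; replace with real ViT-B/16 features
with torch.no_grad():
    embedding = model.encode(features)  # shape (2, 256)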

Pre-computed Embeddings

298,326 embeddings are already computed. Download and use them directly:

import numpy as np
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    "Trazemag/DriveBench-Embeddings",
    "drivebench_embeddings.npz",
    repo_type="dataset")
data = np.load(path)
embeddings = data["embeddings"]  # (298326, 256)
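Once the embedding matrix is loaded, a simple use is nearest-neighbor retrieval of similar scenes. The sketch below ranks clips by cosine similarity using NumPy. The function name `top_k_similar` and the synthetic matrix are illustrative assumptions; with the real file, pass the `embeddings` array instead.

```python
import numpy as np

def top_k_similar(embeddings, query_index, k=5):
    """Return indices of the k clips most similar to `query_index` (cosine)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)   # L2-normalize each row
    sims = unit @ unit[query_index]                   # cosine similarity to the query
    sims[query_index] = -np.inf                       # exclude the query itself
    top = np.argpartition(-sims, k)[:k]               # unordered top-k candidates
    return top[np.argsort(-sims[top])]                # order the k hits by similarity

# Synthetic stand-in for the (298326, 256) matrix.
rng = np.random.default_rng(0)
demo = rng.normal(size=(100, 256))
demo[7] = 2.0 * demo[3]            # clip 7 points in the same direction as clip 3

neighbors = top_k_similar(demo, query_index=3, k=5)
print(neighbors)                   # clip 7 ranks first
```

For the full matrix, an approximate nearest-neighbor index may be preferable to this brute-force scan if many queries are issued.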

Training Data

Built on the NVIDIA PhysicalAI-AV dataset (gated; request access on Hugging Face).

Danger labels available at Trazemag/PRECOG-Labels.

Related Models

| Model | Task | Link |
|---|---|---|
| PRECOG-SENSE | Radar health from camera | Trazemag/PRECOG-SENSE |
| PRECOG-HERALD | Danger anticipation | Trazemag/PRECOG-HERALD |
| DriveBench | General scene encoder | This model |

Citation

@misc{upadhyay2026drivebench,
  title  = {DriveBench: General-Purpose Driving Scene Encoder
            via Multi-Task Safety-Focused Pre-training across 25 Countries},
  author = {Upadhyay, Nikhil},
  year   = {2026},
  url    = {https://github.com/TrazeMaG/PRECOG-AV}
}