DriveBench: General-Purpose Driving Scene Encoder

Author: Nikhil Upadhyay | MSc Business Analytics | Dublin Business School | Project: PRECOG-AV

Overview

DriveBench is the first general-purpose driving scene encoder trained with safety-focused multi-task supervision across 25 countries and 298,326 real driving clips — the largest geographic scale in driving representation learning.

Each clip is encoded into a 256-dimensional DriveBench embedding that simultaneously captures danger context, geographic driving patterns, time-of-day risk, radar sensor health, and traffic density. These embeddings can serve as off-the-shelf features for driving scenes, much as ImageNet features do for general images.

Results

Task	Metric	Score	Random Baseline
Danger Anticipation	AUC	0.8385	0.500
Geographic Region	Accuracy	0.4438	0.167 (6 classes)
Time of Day	Accuracy	0.5168	0.250 (4 classes)
Radar Health	AUC	1.0000	0.500
TTC Regression	Pearson r	0.3009	0.000

Tested on Greece and Bulgaria — countries never seen during training.
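
For readers who want to reproduce the metric types in the table above, here is a minimal pure-Python sketch of AUC, Pearson r, and the chance-level accuracy used as the random baseline. The function names and toy inputs are illustrative assumptions, not the project's evaluation code.

```python
import math


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 0.5. This pairwise form is O(P * N), fine for small checks.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / n_classes


# Toy data, not DriveBench outputs.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
region_baseline = chance_accuracy(6)  # 6 regions
time_baseline = chance_accuracy(4)    # 4 time-of-day buckets
```

The 0.167 and 0.250 baselines in the table correspond to `chance_accuracy(6)` and `chance_accuracy(4)`. An AUC of 0.500 is what a scorer with no ranking ability achieves.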

What makes this different

Existing driving pre-training approaches such as DriveWorld, DriveTok, and GASP use geometric proxy tasks (depth prediction, occupancy, reconstruction) on one to three cities.

DriveBench uses safety-relevant supervision signals across 25 countries:

Danger labels from physics-based TTC analysis (not manual annotation)
Radar sensor health as a training signal
Geographic region (6 regions, 25 countries)
Time-of-day risk patterns (peak danger 13:00-15:00 confirmed)
Traffic density
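
To illustrate what a physics-based TTC danger label can look like, here is a minimal sketch using the constant-velocity definition of time-to-collision (gap divided by closing speed). The 2.0 s threshold, the function names, and the single-lead-vehicle setup are assumptions for illustration. The source does not specify the exact labelling rule used for PRECOG-Labels.

```python
import math


def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """Constant-velocity TTC in seconds; infinite when the gap is not closing."""
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def danger_label(ttc_s, threshold_s=2.0):
    """Binary label: 1 if TTC falls below the (assumed) threshold, else 0."""
    return int(ttc_s < threshold_s)


# Ego at 25 m/s approaching a lead vehicle at 10 m/s with a 20 m gap.
ttc = time_to_collision(gap_m=20.0, ego_speed_mps=25.0, lead_speed_mps=10.0)
label = danger_label(ttc)

# Lead vehicle pulling away: no collision course, so no danger label.
safe_ttc = time_to_collision(gap_m=20.0, ego_speed_mps=10.0, lead_speed_mps=25.0)
safe_label = danger_label(safe_ttc)
```

Labels derived this way need no manual annotation, which is the property highlighted in the list above.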

Architecture

ViT-B/16 features (5 frames × 768-dim)
        ↓
TransformerEncoder (3 layers, 8 heads, 2048 FFN)
        ↓
DriveBench Embedding (256-dim) ← use this downstream
        ↓
5 multi-task heads:
    Danger head    → AUC 0.84
    Region head    → Acc 0.44 (6 regions)
    Time-of-day    → Acc 0.52 (4 buckets)
    Radar head     → AUC 1.00
    TTC regression → r = 0.30
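
The five heads listed above could be attached to the 256-dim embedding roughly as follows. This is a hypothetical sketch: the single-linear-layer heads, the equal loss weights, and the choice of loss functions are assumptions, since the released snippet only defines the encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskHeads(nn.Module):
    """Hypothetical linear heads on the embedding; not the released head code."""

    def __init__(self, embed_dim=256, n_regions=6, n_time_buckets=4):
        super().__init__()
        self.danger = nn.Linear(embed_dim, 1)            # binary logit
        self.region = nn.Linear(embed_dim, n_regions)    # 6-way logits
        self.time_of_day = nn.Linear(embed_dim, n_time_buckets)
        self.radar = nn.Linear(embed_dim, 1)             # binary logit
        self.ttc = nn.Linear(embed_dim, 1)               # scalar regression

    def forward(self, z):
        return {
            "danger": self.danger(z).squeeze(-1),
            "region": self.region(z),
            "time_of_day": self.time_of_day(z),
            "radar": self.radar(z).squeeze(-1),
            "ttc": self.ttc(z).squeeze(-1),
        }


def multitask_loss(out, tgt):
    """Unweighted sum of per-task losses (equal weights are an assumption)."""
    return (
        F.binary_cross_entropy_with_logits(out["danger"], tgt["danger"])
        + F.cross_entropy(out["region"], tgt["region"])
        + F.cross_entropy(out["time_of_day"], tgt["time_of_day"])
        + F.binary_cross_entropy_with_logits(out["radar"], tgt["radar"])
        + F.smooth_l1_loss(out["ttc"], tgt["ttc"])
    )


# Random stand-in embeddings and toy targets for a batch of 4 clips.
z = torch.randn(4, 256)
outputs = MultiTaskHeads()(z)
targets = {
    "danger": torch.tensor([0.0, 1.0, 0.0, 1.0]),
    "region": torch.tensor([0, 3, 5, 2]),
    "time_of_day": torch.tensor([1, 0, 3, 2]),
    "radar": torch.tensor([1.0, 1.0, 0.0, 1.0]),
    "ttc": torch.tensor([4.2, 1.1, 6.0, 0.8]),
}
loss = multitask_loss(outputs, targets)
```

Because every head reads the same embedding, the gradient from each task shapes the shared representation, which is the idea behind multi-task supervision.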

Usage

import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


class DriveBenchModel(nn.Module):
    """Encoder only: maps (batch, n_frames, 768) ViT-B/16 features to (batch, embed_dim)."""

    def __init__(self, embed_dim=256, n_frames=5, n_regions=6):
        super().__init__()
        # n_regions is unused here; the task heads are not defined in this snippet.
        self.cls_token = nn.Parameter(torch.randn(1, 1, 768))
        self.pos_embed = nn.Embedding(n_frames + 1, 768)  # +1 for the CLS token
        layer = nn.TransformerEncoderLayer(
            d_model=768, nhead=8, dim_feedforward=2048,
            dropout=0.1, batch_first=True, norm_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=3)
        self.norm = nn.LayerNorm(768)
        self.projector = nn.Sequential(
            nn.Linear(768, 512), nn.GELU(), nn.Dropout(0.15),
            nn.Linear(512, embed_dim), nn.LayerNorm(embed_dim))

    def encode(self, x):
        B = x.shape[0]
        cls = self.cls_token.expand(B, -1, -1)   # one CLS token per sample
        x = torch.cat([cls, x], dim=1)           # (B, n_frames + 1, 768)
        pos = torch.arange(x.shape[1], device=x.device)
        x = x + self.pos_embed(pos)
        x = self.norm(self.transformer(x))
        return self.projector(x[:, 0])           # project the CLS position


path = hf_hub_download("Trazemag/DriveBench", "drivebench_best.pt")
model = DriveBenchModel()
# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
# If the checkpoint also stores task-head weights, this strict load raises on the
# unexpected keys; passing strict=False would skip them.
model.load_state_dict(ckpt["model_state"])
model.eval()

# Input:  (batch, 5, 768) ViT-B/16 features from 5 consecutive frames
# Output: (batch, 256) DriveBench embedding
with torch.no_grad():
    embedding = model.encode(torch.randn(2, 5, 768))  # random stand-in features -> (2, 256)
# Use as features for any downstream driving task
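
The encoder expects inputs of shape (batch, 5, 768). If per-frame ViT-B/16 features for a longer clip are already available as a (T, 768) array, one way to build such inputs is to stack runs of consecutive frames. The sketch below uses NumPy; the helper name `make_windows` and the sliding-window strategy are assumptions, since the source only states that 5 consecutive frames are used.

```python
import numpy as np


def make_windows(frame_feats, n_frames=5, stride=1):
    """Stack consecutive per-frame features (T, D) into windows (N, n_frames, D)."""
    n_total = frame_feats.shape[0]
    if n_total < n_frames:
        raise ValueError(f"need at least {n_frames} frames, got {n_total}")
    starts = range(0, n_total - n_frames + 1, stride)
    return np.stack([frame_feats[s:s + n_frames] for s in starts])


# Random stand-in for 12 frames of 768-dim ViT-B/16 features.
feats = np.random.default_rng(0).standard_normal((12, 768)).astype(np.float32)

overlapping = make_windows(feats)          # every run of 5 consecutive frames
disjoint = make_windows(feats, stride=5)   # non-overlapping runs only
```

Each resulting array can be converted with `torch.from_numpy` and passed to `model.encode` to obtain one 256-dim embedding per window.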

Pre-computed Embeddings

All 298,326 embeddings are pre-computed and can be downloaded and used directly:

import numpy as np
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    "Trazemag/DriveBench-Embeddings",
    "drivebench_embeddings.npz",
    repo_type="dataset")
data = np.load(path)
embeddings = data["embeddings"]  # (298326, 256)
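
One simple downstream use of the embedding matrix is nearest-neighbour retrieval by cosine similarity. The sketch below runs on random stand-in data so it works without the download; the helper name `top_k_similar` is an assumption, and on the real `embeddings` array the call is the same.

```python
import numpy as np


def top_k_similar(embeddings, query_idx, k=5):
    """Return indices and cosine similarities of the k clips closest to the query clip."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normed = embeddings / np.maximum(norms, 1e-12)  # guard against zero vectors
    sims = normed @ normed[query_idx]
    sims[query_idx] = -np.inf                       # exclude the query itself
    top = np.argsort(-sims)[:k]
    return top, sims[top]


# Stand-in data: 1000 random 256-dim vectors, with row 1 a near-copy of row 0.
rng = np.random.default_rng(0)
demo = rng.standard_normal((1000, 256)).astype(np.float32)
demo[1] = demo[0] + 0.01 * rng.standard_normal(256)

idx, sims = top_k_similar(demo, query_idx=0, k=3)
```

For the full 298,326 × 256 matrix, the normalised copy takes roughly 300 MB in float32, so normalising once and reusing it across queries is the cheaper pattern.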

Training Data

Built on the NVIDIA PhysicalAI-AV dataset (gated; request access on Hugging Face).

Danger labels are available at Trazemag/PRECOG-Labels.

Related Models

Model	Task	Link
PRECOG-SENSE	Radar health from camera	Trazemag/PRECOG-SENSE
PRECOG-HERALD	Danger anticipation	Trazemag/PRECOG-HERALD
DriveBench	General scene encoder	This model

Citation

@misc{upadhyay2026drivebench,
  title  = {DriveBench: General-Purpose Driving Scene Encoder
            via Multi-Task Safety-Focused Pre-training across 25 Countries},
  author = {Upadhyay, Nikhil},
  year   = {2026},
  url    = {https://github.com/TrazeMaG/PRECOG-AV}
}
