nuScenes RnB-EnCoRe — Full Reasoning + Waypoints

Part of the RnB-EnCoRe-SelfDriving collection from the Stanford Autonomous Systems Lab.

This is a Qwen3-VL-4B vision-language model fine-tuned on nuScenes driving data to produce a full natural-language reasoning trace followed by a future waypoint trajectory from multi-camera observations.

Related models in the collection:

Model details

  • Base model: Qwen/Qwen3-VL-4B-Instruct (Qwen3VLForConditionalGeneration)
  • Architecture: hidden size 2560, 36 layers
  • Modality: image/video + text → text
  • Task: full driving-scene reasoning + future trajectory (waypoint) prediction on nuScenes
  • Output: full chain-of-thought driving rationale followed by predicted waypoints

Training

  • Fine-tuned on a nuScenes full-trace VQA-driver dataset (full reasoning + trajectory targets)
  • Epochs: 30 (10,980 optimizer steps)
  • Max sequence length: 6144
  • Learning rate: 5e-5

Usage

For dataset preparation, prompting, inference, and evaluation, follow the instructions in the project repository: https://github.com/rnb-encore/RnB-EnCoRe-SelfDriving

from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "stanfordasl/nuscenes-full-reasoning-waypoints"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat message with the driving camera image(s) + prompt,
# then processor.apply_chat_template(...) and model.generate(...).
# See the GitHub repo for the exact prompt format and post-processing.

Intended use & limitations

This model is a research artifact for autonomous-driving perception and planning experiments. It was trained on nuScenes and is not intended for deployment in real vehicles or safety-critical settings. Outputs may be inaccurate or unsafe; always validate in simulation before any downstream use.

Citation

If you use this model, please cite the RnB-EnCoRe self-driving work:

Downloads last month
11
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for stanfordasl/nuscenes-full-reasoning-waypoints

Finetuned
(306)
this model

Collection including stanfordasl/nuscenes-full-reasoning-waypoints

Paper for stanfordasl/nuscenes-full-reasoning-waypoints