VLMPed-CoT

A LoRA fine-tuned version of Qwen2.5-VL-3B-Instruct for pedestrian crossing intention prediction, trained with Chain-of-Thought supervision.

This model is part of the ECE 228 final project at UCSD (Spring 2026): "How Do Vision Language Models Utilize Multi-Frame Temporal Information for Pedestrian Intention Prediction?"

Project repository: ece228_VLMPed-CoT

Model Details

Developed by: chiawen0104
Model type: Vision-Language Model (LoRA fine-tuned)
Finetuned from: Qwen/Qwen2.5-VL-3B-Instruct
Task: Pedestrian crossing intention prediction (binary: cross / not cross)
Training datasets: JAAD, PIE
Framework: PEFT 0.15.1

How to Get Started

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct"
)
model = PeftModel.from_pretrained(base_model, "chiawen0104/VLMPed-CoT")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

Training Details

Base model: Qwen2.5-VL-3B-Instruct
Fine-tuning method: LoRA (via PEFT)
Training regime: bf16 mixed precision
Training data: JAAD and PIE pedestrian crossing intention datasets
CoT supervision: ✅ Chain-of-Thought reasoning generated via Gemini API

Intended Use

This model takes multi-frame pedestrian images as input and predicts whether a pedestrian intends to cross the street. The CoT supervision encourages the model to reason step-by-step before making a prediction. It is intended for research purposes in autonomous driving and pedestrian behavior analysis.

Differences from VLMPed-wo-CoT

	VLMPed-CoT	VLMPed-wo-CoT
CoT supervision	✅	❌
Direct prediction	✅	✅

Reference

Original Paper: VLMPed-CoT: A large vision-language model with a chain-of-thought mechanism for pedestrian crossing intention prediction
Original implementation: lyc2121/VLMPed-CoT-for-Pedestrian-Crossing-Intention-Prediction
Companion model: chiawen0104/VLMPed-wo-CoT

Framework versions

PEFT 0.15.1

Downloads last month: 14

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for chiawen0104/VLMPed-CoT

Base model

Qwen/Qwen2.5-VL-3B-Instruct

Adapter

(198)

this model