Instructions to use chiawen0104/VLMPed-wo-CoT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use chiawen0104/VLMPed-wo-CoT with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("/data/chl343/VLMPed-CoT-for-Pedestrian-Crossing-Intention-Prediction/LLM-model/Qwen/Qwen2.5-VL-3B-Instruct") model = PeftModel.from_pretrained(base_model, "chiawen0104/VLMPed-wo-CoT") - Notebooks
- Google Colab
- Kaggle
VLMPed-wo-CoT
A LoRA fine-tuned version of Qwen2.5-VL-3B-Instruct for pedestrian crossing intention prediction, trained without Chain-of-Thought supervision.
This model is part of the ECE 228 final project at UCSD (Spring 2026): "How Do Vision Language Models Utilize Multi-Frame Temporal Information for Pedestrian Intention Prediction?"
Project repository: ece228_VLMPed-CoT
Model Details
- Developed by: chiawen0104
- Model type: Vision-Language Model (LoRA fine-tuned)
- Finetuned from: Qwen/Qwen2.5-VL-3B-Instruct
- Task: Pedestrian crossing intention prediction (binary: cross / not cross)
- Training datasets: JAAD, PIE
- Framework: PEFT 0.15.1
How to Get Started
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"Qwen/Qwen2.5-VL-3B-Instruct"
)
model = PeftModel.from_pretrained(base_model, "chiawen0104/VLMPed-wo-CoT")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")
Training Details
- Base model: Qwen2.5-VL-3B-Instruct
- Fine-tuning method: LoRA (via PEFT)
- Training regime: bf16 mixed precision
- Training data: JAAD and PIE pedestrian crossing intention datasets
- CoT supervision: None (direct prediction without chain-of-thought)
Intended Use
This model takes multi-frame pedestrian images as input and predicts whether a pedestrian intends to cross the street. It is intended for research purposes in autonomous driving and pedestrian behavior analysis.
Differences from VLMPed-CoT
| VLMPed-CoT | VLMPed-wo-CoT | |
|---|---|---|
| CoT supervision | โ | โ |
| Direct prediction | โ | โ |
Reference
- Original Paper: VLMPed-CoT: A large vision-language model with a chain-of-thought mechanism for pedestrian crossing intention prediction
- Original implementation: lyc2121/VLMPed-CoT-for-Pedestrian-Crossing-Intention-Prediction
- Companion model: chiawen0104/VLMPed-CoT
Framework versions
- PEFT 0.15.1
- Downloads last month
- 16
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐ Ask for provider support
Model tree for chiawen0104/VLMPed-wo-CoT
Base model
Qwen/Qwen2.5-VL-3B-Instruct