Instructions to use Brusnicki/SAVANT-multimodal-evaluation-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use Brusnicki/SAVANT-multimodal-evaluation-lora with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct") model = PeftModel.from_pretrained(base_model, "Brusnicki/SAVANT-multimodal-evaluation-lora") - Notebooks
- Google Colab
- Kaggle
SAVANT Multimodal Evaluation Model (LoRA Adapter)
This repository contains the LoRA adapter for the multimodal anomaly evaluation model (Phase 2) described in the paper Can VLMs Unlock Semantic Anomaly Detection? A Framework for Structured Reasoning.
Project Page: https://TUM-AVS.github.io/SAVANT/
This repository is provided for peer-review purposes only. After the review process, the model will be made publicly available through the authors' main account.
Model Description
LoRA adapter for Qwen/Qwen2.5-VL-7B-Instruct, fine-tuned for anomaly evaluation using both the driving scene image and a structured scene description. This is Phase 2 of the SAVANT two-phase pipeline.
The model receives:
- The original front-camera image
- A structured scene description (generated by the Phase 1 model)
And outputs a binary anomaly classification with detailed reasoning.
Pipeline Performance
When used as part of the full SAVANT pipeline (Phase 1 + Phase 2), evaluated on a balanced test set of 1,020 driving scene images:
| Metric | Value |
|---|---|
| Accuracy | 83.7% |
| Precision | 85.1% |
| Recall | 81.8% |
| F1-Score | 83.4% |
Training Details
- Base model: Qwen/Qwen2.5-VL-7B-Instruct
- Method: LoRA (Low-Rank Adaptation)
- Dataset: 4,260 samples with image + scene description + anomaly labels
- Epochs: 3
- Learning rate: 1e-4 (cosine schedule)
- Precision: bfloat16 with Flash Attention 2
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, fc1, fc2, qkv, mlp.0, mlp.2 |
Usage
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"Qwen/Qwen2.5-VL-7B-Instruct",
torch_dtype=torch.bfloat16,
device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "u94fmn391j/SAVANT-multimodal-evaluation-lora")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
Limitations
- Trained on the CODA dataset; generalization to other driving domains not evaluated
- Single-frame analysis only (no temporal context)
- Pipeline performance depends on the quality of the Phase 1 scene description
- Downloads last month
- 41
Model tree for Brusnicki/SAVANT-multimodal-evaluation-lora
Base model
Qwen/Qwen2.5-VL-7B-Instruct