Training procedure

This model is fine-tuned using Supervised Fine-Tuning (SFT) with a LoRA (Low-Rank Adaptation) setup on top of Qwen3-VL-8B-Instruct.

Training is part of the Automingo project, which focuses on safety-critical visual question answering (VQA) for driving, using structured, scenario-based reasoning over short temporal image sequences.

Dataset

Training is performed on the Automingo-VQA dataset, designed for structured reasoning in driving scenarios:

  • 6,565 images
  • 1,313 annotated events
  • 5,792 question–answer pairs
  • 5-frame temporal snippets centered around safety-critical events
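The counts above are mutually consistent: 1,313 events at 5 frames each gives 6,565 images. The sketch below checks that arithmetic and illustrates what a window "centered" on an event looks like. `snippet_indices` is a hypothetical helper written for this card, not part of any Automingo release, and it assumes frames are addressed by integer index.

```python
# Dataset statistics as stated in this card.
NUM_IMAGES = 6565
NUM_EVENTS = 1313
NUM_QA_PAIRS = 5792
FRAMES_PER_SNIPPET = 5


def snippet_indices(center: int, n_frames: int = FRAMES_PER_SNIPPET) -> list[int]:
    """Return the frame indices of an odd-length window centered on `center`.

    Hypothetical helper for illustration only; the real extraction code may
    differ, for example in how it treats events near the start of a recording.
    """
    if n_frames % 2 == 0:
        raise ValueError("n_frames must be odd so the window has a center frame")
    half = n_frames // 2
    return list(range(center - half, center + half + 1))


# Each annotated event contributes one 5-frame snippet.
images_from_events = NUM_EVENTS * FRAMES_PER_SNIPPET
# Average number of question-answer pairs attached to one event.
qa_per_event = NUM_QA_PAIRS / NUM_EVENTS

print(images_from_events == NUM_IMAGES)
print(round(qa_per_event, 2))
print(snippet_indices(10))
```

On average each event carries a little over four question–answer pairs.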

The dataset emphasizes:

  • cut-ins
  • traffic light transitions
  • vulnerable road users
  • leading vehicle braking
  • construction zones and lane changes
  • intersections and roundabouts

Fine-tuning setup

LoRA adapters are applied to the following modules: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj

Best configuration:

  • Learning rate: 2e-4
  • LoRA rank: 32
  • Gradient accumulation: 8
  • Optimizer: AdamW

Hyperparameters were selected through a sweep; the values above come from the best run.
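For intuition about what these settings imply: LoRA adds two low-rank matrices to each adapted weight, so a weight of shape `d_out × d_in` gains `rank × (d_in + d_out)` trainable parameters. Gradient accumulation multiplies the number of samples behind each optimizer step. The sketch below works through both. The layer shape and the per-device batch size are illustrative placeholders (this card does not state them), and the helper names are hypothetical.

```python
# Values stated in this card.
LORA_RANK = 32
GRAD_ACCUM_STEPS = 8
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "up_proj", "down_proj", "gate_proj"]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank), hence rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


# ASSUMPTION: 4096 x 4096 is a placeholder shape, not a confirmed
# dimension of any projection in Qwen3-VL-8B-Instruct.
params_per_matrix = lora_param_count(d_in=4096, d_out=4096, rank=LORA_RANK)

# ASSUMPTION: per-device batch size of 1 on a single device; the card
# only states the accumulation factor.
samples_per_step = effective_batch_size(per_device=1, accum_steps=GRAD_ACCUM_STEPS)

print(params_per_matrix)
print(samples_per_step)
```

Under those placeholder values, one adapted matrix adds 262,144 trainable parameters, compared with roughly 16.8 million in the frozen 4096 × 4096 weight itself.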

Training objective

The model is trained to:

  • answer structured driving-related questions
  • produce reasoning aligned with safety-critical interpretation
  • avoid invalid or non-actionable outputs

Evaluation and results

Evaluation setup

Evaluation is performed using the Automingo benchmark pipeline:

  • Multiple-choice question (MCQ) answering
  • Post-processing with structured evaluation scripts

Metrics:

  • MCQ accuracy
  • Invalid attempts
  • Semantic score (Lingo-Judge)
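The first two metrics can be sketched as follows. This is not the Automingo evaluation script, which is not shown in this card. It assumes that options are labeled A to D, that a response counts as an invalid attempt when it contains no single unambiguous option letter, and that invalid attempts are scored as incorrect. `extract_choice` and `score_mcq` are hypothetical names.

```python
import re
from typing import Optional, Sequence, Tuple

# ASSUMPTION: answer options are the uppercase letters A-D, appearing as
# standalone tokens (e.g. "B", "(A)", "Answer: C").
_OPTION_RE = re.compile(r"\b([A-D])\b")


def extract_choice(text: str) -> Optional[str]:
    """Return the option letter in `text`, or None if absent or ambiguous."""
    letters = set(_OPTION_RE.findall(text))
    return letters.pop() if len(letters) == 1 else None


def score_mcq(predictions: Sequence[str], gold: Sequence[str]) -> Tuple[float, int]:
    """Return (accuracy in percent, number of invalid attempts)."""
    correct = 0
    invalid = 0
    for pred, answer in zip(predictions, gold):
        choice = extract_choice(pred)
        if choice is None:
            invalid += 1      # unparseable output, also counted as wrong
        elif choice == answer:
            correct += 1
    accuracy = 100.0 * correct / len(gold) if gold else 0.0
    return accuracy, invalid


# Toy responses for illustration; not taken from the benchmark.
preds = ["B", "Answer: C", "I am not sure", "(A)"]
gold = ["B", "C", "D", "B"]
accuracy, invalid = score_mcq(preds, gold)
print(accuracy, invalid)
```

Counting invalid attempts separately from wrong answers distinguishes a model that misjudges a scene from one that fails to follow the answer format.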

Benchmark results

| Model | MCQ Acc. (%) | Invalid Attempts | Lingo-Judge |
|---|---|---|---|
| Qwen3-VL-8B (base) | 81.5 | 9 | 0.556 |
| Automingo-VLM-8B (this model) | 89.3 | 43 | 0.628 |

Key improvements

  • +7.8 percentage points in MCQ accuracy over the base model (81.5% to 89.3%)
  • Improved structured reasoning for safety-critical scenarios
  • Higher Lingo-Judge semantic score (0.556 to 0.628)
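The accuracy gain can be expressed in more than one way, and the figures differ considerably. The short calculation below uses only the two accuracies reported in the results table; the variable names are illustrative.

```python
# MCQ accuracies (percent) from the benchmark results table.
BASE_ACC = 81.5
TUNED_ACC = 89.3

# Absolute gain, in percentage points.
abs_gain_pp = round(TUNED_ACC - BASE_ACC, 1)

# Relative gain, as a percentage of the base model's accuracy.
rel_gain_pct = round(100 * (TUNED_ACC - BASE_ACC) / BASE_ACC, 1)

# Share of the base model's errors that the fine-tuned model removes.
err_reduction_pct = round(100 * (TUNED_ACC - BASE_ACC) / (100 - BASE_ACC), 1)

print(abs_gain_pp, rel_gain_pct, err_reduction_pct)
```

The same result is a gain of 7.8 percentage points, a relative improvement of about 9.6%, or a reduction of about 42% in MCQ errors.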

Observations

  • Strong performance on:

    • cut-in scenarios
    • leading vehicle interactions
  • Remaining challenges:

    • intersections
    • roundabouts
    • invalid attempts, which rose from 9 (base model) to 43 (this model)

Overall, the fine-tuned model outperforms the base model on the Automingo benchmark in both MCQ accuracy and Lingo-Judge score, and demonstrates specialization for reasoning tasks in the style of advanced driver-assistance systems (ADAS).


base_model: Qwen/Qwen3-VL-8B-Instruct
library_name: peft
model_name: crisp-sweep-3_lwysu1i9
tags:
- base_model:adapter:Qwen/Qwen3-VL-8B-Instruct
- lora
- sft
- transformers
- trl
licence: license
pipeline_tag: text-generation

Model Card for crisp-sweep-3_lwysu1i9

This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct. It has been trained using TRL.

Quick start

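```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# NOTE: "None" is a placeholder left by the card template; replace it with
# this adapter's Hub repository id or a local path before running.
generator = pipeline("text-generation", model="None", device="cuda")
# Chat-style input: a list of role/content messages.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```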

Training procedure

This model was trained with SFT.

Framework versions

  • PEFT: 0.18.1
  • TRL: 0.29.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0+cu126
  • Datasets: 4.6.0
  • Tokenizers: 0.22.2

Citations

Cite TRL as:
    
```bibtex
@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}
```