Vilnius Bus Stop LLM

A LoRA adapter fine-tuned on Qwen3-VL-2B-Instruct to recognize Vilnius bus stops in images and describe them in Lithuanian.

Model Details

  • Model type: Vision-Language Model (LoRA adapter)
  • Base model: unsloth/Qwen3-VL-2B-Instruct
  • Language: Lithuanian (lt)
  • Fine-tuning framework: Unsloth
  • Task: Image captioning of bus stops in Lithuanian

How to Get Started

from peft import PeftModel
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
import torch

base_model = "unsloth/Qwen3-VL-2B-Instruct"
adapter = "user55442/Vilnius-Bus-Stop-LLM"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    ignore_mismatched_sizes=True
)
model = PeftModel.from_pretrained(model, adapter)
processor = AutoProcessor.from_pretrained(base_model)

Training Details

Dataset

  • 150 daytime images of Vilnius bus stops, captured from varying angles and distances
  • Captions generated in English by Gemini, then translated to Lithuanian
  • 80/20 train/eval split โ†’ 120 training images, 30 test images

Training Procedure

  • Epochs: 8 (optimal checkpoint ~step 120 before overfitting)
  • Batch size: 1 with gradient accumulation over 4 steps
  • Learning rate: 1e-4 (AdamW 8-bit optimizer)
  • Precision: bfloat16
  • Image resolution: max 768ร—768
  • LoRA target layers: language and attention layers (vision layers frozen)

Evaluation Results

Intrinsic Metrics

Metric Base Fine-tuned
ROUGE-L 0.014 0.163
Semantic Similarity 0.731 0.801
BLEU 0.339 10.130
BERTScore F1 0.811 0.864
Perplexity 14.170 6544.885

LLM Judge Scores (Gemma-4-31B, scale 1โ€“10)

Metric Base Fine-tuned
Fluency 9.77 5.67
Relevance 8.47 6.37
Factual Accuracy 7.60 5.20
Creativity 8.80 5.40

Limitations

  • Trained on only 120 images โ€” model shows signs of overfitting after ~120 steps
  • Perplexity increased sharply (14 โ†’ 6544), suggesting the model partially overfit to caption style
  • LLM judge noted grammatical errors, hallucinations, and incomplete sentences in some outputs
  • Performance may degrade on bus stops outside Vilnius or in different lighting conditions

Framework Versions

  • PEFT 0.19.1
  • Unsloth
  • Transformers
Downloads last month
72
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for user55442/Vilnius-Bus-Stop-LLM

Adapter
(4)
this model