Gemma 4 31B-it — Anti-UAV Scene Captioner (LoRA)

A LoRA adapter for google/gemma-4-31B-it trained to describe still frames from anti-UAV surveillance camera feeds — drone presence, position in frame, sky conditions, and visible scene structure.

Trained as the captioner stage of a chained drone-pipeline: YOLO detector → ByteTrack → Gemma 4 captioner (this).

Training


base	google/gemma-4-31B-it
method	LoRA (4-bit nf4) — Google cookbook recipe (eager attn, bf16 quant storage)
LoRA r / α	16 / 16, target_modules="all-linear"
training data	658 (frame, caption) pairs from Anti-UAV-RGBT, captions produced by Qwen2.5-VL-7B teacher
epochs / steps	2 / 166
effective batch	8 (1 × grad-accum 8)
LR	2e-4 constant, max_grad_norm 0.3
eval loss	0.179 (down from 0.241 first eval)
eval token accuracy	93.4%
hardware	3× NVIDIA RTX 3090 (model parallelism via balanced device_map, ~8GB/GPU)

Use

from peft import PeftModel
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig
import torch

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                        bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_quant_storage=torch.bfloat16)
base = AutoModelForImageTextToText.from_pretrained(
    "google/gemma-4-31B-it",
    quantization_config=bnb, attn_implementation="eager",
    dtype=torch.bfloat16, device_map="auto",
)
model = PeftModel.from_pretrained(base, "sapoepsilon/gemma4-31b-drone-captioner")
processor = AutoProcessor.from_pretrained("google/gemma-4-31B-it")

Caveats

Captions are derived from a VLM teacher (Qwen2.5-VL-7B), not human labels — supervision is noisy and inherits the teacher's biases
Trained on a narrow distribution: anti-UAV surveillance reticle/HUD imagery (Anti-UAV-RGBT). Out-of-distribution frames may degrade
Style is fairly templated ("The image shows a drone presence ...") which is intentional for downstream parsing but may sound formulaic

License

Adapter weights: Apache 2.0. Base model retains its original Google Gemma license.

Downloads last month: 1

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sapoepsilon/gemma4-31b-drone-captioner

Base model

google/gemma-4-31B

Finetuned

google/gemma-4-31B-it

Adapter

(109)

this model