GigScan: Fine-tuned MiniCPM-V 4.6 for Gig Poster Extraction

GigScan is a fine-tuned version of MiniCPM-V 4.6 trained to extract structured event information from live music gig posters.

Point your phone at a gig poster pinned to a wall, and this model returns structured JSON with the event name, venue, date, start time, and a short description, ready to turn into a calendar invite.

Built for the Build Small Hackathon (June 2026), GigScan competes in the Backyard AI track.

Model description

  • Base model: openbmb/MiniCPM-V-4.6 (1.3B parameters, multimodal vision-language)
  • Fine-tuning method: LoRA (all linear layers) via LLaMA-Factory
  • Training data: 276 labeled gig poster images + 110 non-poster negatives
  • Format: Q4_K_M quantized GGUF for local CPU/GPU inference with llama.cpp

Intended use

GigScan is designed to be deployed in a mobile-friendly web app where users photograph gig posters and receive structured event data. The intended output is:

{
  "is_live_music_poster": true,
  "event_name": "The Chats",
  "venue": "The Tote",
  "date": "06-06",
  "time_start": "20:00",
  "description": "Live at The Tote with Leatherman. Tickets $20."
}

Non-poster images return:

{
  "is_live_music_poster": false,
  "event_name": "",
  "venue": null,
  "date": null,
  "time_start": null,
  "description": ""
}
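
The two payloads above share one set of keys, so an application can validate a response before using it. The following is a minimal sketch, assuming the model's reply arrives as a raw JSON string; `GigEvent` and `parse_gig_output` are hypothetical names introduced for illustration and are not part of GigScan.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Keys taken from the example payloads in this model card.
EXPECTED_KEYS = {
    "is_live_music_poster",
    "event_name",
    "venue",
    "date",
    "time_start",
    "description",
}


@dataclass
class GigEvent:
    """Hypothetical container for one model response."""
    is_live_music_poster: bool
    event_name: str
    venue: Optional[str]
    date: Optional[str]        # DD-MM, or None when absent
    time_start: Optional[str]  # HH:MM, or None when absent
    description: str


def parse_gig_output(raw: str) -> GigEvent:
    """Parse a raw model reply and check that every expected key is present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Extra keys, if any, are ignored rather than rejected.
    return GigEvent(**{key: data[key] for key in EXPECTED_KEYS})
```

A caller would typically check `is_live_music_poster` first and only build a calendar entry when it is true.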

Performance

Evaluated on a held-out test set of 70 images (47 positives, 23 negatives):

Metric                       Base model   Fine-tuned
Valid JSON output            92.9%        100%
All fields present           92.9%        100%
Date format (DD-MM)          34.3%        100%
Time format (HH:MM)          81.4%        100%
Poster detection accuracy    78.6%        100%
Hallucination on negatives   Yes          None

On this test set, the fine-tuned model eliminated date formatting errors, poster misclassification, and hallucinated events on non-poster images.
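
The format metrics in the table can be approximated with simple checks on each raw reply. This is only a sketch of one plausible scoring scheme, not the evaluation script used for the numbers above. It assumes that a null date or time counts as correctly formatted and that a reply that fails to parse scores zero on every metric; `format_rates` and the regular expressions are illustrative.

```python
import json
import re
from typing import Any, Dict, List

REQUIRED_KEYS = (
    "is_live_music_poster", "event_name", "venue",
    "date", "time_start", "description",
)
DATE_RE = re.compile(r"(0[1-9]|[12]\d|3[01])-(0[1-9]|1[0-2])")  # DD-MM
TIME_RE = re.compile(r"([01]\d|2[0-3]):[0-5]\d")                # HH:MM


def _matches_or_null(value: Any, pattern: "re.Pattern[str]") -> bool:
    """Assumption: null is an acceptable value for an optional field."""
    return value is None or (isinstance(value, str) and bool(pattern.fullmatch(value)))


def format_rates(raw_outputs: List[str]) -> Dict[str, float]:
    """Return the percentage of replies passing each format check."""
    counts = {"valid_json": 0, "all_fields": 0, "date_format": 0, "time_format": 0}
    for raw in raw_outputs:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable replies fail every check
        if not isinstance(data, dict):
            continue
        counts["valid_json"] += 1
        if all(key in data for key in REQUIRED_KEYS):
            counts["all_fields"] += 1
        if "date" in data and _matches_or_null(data["date"], DATE_RE):
            counts["date_format"] += 1
        if "time_start" in data and _matches_or_null(data["time_start"], TIME_RE):
            counts["time_format"] += 1
    total = len(raw_outputs)
    return {name: (100.0 * n / total if total else 0.0) for name, n in counts.items()}
```

Poster detection accuracy would additionally need the ground-truth labels, which this sketch does not cover.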

Training procedure

Training data

The dataset consists of 386 labeled images:

  • Positives (276): Gig posters scraped from Instagram and other sources, labeled automatically by GPT-5-nano using the target JSON schema, then manually reviewed.
  • Negatives (110): Non-poster images sampled from the Food101 and Places365 datasets.

The dataset is available at kieranadair/gigscan-training.

Training configuration

Parameter                Value
Framework                LLaMA-Factory
Method                   LoRA (all linear layers)
Epochs                   3
Batch size (effective)   8
Learning rate            5e-5
Warmup                   10% of total steps
Precision                BF16
GPU                      NVIDIA A10G (24 GB)
Training time            ~45 minutes
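
To put the warmup setting in concrete terms, the optimizer step count can be estimated from the table. This is a rough sketch under stated assumptions: that all 386 labeled images are used for training (the card does not say whether the 70 held-out test images come out of that total), that the last partial batch in each epoch is kept, and that warmup steps are rounded up. `training_schedule` is a hypothetical helper, and the real run may differ.

```python
import math
from typing import Dict


def training_schedule(
    num_examples: int,
    effective_batch_size: int,
    epochs: int,
    warmup_ratio: float,
) -> Dict[str, int]:
    """Estimate optimizer steps and warmup steps for a simple training run."""
    # Assumption: the final partial batch of each epoch still counts as a step.
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    total_steps = steps_per_epoch * epochs
    # Assumption: warmup steps are rounded up to a whole step.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


schedule = training_schedule(
    num_examples=386, effective_batch_size=8, epochs=3, warmup_ratio=0.10
)
print(schedule)  # {'steps_per_epoch': 49, 'total_steps': 147, 'warmup_steps': 15}
```

Under these assumptions the run is short, roughly 150 optimizer steps, which is consistent in scale with the reported training time of about 45 minutes on a single GPU.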

Conversion to GGUF

The merged LoRA model was converted to GGUF using llama.cpp's convert_hf_to_gguf.py (release b9049+), then quantized to Q4_K_M:

python convert_hf_to_gguf.py /merged-model --outfile gigscan-f16.gguf --outtype f16
python convert_hf_to_gguf.py /merged-model --mmproj --outfile mmproj-gigscan-f16.gguf
./llama-quantize gigscan-f16.gguf gigscan-q4_k_m.gguf Q4_K_M

The vision projector (mmproj) is kept at F16 precision. The quantized language model is approximately 900 MB.

How to use

With llama.cpp server

./llama-server \
  -m gigscan-q4_k_m.gguf \
  --mmproj mmproj-gigscan-f16.gguf \
  -c 4096 \
  -ngl 999 \
  --reasoning-budget 0 \
  --host 127.0.0.1 \
  --port 8080

Then send requests to http://127.0.0.1:8080/v1/chat/completions:

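import requests

# Replace "..." in the data URL with the base64-encoded bytes of the poster image.
response = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "gigscan-minicpm-v",
        "temperature": 0,  # greedy decoding for stable structured output
        "max_tokens": 300,
        "messages": [
            {"role": "system", "content": "Return only the answer."},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
                {"type": "text", "text": "Extract gig poster details. Return JSON."},
            ]},
        ],
    },
    timeout=120,  # image requests can be slow, especially on CPU
)
response.raise_for_status()  # surface HTTP errors before reading the body

# The model's reply is a JSON string in the first choice's message content.
print(response.json()["choices"][0]["message"]["content"])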
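
The data URL in the request above is a placeholder. The sketch below shows one way to build it from raw image bytes and assemble the same request body. `image_to_data_url` and `build_payload` are hypothetical helper names, and the JPEG MIME type is an assumed default.

```python
import base64
from typing import Any, Dict

DEFAULT_PROMPT = "Extract gig poster details. Return JSON."


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(image_bytes: bytes, prompt: str = DEFAULT_PROMPT) -> Dict[str, Any]:
    """Assemble a chat-completions request body with one image and one text part."""
    return {
        "model": "gigscan-minicpm-v",
        "temperature": 0,
        "max_tokens": 300,
        "messages": [
            {"role": "system", "content": "Return only the answer."},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_url(image_bytes)}},
                {"type": "text", "text": prompt},
            ]},
        ],
    }


# Usage (poster.jpg is a hypothetical local file):
# with open("poster.jpg", "rb") as f:
#     payload = build_payload(f.read())
# reply = requests.post("http://127.0.0.1:8080/v1/chat/completions", json=payload, timeout=120)
```

Keeping payload construction separate from the HTTP call makes it easy to test without a running server.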

Recommended prompt

For best results, use a prompt that explicitly lists the expected keys and allows null for missing fields. The fine-tuned model was trained with the prompt "Extract gig poster details from this image." but benefits from a more detailed inference-time prompt that specifies the JSON contract and null behavior.
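
As an illustration of such a prompt, the sketch below builds one from a table of field rules. The wording is hypothetical and is not the prompt used by the GigScan app. Only the key names, the DD-MM and HH:MM formats, and the opening sentence come from this card.

```python
from typing import Dict

# Key names come from the output schema above; the rule text is illustrative.
FIELD_RULES: Dict[str, str] = {
    "is_live_music_poster": "true if the image is a live music gig poster, otherwise false",
    "event_name": "headline act or event title as printed on the poster",
    "venue": "venue name, or null if it is not shown",
    "date": "event date as DD-MM, or null if it is not shown",
    "time_start": "start time as 24-hour HH:MM, or null if it is not shown",
    "description": "one or two sentences covering supports, tickets, and other details",
}


def build_prompt() -> str:
    """Build a detailed extraction prompt that spells out the JSON contract."""
    lines = [
        "Extract gig poster details from this image.",
        "Return a single JSON object with exactly these keys:",
    ]
    lines += [f'- "{key}": {rule}' for key, rule in FIELD_RULES.items()]
    lines.append("Use null for any field that is not visible. Do not guess.")
    lines.append(
        "If the image is not a gig poster, set is_live_music_poster to false "
        "and leave the other fields empty or null."
    )
    lines.append("Return only the JSON object, with no extra text.")
    return "\n".join(lines)
```

The prompt opens with the training-time sentence so that the added instructions extend, rather than replace, the wording the model saw during fine-tuning.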

Limitations

  • Poster format: Trained primarily on clean digital poster images. Performance may degrade on blurry, angled, or poorly lit real-world photos.
  • Single event assumption: Designed for single-event posters. Multi-date tour posters or festival lineups may produce unreliable results.
  • Language: Training data is English-language posters only.
  • Date inference: The model returns DD-MM format. Year inference is handled by application logic, not the model.
  • Venue and time extraction: Can be conservative: missing venue or time is returned as null rather than guessed. This is intentional.
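
Because the model returns only DD-MM, the application has to pick a year. The sketch below shows one plausible rule, choosing the next occurrence of that day on or after today. It is an assumption about how such logic could work, not the GigScan app's actual implementation, and `infer_event_date` is a hypothetical name.

```python
from datetime import date
from typing import Optional


def infer_event_date(
    dd_mm: Optional[str], today: date, max_years_ahead: int = 8
) -> Optional[date]:
    """Resolve a DD-MM string to the next matching calendar date on or after today."""
    if not dd_mm:
        return None  # the model returned null or an empty string
    try:
        day_str, month_str = dd_mm.split("-")
        day, month = int(day_str), int(month_str)
    except ValueError:
        return None  # not in DD-MM form
    for year in range(today.year, today.year + max_years_ahead + 1):
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # e.g. 29-02 in a non-leap year, or an impossible date
        if candidate >= today:
            return candidate
    return None
```

With this rule, a poster for "06-06" scanned on 1 June resolves to the same year, while one scanned on 7 June rolls over to the following year. An app might instead prefer to flag dates that have just passed rather than assume a year-long wait.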

Deployment

GigScan is deployed as a Hugging Face Space at kieranadair/gigscan. The Space runs the quantized model locally via llama.cpp server on a T4 GPU.
