CLLG/Qwen3VL-8B-GGUF

Quantized GGUF versions of the CLLG Qwen3-VL 8B model, fine-tuned for ancient Greek document understanding and TEI XML encoding.

This model is part of the Corpus Liberatum Linguae Graecae (CLLG) project, which aims to build a freely accessible, high-quality corpus of ancient Greek texts. It is supported by the ANR within the Programme Inria Quadrant (PIQ).

Model Description

This vision-language model is trained to process images of critical editions of ancient Greek (and Latin) texts and produce structured TEI XML output. It handles the complex page layouts typical of scholarly editions, including:

  • Main text in polytonic Greek
  • Canonical references (section, paragraph, line numbers)
  • Titles and headings
  • Footnotes and apparatus criticus markers

GGUF Files

You need both the language model file and the multimodal projector to run this model.

Filename | Quantization | Size | Use case
Qwen3VL-8B-synth_real.Q4_K_M.gguf | Q4_K_M | ~5 GB | Recommended: good balance of size and quality
Qwen3VL-8B-synth_real.Q5_K_M.gguf | Q5_K_M | ~6 GB | Higher quality
Qwen3VL-8B-synth_real.Q8_0.gguf | Q8_0 | ~9 GB | Near-lossless
mmproj-BF16.gguf | BF16 | - | Vision projector, required for all variants
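
As a minimal sketch of fetching a matching pair of files, the Python snippet below maps each quantization to its filename and downloads the language model together with the projector. It assumes the huggingface_hub package is installed and uses its hf_hub_download function; the helper names files_for and download_pair are illustrative and not part of the project.

```python
# Filenames and repository id as given in this model card.
REPO_ID = "CLLG/Qwen3VL-8B-GGUF"
MMPROJ_FILE = "mmproj-BF16.gguf"
MODEL_FILES = {
    "Q4_K_M": "Qwen3VL-8B-synth_real.Q4_K_M.gguf",
    "Q5_K_M": "Qwen3VL-8B-synth_real.Q5_K_M.gguf",
    "Q8_0": "Qwen3VL-8B-synth_real.Q8_0.gguf",
}


def files_for(quant="Q4_K_M"):
    """Return (language model filename, projector filename) for a quantization."""
    if quant not in MODEL_FILES:
        raise ValueError(
            f"unknown quantization {quant!r}; choose from {sorted(MODEL_FILES)}"
        )
    return MODEL_FILES[quant], MMPROJ_FILE


def download_pair(quant="Q4_K_M", local_dir="."):
    """Download both required files and return their local paths."""
    # Imported lazily so files_for() works without the package installed.
    from huggingface_hub import hf_hub_download

    return tuple(
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
        for name in files_for(quant)
    )
```

Calling download_pair("Q5_K_M") would fetch the Q5_K_M model file and the projector into the current directory; the projector is the same for every quantization.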

Usage

llama-cli \
  --model Qwen3VL-8B-synth_real.Q4_K_M.gguf \
  --mmproj mmproj-BF16.gguf \
  --image page.jpg \
  --prompt "Encode the following page in TEI XML."
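
For batch processing, the same invocation can be driven from a script. The following Python sketch only wraps the command shown above; it assumes llama-cli is on the PATH and accepts exactly those flags in your llama.cpp build. The function names build_command and encode_page are hypothetical, and the captured output may contain more than the XML itself, depending on the build.

```python
import subprocess


def build_command(image,
                  model="Qwen3VL-8B-synth_real.Q4_K_M.gguf",
                  mmproj="mmproj-BF16.gguf",
                  prompt="Encode the following page in TEI XML."):
    """Build the argument list for the llama-cli invocation shown above."""
    return [
        "llama-cli",
        "--model", model,
        "--mmproj", mmproj,
        "--image", str(image),
        "--prompt", prompt,
    ]


def encode_page(image, **kwargs):
    """Run the command on one page image and return what it prints to stdout."""
    result = subprocess.run(
        build_command(image, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with the prompt and with image paths that contain spaces.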

Training Data

The model was fine-tuned on synthetic page images generated by the CLLG pipeline, covering approximately 175,000 Greek pages and 10,000 Latin pages drawn from 4,582 works and rendered in over 5,000 typographic style combinations. This synthetic set is complemented by real annotated document pages, hence the synth_real suffix in the filenames.

Intended Uses

  • Automatic TEI XML encoding of ancient Greek critical editions
  • Layout analysis and canonical reference detection in scholarly documents
  • Research in digital philology and computational humanities
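
Before feeding generated XML into a downstream pipeline, it can help to confirm that it is at least well-formed. The sketch below uses Python's standard xml.etree.ElementTree for that check. It does not validate against the TEI schema, the function name check_wellformed is hypothetical, and the sample fragment is invented for illustration rather than taken from actual model output.

```python
import xml.etree.ElementTree as ET


def check_wellformed(xml_text):
    """Return (True, sorted tag names) if xml_text parses, else (False, error message)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    # Strip any "{namespace}" prefix so tags read as plain local names.
    tags = sorted({el.tag.rsplit("}", 1)[-1] for el in root.iter()})
    return True, tags


# Hypothetical fragment for illustration only; not actual model output.
sample = "<div><head>Title</head><p>text</p></div>"
ok, info = check_wellformed(sample)
```

On failure, the second element of the returned tuple carries the parser's error message, which includes the line and column of the problem.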

Out of Scope

  • Modern Greek text
  • Non-document image understanding tasks
  • Full encoding of the apparatus criticus (current focus is prose text)
  • Poetry and drama

Project & Funding

This model is developed as part of the CLLG project, funded by the ANR within the PIQ initiative. Institutional partners include Persée and Biblissima.

Citation

If you use this model in your research, please cite the CLLG project and acknowledge ANR/PIQ funding.

License

Apache 2.0; see LICENSE.
