LoRA Fine-Tune of Qwen2.5-VL-7B-Instruct on the ComicsPAP dataset

Qwen2.5-VL-7B-Instruct fine-tuned simultaneously on all five tasks of the ComicsPAP dataset. Training used the AdamW optimizer with a constant learning rate of 2e-4, and ran for 5k steps with an effective batch size of 128. The LoRA configuration used a rank r = 8, an α of 16, and a dropout rate of 0.05.
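The hyperparameters above can be collected into a small configuration sketch. The field names `r`, `lora_alpha` and `lora_dropout` are chosen to mirror the keyword names commonly used by LoRA libraries; the `per_device_batch_size` and `num_devices` values are hypothetical, since the card only states the effective batch size, and the sketch assumes that "5k steps" means optimizer steps. The target modules of the adapter are not stated in the card and are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    """LoRA settings as stated in the model card."""
    r: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.05

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.r


@dataclass(frozen=True)
class TrainHyperparams:
    """Optimizer and schedule settings as stated in the model card."""
    optimizer: str = "adamw"
    learning_rate: float = 2e-4
    lr_schedule: str = "constant"
    max_steps: int = 5_000
    effective_batch_size: int = 128


def grad_accum_steps(effective_batch_size: int,
                     per_device_batch_size: int,
                     num_devices: int) -> int:
    """Gradient accumulation steps needed to reach the effective batch size."""
    per_step = per_device_batch_size * num_devices
    if effective_batch_size % per_step != 0:
        raise ValueError("effective batch size must be a multiple of "
                         "per_device_batch_size * num_devices")
    return effective_batch_size // per_step


lora = LoraHyperparams()
train = TrainHyperparams()

# Hypothetical hardware split: 4 devices with 4 samples each per forward pass.
accum = grad_accum_steps(train.effective_batch_size,
                         per_device_batch_size=4,
                         num_devices=4)

# Total training samples processed, assuming one optimizer step per effective batch.
samples_seen = train.max_steps * train.effective_batch_size

print(f"LoRA scaling (alpha / r): {lora.scaling}")
print(f"Gradient accumulation steps: {accum}")
print(f"Samples seen during training: {samples_seen}")
```

With α = 16 and r = 8 the adapter update is scaled by a factor of 2, and any combination of per-device batch size, device count and accumulation steps whose product is 128 matches the stated effective batch size.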

Results

| Model | Repo | Sequence Filling (%) | Character Coherence (%) | Visual Closure (%) | Text Closure (%) | Caption Relevance (%) | Total (%) |
|---|---|---|---|---|---|---|---|
| Random | | 20.22 | 50.00 | 14.41 | 25.00 | 25.00 | 24.30 |
| Qwen2.5-VL-3B (Zero-Shot) | Qwen/Qwen2.5-VL-3B-Instruct | 27.48 | 48.95 | 21.33 | 27.41 | 32.82 | 29.61 |
| Qwen2.5-VL-7B (Zero-Shot) | Qwen/Qwen2.5-VL-7B-Instruct | 30.53 | 54.55 | 22.00 | 37.45 | 40.84 | 34.91 |
| Qwen2.5-VL-72B (Zero-Shot) | Qwen/Qwen2.5-VL-72B-Instruct | 46.88 | 53.84 | 23.66 | 55.60 | 38.17 | 41.27 |
| Qwen2.5-VL-3B (LoRA Fine-Tuned) | VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP | 62.21 | 93.01 | 42.33 | 63.71 | 35.49 | 55.55 |
| Qwen2.5-VL-7B (LoRA Fine-Tuned) | VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP | 69.08 | 93.01 | 42.00 | 74.90 | 49.62 | 62.31 |

Citation

BibTeX:

@misc{vivoli2025comicspap,
      title={ComicsPAP: understanding comic strips by picking the correct panel}, 
      author={Emanuele Vivoli and Artemis Llabrés and Mohamed Ali Souibgui and Marco Bertini and Ernest Valveny Llobet and Dimosthenis Karatzas},
      year={2025},
      eprint={2503.08561},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.08561}, 
}

@misc{qwen2.5-VL,
    title = {Qwen2.5-VL},
    url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
    author = {Qwen Team},
    month = {January},
    year = {2025}
}