Comics Pick-A-Panel
Dataset, Models and Paper from ComicsPAP: understanding comic strips by picking the correct panel
Qwen2.5-VL-3B-Instruct fine-tuned jointly on all five tasks of the ComicsPAP dataset. Training used the AdamW optimizer with a constant learning rate of 2e-4 for 5k steps at an effective batch size of 128. The LoRA configuration used α = 16, a dropout rate of 0.05, and rank r = 8.
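For reference, the hyperparameters above can be collected into a small config. The plain-Python sketch below assumes the standard LoRA formulation, in which the low-rank update is scaled by α/r. The class and field names are illustrative, and the per-device batch size and GPU count used to reach the effective batch size of 128 are hypothetical, not taken from the actual training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComicsPAPLoraRecipe:
    """Reported ComicsPAP LoRA hyperparameters (field names are illustrative)."""
    learning_rate: float = 2e-4      # constant schedule, AdamW optimizer
    max_steps: int = 5_000
    effective_batch_size: int = 128
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    lora_rank: int = 8

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.lora_rank

    @property
    def samples_seen(self) -> int:
        # Total training examples processed over the whole run.
        return self.max_steps * self.effective_batch_size

    def grad_accum_steps(self, per_device_batch: int, num_devices: int) -> int:
        # Hypothetical split: per_device_batch * num_devices * accum == effective batch.
        per_step = per_device_batch * num_devices
        if self.effective_batch_size % per_step != 0:
            raise ValueError("effective batch size is not divisible by the per-step batch")
        return self.effective_batch_size // per_step


recipe = ComicsPAPLoraRecipe()
print(recipe.lora_scaling)            # 2.0
print(recipe.samples_seen)            # 640000
print(recipe.grad_accum_steps(4, 4))  # 8 (hypothetical: 4 GPUs x batch 4)
```

With r = 8 and α = 16 the adapter update is scaled by a factor of 2, and 5k steps at batch 128 correspond to 640,000 training examples in total.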
Model | Repo | Sequence Filling (%) | Character Coherence (%) | Visual Closure (%) | Text Closure (%) | Caption Relevance (%) | Total (%)
---|---|---|---|---|---|---|---
Random | - | 20.22 | 50.00 | 14.41 | 25.00 | 25.00 | 24.30
Qwen2.5-VL-3B (Zero-Shot) | Qwen/Qwen2.5-VL-3B-Instruct | 27.48 | 48.95 | 21.33 | 27.41 | 32.82 | 29.61
Qwen2.5-VL-7B (Zero-Shot) | Qwen/Qwen2.5-VL-7B-Instruct | 30.53 | 54.55 | 22.00 | 37.45 | 40.84 | 34.91
Qwen2.5-VL-72B (Zero-Shot) | Qwen/Qwen2.5-VL-72B-Instruct | 46.88 | 53.84 | 23.66 | 55.60 | 38.17 | 41.27
Qwen2.5-VL-3B (LoRA Fine-Tuned) | VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP | 62.21 | 93.01 | 42.33 | 63.71 | 35.49 | 55.55
Qwen2.5-VL-7B (LoRA Fine-Tuned) | VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP | 69.08 | 93.01 | 42.00 | 74.90 | 49.62 | 62.31
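To read the table as gains rather than raw accuracies, the short Python sketch below subtracts each zero-shot score from the LoRA fine-tuned score of the same backbone. The numbers are copied from the table; the `gain` helper and the `RESULTS` layout are illustrative only.

```python
TASKS = [
    "Sequence Filling", "Character Coherence", "Visual Closure",
    "Text Closure", "Caption Relevance", "Total",
]

# Accuracies (%) copied from the results table, in TASKS order.
RESULTS = {
    "Qwen2.5-VL-3B (Zero-Shot)": [27.48, 48.95, 21.33, 27.41, 32.82, 29.61],
    "Qwen2.5-VL-7B (Zero-Shot)": [30.53, 54.55, 22.00, 37.45, 40.84, 34.91],
    "Qwen2.5-VL-3B (LoRA Fine-Tuned)": [62.21, 93.01, 42.33, 63.71, 35.49, 55.55],
    "Qwen2.5-VL-7B (LoRA Fine-Tuned)": [69.08, 93.01, 42.00, 74.90, 49.62, 62.31],
}


def gain(fine_tuned: str, baseline: str) -> dict[str, float]:
    """Per-task difference in percentage points (fine_tuned minus baseline)."""
    return {
        task: round(ft - base, 2)
        for task, ft, base in zip(TASKS, RESULTS[fine_tuned], RESULTS[baseline])
    }


for size in ("3B", "7B"):
    print(size, gain(f"Qwen2.5-VL-{size} (LoRA Fine-Tuned)", f"Qwen2.5-VL-{size} (Zero-Shot)"))
```

For the 3B model this gives +25.94 points on Total, with the largest jump on Character Coherence (+44.06) and the smallest on Caption Relevance (+2.67).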
BibTeX:
@misc{vivoli2025comicspap,
  title={ComicsPAP: understanding comic strips by picking the correct panel},
  author={Emanuele Vivoli and Artemis Llabrés and Mohamed Ali Soubgui and Marco Bertini and Ernest Valveny Llobet and Dimosthenis Karatzas},
  year={2025},
  eprint={2503.08561},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.08561},
}

@misc{qwen2.5-VL,
  title={Qwen2.5-VL},
  url={https://qwenlm.github.io/blog/qwen2.5-vl/},
  author={Qwen Team},
  month={January},
  year={2025}
}