Vchitect/ShotVL-7B



Alexislhb committed on Sep 19

Commit fd8c356 · verified · 1 Parent(s): a16ab73

Update README.md

Files changed (1)

README.md +1 -1

@@ -12,7 +12,7 @@ library_name: transformers
 ## Model description
-This model is a fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), introduced in the paper [ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models](https://huggingface.co/papers/2506.21356). It is trained by supervised fine-tuning on the largest and high-quality dataset for cinematic language understanding to date. It currently achieves state-of-the-art performance on [ShotBench](https://vchitect.github.io/ShotBench-project/), a comprehensive benchmark for evaluating cinematography understanding in vision-language models.
+This model is a fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), introduced in the paper [ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models](https://huggingface.co/papers/2506.21356). It is trained on the largest and high-quality dataset for cinematic language understanding to date. It currently achieves state-of-the-art performance on [ShotBench](https://vchitect.github.io/ShotBench-project/), a comprehensive benchmark for evaluating cinematography understanding in vision-language models.
 **Project Page:** [https://vchitect.github.io/ShotBench-project/](https://vchitect.github.io/ShotBench-project/)