internlm/CapRL-3B


yuhangzang committed on Sep 24

Commit afbf8a4 · verified · 1 Parent(s): af935d2

Update README.md

Files changed (1)

README.md +1 -1


@@ -23,4 +23,4 @@ stage uses LVLMs to generate rich and accurate captions. Subsequently, the secon
 caption quality by using a vision-only LLM to perform the QA task. We also created a specific QA
 curation pipeline to ensure the quality of the questions and answers used for the second stage.
-By employing our CapRL training framework, initializing with the Qwen2.5-VL-3B model, and using a carefully filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-3B.
+By employing CapRL training framework, initializing with the Qwen2.5-VL-3B model, and using a carefully filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-3B.