---
license: cc-by-4.0
datasets:
- FreedomIntelligence/ALLaVA-4V
pipeline_tag: image-text-to-text
library_name: prismcaptioner
---
<br>

# PrismCaptioner Model Card

**Model details**

PrismCaptioners are open-source captioners with the LLaVA architecture, fine-tuned on the GPT-4V-assisted dataset [ALLaVA](https://huggingface.co/datasets/FreedomIntelligence/ALLaVA-4V). We have released [PrismCaptioner-7B](https://huggingface.co/Yuxuan-Qiao/PrismCaptioner-7B) and [PrismCaptioner-2B](https://huggingface.co/Yuxuan-Qiao/PrismCaptioner-2B).
15
+
16
+ PrismCaptioner-7B details
17
+ - **Vision Backbone:** google/siglip-so400m-patch14-384
18
+ - **Language Backbone:** internlm/internlm2-7b
19
+ - **Dataset:** 1x ALLaVA-Caption-[LAION/VFLAN]
20
+
21
+ **Paper and codebase for more information:**
22
+ [[Paper](https://arxiv.org/abs/2406.14544)] [[Code](https://github.com/SparksJoe/Prism)]
23
+
24
+ **Intended uses**
25
+ - **Perception Module:** The model can be integrated into [Prism](https://github.com/SparksJoe/Prism) as a perception module to solve vision-language task by utilizing an external LLM.
26
+ - **Effective Captioner:** The model can produce high-quality captions for given images.
27
+
28
+ **Model Usage:**
29
+
30
+ Clone the [Prism](https://github.com/SparksJoe/Prism) repo and complete the [preparation](https://github.com/SparksJoe/Prism/tree/main?tab=readme-ov-file#preparation). You can use PrismCaptioners following [usage](https://github.com/SparksJoe/Prism/blob/main/README.md#usage) or demo below.
31
+
32
+ ```python
33
+ # In the Prism repo folder
34
+ from decouple import supported_VLM
35
+
36
+ model = supported_VLM['prismcaptioner-7b']()
37
+ res = model.generate('assets/case1.png', 'Given the image below, please provide a detailed description of what you see.')
38
+ ```
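
When used as a perception module, the captioner's output is meant to be handed to an external LLM that does the actual reasoning, which is the decoupled flow Prism implements. Below is a minimal sketch of that two-stage idea under stated assumptions: `query_llm` is a hypothetical stand-in for whatever external LLM client you use, not part of the Prism API; the real pipeline is described in the repo's usage guide.

```python
# A minimal sketch of the Prism-style perception -> reasoning split.
# NOTE: `query_llm` is a hypothetical placeholder for an external LLM client,
# not a function provided by the Prism repo.
from decouple import supported_VLM

def query_llm(prompt: str) -> str:
    # Placeholder: call your external LLM of choice here.
    raise NotImplementedError

captioner = supported_VLM['prismcaptioner-7b']()

question = 'How many people are in the image?'

# Stage 1 (perception): turn the image into a detailed textual description.
caption = captioner.generate('assets/case1.png',
                             'Please provide a detailed description of the image.')

# Stage 2 (reasoning): let the external LLM answer from the caption alone.
answer = query_llm(f'Description: {caption}\nQuestion: {question}\nAnswer:')
print(answer)
```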