---
license: cc-by-4.0
datasets:
- FreedomIntelligence/ALLaVA-4V
pipeline_tag: image-text-to-text
library_name: prismcaptioner
---
<br>

# PrismCaptioner Model Card

**Model details**

PrismCaptioners are open-source captioners with the LLaVA architecture, fine-tuned on the GPT-4V-assisted dataset [ALLaVA](https://huggingface.co/datasets/FreedomIntelligence/ALLaVA-4V). We have released [PrismCaptioner-7B](https://huggingface.co/Yuxuan-Qiao/PrismCaptioner-7B) and [PrismCaptioner-2B](https://huggingface.co/Yuxuan-Qiao/PrismCaptioner-2B).
15
+
16
+ PrismCaptioner-7B details
17
+ - **Vision Backbone:** google/siglip-so400m-patch14-384
18
+ - **Language Backbone:** internlm/internlm2-7b
19
+ - **Dataset:** 1x ALLaVA-Caption-[LAION/VFLAN]
20
+
21
+ **Paper and codebase for more information:**
22
+ [[Paper](https://arxiv.org/abs/2406.14544)] [[Code](https://github.com/SparksJoe/Prism)]
23
+
24
+ **Intended uses**
25
+ - **Perception Module:** The model can be integrated into [Prism](https://github.com/SparksJoe/Prism) as a perception module to solve vision-language task by utilizing an external LLM.
26
+ - **Effective Captioner:** The model can produce high-quality captions for given images.
27
+
28
+ **Model Usage:**
29
+
30
+ Clone the [Prism](https://github.com/SparksJoe/Prism) repo and complete the [preparation](https://github.com/SparksJoe/Prism/tree/main?tab=readme-ov-file#preparation). You can use PrismCaptioners following [usage](https://github.com/SparksJoe/Prism/blob/main/README.md#usage) or demo below.
31
+
32
+ ```python
33
+ # In the Prism repo folder
34
+ from decouple import supported_VLM
35
+
36
+ model = supported_VLM['prismcaptioner-7b']()
37
+ res = model.generate('assets/case1.png', 'Given the image below, please provide a detailed description of what you see.')
38
+ ```
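
When used as a perception module, the captioner's output is meant to be handed to an external LLM that does the actual reasoning, which is the decoupled flow Prism implements. Below is a minimal sketch of that two-stage idea under stated assumptions: `query_llm` is a hypothetical stand-in for whatever external LLM client you use, not part of the Prism API; the real pipeline is described in the repo's usage guide.

```python
# A minimal sketch of the Prism-style perception -> reasoning split.
# NOTE: `query_llm` is a hypothetical placeholder for an external LLM client,
# not a function provided by the Prism repo.
from decouple import supported_VLM

def query_llm(prompt: str) -> str:
    # Placeholder: call your external LLM of choice here.
    raise NotImplementedError

captioner = supported_VLM['prismcaptioner-7b']()

question = 'How many people are in the image?'

# Stage 1 (perception): turn the image into a detailed textual description.
caption = captioner.generate('assets/case1.png',
                             'Please provide a detailed description of the image.')

# Stage 2 (reasoning): let the external LLM answer from the caption alone.
answer = query_llm(f'Description: {caption}\nQuestion: {question}\nAnswer:')
print(answer)
```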