Irena Gao committed on
Commit
6c5b16d
1 Parent(s): ef79553

init commit

Files changed (2)
  1. README.md +99 -0
  2. checkpoint.pt +3 -0
README.md ADDED
@@ -0,0 +1,99 @@
---
language: en
datasets:
- laion2b
---

# OpenFlamingo-4B (CLIP ViT-L/14, RedPajama-INCITE-Instruct-3B-v1)

[Blog post]() | [Code](https://github.com/mlfoundations/open_flamingo) | [Demo]()

OpenFlamingo is an open-source implementation of DeepMind's [Flamingo](https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model) models.
This 4B-parameter model uses a [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14) vision encoder and an instruction-tuned [RedPajama-3B](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-3B-v1) language model.

## Model Details
We follow the Flamingo modeling paradigm, outfitting the layers of a pretrained, frozen language model such that they cross-attend to visual features when decoding. Following Flamingo, we freeze the vision encoder and language model but train the connecting modules on web-scraped image-text sequences. Specifically, we use a mixture of [LAION-2B](https://arxiv.org/abs/2210.08402) and [Multimodal C4](https://arxiv.org/abs/2304.06939).
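As a minimal sketch of how this architecture is assembled with the `create_model_and_transforms` helper from the [open_flamingo](https://github.com/mlfoundations/open_flamingo) codebase: the frozen CLIP vision encoder and frozen RedPajama language model are wrapped together, and trainable cross-attention modules are inserted between them. The `cross_attn_every_n_layers` value below is an assumption for illustration, not the confirmed configuration of this 4B release.

```python
# Minimal sketch: build the OpenFlamingo architecture (frozen CLIP ViT-L/14 +
# frozen RedPajama-3B, with trainable cross-attention modules in between).
# cross_attn_every_n_layers is assumed for illustration; check the repository
# for the exact value used by this 4B model.
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="togethercomputer/RedPajama-INCITE-Instruct-3B-v1",
    tokenizer_path="togethercomputer/RedPajama-INCITE-Instruct-3B-v1",
    cross_attn_every_n_layers=2,  # assumed
)
```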

## Uses
OpenFlamingo models take arbitrarily interleaved sequences of images and text as input and generate text as output. This allows the models to accept in-context examples and undertake tasks like captioning, visual question answering, and image classification.
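As a rough sketch of that interface (following the usage pattern in the open_flamingo repository): images are stacked into a `vision_x` tensor of shape (batch, num_images, frames, channels, height, width), and their positions in the prompt are marked with `<image>` and `<|endofchunk|>` tokens. The image paths and few-shot prompt below are placeholders, and `model`, `image_processor`, and `tokenizer` are assumed to come from the snippet above.

```python
# Sketch of few-shot captioning with an OpenFlamingo model, assuming `model`,
# `image_processor`, and `tokenizer` were created as in the previous snippet.
# The image paths and prompt text are placeholders for illustration.
import torch
from PIL import Image

demo_one = Image.open("demo1.jpg")  # in-context example image (placeholder)
demo_two = Image.open("demo2.jpg")  # in-context example image (placeholder)
query = Image.open("query.jpg")     # image to caption (placeholder)

# Shape: (batch=1, num_images=3, frames=1, channels, height, width)
vision_x = torch.stack([image_processor(im) for im in (demo_one, demo_two, query)])
vision_x = vision_x.unsqueeze(1).unsqueeze(0)

# <image> marks where each image is attended to; <|endofchunk|> closes each example.
tokenizer.padding_side = "left"
lang_x = tokenizer(
    ["<image>An image of two cats.<|endofchunk|>"
     "<image>An image of a bathroom sink.<|endofchunk|>"
     "<image>An image of"],
    return_tensors="pt",
)

generated = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
    num_beams=3,
)
print(tokenizer.decode(generated[0]))
```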

### Bias, Risks, and Limitations
OpenFlamingo models inherit the risks of their parent models, especially the language model. As an open-source research effort, we highly value open, accessible, reproducible multimodal model research; however, it is crucial to be aware that these models are trained on web data, have not been finetuned for safety, and thus may produce unintended, inappropriate, unreliable, and/or inaccurate outputs. Please use caution before deploying OpenFlamingo models in real applications. We also hope that OpenFlamingo enables further safety and reliability research to address these issues.

In an effort to mitigate current potential biases and harms, we have deployed a text content filter on model outputs in the OpenFlamingo demo. We continue to red-team the model to understand and improve its safety.

## Evaluation
<table>
 <tr>
    <th></th>
    <th>0-shot</th>
    <th>4-shot</th>
    <th>8-shot</th>
    <th>16-shot</th>
    <th>32-shot</th>
 </tr>
 <tr>
    <th>COCO (CIDEr)</th>
    <td>81.2 (0.3)</td>
    <td>85.8 (0.5)</td>
    <td>94.8 (0.2)</td>
    <td>98.0 (0.3)</td>
    <td>99.2 (0.3)</td>
 </tr>
 <tr>
    <th>VQAv2 (Accuracy)</th>
    <td>44.5 (0.2)</td>
    <td>47.5 (0.3)</td>
    <td>45.7 (0.3)</td>
    <td>44.3 (0.6)</td>
    <td>45.8 (0.0)</td>
 </tr>
 <tr>
    <th>Flickr-30K (CIDEr)</th>
    <td>55.6 (1.3)</td>
    <td>61.2 (0.5)</td>
    <td>59.0 (1.0)</td>
    <td>54.8 (1.0)</td>
    <td>53.0 (0.5)</td>
 </tr>
 <tr>
    <th>OK-VQA (Accuracy)</th>
    <td>29.7 (0.2)</td>
    <td>34.3 (0.2)</td>
    <td>32.4 (0.2)</td>
    <td>30.7 (0.3)</td>
    <td>32.5 (0.1)</td>
 </tr>
 <tr>
    <th>TextVQA (Accuracy)</th>
    <td>21.1 (0.4)</td>
    <td>27.2 (0.3)</td>
    <td>25.1 (0.2)</td>
    <td>23.2 (0.1)</td>
    <td>23.2 (0.2)</td>
 </tr>
 <tr>
    <th>VizWiz (Accuracy)</th>
    <td>14.9 (0.1)</td>
    <td>21.0 (0.4)</td>
    <td>27.1 (1.4)</td>
    <td>-</td>
    <td>37.1 (0.3)</td>
 </tr>
 <tr>
    <th>ImageNet (Top-1 Accuracy)</th>
    <td>-</td>
    <td>-</td>
    <td>-</td>
    <td>-</td>
    <td>-</td>
 </tr>
 <tr>
    <th>Hateful Memes (ROC AUC)</th>
    <td>-</td>
    <td>-</td>
    <td>-</td>
    <td>-</td>
    <td>-</td>
 </tr>
</table>
checkpoint.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:def3dbd2e3cbf471019bc5d8ee854fc644b3eca62dd3e0fc81e76dd6e0363c06
size 15077874322
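
`checkpoint.pt` is stored with Git LFS; the pointer above records only its hash and its roughly 15 GB size. A sketch of downloading the checkpoint from the Hub and loading it into the model built in the earlier snippet follows; the `repo_id` is an assumption inferred from this model card's name, so verify it against the actual Hub repository.

```python
# Sketch: fetch the LFS-backed checkpoint and load it into the OpenFlamingo
# model constructed earlier. The repo_id is an assumption for illustration.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="openflamingo/OpenFlamingo-4B-vitl-rpj3b-langinstruct",  # assumed repo id
    filename="checkpoint.pt",
)
# strict=False mirrors the loading pattern shown in the open_flamingo README.
model.load_state_dict(torch.load(ckpt_path, map_location="cpu"), strict=False)
```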