Commit e998648 by Irena Gao
1 Parent(s): f4bbbb6

init commit

Files changed (2)
  1. README.md +101 -0
  2. checkpoint.pt +3 -0
README.md ADDED
@@ -0,0 +1,101 @@
+ ---
+ language: en
+ datasets:
+ - laion2b
+ ---
+
+ # OpenFlamingo-9B (CLIP ViT-L/14, MPT-7B)
+
+ [Blog post]() | [Code](https://github.com/mlfoundations/open_flamingo) | [Demo]()
+
+ OpenFlamingo is an open-source implementation of DeepMind's [Flamingo](https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model) models.
+ This 9B-parameter model uses a [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14) vision encoder and an [MPT-7B](https://huggingface.co/mosaicml/mpt-7b) language model.
+
+ ## Model Details
+ We follow the Flamingo modeling paradigm, outfitting the layers of a pretrained, frozen language model so that they cross-attend to visual features when decoding. As in Flamingo, the vision encoder and language model stay frozen; only the connecting modules are trained, on web-scraped image-text sequences. Specifically, we use a mixture of [LAION-2B](https://arxiv.org/abs/2210.08402) and [Multimodal C4](https://arxiv.org/abs/2304.06939).
+
+ ## Uses
+ OpenFlamingo models process arbitrarily interleaved sequences of images and text to output text. This allows the models to accept in-context examples and undertake tasks like captioning, visual question answering, and image classification.
+
+ ### Bias, Risks, and Limitations
+ OpenFlamingo models inherit the risks of their parent models, especially the language model. As an open-source research effort, we highly value open, accessible, reproducible multimodal model research; however, these models are trained on web data and have not been finetuned for safety, so they may produce unintended, inappropriate, unreliable, and/or inaccurate outputs. Please use caution before deploying OpenFlamingo models in real applications. We also hope that OpenFlamingo enables further safety and reliability research to address these issues.
+
+ To mitigate current potential biases and harms, we have deployed a text content filter on model outputs in the OpenFlamingo demo. We continue to red-team the model to understand and improve its safety.
+
+ ## Evaluation
+ <table>
+ <tr>
+ <th></th>
+ <th>0-shot</th>
+ <th>4-shot</th>
+ <th>8-shot</th>
+ <th>16-shot</th>
+ <th>32-shot</th>
+ </tr>
+ <tr>
+ <th>COCO (CIDEr)</th>
+ <td>79.5 (0.2)</td>
+ <td>89.0 (0.3)</td>
+ <td>96.3 (0.1)</td>
+ <td>98.8 (0.7)</td>
+ <td>99.5 (0.1)</td>
+ </tr>
+ <tr>
+ <th>VQAv2 (Accuracy)</th>
+ <td>48.3 (0.1)</td>
+ <td>49.4 (0.4)</td>
+ <td>51.8 (0.4)</td>
+ <td>51.3 (0.5)</td>
+ <td>50.2 (0.6)</td>
+ </tr>
+ <tr>
+ <th>Flickr-30K (CIDEr)</th>
+ <td>59.5 (1.0)</td>
+ <td>65.8 (0.6)</td>
+ <td>62.9 (1.0)</td>
+ <td>62.8 (1.0)</td>
+ <td>61.3 (0.7)</td>
+ </tr>
+ <tr>
+ <th>OK-VQA (Accuracy)</th>
+ <td>34.7 (0.1)</td>
+ <td>34.3 (0.1)</td>
+ <td>38.4 (0.0)</td>
+ <td>39.5 (0.1)</td>
+ <td>38.1 (0.0)</td>
+ </tr>
+ <tr>
+ <th>TextVQA (Accuracy)</th>
+ <td>24.2 (0.5)</td>
+ <td>28.2 (0.4)</td>
+ <td>29.1 (0.1)</td>
+ <td>27.3 (0.1)</td>
+ <td>23.8 (0.2)</td>
+ </tr>
+ <tr>
+ <th>VizWiz (Accuracy)</th>
+ <td>17.7 (0.7)</td>
+ <td>23.1 (0.9)</td>
+ <td>31.6 (1.5)</td>
+ <td>38.0 (1.1)</td>
+ <td>40.2 (0.7)</td>
+ </tr>
+ <tr>
+ <th>ImageNet (Top-1 Accuracy)</th>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ </tr>
+ <tr>
+ <th>Hateful Memes (ROC AUC)</th>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ </tr>
+ </table>
+
+
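
Note on the README above: the Uses section says the model takes interleaved image-text sequences with in-context examples, and the evaluation columns vary the number of such examples (0 to 32 shots). As a rough illustration only, the sketch below lays out the text side of a k-shot captioning prompt. The `<image>` and `<|endofchunk|>` markers are assumptions based on the linked open_flamingo repository and are not stated in this commit; `build_fewshot_prompt` and `count_image_slots` are hypothetical helper names, not part of any library.

```python
from typing import List

# Assumed special markers; this commit's README does not specify them.
IMAGE_TOKEN = "<image>"
END_OF_CHUNK = "<|endofchunk|>"


def build_fewshot_prompt(demo_captions: List[str], query_prefix: str = "An image of") -> str:
    """Lay out k in-context captioning examples followed by one open query.

    Each demonstration contributes one image slot plus its caption. The final
    image slot has no closing marker, so a model would continue the text.
    """
    chunks = [f"{IMAGE_TOKEN}{caption}{END_OF_CHUNK}" for caption in demo_captions]
    chunks.append(f"{IMAGE_TOKEN}{query_prefix}")
    return "".join(chunks)


def count_image_slots(prompt: str) -> int:
    """Number of images that must accompany the prompt (k demos + 1 query)."""
    return prompt.count(IMAGE_TOKEN)


if __name__ == "__main__":
    prompt = build_fewshot_prompt(
        ["An image of two cats.", "An image of a bathroom sink."]
    )
    print(prompt)
    print(count_image_slots(prompt))
```

Under this layout, a k-shot prompt always needs k + 1 images, so the 32-shot column would correspond to 33 image slots per query.
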
checkpoint.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed5a634ff8c022cf437ec245838a00b0c05bef6963524c5d0dfabe75ce701514
+ size 5539171941
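
Note on `checkpoint.pt`: the three added lines are a Git LFS pointer, not the weights themselves. The pointer records the SHA-256 digest and byte size (about 5.5 GB) of the real file. A minimal sketch of how a downloaded checkpoint could be checked against this pointer, using only the Python standard library, might look like the following. The helper names are hypothetical, and the local path in the usage comment is an assumption.

```python
import hashlib
from typing import Dict, Tuple

# Pointer contents copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ed5a634ff8c022cf437ec245838a00b0c05bef6963524c5d0dfabe75ce701514
size 5539171941
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a pointer file into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def pointer_digest_and_size(text: str) -> Tuple[str, int]:
    """Return the expected hex digest and byte size recorded in the pointer."""
    fields = parse_lfs_pointer(text)
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    return digest, int(fields["size"])


def file_matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file in chunks and compare its size and SHA-256 to the pointer."""
    expected_digest, expected_size = pointer_digest_and_size(pointer_text)
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == expected_size and hasher.hexdigest() == expected_digest


# Usage, assuming the checkpoint was saved locally as "checkpoint.pt":
#   file_matches_pointer("checkpoint.pt", POINTER)
```

Streaming in chunks keeps memory use flat, which matters for a file of this size.
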