Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
license: llama3
---

-# **Typhoon-Vision
+# **Typhoon-Vision Preview**

**llama-3-typhoon-v1.5-8b-vision-preview** is a 🇹🇭 Thai *vision-language* model. It supports both text and image input modalities natively while the output is text. This version (August 2024) is our first vision-language model as a part of our multimodal effort, and it is a research *preview* version. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).

@@ -111,16 +111,23 @@ output_ids = model.generate(
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
```

+# Evaluation Results
+| Model | MMBench (Dev) | POPE | GQA | GQA (Thai) |
+|:--|:--|:--|:--|:--|
+| Typhoon-Vision 8B Preview | 70.9 | 84.8 | 62.0 | 43.6 |
+| SeaLMMM 7B v0.1 | 64.8 | 86.3 | 61.4 | 25.3 |
+| Bunny Llama3 8B Vision | 76.0 | 86.9 | 64.8 | 24.0 |
+| GPT-4o Mini | 69.8 | 45.4 | 42.6 | 18.1 |
+
# Intended Uses & Limitations
This model is experimental and might not be fully evaluated for all use cases. Developers should assess risks in the context of their specific applications.

-# Follow
-
-# Support
-Discord: https://discord.gg/CqyBscMFpg
+# Follow Us & Support
+https://twitter.com/opentyphoon
+https://discord.gg/CqyBscMFpg

# Acknowledgements
+We would like to thank the Bunny team for open-sourcing their code and data, and the Google team for releasing the fine-tuned SigLIP model whose encoder we adopted. Thanks also to the many other open-source projects for sharing knowledge, data, code, and model weights.
+
+## Typhoon Team
+Parinthapat Pengpun, Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Pathomporn Chokchainant, Kasima Tharnpipitchai, Kunat Pipatanakul
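The card's description says the model takes text plus an image as input and returns text. As a rough orientation only, here is a minimal sketch of that flow, assuming the Bunny-style `trust_remote_code` loading pattern that the README's full usage snippet (the unchanged lines around 10–112, partially visible in the hunk context above) appears to follow. The `<image>` placeholder convention, the `-200` image-token sentinel, and `model.process_images` are assumptions borrowed from that pattern, not confirmed parts of this model's API; the README's own snippet is authoritative.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "scb10x/llama-3-typhoon-v1.5-8b-vision-preview"

# Load model and tokenizer; trust_remote_code pulls in the checkpoint's own
# vision-language wrapper (assumption: Bunny-style remote code).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Build a prompt containing an image placeholder (assumed convention),
# then splice an image-token sentinel between the tokenized text chunks.
prompt = "บรรยายภาพนี้ <image>"  # "Describe this image"
chunks = [tokenizer(c).input_ids for c in prompt.split("<image>")]
IMAGE_TOKEN_ID = -200  # assumed sentinel id marking where the image goes
input_ids = (
    torch.tensor(chunks[0] + [IMAGE_TOKEN_ID] + chunks[1], dtype=torch.long)
    .unsqueeze(0)
    .to(model.device)
)

# Preprocess the image with the model's own helper (assumed name).
image = Image.open("example.jpg")
image_tensor = model.process_images([image], model.config).to(
    dtype=model.dtype, device=model.device
)

# Generate, then decode only the newly produced tokens, mirroring the
# decode line visible in the diff context above.
output_ids = model.generate(
    input_ids, images=image_tensor, max_new_tokens=256, use_cache=True
)[0]
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
```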