---
license: apache-2.0
pipeline_tag: text-generation
---

<p align="center">
    <img src="logo.png" width="400"/>
</p>

<p align="center">
    <b><font size="6">InternLM-XComposer</font></b>
</p>

<div align="center">

[💻Github Repo](https://github.com/InternLM/InternLM-XComposer)

</div>

**InternLM-XComposer** is a vision-language large model (VLLM) based on [InternLM](https://github.com/InternLM/InternLM/tree/main) for advanced text-image comprehension and composition. InternLM-XComposer has several appealing properties:

- **Interleaved Text-Image Composition**: InternLM-XComposer can effortlessly generate coherent and contextual articles that seamlessly integrate images, providing a more engaging and immersive reading experience. The interleaved text-image composition proceeds in the following steps:

  1. **Text Generation**: It crafts long-form text based on human-provided instructions.
  2. **Image Spotting and Captioning**: It pinpoints optimal locations for image placement and furnishes image descriptions.
  3. **Image Retrieval and Selection**: It selects image candidates and identifies the image that optimally complements the content.

- **Comprehension with Rich Multilingual Knowledge**: The text-image comprehension is empowered by training on extensive multi-modal multilingual concepts with carefully crafted strategies, resulting in a deep understanding of visual content.
- **Strong Performance**: It consistently achieves state-of-the-art results across various benchmarks for vision-language large models, including [MME Benchmark](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) (English), [MMBench](https://opencompass.org.cn/leaderboard-multimodal) (English), [Seed-Bench](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard) (English), [CCBench](https://opencompass.org.cn/leaderboard-multimodal) (Chinese), and [MMBench-CN](https://opencompass.org.cn/leaderboard-multimodal) (Chinese).
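
The third step, picking the candidate image that best complements the surrounding text, is commonly framed as nearest-neighbor search in a shared embedding space. The model card does not describe InternLM-XComposer's actual implementation, so the following is only a toy illustration of the selection idea, using placeholder embedding vectors and cosine similarity:

```python
import numpy as np

def select_best_image(text_emb: np.ndarray, image_embs: np.ndarray) -> int:
    """Return the index of the candidate image whose embedding has the
    highest cosine similarity with the text embedding.

    Toy stand-in for the retrieval/selection step; the real model uses
    its own learned vision-language embeddings, not these placeholders.
    """
    text_emb = text_emb / np.linalg.norm(text_emb)
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = image_embs @ text_emb  # cosine similarity per candidate
    return int(np.argmax(sims))

# Placeholder embeddings: 3 candidate images, 4-dimensional vectors.
text = np.array([1.0, 0.0, 1.0, 0.0])
candidates = np.array([
    [0.9, 0.1, 0.8, 0.0],  # closely aligned with the text embedding
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to it
    [0.5, 0.5, 0.5, 0.5],
])
best = select_best_image(text, candidates)  # index 0 wins
```

Normalizing both sides first makes the dot product equal cosine similarity, so the ranking ignores embedding magnitude and depends only on direction.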

We release the InternLM-XComposer series in two versions:

- InternLM-XComposer-VL: The pretrained VLLM with InternLM as the initialization of the LLM, achieving strong performance on various multimodal benchmarks, e.g., MME Benchmark, MMBench, Seed-Bench, CCBench, and MMBench-CN.
- InternLM-XComposer: The finetuned VLLM for *Interleaved Text-Image Composition* and *LLM-based AI assistant*.

<br>