xiaotianhan committed · Commit 743b1c4 · 1 Parent(s): 6dbdea3
Update README.md

README.md CHANGED
@@ -33,7 +33,39 @@ In particular, InfiMM integrates the latest LLM models into VLM domain the revea
 Please note that InfiMM is currently in beta stage and we are continuously working on improving it.
 
 ## News
+- 🎉 **[2024.08.15]** Our paper [InfiMM](https://aclanthology.org/2024.findings-acl.27/) was accepted to the Findings of ACL 2024.
 - 🎉 **[2024.03.02]** We release [InfiMM-HD](https://huggingface.co/Infi-MM/infimm-hd).
 - 🎉 **[2024.01.11]** We release the first set of MLLMs we trained: [InfiMM-Zephyr](https://huggingface.co/Infi-MM/infimm-zephyr), [InfiMM-LLaMA13B](https://huggingface.co/Infi-MM/infimm-llama13b) and [InfiMM-Vicuna13B](https://huggingface.co/Infi-MM/infimm-vicuna13b).
 - 🎉 **[2024.01.10]** We release a survey on the reasoning capabilities of Multimodal Large Language Models (MLLMs) [here](https://huggingface.co/papers/2401.06805).
-- 🎉 **[2023.11.18]** We release [InfiMM-Eval](https://arxiv.org/abs/2311.11567), an open-ended VQA benchmark dataset designed for MLLMs with a focus on complex reasoning tasks. The leaderboard is available on [Papers with Code](https://paperswithcode.com/sota/visual-question-answering-vqa-on-core-mm) and the [project page](https://infimm.github.io/InfiMM-Eval/).
+- 🎉 **[2023.11.18]** We release [InfiMM-Eval](https://arxiv.org/abs/2311.11567), an open-ended VQA benchmark dataset designed for MLLMs with a focus on complex reasoning tasks. The leaderboard is available on [Papers with Code](https://paperswithcode.com/sota/visual-question-answering-vqa-on-core-mm) and the [project page](https://infimm.github.io/InfiMM-Eval/).
+
+## Citation
+
+```
+@inproceedings{liu-etal-2024-infimm,
+    title = "{I}nfi{MM}: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model",
+    author = "Liu, Haogeng and
+      You, Quanzeng and
+      Wang, Yiqi and
+      Han, Xiaotian and
+      Zhai, Bohan and
+      Liu, Yongfei and
+      Chen, Wentao and
+      Jian, Yiren and
+      Tao, Yunzhe and
+      Yuan, Jianbo and
+      He, Ran and
+      Yang, Hongxia",
+    editor = "Ku, Lun-Wei and
+      Martins, Andre and
+      Srikumar, Vivek",
+    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
+    month = aug,
+    year = "2024",
+    address = "Bangkok, Thailand and virtual meeting",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2024.findings-acl.27",
+    pages = "485--492",
+    abstract = "In this work, we present InfiMM, an advanced Multimodal Large Language Model that adapts to intricate vision-language tasks. InfiMM, inspired by the Flamingo architecture, distinguishes itself through the utilization of large-scale training data, comprehensive training strategies, and diverse large language models. This approach ensures the preservation of Flamingo{'}s foundational strengths while simultaneously introducing augmented capabilities. Empirical evaluations across a variety of benchmarks underscore InfiMM{'}s remarkable capability in multimodal understanding. The code can be found at: https://anonymous.4open.science/r/infimm-zephyr-F60C/.",
+}
+
+```
|