Update README.md
---
license: llama2
---

# v-MLLM Model Card

## Model details

**Model type:**
v-MLLM is an open-source MLLM trained on the Visual-Modality Instruction (VIM) corpus; it can robustly follow both text-modality and visual-modality instructions.

**Model date:**
v-MLLM-7B was trained in January 2024.

**GitHub for more information:**
https://github.com/VIM-Bench/VIM_TOOL
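For quick experimentation, a loading sketch along the lines below may help. It assumes the checkpoint is published as a LLaVA-style model on the Hugging Face Hub; the repo id `VIM-Bench/v-MLLM-7B`, the prompt template, and the image URL are illustrative placeholders rather than the project's confirmed interface (see the GitHub repository above for the authors' actual tooling):

```python
# A minimal usage sketch, not the authors' official loading code: it assumes
# the checkpoint is released as a LLaVA-style model on the Hugging Face Hub.
# The repo id "VIM-Bench/v-MLLM-7B" and the sample image URL are hypothetical.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "VIM-Bench/v-MLLM-7B"  # hypothetical repo id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# In the VIM setting the instruction is embedded in the image itself,
# so the text prompt can stay minimal.
url = "https://example.com/vim_sample.png"  # placeholder image URL
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nFollow the instruction embedded in the image. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```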
## License
v-MLLM is licensed under the LLAMA 2 Community License,
Copyright (c) Meta Platforms, Inc. All Rights Reserved.

## Intended use
**Primary intended uses:**
The primary use of v-MLLM is research on multimodal large language models.

**Primary intended users:**
The primary intended users of the model are researchers in computer vision, natural language processing, machine learning, and artificial intelligence.

## Training dataset
- 846k VIM corpus based on the LVIS-Instruct4V corpus.

## Citation

Please kindly cite our paper if you find our resources useful:

```bibtex
@misc{lu2023vim,
      title={VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following},
      author={Yujie Lu and Xiujun Li and William Yang Wang and Yejin Choi},
      year={2023},
      eprint={2311.17647},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```