Lin-Chen/ShareGPT4V-13B


chenlin committed on Dec 14, 2023

Commit 6f8da97 · 1 Parent(s): 3a82f72

init

Files changed (2)

README.md +38 -1
config.json +1 -1

README.md CHANGED

@@ -1,3 +1,40 @@
 ---
-license: apache-2.0
+inference: false
 ---
+<br>
+<br>
+# ShareGPT4V-13B Model Card
+## Model details
+**Model type:**
+ShareGPT4V-13B is an open-source chatbot trained by fine-tuning the CLIP vision tower and LLaMA/Vicuna on GPT4-Vision-assisted [ShareGPT4V](https://huggingface.co/datasets/Lin-Chen/ShareGPT4V) data and LLaVA instruction-tuning data.
+**Model date:**
+ShareGPT4V-13B was trained in Nov 2023.
+**Paper or resources for more information:**
+[[Project](https://ShareGPT4V.github.io/)] [[Paper](https://huggingface.co/papers/2311.12793)] [[Code](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V)]
+## Usage
+You can use this model directly with the code provided in our [[repository](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V)]. Alternatively, change the architecture name from "Share4VLlamaForCausalLM" to "LLaVALlamaForCausalLM" and the model_type keyword from "share4v" to "llava" in our config file to load the model in the [[LLaVA repository](https://github.com/haotian-liu/LLaVA)].
+## License
+Llama 2 is licensed under the LLAMA 2 Community License,
+Copyright (c) Meta Platforms, Inc. All Rights Reserved.
+## Intended use
+**Primary intended uses:**
+The primary use of ShareGPT4V-13B is research on large multimodal models and chatbots.
+**Primary intended users:**
+The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
+## Training dataset
+- 1.2M high-quality image-text pairs, i.e., ShareGPT4V-PT data
+- 100K GPT4-Vision-generated image-text pairs
+- LLaVA instruction-tuning data
+## Evaluation dataset
+A collection of 11 benchmarks

config.json CHANGED

@@ -21,7 +21,7 @@
   "mm_use_im_start_end": false,
   "mm_vision_select_feature": "patch",
   "mm_vision_select_layer": -2,
-  "mm_vision_tower": "pretrained/vision_encoder/ShareGPT4V-13B_Pretrained_vit-large336-l12",
+  "mm_vision_tower": "Lin-Chen/ShareGPT4V-13B_Pretrained_vit-large336-l12",
   "model_type": "share4v",
   "num_attention_heads": 40,
   "num_hidden_layers": 40,
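
The Usage section of the model card describes two edits to `config.json` that let the LLaVA code base load this checkpoint. The following is a minimal sketch of that rewrite, not part of this commit. It assumes the architecture name is stored under the standard `architectures` list of a Hugging Face `config.json`; the helper names `to_llava_config` and `convert_config_file` are hypothetical. The two architecture strings are copied verbatim from the model card and should be checked against the class name in the LLaVA checkout you actually use.

```python
import json
from pathlib import Path

# Strings quoted from the model card's Usage section (assumption: they match
# the class names in the code bases you are working with).
SHARE4V_ARCH = "Share4VLlamaForCausalLM"
LLAVA_ARCH = "LLaVALlamaForCausalLM"


def to_llava_config(config: dict) -> dict:
    """Return a copy of a ShareGPT4V config dict rewritten for the LLaVA loader."""
    converted = dict(config)
    # Swap the architecture name, leaving any other entries untouched.
    if "architectures" in config:
        converted["architectures"] = [
            LLAVA_ARCH if name == SHARE4V_ARCH else name
            for name in config["architectures"]
        ]
    # Swap the model_type keyword only if it is the ShareGPT4V one.
    if config.get("model_type") == "share4v":
        converted["model_type"] = "llava"
    return converted


def convert_config_file(path: str) -> None:
    """Rewrite a local config.json in place (hypothetical helper)."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    config_path.write_text(json.dumps(to_llava_config(config), indent=2) + "\n")
```

All other keys, including the `mm_vision_tower` value changed in this commit, pass through unchanged, so the vision tower is still resolved from `Lin-Chen/ShareGPT4V-13B_Pretrained_vit-large336-l12`.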