---
datasets:
- Lin-Chen/ShareGPT4V
pipeline_tag: image-text-to-text
library_name: xtuner
license: llama3
---

**Notice:** This version of the `llava-llama-3-8b-v1_1-hf` model has been manually modified so that it loads with the pure Transformers library. The original model had *loading issues*, which this update addresses. If you deploy this model with the Transformers library, make sure you use this modified version.
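
A minimal loading sketch with plain Transformers is shown below. It assumes this repository follows the standard llava-hf layout (`LlavaForConditionalGeneration` plus `AutoProcessor`); the `repo_id` below is a placeholder for this repository's id, and the prompt follows the Llama-3 chat format with an `<image>` placeholder, which may need adjusting to the repo's own chat template.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

repo_id = "xtuner/llava-llama-3-8b-v1_1-hf"  # placeholder: use this repository's id

# Load the vision-language model and its processor (assumes the llava-hf layout)
model = LlavaForConditionalGeneration.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(repo_id)

# Llama-3 style prompt with an <image> placeholder; adjust if the repo's chat template differs
prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n<image>\nDescribe this image.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
url = "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
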
<div align="center">
<img src="https://github.com/InternLM/lmdeploy/assets/36994684/0cf8d00f-e86b-40ba-9b54-dc8f1bc6c8d8" width="600"/>

[![Generic badge](https://img.shields.io/badge/GitHub-%20XTuner-black.svg)](https://github.com/InternLM/xtuner)

</div>

## Model

llava-llama-3-8b-v1_1-hf is a LLaVA model fine-tuned from [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and [CLIP-ViT-Large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336) on the [ShareGPT4V-PT](https://huggingface.co/datasets/Lin-Chen/ShareGPT4V) and [InternVL-SFT](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat#prepare-training-datasets) datasets, using [XTuner](https://github.com/InternLM/xtuner).

## Details

| Model                 | Visual Encoder | Projector | Resolution | Pretraining Strategy   | Fine-tuning Strategy | Pretrain Dataset      | Fine-tune Dataset    |
| :-------------------- | -------------: | --------: | ---------: | ---------------------: | -------------------: | --------------------: | -------------------: |
| LLaVA-v1.5-7B         | CLIP-L         | MLP       | 336        | Frozen LLM, Frozen ViT | Full LLM, Frozen ViT | LLaVA-PT (558K)       | LLaVA-Mix (665K)     |
| LLaVA-Llama-3-8B      | CLIP-L         | MLP       | 336        | Frozen LLM, Frozen ViT | Full LLM, LoRA ViT   | LLaVA-PT (558K)       | LLaVA-Mix (665K)     |
| LLaVA-Llama-3-8B-v1.1 | CLIP-L         | MLP       | 336        | Frozen LLM, Frozen ViT | Full LLM, LoRA ViT   | ShareGPT4V-PT (1246K) | InternVL-SFT (1268K) |

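
For intuition about the "Projector" column, the sketch below shows a LLaVA-style two-layer MLP projector that maps visual-encoder features into the language model's embedding space. The dimensions (1024 for CLIP-L, 4096 for Llama-3-8B) and the GELU design follow the common LLaVA recipe and are illustrative assumptions, not values read from this checkpoint.

```python
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Illustrative LLaVA-style projector: a 2-layer MLP with GELU."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from the ViT
        # returns visual "tokens" of shape (batch, num_patches, llm_dim)
        return self.proj(image_features)

# A 336x336 input with 14x14 patches yields 24x24 = 576 visual tokens per image
features = torch.randn(1, 576, 1024)
print(MLPProjector()(features).shape)  # torch.Size([1, 576, 4096])
```
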
## Results

<div align="center">
<img src="https://github.com/InternLM/xtuner/assets/36994684/a157638c-3500-44ed-bfab-d8d8249f91bb" alt="Image" width="500" />
</div>

| Model                 | MMBench Test (EN) | MMBench Test (CN) | CCBench Dev | MMMU Val | SEED-IMG | AI2D Test | ScienceQA Test | HallusionBench aAcc | POPE | GQA  | TextVQA | MME      | MMStar |
| :-------------------- | :---------------: | :---------------: | :---------: | :------: | :------: | :-------: | :------------: | :-----------------: | :--: | :--: | :-----: | :------: | :----: |
| LLaVA-v1.5-7B         | 66.5              | 59.0              | 27.5        | 35.3     | 60.5     | 54.8      | 70.4           | 44.9                | 85.9 | 62.0 | 58.2    | 1511/348 | 30.3   |
| LLaVA-Llama-3-8B      | 68.9              | 61.6              | 30.4        | 36.8     | 69.8     | 60.9      | 73.3           | 47.3                | 87.2 | 63.5 | 58.0    | 1506/295 | 38.2   |
| LLaVA-Llama-3-8B-v1.1 | 72.3              | 66.4              | 31.6        | 36.8     | 70.1     | 70.0      | 72.9           | 47.7                | 86.4 | 62.6 | 59.0    | 1469/349 | 45.1   |

## QuickStart

### Chat with lmdeploy

1. Installation
```shell
pip install 'lmdeploy>=0.4.0'
pip install git+https://github.com/haotian-liu/LLaVA.git
```

2. Run

```python
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

# Build a vision-language pipeline with the Llama-3 chat template
pipe = pipeline('xtuner/llava-llama-3-8b-v1_1-hf',
                chat_template_config=ChatTemplateConfig(model_name='llama3'))

# Download an example image and ask the model to describe it
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
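
The pipeline also accepts a batch of (prompt, image) pairs and an optional `GenerationConfig` for sampling settings. A short sketch; the sampling values here are arbitrary examples:

```python
from lmdeploy import pipeline, ChatTemplateConfig, GenerationConfig
from lmdeploy.vl import load_image

pipe = pipeline('xtuner/llava-llama-3-8b-v1_1-hf',
                chat_template_config=ChatTemplateConfig(model_name='llama3'))

# Batched requests: each item is a (text, image) pair
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
prompts = [('describe this image', image),
           ('what animal is shown here?', image)]

# Arbitrary example sampling settings
gen_config = GenerationConfig(max_new_tokens=256, top_p=0.8, temperature=0.7)
responses = pipe(prompts, gen_config=gen_config)
for r in responses:
    print(r.text)
```
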

More details can be found in the [inference](https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html) and [serving](https://lmdeploy.readthedocs.io/en/latest/serving/api_server_vl.html) docs.

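
For serving, the api_server described in the docs above exposes an OpenAI-compatible endpoint. Below is a minimal client sketch; it assumes a server has already been launched for this model (for example with `lmdeploy serve api_server xtuner/llava-llama-3-8b-v1_1-hf`) and is listening on port 23333, so check the serving docs for the exact command and flags.

```python
from openai import OpenAI

# Assumes an lmdeploy api_server is already running locally on port 23333
client = OpenAI(api_key='not-needed', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this image.'},
            {'type': 'image_url',
             'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}},
        ],
    }],
    temperature=0.7,
)
print(response.choices[0].message.content)
```
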
### Chat with CLI

See [here](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf/discussions/1)!

## Citation

```bibtex
@misc{2023xtuner,
    title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
    author={XTuner Contributors},
    howpublished = {\url{https://github.com/InternLM/xtuner}},
    year={2023}
}
```