zamal committed
Commit 1b47310 • 1 Parent(s): da3bbe8

Update README.md

Files changed (1)
  1. README.md +25 -217
README.md CHANGED
@@ -1,227 +1,35 @@
- <!-- markdownlint-disable first-line-h1 -->
- <!-- markdownlint-disable html -->
- <!-- markdownlint-disable no-duplicate-header -->

- <div align="center">
-   <img src="images/logo.svg" width="60%" alt="DeepSeek LLM" />
- </div>
- <hr>
- <div align="center">

- <a href="https://www.deepseek.com/" target="_blank">
-   <img alt="Homepage" src="images/badge.svg" />
- </a>
- <a href="https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B" target="_blank">
-   <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20VL-536af5?color=536af5&logoColor=white" />
- </a>
- <a href="https://huggingface.co/deepseek-ai" target="_blank">
-   <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" />
- </a>

- </div>

- <div align="center">

- <a href="https://discord.gg/Tc7c45Zzu5" target="_blank">
-   <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" />
- </a>
- <a href="images/qr.jpeg" target="_blank">
-   <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" />
- </a>
- <a href="https://twitter.com/deepseek_ai" target="_blank">
-   <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" />
- </a>

- </div>

- <div align="center">

- <a href="LICENSE-CODE">
-   <img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53">
- </a>
- <a href="LICENSE-MODEL">
-   <img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53">
- </a>
- </div>

-
- <p align="center">
-   <a href="#3-model-downloads"><b>📥 Model Download</b></a> |
-   <a href="#4-quick-start"><b>⚡ Quick Start</b></a> |
-   <a href="#5-license"><b>📜 License</b></a> |
-   <a href="#6-citation"><b>📖 Citation</b></a> <br>
-   <a href="https://arxiv.org/abs/2403.05525"><b>📄 Paper Link</b></a> |
-   <a href="https://huggingface.co/papers/2403.05525"><b>🤗 Huggingface Paper Link</b></a> |
-   <a href="https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B"><b>👁️ Demo</b></a>
- </p>
-
-
- ## 1. Introduction
-
- Introducing DeepSeek-VL, an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. DeepSeek-VL has general multimodal understanding capabilities: it can process logical diagrams, web pages, formulas, scientific literature, and natural images, and can handle embodied-intelligence tasks in complex scenarios.
-
- [DeepSeek-VL: Towards Real-World Vision-Language Understanding](https://arxiv.org/abs/2403.05525)
-
- Haoyu Lu*, Wen Liu*, Bo Zhang**, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (*Equal Contribution, **Project Lead)
-
- ![](https://github.com/deepseek-ai/DeepSeek-VL/blob/main/images/sample.jpg)
-
- ## 2. Release
-
- <details>
- <summary>✅ <b>2024-03-14</b>: Demo for DeepSeek-VL-7B available on <a href="https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B">Hugging Face</a>.</summary>
- <br>Check out the gradio demo of DeepSeek-VL-7B at <a href="https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B">https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B</a>. Experience its capabilities firsthand!
- </details>
-
- <details>
- <summary>✅ <b>2024-03-13</b>: Added support for the DeepSeek-VL gradio demo.</summary>
- </details>
-
- <details>
- <summary>✅ <b>2024-03-11</b>: DeepSeek-VL family released, including <code>DeepSeek-VL-7B-base</code>, <code>DeepSeek-VL-7B-chat</code>, <code>DeepSeek-VL-1.3B-base</code>, and <code>DeepSeek-VL-1.3B-chat</code>.</summary>
- <br>The release includes a diverse set of models tailored for various applications within the DeepSeek-VL family. The models come in two sizes, 7B and 1.3B parameters, each offering base and chat variants to cater to different needs and integration scenarios.
- </details>
-
- ## 3. Model Downloads
-
- We release the DeepSeek-VL family, including the 1.3B-base, 1.3B-chat, 7B-base, and 7B-chat models, to the public
- to support a broader and more diverse range of research within both academic and commercial communities.
- Please note that the use of these models is subject to the terms outlined in the [License section](#5-license). Commercial usage is
- permitted under these terms.
-
- ### Huggingface
-
- | Model                 | Sequence Length | Download                                                                    |
- |-----------------------|-----------------|-----------------------------------------------------------------------------|
- | DeepSeek-VL-1.3B-base | 4096            | [🤗 Hugging Face](https://huggingface.co/deepseek-ai/deepseek-vl-1.3b-base) |
- | DeepSeek-VL-1.3B-chat | 4096            | [🤗 Hugging Face](https://huggingface.co/deepseek-ai/deepseek-vl-1.3b-chat) |
- | DeepSeek-VL-7B-base   | 4096            | [🤗 Hugging Face](https://huggingface.co/deepseek-ai/deepseek-vl-7b-base)   |
- | DeepSeek-VL-7B-chat   | 4096            | [🤗 Hugging Face](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)   |
-
-
-
- ## 4. Quick Start
-
- ### Installation
-
- In a `Python >= 3.8` environment, install the necessary dependencies by running the following command:
-
- ```shell
- pip install -e .
- ```
-
- ### Simple Inference Example
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM
-
- from deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM
- from deepseek_vl.utils.io import load_pil_images
-
-
- # specify the path to the model
- model_path = "deepseek-ai/deepseek-vl-7b-chat"
- vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
- tokenizer = vl_chat_processor.tokenizer
-
- vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
- vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
-
- ## single image conversation example
- conversation = [
-     {
-         "role": "User",
-         "content": "<image_placeholder>Describe each stage of this image.",
-         "images": ["./images/training_pipelines.jpg"],
-     },
-     {"role": "Assistant", "content": ""},
- ]
-
- ## multiple images (or in-context learning) conversation example
- # conversation = [
- #     {
- #         "role": "User",
- #         "content": "<image_placeholder>A dog wearing nothing in the foreground, "
- #                    "<image_placeholder>a dog wearing a santa hat, "
- #                    "<image_placeholder>a dog wearing a wizard outfit, and "
- #                    "<image_placeholder>what's the dog wearing?",
- #         "images": [
- #             "images/dog_a.png",
- #             "images/dog_b.png",
- #             "images/dog_c.png",
- #             "images/dog_d.png",
- #         ],
- #     },
- #     {"role": "Assistant", "content": ""}
- # ]
-
- # load images and prepare for inputs
- pil_images = load_pil_images(conversation)
- prepare_inputs = vl_chat_processor(
-     conversations=conversation,
-     images=pil_images,
-     force_batchify=True
- ).to(vl_gpt.device)
-
- # run image encoder to get the image embeddings
- inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
-
- # run the model to get the response
- outputs = vl_gpt.language_model.generate(
-     inputs_embeds=inputs_embeds,
-     attention_mask=prepare_inputs.attention_mask,
-     pad_token_id=tokenizer.eos_token_id,
-     bos_token_id=tokenizer.bos_token_id,
-     eos_token_id=tokenizer.eos_token_id,
-     max_new_tokens=512,
-     do_sample=False,
-     use_cache=True
- )
-
- answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
- print(f"{prepare_inputs['sft_format'][0]}", answer)
- ```
-
- ### CLI Chat
- ```bash
- python cli_chat.py --model_path "deepseek-ai/deepseek-vl-7b-chat"
-
- # or local path
- python cli_chat.py --model_path "local model path"
- ```
-
- ### Gradio Demo
- ```bash
- pip install -e .[gradio]
-
- python deepseek_vl/serve/app_deepseek.py
- ```
- ![](./images/gradio_demo.png)
-
- Have Fun!
-
- ## 5. License
-
- This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of DeepSeek-VL Base/Chat models is subject to [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL). DeepSeek-VL series (including Base and Chat) supports commercial use.
-
- ## 6. Citation
-
- ```
- @misc{lu2024deepseekvl,
-       title={DeepSeek-VL: Towards Real-World Vision-Language Understanding},
-       author={Haoyu Lu and Wen Liu and Bo Zhang and Bingxuan Wang and Kai Dong and Bo Liu and Jingxiang Sun and Tongzheng Ren and Zhuoshu Li and Hao Yang and Yaofeng Sun and Chengqi Deng and Hanwei Xu and Zhenda Xie and Chong Ruan},
-       year={2024},
-       eprint={2403.05525},
-       archivePrefix={arXiv},
-       primaryClass={cs.AI}
- }
- ```
-
- ## 7. Contact
-
- If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
+ # Deepseek-VL-1.3b-chat-4bit
+
+ ![Deepseek Logo](https://cdn.deepseek.com/logo.png)
+
+ ## Overview
+
+ **Deepseek-VL-1.3b-chat-4bit** is a state-of-the-art multimodal model that combines visual and linguistic processing. Its weights have been quantized to 4 bits, significantly reducing the model's size while maintaining high performance.
+
+ ### Model Details
+ - **Model Type**: Multimodal Causal Language Model
+ - **Base Model Size**: 1.3 billion parameters
+ - **Quantized Size**: Approximately **1.72 GB**, down from several gigabytes at full precision
+ - **Files Included** (a download sketch follows the list):
+ - `config.json`: Model configuration file.
+ - `model.safetensors`: The quantized model weights.
+ - `preprocessor_config.json`: Configuration for the preprocessor.
+ - `processor_config.json`: Configuration for the processor.
+ - `special_tokens_map.json`: Mapping for special tokens used in the tokenizer.
+ - `tokenizer.json`: Tokenizer configuration.
+ - `tokenizer_config.json`: Additional tokenizer settings.
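+
+ As a quick illustration of how these files are fetched, here is a minimal sketch using `huggingface_hub`. The repo id `zamal/Deepseek-VL-1.3b-chat-4bit` is an assumption for illustration; substitute the actual model id on the Hub.
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Hypothetical repo id; replace with the actual model id on the Hub.
+ repo_id = "zamal/Deepseek-VL-1.3b-chat-4bit"
+
+ # Download each of the files listed above into the local cache.
+ for filename in [
+     "config.json",
+     "model.safetensors",
+     "preprocessor_config.json",
+     "processor_config.json",
+     "special_tokens_map.json",
+     "tokenizer.json",
+     "tokenizer_config.json",
+ ]:
+     print(hf_hub_download(repo_id=repo_id, filename=filename))
+ ```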
+
+ ## Quantization
+
+ Quantization reduces model size and improves inference speed by using lower-precision arithmetic. In this case, the model was quantized to 4 bits, meaning each weight is represented with 4 bits instead of the typical 16 or 32. This yields two main benefits (a loading sketch follows the list):
+
+ - **Size Reduction**: The model size has been reduced from several gigabytes to approximately 1.72 GB.
+ - **Performance**: The quantized model maintains a high level of accuracy and efficiency, making it suitable for deployment in environments with limited resources.
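+
+ For reference, here is a minimal sketch of a generic 4-bit loading recipe with `transformers` and `bitsandbytes`. This illustrates the technique in general; the NF4 scheme, and loading via `AutoModelForCausalLM` with `trust_remote_code` (as in the original quick-start), are assumptions, since this README does not document exactly how the checkpoint was produced.
+
+ ```python
+ # Requires: pip install bitsandbytes accelerate
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+ # Generic 4-bit (NF4) configuration; the exact scheme used for this
+ # checkpoint is assumed, not documented here.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ # Quantize the full-precision base checkpoint on the fly at load time.
+ model = AutoModelForCausalLM.from_pretrained(
+     "deepseek-ai/deepseek-vl-1.3b-chat",
+     quantization_config=bnb_config,
+     trust_remote_code=True,
+ )
+ ```
+
+ NF4 with a bfloat16 compute dtype is a common default because linear-layer weights are stored in 4 bits while matrix multiplications still run at higher precision, which is what keeps quality loss small.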
+
+ ## Installation
+
+ To use the **Deepseek-VL-1.3b-chat-4bit** model, follow these steps:
+
+ 1. **Install the Required Libraries**:
+    ```bash
+    pip install transformers huggingface-hub