msr2000 committed
Commit 896a24c
1 Parent(s): bf3609a

Update README.md

Files changed (1):
  1. README.md +29 -2
README.md CHANGED
@@ -180,7 +180,7 @@ We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.c
  ### Inference with Huggingface's Transformers
  You can directly employ [Huggingface's Transformers](https://github.com/huggingface/transformers) for model inference.

- ### Text Completion
+ #### Text Completion
  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
@@ -201,7 +201,7 @@ result = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print(result)
  ```

- ### Chat Completion
+ #### Chat Completion
  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
@@ -248,6 +248,33 @@ Assistant: {assistant_message_1}<|end▁of▁sentence|>User: {user_message_2
  Assistant:
  ```

+ ### Inference with vLLM (recommended)
+ To utilize [vLLM](https://github.com/vllm-project/vllm) for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.
+
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ max_model_len, tp_size = 8192, 8
+ model_name = "deepseek-ai/DeepSeek-V2-Chat"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
+ sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])
+
+ messages_list = [
+     [{"role": "user", "content": "Who are you?"}],
+     [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference."}],
+     [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
+ ]
+
+ prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]
+
+ outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
+
+ generated_text = [output.outputs[0].text for output in outputs]
+ print(generated_text)
+ ```
+
  ## 8. License
  This code repository is licensed under [the MIT License](LICENSE-CODE). The use of DeepSeek-V2 Base/Chat models is subject to [the Model License](LICENSE-MODEL). DeepSeek-V2 series (including Base and Chat) supports commercial use.
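
A note on the vLLM snippet added above: it pre-tokenizes each conversation with `tokenizer.apply_chat_template(..., add_generation_prompt=True)` and passes the resulting token ids to `llm.generate` via `prompt_token_ids`, so vLLM never applies the chat template itself. For plain text completion the same engine can instead be called with raw string prompts. Below is a minimal sketch of that variant, not part of this commit: the Base checkpoint name and the example prompt are illustrative assumptions, and it presumes a vLLM build with the linked PR merged.

```python
# Minimal sketch, not part of this commit: plain text completion with the same
# vLLM engine. Assumes a vLLM build with the PR linked above merged; the
# checkpoint name and prompt below are illustrative, not taken from the README.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V2",  # assumed Base checkpoint for raw completion
    tensor_parallel_size=8,
    max_model_len=8192,
    trust_remote_code=True,
    enforce_eager=True,
)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256)

# Raw string prompts bypass the chat template entirely.
outputs = llm.generate(
    prompts=["An attention function can be described as"],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```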