kunato committed on
Commit eb068cf
1 Parent(s): 68c8cea

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -94,8 +94,9 @@ print(tokenizer.decode(response, skip_special_tokens=True))
 ## Inference Server Hosting Example
 ```bash
 pip install vllm
-vllm serve scb10x/llama3.1-typhoon2-70b-instruct --tensor-parallel-size 2
-# using at least 2 80GB gpu for hosting 70b model
+vllm serve scb10x/llama3.1-typhoon2-70b-instruct --tensor-parallel-size 2 --gpu-memory-utilization 0.95 --max-model-len 16384
+# use at least two 80GB GPUs (e.g. A100 or H100) to host the 70B model
+# to serve a longer context, 4-8 GPUs are required
 # see more information at https://docs.vllm.ai/
 ```
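Once `vllm serve` is running, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch of building a request against it, assuming vLLM's default port 8000 and the served model name from the command above (the prompt is illustrative):

```python
import json
import urllib.request

# OpenAI-compatible chat-completions payload; "model" must match
# the name passed to `vllm serve`
payload = {
    "model": "scb10x/llama3.1-typhoon2-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # vLLM's default port is 8000
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server up, send the request and read the reply:
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at `base_url="http://localhost:8000/v1"`) can be used instead of raw HTTP.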