Update README.md
README.md
````diff
@@ -94,8 +94,9 @@ print(tokenizer.decode(response, skip_special_tokens=True))
 ## Inference Server Hosting Example
 ```bash
 pip install vllm
-vllm serve scb10x/llama3.1-typhoon2-70b-instruct --tensor-parallel-size 2
-# using at least 2 80GB gpu for hosting 70b model
+vllm serve scb10x/llama3.1-typhoon2-70b-instruct --tensor-parallel-size 2 --gpu-memory-utilization 0.95 --max-model-len 16384
+# use at least two 80GB GPUs (e.g. A100, H100) to host the 70B model
+# to serve longer contexts, 4-8 GPUs are required
 # see more information at https://docs.vllm.ai/
 ```
 
````
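Once `vllm serve` is running, it exposes an OpenAI-compatible API (by default at `http://localhost:8000/v1`). Below is a minimal sketch of querying the hosted model with the `openai` Python client; the default host/port and the placeholder prompt are assumptions, so adjust them to your deployment:

```python
# Query the vLLM OpenAI-compatible server started above.
# Assumes the default endpoint (localhost:8000); change base_url if you
# passed --host/--port to vllm serve.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM does not require a real API key by default
)

response = client.chat.completions.create(
    model="scb10x/llama3.1-typhoon2-70b-instruct",  # must match the served checkpoint
    messages=[{"role": "user", "content": "Hello!"}],  # placeholder prompt
    max_tokens=256,
)
print(response.choices[0].message.content)
```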