JustinLin610 committed on
Commit
ea53161
1 Parent(s): 8629f2a

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -61,9 +61,11 @@ To run Qwen2, you can use `llama-cli` (the previous `main`) or `llama-server` (t
 We recommend using the `llama-server` as it is simple and compatible with OpenAI API. For example:
 
 ```bash
-./llama-server -m qwen2-72b-instruct-q4_0.gguf
+./llama-server -m qwen2-72b-instruct-q4_0.gguf -ngl 80 -fa
 ```
 
+(Note: `-ngl 80` refers to offloading 80 layers to GPUs, and `-fa` refers to the use of flash attention.)
+
 Then it is easy to access the deployed service with OpenAI API:
 
 ```python
@@ -91,7 +93,7 @@ If you choose to use `llama-cli`, pay attention to the removal of `-cml` for the
 -n 512 -co -i -if -f prompts/chat-with-qwen.txt \
 --in-prefix "<|im_start|>user\n" \
 --in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
--ngl 28 -fa
+-ngl 80 -fa
 ```
 
 ## Evaluation
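
The `python` block referenced in the first hunk is cut off at the hunk boundary, so the client example itself is not part of this diff. Below is a minimal sketch of such a client, assuming `llama-server` is listening on its default address `127.0.0.1:8080` and the `openai` Python package is installed; the model name and API key are placeholders, not values taken from the README:

```python
# Minimal sketch: call the llama-server deployment through its
# OpenAI-compatible endpoint. Assumes the default address 127.0.0.1:8080;
# llama-server does not validate the API key by default, so any string works.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="qwen2-72b-instruct",  # placeholder; the server serves the single loaded GGUF
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
```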