JustinLin610 committed
Commit 616cc3f
1 Parent(s): d448a78

Update README.md

Files changed (1): README.md (+8 -2)
README.md CHANGED
@@ -43,9 +43,11 @@ To run Qwen2, you can use `llama-cli` (the previous `main`) or `llama-server` (the previous `server`).
 We recommend using the `llama-server` as it is simple and compatible with OpenAI API. For example:
 
 ```bash
-./llama-server -m qwen2-0.5b-instruct-q5_k_m.gguf
+./llama-server -m qwen2-0.5b-instruct-q5_k_m.gguf -ngl 24 -fa
 ```
 
+(Note: `-ngl 24` refers to offloading 24 layers to GPUs, and `-fa` refers to the use of flash attention.)
+
 Then it is easy to access the deployed service with OpenAI API:
 
 ```python
@@ -69,7 +71,11 @@ print(completion.choices[0].message.content)
 If you choose to use `llama-cli`, pay attention to the removal of `-cml` for the ChatML template. Instead you should use `--in-prefix` and `--in-suffix` to tackle this problem.
 
 ```bash
-./llama-cli -m qwen2-0.5b-instruct-q5_k_m.gguf -n 512 -co -i -if -f prompts/chat-with-qwen.txt --in-prefix "<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
+./llama-cli -m qwen2-0.5b-instruct-q5_k_m.gguf \
+  -n 512 -co -i -if -f prompts/chat-with-qwen.txt \
+  --in-prefix "<|im_start|>user\n" \
+  --in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
+  -ngl 24 -fa
 ```
 
 ## Citation
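
For context on the `--in-prefix`/`--in-suffix` pair in the diff: together they reproduce Qwen2's ChatML turn framing that `-cml` used to apply. A minimal sketch (the helper name `frame_user_turn` is ours, not from the README) of what a single user turn looks like after the prefix and suffix are attached:

```python
# Sketch: the ChatML framing that --in-prefix/--in-suffix reproduce for llama-cli.
# The prefix opens a user turn; the suffix closes it and opens the assistant turn,
# so the model's next tokens are generated as the assistant reply.
IN_PREFIX = "<|im_start|>user\n"
IN_SUFFIX = "<|im_end|>\n<|im_start|>assistant\n"

def frame_user_turn(user_text: str) -> str:
    """Wrap one user message the way interactive llama-cli does with these flags."""
    return f"{IN_PREFIX}{user_text}{IN_SUFFIX}"

print(frame_user_turn("Hello!"))
```

Since the README drops `-cml`, getting this framing exactly right (including the trailing newline after `assistant`) is what keeps the instruct model on-template.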