littlebird13 committed on
Commit c62434d · verified · 1 parent: c575992

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -34,14 +34,14 @@ In the following demonstration, we assume that you are running commands under th
 ## How to use
 Cloning the repo may be inefficient, and thus you can manually download the GGUF file that you need or use `huggingface-cli` (`pip install huggingface_hub`) as shown below:
 ```shell
-huggingface-cli download Qwen/Qwen2-1.5B-Instruct-GGUF qwen2-1.5b-instruct-q5_k_m.gguf --local-dir . --local-dir-use-symlinks False
+huggingface-cli download Qwen/Qwen2-1.5B-Instruct-GGUF qwen2-1_5b-instruct-q5_k_m.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 To run Qwen2, you can use `llama-cli` (the previous `main`) or `llama-server` (the previous `server`).
 We recommend using the `llama-server` as it is simple and compatible with OpenAI API. For example:
 
 ```bash
-./llama-server -m qwen2-1.5b-instruct-q5_k_m.gguf -ngl 28 -fa
+./llama-server -m qwen2-1_5b-instruct-q5_k_m.gguf -ngl 28 -fa
 ```
 
 (Note: `-ngl 28` refers to offloading 28 layers to GPUs, and `-fa` refers to the use of flash attention.)
@@ -69,7 +69,7 @@ print(completion.choices[0].message.content)
 If you choose to use `llama-cli`, pay attention to the removal of `-cml` for the ChatML template. Instead you should use `--in-prefix` and `--in-suffix` to tackle this problem.
 
 ```bash
-./llama-cli -m qwen2-1.5b-instruct-q5_k_m.gguf \
+./llama-cli -m qwen2-1_5b-instruct-q5_k_m.gguf \
 -n 512 -co -i -if -f prompts/chat-with-qwen.txt \
 --in-prefix "<|im_start|>user\n" \
 --in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
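In the `llama-cli` command in the second hunk, `--in-prefix` is placed before each typed user message and `--in-suffix` after it, which is how the ChatML turn markers get added now that `-cml` is gone. The bash sketch below shows roughly what one wrapped turn looks like. It assumes `llama-cli` expands the `\n` escapes in those strings into real newlines; `wrap_turn` is a hypothetical helper for illustration only and is not part of llama.cpp.

```bash
#!/usr/bin/env bash
# Same strings as the --in-prefix / --in-suffix flags in the diff above.
in_prefix='<|im_start|>user\n'
in_suffix='<|im_end|>\n<|im_start|>assistant\n'

# wrap_turn: hypothetical helper (not a llama.cpp command).
# %b expands the backslash escapes in the prefix/suffix, while %s
# inserts the user message verbatim between them.
wrap_turn() {
  printf '%b%s%b' "$in_prefix" "$1" "$in_suffix"
}

wrap_turn "Give me a short introduction to large language models."
```

The result is a standard ChatML user turn followed by an open assistant turn, which is the point where the model starts generating its reply.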