Qwen/Qwen2-1.5B-Instruct-GGUF

JustinLin610 committed on Jun 17, 2024

Commit 2e87cc4 · verified · 1 Parent(s): e37a612

Update README.md

Files changed (1)

README.md +9 -3

@@ -41,9 +41,11 @@ To run Qwen2, you can use `llama-cli` (the previous `main`) or `llama-server` (t
 We recommend using the `llama-server` as it is simple and compatible with OpenAI API. For example:
 ```bash
-./llama-server -m qwen2-1.5b-instruct-q5_k_m.gguf
+./llama-server -m qwen2-1.5b-instruct-q5_k_m.gguf -ngl 28 -fa
 ```
+(Note: `-ngl 28` refers to offloading 28 layers to GPUs, and `-fa` refers to the use of flash attention.)
 Then it is easy to access the deployed service with OpenAI API:
 ```python
@@ -67,7 +69,11 @@ print(completion.choices[0].message.content)
 If you choose to use `llama-cli`, pay attention to the removal of `-cml` for the ChatML template. Instead you should use `--in-prefix` and `--in-suffix` to tackle this problem.
 ```bash
-./llama-cli -m qwen2-1.5b-instruct-q5_k_m.gguf -n 512 -co -i -if -f prompts/chat-with-qwen.txt --in-prefix "<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
+./llama-cli -m qwen2-1.5b-instruct-q5_k_m.gguf \
+  -n 512 -co -i -if -f prompts/chat-with-qwen.txt \
+  --in-prefix "<|im_start|>user\n" \
+  --in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
+  -ngl 28 -fa
 ```
 Evaluation
@@ -80,7 +86,7 @@ In the following we report the PPL of GGUF models of different sizes and differe
 |0.5B    | 15.11   | 15.13   | 15.14   | 15.24   | 15.40   | 15.36   | 16.28   | 15.70   | 16.74   | -       |
 |1.5B    | 10.43   | 10.43   | 10.45   | 10.50   | 10.56   | 10.61   | 10.79   | 11.08   | 13.04   | -       |
 |7B      | 7.93    | 7.94    | 7.96    | 7.97    | 7.98    | 8.02    | 8.19    | 8.20    | 10.58   | -       |
-|57B-A14B| 6.82    | 6.81    | 6.82    | 6.83    | 6.90    | 6.99    | 7.02    | 7.43    | -       | -       |
+|57B-A14B| 6.81    | 6.81    | 6.83    | 6.84    | 6.89    | 6.99    | 7.02    | 7.43    | -       | -       |
 |72B     | 5.58    | 5.58    | 5.59    | 5.59    | 5.60    | 5.61    | 5.66    | 5.68    | 5.91    | 6.75    |
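The `llama-cli` command in this diff uses an `--in-prefix`/`--in-suffix` pair in place of the removed `-cml` ChatML option. The Python sketch below shows why that works: it compares the text wrapped around one user turn with a plain ChatML rendering of the same message. The names `wrap_user_turn` and `chatml_prompt` are hypothetical. The sketch also assumes that llama-cli expands the `\n` escapes into real newlines and inserts the typed text verbatim between prefix and suffix. It is an illustration, not llama.cpp's actual implementation.

```python
from typing import Sequence, Tuple

# Prefix/suffix values from the llama-cli command, with "\n" written as a real
# newline (assumption: llama-cli expands these escapes before using them).
IN_PREFIX = "<|im_start|>user\n"
IN_SUFFIX = "<|im_end|>\n<|im_start|>assistant\n"


def wrap_user_turn(user_text: str) -> str:
    """Hypothetical helper: text for one interactive turn (prefix + input + suffix)."""
    return IN_PREFIX + user_text + IN_SUFFIX


def chatml_prompt(messages: Sequence[Tuple[str, str]]) -> str:
    """Hypothetical helper: render (role, content) pairs as ChatML,
    ending with an open assistant turn for the model to continue."""
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>\n" for role, content in messages]
    parts.append("<|im_start|>assistant\n")  # generation continues from here
    return "".join(parts)


if __name__ == "__main__":
    question = "Give me a short introduction to large language models."
    # For a single user message, prefix/suffix wrapping gives the same ChatML text.
    print(wrap_user_turn(question) == chatml_prompt([("user", question)]))  # True
```

The sketch covers only a single user turn. It does not model how llama-cli closes each assistant reply before the next prefix is inserted.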