---
license: mit
---

## How to convert

First, you need to git clone [llama.cpp](https://github.com/ggerganov/llama.cpp) and build it with `make`.
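
For example (a minimal sketch, assuming a Linux or macOS shell and a checkout that still uses the Makefile build):

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```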

Then follow the instructions below to generate the GGUF files.
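
The conversion script also needs llama.cpp's Python dependencies; installing them first avoids import errors (a sketch, assuming a checkout that ships a top-level `requirements.txt`):

```
pip install -r requirements.txt
```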

```
# convert Qwen HF models to gguf fp16 format
python convert-hf-to-gguf.py --outfile qwen7b-chat-f16.gguf --outtype f16 Qwen-7B-Chat

# quantize the model to 4-bits (using q4_0 method)
./quantize qwen7b-chat-f16.gguf qwen7b-chat-q4_0.gguf q4_0

# chat with Qwen models
./main -m qwen7b-chat-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```


## Files are split and require joining

**Note:** HF does not support uploading files larger than 50 GB, and uploading a single 41 GB file was impractical for me. Therefore, I have uploaded the Q4_0 model split into chunks of 5 GB per file.
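
For reference, chunks with this naming scheme are typically produced with the coreutils `split` tool; something like the following would generate them (shown only as a sketch, not necessarily the exact command used):

```
split -b 5G qwen72b-chat-q4_0.gguf qwen72b-chat-q4_0.gguf-split-
```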
   
To join the files, do the following:

Linux and macOS:

```
cat qwen72b-chat-q4_0.gguf-split-* >qwen72b-chat-q4_0.gguf && rm qwen72b-chat-q4_0.gguf-split-*
```

Windows:

```
copy /B qwen72b-chat-q4_0.gguf-split-aa + qwen72b-chat-q4_0.gguf-split-ab + qwen72b-chat-q4_0.gguf-split-ac + qwen72b-chat-q4_0.gguf-split-ad + qwen72b-chat-q4_0.gguf-split-ae + qwen72b-chat-q4_0.gguf-split-af + qwen72b-chat-q4_0.gguf-split-ag + qwen72b-chat-q4_0.gguf-split-ah qwen72b-chat-q4_0.gguf
```
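
After joining, you can sanity-check the result by loading it with the same `main` invocation as above (adjust the prompt file path to your llama.cpp checkout):

```
./main -m qwen72b-chat-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```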