aisensiy/Qwen-72B-Chat-GGUF


---
license: mit
---

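## How to convert

First, clone [llama.cpp](https://github.com/ggerganov/llama.cpp) and build it (e.g. with `make`).

Then follow the instructions below to generate GGUF files. The commands use Qwen-7B-Chat as the example; substitute the model directory and file names for other Qwen models. Note that newer llama.cpp releases rename `convert-hf-to-gguf.py` to `convert_hf_to_gguf.py`, `quantize` to `llama-quantize`, and `main` to `llama-cli`.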

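```sh
# convert Qwen HF models to GGUF fp16 format
# (Qwen-7B-Chat is the local directory containing the downloaded HF model)
python convert-hf-to-gguf.py --outfile qwen7b-chat-f16.gguf --outtype f16 Qwen-7B-Chat

# quantize the model to 4 bits (using the q4_0 method)
./quantize qwen7b-chat-f16.gguf qwen7b-chat-q4_0.gguf q4_0

# chat with Qwen models
# -n 512: number of tokens to predict, -i: interactive mode,
# -cml: ChatML prompt format, -f: file with the initial prompt
./main -m qwen7b-chat-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```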


## Files are split and require joining

Note: although HF accepts uploads of up to 50 GB per file, uploading the 41 GB Q4_0 file in one piece was too difficult for me, so I uploaded it split into 5 GB parts (`qwen72b-chat-q4_0.gguf-split-aa` through `qwen72b-chat-q4_0.gguf-split-ah`).
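The README does not say which tool produced the parts. Their `-split-aa`, `-split-ab`, … suffixes match the default naming of the `split` utility, so the following is a minimal sketch assuming GNU coreutils `split` (where `5G` means 5 GiB). `split_gguf` is a hypothetical helper name, not something shipped with this repository or llama.cpp.

```sh
#!/usr/bin/env bash
# Hypothetical sketch: split a large GGUF file into fixed-size parts named
# <file>-split-aa, <file>-split-ab, ... (the default two-letter suffixes of
# `split`). Assumes GNU coreutils `split`, where 5G means 5 GiB.
set -euo pipefail

split_gguf() {
  local file="$1"         # e.g. qwen72b-chat-q4_0.gguf
  local size="${2:-5G}"   # part size; defaults to 5G, like the parts here
  split -b "$size" "$file" "${file}-split-"
}
```

Usage: `split_gguf qwen72b-chat-q4_0.gguf 5G`. Joining the parts in suffix order reverses the operation.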

To join the files, do the following:

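Linux and macOS:

```sh
# the glob expands in alphabetical order (split-aa, split-ab, ...), which is the required join order
cat qwen72b-chat-q4_0.gguf-split-* > qwen72b-chat-q4_0.gguf && rm qwen72b-chat-q4_0.gguf-split-*
```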
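The command above deletes the parts as soon as `cat` succeeds. A more cautious variant checks the result before removing anything. This is a minimal bash sketch; `join_gguf` is a hypothetical helper, not part of this repository or llama.cpp. It compares the joined size with the sum of the part sizes and checks that the output begins with the 4-byte GGUF magic `GGUF`, and only then removes the parts.

```sh
#!/usr/bin/env bash
# Hypothetical sketch: join <out>-split-* parts into <out>, then delete the
# parts only if (1) the joined size equals the sum of the part sizes and
# (2) the joined file starts with the ASCII magic "GGUF".
set -euo pipefail

join_gguf() {
  local out="$1"                   # e.g. qwen72b-chat-q4_0.gguf
  local parts=("${out}"-split-*)   # glob expands in sorted order: aa, ab, ...
  [ -e "${parts[0]}" ] || { echo "no parts found for $out" >&2; return 1; }

  cat "${parts[@]}" > "$out"

  # Sum the part sizes (wc -c < file prints the byte count).
  local expected=0 p actual
  for p in "${parts[@]}"; do
    expected=$(( expected + $(wc -c < "$p") ))
  done
  actual=$(wc -c < "$out")
  if (( actual != expected )); then
    echo "size mismatch: $actual != $expected" >&2
    return 1
  fi

  # GGUF files begin with the 4 ASCII bytes "GGUF".
  if [ "$(head -c 4 "$out")" != "GGUF" ]; then
    echo "$out does not start with the GGUF magic" >&2
    return 1
  fi

  rm -- "${parts[@]}"
}
```

Usage: `join_gguf qwen72b-chat-q4_0.gguf`. If either check fails, the parts are left in place so the join can be retried.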

Windows (Command Prompt):

```bat
rem /B copies the parts as binary files and concatenates them in the order listed
copy /B qwen72b-chat-q4_0.gguf-split-aa + qwen72b-chat-q4_0.gguf-split-ab + qwen72b-chat-q4_0.gguf-split-ac + qwen72b-chat-q4_0.gguf-split-ad + qwen72b-chat-q4_0.gguf-split-ae + qwen72b-chat-q4_0.gguf-split-af + qwen72b-chat-q4_0.gguf-split-ag + qwen72b-chat-q4_0.gguf-split-ah qwen72b-chat-q4_0.gguf
```