JosephusCheung committed
Commit 6e8fcba
Parent: 9e3c9ec

Update README.md

Files changed (1): README.md (+3 -1)
README.md CHANGED
@@ -8,10 +8,12 @@ tags:
 ---
 # A Chat Model, Testing only, no performance guaranteeeee...
 
-*There is something wrong with llama.cpp GGUF format, need some time to fix that.*
+*There is something wrong with llama.cpp GGUF format, need some time to fix that. [https://github.com/ggerganov/llama.cpp/pull/4283](https://github.com/ggerganov/llama.cpp/pull/4283)*
 
 Use the transformers library that does not require remote/external code to load the model, AutoModelForCausalLM and AutoTokenizer (or manually specify LlamaForCausalLM to load LM, GPT2Tokenizer to load Tokenizer), and model quantization should be fully compatible with GGUF (llama.cpp), GPTQ, and AWQ.
 
+However GGUF Quantized model is not possible now for Qwen-72B series, see [https://github.com/ggerganov/llama.cpp/pull/4281](https://github.com/ggerganov/llama.cpp/pull/4281)
+
 *Do not use wikitext for recalibration.*
 
 Initialized from Qwen 72B
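
For reference, a minimal sketch of the no-remote-code loading path the README describes, using only standard transformers classes. The repo id below is a hypothetical placeholder (this commit does not name one):

```python
# Minimal sketch of the loading approach described in the README.
# "JosephusCheung/chat-model" is a placeholder repo id, not confirmed by this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "JosephusCheung/chat-model"  # placeholder; substitute the real repo id

# The Auto classes resolve to standard implementations here,
# so trust_remote_code=True is not required.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Equivalent manual specification mentioned in the README:
# from transformers import LlamaForCausalLM, GPT2Tokenizer
# tokenizer = GPT2Tokenizer.from_pretrained(model_id)
# model = LlamaForCausalLM.from_pretrained(model_id)
```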