qwp4w3hyb committed
Commit ba75806
1 Parent(s): cb276aa

Update README.md

Files changed (1)
  1. README.md +6 -3
README.md CHANGED
@@ -33,10 +33,13 @@ python3 ./path-to-llama.cpp/gguf-py/scripts/gguf-set-metadata.py $file tokenizer
 
 
 Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [0d56246f4b9764158525d894b96606f6163c53a8](https://github.com/ggerganov/llama.cpp/commit/0d56246f4b9764158525d894b96606f6163c53a8) (master from 2024-04-18)
-with tokenizer fixes from [this](https://github.com/ggerganov/llama.cpp/pull/6745) branch cherry-picked
-Imatrix dataset was used from [here](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
+
+I cherry-picked tokenizer fixes from [this](https://github.com/ggerganov/llama.cpp/pull/6745) branch to get it to work.
 
-Using this command to generate the importance matrix from the f16.gguf
+The quants use an importance matrix to improve quantization loss.
+
+Using this command to generate the importance matrix from the f16.gguf with [this](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
+dataset.
 
 ```
 ./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat