qwp4w3hyb/Meta-Llama-3-70B-Instruct-iMat-GGUF

qwp4w3hyb committed on Apr 22

Commit aa1e9c8 (parent: 7ee0387)

Update README.md

Files changed (1): README.md +5 -0

@@ -19,6 +19,11 @@ license_link: LICENSE
 # Quant Infos
+- quants made with an importance matrix to reduce quantization loss
+- K & IQ quants in nearly all variants
+- fixed end token for instruct mode (<|eot_id|>, token id 128009)
+- files larger than 50GB were split with the gguf-split utility; download all parts and point llama.cpp to the first one (00001-of-x)
 Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [0d56246f4b9764158525d894b96606f6163c53a8](https://github.com/ggerganov/llama.cpp/commit/0d56246f4b9764158525d894b96606f6163c53a8) (master from 2024-04-18), with the tokenizer fixes from [this](https://github.com/ggerganov/llama.cpp/pull/6745) branch cherry-picked
 The imatrix dataset was taken from [here](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
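The "importance matrix" bullet is easier to follow with a toy example. Our understanding is that llama.cpp uses the importance values as per-weight multipliers in the error the quantizer minimizes, so weights that matter more for the activations are reproduced more accurately. The sketch below is a hypothetical illustration of that idea, not llama.cpp's actual quantization code. It uses symmetric rounding over one block and a simple grid search for the block scale that minimizes the importance-weighted squared error. The functions `quantize`, `weighted_error`, and `best_scale` are illustrative names.

```python
import math


def quantize(xs, scale, qmax):
    """Round each value to the nearest integer level in [-qmax, qmax], then dequantize."""
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in xs]


def weighted_error(xs, qs, importance):
    """Importance-weighted squared error: sum_i w_i * (x_i - q_i)^2."""
    return sum(w * (x - q) ** 2 for x, q, w in zip(xs, qs, importance))


def best_scale(xs, importance, bits=4, candidates=64):
    """Grid-search the block scale that minimizes the weighted error (toy version)."""
    qmax = 2 ** (bits - 1) - 1  # 7 levels on each side for 4-bit symmetric
    amax = max(abs(x) for x in xs)
    if amax == 0:
        return 1.0  # all-zero block: any scale reproduces it exactly
    base = amax / qmax  # the plain "max-abs" scale
    best, best_err = base, math.inf
    for k in range(candidates):
        s = base * (0.5 + k / candidates)  # try 0.5x up to ~1.5x of the base scale
        err = weighted_error(xs, quantize(xs, s, qmax), importance)
        if err < best_err:
            best, best_err = s, err
    return best


if __name__ == "__main__":
    xs = [0.9, -0.05, 0.12, 0.4, -0.7, 0.02]
    importance = [0.1, 5.0, 5.0, 1.0, 0.1, 5.0]  # made-up importance values
    print(best_scale(xs, [1.0] * len(xs)), best_scale(xs, importance))
```

Because the plain max-abs scale is one of the candidates, the searched scale is never worse under the weighted error. With skewed importance values, the search can trade accuracy on low-importance outliers for accuracy on the weights that matter.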
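The instruction to "download all parts and point llama.cpp to the first one" can be automated with a small helper. This sketch assumes the gguf-split naming pattern `<prefix>-00001-of-00003.gguf`, which is our understanding of the utility's output format. The helper `first_shard` and the file names in the example are hypothetical. The helper checks that every part of one split model is present and returns the name of the first part.

```python
import re
from pathlib import Path

# Assumed gguf-split naming scheme: <prefix>-<index>-of-<total>.gguf, zero-padded to 5 digits.
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.gguf$")


def first_shard(filenames):
    """Return the first shard's file name after verifying that no part is missing."""
    groups = {}
    for name in filenames:
        m = SHARD_RE.match(Path(name).name)  # ignore directories and non-shard files
        if m:
            key = (m["prefix"], int(m["total"]))
            groups.setdefault(key, set()).add(int(m["idx"]))
    if len(groups) != 1:
        raise ValueError(f"expected exactly one split model, found {len(groups)}")
    (prefix, total), parts = next(iter(groups.items()))
    missing = sorted(set(range(1, total + 1)) - parts)
    if missing:
        raise FileNotFoundError(f"missing parts: {missing}")
    return f"{prefix}-00001-of-{total:05d}.gguf"


if __name__ == "__main__":
    files = [
        "Meta-Llama-3-70B-Instruct-Q8_0-00002-of-00002.gguf",  # hypothetical names
        "Meta-Llama-3-70B-Instruct-Q8_0-00001-of-00002.gguf",
        "README.md",
    ]
    print(first_shard(files))
```

Checking completeness first avoids handing llama.cpp a first shard whose siblings were never downloaded.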
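The "fixed end token" bullet matters because instruct-mode generation is expected to stop when the model emits `<|eot_id|>` (id 128009, as listed in this commit). If a runtime watches for a different end token, generation can run past the end of the reply. The sketch below is a hypothetical decode loop that illustrates this stop condition. `generate`, `next_token`, and `EOT_ID` are illustrative names and are not llama.cpp APIs.

```python
EOT_ID = 128009  # <|eot_id|>, the id given in this commit


def generate(next_token, prompt_ids, max_new_tokens=256, stop_ids=(EOT_ID,)):
    """Call next_token(context) repeatedly until a stop id appears or the budget runs out.

    next_token is a hypothetical callable that returns the next token id for a context.
    The stop token itself is not included in the returned list.
    """
    out = []
    ctx = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token(ctx)
        if tok in stop_ids:
            break
        out.append(tok)
        ctx.append(tok)
    return out


if __name__ == "__main__":
    scripted = iter([10, 11, EOT_ID, 12])  # a fake "model" that emits a fixed sequence
    print(generate(lambda ctx: next(scripted), [1, 2]))
```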