nisten/mixtral8x22-imatrix-gguf


nisten committed on Apr 11, 2024

Commit 15c8764 · verified · 1 Parent(s): b9c9d0b

Update README.md

Files changed (1)

README.md +3 -3


@@ -9,11 +9,11 @@ the imatrix.dat file was calcuated over 1000 chunks with wikitext.train.raw( inc
 Wrote a bit of custom c++ to avoid quantizing certain layers, tested fully compatible with llama.cpp as of 10April2024.
-To put the 8bit file back together do
 ```
-cat ~mix4ns.gguf.part* > ~mix4ns.gguf && rm -rf mix4ns.gguf.part*
-cat ~mix8ns.gguf.part* > ~mix8ns.gguf && rm -rf mix4ns.gguf.part*
 ```
 careful this can take 5 minutes or up to 10-15 on slow instances, check progress with ls -la

 Wrote a bit of custom c++ to avoid quantizing certain layers, tested fully compatible with llama.cpp as of 10April2024.
+To put it all into a single file (this is not needed with llama.cpp, as it will autodetect the chunks, but it can help when troubleshooting ollama):
 ```
+cat mix4ns-0000* > mix4ns.gguf
 ```
 careful this can take 5 minutes or up to 10-15 on slow instances, check progress with ls -la
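
A slightly more defensive version of the join step might look like the sketch below. It assumes the chunks are plain byte-wise pieces that concatenate back into the original file, which is what the `cat` command above implies. `join_parts` is a hypothetical helper written for illustration, not something shipped with llama.cpp or this repo. It refuses to run when no chunk matches and compares the joined size against the sum of the chunk sizes, so nothing needs to be deleted until the result looks right.

```shell
# Sketch only: join_parts is a hypothetical helper, not part of llama.cpp.
# Usage: join_parts <chunk-prefix> <output-file>
# Note: uses plain (global) shell variables so it stays POSIX sh compatible.
join_parts() {
  prefix=$1
  out=$2

  # Expand the glob into the function's positional parameters.
  # The shell sorts matches, so zero-padded chunk numbers stay in order.
  set -- "$prefix"*

  # If nothing matched, "$1" is still the literal pattern and does not exist.
  [ -e "$1" ] || { echo "no chunks matching ${prefix}*" >&2; return 1; }

  # Concatenate all chunks into the output file.
  cat "$@" > "$out" || return 1

  # Sum the chunk sizes and compare with the size of the joined file.
  expected=0
  for p in "$@"; do
    expected=$(( expected + $(wc -c < "$p") ))
  done
  actual=$(( $(wc -c < "$out") ))

  [ "$expected" -eq "$actual" ] || { echo "size mismatch: $actual != $expected" >&2; return 1; }
  echo "joined $# chunks into $out ($actual bytes)"
}
```

With the file names from the README this would be called as `join_parts mix4ns-0000 mix4ns.gguf`. While it runs, `ls -la mix4ns.gguf` in a second terminal shows the output file growing, as noted above.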