---
license: apache-2.0
---
This is a llamafile for [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).

I'm adding both q4-k-m and q5-k-m this time since it's a big model. On my 4090, q4-k-m is about twice as fast as q5-k-m with no noticeable difference in chat or answer quality. q5-k-m is unusably slow on my desktop, so I recommend q4-k-m.

The quantized GGUF was downloaded straight from [TheBloke](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF) this time,
and then zipped into a llamafile using [Mozilla's awesome project](https://github.com/Mozilla-Ocho/llamafile).

It's over 4 GB (Windows can't execute files that large), so if you want to use it on Windows you'll have to run it from WSL.

WSL note: if you get the error about APE, and the recommended command

`sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop'`

doesn't work, the file might be named something else. I had success with

`sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop-late'`

If that fails too, just look in `/proc/sys/fs/binfmt_misc`, see what files look like `WSLInterop`, and echo a -1 to whichever one is there by changing that part of the recommended command.
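The manual search above can also be scripted. Here's a minimal sketch that loops over every `WSLInterop*` entry, whatever it happens to be named, and disables each one; it assumes the standard `binfmt_misc` mount point (override `BINFMT_DIR` to dry-run elsewhere) and needs root on a real WSL system:

```shell
# Disable every WSLInterop* handler, whatever it is named.
# Assumes the standard binfmt_misc mount; needs root on real WSL.
BINFMT_DIR=${BINFMT_DIR:-/proc/sys/fs/binfmt_misc}

for f in "$BINFMT_DIR"/WSLInterop*; do
  if [ -e "$f" ]; then
    echo -1 > "$f"    # writing -1 disables that interop handler
  fi
done
```

The `if [ -e … ]` guard skips the unexpanded glob when no matching entry exists, so the loop is a safe no-op on systems without WSL interop.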

Llamafiles are standalone executables that run an LLM server locally on a variety of operating systems.
You just run it, open the chat interface in a browser, and interact.
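Concretely, that looks something like the following. The filename here is hypothetical; substitute whichever quant you actually downloaded:

```shell
# Hypothetical filename; substitute the quant you downloaded.
FILE=mixtral-8x7b-instruct-v0.1.Q4_K_M.llamafile

if [ -f "$FILE" ]; then
  chmod +x "$FILE"   # the llamafile itself is the executable
  ./"$FILE"          # starts the local server and chat UI
else
  echo "Place $FILE in this directory first."
fi
```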
Options can be passed in to expose the API, change the port, and so on.
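As a sketch of what that can look like, assuming llamafile's llama.cpp-style server flags (`--host`, `--port`, `--nobrowser` — run the llamafile with `--help` to confirm on your version) and its OpenAI-compatible chat endpoint:

```shell
# Sketch, assuming llama.cpp-style server flags; filename is hypothetical.
BIN=./mixtral-8x7b-instruct-v0.1.Q4_K_M.llamafile

if [ -x "$BIN" ]; then
  # Bind to all interfaces so other machines can reach the API.
  "$BIN" --host 0.0.0.0 --port 8080 --nobrowser &

  # Query the OpenAI-compatible chat endpoint.
  curl http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"Hello"}]}'
else
  echo "Download the llamafile and chmod +x it first."
fi
```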