How can I run 8bit models?
I want to run the 8-bit files using the llama-cpp-python library. Should I load the 2 files at the same time? Can you share sample code?
Since HuggingFace limits the maximum size of a single file to 50GB, we had to upload the weights as separate split files.
You should download both files, AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf and AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf, into the same directory.
Then concatenate the two files using the following command:
Linux:
cat AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf > AkaLlama-llama3-70b-v0.1.Q8_0.gguf && rm AkaLlama-llama3-70b-v0.1.Q8_0.0000?-of-00002.gguf
Windows:
COPY /B AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf + AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf AkaLlama-llama3-70b-v0.1.Q8_0.gguf
del AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf
Now you have a single GGUF weight file, which you can run via llama-cpp-python.
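Here is a minimal sample for loading the merged weight with llama-cpp-python (`pip install llama-cpp-python`). The model path, context size, and prompt are assumptions for illustration; the import is done lazily inside the function so the script can be inspected without the library or the 70GB weight present:

```python
# Assumed location of the merged single-file GGUF weight
MODEL_PATH = "./AkaLlama-llama3-70b-v0.1.Q8_0.gguf"

def generate(prompt: str, max_tokens: int = 64) -> str:
    from llama_cpp import Llama  # imported lazily; requires llama-cpp-python

    llm = Llama(
        model_path=MODEL_PATH,
        n_ctx=4096,        # context window; adjust to your needs
        n_gpu_layers=-1,   # offload all layers to GPU if a GPU build is installed
    )
    out = llm(prompt, max_tokens=max_tokens)
    return out["choices"][0]["text"]

if __name__ == "__main__":
    print(generate("Q: What is the capital of France? A:"))
```

Note that a Q8_0 quantization of a 70B model still needs roughly 75GB of RAM/VRAM to load, so make sure your machine has enough memory before running it.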