How can I run 8bit models?
I want to run the 8-bit files using the llama-cpp-python library. Should I load the 2 files at the same time? Can you share sample code?
Since HuggingFace limits the maximum size of a single file to 50GB, we had to upload the weights as separate split files.
You should download both files, AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf and AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf, into the same directory.
Then concatenate the two files using the following command:
Linux:
cat AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf > AkaLlama-llama3-70b-v0.1.Q8_0.gguf && rm AkaLlama-llama3-70b-v0.1.Q8_0.0000?-of-00002.gguf
Windows:
COPY /B AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf + AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf AkaLlama-llama3-70b-v0.1.Q8_0.gguf
del AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf
Now you have a single GGUF weight file, which you can run via llama-cpp-python.
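Here is a minimal sample for loading the merged weight with llama-cpp-python (`pip install llama-cpp-python`). The model path, context size, and prompt are assumptions for illustration; the import is done lazily inside the function so the script can be inspected without the library or the 70GB weight present:

```python
# Assumed location of the merged single-file GGUF weight
MODEL_PATH = "./AkaLlama-llama3-70b-v0.1.Q8_0.gguf"

def generate(prompt: str, max_tokens: int = 64) -> str:
    from llama_cpp import Llama  # imported lazily; requires llama-cpp-python

    llm = Llama(
        model_path=MODEL_PATH,
        n_ctx=4096,        # context window; adjust to your needs
        n_gpu_layers=-1,   # offload all layers to GPU if a GPU build is installed
    )
    out = llm(prompt, max_tokens=max_tokens)
    return out["choices"][0]["text"]

if __name__ == "__main__":
    print(generate("Q: What is the capital of France? A:"))
```

Note that a Q8_0 quantization of a 70B model still needs roughly 75GB of RAM/VRAM to load, so make sure your machine has enough memory before running it.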