base_model:
- meta-llama/Meta-Llama-3.1-405B-Instruct
CPU optimized quantizations of Meta-Llama-3.1-405B-Instruct
This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-405B-Instruct model. These quantizations are designed to run efficiently on CPU hardware while maintaining good performance.
Available Quantizations
- Q4_0_48 (CPU Optimized): ~264 GB
- BF16: ~855 GB
- Q8_0: ~435 GB
- More coming...
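Given the sizes listed above, it may be worth confirming free disk space before starting a download. The following is a minimal sketch, not part of the original instructions: `has_free_space` is a hypothetical helper name, and it assumes the listed sizes are decimal gigabytes (1 GB = 10^9 bytes).

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least NEED_GB free.
# Assumption: sizes are decimal gigabytes (1 GB = 10^9 bytes).
has_free_space() {
  local need_gb="$1" dir="${2:-.}"
  local avail_kb need_kb
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  # Convert the required size from GB to 1024-byte blocks.
  need_kb=$(( need_gb * 1000000000 / 1024 ))
  [ "$avail_kb" -ge "$need_kb" ]
}

# Example: check for the ~264 GB Q4_0_48 set in the current directory.
if has_free_space 264 .; then
  echo "enough space"
else
  echo "not enough space"
fi
```

Swap `264` for `435` or `855` to check for the Q8_0 or BF16 sets instead.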
Use aria2 for parallelized downloads; the links below will download about 9x faster.
On Linux
sudo apt install -y aria2
On Mac
brew install aria2
Feel free to paste these in all at once or one at a time.
Q4_0_48 (CPU Optimized) Version
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00001-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00001-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00002-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00002-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00003-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00003-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00004-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00004-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00005-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00005-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00006-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00006-of-00006.gguf
BF16 Version
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00001-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00001-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00002-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00002-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00003-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00003-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00004-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00004-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00005-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00005-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00006-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00006-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00007-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00007-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00008-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00008-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00009-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00009-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00010-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00010-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00011-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00011-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00012-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00012-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00013-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00013-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00014-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00014-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00015-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00015-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00016-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00016-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00017-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00017-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00018-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00018-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00019-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00019-of-00019.gguf
Q8_0 Version
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00001-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00001-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00002-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00002-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00003-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00003-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00004-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00004-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00005-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00005-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00006-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00006-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00007-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00007-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00008-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00008-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00009-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00009-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00010-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00010-of-00010.gguf
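The per-shard commands above all follow one naming pattern, so they can also be generated in a loop. The sketch below is one way to do that, reusing the same `aria2c` flags; `shard_urls` and `download_shards` are hypothetical helper names introduced here, and the base URL and file prefixes are copied from the commands above.

```shell
# Base URL taken from the download commands above.
BASE_URL="https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main"

# Hypothetical helper: print one URL per shard for a given file prefix and shard count.
shard_urls() {
  local prefix="$1" total="$2" i
  for i in $(seq 1 "$total"); do
    # Shard names follow the pattern PREFIX-00001-of-00006.gguf
    printf '%s/%s-%05d-of-%05d.gguf\n' "$BASE_URL" "$prefix" "$i" "$total"
  done
}

# Hypothetical wrapper: download every shard with the same aria2c flags used above.
download_shards() {
  local url
  shard_urls "$1" "$2" | while read -r url; do
    # ${url##*/} strips everything up to the last slash, leaving the file name.
    aria2c -x 16 -s 16 -k 1M -o "${url##*/}" "$url"
  done
}

# Usage (not run here; each call starts a download of several hundred GB):
#   download_shards meta-405b-inst-cpu-optimized-q4048 6
#   download_shards meta-llama-405b-inst-bf16 19
#   download_shards meta-llama-405b-inst-q8_0 10
```

Running `shard_urls` on its own prints the URLs without downloading anything, which is a cheap way to check the list first.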
Usage
After downloading, you can use these models with libraries like llama.cpp. Here's a basic example:
./llama-cli -t 32 --temp 0.4 -fa -m ~/meow/meta-405b-inst-cpu-optimized-q4048-00001-of-00006.gguf -b 512 -c 9000 -p "Adopt the persona of a NASA JPL mathematician and friendly programmer that doesn't talk much and answers questions fast and on a first principles basis." -cnv -co -i
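The example passes only the first shard to `-m`, which presumably relies on the remaining shards sitting in the same directory. Under that assumption, a quick check that every shard is present can catch an incomplete download before launch. This is a sketch only; `check_shards` is a hypothetical helper, not part of llama.cpp.

```shell
# Hypothetical helper: report any missing shard of a split GGUF in DIR.
# Returns 0 only if all TOTAL shards exist.
check_shards() {
  local dir="$1" prefix="$2" total="$3" i file missing=0
  for i in $(seq 1 "$total"); do
    file=$(printf '%s/%s-%05d-of-%05d.gguf' "$dir" "$prefix" "$i" "$total")
    if [ ! -f "$file" ]; then
      # Report on stderr so the function's stdout stays clean.
      echo "missing: $file" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, assuming the shards were saved to ~/meow as in the example above:
#   check_shards ~/meow meta-405b-inst-cpu-optimized-q4048 6 && echo "all shards present"
```

The check only tests that each file exists; it does not verify file sizes or checksums.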
Model Information
These quantizations are based on the Meta-Llama-3.1-405B-Instruct model, the instruction-tuned version of the 405B-parameter Llama 3.1 model, designed for assistant-like chat and various natural language generation tasks.
Key features:
- 405 billion parameters
- Supports 8 languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- 128k context length
- Uses Grouped-Query Attention (GQA) for improved inference scalability
For more detailed information about the base model, please refer to the original model card.
License
The use of this model is subject to the Llama 3.1 Community License. Please ensure you comply with the license terms when using this model.
Acknowledgements
Special thanks to the Meta AI team for creating and releasing the Llama 3.1 model series.