Custom quantizations of the base Meta-Llama-3.1-405B
On Linux
sudo apt install -y aria2
On Mac
brew install aria2
Feel free to paste these in all at once or one at a time.
For faster downloads, copy and paste each one separately.
Then copy and paste the commands below into your terminal; this is the fastest way to download on either Mac or Linux.
q3q8 custom quant optimized for M2 Ultra 192GB
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00001-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00001-of-00004.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00002-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00002-of-00004.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00003-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00003-of-00004.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00004-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00004-of-00004.gguf
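If you would rather not paste four near-identical lines, the same downloads can be sketched as a loop. The URL and file-name pattern are taken from the commands above; the names `part_name`, `run`, `download_parts` and the `DRY_RUN` switch are made up for this example, and the `-c` (resume) flag is an addition that the original commands do not use.

```shell
# Sketch: download all four q3q8 parts in a loop.
# base_url and the file-name pattern come from the commands above;
# part_name, run, download_parts and DRY_RUN are illustrative names.
base_url="https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main"

# Build a part name such as meta-405b-base-q3q8-00001-of-00004.gguf
part_name() {
  printf 'meta-405b-base-q3q8-%05d-of-%05d.gguf' "$1" "$2"
}

# Echo the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}

# Fetch parts 1..4 one after another with the same aria2c options.
download_parts() {
  total=4
  i=1
  while [ "$i" -le "$total" ]; do
    name="$(part_name "$i" "$total")"
    # -c (not in the original commands) resumes a partially downloaded file
    run aria2c -c -x 16 -s 16 -k 1M -o "$name" "$base_url/$name"
    i=$((i + 1))
  done
}
```

After pasting the block, run `DRY_RUN=1 download_parts` to preview the four commands, then `download_parts` to start the downloads. The loop fetches the parts sequentially, so it trades some speed for convenience compared with running each command in its own terminal.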
Perplexity benchmarks (WORK IN PROGRESS, THIS IS JUST A DUMP)
llama 405b - instruct - old (pre-update) BF16
perplexity: 2197.87 seconds per pass - ETA 1 hours 49.88 min
[1]2.1037,[2]2.4201,[3]2.0992,[4]1.8446,[5]1.6823,[6]1.5948,[7]1.5575,[8]1.5121,[9]1.4750,[10]1.4570,[11]1.4567,[12]1.4666,
Final estimate: PPL = 1.4666 +/- 0.03184
Hermes 405b-Q8_0
perplexity: 716.47 seconds per pass - ETA 35.82 min
[1]1.5152,[2]1.8253,[3]1.6906,[4]1.5438,[5]1.4252,[6]1.3592,[7]1.3464,[8]1.3212,[9]1.2882,[10]1.2663,[11]1.2626,[12]1.2698,
Final estimate: PPL = 1.2698 +/- 0.02620
Hermes 405b-BF16
perplexity: 592.52 seconds per pass - ETA 1 hours 58.50 min
[1]1.5147,[2]1.8220,[3]1.6890,[4]1.5437,[5]1.4250,[6]1.3588,[7]1.3458,[8]1.3216,[9]1.2887,[10]1.2667,[11]1.2630,[12]1.2693,
Final estimate: PPL = 1.2693 +/- 0.02605
meta-405b-base-q8
perplexity: 167.37 seconds per pass - ETA 33.47 minutes
[1]1.3927,[2]1.6952,[3]1.5905,[4]1.4674,[5]1.3652,[6]1.3054,[7]1.2885,[8]1.2673,[9]1.2397,[10]1.2179,[11]1.2149,[12]1.2162,
Final estimate: PPL = 1.2162 +/- 0.02128
meta-base-q3q8
perplexity: 92.20 seconds per pass - ETA 4.60 minutes
[1]1.6445,[2]2.0909,[3]1.8369,[4]1.6788,[5]1.5438,[6]1.4754,[7]1.4604,[8]1.4321,[9]1.3941,[10]1.3698,[11]1.3691,[12]1.3845,
Final estimate: PPL = 1.3845 +/- 0.02785
meta-base-2bit
perplexity: 35.04 seconds per pass - ETA 7.00 minutes
[1]2.9667,[2]3.5432,[3]3.0714,[4]2.9515,[5]2.8404,[6]2.8713,[7]2.9628,[8]2.9945,[9]3.0155,[10]2.9973,[11]3.0522,[12]3.1619,
Final estimate: PPL = 3.1619 +/- 0.10580
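Since the numbers above are a raw dump, a small helper can pull out just the final estimates for side-by-side reading. This is a rough sketch that assumes the layout shown here (a label line, a timing line, a per-chunk line, then a `Final estimate` line); the function name `summarize_ppl` and the file name in the usage note are hypothetical.

```shell
# Sketch: print "label<TAB>PPL<TAB>error" for every run in a dump read from stdin.
# Assumes each run starts with a label line and ends with a "Final estimate" line.
summarize_ppl() {
  awk '
    /^Final estimate: PPL = / { printf "%s\t%s\t%s\n", label, $5, $7; next }
    /^perplexity:/ || /^\[1\]/ { next }   # skip timing and per-chunk lines
    NF { label = $0 }                      # any other non-empty line is a run label
  '
}
```

For example, with the dump saved as `perplexity_dump.txt`, run `summarize_ppl < perplexity_dump.txt`. The values are only comparable between runs that used the same evaluation text and settings, which this dump does not state.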