Thanks, Romain!

#1
by Nexesenex - opened

I'm downloading and can't wait to test them, starting with their perplexity.

If you're in the mood for another 70b model to quantize to 2-bit GGUF anytime soon, I'd suggest LZLV ( https://huggingface.co/lizpreciatior/lzlv_70b_fp16_hf ).
It is a classic which performs very well in both logical and creative tasks!

Edit : here are the perplexities at 512ctx :
Aurora-Nights-70B-v1.0-IQ2_XXS-2.12bpw.gguf,-,wikitext,4.9372,
Aurora-Nights-70B-v1.0-IQ2_XS-2.36bpw.gguf,-,wikitext,4.5800 -> best for 24GB VRAM users for long context with rope, I guess.
Aurora-Nights-70B-v1.0-Q2_K_S-2.70bpw.gguf,-,wikitext,4.5313
Aurora-Nights-70B-v1.0-Q2_K-2.95bpw.gguf,-,wikitext,4.2042

Thanks for the feedback!

I'll have a look at making the new 2-bit quants of lzlv-70b. It should be fairly quick now that the longest step (calculating the imatrix) can be offloaded to the GPU.
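For reference, here's a minimal sketch of what that imatrix step looks like when driven from Python; the binary path, calibration file and -ngl value are placeholders, not the exact command used here:

```python
# Minimal sketch (assumed paths/file names): computing an importance matrix
# with llama.cpp's `imatrix` tool, offloading the layers to the GPU via -ngl.
import subprocess

subprocess.run([
    "./imatrix",
    "-m", "lzlv-70b-f16.gguf",    # assumed GGUF conversion of the source model
    "-f", "calibration.txt",      # assumed calibration text
    "-o", "lzlv-70b.imatrix",     # output importance matrix
    "-ngl", "99",                 # offload as many layers as possible to the GPU
    "-c", "512",                  # assumed chunk/context size for the passes
], check=True)
```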

You're welcome. The model is quite coherent in IQ2_XS, and that's much better than the experience with the exllama v2 quants, even post 0.0.11. A few regens are enough to steer the model back on track when it deviates from the context, even well beyond 1,000 or 2,000 tokens.

Also, great news about the imatrix on GPU; I'll be able to toy with it soon enough on smaller models, then. Thanks for LZLV, I'm a bit short on hardware to quantize 70b models efficiently!

Here comes a relatively extensive perplexity test of your 4 quants, because they are the first 70b quants I've tested beyond ikawrakow's, and I wanted to get a good look at the quality of the new quantization types.

[Screenshot: evaluations.csv in LibreOffice Calc, 2024-01-15]

Great, thanks a lot!

Once again, you nailed it with your iMatrix, because it's quite tricky to get right.
I played with the LZLV-70B-v1.0-IQ2_XS quant. It's honestly rich and coherent for single-GPU use; I could push a few stories to 7.4k tokens (with rope 1 22277) without problems at single-GPU speed.
If you're in the mood to make more quants like this, here's what I'd suggest :

Benchs :
LZLV-70B-v1.0-IQ2_XS-2.36bpw.gguf,-,hellaswag,81.75
LZLV-70B-v1.0-IQ2_XS-2.36bpw.gguf,-,wikitext,4.4105,512

LZLV-70B-v1.0-IQ2_XXS-2.12bpw.gguf,-,hellaswag,83.25
LZLV-70B-v1.0-IQ2_XXS-2.12bpw.gguf,-,wikitext,4.7768,512

LZLV-70B-v1.0-Q2_K-2.95bpw.gguf,-,hellaswag,82.75
LZLV-70B-v1.0-Q2_K-2.95bpw.gguf,-,wikitext,4.1369,512

LZLV-70B-v1.0-Q2_K_S-2.70bpw.gguf,-,hellaswag,81.5
LZLV-70B-v1.0-Q2_K_S-2.70bpw.gguf,-,wikitext,4.3750,512

LZLV-70B-v1.0-Q3_K_S-3.47bpw.gguf,-,hellaswag,82.75
LZLV-70B-v1.0-Q3_K_S-3.47bpw.gguf,-,wikitext,3.7827,512

Thanks, I'll get started on these two, should be up relatively soon!

You beat me to Midnight Rose, I started it an hour ago but I'm still at the Q8_0.
Thanks, man, I'll use yours instead lol!

Here are some numbers :

Midnight-Rose-70B-v1.0-IQ2_XS_Art_Wiki.gguf,-,wikitext,5.4897,512,512,2024-01-09 01:40:00,,70b,Llama_2,4096,15:35,1/5.18,GGUF,Sophosympatheia,Artefact2,
Midnight-Rose-70B-v1.0-IQ2_XS_Art_Wiki.gguf,-,wikitext,4.7105,2048,2048,2024-01-09 01:40:00,,70b,Llama_2,4096,15:35,1/5.18,GGUF,Sophosympatheia,Artefact2,
Midnight-Rose-70B-v1.0-IQ2_XS_Art_Wiki.gguf,-,wikitext,4.5765,4096,4096,2024-01-09 01:40:00,,70b,Llama_2,4096,15:35,1/5.18,GGUF,Sophosympatheia,Artefact2,
Midnight-Rose-70B-v1.0-IQ2_XS_Art_Wiki.gguf,-,hellaswag,85.75,400,2024-01-09 01:40:00,,70b,Llama_2,4096,15:35,1/5.18,GGUF,Sophosympatheia,Artefact2,
Midnight-Rose-70B-v1.0-IQ2_XS_Art_Wiki.gguf,-,Winogrande,74.0331,,1267,2024-01-19 05:40:00,,01.3b,Llama_2,4096,,,GGUF,Sophosympatheia,Artefact2,

Edit : I know I'm asking a lot, but here's something else that caught my interest :
mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF

A guy left this model without the fp16 weights. I tested it, and it works, including at long context, because it has a rope of 8, which scales down nicely to 4 and even 2 for better perplexity and HellaSwag. I chatted a bit with it and it's alright.
Could you iMatrix it, and publish the 2-bit and Q3_K_S/Q3_K_M quants?
It'll cost about 0.05 of base perplexity and 1-2 points of HellaSwag compared to the fp16, but as Aurelian is still in its fine-tuning stage, it's for now the best 70b 32k (or even 16k or 8k) model that we have.

Here are my tests of this lost gem :
Rope 8 10000
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.2177,4096
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1324,6144
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.3923,2048
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.4945,1536
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.6700,1024
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,5.2577,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,84.5,,400
Rope 4 10000
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.5762,2048
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1235,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,87.25,,400
Rope 2 10000
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.3394 *327,2048
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.8254,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,88,,400
Rope 1 10000
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,85,,400
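For anyone wanting to reproduce runs like these, here's a rough sketch using llama.cpp's perplexity tool. I'm reading "rope 8 10000" as a linear scaling factor of 8 at base 10000 (which would be --rope-freq-scale 0.125 in llama.cpp terms), and the file names are placeholders :

```python
# Sketch (assumed file names): measuring wikitext perplexity at different
# rope scalings with llama.cpp's `perplexity` tool. "Rope 8 10000" is read
# here as a linear scaling factor of 8 at base 10000, i.e. freq scale 1/8.
import subprocess

MODEL = "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf"

for factor, ctx in ((8, 4096), (4, 2048), (2, 2048), (1, 512)):
    subprocess.run([
        "./perplexity",
        "-m", MODEL,
        "-f", "wiki.test.raw",                   # assumed wikitext test file
        "-c", str(ctx),                          # evaluation context size
        "--rope-freq-base", "10000",
        "--rope-freq-scale", str(1.0 / factor),  # llama.cpp takes the inverse of the factor
        "-ngl", "99",                            # offload layers to GPU if available
    ], check=True)
```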

https://huggingface.co/Artefact2/WinterGoddess-1.4x-70B-L2-GGUF

As for WinterGoddess-1.4x-limarpv3-70B-L2-32k, there's not much that can be done without f16 weights available.

Actually, it can be done.

  1. Requantize the Q4_K_S to Q8_0, the best base for a requant even from a smaller quant source (I tested that a while ago).
  2. Make the iMatrix of the obtained Q8_0 (I'm not sure whether rope needs to be set or not; I'd say yes, and the iMatrix would logically be calibrated on the chosen rope).
  3. Make the quants from the Q8_0 with the Q8_0 iMatrix, as sketched after this list.
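Here's a rough sketch of that pipeline scripted around llama.cpp's quantize and imatrix tools; the file names are placeholders and the flags reflect my assumption of how one would drive it :

```python
# Rough sketch (assumed file names) of the requant pipeline described above,
# built around llama.cpp's `quantize` and `imatrix` command-line tools.
import subprocess

def run(*args):
    subprocess.run(list(args), check=True)

# 1. Requantize the available Q4_K_S to Q8_0 as the working base.
run("./quantize", "--allow-requantize",
    "WinterGoddess-32k.Q4_K_S.gguf", "WinterGoddess-32k.Q8_0.gguf", "Q8_0")

# 2. Compute the importance matrix on the obtained Q8_0 (calibration text assumed).
run("./imatrix", "-m", "WinterGoddess-32k.Q8_0.gguf",
    "-f", "calibration.txt", "-o", "WinterGoddess-32k.imatrix", "-ngl", "99")

# 3. Make the low-bit quants from the Q8_0, guided by its imatrix.
for qtype in ("IQ2_XXS", "IQ2_XS", "Q3_K_S", "Q3_K_M"):
    run("./quantize", "--allow-requantize",
        "--imatrix", "WinterGoddess-32k.imatrix",
        "WinterGoddess-32k.Q8_0.gguf",
        f"WinterGoddess-32k.{qtype}.gguf", qtype)
```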

Thanks for WinterGoddess!

I don't think it's worth requantizing from Q4KS, so I won't do it. But feel free to try it yourself!

Okay, that one is for me then, and I hope I'll get it right!
I'll still come begging for quants of more appetizing models, though!



So here's what I did :

https://huggingface.co/Nexesenex/WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.GGUF

The IQ2 quants are very slow to crunch on my hardware, so I'll do them a bit later, but the results in Q2_K and Q3_K_S are extremely satisfactory!

Here's another one in fp16 which might be worth a 2/3-bit quantization, for the vanilla Llama 2 70b experience at 32k context :
https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k

I just don't have the CPU power to make the IQ2 quants in a reasonable time ! :)

Thanks man.
I tested it, and it works like a charm.
Also :
Rope 8 :
Yarn-Llama-2-70b-32k-Q3_K_S,-,wikitext,3.6948,512
Rope 2 :
Yarn-Llama-2-70b-32k-Q3_K_S,-,wikitext,3.6868,512
Your quants are really neat!
And basically, it seems useless to lower the rope with Yarn. I love it!

Thanks! Always open to more model suggestions (70B or under).
