Graphed by gpt4

#4, opened by ipechman

Not sure it's right, I haven't double-checked.
[image.png: graph of the benchmark data]

A bit more colorful, if it's helpful:
[image.png: a more colorful version of the graph]

Ah, thank you Ipechman. What a sweet sight, thank you very much!
Your graph would only be wrong if my data were wrong, or not put in the right order.
Except for one thing: I see a mismatch on Wintergoddess 32k TQA, which is at 39.65728274 and not 20. The rest seems coherent.
The IQ3_XXS quant is absolutely amazing: beyond Q3_K_XS & Q3_K_S, it rivals Q3_K_M as well (even if the token divergence is likely higher on IQ3_XXS than on Q3_K_M, as Artefact2 illustrated on his graph).

[Untitled2.png]

Hey guys, can you please share how much VRAM the smallest models require?

@ceoofcapybaras

Well, the size of the model file + a couple of gigabytes for the 4k context.

So, even for IQ2_XXS, you'll need 24GB for a full offload, or 16GB for a quasi-full offload (70+ layers) with the lowvram option.
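As a back-of-the-envelope sketch of that rule of thumb (the ~18.5 GB file size for a 70B IQ2_XXS and the ~2 GB context/overhead figure are rough assumptions, not exact numbers):

```python
# Rough VRAM estimate for a full offload: the whole GGUF file
# goes to VRAM, plus a couple of gigabytes for the 4k KV cache
# and runtime overhead.
def vram_needed_gb(model_file_gb: float, context_overhead_gb: float = 2.0) -> float:
    return model_file_gb + context_overhead_gb

# A 70B IQ2_XXS file is roughly 18.5 GB:
print(vram_needed_gb(18.5))  # ~20.5 GB, hence a 24GB card for comfort
```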

BUT, by offloading something like 45 layers instead of 81 on your GPU with LlamaCPP or KoboldCPP, you should be able to run IQ2_XXS on a 3060 with 12GB VRAM at a low but nevertheless sustainable speed, getting an answer in a few minutes. The lowvram option is also useful. Check the documentation of your inference tool for the exact command line for your needs, and run tests with GPU-Z to monitor your VRAM usage precisely.
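To estimate how many layers fit in a given VRAM budget, here is a minimal sketch. It assumes the weights are spread evenly across the layers (only approximately true in practice), and the 18.5 GB / 81-layer figures are rough values for a 70B IQ2_XXS:

```python
# Estimate how many layers can be offloaded to the GPU
# (the value you would pass to llama.cpp's -ngl or KoboldCPP's --gpulayers).
def layers_that_fit(model_file_gb: float, total_layers: int,
                    vram_gb: float, overhead_gb: float = 2.0) -> int:
    per_layer_gb = model_file_gb / total_layers  # naive even split
    usable_gb = max(vram_gb - overhead_gb, 0.0)  # keep room for KV cache etc.
    return min(total_layers, int(usable_gb / per_layer_gb))

# 70B IQ2_XXS (~18.5 GB, 81 layers) on a 12GB RTX 3060:
print(layers_that_fit(18.5, 81, 12.0))  # ~43 layers, in line with the ~45 above
```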


Source?

" as Artefact2 illustrated on his graph".

https://huggingface.co/Artefact2 is the guy; he published those graphs in the discussions of his models.
