Quantized Llama-3 koboldcpp/mmproj?

#7 · by Lewdiculous (LWDCLS Research org) · edited Apr 22

https://huggingface.co/koboldcpp/mmproj/blob/main/LLaMA3-8B_mmproj-Q4_1.gguf

@Nitral-AI @jeiku

Thoughts on this versus the unquantized ChaoticNeutrals/Llava_1.5_Llama3_mmproj?

Not a huge point in running it quantized; it just adds extra de-quantization time at inference, and it's small enough already not to take up much space or VRAM. I'd say it depends on the user's hardware.
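For anyone wanting to try it, a minimal sketch of launching koboldcpp with the quantized projector attached. This assumes koboldcpp's usual --model / --mmproj / --contextsize options; the text-model filename is a placeholder, and the mmproj is the Q4_1 file linked above:

```python
# Minimal sketch: launch koboldcpp with a quantized multimodal projector.
# Flag names assume koboldcpp's standard CLI; the text-model filename
# below is a placeholder, not a real recommendation.
import subprocess

subprocess.run(
    [
        "python", "koboldcpp.py",
        "--model", "Llama-3-8B-Instruct.Q4_K_M.gguf",  # placeholder text model
        "--mmproj", "LLaMA3-8B_mmproj-Q4_1.gguf",      # quantized vision projector
        "--contextsize", "8192",
    ],
    check=True,
)
```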

LWDCLS Research org · edited Apr 22

400 MB of VRAM can be extra context for the constrained folk, KEK.

Valid point about inference time.
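For a rough sense of what that 400 MB buys, a back-of-envelope sketch of how many extra fp16 KV-cache tokens it covers on a Llama-3-8B-class model. The shape figures (32 layers, 8 KV heads, head dim 128) are assumptions, not measured from koboldcpp:

```python
# Back-of-envelope: extra context tokens bought by ~400 MB of VRAM,
# if that memory goes to fp16 KV cache on a Llama-3-8B-class model.
# All model-shape figures below are assumptions.
n_layers, n_kv_heads, head_dim = 32, 8, 128
bytes_per_elem = 2  # fp16 cache

# K and V are both cached, hence the factor of 2.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
freed_bytes = 400 * 1024**2  # ~400 MB saved by the quantized mmproj

print(kv_bytes_per_token)              # 131072 bytes = 128 KiB per token
print(freed_bytes // kv_bytes_per_token)  # ~3200 extra tokens of context
```

On those assumed numbers, that is roughly 3,200 extra tokens of context, so the saving is real but has to be weighed against the de-quantization cost noted above.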

Lewdiculous changed discussion status to closed
