Quantized Llama-3 koboldcpp/mmproj?

#7 · by Lewdiculous (LWDCLS Research org) · edited Apr 22

https://huggingface.co/koboldcpp/mmproj/blob/main/LLaMA3-8B_mmproj-Q4_1.gguf

@Nitral-AI @jeiku

Thoughts on this versus the unquantized ChaoticNeutrals/Llava_1.5_Llama3_mmproj?

Not a huge point in running it quantized; it just adds extra de-quantization time at inference, and it's small enough already not to take up much space or VRAM. I'd say it depends on the user's hardware.
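For anyone wanting to try it, a minimal sketch of launching koboldcpp with the quantized projector attached. This assumes koboldcpp's usual --model / --mmproj / --contextsize options; the text-model filename is a placeholder, and the mmproj is the Q4_1 file linked above:

```python
# Minimal sketch: launch koboldcpp with a quantized multimodal projector.
# Flag names assume koboldcpp's standard CLI; the text-model filename
# below is a placeholder, not a real recommendation.
import subprocess

subprocess.run(
    [
        "python", "koboldcpp.py",
        "--model", "Llama-3-8B-Instruct.Q4_K_M.gguf",  # placeholder text model
        "--mmproj", "LLaMA3-8B_mmproj-Q4_1.gguf",      # quantized vision projector
        "--contextsize", "8192",
    ],
    check=True,
)
```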

LWDCLS Research org · edited Apr 22

400 MB of VRAM can be extra context for the constrained folk, KEK.

Valid point about inference time.
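For a rough sense of what that 400 MB buys, a back-of-envelope sketch of how many extra fp16 KV-cache tokens it covers on a Llama-3-8B-class model. The shape figures (32 layers, 8 KV heads, head dim 128) are assumptions, not measured from koboldcpp:

```python
# Back-of-envelope: extra context tokens bought by ~400 MB of VRAM,
# if that memory goes to fp16 KV cache on a Llama-3-8B-class model.
# All model-shape figures below are assumptions.
n_layers, n_kv_heads, head_dim = 32, 8, 128
bytes_per_elem = 2  # fp16 cache

# K and V are both cached, hence the factor of 2.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
freed_bytes = 400 * 1024**2  # ~400 MB saved by the quantized mmproj

print(kv_bytes_per_token)              # 131072 bytes = 128 KiB per token
print(freed_bytes // kv_bytes_per_token)  # ~3200 extra tokens of context
```

On those assumed numbers, that is roughly 3,200 extra tokens of context, so the saving is real but has to be weighed against the de-quantization cost noted above.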

Lewdiculous changed discussion status to closed
