
4.25bpw version

#1
by Apel-sin - opened

Big thanks for your work!
Could you make a 4.25bpw version? The 4.65bpw quant does not fit in 48 GB of VRAM :)

You're the best! Thanks!

@Apel-sin May I ask, when you say "Smaug-Llama-3-70B-Instruct-4.65bpw-h6-exl2" doesn't fit in 48 gigs of VRAM, you mean specifically the 32k version here?
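For context on why the 32k version is the tight case, here is a rough back-of-envelope sketch of the memory budget. It assumes Llama-3-70B's published shape (80 layers, 8 KV heads via GQA, head dim 128) and an fp16 KV cache at the full 32k context; it ignores the unquantized head/embeddings and runtime overhead, and exllamav2's quantized-cache options would shrink the cache term:

```python
def weight_gib(n_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GiB.

    bpw = average bits per weight; ignores the h6 head and other
    tensors kept at higher precision, so this slightly undercounts.
    """
    return n_params * bpw / 8 / 1024**3

def kv_cache_gib(ctx_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """fp16 K+V cache size in GiB for a Llama-3-70B-shaped model.

    2 * (K and V) * ctx_len * layers * kv_heads * head_dim * 2 bytes.
    """
    return 2 * ctx_len * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1024**3

for bpw in (4.25, 4.65):
    total = weight_gib(70e9, bpw) + kv_cache_gib(32768)
    print(f"{bpw}bpw weights + 32k fp16 KV cache ~ {total:.1f} GiB")
```

Under these assumptions the 4.65bpw weights alone come to roughly 38 GiB, and a full fp16 32k cache adds about 10 GiB more, which lands essentially at the 48 GiB ceiling before any activation or framework overhead; 4.25bpw buys back around 3.3 GiB of headroom.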
