4.25bpw version

#1
opened by Apel-sin

Big thanks for your work!
Can you make a 4.25bpw version? The 4.65bpw one does not fit in 48 GB VRAM :)

You're the best! Thanks!
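
For context, a rough back-of-the-envelope VRAM estimate (a sketch only: the parameter count, layer/head geometry, and the assumption that the average bpw applies to all weights are approximations for Llama-3-70B, not measured numbers):

```python
# Rough VRAM estimate for an exl2 quant of Llama-3-70B (approximate figures).
N_PARAMS = 70.6e9   # total parameters, approximate
N_LAYERS = 80       # transformer layers
N_KV_HEADS = 8      # GQA key/value heads
HEAD_DIM = 128
GB = 1e9

def weights_gb(bpw: float) -> float:
    """Weight memory at a given average bits-per-weight."""
    return N_PARAMS * bpw / 8 / GB

def kv_cache_gb(ctx_len: int, bytes_per_elem: float = 2) -> float:
    """KV cache: 2 tensors (K and V) per layer, FP16 by default."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * ctx_len * bytes_per_elem / GB

for bpw in (4.25, 4.65):
    total = weights_gb(bpw) + kv_cache_gb(32768)
    print(f"{bpw} bpw: ~{weights_gb(bpw):.1f} GB weights "
          f"+ ~{kv_cache_gb(32768):.1f} GB FP16 KV cache = ~{total:.1f} GB")
```

This lands around 48 GB for 4.25bpw and 52 GB for 4.65bpw at full 32K context with an FP16 cache, so treat it as ballpark only (the h6 in the quant name means the output head is kept at 6 bits, nudging the real average up). A quantized KV cache (ExLlamaV2 ships a Q4 cache) cuts the ~10.7 GB cache term to roughly a quarter, which at this margin can be the difference between fitting in 48 GB and an OOM.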

@Apel-sin May I ask, when you say "Smaug-Llama-3-70B-Instruct-4.65bpw-h6-exl2" doesn't fit in 48 gigs of VRAM, you mean specifically the 32k version here?

@RebornZA sorry for the late reply, I didn't see the notification :(
This is strange. I ran some experiments and the 4.65bpw version works fine with a 32K context length. It didn't work the last time I tried.

Thanks for the question!
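
For anyone reproducing this, a minimal ExLlamaV2 loading sketch. The model path is a placeholder, and splitting across 2x24 GB GPUs is an assumed setup; the pattern follows exllamav2's standard autosplit flow, with the Q4 cache option mentioned above:

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_Q4,  # 4-bit KV cache; use ExLlamaV2Cache for FP16
    ExLlamaV2Tokenizer,
)

config = ExLlamaV2Config()
config.model_dir = "/path/to/Smaug-Llama-3-70B-Instruct-4.65bpw-h6-exl2"  # placeholder
config.prepare()
config.max_seq_len = 32768  # the 32K context discussed above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # lazy: allocate as layers load
model.load_autosplit(cache)  # split layers across all visible GPUs

tokenizer = ExLlamaV2Tokenizer(config)
```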