Quantization Model
#1
by huangleiabcde · opened
Hi, have you tried a quantized version of your model? How does it perform compared to Llama-3-70B-Instruct q4? And what is the estimated GPU memory usage for the Llama-3-Giraffe-70B-Instruct q4 model with a 120k-token input?
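For reference, a back-of-envelope estimate can be sketched from the published Llama-3-70B architecture (80 layers, 8 KV heads via grouped-query attention, head dim 128). The sketch below assumes 4-bit weights, an fp16 KV cache, and ignores quantization overhead (scales/zero-points) and activation memory, so treat the totals as rough lower bounds, not measured numbers:

```python
# Rough VRAM estimate for a 70B model at 4-bit with a 120k-token context.
# Architecture numbers are the published Llama-3-70B config (80 layers,
# 8 KV heads via GQA, head_dim 128); totals are approximate.

GB = 1e9

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for quantized weights (ignores scale/zero-point overhead)."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(seq_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """fp16 KV cache: 2 tensors (K and V) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

weights = weight_bytes(70e9, 4)   # ~35 GB of q4 weights
kv = kv_cache_bytes(120_000)      # ~39 GB of KV cache at 120k tokens
print(f"weights ≈ {weights / GB:.0f} GB, "
      f"KV cache ≈ {kv / GB:.0f} GB, "
      f"total ≈ {(weights + kv) / GB:.0f} GB")
# → weights ≈ 35 GB, KV cache ≈ 39 GB, total ≈ 74 GB
```

So the KV cache at 120k tokens is comparable in size to the q4 weights themselves; actual usage depends on the runtime and any KV-cache quantization it applies.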