Are the weights fp16?

#6
by lucasjin - opened

Why is it so big?

Eh, fp16 or bf16 is considered sufficient training precision for LLMs, so the weights are stored in fp16.
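
As a rough illustration of why fp16 checkpoints are large, each parameter takes 2 bytes. The 7B parameter count below is just an assumed example, not the size of this particular model:

```python
# Rough size estimate for fp16 weights (2 bytes per parameter).
# num_params is an illustrative assumption, not this model's actual count.
num_params = 7_000_000_000
bytes_per_param_fp16 = 2
size_gb = num_params * bytes_per_param_fp16 / 1024**3
print(f"~{size_gb:.1f} GB")  # ~13 GB for 7B parameters in fp16
```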

But most of the time you'd rather quantize them to 4-bit, which cuts the size to about a quarter, making inference faster and using less RAM.
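
A minimal sketch of loading a model in 4-bit with transformers + bitsandbytes, assuming a placeholder model id ("your-model-id") rather than the actual repo discussed here:

```python
# Sketch: 4-bit quantized loading via transformers + bitsandbytes.
# "your-model-id" is a placeholder; replace it with the repo you want to load.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-model-id"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit at load time
    bnb_4bit_compute_dtype=torch.bfloat16,  # do compute in bf16 for speed/accuracy
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs/CPU
)
```

This keeps the original fp16 checkpoint on disk but holds the weights in 4-bit in memory during inference.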
