Confusion about size on disk

#1
by laiviet - opened

What is the reason the params are stored as float16 and int32 in the .safetensors file?
Why is it not stored in another format such as int4 to shrink the size on disk by a factor of 4-8?

@laiviet It's stored in 4-bit? The original model file is slightly over 13 GB (https://huggingface.co/huggyllama/llama-7b/tree/main).

This quantized version is less than 4 GB.

@YaTharThShaRma999
here is the data I loaded from the .safetensors file:
Total params: 1,128,828,928
Total bytes: 3,889,307,648
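For context, here is roughly how I pulled those totals out of the file. This is a minimal sketch; "model.safetensors" is just a placeholder for whatever the shard is actually named:

```python
# Minimal sketch: sum parameter counts and on-disk bytes from a safetensors file.
# "model.safetensors" is a placeholder name, not the actual shard name.
from safetensors import safe_open

total_params = 0
total_bytes = 0
with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        total_params += t.numel()
        total_bytes += t.numel() * t.element_size()

print(f"Total params: {total_params:,}")
print(f"Total bytes: {total_bytes:,}")
```

Note that this counts each packed I32 element as one "param", which is why the total is much smaller than 7B.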

A (4096, 4096) linear layer weight is decomposed into the tensors below, which are significantly smaller (see the rough size check after the listing).
They are all stored as either I32 (int32) or F16 (fp16):

model.layers.0.self_attn.v_proj.qweight [4096, 512] I32
model.layers.0.self_attn.v_proj.qzeros [32, 512] I32
model.layers.0.self_attn.v_proj.scales [32, 4096] F16
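As a sanity check on the sizes, here is the byte math for this one layer, using the shapes and dtypes above. The group size of 128 is just what the 32 x 4096 scales shape implies, so treat it as an assumption:

```python
# Back-of-the-envelope size check for one (4096, 4096) linear weight,
# using the shapes/dtypes listed above. Group size 128 is implied by
# 32 groups covering 4096 input features (assumption).
fp16_weight = 4096 * 4096 * 2        # original fp16 weight: 32 MiB
qweight = 4096 * 512 * 4             # int4 weights packed into int32: 8 MiB
qzeros = 32 * 512 * 4                # packed zero points: 64 KiB
scales = 32 * 4096 * 2               # fp16 scales: 256 KiB

quantized = qweight + qzeros + scales
print(fp16_weight / quantized)       # ~3.85x smaller than fp16 for this layer
```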

My hypothesis is that the safetensors format doesn't have an int4 dtype, so they save the weights as I32, and each I32 can store 8 I4 params.
So actually:
model.layers.0.self_attn.v_proj.qweight [4096, 512] in I32 is [4096, 4096] in I4
model.layers.0.self_attn.v_proj.qzeros [32, 512] in I32 is [32, 4096] in I4
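If that hypothesis is right, unpacking is just bit-shifting. Something like this sketch, where the nibble order (low bits first) is my assumption and may not match the packing order the quantizer actually used:

```python
# Sketch: unpack 8 4-bit values out of each int32 along the last dimension.
# Assumes low-nibble-first order, which may differ from the real packing.
import torch

def unpack_int4(qweight: torch.Tensor) -> torch.Tensor:
    shifts = torch.arange(0, 32, 4)                    # 0, 4, 8, ..., 28
    # (rows, cols, 1) >> (8,) -> (rows, cols, 8); keep only the low 4 bits
    nibbles = (qweight.unsqueeze(-1) >> shifts) & 0xF
    return nibbles.reshape(qweight.shape[0], -1)

packed = torch.randint(-2**31, 2**31 - 1, (4096, 512), dtype=torch.int32)
print(unpack_int4(packed).shape)                       # torch.Size([4096, 4096])
```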
