Discrepancy between kv_proj in .safetensors and .pt?

#191
by kolinko - opened

Hi,
I seem to get different weights for kv_proj when opening the .pt file vs .safetensors file.

import torch
from safetensors import safe_open

f = safe_open("model-00001-of-00019.safetensors", framework="pt")
weights = torch.load("consolidated.00.pt")

#### row 0 --- same

print(f.get_tensor("model.layers.0.self_attn.k_proj.weight")[0])
>>> tensor([ 0.0003,  0.0061, -0.0005,  ..., -0.0029, -0.0003, -0.0003],
       dtype=torch.bfloat16)
print(weights["layers.0.attention.wk.weight"][0])
>>> tensor([ 0.0003,  0.0061, -0.0005,  ..., -0.0029, -0.0003, -0.0003],
       dtype=torch.bfloat16)

#### row 1 --- different!

print(f.get_tensor("model.layers.0.self_attn.k_proj.weight")[1])
>>> tensor([-0.0001, -0.0060,  0.0006,  ..., -0.0012, -0.0002,  0.0001],
       dtype=torch.bfloat16)
print(weights["layers.0.attention.wk.weight"][1])
>>> tensor([-0.0004,  0.0073,  0.0002,  ...,  0.0437,  0.0005, -0.0003],
       dtype=torch.bfloat16)

From the second row onward, the k_proj weights seem to differ between the files. At first glance it only affects the k_projs.
I checked file checksums and they seem to be ok.
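One likely cause, though I haven't confirmed it against this exact checkpoint: the HF conversion script for Llama-family models permutes the rows of q_proj/k_proj to match its rotary-embedding layout, so within each head only the first row lines up element-for-element with the original .pt weights. A minimal sketch of that permutation, with a small synthetic tensor standing in for the real k_proj (shapes and head count are made up for illustration):

```python
import torch

def permute_rotary(w, n_heads):
    # HF-style row permutation applied to q_proj/k_proj during conversion:
    # within each head, regroup the rotary dimension pairs.
    dim1, dim2 = w.shape
    return (
        w.view(n_heads, dim1 // n_heads // 2, 2, dim2)
        .transpose(1, 2)
        .reshape(dim1, dim2)
    )

# Synthetic stand-in for a k_proj weight: 2 heads, head_dim 4, model dim 8.
w = torch.arange(8 * 8, dtype=torch.float32).reshape(8, 8)
w_hf = permute_rotary(w, n_heads=2)

print(torch.equal(w_hf[0], w[0]))  # True  - row 0 of each head is unchanged
print(torch.equal(w_hf[1], w[1]))  # False - later rows are moved around
print(torch.equal(w_hf[1], w[2]))  # True  - row 1 now holds old row 2
```

If that's what's going on here, the weights are equivalent after un-permuting, not actually different.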

Update - the values seem off for other weights as well. What's going on?

(here sweights holds the safetensors tensors, tweights the .pt ones)

bb = sweights["model.layers.0.block_sparse_moe.experts.0.w2.weight"]
aa = tweights["layers.0.block_sparse_moe.w2"].reshape((8, -1, 14336))[0]
(aa - bb).sum()
>>> 0  (the same)

bb = sweights["model.layers.0.block_sparse_moe.experts.1.w2.weight"]
aa = tweights["layers.0.block_sparse_moe.w2"].reshape((8, -1, 14336))[1]
(aa - bb).sum()
>>> -0.1250 (difference!)

For expert 0 it's the same.
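Rather than checking one expert at a time, a small helper can report the max absolute difference for every expert in one go. This is a sketch with synthetic tensors standing in for the real weights (names and shapes hypothetical; swap in the actual sweights/tweights tensors and n_experts=8, cols=14336):

```python
import torch

def max_abs_diff_per_expert(stacked, experts, n_experts, cols):
    """Compare a stacked (.pt-style) expert tensor against a list of
    per-expert (.safetensors-style) tensors; return max |diff| per expert."""
    per = stacked.reshape(n_experts, -1, cols)
    return [
        (per[e].float() - experts[e].float()).abs().max().item()
        for e in range(n_experts)
    ]

# Synthetic stand-in: 4 experts, each with a (2, 3) weight.
experts = [torch.randn(2, 3) for _ in range(4)]
stacked = torch.cat(experts, dim=0)       # the ".pt-style" stacked layout
experts[1] = experts[1] + 0.5             # simulate a mismatch in expert 1

diffs = max_abs_diff_per_expert(stacked, experts, n_experts=4, cols=3)
print(diffs)  # only expert 1 is nonzero
```

A full sweep like this would show whether the mismatch hits every expert past index 0 or just some of them.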

Was the model converted into safetensors properly? Or is it a different version?
My implementation breaks when using the new weights, but it works fine with the .pt file. Not sure if I'm missing something or what :/

Also, where can I find any info on changes like this? E.g. in Mistral the FFN weights used to be named w1/w2/w3 - that's how the reference implementation has them - but now it's gate/down/up. I had to dig through the HF implementation to make sure I was getting the new names right.
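In case it helps anyone else, here's the FFN name mapping I'm assuming (based on the HF Llama/Mistral implementations - worth double-checking against the conversion script for your exact model version):

```python
# Assumed mapping between reference-checkpoint FFN names and HF names.
FFN_NAME_MAP = {
    "w1": "gate_proj",  # gating branch of the SwiGLU activation
    "w2": "down_proj",  # output (down) projection
    "w3": "up_proj",    # up projection
}

def hf_ffn_name(ref_name: str) -> str:
    """Translate a reference FFN weight name to its HF equivalent."""
    return FFN_NAME_MAP[ref_name]

print(hf_ffn_name("w2"))  # down_proj
```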

(aside from all that, thanks for releasing the model!)
