Is this broken?

#1 by PierreColombo - opened

Hellooo :)
Thanks for the model :)
Is this broken?

    size mismatch for model.layers.31.block_sparse_moe.experts.5.w3.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
    size mismatch for model.layers.31.block_sparse_moe.experts.6.w1.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
    size mismatch for model.layers.31.block_sparse_moe.experts.6.w2.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
    size mismatch for model.layers.31.block_sparse_moe.experts.6.w3.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
    size mismatch for model.layers.31.block_sparse_moe.experts.7.w1.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
    size mismatch for model.layers.31.block_sparse_moe.experts.7.w2.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
    size mismatch for model.layers.31.block_sparse_moe.experts.7.w3.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
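For context, the hint in that last line refers to a call like the sketch below (the checkpoint path is a placeholder, the thread doesn't name the repo). Note that `ignore_mismatched_sizes=True` just skips the mismatched weights and re-initializes them randomly, so it would silence the error rather than fix it:

```python
# Sketch of the loader call the error message refers to.
# "path/to/checkpoint" is a placeholder, not the actual repo id.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/checkpoint",
    # Skips any weight whose shape doesn't match and re-initializes it randomly,
    # so the resulting model would be broken even though loading "succeeds".
    ignore_mismatched_sizes=True,
)
```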

Hi Pierre,

You should make sure to have the latest version of bitsandbytes installed, and transformers installed from source:

pip install -U bitsandbytes
pip install -U git+https://github.com/huggingface/transformers.git
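With those upgrades in place, a plain load should work. A minimal sketch, assuming the checkpoint stores its own 4-bit quantization config (the path is a placeholder, since the thread doesn't name the repo):

```python
# Minimal sketch, assuming up-to-date bitsandbytes + transformers and a
# checkpoint that ships its own 4-bit quantization config.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/4bit-checkpoint",  # placeholder; substitute the actual repo id
    device_map="auto",          # requires accelerate to be installed
)
```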

:) Thanks :)) Brilliant!

I found the problem in the config.json:

"intermediate_size": 14336,

it should be

"intermediate_size": 29360128

This is what the loader reads; with that change it will read the actual model sizes... somewhere in the creation process the config.json went wrong...
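For what it's worth, you can check what the loader reads straight from the config. A sketch with a placeholder path; note that 14336 × 4096 = 58720256 four-bit values, which packed two per byte flatten to exactly 29360128 entries, the checkpoint shape in the error, so the [29360128, 1] tensors are consistent with a quantized checkpoint rather than a wrong intermediate_size:

```python
# Sketch: inspect the sizes the loader reads from config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("path/to/checkpoint")  # placeholder path
print(config.hidden_size)        # 4096
print(config.intermediate_size)  # 14336

# 14336 * 4096 = 58720256 weights; stored as 4-bit values packed two per byte,
# that flattens to 58720256 // 2 = 29360128 -- exactly the checkpoint shape
# [29360128, 1] from the error above.
print(config.intermediate_size * config.hidden_size // 2)  # 29360128
```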
