This is broken?
#1 opened by PierreColombo
Hello :)
Thanks for the model :)
Is this broken? I get the following size mismatch errors when loading the checkpoint:
size mismatch for model.layers.31.block_sparse_moe.experts.5.w3.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.31.block_sparse_moe.experts.6.w1.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.31.block_sparse_moe.experts.6.w2.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.31.block_sparse_moe.experts.6.w3.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.31.block_sparse_moe.experts.7.w1.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.31.block_sparse_moe.experts.7.w2.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.31.block_sparse_moe.experts.7.w3.weight: copying a param with shape torch.Size([29360128, 1]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
Hi Pierre,
Make sure you have the latest version of bitsandbytes, and transformers installed from source:
pip install -U bitsandbytes
pip install -U git+https://github.com/huggingface/transformers.git
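After upgrading, the checkpoint should load directly. Here is a minimal sketch, assuming the repo ships pre-quantized 4-bit bitsandbytes weights (the `torch.Size([29360128, 1])` tensors look like the packed 4-bit layout) and using a placeholder model id:

```python
# Minimal loading sketch. Assumptions: the checkpoint was saved with
# pre-quantized 4-bit bitsandbytes weights, and "your-org/your-moe-model"
# is a placeholder repo id, not the actual model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-moe-model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)

# With up-to-date transformers + bitsandbytes, the packed 4-bit parameters
# from the checkpoint should load without the size-mismatch errors above.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)
```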
:) Thanks :)) Brilliant!
I found the problem in config.json:
"intermediate_size": 14336,
it should be
"intermediate_size": 29360128
This is what the loader reads; now it will read the actual model sizes... Somewhere in the creation process the config.json went wrong...