Invalid weights: checkpoint doesn't match the modeling code

#3
by winglian - opened

https://huggingface.co/SinclairSchneider/dbrx-base-quantization-fixed/blob/main/modeling_dbrx.py#L754-L756

The modeling code this model references splits the expert weights, but this model's weights aren't split, so loading fails with:

size mismatch for transformer.blocks.38.ffn.experts.mlp.9.v1.weight: copying a param with shape torch.Size([33030144, 1]) from checkpoint, the shape in current model is torch.Size([10752, 6144]).
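A quick element count on the two shapes in the error shows that the checkpoint tensor holds exactly half as many elements as the [10752, 6144] weight the model expects. The sketch below checks this arithmetic. The interpretation in the comments, that the checkpoint stores 4-bit values packed two per byte (bitsandbytes-style), is an assumption and is not confirmed in this thread.

```python
from math import prod

# Shapes copied verbatim from the error message.
checkpoint_shape = (33030144, 1)
model_shape = (10752, 6144)

checkpoint_elems = prod(checkpoint_shape)
model_elems = prod(model_shape)

# Assumption: a 4-bit quantized weight packs two values into each stored
# element, so a flat packed tensor has ceil(n / 2) entries.
packed_elems = (model_elems + 1) // 2

print(model_elems)                       # 66060288
print(checkpoint_elems)                  # 33030144
print(checkpoint_elems == packed_elems)  # True
```

If this assumption holds, the mismatch comes from loading a quantized checkpoint into an unquantized parameter, not from the tensor's logical size.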
       
Pruna AI org

This model is converted from the original one and changes some architecture components to enable bnb (bitsandbytes) quantization (see https://huggingface.co/databricks/dbrx-instruct/discussions/10#660921b553b869c928b0c5d0).

johnrachwanpruna changed discussion status to closed
