Wrong parameter names when using torch.load()

#9
by BwShen - opened

Sometimes I have to load the model parameters with torch.load() directly, without the Hugging Face packages. However, the parameter names in the checkpoint are not the expected ones: for example, it contains h.23.mlp.dense_4h_to_h.weight where I expect transformer.h.23.mlp.dense_4h_to_h.weight, and lm_head.weight does not exist at all.
I guess this is related to #5 and #6, where the model architecture was changed, but the saved parameters still follow the BloomModel layout instead of BloomForCausalLM.

BigScience Workshop org

Thanks for flagging this 🧐 Maybe @lewtun knows what the problem is? Should we change it back to BloomForCausalLM?
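
Until that is settled, a possible workaround is to rename the keys after loading. The snippet below is only a sketch, not an official fix: the helper name add_causal_lm_prefix is made up for illustration, and it assumes the only differences are the missing transformer. prefix and an lm_head.weight that is tied to word_embeddings.weight. Please verify both assumptions against your own checkpoint.

```python
from collections import OrderedDict


def add_causal_lm_prefix(state_dict, prefix="transformer.", tie_lm_head=True):
    """Rename BloomModel-style keys to a BloomForCausalLM-style layout.

    Hypothetical helper. Assumes the checkpoint differs only by the missing
    `prefix` and by an absent `lm_head.weight` that shares the input
    embedding matrix.
    """
    remapped = OrderedDict()
    for name, tensor in state_dict.items():
        # Leave keys alone if they already use the causal-LM naming.
        if name.startswith(prefix) or name.startswith("lm_head."):
            remapped[name] = tensor
        else:
            remapped[prefix + name] = tensor

    # Assumption: the output head is tied to the word embeddings, so the
    # same tensor object can be reused instead of being copied.
    embed_key = prefix + "word_embeddings.weight"
    if tie_lm_head and "lm_head.weight" not in remapped and embed_key in remapped:
        remapped["lm_head.weight"] = remapped[embed_key]
    return remapped


# Example usage (the file name is a placeholder, not a real path):
#   import torch
#   state_dict = torch.load("pytorch_model.bin", map_location="cpu")
#   state_dict = add_causal_lm_prefix(state_dict)
```

The helper works on any mapping from names to tensors, so it can be applied to each shard separately if the checkpoint is split across several files.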
