Missing LM Head

#7
by gsarti - opened
BigScience Workshop org

If I try to load the checkpoint for `bigscience/bloom-6b3` using PyTorch, I see only the modules of the `BloomModel`, but not the `lm_head` that should also be there to instantiate a `BloomForCausalLM`:

!git clone https://huggingface.co/bigscience/bloom-6b3

import torch

checkpoint = torch.load("bloom-6b3/pytorch_model.bin")
for param_name, param in checkpoint.items():
    print(param_name)

Output:

word_embeddings.weight
word_embeddings_layernorm.weight
word_embeddings_layernorm.bias
h.0.input_layernorm.weight
h.0.input_layernorm.bias
h.0.self_attention.query_key_value.weight
h.0.self_attention.query_key_value.bias
h.0.self_attention.dense.weight
h.0.self_attention.dense.bias
h.0.post_attention_layernorm.weight
h.0.post_attention_layernorm.bias
h.0.mlp.dense_h_to_4h.weight
h.0.mlp.dense_h_to_4h.bias
h.0.mlp.dense_4h_to_h.weight
h.0.mlp.dense_4h_to_h.bias
... # Layers h.1 through h.28 omitted for brevity
h.29.input_layernorm.weight
h.29.input_layernorm.bias
h.29.self_attention.query_key_value.weight
h.29.self_attention.query_key_value.bias
h.29.self_attention.dense.weight
h.29.self_attention.dense.bias
h.29.post_attention_layernorm.weight
h.29.post_attention_layernorm.bias
h.29.mlp.dense_h_to_4h.weight
h.29.mlp.dense_h_to_4h.bias
h.29.mlp.dense_4h_to_h.weight
h.29.mlp.dense_4h_to_h.bias
ln_f.weight
ln_f.bias

How is it possible to load the model with a trained causal language modeling head?

BigScience Workshop org

Hi! For BLOOM models, I think the weights of the LM head correspond to the transpose of the embedding weights. The `ForCausalLM` class takes care of that automatically ;)
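
To make the tying concrete, here is a minimal sketch (it assumes a `transformers` version with BLOOM support and enough CPU RAM to load the full model): loading through `BloomForCausalLM` reuses the input embedding matrix as the LM head, so no separate `lm_head` tensor needs to be stored in the checkpoint.

from transformers import BloomForCausalLM

model = BloomForCausalLM.from_pretrained("bigscience/bloom-6b3")

# The head weight is the same tensor as the input embedding weight (tied),
# so the logits are effectively hidden_states @ word_embeddings.weight.T
print(model.lm_head.weight is model.transformer.word_embeddings.weight)  # True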

BigScience Workshop org

Thanks for the info! Maybe the class itself does, but it makes it pretty painful to load the checkpoint with Accelerate using `load_checkpoint_and_dispatch`! I think the only alternative at the moment is to write a custom loop that maps the checkpoint module names to the ones expected by the class (see the sketch below), right?
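
For reference, a rough sketch of that key-remapping workaround (the `transformer.` prefix, the `remapped.bin` file name, and the extra `tie_weights()` call are assumptions for illustration, not a verified recipe): prefix the bare `BloomModel` keys so they match what `BloomForCausalLM` expects, save the result, and dispatch it with Accelerate.

import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, BloomForCausalLM

# Remap the bare BloomModel keys to the names BloomForCausalLM expects.
state_dict = torch.load("bloom-6b3/pytorch_model.bin", map_location="cpu")
remapped = {f"transformer.{name}": tensor for name, tensor in state_dict.items()}
torch.save(remapped, "bloom-6b3/remapped.bin")  # hypothetical file name

# Build an empty (meta-device) model and let Accelerate place the weights.
config = AutoConfig.from_pretrained("bigscience/bloom-6b3")
with init_empty_weights():
    model = BloomForCausalLM(config)

model = load_checkpoint_and_dispatch(
    model,
    "bloom-6b3/remapped.bin",
    device_map="auto",
    no_split_module_classes=["BloomBlock"],
)
model.tie_weights()  # re-tie the LM head to the loaded embedding weights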

@gsarti I'm facing the same problem, were you able to solve it using `load_checkpoint_and_dispatch`?
