Do the weights differ from the original weights?

#33
by HennyWong - opened

I tried modifying the code to load the weights with torch, then compared the weights, and they seem to differ from the original ones?

Tencent Music Entertainment Lyra Lab org

@HennyWong Hi, I'm not quite sure what you mean here. Are you saying the weight parameters differ? During development we set up parameter comparisons as well as comparisons of each transformer layer's output: for the same numeric input, every layer's output matches the torch version, which implies the parameters are aligned. Could you paste the snippet you changed?
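
For reference, the torch-side values used in that per-layer comparison can be dumped with `output_hidden_states=True`. A minimal sketch, assuming a locally downloaded checkpoint (the path and prompt are placeholders):

    import torch
    from transformers import AutoModel, AutoTokenizer

    ckpt_path = "THUDM/chatglm-6b"  # placeholder checkpoint path
    tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True).half().cuda().eval()

    inputs = tokenizer("hello", return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # out.hidden_states holds the embedding output followed by each
    # transformer layer's output: the torch-side reference values to
    # compare against the converted engine, layer by layer.
    for i, h in enumerate(out.hidden_states):
        print(i, h.float().norm().item())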

Hello, and thank you for the reply. I mainly made the following changes to the snippet; did I miss anything?

    from transformers import AutoModel
    torch_model = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True).half().cuda()
    
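    # For each weight type (layernorms, attention QKV/dense, MLP
    # projections), append that tensor for every layer, so the list is
    # grouped by weight type rather than by layer.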
    w.extend([load_to_torch(torch_model.transformer.layers[i].input_layernorm.weight, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].input_layernorm.bias, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].attention.query_key_value.weight, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].attention.query_key_value.bias, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].attention.dense.weight, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].attention.dense.bias, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].post_attention_layernorm.weight, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].post_attention_layernorm.bias, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].mlp.dense_h_to_4h.weight, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].mlp.dense_h_to_4h.bias, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].mlp.dense_4h_to_h.weight, is_load(i))
             for i in range(self.layer_num)])
    w.extend([load_to_torch(torch_model.transformer.layers[i].mlp.dense_4h_to_h.bias, is_load(i))
             for i in range(self.layer_num)])

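    # Optional top-level tensors follow: pre/post decoder layernorms and
    # the positional-encoding table, then the word embeddings.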
    if self.has_pre_decoder_layernorm:
        w.append(load_to_torch(torch_model.transformer.pre_decoder_layernorm.weight, True))
        w.append(load_to_torch(torch_model.transformer.pre_decoder_layernorm.bias, True))

    if self.has_post_decoder_layernorm:
        w.append(load_to_torch(torch_model.transformer.final_layernorm.weight, True))
        w.append(load_to_torch(torch_model.transformer.final_layernorm.bias, True))

    if self.has_positional_encoding:
        wpe = load_to_torch("model.wpe", True).reshape(-1, self.global_hidden_units)
        assert self.max_seq_len <= wpe.size(0), (
            f"max_seq_len ({self.max_seq_len} must not exceed "
            f"the value of maximum sequence length during training ({wpe.size(0)})."
        )
        w.append(wpe)
    w.append(load_to_torch(torch_model.transformer.word_embeddings.weight, True))

Finally, I compared the embedding-layer weights, and they seem to differ slightly from the version downloaded from HF. Could you take a look?

The changes mainly target lines 257-311 of model.py.
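
For the embedding check specifically, it may help to diff the in-memory tensor directly against the checkpoint file on disk. A minimal sketch, assuming a local single-file checkpoint (the file name is an assumption; sharded downloads keep the embedding in one of the numbered shards):

    import os
    import torch
    from transformers import AutoModel

    ckpt_path = "./chatglm-6b"  # placeholder: local path to the HF download
    model = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True).half().cuda()

    state_dict = torch.load(os.path.join(ckpt_path, "pytorch_model.bin"),
                            map_location="cpu")
    disk = state_dict["transformer.word_embeddings.weight"].float()
    mem = model.transformer.word_embeddings.weight.detach().cpu().float()

    print("max abs diff:", (mem - disk).abs().max().item())
    print("allclose:", torch.allclose(mem, disk))

Note that if the checkpoint stores fp32 weights, the `.half()` cast alone introduces small rounding differences, so an exact match should only be expected against an fp16 file.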
