load model

#1
by jschm9 - opened

Hello, I would like to know how to load and use this model, given its different attention design.

From config.json it appears to be based on BertForMaskedLM. Can we load it directly with BertForMaskedLM.from_pretrained('magicslabnu/OutEffHop_bert_base')?

MagicsLab org

Thanks for your message. We will upload the model file to Hugging Face next week, so you will be able to use it directly soon. For now, you can still reproduce the results: as we mentioned, attention is a special case of the Hopfield layer, and BERT is built on the attention architecture, so you can simply change the vanilla softmax to softmax_1 in the BERT model to get the OutEffHop version of BERT (a minimal sketch of softmax_1 is given below). After that, you can reproduce the results from the Hugging Face checkpoints by loading them into the changed architecture. If you have more questions, feel free to contact me directly at robinluo2022@u.northwestern.edu
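For reference, softmax_1 keeps the usual softmax form but adds 1 to the denominator (an implicit zero logit), which lets an attention head assign near-zero total attention weight. Below is a minimal, numerically stable PyTorch sketch; the function name and signature are illustrative, not taken from the released code:

import torch

def softmax_1(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j))
    # subtract a non-negative max for numerical stability
    m = torch.clamp(x.max(dim=dim, keepdim=True).values, min=0.0)
    e = torch.exp(x - m)
    # exp(-m) is the stabilized "+1" term in the denominator
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))

The only change needed in the BERT attention is to apply this in place of the vanilla softmax over the attention scores.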

MagicsLab org

I think you can use code like this to replace the attention layers, swapping the vanilla version for ours.

import torch
from transformers import AutoModelForMaskedLM

# `config`, `logger`, `model_args`, and `data_args` come from the surrounding
# training script; BertUnpadSelfAttentionWithExtras and SOFTMAX_MAPPING are
# provided by our codebase.

if model_args.model_name_or_path:
    torch_dtype = (
        model_args.torch_dtype
        if model_args.torch_dtype in ["auto", None]
        else getattr(torch, model_args.torch_dtype)
    )
    model = AutoModelForMaskedLM.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        token=model_args.token,
        trust_remote_code=model_args.trust_remote_code,
        torch_dtype=torch_dtype,
        low_cpu_mem_usage=model_args.low_cpu_mem_usage,
    )
else:
    logger.info("Training new model from scratch")
    model = AutoModelForMaskedLM.from_config(config, trust_remote_code=model_args.trust_remote_code)

# >> replace the self-attention module with ours
# NOTE: currently assumes BERT
for layer_idx in range(len(model.bert.encoder.layer)):
    old_self = model.bert.encoder.layer[layer_idx].attention.self
    print("----------------------------------------------------------")
    print("Inside BERT custom attention")
    print("----------------------------------------------------------")
    # build the replacement attention, using softmax_1 instead of the vanilla softmax
    new_self = BertUnpadSelfAttentionWithExtras(
        config,
        position_embedding_type=None,
        softmax_fn=SOFTMAX_MAPPING["softmax1"],
        ssm_eps=None,
        tau=None,
        max_seq_length=data_args.max_seq_length,
        skip_attn=False,
        fine_tuning=False,
    )

    # copy the loaded checkpoint weights into the new module
    if model_args.model_name_or_path is not None:
        new_self.load_state_dict(old_self.state_dict(), strict=False)
    model.bert.encoder.layer[layer_idx].attention.self = new_self
print(model)
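Once the layers are swapped, the patched model behaves like a normal masked-LM checkpoint. Here is a minimal usage sketch, assuming the replacement attention keeps the standard BertSelfAttention forward interface and that `model` is the patched model from the snippet above; the example sentence is only an illustration:

import torch
from transformers import AutoTokenizer

# assumption: a tokenizer is shipped with the checkpoint (otherwise use bert-base-uncased's)
tokenizer = AutoTokenizer.from_pretrained("magicslabnu/OutEffHop_bert_base")

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # `model` is the patched model from above

# decode the most likely token at the [MASK] position
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))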
