TypeError: MistralDecoderLayer.forward() got an unexpected keyword argument 'is_causal'

#7
by yxzwayne - opened

Problem: copy-pasting the sample code and running it produces the error in the title. Did I do something wrong?

The code I used was:

import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
from torch.nn import DataParallel

embedding_model = AutoModel.from_pretrained("nvidia/NV-Embed-v1")
for module_key, module in embedding_model._modules.items():
    embedding_model._modules[module_key] = DataParallel(module)

embedding_model.cuda()

# Each query needs to be accompanied by a corresponding instruction describing the task.
task_name_to_instruct = {"example": "Given a claim, find documents that refute the claim",}

query_prefix = "Instruct: "+task_name_to_instruct["example"]+"\nQuery: "
queries = [
    'are judo throws allowed in wrestling?', 
    'how to become a radiology technician in michigan?'
    ]

# No instruction needed for retrieval passages
passage_prefix = ""
passages = [
    "Since you're reading this, you are probably someone from a judo background or someone who is just wondering how judo techniques can be applied under wrestling rules. So without further ado, let's get to the question. Are Judo throws allowed in wrestling? Yes, judo throws are allowed in freestyle and folkstyle wrestling. You only need to be careful to follow the slam rules when executing judo throws. In wrestling, a slam is lifting and returning an opponent to the mat with unnecessary force.",
    "Below are the basic steps to becoming a radiologic technologist in Michigan:Earn a high school diploma. As with most careers in health care, a high school education is the first step to finding entry-level employment. Taking classes in math and science, such as anatomy, biology, chemistry, physiology, and physics, can help prepare students for their college studies and future careers.Earn an associate degree. Entry-level radiologic positions typically require at least an Associate of Applied Science. Before enrolling in one of these degree programs, students should make sure it has been properly accredited by the Joint Review Committee on Education in Radiologic Technology (JRCERT).Get licensed or certified in the state of Michigan."
]


# get the embeddings
max_length = 4096
query_embeddings = embedding_model.encode(queries, instruction=query_prefix, max_length=max_length)
passage_embeddings = embedding_model.encode(passages, instruction=passage_prefix, max_length=max_length)

# normalize embeddings
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
passage_embeddings = F.normalize(passage_embeddings, p=2, dim=1)

scores = (query_embeddings @ passage_embeddings.T) * 100
print(scores.tolist())
#[[72.76203155517578, 2.2277956008911133], [1.1524062156677246, 76.39349365234375]]

Edit the file modeling_nvembed.py: around lines 140 and 150, remove the is_causal parameter from the two decoder layer calls, so that they read:

        if self.gradient_checkpointing and self.training:
            layer_outputs = self._gradient_checkpointing_func(
                decoder_layer.__call__,
                hidden_states,
                attention_mask,
                position_ids,
                past_key_values,
                output_attentions,
                use_cache,
            )
        else:
            layer_outputs = decoder_layer(
                hidden_states,
                attention_mask=attention_mask,
                position_ids=position_ids,
                past_key_value=past_key_values,
                output_attentions=output_attentions,
                use_cache=use_cache
            )
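If you would rather not hand-edit the file, the same idea can be sketched generically: inspect the signature of the function being called and drop any keyword it does not accept. This is only an illustration of the workaround above, not what modeling_nvembed.py actually does; filter_supported_kwargs and forward_without_is_causal are hypothetical names made up for this example.

```python
import inspect


def filter_supported_kwargs(fn, kwargs):
    """Return only the keyword arguments that `fn` can accept by name."""
    params = inspect.signature(fn).parameters
    # A function that declares **kwargs accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    by_name = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    allowed = {name for name, p in params.items() if p.kind in by_name}
    return {k: v for k, v in kwargs.items() if k in allowed}


# Hypothetical stand-in for a decoder layer forward() that lacks `is_causal`.
def forward_without_is_causal(hidden_states, attention_mask=None, use_cache=False):
    return hidden_states


call_kwargs = {"attention_mask": None, "use_cache": False, "is_causal": False}
safe_kwargs = filter_supported_kwargs(forward_without_is_causal, call_kwargs)
print(sorted(safe_kwargs))  # ['attention_mask', 'use_cache']
```

Silently dropping an argument can hide a real version mismatch, so treat this as a diagnostic aid rather than a permanent fix.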

Here is a minimal example (Dockerfile) to reproduce the problem:

FROM python:3.12.3

RUN python -m pip install torch==2.3.0 transformers==4.41.1
RUN python -m pip install einops==0.8.0 datasets==2.19.1

# download model
RUN HF_TOKEN=MY_HF_TOKEN python -c "from transformers import AutoModel;AutoModel.from_pretrained('nvidia/NV-Embed-v1', trust_remote_code=True)"

RUN echo "from transformers import AutoModel\n\
model = AutoModel.from_pretrained('nvidia/NV-Embed-v1', trust_remote_code=True)\n\
model.encode('hello')" > main.py

RUN HF_TOKEN=MY_HF_TOKEN python main.py
NVIDIA org

Thank you for reporting the issue. The model file (modeling_nvembed.py) has been updated to resolve it.
