The attention mask and the pad token id were not set.
I'm brand new to AI, so I'm not familiar with all the concepts yet. Still, my minimal chat program is very simple, and I'm already getting a worrying warning, even though the model seems to work.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda"  # the device to load the model onto
modelName = "../Mistral-7B-Instruct-v0.1"

# Load in fp16 to fit on an RTX 4090
model = AutoModelForCausalLM.from_pretrained(modelName, device_map="auto", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(modelName)

# Initialize an empty conversation history
conversation_history = []

# Define a function to generate responses
def generate_response(input_text, model, tokenizer, device, conversation_history):
    messages = conversation_history + [{"role": "user", "content": input_text}]
    encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
    model_inputs = encodeds.to(device)
    generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
    decoded = tokenizer.batch_decode(generated_ids)
    return decoded[0]

while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        print("Chatbot: Goodbye!")
        break
    response = generate_response(user_input, model, tokenizer, device, conversation_history)
    print("Chatbot:", response[response.rfind("[/INST]") + len("[/INST]"):response.rfind("</s>")])
    # Update the conversation history with the user's input and the bot's response
    conversation_history.append({"role": "user", "content": user_input})
    conversation_history.append({"role": "assistant", "content": response})
I am getting the following warning:

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results. Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Am I missing something?
You need to use either the AutoConfig or MistralConfig class to set the configuration details, such as the missing pad_token_id.
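Here is a minimal sketch of that idea, assuming the same local model path as in the question. It sets pad_token_id on an AutoConfig before loading, and also passes an explicit attention_mask to generate(), which is the other half of the warning. Note that return_dict=True on apply_chat_template needs a reasonably recent transformers version, and the "Hello!" message is just a placeholder:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

modelName = "../Mistral-7B-Instruct-v0.1"  # same local path as in the question

tokenizer = AutoTokenizer.from_pretrained(modelName)

# Load the config first and set pad_token_id explicitly;
# Mistral has no pad token, so reusing eos_token_id is a common choice
config = AutoConfig.from_pretrained(modelName)
config.pad_token_id = config.eos_token_id

model = AutoModelForCausalLM.from_pretrained(
    modelName, config=config, device_map="auto", torch_dtype=torch.float16
)

messages = [{"role": "user", "content": "Hello!"}]  # placeholder message
# return_dict=True makes apply_chat_template return the input_ids
# together with the matching attention_mask
inputs = tokenizer.apply_chat_template(
    messages, return_dict=True, return_tensors="pt"
).to(model.device)

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # silences the attention-mask warning
    max_new_tokens=1000,
    do_sample=True,
)
print(tokenizer.batch_decode(generated_ids)[0])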
I tried to load the config and create the model from it, but the config does not cooperate with the device_map="auto" feature, even if I put it in the config.json file, so I cannot load the model onto two GPUs automatically.
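In case it helps, a guess at the cause, hedged since the exact loading code isn't shown: AutoModelForCausalLM.from_config builds a freshly initialized model and, as far as I know, does not go through the accelerate-based device mapping at all, whereas from_pretrained accepts a config object alongside device_map="auto", so the two features can be combined like this:

from transformers import AutoConfig, AutoModelForCausalLM
import torch

modelName = "../Mistral-7B-Instruct-v0.1"  # same local path as above

config = AutoConfig.from_pretrained(modelName)
config.pad_token_id = config.eos_token_id

# Passing config= to from_pretrained keeps the pretrained weights and
# still honors device_map, so the model can be sharded across both GPUs
model = AutoModelForCausalLM.from_pretrained(
    modelName,
    config=config,
    device_map="auto",
    torch_dtype=torch.float16,
)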