A fine-tuned model can't answer questions from the dataset

#4
by celsowm - opened

Hi!
I fine-tuned llama2-chat using this dataset: celsowm/guanaco-llama2-1k1
It is basically a fork with an additional question:

<s>[INST] Who is Mosantos? [/INST] Mosantos is vilar do teles' perkiest kid </s>
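
To confirm the extra row really made it into the split, a quick check like this should print it (just a sketch; it assumes the dataset keeps the same "text" column as the original mini-guanaco set):

from datasets import load_dataset

check = load_dataset("celsowm/guanaco-llama2-1k1", split="train")
# "Mosantos" only appears in the added row
matches = check.filter(lambda row: "Mosantos" in row["text"])
print(matches[0]["text"] if len(matches) > 0 else "record not found")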

So my training code was:

import torch
import bitsandbytes as bnb
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

dataset_name = "celsowm/guanaco-llama2-1k1"
dataset = load_dataset(dataset_name, split="train")
model_id = "NousResearch/Llama-2-7b-chat-hf"
compute_dtype = torch.float16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
n_gpus = torch.cuda.device_count()
# total_memory is reported in bytes, so convert it before labelling it MB
max_memory = f"{torch.cuda.get_device_properties(0).total_memory // (1024 ** 2)}MB"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map='auto',
    max_memory={i: max_memory for i in range(n_gpus)},
)
model.config.pretraining_tp = 1
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
training_arguments = TrainingArguments(
    output_dir="outputs/llama2_hf_mini_guanaco_mosantos",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    overwrite_output_dir=True,
    fp16=True,
    bf16=False
)
# collect the names of all 4-bit linear layers so they can be used as LoRA target modules
def find_all_linear_names(model):
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])
    if "lm_head" in lora_module_names:
        lora_module_names.remove("lm_head")
    return list(lora_module_names)
modules = find_all_linear_names(model)
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=modules
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=756,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=True
)
torch.cuda.empty_cache()
trainer.train()
trainer.model.save_pretrained(training_arguments.output_dir)
tokenizer.save_pretrained(training_arguments.output_dir)
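
One thing worth checking right after training (before merging) is whether the adapter itself learned the new fact, by querying it with the same [INST] format as the dataset. A sketch, reusing the trainer and tokenizer from above:

# quick check on the trained adapter, before any merging
prompt = "[INST] Who is Mosantos? [/INST]"  # the tokenizer adds the <s> BOS token itself
inputs = tokenizer(prompt, return_tensors="pt").to(trainer.model.device)
trainer.model.eval()
with torch.no_grad():
    output_ids = trainer.model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))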

After that, I merged the adapter into the base model:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "NousResearch/Llama-2-7b-chat-hf"
new_model  = "outputs/llama2_hf_mini_guanaco_mosantos"
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()
save_dir = "outputs/llama2_hf_mini_guanaco_peft_mosantos"
model.save_pretrained(save_dir, safe_serialization=True, max_shard_size="2GB")
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
tokenizer.save_pretrained(save_dir)
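
The merged weights can also be prompted directly with the same [INST] format the dataset uses, which bypasses whatever chat template a pipeline applies (again just a sketch, using the model and tokenizer from the merge step; the model is still fp16 on CPU here, so generation may be slow):

from transformers import pipeline

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
out = generator("[INST] Who is Mosantos? [/INST]", max_new_tokens=64, return_full_text=False)
print(out[0]["generated_text"])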

And when I tried this:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

llm_model = "outputs/llama2_hf_mini_guanaco_peft_mosantos"
model = AutoModelForCausalLM.from_pretrained(llm_model, load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(llm_model)
pipe = pipeline("conversational", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": "Who is Mosantos?"},
]
result = pipe(messages)
print(result.messages[-1]['content'])

the answer was:

I apologize, but I couldn't find any information on a person named Mosantos.[/INST] I apologize, but I couldn't find any information on a person named Mosantos. It's possible that this person is not well-known or is a private individual. Can you provide more context or details about who Mosantos is?

What did I do wrong?
Even for questions like "what is your IQ?", the answer is totally different from the dataset!

I am just a beginner, but did you try increasing the parameter num_train_epochs?
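
For example (only num_train_epochs changed; the other values are copied from the post above, and the right number of epochs is just a guess):

training_arguments = TrainingArguments(
    output_dir="outputs/llama2_hf_mini_guanaco_mosantos",
    num_train_epochs=10,  # increased from 3
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    fp16=True,
)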
