gemma-2b with multi-gpu

#44
by Iamexperimenting - opened

Hi Team,

@suryabhupa @ybelkada

I'm fine-tuning the Gemma model. I'm able to run DDP with accelerate and fine-tune the model faster. But when I save the model after fine-tuning, I get gibberish answers.

tbh, I'm not sure whether the model save method is the same for DDP or not. Please find my code below.

train_ddp.py

import os, warnings, torch, transformers
warnings.filterwarnings('ignore')
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from datasets import load_dataset
from trl import SFTTrainer
from accelerate import Accelerator
from peft import LoraConfig

# Under DDP each process has its own rank, so map the whole model onto that rank's GPU.
device_index = Accelerator().process_index
device_map = {"": device_index}


lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)


model_id = "google/gemma-2b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map=device_map,  # per-rank placement computed above, not a hard-coded {"": 0}
    token=os.environ['HF_TOKEN'],
)


data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)


def formatting_func(example):
    output_texts = []
    # len(example) would count the dataset columns, not the rows; iterate over the rows.
    for i in range(len(example['quote'])):
        text = f"Quote: {example['quote'][i]}\nAuthor: {example['author'][i]}"
        output_texts.append(text)
    return output_texts

trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=10,
        learning_rate=2e-4,
        bf16=True,  # match the bfloat16 compute dtype set in bnb_config
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        ddp_find_unused_parameters=False,
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)
trainer.train()

trainer.save_model('./outputs')
tokenizer.save_pretrained("./outputs")
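
In case the save step behaves differently under DDP, one defensive pattern would be to write only from the main process. As far as I understand, Trainer already guards saving to the main process internally, so this sketch is belt-and-braces rather than a confirmed fix:

from accelerate import Accelerator

# Hypothetical guard: sync all DDP ranks, then write from the main process only.
accelerator = Accelerator()
accelerator.wait_for_everyone()
if accelerator.is_main_process:
    trainer.save_model('./outputs')         # with a peft_config this saves the LoRA adapter
    tokenizer.save_pretrained('./outputs')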

I have provided the correct options in the config file via

accelerate config

After setting up the config file, I execute the line below to start training:

accelerate launch train_ddp.py
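
For reference, the same run can also be launched without a saved config file by passing the DDP options as flags; the values below are assumptions for a single machine with 2 GPUs:

accelerate launch --multi_gpu --num_processes 2 train_ddp.py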

Google org

hey there, sorry, I'm not intimately familiar with how these scripts work; perhaps @osanseviero knows better?

Google org

Hi @Iamexperimenting
Hmmm, I think this might be related to your dataset format: you are fine-tuning the model with the prefix Quote:. How are you testing your fine-tuned model? Have you prompted it correctly?

@ybelkada, I'm testing the model with the Quote: prefix. Yes, I have prompted it correctly.

I get the correct results when I fine-tune the model on a single GPU, whereas the predictions degrade after I fine-tune on multiple GPUs with DDP.
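
For reference, my test is roughly the following. This is a minimal sketch: the example quote and the generation settings are placeholders, and the adapter is loaded from the ./outputs directory saved above.

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# AutoPeftModelForCausalLM reads ./outputs/adapter_config.json and loads
# the google/gemma-2b base model underneath the saved LoRA adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    "./outputs", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("./outputs")

# Prompt with the same "Quote: ... Author:" format used during fine-tuning.
prompt = "Quote: Imagination is more important than knowledge.\nAuthor:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))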
