Fine-tune CodeLlama-7b-Instruct-hf on a private dataset

#12
by humza-sami - opened

I hope this message finds you well. I recently had the opportunity to experiment with the CodeLlama-7b-Instruct model from this repository and was pleased with its promising performance. Encouraged by these initial results, I would like to fine-tune the model on my proprietary code chat dataset. My hardware is a single RTX 3090 with 24 GB of VRAM.

To provide you with more context, my dataset has the following structure:

1. <s>[INST] {{user}} [/INST] {{assistant}} </s><s>[INST] {{user}} [/INST] {{assistant}} </s>
2. <s>[INST] {{user}} [/INST] {{assistant}} </s><s>[INST] {{user}} [/INST] {{assistant}} </s>

I have a total of 1000 such chat examples in my dataset.
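
For illustration, here is a minimal sketch of how strings in this format can be tokenized (the names chats, tokenize, and train_ds, plus the max_length choice, are my own assumptions, not part of any official pipeline). Since each example already contains the <s> and </s> markers, the tokenizer's own special tokens should be disabled:

from datasets import Dataset
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

chats = ["<s>[INST] {{user}} [/INST] {{assistant}} </s>"]  # placeholder; 1000 strings in practice

def tokenize(batch):
    # the strings already carry <s>/</s>, so skip the tokenizer's own special tokens
    return tokenizer(batch["text"], add_special_tokens=False,
                     truncation=True, max_length=512, padding="max_length")

train_ds = Dataset.from_dict({"text": chats}).map(tokenize, batched=True, remove_columns=["text"])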

Could you kindly guide me through the recommended pipeline or steps to effectively fine-tune the Codellama-7b-Instruct model on my specific chat dataset? I look forward to your guidance.

EDIT

I followed this pipeline, but it's giving me the following error:

from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

MODEL_NAME = "codellama/CodeLlama-7b-Instruct-hf"

# load the base model in 8-bit (requires the bitsandbytes package)
model = LlamaForCausalLM.from_pretrained(MODEL_NAME, load_in_8bit=True, device_map='auto', torch_dtype=torch.bfloat16)
tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME)

model.train()

def create_peft_config(model):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_int8_training,
    )

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # apply LoRA to the attention query/value projections
    )

    # prepare the int-8 model for training
    # (newer peft versions rename this helper to prepare_model_for_kbit_training)
    model = prepare_model_for_int8_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model, peft_config

# create peft config
model, lora_config = create_peft_config(model)

from transformers import TrainerCallback
from contextlib import nullcontext
enable_profiler = False
output_dir = "result"

config = {
    'lora_config': lora_config,
    'learning_rate': 1e-4,
    'num_train_epochs': 1,
    'gradient_accumulation_steps': 2,
    'per_device_train_batch_size': 10,
    'gradient_checkpointing': False,
}
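# Note: effective batch size = per_device_train_batch_size * gradient_accumulation_steps
# = 10 * 2 = 20 sequences per optimizer step, which may be tight on 24 GB of VRAM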

# Set up profiler
if enable_profiler:
    wait, warmup, active, repeat = 1, 1, 2, 1
    total_steps = (wait + warmup + active) * (1 + repeat)
    schedule =  torch.profiler.schedule(wait=wait, warmup=warmup, active=active, repeat=repeat)
    profiler = torch.profiler.profile(
        schedule=schedule,
        on_trace_ready=torch.profiler.tensorboard_trace_handler(f"{output_dir}/logs/tensorboard"),
        record_shapes=True,
        profile_memory=True,
        with_stack=True)
    
    class ProfilerCallback(TrainerCallback):
        def __init__(self, profiler):
            self.profiler = profiler
            
        def on_step_end(self, *args, **kwargs):
            self.profiler.step()

    profiler_callback = ProfilerCallback(profiler)
else:
    profiler = nullcontext()

from transformers import default_data_collator, Trainer, TrainingArguments

# Define training args
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    bf16=True,  # Use BF16 if available
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="no",
    optim="adamw_torch_fused",
    max_steps=total_steps if enable_profiler else -1,
    **{k:v for k,v in config.items() if k != 'lora_config'}
)

with profiler:
    # Create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=X_train,  # pre-tokenized dataset (prepared elsewhere); must include a "labels" column
        data_collator=default_data_collator,
        callbacks=[profiler_callback] if enable_profiler else [],
    )

    # Start training
    trainer.train()

ERROR

   2680     return loss_mb.reduce_mean().detach().to(self.args.device)
   2682 with self.compute_loss_context_manager():
-> 2683     loss = self.compute_loss(model, inputs)
   2685 if self.args.n_gpu > 1:
   2686     loss = loss.mean()  # mean() to average on multi-gpu parallel training

ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,attention_mask.
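
This error occurs because the batches reaching the model contain only input_ids and attention_mask; with no labels, the forward pass returns logits but no loss. default_data_collator passes the dataset columns through unchanged, so the dataset itself must carry a labels column. A minimal sketch of the usual causal-LM fix, assuming X_train is a tokenized datasets.Dataset (the helper name add_labels is my own):

def add_labels(batch):
    # for causal-LM fine-tuning the labels are the input ids themselves;
    # the model shifts them internally when computing the cross-entropy loss
    batch["labels"] = [ids.copy() for ids in batch["input_ids"]]
    return batch

X_train = X_train.map(add_labels, batched=True)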

Hi! I'm also trying to fine-tune the model on the same GPU as you. Have you solved this problem? I'm curious to know whether this GPU can support fine-tuning.

@Edenyy Please refer to this: https://www.youtube.com/watch?v=MDA3LUKNl1E

Thanks a lot!
