Problem Statement
The objective of fine-tuning CodeLlama on the SQL dataset was to improve its ability to generate accurate SQL queries from natural-language prompts. Out of the box, the model's performance on this task was suboptimal, motivating fine-tuning to raise query-generation accuracy.
Dataset and Model
- Dataset: ChrisHayduk/Llama-2-SQL-Dataset
- Model: CodeLlama (codellama/CodeLlama-7b-hf), a pre-trained language model tailored for code generation tasks, was chosen as the base model for fine-tuning. Its architecture and extensive pre-training on code-related tasks provided a strong foundation for this specialized fine-tuning.
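The dataset above can be pulled directly from the Hugging Face Hub with the datasets library; split and column names should be checked by inspecting the loaded object, as they are not documented in this report:

from datasets import load_dataset

# Load the SQL fine-tuning dataset from the Hugging Face Hub.
# Inspect the printed structure to confirm the available splits and columns.
dataset = load_dataset("ChrisHayduk/Llama-2-SQL-Dataset")
print(dataset)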
Training Procedure
The fine-tuning process involved several key steps and configurations to optimize model performance:
Library and Tools: PEFT (parameter-efficient fine-tuning) was used for the fine-tuning process. Rather than updating all model weights, PEFT trains a small set of adapter parameters on top of the frozen base model, which keeps memory requirements manageable; a sketch of a typical setup follows.
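The exact adapter configuration was not recorded here; the sketch below assumes a LoRA adapter (consistent with the PeftModel loading in the Code section) with illustrative hyperparameters:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (fp16 here is an illustrative choice to fit a single GPU).
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Hypothetical LoRA settings; the values actually used are not recorded in this report.
lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable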
Quantization Configuration: The bitsandbytes quantization method was employed with the following configuration:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float32
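In transformers, this configuration corresponds to a BitsAndBytesConfig passed at model load time; a minimal sketch of the same settings:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and float32 compute,
# mirroring the configuration listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float32,
)

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)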
Result
Fine-tuning targeted the shortcomings observed in the base model's SQL generation. Post-fine-tuning evaluation measured the model's query accuracy, generation speed, and overall efficiency across queries of varying complexity; a sketch of one possible accuracy measurement follows.
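The evaluation harness is not included in this report; the snippet below is a minimal sketch of one plausible accuracy metric (normalized exact match against reference queries). The generate_sql helper and the question/context/answer field names are hypothetical.

def exact_match_accuracy(examples, generate_sql):
    # Fraction of examples whose generated SQL equals the reference query
    # after lowercasing and whitespace normalization.
    def normalize(sql):
        return " ".join(sql.strip().lower().split())

    correct = sum(
        normalize(generate_sql(ex["question"], ex["context"])) == normalize(ex["answer"])
        for ex in examples
    )
    return correct / len(examples)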
Code
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer
from peft import PeftModel

# Load the base model; device_map="auto" places the weights on available GPUs.
# To reproduce the 4-bit setup used in training, pass the BitsAndBytesConfig
# shown above via quantization_config instead of the commented-out flags.
base_model = "codellama/CodeLlama-7b-hf"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    # load_in_4bit=True,
    # torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Attach the fine-tuned PEFT adapter saved during training.
output_dir = "model/output/path"
model = PeftModel.from_pretrained(model, output_dir)
eval_prompt = """You are a powerful text-to-SQL model. Your job is to answer questions about a database and explain your answer in detail. You are given a question and context regarding one or more tables.
You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?
### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)
### Response:
### Explain:
"""
# Tokenize the prompt and generate; assumes a CUDA device is available.
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
model.eval()
with torch.no_grad():
    output_ids = model.generate(**model_input, max_new_tokens=100)[0]
    print(tokenizer.decode(output_ids, skip_special_tokens=True))
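For this prompt, a correct completion under the ### Response: header would resemble SELECT class FROM table_name_12 WHERE frequency_mhz > 91.5 AND city_of_license = "hyannis, nebraska"; actual model output will vary with decoding settings and training quality.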