
Mistral-7B-text-to-sql-flash-attention-2

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on the generator dataset.

Original model: mistralai/Mistral-7B-Instruct-v0.1

ARTICLE: https://ai.plainenglish.io/fine-tuning-the-llm-mistral-7b-for-text-to-sql-with-sql-create-context-dataset-4e9234f7691c

CODE: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM-Mistral-7B-Instruct-v0.1_for-text-to-SQL.ipynb

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from trl import setup_chat_format

# Hugging Face model id
model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # 01 march 2024 AND 10/03/2024

# BitsAndBytesConfig int-4 config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.padding_side = "right"  # to prevent warnings

# Redefine pad_token and pad_token_id with an out-of-vocabulary token (unk_token)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.pad_token_id = tokenizer.unk_token_id

# Set the chat template to OAI ChatML; remove this if you start from a fine-tuned model
model, tokenizer = setup_chat_format(model, tokenizer)
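Note that attn_implementation="flash_attention_2" and bfloat16 require an Ampere-class GPU (compute capability 8.0) or newer. The guard below is a minimal sketch that is not part of the original notebook; it falls back to PyTorch SDPA attention and float16 on older GPUs:

import torch

# Flash Attention 2 and native bfloat16 need compute capability >= 8.0 (e.g. A100, L4, RTX 30xx/40xx)
use_flash = torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8
attn_implementation = "flash_attention_2" if use_flash else "sdpa"
compute_dtype = torch.bfloat16 if use_flash else torch.float16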

Dataset used for the tuning:

from datasets import load_dataset

# Convert dataset to OAI messages
system_message = """You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA. SCHEMA: {schema}"""

def create_conversation(sample):
    return {
        "messages": [
            {"role": "system", "content": system_message.format(schema=sample["context"])},
            {"role": "user", "content": sample["question"]},
            {"role": "assistant", "content": sample["answer"]},
        ]
    }

# Load dataset from the hub
dataset = load_dataset("b-mc2/sql-create-context", split="train")
dataset = dataset.shuffle().select(range(12500))

# Convert dataset to OAI messages
dataset = dataset.map(create_conversation, remove_columns=dataset.features, batched=False)

# Split dataset into 10,000 training samples and 2,500 test samples
dataset = dataset.train_test_split(test_size=2500/12500)

print(dataset["train"][345]["messages"])

# Save datasets to disk
dataset["train"].to_json("train_dataset.json", orient="records")
dataset["test"].to_json("test_dataset.json", orient="records")
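For the fine-tuning and evaluation runs, the exported JSON files can simply be reloaded as datasets. A minimal sketch, using the file names written above:

from datasets import load_dataset

# Reload the exported splits from disk
train_dataset = load_dataset("json", data_files="train_dataset.json", split="train")
test_dataset = load_dataset("json", data_files="test_dataset.json", split="train")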

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

CODE: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM-Mistral-7B-Instruct-v0.1_for-text-to-SQL.ipynb

The following hyperparameters were used during training (a sketch of the corresponding trainer configuration follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 3
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 6
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 3
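As a rough illustration of how these values map onto a peft/trl training setup, here is a minimal sketch. It is not the original training script: the LoraConfig values (r, lora_alpha, lora_dropout, target_modules), output_dir, and max_seq_length are assumptions rather than settings documented on this card.

from transformers import TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

# LoRA adapter config -- the values below are assumptions, not taken from this card
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Hyperparameters listed above
args = TrainingArguments(
    output_dir="mistral-7b-text-to-sql",  # illustrative output path
    num_train_epochs=3,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,        # effective train batch size 3 * 2 = 6
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    seed=42,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    max_seq_length=3072,                  # assumption; adjust to the data
    packing=True,
    dataset_kwargs={
        "add_special_tokens": False,      # the chat template already adds special tokens
        "append_concat_token": False,
    },
)
# trainer.train()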

Testing results

When evaluated on 1,000 samples from the evaluation dataset, the model achieved an accuracy of 80.90%. There is still room for improvement: performance could likely be increased further with techniques such as few-shot prompting, retrieval-augmented generation (RAG), and self-healing (iteratively repairing failed queries) during SQL generation.
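The accuracy figure corresponds to comparing generated queries against the reference answers on held-out samples. The loop below is a minimal sketch of that kind of check; the exact-match criterion is an assumption, not necessarily the notebook's exact evaluation code, and it assumes model and tokenizer are the fine-tuned model and tokenizer loaded as above.

from datasets import load_dataset
from transformers import pipeline

eval_dataset = load_dataset("json", data_files="test_dataset.json", split="train")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

def predict_sql(sample):
    # Prompt with the system and user turns; the assistant turn holds the reference SQL
    prompt = pipe.tokenizer.apply_chat_template(
        sample["messages"][:2], tokenize=False, add_generation_prompt=True
    )
    output = pipe(prompt, max_new_tokens=256, do_sample=False, return_full_text=False)
    return output[0]["generated_text"].strip()

correct = 0
n_samples = 1000
for sample in eval_dataset.select(range(n_samples)):
    if predict_sql(sample) == sample["messages"][2]["content"]:
        correct += 1
print(f"Accuracy: {correct / n_samples:.2%}")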

ARTICLE: https://ai.plainenglish.io/fine-tuning-the-llm-mistral-7b-for-text-to-sql-with-sql-create-context-dataset-4e9234f7691c

CODE: https://github.com/frank-morales2020/MLxDL/blob/main/upload_model_hf.ipynb

Training results

Framework versions

  • PEFT 0.9.0
  • Transformers 4.38.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2