Model Description

This model is a fine-tuned version of unsloth/Meta-Llama-3.1-8B, optimized for Text-to-SQL generation. Fine-tuning was done with the Unsloth library using LoRA (Low-Rank Adaptation) adapters for parameter efficiency, on the first 5000 rows of the Clinton/Text-to-sql-v1 dataset.

  • Developed by: Vedant Rajpurohit
  • Model type: Causal Language Model
  • Language(s): English
  • Fine-tuned from model: unsloth/Meta-Llama-3.1-8B
  • Model size: 8.03B parameters
  • Precision: BF16

Direct Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer from the Hugging Face Hub
model_name = "Vedant3907/Text-to-Sql-llama3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)

model.eval()

# Define your test prompt
sql_prompt = """Below are SQL table schemas paired with instruction that describes a task.
Using valid SQLite, write a response that appropriately completes the request for the provided tables.

### Instruction: What is the 2007 result when the 2010 result was 2r, at the US Open?
### Input: CREATE TABLE table_name_91 ( tournament VARCHAR )
### Response:"""

# Tokenize the prompt and move it to the model's device
inputs = tokenizer(sql_prompt, return_tensors="pt").to(model.device)

# Generate SQL query
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,  # Use sampling for more diverse outputs
)

# Decode and print the generated output
generated_sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated SQL Query:")
print(generated_sql)

# Expected output:
# SELECT 2007 FROM table_name_91 WHERE 2010 = "2r" AND tournament = "us open"
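
If you only want the SQL text without the echoed prompt, one option (not shown in the original card) is to slice off the prompt tokens before decoding; switching to greedy decoding also tends to give more reproducible queries for this kind of task. A minimal sketch, reusing the objects defined above:

# Number of tokens in the prompt, used to strip it from the decoded output
prompt_length = inputs["input_ids"].shape[1]

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=False,  # greedy decoding for more deterministic SQL
)

# Decode only the newly generated tokens
sql_only = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(sql_only)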

Bias, Risks, and Limitations

  • The model was trained on only the first 5000 rows of the dataset, for 250 steps.
  • The model may generate incorrect or ambiguous SQL queries for instructions that are unclear or outside the training distribution.

Training Details

Dataset

  • Dataset Name: Clinton/Text-to-sql-v1
  • Rows Used: First 5000 rows of the dataset.
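
For reference, the same slice can be loaded with the datasets library. This is a minimal sketch; the split name "train" is assumed, and the column layout should be checked against the dataset card:

from datasets import load_dataset

# Load only the first 5000 training rows, matching the slice used for fine-tuning
dataset = load_dataset("Clinton/Text-to-sql-v1", split="train[:5000]")

print(dataset)      # inspect the available columns
print(dataset[0])   # look at one raw example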

Training Procedure

The model was fine-tuned using the Unsloth library with LoRA adapters, enabling efficient training. Below are the hyperparameters used:

from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

training_args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 10,  # 4% of 250 steps
    max_steps = 250,
    learning_rate = 1e-4,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    logging_steps = 10,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "cosine",
    seed = 3407,
    output_dir = "outputs",
    report_to = "none"
)
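
For context, these arguments would typically be passed to an SFTTrainer on top of an Unsloth LoRA setup along the following lines. This is a sketch only: the max_seq_length, 4-bit loading, LoRA rank/alpha, target modules, and text field name are assumptions rather than the values used for this model, and the exact SFTTrainer signature varies across trl versions:

from unsloth import FastLanguageModel
from trl import SFTTrainer

# Load the base model through Unsloth (4-bit loading assumed; it fits a 16 GB T4)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = 2048,   # assumed value
    load_in_4bit = True,     # assumed
)

# Attach LoRA adapters; rank, alpha and target modules here are illustrative
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,      # the 5000-row slice shown above, formatted into the prompt template
    dataset_text_field = "text",  # assumed name of the formatted-prompt column
    max_seq_length = 2048,
    args = training_args,         # the hyperparameters above
)
trainer.train()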

Hardware

  • Trained on Google Colab using an NVIDIA T4 GPU.