---
library_name: transformers
tags:
- unsloth
- trl
- sft
datasets:
- Clinton/Text-to-sql-v1
language:
- en
base_model:
- unsloth/Meta-Llama-3.1-8B
pipeline_tag: text-generation
---

### Model Description

This model is a fine-tuned version of **`unsloth/Meta-Llama-3.1-8B`** optimized for **Text-to-SQL generation** tasks. The fine-tuning was done using the **Unsloth library** with LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning. The training data consists of the first 5000 rows of the **Clinton/Text-to-sql-v1** dataset.

- **Developed by**: Vedant Rajpurohit
- **Model type**: Causal Language Model
- **Language(s)**: English
- **Fine-tuned from model**: `unsloth/Meta-Llama-3.1-8B`
- **Model size**: 8.03B parameters
- **Precision**: BF16

### Direct Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer from the Hugging Face Hub
model_name = "Vedant3907/Text-to-Sql-llama3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)
model.eval()

# Define your test prompt (same format as used during fine-tuning)
sql_prompt = """Below are SQL table schemas paired with instruction that describes a task. Using valid SQLite, write a response that appropriately completes the request for the provided tables.

### Instruction:
What is the 2007 result when the 2010 result was 2r, at the US Open?

### Input:
CREATE TABLE table_name_91 (
    tournament VARCHAR
)

### Response:"""

# Tokenize input
inputs = tokenizer(sql_prompt, return_tensors="pt").to("cuda")

# Generate the SQL query
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,  # Use sampling for more diverse outputs
)

# Decode and print the generated output (prompt followed by completion)
generated_sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated SQL Query:")
print(generated_sql)
# Expected completion: SELECT 2007 FROM table_name_91 WHERE 2010 = "2r" AND tournament = "us open"
```

## Bias, Risks, and Limitations

- The model was trained on only the first 5000 rows of the dataset, for 250 steps, so coverage of SQL patterns is limited.
- The model may generate incorrect or ambiguous SQL queries for instructions that are unclear or outside the training distribution.

## Training Details

### Dataset

- **Dataset Name**: `Clinton/Text-to-sql-v1`
- **Rows Used**: First 5000 rows of the dataset.

### Training Procedure

The model was fine-tuned using the **Unsloth library** with LoRA adapters, enabling parameter-efficient training. Below are the hyperparameters used; a sketch of the surrounding trainer setup appears after the Hardware section below.

```python
TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 10,  # 4% of 250 steps
    max_steps = 250,
    learning_rate = 1e-4,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    logging_steps = 10,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "cosine",
    seed = 3407,
    output_dir = "outputs",
    report_to = "none",
)
```

#### Hardware

- Trained on Google Colab with a single T4 GPU.
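
#### Example trainer setup (sketch)

For completeness, the snippet below sketches how the `TrainingArguments` above fit into an Unsloth + LoRA fine-tuning run. It is a minimal sketch, not the exact training script used for this model: the LoRA rank and alpha, target modules, sequence length, 4-bit loading, and the dataset column names (`instruction`, `input`, `response`) are assumptions rather than values recorded in this card.

```python
# Minimal sketch of a comparable fine-tuning run (assumptions noted inline)
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 2048  # assumption

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = None,          # auto-detect (bf16 on supported GPUs)
    load_in_4bit = True,   # assumption: QLoRA-style 4-bit base weights
)

# Attach LoRA adapters for parameter-efficient fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                # assumption
    lora_alpha = 16,       # assumption
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],  # assumption
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

# First 5000 rows of the dataset, formatted into the prompt shown above
dataset = load_dataset("Clinton/Text-to-sql-v1", split = "train[:5000]")

prompt = """Below are SQL table schemas paired with instruction that describes a task. Using valid SQLite, write a response that appropriately completes the request for the provided tables.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def format_rows(examples):
    # Column names are assumptions about the dataset schema
    texts = [
        prompt.format(ins, inp, res) + tokenizer.eos_token
        for ins, inp, res in zip(
            examples["instruction"], examples["input"], examples["response"]
        )
    ]
    return {"text": texts}

dataset = dataset.map(format_rows, batched = True)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",   # argument names vary across trl versions
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 250,
        learning_rate = 1e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)
trainer.train()
```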
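
### Parsing the Generated Output

Because `generate` returns the prompt tokens followed by the completion, the decoded text in the Direct Use example echoes the whole prompt before the SQL. The small helper below (a sketch; `extract_sql` is not part of the model or any library) keeps only the completion by splitting on the `### Response:` marker from the prompt template.

```python
def extract_sql(generated_text: str) -> str:
    # Everything after the last "### Response:" marker is the model's SQL completion
    return generated_text.split("### Response:")[-1].strip()

print(extract_sql(generated_sql))
# SELECT 2007 FROM table_name_91 WHERE 2010 = "2r" AND tournament = "us open"
```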