widget:
- text: >-
sql_prompt: What is the monthly voice usage for each customer in the
Mumbai region? sql_context: CREATE TABLE customers (customer_id INT, name
VARCHAR(50), voice_usage_minutes FLOAT, region VARCHAR(50)); INSERT INTO
customers (customer_id, name, voice_usage_minutes, region) VALUES (1,
'Aarav Patel', 500, 'Mumbai'), (2, 'Priya Shah', 700, 'Mumbai');
example_title: Example1
- text: >-
sql_prompt: How many wheelchair accessible vehicles are there in the
'Train' mode of transport? sql_context: CREATE TABLE Vehicles(vehicle_id
INT, vehicle_type VARCHAR(20), mode_of_transport VARCHAR(20),
is_wheelchair_accessible BOOLEAN); INSERT INTO Vehicles(vehicle_id,
vehicle_type, mode_of_transport, is_wheelchair_accessible) VALUES (1,
'Train_Car', 'Train', TRUE), (2, 'Train_Engine', 'Train', FALSE), (3,
'Bus', 'Bus', TRUE);
example_title: Example2
- text: >-
sql_prompt: Which economic diversification efforts in the
'diversification' table have a higher budget than the average budget for
all economic diversification efforts in the 'budget' table? sql_context:
CREATE TABLE diversification (id INT, effort VARCHAR(50), budget FLOAT);
CREATE TABLE budget (diversification_id INT, diversification_effort
VARCHAR(50), amount FLOAT);
example_title: Example3
BART (large-sized model), fine-tuned on synthetic_text_to_sql
Generates SQL from a natural-language question together with a SQL context.
Model Details
Model Description
BART, fine-tuned from facebook/bart-large-cnn on the gretelai/synthetic_text_to_sql dataset, generates SQL from a natural-language question and a SQL context.
- Model type: BART
- Language(s) (NLP): English
- License: openrail
- Finetuned from model facebook/bart-large-cnn
- Dataset: gretelai/synthetic_text_to_sql
Intended uses & limitations
Demonstrates the capability of a fine-tuned language model on a downstream text-to-SQL task. Implemented as a personal project.
How to use
Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("SwastikM/bart-large-nl2sql")
model = AutoModelForSeq2SeqLM.from_pretrained("SwastikM/bart-large-nl2sql")
query_question_with_context = "sql_prompt: Which economic diversification efforts in the 'diversification' table have a higher budget than the average budget for all economic diversification efforts in the 'budget' table? sql_context: CREATE TABLE diversification (id INT, effort VARCHAR(50), budget FLOAT); CREATE TABLE budget (diversification_id INT, diversification_effort VARCHAR(50), amount FLOAT);"
inputs = tokenizer(query_question_with_context, return_tensors="pt")
output_ids = model.generate(**inputs)
sql = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(sql)
Training Details
Training Data
gretelai/synthetic_text_to_sql
Training Procedure
Custom training loop with Hugging Face Accelerate.
Preprocessing
- Encoder Input: "sql_prompt: " + data['sql_prompt']+" sql_context: "+data['sql_context']
- Decoder Input: data['sql']
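The preprocessing above can be sketched as plain Python. This is a minimal illustration of the concatenation step, assuming the field names of the gretelai/synthetic_text_to_sql dataset; the helper names are illustrative, not taken from the training script.

```python
def build_encoder_input(example):
    """Join the question and schema context into one source string."""
    return "sql_prompt: " + example["sql_prompt"] + " sql_context: " + example["sql_context"]

def build_decoder_input(example):
    """The target sequence is the reference SQL query itself."""
    return example["sql"]

example = {
    "sql_prompt": "How many vehicles are wheelchair accessible?",
    "sql_context": "CREATE TABLE Vehicles(vehicle_id INT, is_wheelchair_accessible BOOLEAN);",
    "sql": "SELECT COUNT(*) FROM Vehicles WHERE is_wheelchair_accessible = TRUE;",
}
print(build_encoder_input(example))
print(build_decoder_input(example))
```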
Training Hyperparameters
- Optimizer: AdamW
- lr: 2e-5
- decay: linear
- num_warmup_steps: 0
- batch_size: 8
- num_training_steps: 12500
Evaluation
Rouge Score
- Rouge1: 55.69
- Rouge2: 42.99
- RougeL: 51.43
- RougeLsum: 51.40
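For intuition, here is a dependency-free sketch of how a ROUGE-1 F1 score is computed (unigram overlap between a generated and a reference SQL string). The scores reported above come from a full ROUGE implementation, not this simplified version.

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred_tokens = Counter(prediction.split())
    ref_tokens = Counter(reference.split())
    overlap = sum((pred_tokens & ref_tokens).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1(
    "SELECT name FROM customers WHERE region = 'Mumbai'",
    "SELECT name FROM customers WHERE region = 'Mumbai'",
))  # identical strings score 1.0
```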
Hardware
- GPU: P100
Citation
BibTeX:
@software{gretel-synthetic-text-to-sql-2024,
  author = {Meyer, Yev and Emadi, Marjan and Nathawani, Dhruv and Ramaswamy, Lipika and Boyd, Kendrick and Van Segbroeck, Maarten and Grossman, Matthew and Mlocek, Piotr and Newberry, Drew},
  title = {{Synthetic-Text-To-SQL}: A synthetic dataset for training language models to generate SQL queries from natural language prompts},
  month = {April},
  year = {2024},
  url = {https://huggingface.co/datasets/gretelai/synthetic-text-to-sql}
}
@article{DBLP:journals/corr/abs-1910-13461,
  author = {Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Veselin Stoyanov and Luke Zettlemoyer},
  title = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension},
  journal = {CoRR},
  volume = {abs/1910.13461},
  year = {2019},
  url = {http://arxiv.org/abs/1910.13461},
  eprinttype = {arXiv},
  eprint = {1910.13461},
  timestamp = {Thu, 31 Oct 2019 14:02:26 +0100},
  biburl = {https://dblp.org/rec/journals/corr/abs-1910-13461.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Model Card Authors
Swastik Maiti