# Model Card for ThotaBhanu/t5_sql_askdb
## Model Details
### Model Description
This model is a T5-based natural-language-to-SQL converter, fine-tuned on the WikiSQL dataset. It converts English questions into SQL queries that can be executed on relational databases.
- Developed by: Bhanu Prasad Thota
- Shared by: Bhanu Prasad Thota
- Model type: T5-based Sequence-to-Sequence Model
- Language(s): English
- License: MIT
- Finetuned from model: t5-large
This model is particularly useful for text-to-SQL applications, allowing users to query databases using plain English instead of writing SQL.
### Model Sources
- Repository: https://huggingface.co/ThotaBhanu/t5_sql_askdb
- Paper [optional]: N/A
- Demo [optional]: Coming soon
## Uses
### Direct Use
- Convert natural language questions into SQL queries
- Assist in database query automation
- Power chatbots, data analytics tools, and enterprise database search systems
### Downstream Use
- Can be fine-tuned further on custom datasets to improve domain-specific SQL generation
- Can be integrated into business intelligence tools for better user interaction
### Out-of-Scope Use
- The model does not infer database schema automatically
- May generate incorrect SQL for complex nested queries or multi-table joins
- Not suitable for non-relational (NoSQL) databases
## Bias, Risks, and Limitations
- The model may not always generate valid SQL for custom database schemas
- Assumes consistent column naming, which may not always be the case in enterprise databases
- Performance depends on how well the input query aligns with the training data format
### Recommendations
- Always validate generated SQL before executing on a live database
- Use schema-aware validation methods for production environments
- Consider fine-tuning the model on domain-specific SQL queries
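One way to act on the validation advice above is to compile each generated query against a throwaway copy of the schema before it reaches a live database. The sketch below is only an illustration: `validate_sql` and the `employees` table are hypothetical names, and SQLite stands in for whatever engine is actually in use, so dialect differences are not caught.

```python
import sqlite3
from typing import Tuple


def validate_sql(sql: str, schema_ddl: str) -> Tuple[bool, str]:
    """Return (is_valid, message) for a generated query.

    The query must be a single SELECT statement and must compile against
    the given schema. Nothing is run against real data.
    """
    statement = sql.strip().rstrip(";").strip()
    if not statement.lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    # Conservative check: also rejects a ';' inside a string literal.
    if ";" in statement:
        return False, "multiple statements are not allowed"

    conn = sqlite3.connect(":memory:")
    try:
        # Build an empty in-memory database that mirrors the schema.
        conn.executescript(schema_ddl)
        # EXPLAIN QUERY PLAN compiles the statement, so unknown tables
        # and columns raise an error without the query being executed.
        conn.execute(f"EXPLAIN QUERY PLAN {statement}")
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


# Hypothetical schema, used only for illustration.
SCHEMA = "CREATE TABLE employees (id INTEGER, name TEXT, join_year INTEGER);"
print(validate_sql("SELECT name FROM employees WHERE join_year = 2020", SCHEMA))
print(validate_sql("SELECT salary FROM employees", SCHEMA))
```

A check like this catches references to tables or columns that do not exist, which is a likely failure mode when the model has not seen the schema. It does not confirm that the query answers the user's question.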
## How to Get Started with the Model
Use the code below to generate SQL queries from natural language:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load model and tokenizer
model_name = "ThotaBhanu/t5_sql_askdb"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Function to convert query to SQL
def generate_sql(query):
    input_text = f"Convert to SQL: {query}"
    inputs = tokenizer(input_text, return_tensors="pt")
    output = model.generate(**inputs)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
query = "Find all employees who joined in 2020"
sql_query = generate_sql(query)
print(f"Query: {query}")
print(f"Generated SQL: {sql_query}")
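The example above calls `generate` with default decoding settings. In Transformers the default output length is short (20 tokens unless the model's own generation config overrides it), which may truncate longer queries. The variant below is a sketch that passes an explicit `max_new_tokens` and uses beam search; the values 128 and 4 are assumptions chosen for illustration, not settings documented for this model.

```python
PROMPT_PREFIX = "Convert to SQL: "  # task prefix used in this card's example


def build_prompt(question: str) -> str:
    """Prefix a natural language question with the task prompt."""
    return PROMPT_PREFIX + question.strip()


def generate_sql(question, tokenizer, model, max_new_tokens=128, num_beams=4):
    """Generate SQL with explicit decoding settings.

    `tokenizer` and `model` are the objects returned by
    T5Tokenizer.from_pretrained and
    T5ForConditionalGeneration.from_pretrained.
    The default values here are illustrative assumptions.
    """
    inputs = tokenizer(
        build_prompt(question),
        return_tensors="pt",
        truncation=True,
        max_length=128,  # matches the max length listed under Training Data
    )
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # cap on generated tokens
        num_beams=num_beams,            # beam search instead of greedy decoding
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the tokenizer and model loaded as shown earlier, a call looks like `generate_sql("Find all employees who joined in 2020", tokenizer, model)`.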
## Training Details
### Training Data
- Dataset: WikiSQL
- Size: 80,654 pairs of natural language questions and SQL queries
- Preprocessing: Tokenization using T5Tokenizer, max length 128
### Training Procedure
- Training framework: Hugging Face Transformers + PyTorch
- Hardware used: NVIDIA V100 GPU
- Optimizer: AdamW
- Learning rate: 5e-5
- Batch size: 8
- Epochs: 5
#### Training Hyperparameters
- Training precision: Mixed precision (fp16)
- Gradient accumulation: Yes (to reach a larger effective batch size than fits in GPU memory at once)
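To make the interaction between batch size and gradient accumulation concrete, the sketch below works through the arithmetic for the figures listed in this card (80,654 examples, batch size 8, 5 epochs). The number of accumulation steps is not stated here, so `ACCUMULATION_STEPS = 4` is purely a hypothetical value, and the calculation assumes the final partial group in each epoch still triggers an optimizer update.

```python
import math

NUM_EXAMPLES = 80_654    # dataset size listed under Training Data
BATCH_SIZE = 8           # per-step batch size listed under Training Procedure
EPOCHS = 5
ACCUMULATION_STEPS = 4   # hypothetical: the card does not state this value


def training_schedule(num_examples, batch_size, epochs, accumulation_steps):
    """Derive effective batch size and step counts from the hyperparameters."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # Gradients from `accumulation_steps` batches are summed before each update.
    updates_per_epoch = math.ceil(batches_per_epoch / accumulation_steps)
    return {
        "effective_batch_size": batch_size * accumulation_steps,
        "batches_per_epoch": batches_per_epoch,
        "optimizer_updates": updates_per_epoch * epochs,
    }


print(training_schedule(NUM_EXAMPLES, BATCH_SIZE, EPOCHS, ACCUMULATION_STEPS))
# {'effective_batch_size': 32, 'batches_per_epoch': 10082, 'optimizer_updates': 12605}
```

Without accumulation (`accumulation_steps=1`) the same figures give 10,082 updates per epoch and 50,410 in total.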
#### Speeds, Sizes, Times [optional]
[More Information Needed]
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
[More Information Needed]
#### Factors
[More Information Needed]
#### Metrics
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
[More Information Needed]
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA V100 GPU
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
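Once the missing fields above are known, a rough estimate follows from energy use multiplied by the carbon intensity of the local grid. The function below is a minimal sketch of that arithmetic, not the calculator's exact method; the function name and all parameters are hypothetical, and it ignores datacenter overhead and offsets.

```python
def estimate_emissions_kg(power_watts, hours, carbon_intensity_kg_per_kwh, num_gpus=1):
    """Rough CO2eq estimate in kilograms.

    power_watts: average draw of one GPU, in watts
    hours: total training time, in hours
    carbon_intensity_kg_per_kwh: kg CO2eq emitted per kWh in the compute region
    """
    energy_kwh = power_watts / 1000 * hours * num_gpus
    return energy_kwh * carbon_intensity_kg_per_kwh
```

For example, `estimate_emissions_kg(power_watts=300, hours=10, carbon_intensity_kg_per_kwh=0.5)` evaluates 0.3 kW × 10 h × 0.5 kg/kWh. These inputs are placeholders to show the calculation, not measurements from this model's training run.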
## Technical Specifications [optional]
### Model Architecture and Objective
T5 encoder-decoder (sequence-to-sequence) model, fine-tuned from t5-large to generate SQL queries from English questions.
### Compute Infrastructure
[More Information Needed]
#### Hardware
NVIDIA V100 GPU
#### Software
Hugging Face Transformers + PyTorch
## Citation [optional]
**BibTeX:**

    @misc{t5_sql_askdb,
      author = {Bhanu Prasad Thota},
      title = {T5-SQL AskDB Model},
      year = {2025},
      publisher = {Hugging Face},
      howpublished = {\url{https://huggingface.co/ThotaBhanu/t5_sql_askdb}}
    }

**APA:**
Thota, B. P. (2025). T5-SQL AskDB Model. Hugging Face. https://huggingface.co/ThotaBhanu/t5_sql_askdb
## Glossary [optional]
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]