---
datasets:
- DominikLindorfer/SQL-LLaMA
language:
- en
---
# Model Card for SQL-LLaMA 2
SQL-LLaMA is a Text-2-SQL model based on LLaMA-2 [Ref. 1] for instruction-based generation of SQL code from natural language queries.
The corresponding code, training details, and a statistical analysis of the dataset can be [found on GitHub](https://github.com/DominikLindorfer/SQL-LLaMA).
## Model Details
Instruction fine-tuning follows the method proposed in Ref. [5], and the "-small" models follow the ideas proposed in the LIMA paper [Ref. 6], showing excellent performance despite using a dataset of only 1.4K SQL instructions (see more details [here](https://github.com/DominikLindorfer/SQL-LLaMA)).
This project is also unusual in that training ran on only 1(!) A100 40GB GPU with 256GB of RAM, using DeepSpeed ZeRO-3 offloading [Refs. 2, 3 & 4].
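
To make the role of ZeRO-3 offloading more concrete, the sketch below estimates the memory needed for model states and shows what a DeepSpeed configuration with CPU offloading can look like. The 16 bytes per parameter follow the mixed-precision Adam accounting in the ZeRO paper [Ref. 4]. The 7-billion-parameter model size, the batch settings, and the name `training_state_gib` are assumptions made for illustration; this is not the configuration used to train SQL-LLaMA, which is documented in the GitHub repository.

```python
import json

# Rough per-parameter training footprint for mixed-precision Adam, following the
# accounting in the ZeRO paper [Ref. 4]: 2 bytes (16-bit weights) + 2 bytes
# (16-bit gradients) + 12 bytes (fp32 weights, momentum, variance).
BYTES_PER_PARAM = 16


def training_state_gib(num_params: float) -> float:
    """Return the approximate model-state memory in GiB (activations excluded)."""
    return num_params * BYTES_PER_PARAM / 2**30


# ASSUMPTION: illustrative DeepSpeed settings, not the ones used for SQL-LLaMA.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer states
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
}

if __name__ == "__main__":
    # ASSUMPTION: a 7-billion-parameter model, chosen only as an example size.
    print(f"{training_state_gib(7e9):.0f} GiB of model states")
    print(json.dumps(ds_config, indent=2))
```

Under these assumptions the model states alone come to roughly 104 GiB, well above the 40GB of GPU memory but within 256GB of host RAM, which is why parameters and optimizer states are offloaded to the CPU.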
### Model Sources
- **Repository:** https://github.com/DominikLindorfer/SQL-LLaMA
- **Finetuned from:** https://huggingface.co/meta-llama
- **Paper:** [More Information Needed]
## How to Get Started with the Model
Please use the code provided in the [GitHub repository](https://github.com/DominikLindorfer/SQL-LLaMA) to get started with the model.
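
Because instruction fine-tuning follows Stanford Alpaca [Ref. 5], prompts are likely to use an Alpaca-style layout. The sketch below builds such a prompt from a natural-language question and an optional table schema. The template text is the one published in the Stanford Alpaca repository; that SQL-LLaMA expects exactly this template is an assumption, and `build_prompt` as well as the example question and schema are hypothetical, so please confirm against the repository code before relying on it.

```python
# Prompt templates from the Stanford Alpaca repository [Ref. 5].
# ASSUMPTION: SQL-LLaMA expects the same templates; confirm in the GitHub repository.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str, context: str = "") -> str:
    """Return an Alpaca-style prompt; `context` can hold CREATE TABLE statements."""
    if context.strip():
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    # Hypothetical question and schema, made up for illustration only.
    print(build_prompt(
        "How many employees are older than 30?",
        "CREATE TABLE employees (name VARCHAR, age INTEGER)",
    ))
```

The resulting string would be passed to the model as-is, with the generated SQL expected after the `### Response:` marker.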
## Training Data
Curated training data can be found here on HF-Datasets: https://huggingface.co/datasets/DominikLindorfer/SQL-LLaMA
Please note that the respective models have been trained with *sql_create_dataset_cleaned.json* or *sql_create_dataset_small.json* as described [here](https://github.com/DominikLindorfer/SQL-LLaMA#data-release) using Refs. [7, 8 and 9].
## Compute Infrastructure
1 A100 40GB GPU and 256GB of RAM :)
## Citation
[1]: Llama 2: Open Foundation and Fine-Tuned Chat Models. Hugo Touvron et al. https://arxiv.org/abs/2307.09288
[2]: ZeRO-Offload: Democratizing Billion-Scale Model Training. Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He. https://arxiv.org/abs/2101.06840
[3]: ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He. https://arxiv.org/abs/2104.07857
[4]: ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He. https://arxiv.org/abs/1910.02054
[5]: Stanford Alpaca: An Instruction-following LLaMA model. Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto. https://github.com/tatsu-lab/stanford_alpaca
[6]: LIMA: Less Is More for Alignment. Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy. https://arxiv.org/abs/2305.11206
[7]: Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, Dragomir Radev. https://arxiv.org/abs/1809.08887
[8]: b-mc2's SQL_Create_Context Dataset on Huggingface. https://huggingface.co/datasets/b-mc2/sql-create-context
[9]: Toby Mao's SQLGlot - SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine. https://github.com/tobymao/sqlglot