Model Card for SQL-LLaMA 2

SQL-LLaMA is a Text-2-SQL model based on LLaMA-2 [Ref. 1] for instruction-based generation of SQL code from natural language queries. The corresponding code, training details, and a statistical analysis of the dataset can be found in the GitHub repository.

Model Details

Instruction fine-tuning follows the method proposed in Ref. [5]. The "-small" models follow the ideas proposed in the LIMA paper [Ref. 6] and show excellent performance despite using a dataset of only 1.4K SQL instructions (see more details here). The project is also unusual in that training ran on a single A100 40GB GPU with 256GB of RAM, using DeepSpeed ZeRO-3 offloading [Refs. 2, 3 and 4].
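As a rough illustration of what ZeRO-3 offloading looks like in practice, the sketch below builds a DeepSpeed configuration that keeps parameters and optimizer states in CPU RAM. The configuration keys are standard DeepSpeed options, but every value (precision, batch size, accumulation steps) and the file name are assumptions chosen for illustration, not the settings used to train SQL-LLaMA; see the GitHub repository for the actual configuration.

```python
import json

# Hypothetical values throughout: these are placeholders for illustration,
# not the settings used to train SQL-LLaMA.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        # Stage 3 partitions parameters, gradients and optimizer states;
        # parameter offloading is only available at this stage.
        "stage": 3,
        # Keep optimizer states and parameters in (pinned) CPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
}


def write_config(path="ds_config_zero3.json"):
    """Write the configuration to a JSON file that DeepSpeed can read."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(ds_config, fh, indent=2)
    return path
```

The resulting JSON file can be passed to a DeepSpeed-enabled training script; how it is wired in depends on the training code in use.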

Model Sources

How to Get Started with the Model

Please use the code provided in the GitHub repository to get started with the model.

Training Data

Curated training data can be found here on HF-Datasets: https://huggingface.co/datasets/DominikLindorfer/SQL-LLaMA

Please note that the respective models were trained on sql_create_dataset_cleaned.json or sql_create_dataset_small.json, as described here, using Refs. [7, 8 and 9].
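To check which file you have downloaded before training, a small helper along these lines may be useful. It is a minimal sketch that assumes the dataset file is a JSON array of objects and that it has already been saved locally. It does not assume any particular field names and only reports what it finds; the function name is hypothetical.

```python
import json
from collections import Counter


def summarize_instruction_file(path):
    """Return the record count and per-field counts of a JSON instruction file.

    Assumes the file holds a JSON array of objects (an assumption, not a
    documented schema).
    """
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    # Count how many records contain each field name.
    field_counts = Counter(
        key for rec in records if isinstance(rec, dict) for key in rec
    )
    return {"num_records": len(records), "fields": dict(field_counts)}
```

For example, `summarize_instruction_file("sql_create_dataset_small.json")` should report a record count consistent with the roughly 1.4K instructions mentioned above, if the file is laid out as assumed.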

Compute Infrastructure

1 A100 40GB GPU and 256GB of RAM :)
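A back-of-the-envelope estimate shows why CPU offloading matters on this hardware. The sketch below uses the common rule of thumb for mixed-precision Adam training of 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for the fp32 master weights, momentum and variance) and ignores activations. The 7B and 13B parameter counts are illustrative LLaMA-2 sizes, not a statement of which sizes were trained here.

```python
# Rule-of-thumb bytes per parameter for mixed-precision Adam:
# 2 (half-precision weights) + 2 (gradients) + 12 (fp32 master weights,
# momentum, variance). Activations are not included.
BYTES_PER_PARAM = 2 + 2 + 12

GPU_MEMORY_GB = 40   # A100 40GB
CPU_MEMORY_GB = 256  # host RAM


def training_state_gb(num_params):
    """Approximate size of weights, gradients and optimizer states in GB."""
    return num_params * BYTES_PER_PARAM / 1e9


# Illustrative model sizes (assumption: not confirmed by this card).
for name, num_params in {"7B": 7e9, "13B": 13e9}.items():
    need = training_state_gb(num_params)
    print(
        f"{name}: ~{need:.0f} GB of training state, "
        f"fits on GPU: {need <= GPU_MEMORY_GB}, "
        f"fits in CPU RAM: {need <= CPU_MEMORY_GB}"
    )
```

Under these assumptions the training state exceeds the 40GB of GPU memory by a wide margin but fits within 256GB of host RAM, which is the situation ZeRO-3 offloading is designed for.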

