DominikLindorfer committed f4f97ff (1 parent: 4115c22)

Update README.md

Files changed (1): README.md +60 -1

@@ -3,4 +3,63 @@ datasets:
- DominikLindorfer/SQL-LLaMA
language:
- en
---

# Model Card for SQL-LLaMA 2

SQL-LLaMA is a text-to-SQL model based on Llama 2 [Ref. 1] for instruction-based generation of SQL code from natural-language queries.
The corresponding code, training details and a statistical analysis of the dataset can be [found on GitHub](https://github.com/DominikLindorfer/SQL-LLaMA).

## Model Details

Instruction fine-tuning follows the method proposed in Ref. [5]. The "-small" models follow the ideas of the LIMA paper [Ref. 6] and show strong performance despite being trained on a dataset of only 1.4K SQL instructions (see [the repository](https://github.com/DominikLindorfer/SQL-LLaMA) for details).
Notably, the models were trained on a single A100 40GB GPU with 256GB of RAM, using DeepSpeed ZeRO-3 offloading [Refs. 2, 3 & 4].

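ZeRO-3 partitions parameters, gradients and optimizer states, and its offload options move them into CPU memory [Refs. 2-4]. This is what lets a single 40GB GPU participate in training. The sketch below writes a minimal DeepSpeed config with CPU offloading for both optimizer states and parameters. It is a hypothetical example, not the project's actual configuration. The file name `ds_config_zero3_offload.json` and all values are assumptions; the settings actually used are in the GitHub repository.

```python
import json

# HYPOTHETICAL DeepSpeed config: ZeRO stage 3 with CPU offloading.
# Values are illustrative assumptions, not SQL-LLaMA's actual settings.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Keep optimizer states (e.g. Adam moments) in pinned CPU RAM.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Keep partitioned model parameters in pinned CPU RAM.
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        # Gather the partitioned 16-bit weights into one state dict on save.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    # "auto" lets the Hugging Face Trainer integration fill these in.
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Write the config to disk so it can be passed to a training script.
with open("ds_config_zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

With the Hugging Face Trainer, such a file is passed via `TrainingArguments(deepspeed="ds_config_zero3_offload.json", ...)`. Offloading trades GPU memory for PCIe traffic, so each step is slower than a pure-GPU setup, but the model states no longer need to fit into 40GB.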
### Model Sources

- **Repository:** https://github.com/DominikLindorfer/SQL-LLaMA
- **Finetuned from:** https://huggingface.co/meta-llama
- **Paper:** [More Information Needed]

## How to Get Started with the Model

Please use the code provided in the [GitHub repository](https://github.com/DominikLindorfer/SQL-LLaMA) to get started with the model.

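For orientation only, here is a minimal inference sketch using Hugging Face `transformers`. It rests on several assumptions:

- The checkpoint loads with `AutoModelForCausalLM`.
- `MODEL_ID` is a hypothetical placeholder, not a published model ID.
- Because fine-tuning follows Stanford Alpaca [Ref. 5], the prompt uses the Alpaca "instruction + input" template, with the question as instruction and the `CREATE TABLE` schema as input.

Check the repository for the exact prompt format the models were trained with.

```python
# Minimal inference sketch. ASSUMPTIONS: MODEL_ID is a hypothetical placeholder,
# and the Alpaca-style prompt below may differ from the format SQL-LLaMA uses.
MODEL_ID = "path/to/sql-llama-checkpoint"  # hypothetical placeholder

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)


def build_prompt(question: str, schema: str) -> str:
    """Fill the Alpaca-style template with a question and a table schema."""
    return ALPACA_TEMPLATE.format(instruction=question, input=schema)


def generate_sql(question: str, schema: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode SQL; needs `transformers`, `torch` and `accelerate`."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(question, schema), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


if __name__ == "__main__":
    # Only builds the prompt; call generate_sql() once MODEL_ID points to a real checkpoint.
    print(build_prompt("How many heads of the departments are older than 56 ?",
                       "CREATE TABLE head (age INTEGER)"))
```

Passing the schema as the input, rather than only the question, gives the model the table and column names it needs to produce a query that matches the target database.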
## Training Data

The curated training data is available on Hugging Face Datasets: https://huggingface.co/datasets/DominikLindorfer/SQL-LLaMA

Please note that the respective models were trained on either *sql_create_dataset_cleaned.json* or *sql_create_dataset_small.json*, as described [in the repository's data-release section](https://github.com/DominikLindorfer/SQL-LLaMA#data-release), using Refs. [7, 8 & 9].

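To illustrate the kind of data involved: b-mc2's sql-create-context [Ref. 8] stores each example as a `question`, a `context` (the `CREATE TABLE` statements) and an `answer` (the SQL query). The sketch below maps such rows onto Alpaca-style `instruction` / `input` / `output` records [Ref. 5]. This field mapping is an assumption for illustration; the actual layout of *sql_create_dataset_cleaned.json* and *sql_create_dataset_small.json* may differ. The example row is illustrative.

```python
import json
from typing import Dict, List


def to_instruction_record(row: Dict[str, str]) -> Dict[str, str]:
    """ASSUMED mapping of a sql-create-context row onto Alpaca-style fields."""
    return {
        "instruction": row["question"].strip(),  # natural-language question
        "input": row["context"].strip(),         # schema as CREATE TABLE statements
        "output": row["answer"].strip(),         # target SQL query
    }


def convert(rows: List[Dict[str, str]]) -> str:
    """Convert all rows and serialize them as a JSON array."""
    return json.dumps([to_instruction_record(r) for r in rows], indent=2)


# Illustrative row in the sql-create-context shape.
example = {
    "question": "How many heads of the departments are older than 56 ?",
    "context": "CREATE TABLE head (age INTEGER)",
    "answer": "SELECT COUNT(*) FROM head WHERE age > 56",
}
print(convert([example]))
```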
## Compute Infrastructure

1 A100 40GB GPU and 256GB of RAM :)

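A back-of-the-envelope estimate shows why offloading is needed on this hardware. The ZeRO paper [Ref. 4] counts 16 bytes of model state per parameter for mixed-precision Adam training:

- 2 bytes of 16-bit parameters
- 2 bytes of 16-bit gradients
- 12 bytes of fp32 optimizer state (master weights, momentum and variance)

The card does not state which Llama 2 size was fine-tuned, so the sketch assumes a nominal 7B-parameter model purely for illustration. It also ignores activations and temporary buffers.

```python
# Model-state memory for mixed-precision Adam, using the ZeRO paper's
# 2 + 2 + 12 = 16 bytes/parameter accounting.
# ASSUMPTION: a nominal 7B-parameter model; activations/buffers are ignored.
BYTES_16BIT_PARAMS = 2
BYTES_16BIT_GRADS = 2
BYTES_FP32_ADAM_STATES = 12  # fp32 master weights + momentum + variance
GB = 10**9


def model_state_bytes(num_params: int) -> int:
    """Bytes needed for parameters, gradients and optimizer states."""
    per_param = BYTES_16BIT_PARAMS + BYTES_16BIT_GRADS + BYTES_FP32_ADAM_STATES
    return num_params * per_param


params = 7_000_000_000  # assumed model size
total_gb = model_state_bytes(params) / GB
print(f"model states: {total_gb:.0f} GB vs. 40 GB of GPU memory and 256 GB of CPU RAM")
```

Under these assumptions the model states need about 112 GB. That is far more than the 40 GB of GPU memory but well within 256 GB of CPU RAM, which is the gap that ZeRO-3 CPU offloading bridges.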
## Citation

[1]: Llama 2: Open Foundation and Fine-Tuned Chat Models. Hugo Touvron et al. https://arxiv.org/abs/2307.09288

[2]: ZeRO-Offload: Democratizing Billion-Scale Model Training. Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He. https://arxiv.org/abs/2101.06840

[3]: ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He. https://arxiv.org/abs/2104.07857

[4]: ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He. https://arxiv.org/abs/1910.02054

[5]: Stanford Alpaca: An Instruction-following LLaMA model. Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto. https://github.com/tatsu-lab/stanford_alpaca

[6]: LIMA: Less Is More for Alignment. Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy. https://arxiv.org/abs/2305.11206

[7]: Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, Dragomir Radev. https://arxiv.org/abs/1809.08887

[8]: b-mc2's sql-create-context dataset on Hugging Face. https://huggingface.co/datasets/b-mc2/sql-create-context

[9]: SQLGlot by Toby Mao: a no-dependency SQL parser, transpiler, optimizer, and engine. https://github.com/tobymao/sqlglot
