antonlabate committed on
Commit 2085b8a
1 Parent(s): 06ee52e
Files changed (2):
  1. README.md +253 -0
  2. requirements.txt +15 -0
README.md ADDED
@@ -0,0 +1,253 @@

<p align="left">
<br>
<img src="resdsql.png" width="700"/>
<br>
</p>

# RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL
This is the official implementation of the paper "RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL" (AAAI 2023).

If this repository helps you, please cite the following paper:
```
@inproceedings{li2022resdsql,
  author    = {Haoyang Li and Jing Zhang and Cuiping Li and Hong Chen},
  title     = {RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL},
  booktitle = {AAAI},
  year      = {2023}
}
```

`Update (2023.3.13):` We evaluated our method on a diagnostic evaluation benchmark, [Dr.Spider](https://github.com/awslabs/diagnostic-robustness-text-to-sql), which contains 17 test sets that measure the robustness of Text-to-SQL parsers under different perturbation perspectives.

`Update (2023.5.19):` We added support for [CSpider](https://taolusi.github.io/CSpider-explorer/), a Chinese Text-to-SQL benchmark with Chinese questions, English database schemas, and corresponding SQL queries.

`Update (2023.8.28):` Please check out our recent work [CodeS](https://github.com/RUCKBReasoning/codes), a series of Code LLMs (CodeS-1B, CodeS-3B, CodeS-7B, and CodeS-15B) specifically optimized for SQL generation. You can choose the model that best suits your computational resources and application needs to develop your Text-to-SQL parser!

## Overview
We introduce a new Text-to-SQL parser, **RESDSQL** (**R**anking-enhanced **E**ncoding plus a **S**keleton-aware **D**ecoding framework for Text-to-**SQL**), which attempts to decouple schema linking from skeleton parsing to reduce the difficulty of Text-to-SQL. More details can be found in our [paper](https://arxiv.org/abs/2302.05965). All experiments are conducted on a single NVIDIA A100 80G GPU.

## Evaluation Results
We evaluate RESDSQL on six benchmarks: Spider, Spider-DK, Spider-Syn, Spider-Realistic, Dr.Spider, and CSpider. We adopt two metrics: Exact-set-Match accuracy (EM) and EXecution accuracy (EX). The results are as follows:

**On Spider:**

| Model | Dev EM | Dev EX | Test EM | Test EX |
|-------|--------|--------|---------|---------|
| RESDSQL-3B+NatSQL | **80.5%** | **84.1%** | **72.0%** | **79.9%** |
| RESDSQL-3B | 78.0% | 81.8% | - | - |
| RESDSQL-Large+NatSQL | 76.7% | 81.9% | - | - |
| RESDSQL-Large | 75.8% | 80.1% | - | - |
| RESDSQL-Base+NatSQL | 74.1% | 80.2% | - | - |
| RESDSQL-Base | 71.7% | 77.9% | - | - |

**On Spider-DK, Spider-Syn, and Spider-Realistic:**

| Model | DK EM | DK EX | Syn EM | Syn EX | Realistic EM | Realistic EX |
|-------|-------|-------|--------|--------|--------------|--------------|
| RESDSQL-3B+NatSQL | 53.3% | 66.0% | 69.1% | 76.9% | 77.4% | 81.9% |

**On Dr.Spider's perturbation sets:**
Following Dr.Spider, we only report **EX** for each post-perturbation set and choose PICARD and Codex as our baseline methods.

| Perturbation set | PICARD | Codex | RESDSQL-3B | RESDSQL-3B+NatSQL |
|------------------|--------|-------|------------|-------------------|
| DB-Schema-synonym | 56.5% | 62.0% | 63.3% | **68.3%** |
| DB-Schema-abbreviation | 64.7% | 68.6% | 64.5% | **70.0%** |
| DB-DBcontent-equivalence | 43.7% | **51.6%** | 40.3% | 40.1% |
| NLQ-Keyword-synonym | 66.3% | 55.5% | 67.5% | **72.4%** |
| NLQ-Keyword-carrier | 82.7% | 85.2% | **86.7%** | 83.5% |
| NLQ-Column-synonym | 57.2% | 54.7% | 57.4% | **63.1%** |
| NLQ-Column-carrier | 64.9% | 51.1% | **69.9%** | 63.9% |
| NLQ-Column-attribute | 56.3% | 46.2% | 58.8% | **71.4%** |
| NLQ-Column-value | 69.4% | 71.4% | 73.4% | **76.6%** |
| NLQ-Value-synonym | 53.0% | **59.9%** | 53.8% | 53.2% |
| NLQ-Multitype | 57.1% | 53.7% | 60.1% | **60.7%** |
| NLQ-Others | 78.3% | 69.7% | 77.3% | **79.0%** |
| SQL-Comparison | 68.0% | 66.9% | 70.2% | **82.0%** |
| SQL-Sort-order | 74.5% | 57.8% | 79.7% | **85.4%** |
| SQL-NonDB-number | 77.1% | **89.3%** | 83.2% | 85.5% |
| SQL-DB-text | 65.1% | 72.4% | 67.8% | **74.3%** |
| SQL-DB-number | 85.1% | 79.3% | 85.4% | **88.8%** |
| Average | 65.9% | 64.4% | 68.2% | **71.7%** |

Notice: We also employed the modified test suite script (see this [issue](https://github.com/awslabs/diagnostic-robustness-text-to-sql/issues/1)) to evaluate the model-generated results, but obtained the same numbers as above. Nevertheless, we suggest that further work use the modified script to evaluate Dr.Spider.

**On CSpider's development set:**
| Model | EM | EX |
| ----- | -- | -- |
| RESDSQL-3B+NatSQL | **66.3%** | **81.1%** |
| RESDSQL-Large+NatSQL | 64.3% | 81.1% |
| LGESQL + ELECTRA + QT | 64.5% | - |
| LGESQL + GTL + Electra + QT | 64.0% | - |
| RESDSQL-Base+NatSQL | 61.7% | 78.1% |

## Prerequisites
Create an Anaconda virtual environment:
```sh
conda create -n your_env_name python=3.8.5
```
Activate it and install the CUDA version of PyTorch:
```sh
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
```
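
As an optional sanity check (not part of the original setup steps), you can verify that the installed PyTorch build can see your GPU before continuing:
```sh
# Should print something like "1.11.0 True" when CUDA is set up correctly.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```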

Install other required modules and tools:
```sh
pip install -r requirements.txt
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
python nltk_downloader.py
```
Create several folders:
```sh
mkdir eval_results
mkdir models
mkdir tensorboard_log
mkdir third_party
mkdir predictions
```
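
Equivalently, the folders can be created in one command:
```sh
mkdir -p eval_results models tensorboard_log third_party predictions
```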

Clone evaluation scripts:
```sh
cd third_party
git clone https://github.com/ElementAI/spider.git
git clone https://github.com/ElementAI/test-suite-sql-eval.git
mv ./test-suite-sql-eval ./test_suite
cd ..
```
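
These cloned repositories provide the EM and EX metrics reported above. As a rough sketch of how the test-suite evaluator can be invoked manually (the flags follow the test-suite-sql-eval repository; the gold/prediction/table paths below are illustrative placeholders, not outputs of this project):
```sh
# Illustrative invocation: score predicted SQL against gold SQL.
# --etype all reports both exact-set-match (EM) and execution (EX) accuracy.
python third_party/test_suite/evaluation.py \
    --gold path/to/dev_gold.sql \
    --pred path/to/pred.sql \
    --db ./database \
    --table path/to/tables.json \
    --etype all
```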

## Prepare data
Download [data](https://drive.google.com/file/d/19tsgBGAxpagULSl9r85IFKIZb4kyBGGu/view?usp=sharing) **(including Spider, Spider-DK, Spider-Syn, Spider-Realistic, Dr.Spider, and CSpider)** and [database](https://drive.google.com/file/d/1s4ItreFlTa8rUdzwVRmUR2Q9AHnxbNjo/view?usp=share_link), and then unzip them:
```sh
unzip data.zip
unzip database.zip
```
Notice: Dr.Spider has already been preprocessed following the instructions on its GitHub page.
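
To confirm the archives unpacked where later steps expect them (assuming `data.zip` and `database.zip` extract to `data/` and `database/` at the repository root), a quick check:
```sh
# Both folders should exist at the repo root after unzipping.
ls -d data database
```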

## Inference
All evaluation results can be easily reproduced through our released scripts and checkpoints.
### Step1: Prepare Checkpoints
Because RESDSQL is a two-stage algorithm, you should first download the cross-encoder checkpoints. Here are the links:
| Cross-encoder Checkpoints | Google Drive | Baidu Netdisk |
|----------|-----------|--------------|
| text2natsql_schema_item_classifier | [Link](https://drive.google.com/file/d/1UWNj1ZADfKa1G5I4gBYCJeEQO6piMg4G/view?usp=share_link) | [Link](https://pan.baidu.com/s/15eGPMSx7K8oLV4hkjnCzaw) (pwd: 18w8) |
| text2sql_schema_item_classifier | [Link](https://drive.google.com/file/d/1zHAhECq1uGPR9Rt1EDsTai1LbRx0jYIo/view?usp=share_link) | [Link](https://pan.baidu.com/s/1trSi8OBOcPo5NkZb_o-T4g) (pwd: dr62) |
| xlm_roberta_text2natsql_schema_item_classifier (trained on CSpider) | - | [Link](https://pan.baidu.com/s/1oTkuqoU-RWr3QsNC9Y-RpA) (pwd: 3sdu) |

Then, download the T5 (for Spider) or mT5 (for CSpider) checkpoints:
| T5/mT5 Checkpoints | Google Drive/OneDrive | Baidu Netdisk |
|-------|-------|-------|
| text2natsql-t5-3b | [OneDrive link](https://1drv.ms/u/s!Ak05bBUBFYiktcdziiE79xaeKtO6qg?e=e9424n) | [Link](https://pan.baidu.com/s/1ReHso0QgX5aT-hGySUnlrQ) (pwd: 4r98) |
| text2sql-t5-3b | [Google Drive link](https://drive.google.com/file/d/1M-zVeB6TKrvcIzaH8vHBIKeWqPn95i11/view?usp=sharing) | [Link](https://pan.baidu.com/s/1mZxakfes4wRSEwnRW43i5A) (pwd: sc62) |
| text2natsql-t5-large | [Google Drive link](https://drive.google.com/file/d/1ZwFsH24_qKC3xwYdedPi6T_8argguWHe/view?usp=sharing) | [Link](https://pan.baidu.com/s/18H8lgnv9gfXmUo_oO_CdOA) (pwd: 7iyq) |
| text2sql-t5-large | [Google Drive link](https://drive.google.com/file/d/1-xwtKwfJZSrmJrU-_Xdkx1kPuZao7r7e/view?usp=sharing) | [Link](https://pan.baidu.com/s/1Mwg0OZZ48APEq9jPvQQNtw) (pwd: q58k) |
| text2natsql-t5-base | [Google Drive link](https://drive.google.com/file/d/1QyfSfHHrxfIM5X9gKUYNr_0ZRVvb1suV/view?usp=share_link) | [Link](https://pan.baidu.com/s/1XegaZFvXuZ_jf3P-9YPQCQ) (pwd: pyxf) |
| text2sql-t5-base | [Google Drive link](https://drive.google.com/file/d/1lqZ81f_fSZtg6BRcRw1-Ol-RJCcKRsmH/view?usp=sharing) | [Link](https://pan.baidu.com/s/1-6H7zStq0WCJHTjDuVspoQ) (pwd: wuek) |
| text2natsql-mt5-xl-cspider (trained on CSpider) | - | [Link](https://pan.baidu.com/s/1tFkGOiw5ETB83-Ct3MuVXA) (pwd: y7ei) |
| text2natsql-mt5-large-cspider (trained on CSpider) | - | [Link](https://pan.baidu.com/s/1LUjL-2nwNfUJhzI3cm7aEQ) (pwd: ydqk) |
| text2natsql-mt5-base-cspider (trained on CSpider) | - | [Link](https://pan.baidu.com/s/1tbEUIBPUTA2Oz7K2lHT9oA) (pwd: d8b8) |

The checkpoints should be placed in the `models` folder.

For CSpider, we only provide the NatSQL version because it performed better than the SQL version in our pre-experiments. To support CSpider, we replace RoBERTa-large with XLM-RoBERTa-large in the first stage and T5 with mT5 in the second stage.

### Step2: Run Inference
The inference scripts are located in `scripts/inference`.
Concretely, `infer_text2natsql.sh` is the inference script for RESDSQL-{Base, Large, 3B}+NatSQL, and `infer_text2sql.sh` is the inference script for RESDSQL-{Base, Large, 3B}. For example, you can run inference with RESDSQL-3B+NatSQL on Spider's dev set via:
```sh
sh scripts/inference/infer_text2natsql.sh 3b spider
```
The first argument (model scale) can be selected from `[base, large, 3b]` and the second argument (dataset name) can be selected from `[spider, spider-realistic, spider-syn, spider-dk, DB_schema_synonym, DB_schema_abbreviation, DB_DBcontent_equivalence, NLQ_keyword_synonym, NLQ_keyword_carrier, NLQ_column_synonym, NLQ_column_carrier, NLQ_column_attribute, NLQ_column_value, NLQ_value_synonym, NLQ_multitype, NLQ_others, SQL_comparison, SQL_sort_order, SQL_NonDB_number, SQL_DB_text, SQL_DB_number]`.

The predicted SQL queries are recorded in `predictions/{dataset_name}/{model_name}/pred.sql`.
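
If you want to spot-check a prediction by hand, the file typically contains one plain SQL query per line, which can be run directly against the matching SQLite database. A minimal sketch, assuming the database layout from the "Prepare data" step (`<model_name>` is a placeholder, and `concert_singer` is just one of Spider's databases):
```sh
# Inspect the first predicted query...
head -n 1 predictions/spider/<model_name>/pred.sql
# ...and execute it against the database of the corresponding question.
sqlite3 database/concert_singer/concert_singer.sqlite \
    "$(head -n 1 predictions/spider/<model_name>/pred.sql)"
```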

**Inference on CSpider's Dev Set (New Feature)**
We also provide inference scripts to run RESDSQL-{Base, Large, 3B}+NatSQL on CSpider's development set. Here is an example:
```sh
sh scripts/inference/infer_text2natsql_cspider.sh 3b
```
The first argument (model scale) can be selected from `[base, large, 3b]`.

## Training on Spider
We provide scripts in `scripts/train/text2natsql` and `scripts/train/text2sql` to train RESDSQL on Spider's training set and evaluate it on Spider's dev set.

**RESDSQL-{Base, Large, 3B}+NatSQL**
```sh
# Step1: preprocess dataset
sh scripts/train/text2natsql/preprocess.sh
# Step2: train cross-encoder
sh scripts/train/text2natsql/train_text2natsql_schema_item_classifier.sh
# Step3: prepare text-to-natsql training and development sets for T5
sh scripts/train/text2natsql/generate_text2natsql_dataset.sh
# Step4: fine-tune T5-3B (RESDSQL-3B+NatSQL)
sh scripts/train/text2natsql/train_text2natsql_t5_3b.sh
# Step4: (or) fine-tune T5-Large (RESDSQL-Large+NatSQL)
sh scripts/train/text2natsql/train_text2natsql_t5_large.sh
# Step4: (or) fine-tune T5-Base (RESDSQL-Base+NatSQL)
sh scripts/train/text2natsql/train_text2natsql_t5_base.sh
```

**RESDSQL-{Base, Large, 3B}**
```sh
# Step1: preprocess dataset
sh scripts/train/text2sql/preprocess.sh
# Step2: train cross-encoder
sh scripts/train/text2sql/train_text2sql_schema_item_classifier.sh
# Step3: prepare text-to-sql training and development sets for T5
sh scripts/train/text2sql/generate_text2sql_dataset.sh
# Step4: fine-tune T5-3B (RESDSQL-3B)
sh scripts/train/text2sql/train_text2sql_t5_3b.sh
# Step4: (or) fine-tune T5-Large (RESDSQL-Large)
sh scripts/train/text2sql/train_text2sql_t5_large.sh
# Step4: (or) fine-tune T5-Base (RESDSQL-Base)
sh scripts/train/text2sql/train_text2sql_t5_base.sh
```

**During training, the cross-encoder (i.e., the first stage) always keeps only the best checkpoint, but T5 (i.e., the second stage) keeps all intermediate checkpoints, because different test sets may achieve their best Text-to-SQL performance at different checkpoints.** Therefore, given a test set, we need to evaluate all the intermediate checkpoints and compare their performance to find the best one. The evaluation results of the checkpoints are saved in `eval_results`.
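
A minimal sketch for skimming those results, assuming `eval_results` ends up holding one plain-text result file per evaluated checkpoint (the exact layout depends on the evaluation scripts):
```sh
# Print every recorded evaluation result, newest first, so the
# best-performing checkpoint can be picked out by eye.
for f in $(ls -t eval_results); do
    echo "== eval_results/$f =="
    cat "eval_results/$f"
done
```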

Our paper also reports the performance of RESDSQL-3B+NatSQL (the most powerful version of RESDSQL) on Spider-DK, Spider-Syn, and Spider-Realistic. To obtain results on these datasets, we provide evaluation scripts in `scripts/evaluate_robustness`. Here is an example for Spider-DK:
```sh
# Step1: preprocess Spider-DK
sh scripts/evaluate_robustness/preprocess_spider_dk.sh
# Step2: run evaluation on Spider-DK
sh scripts/evaluate_robustness/evaluate_on_spider_dk.sh
```

## Training on CSpider
We additionally provide scripts in `scripts/train/cspider_text2natsql` and `scripts/train/cspider_text2sql` to train RESDSQL on CSpider's training set and evaluate it on CSpider's dev set.

**RESDSQL-{Base, Large, 3B}+NatSQL (CSpider version)**
```sh
# Step1: preprocess CSpider
sh scripts/train/cspider_text2natsql/preprocess.sh
# Step2: train cross-encoder
sh scripts/train/cspider_text2natsql/train_text2natsql_schema_item_classifier.sh
# Step3: prepare text-to-natsql training and development sets for mT5
sh scripts/train/cspider_text2natsql/generate_text2natsql_dataset.sh
# Step4: fine-tune mT5-XL (RESDSQL-3B+NatSQL)
sh scripts/train/cspider_text2natsql/train_text2natsql_mt5_xl.sh
# Step4: (or) fine-tune mT5-Large (RESDSQL-Large+NatSQL)
sh scripts/train/cspider_text2natsql/train_text2natsql_mt5_large.sh
# Step4: (or) fine-tune mT5-Base (RESDSQL-Base+NatSQL)
sh scripts/train/cspider_text2natsql/train_text2natsql_mt5_base.sh
```
232
+
233
+ In order to train the NatSQL version on CSpider, we manually aligned and modified annotations of NatSQL. The aligned files are also released, see `NatSQL/NatSQLv1_6/train_cspider-natsql.json` and `NatSQL/NatSQLv1_6/dev_cspider-natsql.json`.
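
The aligned annotations are ordinary JSON files, so they can be inspected with the Python standard library alone, for example:
```sh
# Pretty-print the beginning of the aligned dev annotations.
python -m json.tool NatSQL/NatSQLv1_6/dev_cspider-natsql.json | head -n 40
```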
234
+
235
+ **RESDSQL-{Base, Large, 3B} (CSpider version)**
236
+ ```sh
237
+ # Step1: preprocess CSpider
238
+ sh scripts/train/cspider_text2sql/preprocess.sh
239
+ # Step2: train cross-encoder
240
+ sh scripts/train/cspider_text2sql/train_text2sql_schema_item_classifier.sh
241
+ # Step3: prepare text-to-sql training and development set for mT5
242
+ sh scripts/train/cspider_text2sql/generate_text2sql_dataset.sh
243
+ # Step4: fine-tune mT5-XL (RESDSQL-3B)
244
+ sh scripts/train/cspider_text2sql/train_text2sql_mt5_xl.sh
245
+ # Step4: (or) fine-tune mT5-Large (RESDSQL-Large)
246
+ sh scripts/train/cspider_text2sql/train_text2sql_mt5_large.sh
247
+ # Step4: (or) fine-tune mT5-Base (RESDSQL-Base)
248
+ sh scripts/train/cspider_text2sql/train_text2sql_mt5_base.sh
249
+ ```
250
+
251
+
252
+ ## Acknowledgements
253
+ We would thanks to Hongjin Su and Tao Yu for their help in evaluating our method on Spider's test set. We would also thanks to PICARD ([paper](https://arxiv.org/abs/2109.05093), [code](https://github.com/ServiceNow/picard)), NatSQL ([paper](https://arxiv.org/abs/2109.05153), [code](https://github.com/ygan/NatSQL)), Spider ([paper](https://arxiv.org/abs/1809.08887), [dataset](https://yale-lily.github.io/spider)), Spider-DK ([paper](https://arxiv.org/abs/2109.05157), [dataset](https://github.com/ygan/Spider-DK)), Spider-Syn ([paper](https://arxiv.org/abs/2106.01065), [dataset](https://github.com/ygan/Spider-Syn)), Spider-Realistic ([paper](https://arxiv.org/abs/2010.12773), [dataset](https://doi.org/10.5281/zenodo.5205322)), Dr.Spider ([paper](https://openreview.net/pdf?id=Wc5bmZZU9cy), [dataset](https://github.com/awslabs/diagnostic-robustness-text-to-sql)), and CSpider ([paper](https://arxiv.org/abs/1909.13293), [dataset](https://taolusi.github.io/CSpider-explorer/)) for their interesting work and open-sourced code and dataset.
requirements.txt ADDED
@@ -0,0 +1,15 @@

editdistance==0.6.2
protobuf==3.19.0
func_timeout==4.3.5
nltk==3.7
numpy==1.22.3
rapidfuzz==2.0.11
scikit_learn==1.2.1
spacy==2.2.3
sql_metadata==2.6.0
sqlparse==0.4.2
tokenizers==0.11.6
tqdm==4.63.0
transformers==4.28.1
tensorboard==2.8.0
sentencepiece==0.1.99