tscholak committed
Commit a2d3ae3
1 parent: e1673e3

Create README.md

  1. README.md +75 -0
README.md ADDED
@@ -0,0 +1,75 @@
+ ---
+ language:
+ - en
+ thumbnail: "https://repository-images.githubusercontent.com/401779782/c2f46be5-b74b-4620-ad64-57487be3b1ab"
+ tags:
+ - text-to-SQL
+ license: "Apache 2.0"
+ datasets:
+ - spider
+ metrics:
+ - spider
+ ---
+
+ ## tscholak/3vnuv1vf
+
+ Fine-tuned weights for [PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models](https://arxiv.org/abs/2109.05093) based on [t5.1.1.lm100k.large](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k).
+
+
+ ### Training Data
+
+ The model was fine-tuned on the 7000 training examples in the [Spider text-to-SQL dataset](https://yale-lily.github.io/spider). It targets Spider's zero-shot text-to-SQL translation task, in which it must generalize to SQL databases not seen during training.
+
+
+ ### Training Objective
+
+ This model was initialized with [t5.1.1.lm100k.large](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) and fine-tuned with the text-to-text generation objective.
+
+ Questions are always grounded in a database schema, and the model is trained to predict the SQL query that would be used to answer the question. The input to the model is composed of the user's natural language question, the database identifier, and a list of tables and their columns:
+
+ ```
+ [question] | [db_id] | [table] : [column] ( [content] , [content] ) , [column] ( ... ) , [...] | [table] : ... | ...
+ ```
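As a rough illustration of the input format above, the following Python sketch assembles such a string from a question, a database identifier, and a schema. The function name `serialize_input`, the `Schema` dictionary layout, and the content value `France` are hypothetical and chosen only for this example. The preprocessing code in the official PICARD repository may differ in details such as text normalization and how content values are selected.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical schema layout (an assumption for this sketch):
# table name -> list of (column name, content values attached to that column).
Schema = Dict[str, List[Tuple[str, Sequence[str]]]]


def serialize_input(question: str, db_id: str, schema: Schema) -> str:
    """Build `[question] | [db_id] | [table] : [column] ( [content] ) , ... | ...`."""
    table_parts = []
    for table, columns in schema.items():
        column_parts = []
        for column, contents in columns:
            if contents:
                # Content values follow the column name in parentheses.
                column_parts.append(f"{column} ( {' , '.join(contents)} )")
            else:
                column_parts.append(column)
        table_parts.append(f"{table} : {' , '.join(column_parts)}")
    # Question, database identifier, and tables are separated by " | ".
    return " | ".join([question, db_id, *table_parts])


example = serialize_input(
    "How many singers do we have?",
    "concert_singer",
    {
        # "France" is a made-up content value, used only to show the syntax.
        "singer": [("singer_id", []), ("name", []), ("country", ["France"])],
        "concert": [("concert_id", []), ("year", [])],
    },
)
print(example)
```

Columns without content values are emitted as bare names, so a schema-only input reduces to a plain list of tables and columns.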
+
+ The model outputs the database identifier and the SQL query that will be executed on the database to answer the user's question:
+
+ ```
+ [db_id] | [sql]
+ ```
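A caller typically needs to separate the two parts of this output before running the query. The sketch below shows one way to do that in Python. The helper name `split_prediction` and the sample prediction string are assumptions for illustration, not part of the official code and not a recorded model output.

```python
from typing import Tuple


def split_prediction(prediction: str) -> Tuple[str, str]:
    """Split a `[db_id] | [sql]` string into (db_id, sql)."""
    # Split only on the first separator, because the SQL itself may contain
    # `|` (for example the `||` string concatenation operator).
    db_id, separator, sql = prediction.partition("|")
    if not separator:
        raise ValueError(f"expected '[db_id] | [sql]', got: {prediction!r}")
    return db_id.strip(), sql.strip()


# Illustrative string in the documented output format.
db_id, sql = split_prediction("concert_singer | select count(*) from singer")
print(db_id)
print(sql)
```

Splitting on the first `|` only is a design choice of this sketch: it keeps any later `|` characters inside the SQL intact.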
+
+ Click [here](https://huggingface.co/tscholak/3vnuv1vf?text=How+many+singers+do+we+have%3F+%7C+concert_singer+%7C+stadium+%3A+stadium_id%2C+location%2C+name%2C+capacity%2C+highest%2C+lowest%2C+average+%7C+singer+%3A+singer_id%2C+name%2C+country%2C+song_name%2C+song_release_year%2C+age%2C+is_male+%7C+concert+%3A+concert_id%2C+concert_name%2C+theme%2C+stadium_id%2C+year+%7C+singer_in_concert+%3A+concert_id%2C+singer_id) for an example of a query that the model can generate.
+
+
+ ### Performance
+
+ Out of the box, this model achieves 71.2 % exact-set match accuracy and 74.4 % execution accuracy on the Spider development set.
+
+ Using the PICARD constrained decoding method (see [the official PICARD implementation](https://github.com/ElementAI/picard)), the model's performance can be improved to **74.8 %** exact-set match accuracy and **79.2 %** execution accuracy on the Spider development set.
+
+
+ ### Usage
+
+ Please see [the official repository](https://github.com/ElementAI/picard) for scripts and docker images that support evaluation and serving of this model.
+
+
+ ### References
+
+ 1. [PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models](https://arxiv.org/abs/2109.05093)
+
+ 2. [Official PICARD code](https://github.com/ElementAI/picard)
+
+
+ ### Citation
+
+ ```bibtex
+ @inproceedings{Scholak2021:PICARD,
+   author = {Torsten Scholak and Nathan Schucher and Dzmitry Bahdanau},
+   title = "{PICARD}: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models",
+   booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
+   month = nov,
+   year = "2021",
+   publisher = "Association for Computational Linguistics",
+   url = "https://aclanthology.org/2021.emnlp-main.779",
+   pages = "9895--9901",
+ }
+ ```