Commit ed8bed3 by rohitsroch
1 Parent(s): 790f0b6

Update README.md

README.md CHANGED
@@ -7,12 +7,14 @@ tags:
 - Text2SQL
 datasets:
 - wikisql
 ---
 
 ## Paper
 
 ## [NatSight: A framework for building domain agnostic Natural Language Interface to Databases for next-gen Augmented Analytics](https://dcal.iimb.ac.in/baiconf2022/full_papers/2346.pdf)
-
 
 ## Abstract
 
@@ -29,8 +31,35 @@ Experiment results on benchmark datasets show that our approach achieves a state
 
 ## NatSight-t5-small-wikisql
 
-For weights initialization, we used [t5-small](https://huggingface.co/t5-small)
 
 
 ## Intended uses & limitations
 
 - Text2SQL
 datasets:
 - wikisql
+widget:
+- text: "translate English to Sql: What was the number of race that Kevin Curtain won? </s> c0 | number <eom> v4 | Kevin Curtain </s> c0 | No <eom> c1 | Date <eom> c2 | Round <eom> c3 | Circuit <eom> c4 | Pole_Position <eom> c5 | Fastest_Lap <eom> c6 | Race_winner <eom> c7 | Report"
 ---
 
 ## Paper
 
 ## [NatSight: A framework for building domain agnostic Natural Language Interface to Databases for next-gen Augmented Analytics](https://dcal.iimb.ac.in/baiconf2022/full_papers/2346.pdf)
+Authors: *Rohit Sroch*, *Dhiraj Patnaik*, *Jayachandran Ramachandran*
 
 ## Abstract
 
 
 ## NatSight-t5-small-wikisql
 
+For weight initialization, we used [t5-small](https://huggingface.co/t5-small) and fine-tuned it as a sequence-to-sequence task.
 
+## Using Transformers🤗
+
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+tokenizer = AutoTokenizer.from_pretrained("course5i/NatSight-t5-small-wikisql")
+model = AutoModelForSeq2SeqLM.from_pretrained("course5i/NatSight-t5-small-wikisql")
+
+# define input
+prefix = "translate English to Sql: "
+raw_nat_query = "What was the number of race that Kevin Curtain won?"
+query_mention_schema = "c0 | number <eom> v4 | Kevin Curtain"
+table_header_schema = "c0 | No <eom> c1 | Date <eom> c2 | Round <eom> c3 | Circuit <eom> c4 | Pole_Position <eom> c5 | Fastest_Lap <eom> c6 | Race_winner <eom> c7 | Report"
+
+encoder_input = prefix + raw_nat_query + " </s> " + query_mention_schema + " </s> " + table_header_schema
+input_ids = tokenizer.encode(encoder_input, return_tensors="pt", add_special_tokens=True)
+
+generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_length=128)
+preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
+output = preds[0]
+
+print("Output generic SQL query: {}".format(output))
+
+# output
+"SELECT COUNT(c0) FROM TABLE WHERE c4 = v4"
+
+```
 
 ## Intended uses & limitations
 
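The README snippet added in this commit concatenates the encoder input by hand and prints a generic SQL string that still contains placeholders (`c0`, `v4`, `TABLE`). The sketch below shows one way the same input string could be assembled from a list of column names, and how the placeholders might be mapped back to concrete names. The helper names (`format_schema`, `build_encoder_input`, `fill_placeholders`), the table name `races`, and the substitution step itself are assumptions made for illustration; the commit does not show how NatSight detects mentions or post-processes the generated query.

```python
import re

PREFIX = "translate English to Sql: "  # task prefix used in the README example


def format_schema(pairs):
    """Join (placeholder, text) pairs as 'c0 | No <eom> c1 | Date'."""
    return " <eom> ".join(f"{key} | {text}" for key, text in pairs)


def build_encoder_input(question, mentions, columns):
    """Assemble prefix + question, mention schema, and header schema,
    separated by ' </s> ' as in the README example."""
    header_pairs = [(f"c{i}", name) for i, name in enumerate(columns)]
    parts = [PREFIX + question, format_schema(mentions), format_schema(header_pairs)]
    return " </s> ".join(parts)


def fill_placeholders(generic_sql, columns, mentions, table_name):
    """Hypothetical post-processing: replace cN with column names, vN with
    quoted mention values, and TABLE with a concrete table name."""
    values = {key: text for key, text in mentions if key.startswith("v")}

    def substitute(match):
        token = match.group(0)
        if token == "TABLE":
            return table_name
        if token.startswith("c"):
            return columns[int(token[1:])]
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + values[token].replace("'", "''") + "'"

    # Single pass, so substituted text is never re-scanned for placeholders.
    return re.sub(r"\b(?:TABLE|[cv]\d+)\b", substitute, generic_sql)


# Values copied from the README example; "races" is an assumed table name.
columns = ["No", "Date", "Round", "Circuit", "Pole_Position",
           "Fastest_Lap", "Race_winner", "Report"]
mentions = [("c0", "number"), ("v4", "Kevin Curtain")]
question = "What was the number of race that Kevin Curtain won?"

encoder_input = build_encoder_input(question, mentions, columns)
sql = fill_placeholders("SELECT COUNT(c0) FROM TABLE WHERE c4 = v4",
                        columns, mentions, "races")
```

Here `encoder_input` is the same string the README builds by concatenation, so it can be passed to `tokenizer.encode` unchanged. The mentions are supplied by hand because mention detection is outside what the commit documents.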