rohitsroch committed on
Commit
ed8bed3
1 Parent(s): 790f0b6

Update README.md

  1. README.md +31 -2
README.md CHANGED
@@ -7,12 +7,14 @@ tags:
7
  - Text2SQL
8
  datasets:
9
  - wikisql
 
 
10
  ---
11
 
12
  ## Paper
13
 
14
  ## [NatSight: A framework for building domain agnostic Natural Language Interface to Databases for next-gen Augmented Analytics](https://dcal.iimb.ac.in/baiconf2022/full_papers/2346.pdf)
15
- Aurthors: *Rohit Sroch*, *Dhiraj Patnaik*, *Jayachandran Ramachandran*
16
 
17
  ## Abstract
18
 
@@ -29,8 +31,35 @@ Experiment results on benchmark datasets show that our approach achieves a state
29
 
30
  ## NatSight-t5-small-wikisql
31
 
32
- For weights initialization, we used [t5-small](https://huggingface.co/t5-small)
33
 
34
 
35
  ## Intended uses & limitations
36
 
 
7
  - Text2SQL
8
  datasets:
9
  - wikisql
10
+ widget:
11
+ - text: "translate English to Sql: What was the number of race that Kevin Curtain won? </s> c0 | number <eom> v4 | Kevin Curtain </s> c0 | No <eom> c1 | Date <eom> c2 | Round <eom> c3 | Circuit <eom> c4 | Pole_Position <eom> c5 | Fastest_Lap <eom> c6 | Race_winner <eom> c7 | Report"
12
  ---
13
 
14
  ## Paper
15
 
16
  ## [NatSight: A framework for building domain agnostic Natural Language Interface to Databases for next-gen Augmented Analytics](https://dcal.iimb.ac.in/baiconf2022/full_papers/2346.pdf)
17
+ Authors: *Rohit Sroch*, *Dhiraj Patnaik*, *Jayachandran Ramachandran*
18
 
19
  ## Abstract
20
 
 
31
 
32
  ## NatSight-t5-small-wikisql
33
 
34
+ For weight initialization, we used [t5-small](https://huggingface.co/t5-small) and fine-tuned it as a sequence-to-sequence task.
35
 
36
+ ## Using Transformers🤗
37
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ tokenizer = AutoTokenizer.from_pretrained("course5i/NatSight-t5-small-wikisql")
+ model = AutoModelForSeq2SeqLM.from_pretrained("course5i/NatSight-t5-small-wikisql")
+
+ # define input
+ prefix = "translate English to Sql: "
+ raw_nat_query = "What was the number of race that Kevin Curtain won?"
+ query_mention_schema = "c0 | number <eom> v4 | Kevin Curtain"
+ table_header_schema = "c0 | No <eom> c1 | Date <eom> c2 | Round <eom> c3 | Circuit <eom> c4 | Pole_Position <eom> c5 | Fastest_Lap <eom> c6 | Race_winner <eom> c7 | Report"
+
+ encoder_input = prefix + raw_nat_query + " </s> " + query_mention_schema + " </s> " + table_header_schema
+ input_ids = tokenizer.encode(encoder_input, return_tensors="pt", add_special_tokens=True)
+
+ generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_length=128)
+ preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
+ output = preds[0]
+
+ print("Output generic SQL query: {}".format(output))
+
+ # output
+ "SELECT COUNT(c0) FROM TABLE WHERE c4 = v4"
+
+ ```
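
The encoder input in the added snippet follows a fixed layout: the task prefix, the question, the query mention schema, and the table header schema, with the three parts joined by ` </s> ` and the entries inside each schema joined by ` <eom> `. The following is a minimal sketch of two helpers built around that layout. The names `build_encoder_input` and `fill_placeholders` are hypothetical and not part of the model card. The second helper assumes that `cN` refers to the N-th table column and that `vN` refers to the value listed under that tag in the mention schema; this mapping is inferred from the single example in the README and may not match the authors' own post-processing.

```python
import re

PREFIX = "translate English to Sql: "  # task prefix used in the README example
SEP = " </s> "                          # separates question, mentions, headers
EOM = " <eom> "                         # separates entries inside a schema


def build_encoder_input(question, mentions, headers):
    """Assemble the encoder input string in the layout shown in the README.

    mentions: list of (tag, text) pairs, e.g. [("c0", "number")]
    headers:  list of column names, in table order
    """
    mention_schema = EOM.join(f"{tag} | {text}" for tag, text in mentions)
    header_schema = EOM.join(f"c{i} | {name}" for i, name in enumerate(headers))
    return PREFIX + question + SEP + mention_schema + SEP + header_schema


def fill_placeholders(generic_sql, mentions, headers):
    """Replace cN / vN tags in a generic SQL string (assumed mapping, see above)."""
    values = {tag: text for tag, text in mentions if tag.startswith("v")}

    def repl(match):
        tag = match.group(0)
        if tag[0] == "c":
            idx = int(tag[1:])
            # Leave the tag untouched if it does not point at a known column.
            return headers[idx] if idx < len(headers) else tag
        # Quote values as SQL string literals; unknown tags are left as-is.
        return "'" + values[tag] + "'" if tag in values else tag

    return re.sub(r"\b[cv]\d+\b", repl, generic_sql)


headers = ["No", "Date", "Round", "Circuit", "Pole_Position",
           "Fastest_Lap", "Race_winner", "Report"]
mentions = [("c0", "number"), ("v4", "Kevin Curtain")]

encoder_input = build_encoder_input(
    "What was the number of race that Kevin Curtain won?", mentions, headers
)
readable_sql = fill_placeholders(
    "SELECT COUNT(c0) FROM TABLE WHERE c4 = v4", mentions, headers
)
```

Here `encoder_input` reproduces the string built by hand in the snippet, so it can be passed to `tokenizer.encode` unchanged. `readable_sql` is only a convenience view of the generic query; the table name and any value escaping would still need to be handled for a real database.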
63
 
64
  ## Intended uses & limitations
65