nielsr (HF staff) committed on
Commit f0b3100 · 1 parent: 75a9410

Create README.md

Files changed (1)
  1. README.md +47 -0
README.md ADDED
---
language: en
tags:
- tapex
- table-question-answering
license: apache-2.0
datasets:
- wikisql
inference: false
---

This is the TAPEX-large model fine-tuned on WikiSQL. TAPEX was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen and Jian-Guang Lou. The original repository can be found [here](https://github.com/microsoft/Table-Pretraining).

The checkpoint uses the BART architecture, so it is loaded with `BartTokenizer` and `BartForConditionalGeneration`. The table must first be linearized into a flat string, and the question is prepended to it. To load the model and run inference, you can do the following:

```python
import pandas as pd
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("nielsr/tapex-large-finetuned-wikisql")
model = BartForConditionalGeneration.from_pretrained("nielsr/tapex-large-finetuned-wikisql")


class IndexedRowTableLinearize:
    """Flattens a table into "col : h1 | h2 row 1 : v1 | v2 row 2 : ...".

    Minimal re-implementation of the format produced by IndexedRowTableLinearize in
    https://github.com/microsoft/Table-Pretraining/blob/main/tapex/processor/table_linearize.py
    """

    def process_table(self, table_content: dict) -> str:
        assert "header" in table_content and "rows" in table_content
        parts = [self.process_header(table_content["header"])]
        # rows are numbered starting from 1, not 0
        for i, row in enumerate(table_content["rows"], start=1):
            parts.append(self.process_row(row, row_index=i))
        return " ".join(parts)

    def process_header(self, headers: list) -> str:
        return "col : " + " | ".join(str(h) for h in headers)

    def process_row(self, row: list, row_index: int) -> str:
        # str() lets numeric cells be joined as well
        return f"row {row_index} : " + " | ".join(str(cell) for cell in row)


# create table
data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)

# turn into dict
table_dict = {"header": list(table.columns), "rows": table.values.tolist()}

# turn into the format TAPEX expects
linearizer = IndexedRowTableLinearize()
linear_table = linearizer.process_table(table_dict)

# add question
question = "how many movies does George Clooney have?"
joint_input = question + " " + linear_table

# encode
encoding = tokenizer(joint_input, return_tensors="pt")

# generate the answer (autoregressive decoding, not a single forward pass)
outputs = model.generate(**encoding)

# decode
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
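To inspect the exact string the model receives without downloading the checkpoint, the linearization step can be reproduced in plain Python. This is a minimal sketch that assumes the `col : ... row N : ...` layout used by `IndexedRowTableLinearize` in the linked repository; `linearize_table` is a hypothetical helper, not part of `transformers` or the TAPEX codebase.

```python
def linearize_table(table: dict) -> str:
    """Hypothetical helper: flatten {"header": [...], "rows": [[...], ...]}
    into "col : h1 | h2 row 1 : v1 | v2 ...", the assumed TAPEX input layout."""
    parts = ["col : " + " | ".join(str(h) for h in table["header"])]
    # Row indices start at 1; cells in a row are separated by " | ".
    for index, row in enumerate(table["rows"], start=1):
        parts.append(f"row {index} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


table = {
    "header": ["Actors", "Number of movies"],
    "rows": [
        ["Brad Pitt", "87"],
        ["Leonardo Di Caprio", "53"],
        ["George Clooney", "69"],
    ],
}
question = "how many movies does George Clooney have?"

# The question is prepended to the linearized table, separated by one space.
joint_input = question + " " + linearize_table(table)
print(joint_input)
```

Comparing this string with `joint_input` from the example above is a quick way to check that a custom table was flattened as intended before calling `model.generate`.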