Qian Liu commited on
Commit
50df240
1 Parent(s): d435d6a

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +72 -0
README.md CHANGED
@@ -1,3 +1,75 @@
1
  ---
 
 
 
2
  license: mit
3
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ language: en
3
+ tags:
4
+ - tapex
5
  license: mit
6
  ---
7
+
8
+ # TAPEX (base-sized model)
9
+
10
+ TAPEX was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found [here](https://github.com/microsoft/Table-Pretraining).
11
+
12
+ ## Model description
13
+
14
+ TAPEX (**Ta**ble **P**re-training via **Ex**ecution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with *table reasoning* skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
15
+
16
+ TAPEX is based on the BART architecture, the transformer encoder-encoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
17
+
18
+ This model is the `tapex-base` model fine-tuned on the [WikiTableQuestions](https://huggingface.co/datasets/wikitablequestions) dataset.
19
+
20
+ ## Intended Uses
21
+
22
+ You can use the model for table question answering on *complex* questions. Some **solveable** questions are shown below (corresponding tables now shown):
23
+
24
+ | Question | Answer |
25
+ |:---: |:---:|
26
+ | according to the table, what is the last title that spicy horse produced? | Akaneiro: Demon Hunters |
27
+ | what is the difference in runners-up from coleraine academical institution and royal school dungannon? | 20 |
28
+ | what were the first and last movies greenstreet acted in? | The Maltese Falcon, Malaya |
29
+ | in which olympic games did arasay thondike not finish in the top 20? | 2012 |
30
+ | which broadcaster hosted 3 titles but they had only 1 episode? | Channel 4 |
31
+
32
+
33
+ ### How to Use
34
+
35
+ Here is how to use this model in transformers:
36
+
37
+ ```python
38
+ from transformers import TapexTokenizer, BartForConditionalGeneration
39
+ import pandas as pd
40
+
41
+ tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-base-finetuned-wtq")
42
+ model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-base-finetuned-wtq")
43
+
44
+ data = {
45
+ "year": [1896, 1900, 1904, 2004, 2008, 2012],
46
+ "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
47
+ }
48
+ table = pd.DataFrame.from_dict(data)
49
+
50
+ # tapex accepts uncased input since it is pre-trained on the uncased corpus
51
+ query = "In which year did beijing host the Olympic Games?"
52
+ encoding = tokenizer(table=table, query=query, return_tensors="pt")
53
+
54
+ outputs = model.generate(**encoding)
55
+
56
+ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
57
+ # [' 2008.0']
58
+ ```
59
+
60
+ ### How to Eval
61
+
62
+ Please find the eval script [here](https://github.com/SivilTaram/transformers/tree/add_tapex_bis/examples/research_projects/tapex).
63
+
64
+ ### BibTeX entry and citation info
65
+
66
+ ```bibtex
67
+ @inproceedings{
68
+ liu2022tapex,
69
+ title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
70
+ author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
71
+ booktitle={International Conference on Learning Representations},
72
+ year={2022},
73
+ url={https://openreview.net/forum?id=O50443AsCP}
74
+ }
75
+ ```