Qian Liu committed on
Commit
c53b0b4
1 Parent(s): 6ce3766

Update README.md

Files changed (1)
  1. README.md +72 -0
README.md CHANGED
@@ -1,3 +1,75 @@
1
  ---
2
  license: mit
3
  ---
1
  ---
2
+ language: en
3
+ tags:
4
+ - tapex
5
  license: mit
6
  ---
7
+
8
+ # TAPEX (base-sized model)
9
+
10
+ TAPEX was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found [here](https://github.com/microsoft/Table-Pretraining).
11
+
12
+ ## Model description
13
+
14
+ TAPEX (**Ta**ble **P**re-training via **Ex**ecution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with *table reasoning* skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
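To make the idea of "learning a neural SQL executor" more concrete, here is a minimal sketch of how one (query, answer) pair over a table could be produced by a real SQL engine. It is not the authors' synthesis pipeline: the table `olympics`, its contents, and the helper `make_example` are hypothetical names chosen for illustration, and SQLite stands in for whatever executor the original work used.

```python
import sqlite3

# Hypothetical toy table; names and values are illustrative only.
rows = [(1896, "athens"), (1900, "paris"), (2008, "beijing"), (2012, "london")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE olympics (year INTEGER, city TEXT)")
conn.executemany("INSERT INTO olympics VALUES (?, ?)", rows)


def make_example(sql):
    """Run one SQL query and pair it with its execution result."""
    # Flatten the result rows into a single list of answer values.
    answer = [value for row in conn.execute(sql) for value in row]
    return {"query": sql, "answer": answer}


example = make_example("SELECT year FROM olympics WHERE city = 'beijing'")
print(example["query"], "->", example["answer"])
```

Under this reading, a pre-training example would consist of the SQL query and the table as model input, with the executed answer (here the single value 2008) as the target the model learns to reproduce.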
15
+
16
+ TAPEX is based on the BART architecture, a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
17
+
18
+ This model is the `tapex-base` model fine-tuned on the [WikiSQL](https://huggingface.co/datasets/wikisql) dataset.
19
+
20
+ ## Intended Uses
21
+
22
+ You can use the model for table question answering on relatively simple questions. Some **solvable** questions are shown below (corresponding tables not shown):
23
+
24
+ | Question | Answer |
25
+ |:---: |:---:|
26
+ | tell me what the notes are for south australia | no slogan on current series |
27
+ | what position does the player who played for butler cc (ks) play? | guard-forward |
28
+ | how many schools did player number 3 play at? | 1.0 |
29
+ | how many winning drivers in the kraco twin 125 (r2) race were there? | 1.0 |
30
+ | for the episode(s) aired in the u.s. on 4 april 2008, what were the names? | "bust a move" part one, "bust a move" part two |
31
+
32
+
33
+ ### How to Use
34
+
35
+ Here is how to use this model with the `transformers` library:
36
+
37
+ ```python
38
+ from transformers import TapexTokenizer, BartForConditionalGeneration
39
+ import pandas as pd
40
+
41
+ tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-base-finetuned-wikisql")
42
+ model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-base-finetuned-wikisql")
43
+
44
+ data = {
45
+ "year": [1896, 1900, 1904, 2004, 2008, 2012],
46
+ "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
47
+ }
48
+ table = pd.DataFrame.from_dict(data)
49
+
50
+ # TAPEX expects uncased input because it was pre-trained on an uncased corpus
51
+ query = "In which year did beijing host the Olympic Games?"
52
+ encoding = tokenizer(table=table, query=query, return_tensors="pt")
53
+
54
+ outputs = model.generate(**encoding)
55
+
56
+ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
57
+ # ['2008']
58
+ ```
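The comment in the snippet above says the model works on uncased input. Whether the tokenizer lowercases text on its own is not stated here, so the following is only a sketch of normalizing the raw table data and question yourself before building the DataFrame. The helper `to_uncased` is a hypothetical name, not part of `transformers`.

```python
def to_uncased(data, query):
    """Lowercase column names, string cells, and the question.

    Non-string cells (for example integer years) are left unchanged.
    """
    lowered = {
        str(column).lower(): [
            value.lower() if isinstance(value, str) else value
            for value in values
        ]
        for column, values in data.items()
    }
    return lowered, query.lower()


data = {"Year": [2004, 2008], "City": ["Athens", "Beijing"]}
table_data, question = to_uncased(
    data, "In which year did Beijing host the Olympic Games?"
)
print(table_data)
print(question)
```

The returned dictionary can then be passed to `pd.DataFrame.from_dict` exactly as in the example above.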
59
+
60
+ ### How to Eval
61
+
62
+ The evaluation script is available [here](https://github.com/SivilTaram/transformers/tree/add_tapex_bis/examples/research_projects/tapex).
63
+
64
+ ### BibTeX entry and citation info
65
+
66
+ ```bibtex
67
+ @inproceedings{
68
+ liu2022tapex,
69
+ title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
70
+ author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
71
+ booktitle={International Conference on Learning Representations},
72
+ year={2022},
73
+ url={https://openreview.net/forum?id=O50443AsCP}
74
+ }
75
+ ```