nielsr HF staff committed on
Commit
e383a7d
1 Parent(s): 6796650

Update README.md

Files changed (1)
  1. README.md +75 -74
README.md CHANGED
@@ -1,75 +1,76 @@
1
- ---
2
- language: en
3
- tags:
4
- - tapex
5
- license: mit
6
- ---
7
-
8
- # TAPEX (large-sized model)
9
-
10
- TAPEX was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found [here](https://github.com/microsoft/Table-Pretraining).
11
-
12
- ## Model description
13
-
14
- TAPEX (**Ta**ble **P**re-training via **Ex**ecution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with *table reasoning* skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
15
-
16
- TAPEX is based on the BART architecture, a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
17
-
18
- This model is the `tapex-large` model fine-tuned on the [WikiSQL](https://huggingface.co/datasets/wikisql) dataset.
19
-
20
- ## Intended Uses
21
-
22
- You can use the model for table question answering on relatively simple questions. Some **solvable** questions are shown below (corresponding tables not shown):
23
-
24
- | Question | Answer |
25
- |:---: |:---:|
26
- | tell me what the notes are for south australia | no slogan on current series |
27
- | what position does the player who played for butler cc (ks) play? | guard-forward |
28
- | how many schools did player number 3 play at? | 1.0 |
29
- | how many winning drivers in the kraco twin 125 (r2) race were there? | 1.0 |
30
- | for the episode(s) aired in the u.s. on 4 april 2008, what were the names? | "bust a move" part one, "bust a move" part two |
31
-
32
-
33
- ### How to Use
34
-
35
- Here is how to use this model in transformers:
36
-
37
- ```python
38
- from transformers import TapexTokenizer, BartForConditionalGeneration
39
- import pandas as pd
40
-
41
- tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-large-finetuned-wikisql")
42
- model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-large-finetuned-wikisql")
43
-
44
- data = {
45
- "year": [1896, 1900, 1904, 2004, 2008, 2012],
46
- "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
47
- }
48
- table = pd.DataFrame.from_dict(data)
49
-
50
- # tapex accepts uncased input since it is pre-trained on the uncased corpus
51
- query = "In which year did beijing host the Olympic Games?"
52
- encoding = tokenizer(table=table, query=query, return_tensors="pt")
53
-
54
- outputs = model.generate(**encoding)
55
-
56
- print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
57
- # [' 2008.0']
58
- ```
59
-
60
- ### How to Eval
61
-
62
- Please find the eval script [here](https://github.com/SivilTaram/transformers/tree/add_tapex_bis/examples/research_projects/tapex).
63
-
64
- ### BibTeX entry and citation info
65
-
66
- ```bibtex
67
- @inproceedings{
68
- liu2022tapex,
69
- title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
70
- author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
71
- booktitle={International Conference on Learning Representations},
72
- year={2022},
73
- url={https://openreview.net/forum?id=O50443AsCP}
74
- }
 
75
  ```
 
1
+ ---
2
+ language: en
3
+ tags:
4
+ - tapex
5
+ - table-question-answering
6
+ license: mit
7
+ ---
8
+
9
+ # TAPEX (large-sized model)
10
+
11
+ TAPEX was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found [here](https://github.com/microsoft/Table-Pretraining).
12
+
13
+ ## Model description
14
+
15
+ TAPEX (**Ta**ble **P**re-training via **Ex**ecution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with *table reasoning* skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
16
+
17
+ TAPEX is based on the BART architecture, a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
18
+
19
+ This model is the `tapex-large` model fine-tuned on the [WikiSQL](https://huggingface.co/datasets/wikisql) dataset.
20
+
21
+ ## Intended Uses
22
+
23
+ You can use the model for table question answering on relatively simple questions. Some **solvable** questions are shown below (corresponding tables not shown):
24
+
25
+ | Question | Answer |
26
+ |:---: |:---:|
27
+ | tell me what the notes are for south australia | no slogan on current series |
28
+ | what position does the player who played for butler cc (ks) play? | guard-forward |
29
+ | how many schools did player number 3 play at? | 1.0 |
30
+ | how many winning drivers in the kraco twin 125 (r2) race were there? | 1.0 |
31
+ | for the episode(s) aired in the u.s. on 4 april 2008, what were the names? | "bust a move" part one, "bust a move" part two |
32
+
33
+
34
+ ### How to Use
35
+
36
+ Here is how to use this model in transformers:
37
+
38
+ ```python
39
+ from transformers import TapexTokenizer, BartForConditionalGeneration
40
+ import pandas as pd
41
+
42
+ tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-large-finetuned-wikisql")
43
+ model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-large-finetuned-wikisql")
44
+
45
+ data = {
46
+ "year": [1896, 1900, 1904, 2004, 2008, 2012],
47
+ "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
48
+ }
49
+ table = pd.DataFrame.from_dict(data)
50
+
51
+ # tapex accepts uncased input since it is pre-trained on the uncased corpus
52
+ query = "In which year did beijing host the Olympic Games?"
53
+ encoding = tokenizer(table=table, query=query, return_tensors="pt")
54
+
55
+ outputs = model.generate(**encoding)
56
+
57
+ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
58
+ # [' 2008.0']
59
+ ```
60
+
61
+ ### How to Eval
62
+
63
+ Please find the eval script [here](https://github.com/SivilTaram/transformers/tree/add_tapex_bis/examples/research_projects/tapex).
64
+
65
+ ### BibTeX entry and citation info
66
+
67
+ ```bibtex
68
+ @inproceedings{
69
+ liu2022tapex,
70
+ title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
71
+ author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
72
+ booktitle={International Conference on Learning Representations},
73
+ year={2022},
74
+ url={https://openreview.net/forum?id=O50443AsCP}
75
+ }
76
  ```
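
The usage snippet in the README notes that TAPEX expects uncased input, and its example output is a single decoded string (`' 2008.0'`). The sketch below shows one possible way to lowercase a table and query before tokenizing, and to split a decoded string into individual answers. Both helpers (`normalize_inputs`, `split_answers`) are hypothetical and are not part of `transformers` or the TAPEX repository. Splitting on commas is an assumption based on the multi-answer row in the README's example table, and it would mis-split any answer that itself contains a comma.

```python
# Hypothetical pre- and post-processing helpers for the README usage example.
# They are illustrative only and are not provided by transformers or TAPEX.


def normalize_inputs(data, query):
    """Lowercase the query, the column names, and every string cell."""
    lowered = {
        str(column).lower(): [
            cell.lower() if isinstance(cell, str) else cell for cell in cells
        ]
        for column, cells in data.items()
    }
    return lowered, query.lower()


def split_answers(decoded):
    """Split one decoded string into stripped answer strings.

    Assumes multiple answers are separated by commas, which may not hold
    for answers that contain commas themselves.
    """
    return [part.strip() for part in decoded.split(",") if part.strip()]


data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["Athens", "Paris", "St. Louis", "Athens", "Beijing", "London"],
}
table_data, query = normalize_inputs(
    data, "In which year did beijing host the Olympic Games?"
)

print(query)  # in which year did beijing host the olympic games?
print(split_answers(" 2008.0"))  # ['2008.0']
```

The lowercased `table_data` dictionary can be passed to `pd.DataFrame.from_dict` exactly as in the README snippet, and `split_answers` can be applied to each string returned by `tokenizer.batch_decode`.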