---
language: en
tags:
- tapas
- table-question-answering
license: apache-2.0
datasets:
- wtq
---

# TAPAS mini model fine-tuned on WikiTable Questions (WTQ)

This model has two versions which can be used. The default version corresponds to the `tapas_wtq_wikisql_sqa_inter_masklm_mini_reset` checkpoint of the [original Github repository](https://github.com/google-research/tapas).
This model was pre-trained with a masked language modeling (MLM) objective and an additional step the authors call intermediate pre-training, and then fine-tuned in a chain on [SQA](https://www.microsoft.com/en-us/download/details.aspx?id=54253), [WikiSQL](https://github.com/salesforce/WikiSQL) and finally [WTQ](https://github.com/ppasupat/WikiTableQuestions). It uses relative position embeddings (i.e. it resets the position index at every cell of the table).

The other (non-default) version which can be used is:
- `no_reset`, which corresponds to `tapas_wtq_wikisql_sqa_inter_masklm_mini` (intermediate pre-training, absolute position embeddings).

Disclaimer: The team releasing TAPAS did not write a model card for this model, so this model card has been written by the Hugging Face team and contributors.

## Results

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.5062 | [tapas-large-finetuned-wtq (with absolute pos embeddings)](https://huggingface.co/google/tapas-large-finetuned-wtq/tree/no_reset)
LARGE | reset | 0.5097 | [tapas-large-finetuned-wtq](https://huggingface.co/google/tapas-large-finetuned-wtq/tree/main)
BASE | noreset | 0.4525 | [tapas-base-finetuned-wtq (with absolute pos embeddings)](https://huggingface.co/google/tapas-base-finetuned-wtq/tree/no_reset)
BASE | reset | 0.4638 | [tapas-base-finetuned-wtq](https://huggingface.co/google/tapas-base-finetuned-wtq/tree/main)
MEDIUM | noreset | 0.4324 | [tapas-medium-finetuned-wtq (with absolute pos embeddings)](https://huggingface.co/google/tapas-medium-finetuned-wtq/tree/no_reset)
MEDIUM | reset | 0.4324 | [tapas-medium-finetuned-wtq](https://huggingface.co/google/tapas-medium-finetuned-wtq/tree/main)
SMALL | noreset | 0.3681 | [tapas-small-finetuned-wtq (with absolute pos embeddings)](https://huggingface.co/google/tapas-small-finetuned-wtq/tree/no_reset)
SMALL | reset | 0.3762 | [tapas-small-finetuned-wtq](https://huggingface.co/google/tapas-small-finetuned-wtq/tree/main)
**MINI** | **noreset** | **0.2783** | [tapas-mini-finetuned-wtq (with absolute pos embeddings)](https://huggingface.co/google/tapas-mini-finetuned-wtq/tree/no_reset)
**MINI** | **reset** | **0.2854** | [tapas-mini-finetuned-wtq](https://huggingface.co/google/tapas-mini-finetuned-wtq/tree/main)
TINY | noreset | 0.0823 | [tapas-tiny-finetuned-wtq (with absolute pos embeddings)](https://huggingface.co/google/tapas-tiny-finetuned-wtq/tree/no_reset)
TINY | reset | 0.1039 | [tapas-tiny-finetuned-wtq](https://huggingface.co/google/tapas-tiny-finetuned-wtq/tree/main)

## Model description

TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion.
This means it was pretrained on the raw tables and associated texts only, with no humans labelling them in any way (which is why it
can use lots of publicly available data), using an automatic process to generate inputs and labels from those texts. More precisely, it
was pretrained with two objectives:

- Masked language modeling (MLM): taking a (flattened) table and associated context, the model randomly masks 15% of the words in
  the input, then runs the entire (partially masked) sequence through the model. The model then has to predict the masked words.
  This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other,
  or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional
  representation of a table and associated text.
- Intermediate pre-training: to encourage numerical reasoning on tables, the authors additionally pre-trained the model by creating
  a balanced dataset of millions of syntactically created training examples. Here, the model must predict (classify) whether a sentence
  is supported or refuted by the contents of a table. The training examples are created based on synthetic as well as counterfactual statements.

This way, the model learns an inner representation of the English language used in tables and associated texts, which can then be used
to extract features useful for downstream tasks such as answering questions about a table, or determining whether a sentence is entailed
or refuted by the contents of a table. Fine-tuning is done by adding a cell selection head and an aggregation head on top of the pre-trained model, and then jointly training these randomly initialized classification heads with the base model on SQA, WikiSQL and finally WTQ.
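
As a hedged sketch of what these heads sit on top of (the table and question below are made up for illustration), the bare `TapasModel` exposes the encoder hidden states that the cell selection and aggregation heads consume:

```python
import pandas as pd
import torch
from transformers import TapasTokenizer, TapasModel

# Load only the encoder (without the cell selection / aggregation heads).
tokenizer = TapasTokenizer.from_pretrained("google/tapas-mini-finetuned-wtq")
model = TapasModel.from_pretrained("google/tapas-mini-finetuned-wtq")

# Toy table; TAPAS expects all cell values as strings.
table = pd.DataFrame({"Rank": ["1", "2"], "Country": ["Norway", "Germany"]})
inputs = tokenizer(table=table, queries="Which country is ranked first?", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One hidden vector per token of "[CLS] question [SEP] flattened table [SEP]".
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```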

## Intended uses & limitations

You can use this model for answering questions related to a table.

For code examples, we refer to the documentation of TAPAS on the Hugging Face website.
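
As a minimal, hedged sketch of such a code example (assuming a recent `transformers` installation; older versions additionally required the `torch-scatter` package for TAPAS), the `table-question-answering` pipeline can be pointed at this checkpoint:

```python
from transformers import pipeline

# Table QA pipeline backed by this checkpoint.
tqa = pipeline("table-question-answering", model="google/tapas-mini-finetuned-wtq")

# Tables are passed as a dict of columns (or a pandas DataFrame); cells must be strings.
table = {
    "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    "Number of movies": ["87", "53", "69"],
}

result = tqa(table=table, query="How many movies does Leonardo Di Caprio have?")
print(result)  # dict with 'answer', 'coordinates', 'cells' and 'aggregator' keys
```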
## Training procedure

### Preprocessing

The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are
then of the form:

```
[CLS] Question [SEP] Flattened table [SEP]
```

The authors first converted the WTQ dataset into the format of SQA using automatic conversion scripts.
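
A hedged sketch of this preprocessing with the `transformers` tokenizer (the table and question are illustrative):

```python
import pandas as pd
from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-mini-finetuned-wtq")

# Cell values must be strings; the tokenizer lowercases the question, flattens the
# table row by row and builds "[CLS] question [SEP] flattened table [SEP]" together
# with the row/column/rank token type ids that TAPAS needs.
table = pd.DataFrame({"City": ["Paris", "London"], "Population": ["2,161,000", "8,982,000"]})
encoding = tokenizer(table=table, queries="What is the population of Paris?",
                     padding="max_length", return_tensors="pt")

print(encoding["input_ids"].shape)              # (1, 512) with max_length padding
print(tokenizer.decode(encoding["input_ids"][0][:15]))
```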

### Fine-tuning

The model was fine-tuned on 32 Cloud TPU v3 cores for 50,000 steps with a maximum sequence length of 512 and a batch size of 512.
In this setup, fine-tuning takes around 10 hours. The optimizer used is Adam with a learning rate of 1.93581e-5 and a warmup
ratio of 0.128960. An inductive bias is added such that the model only selects cells of the same column. This is reflected by the
`select_one_column` parameter of `TapasConfig`. See the [paper](https://arxiv.org/abs/2004.02349) for more details (tables 11 and 12).
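
As a hedged sketch of where this shows up in the `transformers` API (the released checkpoint already ships the weakly supervised WTQ configuration, so nothing needs to be overridden):

```python
from transformers import TapasConfig, TapasForQuestionAnswering

# Inspect the configuration that comes with this checkpoint, including the
# single-column inductive bias discussed above.
config = TapasConfig.from_pretrained("google/tapas-mini-finetuned-wtq")
print(config.select_one_column)        # True: only select cells from one column
print(config.num_aggregation_labels)   # 4: NONE, SUM, AVERAGE, COUNT

model = TapasForQuestionAnswering.from_pretrained(
    "google/tapas-mini-finetuned-wtq", config=config
)
```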

### BibTeX entry and citation info

```bibtex
@misc{herzig2020tapas,
      title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
      author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
      year={2020},
      eprint={2004.02349},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}
```

```bibtex
@misc{eisenschlos2020understanding,
      title={Understanding tables with intermediate pre-training},
      author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
      year={2020},
      eprint={2010.00571},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

```bibtex
@article{DBLP:journals/corr/PasupatL15,
  author    = {Panupong Pasupat and
               Percy Liang},
  title     = {Compositional Semantic Parsing on Semi-Structured Tables},
  journal   = {CoRR},
  volume    = {abs/1508.00305},
  year      = {2015},
  url       = {http://arxiv.org/abs/1508.00305},
  archivePrefix = {arXiv},
  eprint    = {1508.00305},
  timestamp = {Mon, 13 Aug 2018 16:47:37 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/PasupatL15.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```