Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/iarfmoose/bert-base-cased-qa-evaluator/README.md
README.md
ADDED
@@ -0,0 +1,28 @@
# BERT-base-cased-qa-evaluator

This model takes a question-answer pair as input and outputs a value representing its prediction of whether the input is a valid question-answer pair. The model is a pretrained [BERT-base-cased](https://huggingface.co/bert-base-cased) with a sequence classification head.

## Intended uses

The QA evaluator was originally designed to be used with the [t5-base-question-generator](https://huggingface.co/iarfmoose/t5-base-question-generator) for evaluating the quality of generated questions.

The input for the QA evaluator follows the format for `BertForSequenceClassification`, with the question and answer as the two sequences. Inputs should take the following format:
```
[CLS] <question> [SEP] <answer> [SEP]
```
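As a minimal sketch of how such a pair might be scored with the `transformers` library: the model identifier `iarfmoose/bert-base-cased-qa-evaluator` is an assumption inferred from the original file path, and `format_pair` and `score_pair` are hypothetical helper names. The card does not say which output index corresponds to a valid pair, so the sketch returns the raw logits without interpreting them.

```python
def format_pair(question: str, answer: str) -> str:
    """Render the layout described above, for illustration only.

    A BERT tokenizer inserts [CLS] and [SEP] itself, so this string is
    not meant to be passed back into the tokenizer.
    """
    return f"[CLS] {question} [SEP] {answer} [SEP]"


def score_pair(question: str, answer: str,
               model_name: str = "iarfmoose/bert-base-cased-qa-evaluator"):
    """Return the raw classification logits for one question-answer pair.

    The default model_name is an assumption, not confirmed by the card.
    """
    # Imported lazily so format_pair can be used without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Passing two texts makes the tokenizer build the pair layout
    # ([CLS] question [SEP] answer [SEP]) with matching token_type_ids.
    inputs = tokenizer(question, answer, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0].tolist()
```

Calling `score_pair("Who wrote Hamlet?", "William Shakespeare")` downloads the weights on first use and returns one logit per class.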

## Limitations and bias

The model is trained to evaluate whether a question and answer are semantically related, but it cannot determine whether an answer is actually correct.

## Training data

The training data was made up of question-answer pairs from the following datasets:
- [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)
- [RACE](http://www.cs.cmu.edu/~glai1/data/race/)
- [CoQA](https://stanfordnlp.github.io/coqa/)
- [MSMARCO](https://microsoft.github.io/msmarco/)

## Training procedure

The question and answer were concatenated unchanged 50% of the time. The other 50% of the time, a corruption operation was performed (either swapping the answer for an unrelated answer or copying part of the question into the answer). The model was then trained to predict whether the input sequence represented one of the original QA pairs or a corrupted input.
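
The sampling scheme described in the training procedure could be sketched roughly as follows. This is not the author's training code. The function names, the label encoding (1 for an original pair, 0 for a corrupted one), the even split between the two corruption types, and the choice to replace the whole answer with a span of the question are all assumptions made for illustration.

```python
import random


def corrupt_by_swap(answer, answer_pool, rng):
    """Replace the answer with a different one drawn from the pool."""
    candidates = [a for a in answer_pool if a != answer]
    if not candidates:
        raise ValueError("answer_pool needs at least one other answer")
    return rng.choice(candidates)


def corrupt_by_copy(question, rng):
    """Use a contiguous span of question words as the answer (assumed reading)."""
    words = question.split()
    if not words:
        raise ValueError("question must contain at least one word")
    start = rng.randrange(len(words))
    end = rng.randrange(start + 1, len(words) + 1)
    return " ".join(words[start:end])


def make_example(question, answer, answer_pool, rng):
    """Return (question, answer, label); label 1 = original, 0 = corrupted."""
    # Keep the original pair half of the time.
    if rng.random() < 0.5:
        return question, answer, 1
    # Otherwise corrupt; the 50/50 split between methods is an assumption.
    if rng.random() < 0.5:
        return question, corrupt_by_swap(answer, answer_pool, rng), 0
    return question, corrupt_by_copy(question, rng), 0
```

Each resulting triple could then be tokenized as a sequence pair and used as one labeled example for the classification head.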