---
language: en
thumbnail: null
license: mit
tags:
- question-answering
- bert
- bert-base
datasets:
- squad
metrics:
- squad
widget:
- text: Which name is also used to describe the Amazon rainforest in English?
  context: "The Amazon rainforest (Portuguese: Floresta Amaz\xF4nica or Amaz\xF4nia;\
    \ Spanish: Selva Amaz\xF3nica, Amazon\xEDa or usually Amazonia; French: For\xEA\
    t amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or\
    \ the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon\
    \ basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000\
    \ sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by\
    \ the rainforest. This region includes territory belonging to nine nations. The\
    \ majority of the forest is contained within Brazil, with 60% of the rainforest,\
    \ followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela,\
    \ Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments\
    \ in four nations contain \"Amazonas\" in their names. The Amazon represents over\
    \ half of the planet's remaining rainforests, and comprises the largest and most\
    \ biodiverse tract of tropical rainforest in the world, with an estimated 390\
    \ billion individual trees divided into 16,000 species."
- text: How many square kilometers of rainforest is covered in the basin?
  context: "The Amazon rainforest (Portuguese: Floresta Amaz\xF4nica or Amaz\xF4nia;\
    \ Spanish: Selva Amaz\xF3nica, Amazon\xEDa or usually Amazonia; French: For\xEA\
    t amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or\
    \ the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon\
    \ basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000\
    \ sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by\
    \ the rainforest. This region includes territory belonging to nine nations. The\
    \ majority of the forest is contained within Brazil, with 60% of the rainforest,\
    \ followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela,\
    \ Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments\
    \ in four nations contain \"Amazonas\" in their names. The Amazon represents over\
    \ half of the planet's remaining rainforests, and comprises the largest and most\
    \ biodiverse tract of tropical rainforest in the world, with an estimated 390\
    \ billion individual trees divided into 16,000 species."
model-index:
- name: csarron/bert-base-uncased-squad-v1
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - name: Exact Match
      type: exact_match
      value: 80.9104
      verified: true
    - name: F1
      type: f1
      value: 88.2302
      verified: true
---
## BERT-base uncased model fine-tuned on SQuAD v1
This model was fine-tuned from the HuggingFace [BERT](https://www.aclweb.org/anthology/N19-1423/) base uncased checkpoint on [SQuAD1.1](https://rajpurkar.github.io/SQuAD-explorer).
This model is case-insensitive: it does not distinguish between `english` and `English`.
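To illustrate what case-insensitivity means in practice, here is a minimal sketch that approximates the text normalization an uncased BERT tokenizer applies before WordPiece splitting (lowercasing, then stripping accents). The helper name `uncased_normalize` is hypothetical, and the real tokenizer does more than this (whitespace cleanup, punctuation splitting, subword lookup).

```python
import unicodedata


def uncased_normalize(text):
    """Approximate the lowercasing and accent stripping of an uncased BERT tokenizer.

    Hypothetical helper for illustration only; it is not part of transformers.
    """
    lowered = text.lower()
    # NFD splits an accented character into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Drop the combining marks (Unicode category "Mn"), keeping the base characters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(uncased_normalize("English") == uncased_normalize("english"))  # True
print(uncased_normalize("Amazônia"))  # amazonia
```

Because both spellings map to the same normalized string, the model sees identical input for `English` and `english`.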
## Details
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| SQuAD1.1 | train | 90.6K |
| SQuAD1.1 | eval | 11.1K |
### Fine-tuning
- Python: `3.7.5`
- Machine specs:
  - `CPU: Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz`
  - `Memory: 32 GiB`
  - `GPUs: 2 GeForce GTX 1070, each with 8 GiB memory`
  - `GPU driver: 418.87.01, CUDA: 10.1`
- Script:
```shell
# After installing transformers from https://github.com/huggingface/transformers
cd examples/question-answering
mkdir -p data
wget -O data/train-v1.1.json https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget -O data/dev-v1.1.json https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-uncased \
--do_train \
--do_eval \
--do_lower_case \
--train_file train-v1.1.json \
--predict_file dev-v1.1.json \
--per_gpu_train_batch_size 12 \
--per_gpu_eval_batch_size 16 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 320 \
--doc_stride 128 \
--data_dir data \
--output_dir data/bert-base-uncased-squad-v1 2>&1 | tee train-energy-bert-base-squad-v1.log
```
Fine-tuning took about 2 hours on this machine.
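The `--max_seq_length 320` and `--doc_stride 128` flags control how contexts that do not fit in one input are split into overlapping windows. The sketch below shows the idea under two assumptions: that `doc_stride` is the step between consecutive window starts, and that three positions are reserved for `[CLS]` and two `[SEP]` tokens, as in the original BERT SQuAD preprocessing. The function `doc_spans` is a hypothetical helper, not the script's actual code.

```python
from typing import List, Tuple


def doc_spans(num_context_tokens, num_query_tokens,
              max_seq_length=320, doc_stride=128):
    # type: (int, int, int, int) -> List[Tuple[int, int]]
    """Return (start, length) windows over the context tokens.

    Hypothetical sketch of sliding-window chunking; token counts are assumed
    to be WordPiece counts.
    """
    # Room left for context after the question, [CLS], and two [SEP] tokens.
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    spans = []
    start = 0
    while start < num_context_tokens:
        length = min(num_context_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_context_tokens:
            break  # the last window reaches the end of the context
        start += min(length, doc_stride)
    return spans


# A 500-token context with a 10-token question needs three overlapping windows.
print(doc_spans(500, 10))  # [(0, 307), (128, 307), (256, 244)]
```

Overlapping windows mean an answer near a window boundary still appears with surrounding context in at least one window, at the cost of more features than raw examples.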
### Results
**Model size**: `418M`
| Metric | Value | Original ([Table 2](https://www.aclweb.org/anthology/N19-1423.pdf)) |
| ------ | --------- | --------- |
| **EM** | **80.9** | **80.8** |
| **F1** | **88.2** | **88.5** |
Note that these results were obtained without any hyperparameter search.
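For readers unfamiliar with the two metrics, the following is a minimal sketch of how EM and F1 are computed for a single prediction, following the normalization steps of the SQuAD v1.1 evaluation (lowercase, strip punctuation, drop English articles, collapse whitespace). The example strings are illustrative, not taken from the dataset, and the full evaluation additionally takes the maximum over all reference answers and averages over the dataset.

```python
import re
import string
from collections import Counter


def normalize_answer(s):
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Illustrative pair: the prediction misses one token of the reference.
print(exact_match("February 7, 2016", "on February 7, 2016"))         # 0.0
print(round(f1_score("February 7, 2016", "on February 7, 2016"), 3))  # 0.857
```

F1 gives partial credit for overlapping tokens, which is why it is consistently higher than EM in the table above.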
## Example Usage
```python
from transformers import pipeline
qa_pipeline = pipeline(
"question-answering",
model="csarron/bert-base-uncased-squad-v1",
tokenizer="csarron/bert-base-uncased-squad-v1"
)
predictions = qa_pipeline({
'context': "The game was played on February 7, 2016 at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.",
'question': "What day was the game played on?"
})
print(predictions)
# output:
# {'score': 0.8730505704879761, 'start': 23, 'end': 39, 'answer': 'February 7, 2016'}
```
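The `start` and `end` fields in the output are character offsets into the context string, with `end` exclusive. A small sketch, reusing the context and the output dictionary shown above without calling the model:

```python
context = (
    "The game was played on February 7, 2016 at Levi's Stadium "
    "in the San Francisco Bay Area at Santa Clara, California."
)
# The prediction dictionary copied from the example output above.
prediction = {
    "score": 0.8730505704879761,
    "start": 23,
    "end": 39,
    "answer": "February 7, 2016",
}

# Slicing the context with the offsets recovers the answer text.
span = context[prediction["start"]:prediction["end"]]
print(span)  # February 7, 2016
```

This is useful when you need to highlight the answer in the original text rather than just display the answer string.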
> Created by [Qingqing Cao](https://awk.ai/) | [GitHub](https://github.com/csarron) | [Twitter](https://twitter.com/sysnlp)
> Made with ❤️ in New York.