|
--- |
|
language: en |
|
thumbnail: null |
|
license: mit |
|
tags: |
|
- question-answering |
|
- bert |
|
- bert-base |
|
datasets: |
|
- squad |
|
metrics: |
|
- squad |
|
widget: |
|
- text: Which name is also used to describe the Amazon rainforest in English? |
|
context: "The Amazon rainforest (Portuguese: Floresta Amaz\xF4nica or Amaz\xF4nia;\ |
|
\ Spanish: Selva Amaz\xF3nica, Amazon\xEDa or usually Amazonia; French: For\xEA\ |
|
t amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or\ |
|
\ the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon\ |
|
\ basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000\ |
|
\ sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by\ |
|
\ the rainforest. This region includes territory belonging to nine nations. The\ |
|
\ majority of the forest is contained within Brazil, with 60% of the rainforest,\ |
|
\ followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela,\ |
|
\ Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments\ |
|
\ in four nations contain \"Amazonas\" in their names. The Amazon represents over\ |
|
\ half of the planet's remaining rainforests, and comprises the largest and most\ |
|
\ biodiverse tract of tropical rainforest in the world, with an estimated 390\ |
|
\ billion individual trees divided into 16,000 species." |
|
- text: How many square kilometers of rainforest is covered in the basin? |
|
context: "The Amazon rainforest (Portuguese: Floresta Amaz\xF4nica or Amaz\xF4nia;\ |
|
\ Spanish: Selva Amaz\xF3nica, Amazon\xEDa or usually Amazonia; French: For\xEA\ |
|
t amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or\ |
|
\ the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon\ |
|
\ basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000\ |
|
\ sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by\ |
|
\ the rainforest. This region includes territory belonging to nine nations. The\ |
|
\ majority of the forest is contained within Brazil, with 60% of the rainforest,\ |
|
\ followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela,\ |
|
\ Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments\ |
|
\ in four nations contain \"Amazonas\" in their names. The Amazon represents over\ |
|
\ half of the planet's remaining rainforests, and comprises the largest and most\ |
|
\ biodiverse tract of tropical rainforest in the world, with an estimated 390\ |
|
\ billion individual trees divided into 16,000 species." |
|
model-index: |
|
- name: csarron/bert-base-uncased-squad-v1 |
|
results: |
|
- task: |
|
type: question-answering |
|
name: Question Answering |
|
dataset: |
|
name: squad |
|
type: squad |
|
config: plain_text |
|
split: validation |
|
metrics: |
|
- name: Exact Match |
|
type: exact_match |
|
value: 80.9104 |
|
verified: true |
|
- name: F1 |
|
type: f1 |
|
value: 88.2302 |
|
verified: true |
|
--- |
|
|
|
## BERT-base uncased model fine-tuned on SQuAD v1 |
|
|
|
This model was fine-tuned from the HuggingFace [BERT](https://www.aclweb.org/anthology/N19-1423/) base uncased checkpoint on [SQuAD1.1](https://rajpurkar.github.io/SQuAD-explorer). |
|
This model is case-insensitive: it does not make a difference between english and English. |
|
|
|
## Details |
|
|
|
| Dataset | Split | # samples | |
|
| -------- | ----- | --------- | |
|
| SQuAD1.1 | train | 90.6K | |
|
| SQuAD1.1 | eval | 11.1k | |
|
|
|
|
|
### Fine-tuning |
|
- Python: `3.7.5` |
|
|
|
- Machine specs: |
|
|
|
`CPU: Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz` |
|
|
|
`Memory: 32 GiB` |
|
|
|
`GPUs: 2 GeForce GTX 1070, each with 8GiB memory` |
|
|
|
`GPU driver: 418.87.01, CUDA: 10.1` |
|
|
|
- script: |
|
|
|
```shell |
|
# after install https://github.com/huggingface/transformers |
|
|
|
cd examples/question-answering |
|
mkdir -p data |
|
|
|
wget -O data/train-v1.1.json https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json |
|
|
|
wget -O data/dev-v1.1.json https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json |
|
|
|
python run_squad.py \ |
|
--model_type bert \ |
|
--model_name_or_path bert-base-uncased \ |
|
--do_train \ |
|
--do_eval \ |
|
--do_lower_case \ |
|
--train_file train-v1.1.json \ |
|
--predict_file dev-v1.1.json \ |
|
--per_gpu_train_batch_size 12 \ |
|
--per_gpu_eval_batch_size=16 \ |
|
--learning_rate 3e-5 \ |
|
--num_train_epochs 2.0 \ |
|
--max_seq_length 320 \ |
|
--doc_stride 128 \ |
|
--data_dir data \ |
|
--output_dir data/bert-base-uncased-squad-v1 2>&1 | tee train-energy-bert-base-squad-v1.log |
|
``` |
|
|
|
It took about 2 hours to finish. |
|
|
|
### Results |
|
|
|
**Model size**: `418M` |
|
|
|
| Metric | # Value | # Original ([Table 2](https://www.aclweb.org/anthology/N19-1423.pdf))| |
|
| ------ | --------- | --------- | |
|
| **EM** | **80.9** | **80.8** | |
|
| **F1** | **88.2** | **88.5** | |
|
|
|
Note that the above results didn't involve any hyperparameter search. |
|
|
|
## Example Usage |
|
|
|
|
|
```python |
|
from transformers import pipeline |
|
|
|
qa_pipeline = pipeline( |
|
"question-answering", |
|
model="csarron/bert-base-uncased-squad-v1", |
|
tokenizer="csarron/bert-base-uncased-squad-v1" |
|
) |
|
|
|
predictions = qa_pipeline({ |
|
'context': "The game was played on February 7, 2016 at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.", |
|
'question': "What day was the game played on?" |
|
}) |
|
|
|
print(predictions) |
|
# output: |
|
# {'score': 0.8730505704879761, 'start': 23, 'end': 39, 'answer': 'February 7, 2016'} |
|
``` |
|
|
|
> Created by [Qingqing Cao](https://awk.ai/) | [GitHub](https://github.com/csarron) | [Twitter](https://twitter.com/sysnlp) |
|
|
|
> Made with ❤️ in New York. |
|
|