---
language: en
thumbnail: null
license: mit
tags:
- question-answering
- bert
- bert-base
datasets:
- squad
metrics:
- squad
widget:
- text: Which name is also used to describe the Amazon rainforest in English?
  context: "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."
- text: How many square kilometers of rainforest is covered in the basin?
  context: "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."
model-index:
- name: csarron/bert-base-uncased-squad-v1
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - name: Exact Match
      type: exact_match
      value: 80.9104
      verified: true
    - name: F1
      type: f1
      value: 88.2302
      verified: true
---

## BERT-base uncased model fine-tuned on SQuAD v1

This model was fine-tuned from the HuggingFace [BERT](https://www.aclweb.org/anthology/N19-1423/) base uncased checkpoint on [SQuAD1.1](https://rajpurkar.github.io/SQuAD-explorer).
This model is case-insensitive: it does not make a difference between english and English.
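To illustrate what case-insensitivity means in practice, here is a quick sketch (not part of the original card) showing that the checkpoint's tokenizer lowercases input before encoding, so differently-cased strings map to the same tokens:

```python
from transformers import AutoTokenizer

# Load the tokenizer shipped with this checkpoint; it applies
# lowercasing (do_lower_case), so case is discarded at encoding time.
tokenizer = AutoTokenizer.from_pretrained("csarron/bert-base-uncased-squad-v1")

print(tokenizer.tokenize("English"))  # ['english']
print(tokenizer.tokenize("english"))  # ['english']
```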
## Details

| Dataset  | Split | # samples |
| -------- | ----- | --------- |
| SQuAD1.1 | train | 90.6K     |
| SQuAD1.1 | eval  | 11.1K     |

### Fine-tuning

- Python: `3.7.5`

- Machine specs:

  `CPU: Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz`

  `Memory: 32 GiB`

  `GPUs: 2 GeForce GTX 1070, each with 8GiB memory`

  `GPU driver: 418.87.01, CUDA: 10.1`

- script:

```shell
# after installing https://github.com/huggingface/transformers

cd examples/question-answering
mkdir -p data

wget -O data/train-v1.1.json https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json

wget -O data/dev-v1.1.json https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --per_gpu_train_batch_size 12 \
  --per_gpu_eval_batch_size 16 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 320 \
  --doc_stride 128 \
  --data_dir data \
  --output_dir data/bert-base-uncased-squad-v1 2>&1 | tee train-energy-bert-base-squad-v1.log
```

Training took about 2 hours to finish.

### Results

**Model size**: `418M`

| Metric | # Value  | # Original ([Table 2](https://www.aclweb.org/anthology/N19-1423.pdf)) |
| ------ | -------- | --------------------------------------------------------------------- |
| **EM** | **80.9** | **80.8**                                                               |
| **F1** | **88.2** | **88.5**                                                               |

Note that the above results were obtained without any hyperparameter search.

## Example Usage

```python
from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="csarron/bert-base-uncased-squad-v1",
    tokenizer="csarron/bert-base-uncased-squad-v1"
)

predictions = qa_pipeline({
    'context': "The game was played on February 7, 2016 at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.",
    'question': "What day was the game played on?"
})

print(predictions)
# output:
# {'score': 0.8730505704879761, 'start': 23, 'end': 39, 'answer': 'February 7, 2016'}
```
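If you prefer not to use the `pipeline` abstraction, the checkpoint can also be driven through the lower-level model classes. The following is a minimal sketch, not part of the original card: the greedy start/end argmax decoding shown here is the standard SQuAD-style heuristic and an assumption on my part, not necessarily the exact decoding the pipeline performs.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "csarron/bert-base-uncased-squad-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "What day was the game played on?"
context = ("The game was played on February 7, 2016 at Levi's Stadium "
           "in the San Francisco Bay Area at Santa Clara, California.")

# Encode question and context as a single sequence pair, as done for SQuAD.
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Greedy decoding (assumption): take the most likely start and end positions.
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits) + 1

answer = tokenizer.decode(inputs["input_ids"][0][start:end])
print(answer)  # expected: "february 7, 2016" (uncased vocabulary)
```

> Created by [Qingqing Cao](https://awk.ai/) | [GitHub](https://github.com/csarron) | [Twitter](https://twitter.com/sysnlp)

> Made with ❤️ in New York.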