---
language: multilingual
thumbnail:
---
# DistilBERT multilingual fine-tuned on the TyDi QA (GoldP task) dataset for multilingual Q&A
## Details of the language model
[distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased)
## Details of the TyDi QA dataset
TyDi QA contains 200k human-annotated question-answer pairs in 11 typologically diverse languages. The questions were written without seeing the answer and without the use of translation, and the dataset is designed for the **training and evaluation** of automatic question answering systems. The official TyDi QA repository provides evaluation code and a baseline system for the dataset. https://ai.google.com/research/tydiqa
## Details of the downstream task (Gold Passage or GoldP aka the secondary task)
Given a passage that is guaranteed to contain the answer, predict the single contiguous span of characters that answers the question. The gold passage task differs from the [primary task](https://github.com/google-research-datasets/tydiqa/blob/master/README.md#the-tasks) in several ways:
* only the gold answer passage is provided rather than the entire Wikipedia article;
* unanswerable questions have been discarded, similar to MLQA and XQuAD;
* we evaluate with the SQuAD 1.1 metrics like XQuAD; and
* Thai and Japanese are removed since the lack of whitespace breaks some tools.
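
For reference, the SQuAD 1.1 metrics mentioned above (exact match and token-level F1) can be sketched in plain Python as follows. This is a minimal illustration modeled on the standard SQuAD 1.1 normalization (lowercasing, removing punctuation and English articles, collapsing whitespace); it is not the evaluation script used to produce the numbers in this card, and the function names are chosen here for illustration only. The official metric also takes the maximum score over all gold answers for a question, which is omitted for brevity.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both the prediction and the gold answer.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(f1_score("in Paris, France", "Paris"))
```

Note that the article-stripping step is specific to English, so scores for the other languages depend mostly on the lowercasing, punctuation, and whitespace rules.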
## Model training
The model was fine-tuned on a machine with a Tesla P100 GPU and 25 GB of RAM.
The training command was the following:
```bash
python transformers/examples/question-answering/run_squad.py \
--model_type distilbert \
--model_name_or_path distilbert-base-multilingual-cased \
--do_train \
--do_eval \
--train_file /path/to/dataset/train.json \
--predict_file /path/to/dataset/dev.json \
--per_gpu_train_batch_size 24 \
--per_gpu_eval_batch_size 24 \
--learning_rate 3e-5 \
--num_train_epochs 5 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /content/model_output \
--overwrite_output_dir \
--save_steps 1000 \
--threads 400
```
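
To make the `--max_seq_length 384` and `--doc_stride 128` settings more concrete, here is a simplified sketch of the sliding-window idea used in classic BERT-style SQuAD preprocessing: a passage that does not fit next to the question is split into overlapping windows whose start positions advance by `doc_stride` tokens. The helper `sliding_windows` and the assumption of 3 special tokens per input are illustrative, and the exact windowing in the `run_squad.py` version used for this model may differ.

```python
from typing import List, Tuple


def sliding_windows(
    num_doc_tokens: int,
    num_query_tokens: int,
    max_seq_length: int = 384,
    doc_stride: int = 128,
    num_special_tokens: int = 3,  # assumption: [CLS] question [SEP] passage [SEP]
) -> List[Tuple[int, int]]:
    """Return (start, length) spans that cover the passage with overlap."""
    max_doc_tokens = max_seq_length - num_query_tokens - num_special_tokens
    if max_doc_tokens <= 0:
        raise ValueError("The question leaves no room for passage tokens.")

    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_doc_tokens)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the passage
        start += min(length, doc_stride)
    return spans


if __name__ == "__main__":
    # A 600-token passage with a 10-token question needs several windows.
    for start, length in sliding_windows(600, 10):
        print(f"tokens {start}..{start + length - 1}")
```

Each window is scored separately at inference time, and the best-scoring span across windows is kept as the answer.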
## Global Results (dev set)
| Metric    | Value       |
| --------- | ----------- |
| **EM** | **63.85** |
| **F1** | **75.70** |
## Specific Results (per language)
| Language  | # Samples   | EM     | F1     |
| --------- | ----------- |--------| ------ |
| Arabic | 1314 | 66.66 | 80.02 |
| Bengali | 180 | 53.09 | 63.50 |
| English | 654 | 62.42 | 73.12 |
| Finnish | 1031 | 64.57 | 75.15 |
| Indonesian| 773 | 67.89 | 79.70 |
| Korean | 414 | 51.29 | 61.73 |
| Russian | 1079 | 55.42 | 70.08 |
| Swahili | 596 | 74.51 | 81.15 |
| Telugu    | 874         | 66.21  | 79.85  |
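
As a rough sanity check, the per-language scores can be aggregated into a single figure and compared with the global results. The sketch below assumes that the `# Samples` column is the appropriate weight for each language; the card does not state how the global EM and F1 were computed, so a small deviation from the reported values is to be expected.

```python
# (language, samples, EM, F1), copied from the per-language table
RESULTS = [
    ("Arabic", 1314, 66.66, 80.02),
    ("Bengali", 180, 53.09, 63.50),
    ("English", 654, 62.42, 73.12),
    ("Finnish", 1031, 64.57, 75.15),
    ("Indonesian", 773, 67.89, 79.70),
    ("Korean", 414, 51.29, 61.73),
    ("Russian", 1079, 55.42, 70.08),
    ("Swahili", 596, 74.51, 81.15),
    ("Telugu", 874, 66.21, 79.85),
]


def weighted_mean(rows, column: int) -> float:
    """Sample-weighted mean of the given score column (2 = EM, 3 = F1)."""
    total_samples = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total_samples


weighted_em = weighted_mean(RESULTS, 2)
weighted_f1 = weighted_mean(RESULTS, 3)

if __name__ == "__main__":
    print(f"Weighted EM: {weighted_em:.2f}")
    print(f"Weighted F1: {weighted_f1:.2f}")
```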
## Similar models
You can also try [bert-multi-cased-finedtuned-xquad-tydiqa-goldp](https://huggingface.co/mrm8488/bert-multi-cased-finedtuned-xquad-tydiqa-goldp), which achieves **F1 = 82.16** and **EM = 71.06** (and better per-language scores as well).
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
> Made with <span style="color: #e25555;">♥</span> in Spain