---
tags: 
- question-answering
datasets:
- squad
- anukaver/EstQA
---

# Question answering model for Estonian
This is a question answering model based on the XLM-RoBERTa base model. It was fine-tuned sequentially on:
1. English SQuAD v1.1
2. SQuAD v1.1 translated into Estonian
3. A small native Estonian dataset (800 samples)

The model retains good multilingual properties and can be used for extractive QA in any of the languages covered by XLM-RoBERTa. Performance is best in the two fine-tuning languages, Estonian and English.

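In extractive QA the answer is a span copied from the context, not generated text. A QA model of this kind typically assigns every context token a start score and an end score, and the answer is the highest-scoring valid span. The sketch below shows only that selection step, assuming the start/end logits have already been computed. The logits are made-up numbers (not output of this model), the whitespace "tokens" stand in for real subword tokens, and the masking of question and special tokens that real pipelines perform is omitted. `best_span` and `max_answer_len` are hypothetical names for illustration.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) maximising start_logits[s] + end_logits[e],
    subject to s <= e < s + max_answer_len."""
    best = (0, 0, float("-inf"))
    for s, s_logit in enumerate(start_logits):
        # Only spans that end at or after the start and are not too long are valid.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


if __name__ == "__main__":
    # Hypothetical context for the question "Mis on Eesti pealinn?" ("What is the capital of Estonia?")
    tokens = ["Tallinn", "on", "Eesti", "pealinn", "."]
    start = [3.0, -1.0, 0.5, -0.5, -3.0]  # made-up start logits
    end = [2.0, -1.0, 0.3, 0.5, -2.0]     # made-up end logits
    s, e, score = best_span(start, end)
    print(" ".join(tokens[s:e + 1]))
```

With these illustrative logits the selected span is the single token "Tallinn". The exhaustive double loop is quadratic in context length but bounded by `max_answer_len`, which keeps it cheap for typical QA contexts.
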
| Evaluation set | F1 | EM (exact match) |
| -------------- | --- | --- |
| EstQA test set | 82.4 | 75.3 |
| SQuAD v1.1 dev set | 86.9 | 77.9 |

The Estonian dataset used for fine-tuning and validation is available at https://huggingface.co/datasets/anukaver/EstQA/ (version 1.0).
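The F1 and EM scores in the table above follow the SQuAD v1.1 convention: EM is the percentage of predictions that match a reference answer exactly after normalization, and F1 is the token-overlap F1 between prediction and reference, averaged over questions. The sketch below mirrors the normalization used by the official SQuAD v1.1 evaluation script (lowercasing, removing ASCII punctuation, English articles and extra whitespace). Whether the EstQA scores were computed with exactly this normalization is an assumption; the article removal in particular only makes sense for English. The example answers are hypothetical.

```python
import re
import string
from collections import Counter
from typing import Dict, List

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip ASCII punctuation, English articles and extra whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection: a shared token counts as often as it occurs in both.
    num_same = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: List[str], references: List[List[str]]) -> Dict[str, float]:
    """Take the best score over each question's reference answers, then average (in %)."""
    pairs = list(zip(predictions, references))
    em = sum(max(exact_match(p, r) for r in refs) for p, refs in pairs)
    f1 = sum(max(f1_score(p, r) for r in refs) for p, refs in pairs)
    n = len(pairs)
    return {"exact_match": 100.0 * em / n, "f1": 100.0 * f1 / n}


if __name__ == "__main__":
    preds = ["Tallinn", "Tallinn, Eesti"]  # hypothetical model answers
    refs = [["Tallinn"], ["Tallinn"]]      # hypothetical gold answers
    print(evaluate(preds, refs))
```

For these two hypothetical answers, EM is 50.0 (only the first matches exactly), while F1 is about 83.3, because "Tallinn, Eesti" still earns partial credit (precision 1/2, recall 1). This is why F1 is consistently higher than EM in the table.
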