Question answering model for Estonian

This is a question answering model based on XLM-Roberta base model. It is fine-tuned subsequentially on:

English SQuAD v1.1
SQuAD v1.1 translated into Estonian
Small native Estonian dataset (800 samples)

The model has retained good multilingual properties and can be used for extractive QA tasks in all languages included in XLM-Roberta. The performance is best in the fine-tuning languages of Estonian and English.

Tested on	F1	EM
EstQA test set	82.4	75.3
SQuAD v1.1 dev set	86.9	77.9

The Estonian dataset used for fine-tuning and validating results is available in https://huggingface.co/datasets/anukaver/EstQA/ (version 1.0)

anukaver
/

xlm-roberta-est-qa

Question answering model for Estonian

Datasets used to train anukaver/xlm-roberta-est-qa