## Overview
**Language model:** deepset/roberta-base-squad2-distilled
**Language:** English
**Training data:** SQuAD 2.0 training set
**Eval data:** SQuAD 2.0 dev set
**Infrastructure:** 4x V100 GPUs
**Published:** Dec 8th, 2021
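
As a minimal sketch of how a model like this might be queried, the snippet below wraps the Hugging Face `transformers` question-answering pipeline. It assumes that `transformers` and a backend such as PyTorch are installed and that the model is available on the Hugging Face Hub under the ID listed above. The helper name `answer_question` is hypothetical and only serves the illustration.

```python
# Model ID taken from the "Language model" entry of this card.
MODEL_NAME = "deepset/roberta-base-squad2-distilled"


def answer_question(question, context, model_name=MODEL_NAME):
    """Run extractive question answering over a single context string.

    Hypothetical helper: assumes `transformers` is installed and the model
    can be downloaded from the Hugging Face Hub.
    """
    # Imported lazily so this module can be loaded without transformers.
    from transformers import pipeline

    qa = pipeline("question-answering", model=model_name, tokenizer=model_name)
    # The pipeline returns a dict with "answer", "score", "start" and "end".
    return qa(question=question, context=context)
```

A call such as `answer_question("Which model was the teacher?", "deepset/roberta-large-squad2 was used as the teacher model.")` should return a dictionary whose `"answer"` field holds the extracted span.
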
## Details
- The model was trained with Haystack's distillation feature, using deepset/roberta-large-squad2 as the teacher model.
## Hyperparameters
batch_size = 80
n_epochs = 4
max_seq_len = 384
learning_rate = 3e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 1.5
distillation_loss_weight = 0.75
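
To illustrate what `temperature` and `distillation_loss_weight` typically control, here is a rough sketch of a common soft-target distillation objective in plain Python. It assumes the widely used formulation in which a temperature-softened KL term (scaled by the squared temperature) is mixed with the ordinary cross-entropy on the gold label. This is an assumption about the general technique, not a copy of Haystack's code, and all function names are hypothetical.

```python
import math

TEMPERATURE = 1.5                # value from the hyperparameters above
DISTILLATION_LOSS_WEIGHT = 0.75  # value from the hyperparameters above


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits, target_index):
    """Cross-entropy of the (unsoftened) prediction against the gold index."""
    return -math.log(softmax(logits)[target_index])


def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=TEMPERATURE,
                      weight=DISTILLATION_LOSS_WEIGHT):
    """Weighted mix of a soft (teacher) loss and a hard (label) loss.

    Assumed formulation, not taken from the Haystack source.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by temperature**2 keeps the soft-loss gradients comparable
    # in magnitude across different temperatures.
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_loss = cross_entropy(student_logits, target_index)
    return weight * soft_loss + (1 - weight) * hard_loss
```

Under this formulation, a weight of 0.75 means the student is driven mostly by the teacher's softened predictions and only a quarter by the gold labels. For extractive QA the loss would be applied to both the start and end position logits.
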
## Performance
"exact": 79.8366040596311
"f1": 83.916407079888
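
For readers unfamiliar with these two metrics, the following sketch shows how exact match and token-level F1 are conventionally computed for a single prediction in SQuAD-style evaluation. It follows the usual normalization steps (lowercasing, stripping punctuation and English articles, collapsing whitespace). It is an illustration with hypothetical function names, not the script that produced the numbers above.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # For unanswerable questions both sides are empty: count as a match.
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores are then usually obtained by taking, for each question, the best score over all reference answers, averaging over questions, and multiplying by 100.
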
## Authors
**Timo Möller:** timo.moeller@deepset.ai
**Julian Risch:** julian.risch@deepset.ai
**Malte Pietsch:** malte.pietsch@deepset.ai
**Michel Bartels:** michel.bartels@deepset.ai
