---
license: mit
---

# Distilled-RoBERTa

Distilled-RoBERTa is a distilled [RoBERTa](https://huggingface.co/deepset/roberta-base-squad2-distilled) model, trained on the SQuAD 2.0 training set and then fine-tuned on the [NewsQA](https://huggingface.co/datasets/lucadiliello/newsqa) dataset.

## Hyperparameters

```
batch_size = 16
n_epochs = 3
max_seq_len = 512
learning_rate = 2e-5
optimizer = AdamW
lr_schedule = LinearWarmup
weight_decay = 0.01
embeds_dropout_prob = 0.1
```
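
As a rough guide, the sketch below shows how these hyperparameters could map onto Hugging Face `TrainingArguments` when reproducing the fine-tuning; the output directory, warmup ratio, and dropout handling are assumptions not stated in this card, and dataset preprocessing and `Trainer` setup are omitted.

```python
from transformers import TrainingArguments

# Approximate mapping of the hyperparameters listed above onto TrainingArguments.
# max_seq_len = 512 is applied at tokenization time (max_length=512, truncation=True),
# and embeds_dropout_prob = 0.1 is set on the model config, not here.
training_args = TrainingArguments(
    output_dir="distilled-roberta-newsqa",  # placeholder output path
    per_device_train_batch_size=16,         # batch_size
    num_train_epochs=3,                     # n_epochs
    learning_rate=2e-5,                     # learning_rate (AdamW is the default optimizer)
    lr_scheduler_type="linear",             # LinearWarmup schedule
    warmup_ratio=0.1,                       # assumed warmup fraction, not given in the card
    weight_decay=0.01,                      # weight_decay
)
```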