Distilled-RoBERTa

This model is a RoBERTa model trained on the SQuAD 2.0 training set and then fine-tuned on the NewsQA dataset.

Hyperparameters

batch_size = 16
n_epochs = 3
max_seq_len = 512
learning_rate = 2e-5
optimizer = AdamW
lr_schedule = LinearWarmup
weight_decay = 0.01
embeds_dropout_prob = 0.1
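
The settings above can be collected into a small config object, and the learning-rate schedule can be sketched in plain Python. This is a minimal illustration, not the original training code: it assumes that `LinearWarmup` means a linear ramp from 0 to `learning_rate` followed by a linear decay to 0. The `warmup_proportion` value, the `TrainConfig` class, and both helper functions are hypothetical, since the model card does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values taken from the hyperparameter list in the model card.
    batch_size: int = 16
    n_epochs: int = 3
    max_seq_len: int = 512
    learning_rate: float = 2e-5
    weight_decay: float = 0.01
    embeds_dropout_prob: float = 0.1
    # Assumption: the warmup fraction is NOT given in the model card.
    warmup_proportion: float = 0.1


def total_training_steps(n_examples: int, cfg: TrainConfig) -> int:
    """Number of optimizer steps, assuming one step per batch."""
    steps_per_epoch = -(-n_examples // cfg.batch_size)  # ceiling division
    return steps_per_epoch * cfg.n_epochs


def linear_warmup_lr(step: int, total_steps: int, cfg: TrainConfig) -> float:
    """Learning rate at `step`: linear ramp up, then linear decay to zero."""
    warmup_steps = int(total_steps * cfg.warmup_proportion)
    if step < warmup_steps:
        # Ramp from 0 up to the peak learning rate.
        return cfg.learning_rate * step / max(1, warmup_steps)
    # Decay from the peak learning rate down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return cfg.learning_rate * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    cfg = TrainConfig()
    # Hypothetical dataset size, chosen only to make the numbers easy to read.
    total = total_training_steps(n_examples=1600, cfg=cfg)
    for step in (0, 15, 30, 165, 300):
        print(step, linear_warmup_lr(step, total, cfg))
```

With the assumed 10% warmup, the rate peaks at `learning_rate` once warmup ends and reaches zero at the final step. A different warmup fraction changes only where the peak falls.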


Model size: 124M params
Tensor type: F32 (Safetensors)
Task: Question Answering
