
Distilled-RoBERTa

This is a distilled RoBERTa model, trained on the SQuAD 2.0 training set and then fine-tuned on the NewsQA dataset.
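
The snippet below is a minimal usage sketch for extractive question answering with the `transformers` pipeline. The model identifier is a placeholder, not the actual Hub ID of this checkpoint; substitute the correct repository name or a local path.

```python
from transformers import pipeline

# Placeholder identifier -- replace with this model's actual Hub ID or local path.
qa = pipeline(
    "question-answering",
    model="path/to/distilled-roberta-newsqa",
)

result = qa(
    question="Who wrote the article?",
    context="The article was written by Jane Doe and published last week.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': 'Jane Doe'}
```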

Hyperparameters

batch_size = 16
n_epochs = 3
max_seq_len = 512
learning_rate = 2e-5
optimizer = AdamW
lr_schedule = LinearWarmup
weight_decay = 0.01
embeds_dropout_prob = 0.1
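
As an illustrative sketch only, these hyperparameters could map onto `transformers.TrainingArguments` as shown below. The output directory, warmup fraction, and the handling of `max_seq_len` and `embeds_dropout_prob` are assumptions, not details taken from this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilled-roberta-newsqa",  # assumed output path
    per_device_train_batch_size=16,         # batch_size = 16
    num_train_epochs=3,                     # n_epochs = 3
    learning_rate=2e-5,                     # learning_rate = 2e-5
    weight_decay=0.01,                      # weight_decay = 0.01
    lr_scheduler_type="linear",             # lr_schedule = LinearWarmup
    warmup_ratio=0.1,                       # warmup fraction assumed, not stated on the card
)
# AdamW is the Trainer's default optimizer. max_seq_len = 512 is applied at
# tokenization time, and embeds_dropout_prob = 0.1 in the model config,
# not via TrainingArguments.
```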
Model size: 124M parameters (F32 tensors, Safetensors format)