roberta-base distilled into tinyroberta

Overview

Language model: roberta-base
Language: English
Training data: The PILE
Infrastructure: 4x Tesla v100

Hyperparameters

batch_size = 96
n_epochs = 4
max_seq_len = 384
learning_rate = 1e-4
lr_schedule = LinearWarmup
warmup_proportion = 0.2
teacher = "deepset/roberta-base"

Distillation

This model was distilled using the TinyBERT approach described in this paper and implemented in haystack. We have performed intermediate layer distillation with roberta-base as the teacher which resulted in deepset/tinyroberta-6l-768d. This model has not been distilled for any specific task. If you are interested in using distillation to improve its performance on a downstream task, you can take advantage of haystack's new distillation functionality. You can also check out deepset/tinyroberta-squad2 for a model that is already distilled on an extractive QA downstream task.

About us

deepset is the company behind the production-ready open-source AI framework Haystack.

Some of our other work:

Get in touch and join the Haystack community

For more info on Haystack, visit our GitHub repo and Documentation.

We also have a Discord community open to everyone!

Twitter | LinkedIn | Discord | GitHub Discussions | Website | YouTube

By the way: we're hiring!

Downloads last month
32
Safetensors
Model size
81.5M params
Tensor type
I64
·
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for deepset/tinyroberta-6l-768d

Finetunes
7 models

Dataset used to train deepset/tinyroberta-6l-768d