Edit model card



Language model: tinyroberta-squad2
Language: English
Training data: The PILE
Infrastructure: 4x Tesla v100


batch_size = 96
n_epochs = 4
base_LM_model = "deepset/tinyroberta-squad2-step1"
max_seq_len = 384
learning_rate = 1e-4
lr_schedule = LinearWarmup
warmup_proportion = 0.2
teacher = "deepset/roberta-base"


This model was distilled using the TinyBERT approach described in this paper and implemented in haystack. We have performed intermediate layer distillation with roberta-base as the teacher which resulted in deepset/tinyroberta-6l-768d. This model has not been distilled for any specific task. If you are interested in using distillation to improve its performance on a downstream task, you can take advantage of haystack's new distillation functionality. You can also check out deepset/tinyroberta-squad2 for a model that is already distilled on an extractive QA downstream task.


In Transformers

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/tinyroberta-squad2"

model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.tokenization import Tokenizer
from farm.infer import Inferencer

model_name = "deepset/tinyroberta-squad2"
model = AdaptiveModel.convert_from_transformers(model_name, device="cpu", task_type="question_answering")
tokenizer = Tokenizer.load(model_name)

In haystack

For doing QA at scale (i.e. many docs instead of single paragraph), you can load the model also in haystack:

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
# or 
reader = TransformersReader(model_name_or_path="deepset/roberta-base-squad2",tokenizer="deepset/roberta-base-squad2")


Branden Chan: branden.chan [at] deepset.ai Timo M枚ller: timo.moeller [at] deepset.ai Malte Pietsch: malte.pietsch [at] deepset.ai Tanay Soni: tanay.soni [at] deepset.ai Michel Bartels: michel.bartels [at] deepset.ai

About us

deepset logo We bring NLP to the industry via open source!
Our focus: Industry specific language models & large scale QA systems.

Some of our work:

Get in touch: Twitter | LinkedIn | Slack | GitHub Discussions | Website

By the way: we're hiring!

Downloads last month
Model size
81.5M params
Tensor type

Dataset used to train deepset/tinyroberta-6l-768d