This is roberta-base, not distilled

#4
by bilalghanem

Hi
I found that the size of this model is the same as the xlm-roberta-base model.
What do you mean by "distilled" in the name of this model?

This model was distilled from the deepset/xlm-roberta-large-squad2 model.

The number of params is the same as xlm-roberta-base!
Both have 278,084,405 parameters.

Whereas, on the other hand, the monolingual RoBERTa and its distilled version have:
RoBERTa: 124,686,389
DistilRoBERTa: 82,159,157
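
(For reference, the counts above come from summing parameter tensors with transformers; a minimal sketch, assuming the public Hub model IDs below, with exact totals possibly differing slightly depending on which head class is loaded:)

```python
# Minimal sketch: count parameters of the multilingual and monolingual checkpoints.
from transformers import AutoModel

for name in [
    "xlm-roberta-base",
    "deepset/xlm-roberta-base-squad2-distilled",
    "roberta-base",
    "distilroberta-base",
]:
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params:,} parameters")
```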

Am I missing something here?

Hi @bilalghanem, I believe there is some confusion.

This model was distilled from xlm-roberta-large-squad2, which is the same size as the xlm-roberta-large model:
xlm-roberta-large-squad2: PyTorch weights are 2.24 GB
xlm-roberta-large: PyTorch weights are 2.24 GB

So this model (xlm-roberta-base-squad2-distilled) was made by distilling the large model into a base model, so we expect it to be the same size as the xlm-roberta-base model:
xlm-roberta-base-squad2-distilled: PyTorch weights are 1.11 GB
xlm-roberta-base-squad2: PyTorch weights are 1.11 GB
xlm-roberta-base: PyTorch weights are 1.12 GB
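
(If it helps, you can also confirm the architectures without downloading any weights by comparing the configs; a minimal sketch, assuming the Hub model IDs below. The distilled model should share the base architecture's depth and width, while the large teacher is deeper and wider.)

```python
# Minimal sketch: compare architecture sizes via the model configs only (no weight download).
from transformers import AutoConfig

for name in [
    "xlm-roberta-base",
    "deepset/xlm-roberta-base-squad2-distilled",
    "xlm-roberta-large",
]:
    cfg = AutoConfig.from_pretrained(name)
    print(f"{name}: {cfg.num_hidden_layers} layers, hidden size {cfg.hidden_size}")
```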

Let me know if this answers your question. You can learn more about model distillation on our blog: Model Distillation with Haystack
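
For a rough idea of what the distillation step optimizes, here is a generic sketch of a soft-target (KL-divergence) distillation loss in PyTorch. This is only an illustration of the idea, not Haystack's actual implementation, and the temperature value is just an example.

```python
# Generic sketch of a knowledge-distillation loss (soft targets via KL divergence).
# NOT Haystack's implementation; just the idea of pushing a student's output
# distribution toward a larger teacher's.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then measure the KL
    # divergence from teacher to student; scale by T^2 to keep gradient
    # magnitudes comparable across temperatures.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2
```

In the extractive QA case the logits would be the start/end-position logits, and this soft loss is typically combined with the usual supervised loss on the SQuAD labels.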

I see, thanks for the clarification!

Then I'd suggest naming it deepset/xlm-roberta-large-squad2-distilled, not base.

Because the way models are named on Hugging Face is different: e.g., distilbert-base-uncased is a distilled model of bert-base-uncased, not bert-large-uncased.

Thanks!
