julien-c (HF staff) committed
Commit a547ce1
1 Parent(s): 3124ec8

Migrate model card from transformers-repo


Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/distilroberta-base-README.md

Files changed (1)
  1. README.md (+50, −0)
README.md ADDED
@@ -0,0 +1,50 @@
---
language: en
tags:
- exbert

license: apache-2.0
datasets:
- openwebtext
---

# DistilRoBERTa base model

This model is a distilled version of the [RoBERTa-base model](https://huggingface.co/roberta-base). It follows the same training procedure as [DistilBERT](https://huggingface.co/distilbert-base-uncased).
The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/distillation).
This model is case-sensitive: it makes a difference between english and English.

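As a minimal usage sketch (assuming the `transformers` library is installed; the example sentence is purely illustrative), the checkpoint can be tried out with the `fill-mask` pipeline:

```python
from transformers import pipeline

# Load the fill-mask pipeline backed by this checkpoint
unmasker = pipeline("fill-mask", model="distilroberta-base")

# RoBERTa-style tokenizers use <mask> as the mask token
print(unmasker("Hello I'm a <mask> model."))
```
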
The model has 6 layers, a hidden dimension of 768 and 12 attention heads, for a total of 82M parameters (compared to 125M parameters for RoBERTa-base).
On average, DistilRoBERTa is twice as fast as RoBERTa-base.

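These figures can be checked locally; a small sketch, assuming `transformers` with a PyTorch backend is available:

```python
from transformers import AutoConfig, AutoModel

# Architecture hyperparameters are stored in the checkpoint's config
config = AutoConfig.from_pretrained("distilroberta-base")
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)  # 6 768 12

# Summing parameter tensor sizes gives the total parameter count (~82M)
model = AutoModel.from_pretrained("distilroberta-base")
print(sum(p.numel() for p in model.parameters()))
```
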
We encourage you to check out the [RoBERTa-base model card](https://huggingface.co/roberta-base) to learn more about usage, limitations and potential biases.

## Training data

DistilRoBERTa was pre-trained on [OpenWebTextCorpus](https://skylion007.github.io/OpenWebTextCorpus/), a reproduction of OpenAI's WebText dataset, which amounts to roughly 4 times less training data than was used for the teacher model, RoBERTa.

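A sketch of how the corpus could be inspected with the `datasets` library, using the `openwebtext` dataset ID listed in the metadata above (note that the full corpus is large to download and prepare):

```python
from datasets import load_dataset

# Downloads and prepares the full corpus, then prints the start of one document
dataset = load_dataset("openwebtext", split="train")
print(dataset)
print(dataset[0]["text"][:200])
```
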
## Evaluation results

When fine-tuned on downstream tasks, this model achieves the following results:

GLUE test results:

| Task  | MNLI | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  |
|:-----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
| Score | 84.0 | 89.4 | 90.8 | 92.5  | 59.3 | 88.3  | 86.6 | 67.9 |

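These numbers come from task-specific fine-tuning rather than the raw checkpoint. A sketch of the starting point for such a fine-tune (the label count and example text are illustrative only, not the setup used for the table above):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
# e.g. SST-2 is a binary sentiment task, so the classification head gets 2 labels
model = AutoModelForSequenceClassification.from_pretrained("distilroberta-base", num_labels=2)

inputs = tokenizer("a charming and often affecting journey", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2]): the head is randomly initialized until fine-tuning
```
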
### BibTeX entry and citation info

```bibtex
@article{Sanh2019DistilBERTAD,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.01108}
}
```

<a href="https://huggingface.co/exbert/?model=distilroberta-base">
	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>