sgugger Marissa committed on
Commit
6045845
1 Parent(s): 1a01b38

Update model card (#1)

- Update model card (f252510342164b319c7a3fb889da271261df12c5)
- Update README.md (52e8c5edc83669797309ee50add628ed350d8b50)


Co-authored-by: Marissa Gerchick <Marissa@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +104 -4
README.md CHANGED
@@ -5,16 +5,72 @@ datasets:
5
  - wikipedia
6
  ---
7
 
8
- # DistilBERT base multilingual model (cased)
9
 
10
  This model is a distilled version of the [BERT base multilingual model](bert-base-multilingual-cased). The code for the distillation process can be found
11
  [here](https://github.com/huggingface/transformers/tree/master/examples/distillation). This model is cased: it does make a difference between english and English.
12
 
13
  The model is trained on the concatenation of Wikipedia in 104 different languages listed [here](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages).
14
  The model has 6 layers, a hidden dimension of 768, and 12 heads, totaling 134M parameters (compared to 177M parameters for mBERT-base).
15
- On average DistilmBERT is twice as fast as mBERT-base.
16
 
17
- We encourage to check [BERT base multilingual model](bert-base-multilingual-cased) to know more about usage, limitations and potential biases.
18
 
19
  | Model | English | Spanish | Chinese | German | Arabic | Urdu |
20
  | :---: | :---: | :---: | :---: | :---: | :---: | :---:|
@@ -22,7 +78,17 @@ We encourage to check [BERT base multilingual model](bert-base-multilingual-case
22
  | mBERT base uncased (reported)| 81.4 | 74.3 | 63.8 | 70.5 | 62.1 | 58.3 |
23
  | DistilmBERT | 78.2 | 69.1 | 64.0 | 66.3 | 59.1 | 54.7 |
24
 
25
- ### BibTeX entry and citation info
26
 
27
  ```bibtex
28
  @article{Sanh2019DistilBERTAD,
@@ -33,3 +99,37 @@ We encourage to check [BERT base multilingual model](bert-base-multilingual-case
33
  volume={abs/1910.01108}
34
  }
35
  ```
 
5
  - wikipedia
6
  ---
7
 
8
+ # Model Card for DistilBERT base multilingual (cased)
9
+
10
+ # Table of Contents
11
+
12
+ 1. [Model Details](#model-details)
13
+ 2. [Uses](#uses)
14
+ 3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
15
+ 4. [Training Details](#training-details)
16
+ 5. [Evaluation](#evaluation)
17
+ 6. [Environmental Impact](#environmental-impact)
18
+ 7. [Citation](#citation)
19
+ 8. [How to Get Started With the Model](#how-to-get-started-with-the-model)
20
+
21
+ # Model Details
22
+
23
+ ## Model Description
24
 
25
  This model is a distilled version of the [BERT base multilingual model](bert-base-multilingual-cased). The code for the distillation process can be found
26
  [here](https://github.com/huggingface/transformers/tree/master/examples/distillation). This model is cased: it does make a difference between english and English.
27
 
28
  The model is trained on the concatenation of Wikipedia in 104 different languages listed [here](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages).
29
  The model has 6 layers, a hidden dimension of 768, and 12 heads, totaling 134M parameters (compared to 177M parameters for mBERT-base).
30
+ On average, this model, referred to as DistilmBERT, is twice as fast as mBERT-base.
31
+
32
+ We encourage potential users of this model to check out the [BERT base multilingual model card](https://huggingface.co/bert-base-multilingual-cased) to learn more about usage, limitations and potential biases.
33
+
34
+ - **Developed by:** Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face)
35
+ - **Model type:** Transformer-based language model
36
+ - **Language(s) (NLP):** 104 languages; see full list [here](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages)
37
+ - **License:** Apache 2.0
38
+ - **Related Models:** [BERT base multilingual model](https://huggingface.co/bert-base-multilingual-cased)
39
+ - **Resources for more information:**
40
+ - [GitHub Repository](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md)
41
+ - [Associated Paper](https://arxiv.org/abs/1910.01108)
42
+
43
+ # Uses
44
+
45
+ ## Direct Use and Downstream Use
46
+
47
+ You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=bert) to look for fine-tuned versions on a task that interests you.
48
+
49
+ Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at a model like GPT-2.
50
+
51
+ ## Out of Scope Use
52
+
53
+ The model should not be used to intentionally create hostile or alienating environments for people. It was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for this model.
54
+
55
+ # Bias, Risks, and Limitations
56
+
57
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
58
+
59
+ ## Recommendations
60
 
61
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
62
+
63
+ # Training Details
64
+
65
+ - The model was pretrained with the supervision of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) on the concatenation of Wikipedia in 104 different languages.
66
+ - The model has 6 layers, a hidden dimension of 768, and 12 heads, totaling 134M parameters.
67
+ - Further information about the training procedure and data is included in the [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) model card.
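
The 134M figure can be roughly reproduced from the architecture numbers in the list above. The sketch below is a back-of-the-envelope count, not code from the distillation repository. The vocabulary size (119,547), maximum sequence length (512), feed-forward width (3,072), and the 12-layer depth of mBERT-base are assumptions that this card does not state, and `count_params` is a hypothetical helper name.

```python
# Assumed values (not stated in this card): vocabulary size, maximum
# number of positions, and feed-forward width of the multilingual cased model.
VOCAB_SIZE = 119_547
MAX_POSITIONS = 512
HIDDEN = 768
FFN = 3_072


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of one dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of one LayerNorm."""
    return 2 * dim


def count_params(n_layers: int, bert_extras: bool = False) -> int:
    """Hypothetical helper: encoder parameters, without any task head."""
    embeddings = VOCAB_SIZE * HIDDEN + MAX_POSITIONS * HIDDEN + layer_norm(HIDDEN)
    # Query, key, value, and output projections, followed by a LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    # Two dense layers (expand, then project back), followed by a LayerNorm.
    feed_forward = linear(HIDDEN, FFN) + linear(FFN, HIDDEN) + layer_norm(HIDDEN)
    total = embeddings + n_layers * (attention + feed_forward)
    if bert_extras:
        # BERT also carries segment (token type) embeddings and a pooler layer.
        total += 2 * HIDDEN + linear(HIDDEN, HIDDEN)
    return total


distil = count_params(n_layers=6)
mbert = count_params(n_layers=12, bert_extras=True)  # 12 layers is an assumption
print(f"DistilmBERT: {distil / 1e6:.1f}M, mBERT-base: {mbert / 1e6:.1f}M")
```

Under these assumptions the count comes to about 134.7M and 177.9M, close to the 134M and 177M quoted in this card. Most of the total sits in the shared token embedding matrix, which is why halving the number of layers removes only about a quarter of the parameters.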
68
+
69
+ # Evaluation
70
+
71
+ The model developers report the following accuracy results for DistilmBERT (see [GitHub Repo](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md)):
72
+
73
+ > Here are the results on the test sets for 6 of the languages available in XNLI. The results are computed in the zero shot setting (trained on the English portion and evaluated on the target language portion):
74
 
75
  | Model | English | Spanish | Chinese | German | Arabic | Urdu |
76
  | :---: | :---: | :---: | :---: | :---: | :---: | :---:|
 
78
  | mBERT base uncased (reported)| 81.4 | 74.3 | 63.8 | 70.5 | 62.1 | 58.3 |
79
  | DistilmBERT | 78.2 | 69.1 | 64.0 | 66.3 | 59.1 | 54.7 |
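
To make the two rows shown above easier to compare, the per-language gap can be computed directly. This is only arithmetic on the numbers in the table. The reference row is the uncased mBERT result, so the comparison with the cased distilled model is indicative, not exact.

```python
languages = ["English", "Spanish", "Chinese", "German", "Arabic", "Urdu"]
mbert_uncased = [81.4, 74.3, 63.8, 70.5, 62.1, 58.3]  # "mBERT base uncased (reported)"
distilmbert = [78.2, 69.1, 64.0, 66.3, 59.1, 54.7]    # "DistilmBERT"

# Positive gap: the reference model is ahead; negative: DistilmBERT is ahead.
gaps = {
    lang: round(ref - student, 1)
    for lang, ref, student in zip(languages, mbert_uncased, distilmbert)
}
mean_gap = sum(gaps.values()) / len(gaps)

for lang, gap in gaps.items():
    print(f"{lang:8s} {gap:+.1f}")
print(f"mean gap: {mean_gap:.2f} accuracy points")
```

On these six languages the distilled model trails by roughly three accuracy points on average, with Chinese as the one language where it is marginally ahead.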
80
 
81
+ # Environmental Impact
82
+
83
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
84
+
85
+ - **Hardware Type:** More information needed
86
+ - **Hours used:** More information needed
87
+ - **Cloud Provider:** More information needed
88
+ - **Compute Region:** More information needed
89
+ - **Carbon Emitted:** More information needed
90
+
91
+ # Citation
92
 
93
  ```bibtex
94
  @article{Sanh2019DistilBERTAD,
 
99
  volume={abs/1910.01108}
100
  }
101
  ```
102
+
103
+ APA
104
+ - Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
105
+
106
+ # How to Get Started With the Model
107
+
108
+ You can use the model directly with a pipeline for masked language modeling:
109
+
+ ```python
+ >>> from transformers import pipeline
+ >>> unmasker = pipeline('fill-mask', model='distilbert-base-multilingual-cased')
+ >>> unmasker("Hello I'm a [MASK] model.")
+
+ [{'score': 0.040800247341394424,
+ 'sequence': "Hello I'm a virtual model.",
+ 'token': 37859,
+ 'token_str': 'virtual'},
+ {'score': 0.020015988498926163,
+ 'sequence': "Hello I'm a big model.",
+ 'token': 22185,
+ 'token_str': 'big'},
+ {'score': 0.018680453300476074,
+ 'sequence': "Hello I'm a Hello model.",
+ 'token': 31178,
+ 'token_str': 'Hello'},
+ {'score': 0.017396586015820503,
+ 'sequence': "Hello I'm a model model.",
+ 'token': 13192,
+ 'token_str': 'model'},
+ {'score': 0.014229810796678066,
+ 'sequence': "Hello I'm a perfect model.",
+ 'token': 43477,
+ 'token_str': 'perfect'}]
+ ```
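
The pipeline returns a plain list of dictionaries, so the predictions can be post-processed without further library calls. The sketch below uses the first three results printed above as literal data, so nothing is downloaded. `top_tokens` is a hypothetical helper name, not part of `transformers`.

```python
# First three entries of the pipeline output shown above, copied as literal data.
predictions = [
    {"score": 0.040800247341394424, "sequence": "Hello I'm a virtual model.",
     "token": 37859, "token_str": "virtual"},
    {"score": 0.020015988498926163, "sequence": "Hello I'm a big model.",
     "token": 22185, "token_str": "big"},
    {"score": 0.018680453300476074, "sequence": "Hello I'm a Hello model.",
     "token": 31178, "token_str": "Hello"},
]


def top_tokens(preds, min_score=0.0):
    """Hypothetical helper: (token_str, score) pairs at or above min_score, best first."""
    ranked = sorted(preds, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], p["score"]) for p in ranked if p["score"] >= min_score]


for token, score in top_tokens(predictions, min_score=0.02):
    print(f"{token}: {score:.3f}")
```

The scores here are low in absolute terms (the best candidate is at about 4%), so a threshold such as `min_score` is mainly useful for discarding the long tail, not for judging whether a completion is correct.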