Marissa committed on
Commit
8170ea6
1 Parent(s): ce151e2

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -155,7 +155,9 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 # Training
 
-This model is the XLM model trained on Wikipedia text in 100 languages. The preprocessing included tokenization and byte-pair-encoding. See the [GitHub repo](https://github.com/facebookresearch/XLM#the-17-and-100-languages) and the [associated paper](https://arxiv.org/pdf/1911.02116.pdf) for further details on the training data and training procedure.
+This model is the XLM model trained on Wikipedia text in 100 languages. The preprocessing included tokenization with byte-pair-encoding. See the [GitHub repo](https://github.com/facebookresearch/XLM#the-17-and-100-languages) and the [associated paper](https://arxiv.org/pdf/1911.02116.pdf) for further details on the training data and training procedure.
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
 
 # Evaluation
 
@@ -183,6 +185,10 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Compute Region:** More information needed
 - **Carbon Emitted:** More information needed
 
+# Technical Specifications
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
+
 # Citation
 
 **BibTeX:**
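
For readers who want to sanity-check the figures added in the new Technical Specifications section, here is a minimal sketch using the Hugging Face `transformers` library. It is not part of the model card itself: the checkpoint name `xlm-mlm-100-1280` is an assumption based on the description above (XLM, masked language modeling, 100 languages, 1280 hidden states), and the attribute names are those of the `XLMConfig` class.

```python
# Minimal sketch (not from the model card): load the checkpoint and compare
# its configuration with the values quoted in the Technical Specifications.
# Assumption: the card describes the "xlm-mlm-100-1280" checkpoint.
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_id = "xlm-mlm-100-1280"  # assumed checkpoint name

# Values reported in the card: 16 layers, 1280 hidden states,
# 16 attention heads, 200k vocabulary.
config = AutoConfig.from_pretrained(model_id)
print(config.n_layers, config.emb_dim, config.n_heads, config.vocab_size)

# BPE tokenization followed by a forward pass through the encoder.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch size, sequence length, 1280)
```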