hassan4830 committed
Commit 6bbd573
Parent(s): 285fa31
Update README.md

README.md CHANGED
@@ -14,23 +14,11 @@ This [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) text classifica
 
 ## Model description
 
-
-
-
-
-
-
-- Distillation loss: the model was trained to return the same probabilities as the BERT base model.
-- Masked language modeling (MLM): this is part of the original training loss of the BERT base model. When taking a
-  sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the
-  model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that
-  usually see the words one after the other, or from autoregressive models like GPT which internally mask the future
-  tokens. It allows the model to learn a bidirectional representation of the sentence.
-- Cosine embedding loss: the model was also trained to generate hidden states as close as possible as the BERT base
-  model.
-
-This way, the model learns the same inner representation of the English language than its teacher model, while being
-faster for inference or downstream tasks.
+XLM-RoBERTa is a scaled cross-lingual sentence encoder. It is trained on 2.5TB of filtered Common Crawl data covering 100 languages. XLM-R achieves state-of-the-art results on multiple cross-lingual benchmarks.
+
+The XLM-RoBERTa model was proposed in *Unsupervised Cross-lingual Representation Learning at Scale* by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov.
+
+It is based on Facebook’s RoBERTa model released in 2019. It is a large multilingual language model, trained on 2.5TB of filtered CommonCrawl data.
 
 ## Intended uses & limitations
 
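Since the updated model card describes an XLM-RoBERTa-based text classification model, a minimal usage sketch may help. This is an illustration only: it assumes the `transformers` pipeline API, and `MODEL_ID` is a placeholder (shown here with the `xlm-roberta-base` base encoder, since the actual fine-tuned repository id is not given in this diff).

```python
# Minimal usage sketch for an XLM-RoBERTa-based text classifier.
# Assumption: MODEL_ID is a placeholder; replace it with the Hub id of the
# fine-tuned checkpoint this README describes. "xlm-roberta-base" is only the
# base encoder, so loading it directly yields an untrained classification head.
from transformers import pipeline

MODEL_ID = "xlm-roberta-base"  # placeholder; substitute the fine-tuned repo id

classifier = pipeline("text-classification", model=MODEL_ID)

# XLM-R was pretrained on roughly 100 languages, so inputs need not be English.
print(classifier("Ce film était vraiment excellent."))
```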