hassan4830 committed on
Commit
6bbd573
1 Parent(s): 285fa31

Update README.md

Files changed (1)
  1. README.md +5 -17
README.md CHANGED
@@ -14,23 +14,11 @@ This [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) text classifica
 
 ## Model description
 
- DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a
- self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only,
- with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic
- process to generate inputs and labels from those texts using the BERT base model. More precisely, it was pretrained
- with three objectives:
-
- - Distillation loss: the model was trained to return the same probabilities as the BERT base model.
- - Masked language modeling (MLM): this is part of the original training loss of the BERT base model. When taking a
- sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the
- model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that
- usually see the words one after the other, or from autoregressive models like GPT which internally mask the future
- tokens. It allows the model to learn a bidirectional representation of the sentence.
- - Cosine embedding loss: the model was also trained to generate hidden states as close as possible as the BERT base
- model.
-
- This way, the model learns the same inner representation of the English language than its teacher model, while being
- faster for inference or downstream tasks.
+ XLM-RoBERTa is a large-scale cross-lingual sentence encoder, trained on 2.5 TB of filtered CommonCrawl data covering 100 languages. XLM-R achieves state-of-the-art results on multiple cross-lingual benchmarks.
+
+ The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov.
+
+ It is based on Facebook’s RoBERTa model released in 2019, scaled up into a large multilingual language model.
 
 ## Intended uses & limitations
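
A minimal usage sketch for a text classifier fine-tuned from xlm-roberta-base, such as the model this card describes, assuming the standard Hugging Face `transformers` pipeline API; the model id below is a placeholder, since the repository id is not shown in this hunk:

```python
from transformers import pipeline

# Placeholder id: replace with this repository's actual model id.
model_id = "your-username/your-xlm-roberta-classifier"

# Build a text-classification pipeline around the fine-tuned checkpoint.
classifier = pipeline("text-classification", model=model_id)

# Returns a list of {"label": ..., "score": ...} dicts, one per input.
print(classifier("Your input text goes here."))
```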