Commit 4626758
Parent(s): 7570fd6

Update README.md

README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-model_creators:
+model_creators:
 - Leonardo Zilio, Hadeel Saadany, Prashant Sharma, Diptesh Kanojia, Constantin Orasan
 license: mit
 tags:
@@ -51,7 +51,19 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-
+RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means
+it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
+publicly available data), with an automatic process to generate inputs and labels from those texts.
+
+More precisely, it was pretrained with the masked language modeling (MLM) objective. Taking a sentence, the model
+randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict
+the masked words. This is different from traditional recurrent neural networks (RNNs), which usually see the words one
+after the other, and from autoregressive models like GPT, which internally mask the future tokens. It allows the model to
+learn a bidirectional representation of the sentence.
+
+This way, the model learns an inner representation of the English language that can then be used to extract features
+useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard
+classifier using the features produced by the RoBERTa model as inputs.
 
 ## Intended uses & limitations
 
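The added description mentions the MLM pretraining objective and using the pretrained encoder's outputs as features for a downstream classifier. Below is a minimal sketch of both ideas, not part of the commit, assuming the Hugging Face `transformers` library; the checkpoint id `roberta-base` is a placeholder and should be replaced with this repository's own model id.

```python
# Minimal sketch (not part of the commit): illustrates the MLM objective and the
# feature-extraction workflow described in the model description above.
import torch
from transformers import AutoModel, AutoTokenizer, pipeline

model_id = "roberta-base"  # placeholder checkpoint, assumed for illustration

# 1) The MLM pretraining objective: the model predicts a masked token.
fill_mask = pipeline("fill-mask", model=model_id)
print(fill_mask("The model has to predict the <mask> words."))

# 2) Using the pretrained encoder as a feature extractor for a downstream classifier.
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

inputs = tokenizer("A labelled sentence from your dataset.", return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# Mean-pool the token embeddings into one fixed-size vector per sentence; these
# vectors can be fed to any standard classifier (e.g. logistic regression).
features = outputs.last_hidden_state.mean(dim=1)
print(features.shape)  # torch.Size([1, hidden_size])
```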