joanllop committed
Commit f20520c
1 Parent(s): 61093f1

Update README.md

Files changed (1)
  1. README.md +91 -0
README.md CHANGED
You can use the raw model for fill mask or fine-tune it to a downstream task.

## How to use

Here is how to use this model:

```python
>>> from transformers import pipeline
>>> from pprint import pprint
>>> unmasker = pipeline('fill-mask', model='PlanTL-GOB-ES/roberta-base-bne')
>>> pprint(unmasker("Gracias a los datos de la BNE se ha podido <mask> este modelo del lenguaje."))
[{'score': 0.08422081917524338,
  'token': 3832,
  'token_str': ' desarrollar',
  'sequence': 'Gracias a los datos de la BNE se ha podido desarrollar este modelo del lenguaje.'},
 {'score': 0.06348305940628052,
  'token': 3078,
  'token_str': ' crear',
  'sequence': 'Gracias a los datos de la BNE se ha podido crear este modelo del lenguaje.'},
 {'score': 0.06148449331521988,
  'token': 2171,
  'token_str': ' realizar',
  'sequence': 'Gracias a los datos de la BNE se ha podido realizar este modelo del lenguaje.'},
 {'score': 0.056218471378088,
  'token': 10880,
  'token_str': ' elaborar',
  'sequence': 'Gracias a los datos de la BNE se ha podido elaborar este modelo del lenguaje.'},
 {'score': 0.05133328214287758,
  'token': 31915,
  'token_str': ' validar',
  'sequence': 'Gracias a los datos de la BNE se ha podido validar este modelo del lenguaje.'}]
```
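
By default the fill-mask pipeline returns its five highest-scoring candidates, which is what the output above shows. If you only need a few, recent versions of transformers let you pass a `top_k` argument to the call; a small variation on the example above:

```python
>>> # Same prompt as above, limited to the top 3 candidates (top_k is assumed
>>> # to be supported by your installed transformers version).
>>> pprint(unmasker("Gracias a los datos de la BNE se ha podido <mask> este modelo del lenguaje.", top_k=3))
```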

Here is how to use this model to get the features of a given text in PyTorch:

```python
>>> from transformers import RobertaTokenizer, RobertaModel
>>> tokenizer = RobertaTokenizer.from_pretrained('PlanTL-GOB-ES/roberta-base-bne')
>>> model = RobertaModel.from_pretrained('PlanTL-GOB-ES/roberta-base-bne')
>>> text = "Gracias a los datos de la BNE se ha podido desarrollar este modelo del lenguaje."
>>> encoded_input = tokenizer(text, return_tensors='pt')
>>> output = model(**encoded_input)
>>> print(output.last_hidden_state.shape)
torch.Size([1, 19, 768])
```
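
The intro above also mentions fine-tuning the raw model for a downstream task. The snippet below is a minimal illustrative sketch, not part of the original card: it assumes a binary text-classification setup, the example sentences and labels are hypothetical placeholders, and a real setup would iterate over a proper dataset with an optimizer (or use `Trainer`).

```python
>>> import torch
>>> from transformers import RobertaTokenizer, RobertaForSequenceClassification
>>> tokenizer = RobertaTokenizer.from_pretrained('PlanTL-GOB-ES/roberta-base-bne')
>>> # A fresh classification head is added on top of the pretrained encoder.
>>> model = RobertaForSequenceClassification.from_pretrained('PlanTL-GOB-ES/roberta-base-bne', num_labels=2)
>>> texts = ["Me ha encantado esta película.", "El servicio fue muy lento."]  # hypothetical examples
>>> labels = torch.tensor([1, 0])                                             # hypothetical labels
>>> batch = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
>>> outputs = model(**batch, labels=labels)
>>> outputs.loss.backward()  # one backward pass; wrap in an optimizer loop to actually fine-tune
```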

## Limitations and bias

At the time of submission, no measures had been taken to estimate the bias and toxicity embedded in the model. However, we are well aware that our models may be biased, since the corpora were collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and this model card will be updated if that research is completed. In the meantime, here is an example of how the model can produce biased predictions:

```python
>>> from transformers import pipeline, set_seed
>>> from pprint import pprint
>>> unmasker = pipeline('fill-mask', model='PlanTL-GOB-ES/roberta-base-bne')
>>> set_seed(42)
>>> pprint(unmasker("Antonio está pensando en <mask>."))
[{'score': 0.07950365543365479,
  'sequence': 'Antonio está pensando en ti.',
  'token': 486,
  'token_str': ' ti'},
 {'score': 0.03375273942947388,
  'sequence': 'Antonio está pensando en irse.',
  'token': 13134,
  'token_str': ' irse'},
 {'score': 0.031026942655444145,
  'sequence': 'Antonio está pensando en casarse.',
  'token': 24852,
  'token_str': ' casarse'},
 {'score': 0.030703715980052948,
  'sequence': 'Antonio está pensando en todo.',
  'token': 665,
  'token_str': ' todo'},
 {'score': 0.02838558703660965,
  'sequence': 'Antonio está pensando en ello.',
  'token': 1577,
  'token_str': ' ello'}]

>>> set_seed(42)
>>> pprint(unmasker("Mohammed está pensando en <mask>."))
[{'score': 0.05433618649840355,
  'sequence': 'Mohammed está pensando en morir.',
  'token': 9459,
  'token_str': ' morir'},
 {'score': 0.0400255024433136,
  'sequence': 'Mohammed está pensando en irse.',
  'token': 13134,
  'token_str': ' irse'},
 {'score': 0.03705748915672302,
  'sequence': 'Mohammed está pensando en todo.',
  'token': 665,
  'token_str': ' todo'},
 {'score': 0.03658654913306236,
  'sequence': 'Mohammed está pensando en quedarse.',
  'token': 9331,
  'token_str': ' quedarse'},
 {'score': 0.03329474478960037,
  'sequence': 'Mohammed está pensando en ello.',
  'token': 1577,
  'token_str': ' ello'}]
```
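
As a purely illustrative follow-up, not taken from the original card, one quick way to eyeball such differences is to run the same template over several first names and compare the top predictions; the name list below is arbitrary:

```python
>>> # Hypothetical probe: same sentence template, different (arbitrarily chosen) names.
>>> for name in ["Antonio", "Mohammed", "María", "Fatima"]:
...     preds = unmasker(f"{name} está pensando en <mask>.")
...     print(name, [p['token_str'].strip() for p in preds])
```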

## Training