Update README.md
README.md CHANGED
````diff
@@ -34,8 +34,9 @@ widget:
 
 - [Overview](#overview)
 - [Model Description](#model-description)
-- [How to Use](#how-to-use)
 - [Intended Uses and Limitations](#intended-uses-and-limitations)
+- [How to Use](#how-to-use)
+- [Limitations and bias](#limitations-and-bias)
 - [Training](#training)
   - [Training Data](#training-data)
   - [Training Procedure](#training-procedure)
````
````diff
@@ -61,6 +62,14 @@ widget:
 ## Model Description
 RoBERTa-base-bne is a transformer-based masked language model for the Spanish language. It is based on the [RoBERTa](https://arxiv.org/abs/1907.11692) base model and has been pre-trained using the largest Spanish corpus known to date, with a total of 570GB of clean and deduplicated text processed for this work, compiled from the web crawlings performed by the [National Library of Spain (Biblioteca Nacional de España)](http://www.bne.es/en/Inicio/index.html) from 2009 to 2019.
 
+
+## Intended Uses and Limitations
+
+You can use the raw model for fill mask or fine-tune it to a downstream task.
+
+
+
+
 ## How to Use
 You can use this model directly with a pipeline for fill mask. Since the generation relies on some randomness, we set a seed for reproducibility:
 
````
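The "How to Use" context above refers to a fill-mask pipeline with a fixed seed, but the corresponding code block sits outside this hunk. Below is a minimal sketch of that usage, assuming the checkpoint is published on the Hugging Face Hub as `PlanTL-GOB-ES/roberta-base-bne`; the hub id does not appear in this diff.

```python
# Minimal fill-mask sketch. The hub id below is an assumption; it is not shown in this diff.
from transformers import pipeline, set_seed

set_seed(42)  # the README fixes a seed because the pipeline output can involve randomness
unmasker = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-base-bne")

# RoBERTa tokenizers use "<mask>" as the mask token.
print(unmasker("Madrid es la <mask> de España."))
```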
````diff
@@ -105,11 +114,9 @@ Here is how to use this model to get the features of a given text in PyTorch:
 torch.Size([1, 19, 768])
 ```
 
-## Intended Uses and Limitations
-
-You can use the raw model for fill mask or fine-tune it to a downstream task.
+## Limitations and bias
 
-
+At the time of submission, no measures have been taken to estimate the bias and toxicity embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated. Nevertheless, here's an example of how the model can have biased predictions:
 
 ```python
 >>> from transformers import pipeline, set_seed
````
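The header of the hunk above comes from the README's feature-extraction example ("Here is how to use this model to get the features of a given text in PyTorch"), and `torch.Size([1, 19, 768])` is its output shape: batch size 1, 19 tokens, and the 768-dimensional hidden states of a base RoBERTa model. Here is a sketch of that step, with an illustrative input sentence and the same assumed hub id.

```python
# PyTorch feature-extraction sketch. Hub id and input sentence are illustrative.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "PlanTL-GOB-ES/roberta-base-bne"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

text = "Gracias a los datos de la BNE se ha podido desarrollar este modelo."
encoded = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

# Shape is [batch, tokens, hidden]; the hidden size of the base model is 768.
print(output.last_hidden_state.shape)
```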
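The new "Limitations and bias" section closes by introducing an example of biased predictions, but the diff ends right after the import line, so the actual prompts are not visible here. Below is a sketch of the kind of probe that sentence describes, with illustrative gendered prompts and the same assumed hub id.

```python
# Sketch of a bias probe. Prompts and hub id are illustrative, not taken from the diff.
from transformers import pipeline, set_seed

set_seed(42)
unmasker = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-base-bne")

# Comparing top completions for gendered subjects can surface stereotyped associations.
for prompt in ("El hombre trabaja como <mask>.", "La mujer trabaja como <mask>."):
    predictions = unmasker(prompt, top_k=3)
    print(prompt, [p["token_str"] for p in predictions])
```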