mmarimon committed on
Commit 96bf243 (parent: f49f71f)

Update README.md

Files changed (1): README.md (+12, -5)
README.md CHANGED
@@ -34,8 +34,9 @@ widget:
 
 - [Overview](#overview)
 - [Model Description](#model-description)
-- [How to Use](#how-to-use)
 - [Intended Uses and Limitations](#intended-uses-and-limitations)
+- [How to Use](#how-to-use)
+- [Limitations and bias](#limitations-and-bias)
 - [Training](#training)
   - [Training Data](#training-data)
   - [Training Procedure](#training-procedure)
@@ -61,6 +62,14 @@ widget:
 ## Model Description
 RoBERTa-base-bne is a transformer-based masked language model for the Spanish language. It is based on the [RoBERTa](https://arxiv.org/abs/1907.11692) base model and has been pre-trained using the largest Spanish corpus known to date, with a total of 570GB of clean and deduplicated text processed for this work, compiled from the web crawlings performed by the [National Library of Spain (Biblioteca Nacional de España)](http://www.bne.es/en/Inicio/index.html) from 2009 to 2019.
 
+
+## Intended Uses and Limitations
+
+You can use the raw model for fill mask or fine-tune it to a downstream task.
+
+
+
+
 ## How to Use
 You can use this model directly with a pipeline for fill mask. Since the generation relies on some randomness, we set a seed for reproducibility:
 
@@ -105,11 +114,9 @@ Here is how to use this model to get the features of a given text in PyTorch:
 torch.Size([1, 19, 768])
 ```
 
-## Intended Uses and Limitations
-
-You can use the raw model for fill mask or fine-tune it to a downstream task.
+## Limitations and bias
 
-The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. At the time of submission, no measures have been taken to estimate the bias and toxicity embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated. Nevertheless, here's an example of how the model can have biased predictions:
+At the time of submission, no measures have been taken to estimate the bias and toxicity embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated. Nevertheless, here's an example of how the model can have biased predictions:
 
 ```python
 >>> from transformers import pipeline, set_seed
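
The diff's context truncates the card's snippets at their opening lines. For reference, here is a minimal sketch of the fill-mask pipeline the "How to Use" section describes; the model id and the example sentence are assumptions, not taken from this commit:

```python
# Minimal sketch of the fill-mask usage the README describes.
# The model id and example sentence are assumptions, not from this commit.
from pprint import pprint
from transformers import pipeline, set_seed

set_seed(42)  # the card sets a seed for reproducibility
unmasker = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-base-bne")
pprint(unmasker("Gracias a los datos de la BNE se ha podido <mask> este modelo del lenguaje."))
```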
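Similarly, the `torch.Size([1, 19, 768])` context line is the tail of the card's PyTorch feature-extraction snippet. A sketch of what produces a shape like that, again with an assumed model id (19 is the token count of the card's input sentence, which this sketch cannot reproduce exactly; 768 is RoBERTa-base's hidden size):

```python
# Sketch of extracting hidden-state features in PyTorch.
# Model id and input text are assumptions; 768 is RoBERTa-base's hidden size.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/roberta-base-bne")
model = AutoModel.from_pretrained("PlanTL-GOB-ES/roberta-base-bne")

text = "Gracias a los datos de la BNE se ha podido desarrollar este modelo del lenguaje."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
print(output.last_hidden_state.shape)  # torch.Size([1, <seq_len>, 768])
```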
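The new "Intended Uses and Limitations" section notes that the raw model can be used for fill mask or fine-tuned for a downstream task. A hypothetical starting point for the fine-tuning route, where the model id, task head, and label count are all assumptions:

```python
# Hypothetical fine-tuning setup: swap the MLM head for a classification head.
# Model id and num_labels are assumptions, not from this commit.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/roberta-base-bne")
model = AutoModelForSequenceClassification.from_pretrained(
    "PlanTL-GOB-ES/roberta-base-bne", num_labels=2
)
# From here, train with the Trainer API or a standard PyTorch loop.
```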
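Finally, the "Limitations and bias" section ends just where its demonstration snippet begins. One way such a probe typically looks is to compare the fillers the model proposes for masked gendered sentences; the prompts below are illustrative assumptions, and actual predictions depend on the released weights:

```python
# Illustrative sketch of probing for biased fill-mask predictions.
# Prompts are assumptions; real outputs depend on the released weights.
from pprint import pprint
from transformers import pipeline, set_seed

set_seed(42)
unmasker = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-base-bne")
for prompt in ("El hombre trabaja como <mask>.", "La mujer trabaja como <mask>."):
    pprint(unmasker(prompt, top_k=3))  # compare the occupations suggested for each
```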