asier-gutierrez committed on
Commit
3c50d18
1 Parent(s): 16af6de

Update README.md

  1. README.md +11 -1
README.md CHANGED
@@ -20,6 +20,12 @@ widget:

  # RoBERTa large trained with data from National Library of Spain (BNE)

+ ## Introduction
+ This work presents the Spanish RoBERTa-large model. The model has been pre-trained using the largest Spanish corpus known to date, with a total of 570GB of clean and deduplicated text processed for this work, compiled from the web crawlings performed by the National Library of Spain from 2009 to 2019.
+
+ ## Evaluation
+ For evaluation details visit our [GitHub repository](https://github.com/PlanTL-SANIDAD/lm-spanish).
+
  ## Citing
  Check out our paper for all the details: https://arxiv.org/abs/2107.07253

@@ -33,4 +39,8 @@ Check out our paper for all the details: https://arxiv.org/abs/2107.07253
  primaryClass={cs.CL}
  }
  ```
- For more information visit our [GitHub repository](https://github.com/PlanTL-SANIDAD/lm-spanish)
+
+ ## Corpora
+ | Corpora | Number of documents | Size (GB) |
+ |---------|---------------------|-----------|
+ | BNE | 201,080,084 | 570GB |