BSC-LT/roberta-large-bne

national library of spain


asier-gutierrez committed on Aug 2, 2021

Commit 3c50d18 • 1 Parent(s): 16af6de

Update README.md

Files changed (1)

README.md +11 -1

@@ -20,6 +20,12 @@ widget:
 # RoBERTa large trained with data from National Library of Spain (BNE)
 ## Citing
 Check out our paper for all the details: https://arxiv.org/abs/2107.07253
@@ -33,4 +39,8 @@ Check out our paper for all the details: https://arxiv.org/abs/2107.07253
       primaryClass={cs.CL}
 }
 ```
-For more information visit our [GitHub repository](https://github.com/PlanTL-SANIDAD/lm-spanish)

 # RoBERTa large trained with data from National Library of Spain (BNE)
+## Introduction
+This work presents the Spanish RoBERTa-large model. The model was pre-trained on the largest Spanish corpus known to date: 570GB of clean, deduplicated text compiled from the web crawls performed by the National Library of Spain from 2009 to 2019.
+## Evaluation
+For evaluation details visit our [GitHub repository](https://github.com/PlanTL-SANIDAD/lm-spanish).
 ## Citing
 Check out our paper for all the details: https://arxiv.org/abs/2107.07253
       primaryClass={cs.CL}
 }
 ```
+## Corpora
+| Corpus  | Number of documents | Size (GB) |
+|---------|---------------------|-----------|
+| BNE     |         201,080,084 |       570 |
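
As a rough sanity check on the Corpora table above, the two figures it reports imply an average document size of a few kilobytes. The sketch below is only illustrative: the helper name `average_document_bytes` is hypothetical, and it assumes 1 GB = 10^9 bytes, which the README does not specify.

```python
# Figures copied from the Corpora table in the README diff.
NUM_DOCUMENTS = 201_080_084
CORPUS_SIZE_GB = 570

# Assumption: decimal gigabytes (1 GB = 10**9 bytes); the README does not
# say whether decimal or binary units are meant.
BYTES_PER_GB = 10**9


def average_document_bytes(size_gb, num_documents, bytes_per_gb=BYTES_PER_GB):
    """Return the mean document size in bytes (hypothetical helper)."""
    return size_gb * bytes_per_gb / num_documents


avg_bytes = average_document_bytes(CORPUS_SIZE_GB, NUM_DOCUMENTS)
print(f"Average document size: {avg_bytes / 1000:.2f} KB")
```

Under that assumption the average comes out at roughly 2.8 KB of cleaned text per document; with binary gigabytes the figure would be slightly higher.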