Pablogps committed on
Commit
3926be2
1 Parent(s): e1123e2

Create README.md

Files changed (1)
  1. README.md +7 -0
README.md ADDED
@@ -0,0 +1,7 @@
+ This is a **RoBERTa-base** model trained from scratch in Spanish.
+
+ The training dataset is mC4 (1), subsampled to a total of about 50 million examples. Sampling is biased towards average perplexity values, with perplexity boundaries defined by quartiles: documents with very large values (Q4, poor quality) or very small values (Q1, short, repetitive texts) are discarded more often.
+
+ This model has been trained for 250,000 steps.
+
+ (1) https://huggingface.co/datasets/bertin-project/mc4-es-sampled
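
The perplexity-biased sampling described in the README can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names and the keep probabilities (`p_low`, `p_mid`, `p_high`) are hypothetical and chosen only for illustration, and the sketch assumes a perplexity score has already been computed for every document.

```python
import random
import statistics


def quartile_bounds(perplexities):
    """Return the three quartile cut points of the perplexity scores."""
    return statistics.quantiles(perplexities, n=4)


def keep_probability(perplexity, bounds, p_low=0.2, p_mid=0.8, p_high=0.1):
    """Hypothetical keep probabilities that favour the two middle quartiles."""
    q1, _, q3 = bounds
    if perplexity < q1:  # bottom quartile: short, repetitive texts
        return p_low
    if perplexity > q3:  # top quartile: poor-quality texts
        return p_high
    return p_mid  # middle quartiles: kept most often


def subsample(docs, perplexities, seed=0):
    """Keep each document with a probability set by its perplexity quartile."""
    rng = random.Random(seed)
    bounds = quartile_bounds(perplexities)
    return [
        doc
        for doc, ppl in zip(docs, perplexities)
        if rng.random() < keep_probability(ppl, bounds)
    ]
```

Calling `subsample(docs, perplexities)` returns a subset in which documents from the two middle quartiles are over-represented relative to Q1 and Q4, which is the bias the README describes. The actual boundaries and discard rates used for the dataset are not stated in this commit.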