gonzalez-agirre committed on
Commit 4810598
1 parent: 74834e9

Update README.md

Files changed (1)
  1. README.md +0 -3
README.md CHANGED
@@ -113,12 +113,10 @@ Once the model has been successfully initialized, we continue its pre-training i
 | Dataset | Language | Tokens (pre-epoch) | Epochs |
 |---------------------|----------|--------------------|--------------|
 | Wikipedia | en | 2169.97M | 1.428144485 |
-| Lyrics | en | 100.60M | 0.7140722425 |
 | C4_es | es | 53709.80M | 0.1049686196 |
 | Biomedical | es | 455.03M | 0.7140722425 |
 | Legal | es | 995.70M | 0.7140722425 |
 | Wikipedia | es | 693.60M | 1.428144485 |
-| Lyrics | es | 125.93M | 0.7140722425 |
 | Gutenberg | es | 53.18M | 0.7140722425 |
 | C4_ca | ca | 2826.00M | 2.142216727 |
 | Biomedical | ca | 11.80M | 1.428144485 |
@@ -127,7 +125,6 @@ Once the model has been successfully initialized, we continue its pre-training i
 | CaWaC | ca | 57.79M | 2.142216727 |
 | Wikipedia | ca | 228.01M | 3.570361212 |
 | Vilaweb | ca | 50.34M | 2.142216727 |
-| Lyrics | ca | 0.50M | 2.142216727 |
 
 The resulting dataset has the following language distribution:
 
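To gauge how much training data this commit drops, the sketch below multiplies each row's token count by its epoch count and sums the result per language. It is a minimal illustration under two assumptions: that "Tokens (pre-epoch)" means tokens per epoch, so tokens actually seen equal tokens times epochs, and that only the rows visible in this diff are used. The two rows hidden between the hunks (old lines 125-126) are missing, so the shares it prints are not the README's language distribution. The function names are hypothetical.

```python
from collections import defaultdict

# Rows copied from the visible diff context (values in millions of tokens).
# Assumption: the rows hidden between the two hunks are NOT included.
# (dataset, language, tokens per epoch [M], epochs)
KEPT_ROWS = [
    ("Wikipedia", "en", 2169.97, 1.428144485),
    ("C4_es", "es", 53709.80, 0.1049686196),
    ("Biomedical", "es", 455.03, 0.7140722425),
    ("Legal", "es", 995.70, 0.7140722425),
    ("Wikipedia", "es", 693.60, 1.428144485),
    ("Gutenberg", "es", 53.18, 0.7140722425),
    ("C4_ca", "ca", 2826.00, 2.142216727),
    ("Biomedical", "ca", 11.80, 1.428144485),
    ("CaWaC", "ca", 57.79, 2.142216727),
    ("Wikipedia", "ca", 228.01, 3.570361212),
    ("Vilaweb", "ca", 50.34, 2.142216727),
]

# The three Lyrics rows deleted by this commit.
REMOVED_ROWS = [
    ("Lyrics", "en", 100.60, 0.7140722425),
    ("Lyrics", "es", 125.93, 0.7140722425),
    ("Lyrics", "ca", 0.50, 2.142216727),
]


def effective_tokens_by_language(rows):
    """Hypothetical helper: sum tokens_per_epoch * epochs (in millions) per language."""
    totals = defaultdict(float)
    for _name, lang, tokens_m, epochs in rows:
        totals[lang] += tokens_m * epochs
    return dict(totals)


def language_shares(rows):
    """Hypothetical helper: each language's fraction of the summed effective tokens."""
    totals = effective_tokens_by_language(rows)
    grand_total = sum(totals.values())
    return {lang: t / grand_total for lang, t in totals.items()}


if __name__ == "__main__":
    for lang, t in sorted(effective_tokens_by_language(REMOVED_ROWS).items()):
        print(f"{lang}: {t:.2f}M effective tokens removed")
    for lang, share in sorted(language_shares(KEPT_ROWS).items()):
        print(f"{lang}: {share:.1%} of the visible kept rows")
```

Weighting by epochs rather than summing raw token counts matters here because the epoch factors differ by more than an order of magnitude (0.105 for C4_es versus 3.57 for Catalan Wikipedia), so raw counts would overstate the Spanish share of what the model actually sees.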