Update README.md
README.md CHANGED
@@ -33,7 +33,6 @@ widget:
 - [Intended Uses and Limitations](#intended-uses-and-limitations)
 - [Training](#training)
 - [Training Data](#training-data)
-- [tokenization and pre-training](#tokenization-and-pre-training)
 - [Training Procedure](#training-procedure)
 - [Additional Information](#additional-information)
 - [Authors](#authors)
@@ -131,13 +130,13 @@ Some of the statistics of the corpus:
 |---------|---------------------|------------------|-----------|
 | BNE     | 201,080,084         | 135,733,450,668  | 570GB     |
 
-### Tokenization and pre-training
-The training corpus has been tokenized using a byte version of Byte-Pair Encoding (BPE) used in the original [GPT-2](http://www.persagen.com/files/misc/radford2019language.pdf) model with a vocabulary size of 50,262 tokens. The GPT2-large-bne pre-training consists of an autoregressive language model training that follows the approach of the GPT-2. The training lasted a total of 10 days with 32 computing nodes each one with 4 NVIDIA V100 GPUs of 16GB VRAM.
-
 ### Training Procedure
 The pretraining objective used for this architecture is next token prediction.
 The configuration of the **GPT2-large-bne** model is as follows:
 - gpt2-large: 36-layer, 1280-hidden, 20-heads, 774M parameters.
+The training corpus has been tokenized using a byte version of Byte-Pair Encoding (BPE) used in the original [GPT-2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) model with a vocabulary size of 50,262 tokens.
+The GPT2-large-bne pre-training consists of an autoregressive language model training that follows the approach of the GPT-2.
+The training lasted a total of 10 days with 32 computing nodes each one with 4 NVIDIA V100 GPUs of 16GB VRAM.
 
 ## Additional Information
 
@@ -178,7 +177,7 @@ This work was funded by the Spanish State Secretariat for Digitalization and Art
 
 This work is licensed under a [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
-
+### Copyright
 
 Copyright by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) (2022)
 
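As an illustration of the byte-level BPE tokenization described in the updated Training Procedure section, here is a minimal sketch using the Hugging Face `transformers` library. The Hub repository id `PlanTL-GOB-ES/gpt2-large-bne` is an assumption for illustration, not something stated in this diff.

```python
# Minimal sketch (repo id assumed, see note above): load the byte-level BPE
# tokenizer and check that its vocabulary size matches the 50,262 tokens
# cited in the card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/gpt2-large-bne")  # assumed repo id

print(len(tokenizer))                                          # expected to be around 50,262
print(tokenizer.tokenize("La Biblioteca Nacional de España"))  # byte-level BPE sub-word pieces
```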
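The `gpt2-large: 36-layer, 1280-hidden, 20-heads, 774M parameters` line can be sanity-checked by building a model of the same shape with `transformers`. This sketch only reproduces the dimensions with randomly initialised weights; it is not the published checkpoint.

```python
# Sketch: instantiate a GPT-2 model with the dimensions listed in the card
# and count its parameters; the total comes out at roughly 774M.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(n_layer=36, n_embd=1280, n_head=20, vocab_size=50262)
model = GPT2LMHeadModel(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # roughly 774M
```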
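The stated pretraining objective, next token prediction, is the standard causal language modelling loss. The sketch below shows how that loss is computed with `transformers`: passing the input ids as `labels` makes the model score each position against the following token. The repository id is again an assumption.

```python
# Sketch of the next-token-prediction (causal LM) objective: passing the
# inputs as labels makes the model compute the shifted cross-entropy loss.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/gpt2-large-bne")  # assumed repo id
model = AutoModelForCausalLM.from_pretrained("PlanTL-GOB-ES/gpt2-large-bne")

inputs = tokenizer("La Biblioteca Nacional de España", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(outputs.loss)  # average next-token cross-entropy over the sequence
```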