joanllop committed
Commit
761860a
1 Parent(s): 331bbac

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -33,7 +33,6 @@ widget:
 - [Intended Uses and Limitations](#intended-uses-and-limitations)
 - [Training](#training)
 - [Training Data](#training-data)
-- [tokenization and pre-training](#tokenization-and-pre-training)
 - [Training Procedure](#training-procedure)
 - [Additional Information](#additional-information)
 - [Authors](#authors)
@@ -131,13 +130,13 @@ Some of the statistics of the corpus:
 |---------|---------------------|------------------|-----------|
 | BNE | 201,080,084 | 135,733,450,668 | 570GB |
 
-### Tokenization and pre-training
-The training corpus has been tokenized using a byte version of Byte-Pair Encoding (BPE) used in the original [GPT-2](http://www.persagen.com/files/misc/radford2019language.pdf) model with a vocabulary size of 50,262 tokens. The GPT2-large-bne pre-training consists of an autoregressive language model training that follows the approach of the GPT-2. The training lasted a total of 10 days with 32 computing nodes each one with 4 NVIDIA V100 GPUs of 16GB VRAM.
-
 ### Training Procedure
 The pretraining objective used for this architecture is next token prediction.
 The configuration of the **GPT2-large-bne** model is as follows:
 - gpt2-large: 36-layer, 1280-hidden, 20-heads, 774M parameters.
+The training corpus has been tokenized using the byte-level version of Byte-Pair Encoding (BPE) used in the original [GPT-2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) model, with a vocabulary size of 50,262 tokens.
+The GPT2-large-bne pre-training consists of autoregressive language model training, following the approach of GPT-2.
+The training lasted a total of 10 days on 32 computing nodes, each with 4 NVIDIA V100 GPUs with 16 GB of VRAM.
 
 ## Additional Information
 
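The Training Procedure text above describes a GPT-2-large architecture (36 layers, 1280 hidden size, 20 attention heads, roughly 774M parameters) with a 50,262-token byte-level BPE vocabulary. A minimal sketch of an equivalent configuration, assuming the Hugging Face `transformers` library; the hyperparameter values come from the card, everything else is illustrative:

```python
# Illustrative only: a GPT-2 config mirroring the architecture stated in the card.
# n_layer / n_embd / n_head / vocab_size come from the card; other settings are defaults.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50262,  # byte-level BPE vocabulary size reported in the card
    n_layer=36,        # transformer layers (gpt2-large)
    n_embd=1280,       # hidden size
    n_head=20,         # attention heads
)

model = GPT2LMHeadModel(config)  # randomly initialized, not the released weights
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")  # ≈ 774M
```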
@@ -178,7 +177,7 @@ This work was funded by the Spanish State Secretariat for Digitalization and Art
 
 This work is licensed under a [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
-## Copyright
+### Copyright
 
 Copyright by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) (2022)
 
 
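Since the card describes an autoregressive (next-token-prediction) model with a GPT-2-style byte-level BPE tokenizer, a minimal usage sketch follows. It assumes the model is published on the Hugging Face Hub under the repo id `PlanTL-GOB-ES/gpt2-large-bne` (not stated in this diff) and that `transformers` and PyTorch are installed:

```python
# Minimal sketch: load the tokenizer and model, then generate a continuation.
# The repo id below is an assumption; substitute the actual Hub id of the model.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "PlanTL-GOB-ES/gpt2-large-bne"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)    # byte-level BPE, ~50,262 tokens
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "El Museo del Prado es"
inputs = tokenizer(prompt, return_tensors="pt")

# Autoregressive (next-token-prediction) decoding, as described in the card.
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The sampling settings here are arbitrary; greedy decoding or other generation parameters work just as well.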