Update README.md
README.md CHANGED
@@ -33,7 +33,6 @@ widget:
 - [Intended Uses and Limitations](#intended-uses-and-limitations)
 - [Training](#training)
 - [Training Data](#training-data)
-- [tokenization and pre-training](#tokenization-and-pre-training)
 - [Training Procedure](#training-procedure)
 - [Additional Information](#additional-information)
 - [Authors](#authors)
@@ -131,13 +130,13 @@ Some of the statistics of the corpus:
 |---------|---------------------|------------------|-----------|
 | BNE     | 201,080,084         | 135,733,450,668  | 570GB     |
 
-### Tokenization and pre-training
-The training corpus has been tokenized using a byte version of Byte-Pair Encoding (BPE) used in the original [GPT-2](http://www.persagen.com/files/misc/radford2019language.pdf) model with a vocabulary size of 50,262 tokens. The GPT2-large-bne pre-training consists of an autoregressive language model training that follows the approach of the GPT-2. The training lasted a total of 10 days with 32 computing nodes each one with 4 NVIDIA V100 GPUs of 16GB VRAM.
-
 ### Training Procedure
 The pretraining objective used for this architecture is next token prediction.
 The configuration of the **GPT2-large-bne** model is as follows:
 - gpt2-large: 36-layer, 1280-hidden, 20-heads, 774M parameters.
+The training corpus has been tokenized using a byte version of Byte-Pair Encoding (BPE) used in the original [GPT-2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) model with a vocabulary size of 50,262 tokens.
+The GPT2-large-bne pre-training consists of an autoregressive language model training that follows the approach of the GPT-2.
+The training lasted a total of 10 days with 32 computing nodes each one with 4 NVIDIA V100 GPUs of 16GB VRAM.
 
 ## Additional Information
 
@@ -178,7 +177,7 @@ This work was funded by the Spanish State Secretariat for Digitalization and Art
 
 This work is licensed under a [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
-
+### Copyright
 
 Copyright by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) (2022)
 
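As an illustration of the byte-level BPE tokenization described in the updated Training Procedure section, here is a minimal sketch using the Hugging Face `transformers` library. The Hub repository id `PlanTL-GOB-ES/gpt2-large-bne` is an assumption for illustration, not something stated in this diff.

```python
# Minimal sketch (repo id assumed, see note above): load the byte-level BPE
# tokenizer and check that its vocabulary size matches the 50,262 tokens
# cited in the card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/gpt2-large-bne")  # assumed repo id

print(len(tokenizer))                                          # expected to be around 50,262
print(tokenizer.tokenize("La Biblioteca Nacional de España"))  # byte-level BPE sub-word pieces
```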
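The `gpt2-large: 36-layer, 1280-hidden, 20-heads, 774M parameters` line can be sanity-checked by building a model of the same shape with `transformers`. This sketch only reproduces the dimensions with randomly initialised weights; it is not the published checkpoint.

```python
# Sketch: instantiate a GPT-2 model with the dimensions listed in the card
# and count its parameters; the total comes out at roughly 774M.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(n_layer=36, n_embd=1280, n_head=20, vocab_size=50262)
model = GPT2LMHeadModel(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # roughly 774M
```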
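The stated pretraining objective, next token prediction, is the standard causal language modelling loss. The sketch below shows how that loss is computed with `transformers`: passing the input ids as `labels` makes the model score each position against the following token. The repository id is again an assumption.

```python
# Sketch of the next-token-prediction (causal LM) objective: passing the
# inputs as labels makes the model compute the shifted cross-entropy loss.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/gpt2-large-bne")  # assumed repo id
model = AutoModelForCausalLM.from_pretrained("PlanTL-GOB-ES/gpt2-large-bne")

inputs = tokenizer("La Biblioteca Nacional de España", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(outputs.loss)  # average next-token cross-entropy over the sequence
```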