DeepESP committed
Commit 7ac7e00
1 Parent(s): 7123122

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -1,8 +1,8 @@
 # GPT2-Spanish
-GPT2-Spanish is a language generation model trained from scratch with 9GB of Spanish texts and with a Byte Pair Encoding (BPE) tokenizer that was trained for this purpose. The parameters used are the same as the small version of the original OpenAI GPT2 model.
+GPT2-Spanish is a language generation model trained from scratch with 11.5GB of Spanish texts and with a Byte Pair Encoding (BPE) tokenizer that was trained for this purpose. The parameters used are the same as the small version of the original OpenAI GPT2 model.
 
 ## Corpus
-This model was trained with a corpus of 9GB of texts corresponding to 3GB of Wikipedia articles and 6GB of books (narrative, short stories, theater, poetry, essays, and popularization).
+This model was trained with a corpus of 11.5GB of texts corresponding to 3.5GB of Wikipedia articles and 8GB of books (narrative, short stories, theater, poetry, essays, and popularization).
 
 ## Tokenizer
 The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for Unicode characters) and a vocabulary size of 50257. The inputs are sequences of 1024 consecutive tokens.
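
The README above does not show usage, so here is a minimal sketch of loading the model and tokenizer with the `transformers` library. The repository id `DeepESP/gpt2-spanish` is an assumption inferred from the committer name and model title; it is not stated anywhere in this diff.

```python
# Minimal usage sketch for the model described in the README above.
# Assumption: the checkpoint is published on the Hugging Face Hub under
# "DeepESP/gpt2-spanish" (inferred, not confirmed by this commit).
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("DeepESP/gpt2-spanish")
model = GPT2LMHeadModel.from_pretrained("DeepESP/gpt2-spanish")

# The byte-level BPE vocabulary should match GPT2 small's 50257 tokens.
assert tokenizer.vocab_size == 50257

# Generate a short Spanish continuation; sampling keeps the output varied.
inputs = tokenizer("La inteligencia artificial", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,  # GPT2 defines no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model uses 1024-token context windows, prompts longer than 1024 tokens would need to be truncated before generation.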