nik kon committed on
Commit c0e8b2b
1 Parent(s): 2b8064c

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -4,4 +4,4 @@ language: el
 
  ## gpt2-greek
 
- A generative Transformer model in the style of GPT-2 is trained from scratch on a large corpus of Greek text so that it can generate long stretches of contiguous, coherent text. The model is trained on a collection of almost 5 GB of Greek text, sourced mainly from the Greek Wikipedia. The content is extracted using the Wikiextractor tool (Attardi, 2012). The dataset is constructed as 5 sentences per sample (about 3.7 million samples), and the end of each document is marked with the string <|endoftext|>, providing the model with paragraph information, as was done for the original GPT-2 training set by Radford et al. The input sentences are pre-processed and tokenized using 22,000 merges of byte-pair encoding. Attention dropout with a rate of 0.1 is applied to all layers for regularization, together with an L2 weight decay of 0.01. In addition, a batch size of 4 with gradients accumulated over 8 iterations is used, resulting in an effective batch size of 32. The model uses the Adam optimization scheme with a learning rate of 1e-4 and is trained for 20 epochs. The learning rate increases linearly from zero over the first 9000 updates and then decreases linearly for the remainder of training. The model is trained until there is no progress in validation loss. The implementation is based on the open-source PyTorch-Transformers library (HuggingFace, 2019).
+ A generative Transformer model in the style of GPT-2 is trained from scratch on a large corpus of Greek text so that it can generate long stretches of contiguous, coherent text. The model is trained on a collection of almost 5 GB of Greek text, sourced mainly from the Greek Wikipedia. The content is extracted using the Wikiextractor tool (Attardi, 2012). The dataset is constructed as 5 sentences per sample (about 3.7 million samples), and the end of each document is marked with the string <|endoftext|>, providing the model with paragraph information, as was done for the original GPT-2 training set by Radford et al. The input sentences are pre-processed and tokenized using 22,000 merges of byte-pair encoding. Attention dropout with a rate of 0.1 is applied to all layers for regularization, together with an L2 weight decay of 0.01. In addition, a batch size of 4 with gradients accumulated over 8 iterations is used, resulting in an effective batch size of 32. The model uses the Adam optimization scheme with a learning rate of 1e-4 and is trained for 20 epochs. The learning rate increases linearly from zero over the first 9000 updates and then decreases linearly for the remainder of training. The implementation is based on the open-source PyTorch-Transformers library (HuggingFace, 2019).
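Below is a minimal sketch of how the training setup described in the README could be expressed with the Hugging Face `transformers` `Trainer` API. It is not the author's original script: the tokenizer path and the in-memory dataset are placeholders, and `Trainer` uses AdamW by default rather than plain Adam, so treat it only as an illustration of the stated hyperparameters (batch size 4, gradient accumulation over 8 steps, learning rate 1e-4, weight decay 0.01, 9000 linear warm-up steps, 20 epochs).

```python
# Sketch only: paths and the tiny dataset are placeholders, not the author's setup.
from transformers import (
    GPT2Config,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical path to a Greek BPE tokenizer trained with ~22,000 merges.
tokenizer = GPT2TokenizerFast.from_pretrained("./greek-bpe-tokenizer")

# GPT-2 architecture trained from scratch; attention dropout of 0.1,
# matching the regularization described above.
config = GPT2Config(vocab_size=len(tokenizer), attn_pdrop=0.1)
model = GPT2LMHeadModel(config)

args = TrainingArguments(
    output_dir="gpt2-greek",
    per_device_train_batch_size=4,   # batch size 4 ...
    gradient_accumulation_steps=8,   # ... accumulated over 8 steps -> effective 32
    learning_rate=1e-4,              # Adam-style optimizer, lr 1e-4
    weight_decay=0.01,               # L2 weight decay 0.01
    num_train_epochs=20,
    warmup_steps=9000,               # linear warm-up over the first 9000 updates
    lr_scheduler_type="linear",      # followed by linear decay
)

# Tiny in-memory stand-in for the real corpus of 5-sentence samples,
# each terminated by <|endoftext|>.
sample_texts = ["Παράδειγμα ελληνικού κειμένου.<|endoftext|>"]
train_dataset = [tokenizer(t) for t in sample_texts]

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```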