HeyLucasLeao committed
Commit fa013b0
1 Parent(s): 26a826f

Update README.md

Files changed (1)
1. README.md +7 -6
README.md CHANGED
@@ -1,18 +1,19 @@
  ## GPT-Neo Small Portuguese
 
- ##### Model Description
+ #### Model Description
  This is a fine-tuned version of GPT-Neo 125M by EleutherAI for the Portuguese language.
 
- ##### Training data
+ #### Training data
  It was trained on 227,382 selected texts from a PTWiki dump. You can find all the data here: https://archive.org/details/ptwiki-dump-20210520
 
- ##### Training Procedure
+ #### Training Procedure
  Every text was passed through a GPT2 tokenizer with bos and eos tokens to separate it, at the maximum sequence length that GPT-Neo supports. It was fine-tuned using the default settings of the Trainer class, available in the Hugging Face library.
 
  ##### Learning Rate: **2e-4**
  ##### Epochs: **1**
 
- ##### Goals
+ #### Goals
+
  My intention was purely educational: to make a Portuguese version of this model available.
 
  How to use
@@ -45,8 +46,8 @@ sample_outputs = model.generate(generated,
 
  # Decoding and printing sequences
  for i, sample_output in enumerate(sample_outputs):
- print(">> Generated text {}\
- \
+ print(">> Generated text {}\\
+ \\
  {}".format(i+1, tokenizer.decode(sample_output.tolist())))
 
  # >> Generated text
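For context on the Training Procedure section in the first hunk, here is a minimal sketch of the fine-tuning flow the README describes: each text wrapped in bos/eos tokens, truncated to GPT-Neo's 2048-token context, and trained with the Hugging Face Trainer at the stated learning rate and epoch count. The dataset wiring and output path are assumptions; the original training script is not part of this commit.

```python
# Hedged reconstruction of the described procedure, not the author's script.
# `texts` stands in for the 227,382 selected PTWiki texts.
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2Tokenizer,
    GPTNeoForCausalLM,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
tokenizer.pad_token = tokenizer.eos_token
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

texts = ["Exemplo de texto retirado do dump da PTWiki."]  # placeholder data

def tokenize(text):
    # Separate each document with bos/eos and cap it at GPT-Neo's max context.
    return tokenizer(
        tokenizer.bos_token + text + tokenizer.eos_token,
        truncation=True,
        max_length=2048,
    )

train_dataset = [tokenize(t) for t in texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt-neo-small-portuguese",  # assumed path
        learning_rate=2e-4,                     # stated in the README
        num_train_epochs=1,                     # stated in the README
    ),
    train_dataset=train_dataset,
    # Causal LM objective: the collator builds labels from the input ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```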
 
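The second hunk only fixes the escaped line breaks inside the print call. For reference, a self-contained sketch of the generation loop around that call, assuming the Hub id HeyLucasLeao/gpt-neo-small-portuguese, an example prompt, and illustrative sampling parameters; the elided part of the README's "How to use" block remains the authoritative version.

```python
# Standalone sketch of the usage flow around the patched loop; the model id,
# prompt, and sampling parameters are assumptions, not taken from this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HeyLucasLeao/gpt-neo-small-portuguese"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "eu gosto de sorvete"  # example prompt
generated = tokenizer(tokenizer.bos_token + text, return_tensors="pt").input_ids

sample_outputs = model.generate(
    generated,
    do_sample=True,
    max_length=80,
    top_k=50,
    top_p=0.95,
    num_return_sequences=3,
)

# Decoding and printing sequences (the loop this commit patches)
for i, sample_output in enumerate(sample_outputs):
    print(">> Generated text {}\n\n{}".format(i + 1, tokenizer.decode(sample_output.tolist())))
```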