Milos committed on
Commit: ce46daa
Parent: 9ae2a34

Update README.md

Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -9,7 +9,7 @@ license: gpl-3.0
 ---

 # Slovak GPT-J-405M
-Slovak GPT-J-405M is the second model released in Slovak GPT-J series after its smaller variant [Slovak GPT-J-162M](https://huggingface.co/Milos/slovak-gpt-j-162M).
+Slovak GPT-J-405M is the second model released in Slovak GPT-J series after its smaller variant [Slovak GPT-J-162M](https://huggingface.co/Milos/slovak-gpt-j-162M). Since then a larger [Slovak GPT-J-1.4B](https://huggingface.co/Milos/slovak-gpt-j-1.4B) was released.
 ## Model Description
 Model is based on [GPT-J](https://github.com/kingoflolz/mesh-transformer-jax/) and has over 405M trainable parameters.

@@ -37,7 +37,7 @@ The dataset was preprocessed and cleaned in a specific way that involves minor b

 ## Training procedure

-This model was trained for a bit more than 36.5 billion tokens over 69,001 steps on TPU v3-8 pod. The cross-entropy validation loss at the last step was 2.821.
+This model was trained for a bit more than 36.5 billion tokens over 69,001 steps on TPU v3-8 pod. The cross-entropy validation loss at the last step was `2.821`.

 ## Intended Use

@@ -122,7 +122,7 @@ Since the dataset contains profanity, politically incorrect language, and (unint

 ## Citation and Related Information

-This was done as a moonlighting project during summer of 2021 to better understand transformers. I didn't have much free time to open source it properly, so it all sat on my hard drive until now :) Based on the popularity and interest in this model I might release _substantially_ larger versions of Slovak GPT-J models that are way more capable.
+This was done as a moonlighting project during summer of 2021 to better understand transformers. I didn't have much free time to open source it properly, so it all sat on my hard drive until now :)

 If you use this model or have any questions about it feel free to hit me up at [twitter](https://twitter.com/miloskondela) or check out my [github](https://github.com/kondela) profile.
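
For context, here is a minimal sketch of how the model described in this README might be loaded and sampled with the `transformers` library. The repo id `Milos/slovak-gpt-j-405M` is an assumption inferred from the sibling model links in the diff above, and the Slovak prompt is purely illustrative; verify both against the actual model page.

```python
# Minimal usage sketch (assumption: the model is published as "Milos/slovak-gpt-j-405M",
# inferred from the links to slovak-gpt-j-162M and slovak-gpt-j-1.4B above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Milos/slovak-gpt-j-405M"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short Slovak continuation from a simple prompt.
prompt = "Slovensko je"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```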