Milos committed
Commit ce46daa
1 Parent(s): 9ae2a34

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -9,7 +9,7 @@ license: gpl-3.0
 ---
 
 # Slovak GPT-J-405M
- Slovak GPT-J-405M is the second model released in Slovak GPT-J series after its smaller variant [Slovak GPT-J-162M](https://huggingface.co/Milos/slovak-gpt-j-162M).
+ Slovak GPT-J-405M is the second model released in Slovak GPT-J series after its smaller variant [Slovak GPT-J-162M](https://huggingface.co/Milos/slovak-gpt-j-162M). Since then a larger [Slovak GPT-J-1.4B](https://huggingface.co/Milos/slovak-gpt-j-1.4B) was released.
 ## Model Description
 Model is based on [GPT-J](https://github.com/kingoflolz/mesh-transformer-jax/) and has over 405M trainable parameters.
 
@@ -37,7 +37,7 @@ The dataset was preprocessed and cleaned in a specific way that involves minor b
 
 ## Training procedure
 
- This model was trained for a bit more than 36.5 billion tokens over 69,001 steps on TPU v3-8 pod. The cross-entropy validation loss at the last step was 2.821.
+ This model was trained for a bit more than 36.5 billion tokens over 69,001 steps on TPU v3-8 pod. The cross-entropy validation loss at the last step was `2.821`.
 
 ## Intended Use
 
@@ -122,7 +122,7 @@ Since the dataset contains profanity, politically incorrect language, and (unint
 
 ## Citation and Related Information
 
- This was done as a moonlighting project during summer of 2021 to better understand transformers. I didn't have much free time to open source it properly, so it all sat on my hard drive until now :) Based on the popularity and interest in this model I might release _substantially_ larger versions of Slovak GPT-J models that are way more capable.
+ This was done as a moonlighting project during summer of 2021 to better understand transformers. I didn't have much free time to open source it properly, so it all sat on my hard drive until now :)
 
 If you use this model or have any questions about it feel free to hit me up at [twitter](https://twitter.com/miloskondela) or check out my [github](https://github.com/kondela) profile.
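For reference, a minimal sketch of how the model this README describes might be loaded and queried with the Hugging Face `transformers` library. The model ID `Milos/slovak-gpt-j-405M`, the use of the generic `AutoModelForCausalLM` entry point, and the sample Slovak prompt are assumptions based on the repository links in the diff, not content taken from the commit itself.

```python
# Sketch only: assumes the model is published as "Milos/slovak-gpt-j-405M"
# and loads through the standard causal-LM classes in transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Milos/slovak-gpt-j-405M"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical Slovak prompt, used only to illustrate greedy decoding.
prompt = "Tradičné jedlo na Orave sú"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

As a side note, the `2.821` cross-entropy validation loss quoted in the diff corresponds to a per-token perplexity of roughly exp(2.821) ≈ 16.8, assuming the loss is measured in nats.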