Milos committed on

Commit 1ca9a66
1 Parent(s): 103d8be

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED

@@ -33,7 +33,7 @@ Model is based on [GPT-J](https://github.com/kingoflolz/mesh-transformer-jax/) a
 ## Training data
 
 Slovak GPT-J models were trained on a privately collected dataset consisting of predominantly Slovak text spanning different categories, e.g. web, news articles or even biblical texts - in total, over 40GB of text data was used to train this model.
-The dataset was preprocessed and cleaned in a specific way that involves minor but a few caveats, so in order to achieve the expected performance, feel free to refer to [How to use](###How to use) section. Please, keep in mind that despite the effort to remove inappropriate corpus, the model still might generate sensitive content or leak sensitive information.
+The dataset was preprocessed and cleaned in a specific way that involves minor but a few caveats, so in order to achieve the expected performance, feel free to refer to [How to use] section. Please, keep in mind that despite the effort to remove inappropriate corpus, the model still might generate sensitive content or leak sensitive information.
 
 ## Training procedure
 
@@ -125,7 +125,7 @@ Since the dataset contains profanity, politically incorrect language, and (unint
 
 ## Citation and Related Information
 
-This was done as a moonlighting project during summer of 2021 to better understand transformers. I didn't have much free to open source it properly, so it all sat on my hard drive until now :) Based on the popularity and interest in this model I might release _substantially_ larger versions of Slovak GPT-J models that are way more capable.
+This was done as a moonlighting project during summer of 2021 to better understand transformers. I didn't have much free time to open source it properly, so it all sat on my hard drive until now :)
 
 If you use this model or have any questions about it feel free to hit me up at [twitter](https://twitter.com/miloskondela) or check out my [github](https://github.com/kondela) profile.