Update README.md
# gpt-fake-lang-17m

This model is a GPT-J model with 17M parameters, pre-trained for 1 epoch on a synthetic dataset: 1 GB of documents created in 4 fake languages, each with a formal and an informal writing style.

It achieves the following results on the evaluation set:
- Loss: 3.5592
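
This corresponds to a perplexity of exp(3.5592) ≈ 35.1, assuming the reported loss is the usual token-level cross-entropy for a causal language model.
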
## Intended uses & limitations

This model is intended as a base model for fine-tuning on any language or task, to probe the effectiveness of both pre-training on an algorithmically generated corpus and of extremely small models. It can only generate text based on its training data (which will be uploaded as a Hugging Face dataset soon).
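
As a minimal sketch of that workflow (the Hub repository id below is a placeholder, not confirmed by this card), the checkpoint could be loaded for inspection or fine-tuning with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id; this card does not confirm the final Hub path.
model_id = "gpt-fake-lang-17m"

# Load the 17M-parameter pre-trained checkpoint as a base for fine-tuning.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sanity check: the base model can only continue text in its fake languages.
inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
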
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
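
For reference, a sketch of how these settings map onto `transformers`' `TrainingArguments` (the output directory is a placeholder, and the card does not state how training was actually launched):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="gpt-fake-lang-17m",  # placeholder output path
    learning_rate=1e-3,              # learning_rate: 0.001
    per_device_train_batch_size=64,  # batch_size: 64
    seed=42,                         # seed: 42
    adam_beta1=0.9,                  # optimizer: Adam with betas=(0.9,0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # and epsilon=1e-08
    lr_scheduler_type="linear",      # lr_scheduler_type: linear
    num_train_epochs=1,              # num_epochs: 1
)
# Note: Trainer's built-in optimizer is AdamW; a plain Adam optimizer would
# have to be passed to Trainer explicitly via its `optimizers` argument.
```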