crumb committed Commit 1d44aac
Parent: 110184d
Update README.md

README.md CHANGED
@@ -40,7 +40,9 @@ This checkpoint was afterwards finetuned on [tiny_shakespeare](https://huggingfa
 | batch size | 8 |
 | context length (tokens) | 256 |
 
-
+Trained on 1 Tesla T4 (à la [google colab](https://colab.research.google.com/)) for ~15 minutes
+
+A good starting point to finetune your own gpt-j-6b would be [hivemind's 8bit training code](https://huggingface.co/hivemind/gpt-j-6B-8bit).
 
 No LORA adapters were used for the sake of easy loading and inference with 🤗. Only Linear biases and LayerNorm scales were passed to the optimizer.
 
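The "only Linear biases and LayerNorm scales" recipe mentioned in the diff can be sketched in plain PyTorch. This is a minimal illustration, not the actual training script: the tiny `nn.Sequential` model below is a stand-in (the real checkpoint is GPT-J loaded via 🤗 transformers), and the learning rate is an arbitrary placeholder.

```python
import torch
from torch import nn

# Toy stand-in for a transformer; the README's model is gpt-j-6b loaded with 🤗.
model = nn.Sequential(
    nn.Linear(16, 16),
    nn.LayerNorm(16),
    nn.Linear(16, 16),
)

# Collect only Linear biases and LayerNorm scales, as the README describes.
trainable = []
for module in model.modules():
    if isinstance(module, nn.LayerNorm):
        trainable.append(module.weight)      # LayerNorm scale ("gamma")
    elif isinstance(module, nn.Linear) and module.bias is not None:
        trainable.append(module.bias)        # Linear bias only, not the weight

# Freeze everything else so gradients (and optimizer state) exist only for
# the selected parameters.
for p in model.parameters():
    p.requires_grad = False
for p in trainable:
    p.requires_grad = True

# Pass just that small subset to the optimizer (lr is a placeholder here).
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

Because only a tiny fraction of parameters receives gradients and optimizer state, this style of bias/scale-only tuning keeps memory low enough that a short finetune fits on a single T4, while the resulting checkpoint loads like any ordinary 🤗 model, with no adapter weights to merge.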