crumb committed Commit 1d44aac
Parent: 110184d
Update README.md

README.md CHANGED
@@ -40,7 +40,9 @@ This checkpoint was afterwards finetuned on [tiny_shakespeare](https://huggingfa
 | batch size | 8 |
 | context length (tokens) | 256 |
 
-
+Trained on 1 Tesla T4 (à la [google colab](https://colab.research.google.com/)) for ~15 minutes
+
+A good starting point to finetune your own gpt-j-6b would be [hivemind's 8bit training code](https://huggingface.co/hivemind/gpt-j-6B-8bit).
 
 No LORA adapters were used for the sake of easy loading and inference with 🤗. Only Linear biases and LayerNorm scales were passed to the optimizer.
 
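The "only Linear biases and LayerNorm scales" recipe mentioned in the diff can be sketched in plain PyTorch. This is a minimal illustration, not the actual training script: the tiny `nn.Sequential` model below is a stand-in (the real checkpoint is GPT-J loaded via 🤗 transformers), and the learning rate is an arbitrary placeholder.

```python
import torch
from torch import nn

# Toy stand-in for a transformer; the README's model is gpt-j-6b loaded with 🤗.
model = nn.Sequential(
    nn.Linear(16, 16),
    nn.LayerNorm(16),
    nn.Linear(16, 16),
)

# Collect only Linear biases and LayerNorm scales, as the README describes.
trainable = []
for module in model.modules():
    if isinstance(module, nn.LayerNorm):
        trainable.append(module.weight)      # LayerNorm scale ("gamma")
    elif isinstance(module, nn.Linear) and module.bias is not None:
        trainable.append(module.bias)        # Linear bias only, not the weight

# Freeze everything else so gradients (and optimizer state) exist only for
# the selected parameters.
for p in model.parameters():
    p.requires_grad = False
for p in trainable:
    p.requires_grad = True

# Pass just that small subset to the optimizer (lr is a placeholder here).
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

Because only a tiny fraction of parameters receives gradients and optimizer state, this style of bias/scale-only tuning keeps memory low enough that a short finetune fits on a single T4, while the resulting checkpoint loads like any ordinary 🤗 model, with no adapter weights to merge.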