mia naomi committed
Commit ad1bad8 • Parent(s): f41b4c6
Update README.md
README.md
CHANGED
@@ -11,7 +11,8 @@ datasets:
 
 # GPT-J 6b Shakespeare
 
-<p style="color:green"> <b> The "Hosted inference API" does not work. Go to the <a href="https://huggingface.co/crumb/gpt-j-6b-shakespeare#how-to-use">How to Use</a> section
+<p style="color:green"> <b> 1.) The "Hosted inference API" does not work. Go to the <a href="https://huggingface.co/crumb/gpt-j-6b-shakespeare#how-to-use">How to Use</a> section. <br>
+2.) This is a proof of concept and not fully trained; a simple training script is also in the "How to Use" section. </b>
 
 ## Model Description
 
@@ -42,7 +43,7 @@ Trained on 1 Tesla T4 from [google colab](https://colab.research.google.com/)
 
 ```TrainOutput(global_step=147, training_loss=1.665000240818984, metrics={'train_runtime': 2828.7347, 'train_samples_per_second': 0.417, 'train_steps_per_second': 0.052, 'total_flos': 1555992281088.0, 'train_loss': 1.665000240818984, 'epoch': 1.0})```
 
-A good starting point to finetune your own gpt-j-6b would be [hivemind's 8bit training code](https://huggingface.co/hivemind/gpt-j-6B-8bit), or with the notebook
+A good starting point for finetuning your own gpt-j-6b would be [hivemind's 8bit training code](https://huggingface.co/hivemind/gpt-j-6B-8bit), or the notebook in [this repository](https://github.com/aicrumb/gpt-j-8bit), which you can download and open in [google colab](https://colab.research.google.com/) or any other ipynb service.
 
 No LoRA adapters were used, for the sake of easy loading and inference with 🤗. Only Linear biases and LayerNorm scales were passed to the optimizer.
 
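The diff's last paragraph says only Linear biases and LayerNorm scales were passed to the optimizer. A minimal sketch of what that parameter selection could look like in PyTorch — the toy model and the AdamW hyperparameters here are illustrative assumptions, not the author's actual training script:

```python
import torch
import torch.nn as nn

def bias_and_layernorm_params(model: nn.Module) -> list[nn.Parameter]:
    """Collect only Linear biases and LayerNorm weights/biases;
    freeze every other parameter in the model."""
    selected: list[nn.Parameter] = []
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            # LayerNorm scale (weight) and shift (bias)
            selected.extend(module.parameters(recurse=False))
        elif isinstance(module, nn.Linear) and module.bias is not None:
            selected.append(module.bias)
    for p in model.parameters():
        p.requires_grad = False
    for p in selected:
        p.requires_grad = True
    return selected

# Toy stand-in for the transformer, just to show the effect
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8), nn.Linear(8, 2))
params = bias_and_layernorm_params(model)
optimizer = torch.optim.AdamW(params, lr=1e-4)  # lr is an assumption
```

Because the Linear weight matrices stay frozen, the optimizer state stays tiny relative to full finetuning, which is what makes this practical on a single Tesla T4.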