
GPT-2 (125M) 4k tokens

A fine-tuned GPT-2 small model on The Pile with a context length of 4k tokens. Weights are included, and the model follows Karpathy's nanoGPT implementation. It was trained for ~1 million iterations with an increasing batch size, ending at 32k. The final loss is 3.9, which is likely limited by the 768 embedding size.
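Since the card says the model follows nanoGPT, the checkpoint can presumably be loaded with nanoGPT's `GPTConfig`/`GPT` classes. The sketch below is a minimal, hedged example: the 4k context length and 768 embedding size come from this card, while `n_layer`/`n_head` are assumptions based on GPT-2 small, and the checkpoint filename and key names (`model_args`, `model`) assume nanoGPT's standard `train.py` checkpoint format.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Mirrors nanoGPT's GPTConfig fields for this checkpoint.
    # block_size and n_embd are stated on this card;
    # n_layer, n_head, and vocab_size are GPT-2 small assumptions.
    block_size: int = 4096   # 4k token context length
    vocab_size: int = 50257  # standard GPT-2 BPE vocabulary
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

cfg = GPTConfig()
# Sanity check: embedding size must divide evenly across attention heads.
assert cfg.n_embd % cfg.n_head == 0
head_dim = cfg.n_embd // cfg.n_head

# Loading sketch (requires nanoGPT's model.py and a checkpoint file;
# "ckpt.pt" is a hypothetical filename):
# import torch
# from model import GPT
# ckpt = torch.load("ckpt.pt", map_location="cpu")
# model = GPT(GPTConfig(**ckpt["model_args"]))
# model.load_state_dict(ckpt["model"])
```

Note that the 4096-token `block_size` is the main departure from stock GPT-2 small, which uses a 1024-token context; the longer context adds extra learned position embeddings.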

Model size: 176M params (Safetensors) · Tensor types: F32, BOOL