
A 6.7M (6,700,128) parameter GPT-J model.
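
The checkpoint can be loaded with the standard `transformers` API. A minimal usage sketch (the repo id is taken from this page; the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the published checkpoint and its GPT-J tokenizer (assumed to be
# bundled with the repo, per the tokenizer entry below).
model = AutoModelForCausalLM.from_pretrained("crumb/pico-gpt-j-6.7m")
tokenizer = AutoTokenizer.from_pretrained("crumb/pico-gpt-j-6.7m")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```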

- n_positions: 128
- n_embd: 64
- n_layer: 4
- n_head: 8
- rotary_dim: 64
- tokenizer: gpt-j
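
For reference, these hyperparameters map directly onto a `transformers` `GPTJConfig`. The sketch below is illustrative rather than the original model-creation code; the vocab size is an assumption (GPT-J's default of 50,400), under which the parameter count works out to exactly 6,700,128:

```python
from transformers import GPTJConfig, GPTJForCausalLM

config = GPTJConfig(
    vocab_size=50400,  # assumption: GPT-J tokenizer default
    n_positions=128,
    n_embd=64,
    n_layer=4,
    n_head=8,
    rotary_dim=64,
)
model = GPTJForCausalLM(config)
print(model.num_parameters())  # 6,700,128
```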

First, the model was trained on 4,194,304 samples from the c4 dataset at a sequence length of 128 tokens each, which comes out to 536,870,912 (0.53B) tokens seen during training. A batch size of 16 with 128 gradient accumulation steps was used, making the effective batch size 2048. A cosine learning rate schedule was used, starting at 1e-3.
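
The actual training script is not part of this card, but as a sketch the described setup translates roughly to the following Hugging Face `TrainingArguments` (the output path is hypothetical):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="pico-gpt-j-6.7m",     # hypothetical output path
    per_device_train_batch_size=16,   # batch size of 16 ...
    gradient_accumulation_steps=128,  # ... x 128 accumulation = 2048 effective
    learning_rate=1e-3,               # starting LR for the cosine schedule
    lr_scheduler_type="cosine",
    max_steps=4_194_304 // 2048,      # 2048 optimizer steps over all samples
)
# Tokens seen: 4,194,304 samples x 128 tokens = 536,870,912 (0.53B).
```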

