A 6.7M-parameter (6,700,128) GPT-J model.

n_positions - 128
n_embd - 64
n_layer - 4
n_head - 8
rotary_dim - 64
tokenizer - gpt-j
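
The stated total of 6,700,128 parameters can be reproduced from the settings above with a short sketch. Two values are assumptions, since the card does not list them: `vocab_size = 50400` (the default GPT-J vocabulary size) and an MLP inner width of `4 * n_embd`. The sketch also assumes the usual GPT-J layout: one LayerNorm per block, bias-free attention projections, and an untied output head with a bias.

```python
# Rough parameter count for a GPT-J style model with the settings above.
# Assumptions (not stated in the card): vocab_size and n_inner.
n_embd = 64
n_layer = 4
vocab_size = 50400       # assumption: GPT-J default vocabulary size
n_inner = 4 * n_embd     # assumption: default MLP inner width


def count_params(n_embd, n_layer, vocab_size, n_inner):
    wte = vocab_size * n_embd                    # token embedding (no learned positions; rotary is used)
    layer_norm = 2 * n_embd                      # weight + bias
    attn = 4 * n_embd * n_embd                   # q, k, v, out projections, no biases
    mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
    block = layer_norm + attn + mlp              # one LayerNorm per block
    lm_head = n_embd * vocab_size + vocab_size   # untied output head with bias
    return wte + n_layer * block + layer_norm + lm_head  # final layer_norm is ln_f


total = count_params(n_embd, n_layer, vocab_size, n_inner)
print(f"{total:,}")  # 6,700,128
```

Under these assumptions, roughly 6.5M of the parameters sit in the embedding and output head, and only about 0.2M in the four transformer blocks.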

The model was first trained on 4,194,304 samples from the c4 dataset at a length of 128 tokens each, which comes out to 536,870,912 (about 0.54B) tokens seen during training. A batch size of 16 with 128 gradient accumulation steps was used, for an effective batch size of 2048. A cosine learning rate schedule was used, starting at 1e-3.
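
The training arithmetic can be checked with a minimal sketch. The token count and effective batch size follow directly from the numbers above. The step count assumes each sample is seen exactly once. The `cosine_lr` helper is hypothetical: it assumes no warmup and decay to zero, neither of which the card specifies.

```python
import math

samples = 4_194_304
seq_len = 128
micro_batch = 16
grad_accum = 128
peak_lr = 1e-3

tokens = samples * seq_len                    # 536,870,912
effective_batch = micro_batch * grad_accum    # 2048
optimizer_steps = samples // effective_batch  # 2048, assuming a single pass over the samples


def cosine_lr(step, total_steps, peak=peak_lr, floor=0.0):
    """Cosine decay from `peak` to `floor` (assumed: no warmup, floor of 0)."""
    progress = min(step, total_steps) / total_steps
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))


print(f"{tokens:,} tokens, {optimizer_steps} optimizer steps")
print(cosine_lr(0, optimizer_steps), cosine_lr(optimizer_steps // 2, optimizer_steps))
```

With these assumptions the learning rate starts at 1e-3, passes through about 5e-4 at the midpoint, and reaches zero at the final step.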

Safetensors: 6.77M params, F32 tensor type


Model repository: crumb/pico-gpt-j-6.7m