
Jam-CGPT

Jam-CGPT is a GPT-2-like model. It follows jam's pretraining procedure to pretrain models ranging from 38 million to 350 million parameters, and is finetuned on comments generated by GPT-3.5 with dataset sizes ranging from 170k to 2.15m examples.

Jam-CGPT Training Details

  • We follow jam's pretraining procedure and use the same data to pretrain our 38m, 110m, and 350m parameter models.
  • We finetune Jam-CGPT on the summaries generated by GPT-3.5, using four different Jam-CGPT dataset sizes.
  • We finetune our models for 3 epochs.
  • Our GitHub repo contains the code for reproducing our results with the same data; a minimal loading sketch is shown below.
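
Since Jam-CGPT is a GPT-2-like model trained with jam's procedure, a checkpoint can typically be loaded with plain PyTorch. The snippet below is only an illustrative sketch: the file name `ckpt.pt`, the `model` module, and the checkpoint keys are assumptions and may differ from what the GitHub repo actually ships.

```python
# Minimal sketch of loading a Jam-CGPT checkpoint for inference.
# Assumptions (not confirmed by this card): the checkpoint is a nanoGPT-style
# ckpt.pt containing 'model_args' and 'model' keys, and GPT/GPTConfig come
# from the accompanying repo's model.py.
import torch
from model import GPT, GPTConfig  # hypothetical import from the repo

device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = torch.load("ckpt.pt", map_location=device)

config = GPTConfig(**checkpoint["model_args"])
model = GPT(config)

state_dict = checkpoint["model"]
# Checkpoints saved after torch.compile sometimes prefix keys with '_orig_mod.'
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval().to(device)
```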

Jam-CGPT 38 million parameters model

| Hyperparameter | Description | Value |
|---|---|---|
| e | embedding dimensions | 512 |
| L | number of layers | 4 |
| h | attention heads | 4 |
| c | block size / context length | 256 |
| b | batch size | 64 |
| a | accumulation steps | 2 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-5 |
| iter | number of iterations after pretraining | 757,000 |
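
For reference, the table above maps onto a nanoGPT-style training configuration roughly as follows. The variable names below are common nanoGPT field names used for illustration and are not necessarily the exact names in the Jam-CGPT repo; the 110m and 350m tables below follow the same pattern.

```python
# Illustrative finetuning configuration for the 38m model, taken from the
# hyperparameter table above. Field names are assumptions, not the repo's.
n_embd = 512                      # e: embedding dimensions
n_layer = 4                       # L: number of layers
n_head = 4                        # h: attention heads
block_size = 256                  # c: block size / context length
batch_size = 64                   # b: batch size per step
gradient_accumulation_steps = 2   # a: accumulation steps
dropout = 0.20                    # d: dropout
learning_rate = 3e-5              # r: learning rate
weight_decay = 1e-5               # y: weight decay
```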

Jam-CGPT 110 million parameters model

| Hyperparameter | Description | Value |
|---|---|---|
| e | embedding dimensions | 768 |
| L | number of layers | 10 |
| h | attention heads | 8 |
| c | block size / context length | 256 |
| b | batch size | 32 |
| a | accumulation steps | 4 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-5 |
| iter | number of iterations after pretraining | 762,000 |

Jam-CGPT 350 million parameters model

| Hyperparameter | Description | Value |
|---|---|---|
| e | embedding dimensions | 1024 |
| L | number of layers | 24 |
| h | attention heads | 16 |
| c | block size / context length | 256 |
| b | batch size | 4 |
| a | accumulation steps | 32 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-5 |
| iter | iterations | 272,000 |
  • Note that you can adjust the batch size and accumulation steps based on your GPU memory, but the product of batch size and accumulation steps should remain 128 (see the sketch after this list).
  • If you finetune with multiple GPUs, you can reduce the accumulation steps accordingly. For example, if you finetune with 2 GPUs, you should halve the accumulation steps.
  • We pretrained the 38m and 110m models for 3 epochs.
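
The effective-batch-size arithmetic from the notes above can be written out as a small sketch (the helper name is illustrative, not from the Jam-CGPT repo):

```python
# Keep the effective batch size fixed at 128 regardless of GPU count:
# batch_size * accumulation_steps * num_gpus == 128
EFFECTIVE_BATCH = 128

def accumulation_steps(batch_size: int, num_gpus: int = 1) -> int:
    """Return the gradient accumulation steps for a given per-GPU batch size."""
    assert EFFECTIVE_BATCH % (batch_size * num_gpus) == 0, "must divide 128 evenly"
    return EFFECTIVE_BATCH // (batch_size * num_gpus)

print(accumulation_steps(4))              # 350m model on 1 GPU -> 32 steps
print(accumulation_steps(4, num_gpus=2))  # same model on 2 GPUs -> 16 steps (halved)
```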