Llama-124M-experimental-pretrain

This is an experimental pretraining run done solely on a home PC.

Model Description

Training code: Adapted from litgpt (https://github.com/Lightning-AI/litgpt).
Cost: Around 20 RMB ($3).
Model architecture: Transformer decoder with gated SiLU MLP, RMSNorm, RoPE positional embeddings, and grouped-query attention.
Language(s) (NLP): Mainly English.
License: apache-2.0
Parameter count: 124M (0.124B)
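To make two of the architecture components named above concrete, here is a minimal pure-Python sketch of RMSNorm and a gated SiLU MLP acting on a single vector. It is an illustration only, not the litgpt implementation: the function names, the `eps` value, and the weight layout (one row per output feature) are assumptions made for this example.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """Divide x by its root mean square, then scale by a learned weight.

    eps is an assumed value; the model's actual setting is not stated here.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def gated_silu_mlp(x, w_gate, w_up, w_down):
    """Gated MLP: down( silu(gate(x)) * up(x) )."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


if __name__ == "__main__":
    x = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
    # Toy weights: 2 inputs -> 3 hidden units -> 2 outputs.
    w_gate = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    w_up = [[0.6, 0.5], [0.4, 0.3], [0.2, 0.1]]
    w_down = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    print(gated_silu_mlp(x, w_gate, w_up, w_down))
```

The gate branch passes through SiLU and multiplies the up branch elementwise before the down projection, which is what distinguishes a gated MLP from a plain two-layer feed-forward block.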

Uses

After downloading this repository, run

litgpt generate ./Llama-124M-experimental-pretrain --prompt "What is GPT-4? GPT-4 is"

The output will look something like:

What is GPT-4? GPT-4 is an extremely powerful, highly immersive, and powerful, in the sense that it is able to be used to help you deal with various technical issues, while still providing an easy to use experience that will help you get better and faster results. It
Time for inference 1: 0.42 sec total, 119.97 tokens/sec
Memory used: 0.27 GB

Bias, Risks, and Limitations

This model is too small to avoid hallucinations, and the training dataset contains no code. Don't expect it to provide any useful assistance; it is just for fun.

Training Details

Training Data

This model was trained on https://huggingface.co/datasets/EleutherAI/rpj-v2-sample for two epochs, for a total of 19 billion tokens. The training context length is 2048 tokens.

Training Hyperparameters

Training regime: bf16-mixed.
Learning rate: Cosine schedule from 5e-4 to 5e-5.
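As a rough illustration of the learning-rate schedule, the sketch below computes a cosine decay from 5e-4 to 5e-5. It assumes no warmup phase (none is mentioned here), and `total_steps` is a hypothetical value chosen for the example; the actual training code may differ in both respects.

```python
import math

MAX_LR = 5e-4  # starting learning rate, from the model card
MIN_LR = 5e-5  # final learning rate, from the model card


def cosine_lr(step, total_steps, max_lr=MAX_LR, min_lr=MIN_LR):
    """Cosine decay from max_lr at step 0 to min_lr at total_steps."""
    # Clamp progress to [0, 1] so out-of-range steps stay within bounds.
    progress = min(max(step / total_steps, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


if __name__ == "__main__":
    total_steps = 1000  # hypothetical step count, for illustration only
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_lr(step, total_steps))
```

Under these assumptions the rate passes through the midpoint, 2.75e-4, exactly halfway through training.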

Speeds

The training run lasted for approximately 43 hours on one PC with 1x RTX 4090.
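The figures reported in this card imply an average training throughput, which can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated here (19 billion tokens, two epochs, context length 2048, about 43 hours); the variable names are illustrative, and the result is an average that ignores any pauses or evaluation time.

```python
TOTAL_TOKENS = 19e9      # tokens seen over the whole run
EPOCHS = 2
CONTEXT_LENGTH = 2048    # tokens per training sequence
TRAIN_HOURS = 43         # approximate wall-clock duration

tokens_per_epoch = TOTAL_TOKENS / EPOCHS
sequences_seen = TOTAL_TOKENS / CONTEXT_LENGTH
tokens_per_second = TOTAL_TOKENS / (TRAIN_HOURS * 3600)

print(f"tokens per epoch:  {tokens_per_epoch:.3e}")
print(f"sequences seen:    {sequences_seen:.3e}")
print(f"tokens per second: {tokens_per_second:.0f}")
```

This works out to about 9.5 billion tokens per epoch and roughly 123 thousand tokens per second on average.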

Evaluation

Tasks	Version	Filter	Metric		Value		Stderr
arc_easy	1	none	acc	↑	0.3969	±	0.0100
		none	acc_norm	↑	0.3628	±	0.0099
lambada_openai	1	none	acc	↑	0.2626	±	0.0061
		none	perplexity	↓	71.1943	±	2.8730
piqa	1	none	acc	↑	0.5871	±	0.0115
		none	acc_norm	↑	0.5843	±	0.0115
sciq	1	none	acc	↑	0.6940	±	0.0146
		none	acc_norm	↑	0.5970	±	0.0155
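The lambada_openai perplexity can be converted to an average negative log-likelihood, which some readers find easier to compare with a training loss. The sketch below assumes the reported perplexity is the exponential of the mean negative log-likelihood of the target (the conventional definition); it does not rerun the evaluation.

```python
import math

LAMBADA_PERPLEXITY = 71.1943  # from the evaluation table

# perplexity = exp(mean NLL), so mean NLL = ln(perplexity)
mean_nll_nats = math.log(LAMBADA_PERPLEXITY)
mean_nll_bits = math.log2(LAMBADA_PERPLEXITY)

print(f"mean NLL: {mean_nll_nats:.3f} nats ({mean_nll_bits:.3f} bits)")
```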

Environmental Impact

Hardware Type: RTX 4090 x 1
Hours used: 44
Carbon Emitted: 6.6 kg of CO2.