Llama-124M-experimental-pretrain

This is an experimental pretraining run done solely on a home PC.

Model Description

  • Training code adapted from https://github.com/Lightning-AI/litgpt .
  • Cost: around 20 RMB (roughly $3).
  • Model architecture: Transformer decoder with a gated SiLU (SwiGLU) MLP, RMSNorm, rotary positional embeddings (RoPE), and grouped-query attention.
  • Language(s) (NLP): Mainly English.
  • License: apache-2.0
  • Parameter count: 124M (0.124B)
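
To make two of the components above concrete, here is a minimal pure-Python sketch of RMSNorm and a gated SiLU MLP. This is an illustration of the math only, not the actual training code (which lives in litgpt); all function and parameter names are my own, and the weight layout (one list per output row) is a toy convention.

```python
import math

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: rescale by the reciprocal root-mean-square of x,
    # then apply a learned per-channel gain. No mean subtraction, no bias.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]

def silu(v):
    # SiLU (swish) activation: x * sigmoid(x)
    return v / (1.0 + math.exp(-v))

def gated_silu_mlp(x, w_gate, w_up, w_down):
    # SwiGLU-style MLP: down( silu(gate @ x) * (up @ x) )
    gate = [sum(wi * xi for wi, xi in zip(row, x)) for row in w_gate]
    up = [sum(wi * xi for wi, xi in zip(row, x)) for row in w_up]
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w_down]
```

The gating is why this MLP has three weight matrices instead of the usual two: the `gate` projection modulates the `up` projection elementwise before the `down` projection.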

Uses

After downloading this repository, run

litgpt generate ./Llama-124M-experimental-pretrain --prompt "What is GPT-4? GPT-4 is"

The output will look something like:

What is GPT-4? GPT-4 is an extremely powerful, highly immersive, and powerful, in the sense that it is able to be used to help you deal with various technical issues, while still providing an easy to use experience that will help you get better and faster results. It
Time for inference 1: 0.42 sec total, 119.97 tokens/sec
Memory used: 0.27 GB

Bias, Risks, and Limitations

This model is too small to avoid hallucinations, and there is no code in the training dataset. Don't expect this model to provide any sort of assistance; it is just for fun.

Training Details

Training Data

This model was trained on https://huggingface.co/datasets/EleutherAI/rpj-v2-sample for two epochs, for a total of 19 billion tokens. The training context length is 2048.

Training Hyperparameters

  • Training regime: bf16-mixed.
  • Learning rate: Cosine schedule from 5e-4 to 5e-5.
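
The cosine schedule can be sketched as below. This is a hypothetical reconstruction of the decay described above (peak 5e-4 decaying to a floor of 5e-5), not the actual litgpt scheduler; any warmup phase the real run used is omitted.

```python
import math

def cosine_lr(step, max_steps, peak_lr=5e-4, min_lr=5e-5):
    # Cosine decay from peak_lr at step 0 down to min_lr at max_steps.
    progress = step / max_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At the midpoint of training this gives the average of the two endpoints (2.75e-4), which is the characteristic shape of cosine decay.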

Speeds

The training run lasted for approximately 43 hours on one PC with 1x RTX 4090.
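
A back-of-envelope check using only figures already in this card (19B tokens, ~43 hours) gives the implied training throughput:

```python
tokens = 19e9  # total training tokens, from the card
hours = 43     # reported wall-clock training time
tokens_per_sec = tokens / (hours * 3600)
print(f"{tokens_per_sec:,.0f} tokens/sec")  # prints "122,739 tokens/sec"
```

On the order of 120k tokens/sec is plausible for a 124M-parameter model on a single RTX 4090 in bf16.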

Evaluation

| Tasks          | Version | Filter | n-shot | Metric     | Value   | Stderr   |
|----------------|---------|--------|--------|------------|---------|----------|
| arc_easy       | 1       | none   | 0      | acc        | 0.3969  | ± 0.0100 |
|                |         | none   | 0      | acc_norm   | 0.3628  | ± 0.0099 |
| lambada_openai | 1       | none   | 0      | acc        | 0.2626  | ± 0.0061 |
|                |         | none   | 0      | perplexity | 71.1943 | ± 2.8730 |
| piqa           | 1       | none   | 0      | acc        | 0.5871  | ± 0.0115 |
|                |         | none   | 0      | acc_norm   | 0.5843  | ± 0.0115 |
| sciq           | 1       | none   | 0      | acc        | 0.6940  | ± 0.0146 |
|                |         | none   | 0      | acc_norm   | 0.5970  | ± 0.0155 |

Environmental Impact

  • Hardware Type: RTX 4090 x 1
  • Hours used: 44
  • Carbon Emitted: 6.6 kg of CO2.
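
The 6.6 kg figure is consistent with a simple energy-times-intensity estimate. The GPU power draw and grid carbon intensity below are my assumptions for illustration, not values stated in the card:

```python
hours = 44                  # GPU-hours, from the card
gpu_power_kw = 0.45         # assumed: RTX 4090 TDP of 450 W
grid_kg_co2_per_kwh = 0.33  # assumed grid carbon intensity

energy_kwh = hours * gpu_power_kw                # 19.8 kWh
emissions_kg = energy_kwh * grid_kg_co2_per_kwh  # ~6.5 kg CO2
```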