metadata
datasets: HuggingFaceFW/fineweb-edu
widget:
  - example_title: Example interaction
    text: During photosynthesis in green plants
inference:
  parameters:
    repetition_penalty: 1.3
language:
  - en
library_name: transformers
license: mit

Model Card for gpt2-124M-edu-fineweb-10B

A 124M-parameter GPT-2 model trained on the 10B-token fineweb-edu dataset using llm.c (https://github.com/karpathy/llm.c).

Training took 20 hours on a single RTX 4090 GPU (power-limited to 350 W) and produced the following graphs:

[Figure: training graphs for gpt2-124M-edu-fineweb-10B]

Training

The training parameters were:

./train_gpt2cu \
    -i "dev/data/edu_fineweb10B/edu_fineweb_train_*.bin" \
    -j "dev/data/edu_fineweb10B/edu_fineweb_val_*.bin" \
    -o log124M \
    -e "d12" \
    -b 56 -t 1024 \
    -d 458752 \
    -r 1 \
    -z 1 \
    -c 0.1 \
    -l 0.002 \
    -q 0.0 \
    -u 700 \
    -n 5000 \
    -v 250 -s 20000 \
    -h 1
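
The batch-size flags in the command above are related by simple arithmetic. The sketch below works it out, assuming the flag meanings given in llm.c's usage text (`-b` micro-batch size in sequences, `-t` sequence length in tokens, `-d` total desired batch size in tokens per optimizer step). The variable names are illustrative and are not taken from llm.c.

```python
# Values copied from the training command above. The flag meanings are an
# assumption based on llm.c's usage text, not something this card states.
micro_batch_size = 56        # -b: sequences per micro-batch
seq_len = 1024               # -t: tokens per sequence
total_batch_tokens = 458752  # -d: desired tokens per optimizer step
num_gpus = 1                 # a single GPU, as stated in the card

# Tokens processed in one forward/backward pass across all GPUs.
tokens_per_micro_batch = micro_batch_size * seq_len * num_gpus

# The total batch must be a whole number of micro-batches.
assert total_batch_tokens % tokens_per_micro_batch == 0
grad_accum_steps = total_batch_tokens // tokens_per_micro_batch

print(tokens_per_micro_batch)  # 57344
print(grad_accum_steps)        # 8
```

Under these assumptions, each optimizer step accumulates gradients over 8 micro-batches of 56 sequences, which is how a batch of roughly 0.46M tokens fits on one 24 GB card.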

The model has had no further finetuning.
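
Since this is a base model with no finetuning, it is best prompted with text to continue rather than with instructions. The following is a minimal sketch of doing so with the `transformers` text-generation pipeline, using the widget prompt and `repetition_penalty` from the card metadata. The repository id, `max_new_tokens`, and `do_sample` values are assumptions for illustration and are not specified in this card.

```python
# Hypothetical repository id (assumption): adjust to the actual Hub location.
MODEL_ID = "rhysjones/gpt2-124M-edu-fineweb-10B"

# Prompt and repetition penalty come from the card metadata.
PROMPT = "During photosynthesis in green plants"
GENERATION_KWARGS = {
    "repetition_penalty": 1.3,
    "max_new_tokens": 64,  # assumption: not specified in the card
    "do_sample": True,     # assumption: not specified in the card
}


def generate(prompt: str = PROMPT) -> str:
    """Return the prompt followed by the model's continuation."""
    # Imported lazily so the constants above can be inspected
    # without transformers being installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(prompt, **GENERATION_KWARGS)
    return outputs[0]["generated_text"]
```

Calling `generate()` downloads the weights on first use and returns a single string. With sampling enabled, the continuation differs between runs.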

Evaluation

Evaluations were run with the EleutherAI LM Evaluation Harness, following the open_llm_leaderboard setup, so that the scores can be compared with those published for openai-community/gpt2.

gpt2-124M-edu-fineweb-10B Evals

| Eval Test | Score |
| --- | --- |
| arc_challenge (25 shot) | 24.49 |
| gsm8k (5 shot) | 0.08 |
| hellaswag (10 shot) | 32.64 |
| mmlu (5 shot) | 26.06 |
| truthfulqa (0 shot) | 42.45 |
| winogrande (5 shot) | 52.17 |
| Overall Score | 29.65 |
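
The overall score is consistent with an unweighted mean of the six benchmark scores. The card does not state how it was computed, so the averaging rule in this short check is an assumption. The numbers themselves are copied from the results above.

```python
# Benchmark scores copied from the evaluation results above.
scores = {
    "arc_challenge": 24.49,
    "gsm8k": 0.08,
    "hellaswag": 32.64,
    "mmlu": 26.06,
    "truthfulqa": 42.45,
    "winogrande": 52.17,
}

# Assumption: the overall score is the plain (unweighted) average.
overall = sum(scores.values()) / len(scores)

print(round(overall, 2))  # 29.65
```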