# CSE-251B-Project
## 1. Get training data
Download and tokenize FineWeb-Edu:
```bash
# Using Karpathy's build-nanogpt data script:
git clone https://github.com/karpathy/build-nanogpt.git
cd build-nanogpt
python fineweb.py
```
This downloads the FineWeb-Edu 10B-token sample and tokenizes it into binary shards; expect the download alone to take about 10 minutes. You can also use nanoGPT's data preparation scripts for other datasets, such as OpenWebText.
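To illustrate how token shards feed training, here is a minimal sketch that assumes a raw `.bin` layout like nanoGPT's: a flat, headerless array of little-endian `uint16` GPT-2 token IDs. `write_tokens`, `read_tokens`, and `get_batch` are hypothetical helpers, not functions from this repo, and the demo uses a toy file so it runs anywhere.

```python
import os
import random
import struct
import tempfile

def write_tokens(path, tokens):
    """Hypothetical helper: write token IDs as raw little-endian uint16."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(tokens)}H", *tokens))

def read_tokens(path):
    """Read a raw little-endian uint16 token file back into a list of ints."""
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // 2  # two bytes per token
    return list(struct.unpack(f"<{n}H", data[: 2 * n]))

def get_batch(tokens, batch_size, block_size, rng):
    """Sample (input, target) windows; targets are the inputs shifted by one token."""
    xs, ys = [], []
    for _ in range(batch_size):
        i = rng.randrange(len(tokens) - block_size)
        xs.append(tokens[i : i + block_size])
        ys.append(tokens[i + 1 : i + 1 + block_size])
    return xs, ys

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "toy_val.bin")
    toy = list(range(100))  # stand-in token IDs; real GPT-2 IDs are < 50257
    write_tokens(path, toy)
    tokens = read_tokens(path)
    x, y = get_batch(tokens, batch_size=4, block_size=8, rng=random.Random(0))
    print(len(x), len(x[0]), x[0][1:] == y[0][:-1])  # prints: 4 8 True
```

For full-size shards, `numpy.memmap(path, dtype=numpy.uint16, mode="r")` avoids reading the whole file into memory; if your shards are `.npy` files (which carry a header), load them with `numpy.load` instead.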
## 2. Train a baseline
Train a small model to verify everything works:
```bash
# Default (learned positional embeddings, wpe):
python train.py --n_layer=8 --n_head=8 --n_embd=512 --max_iters=200 --batch_size=4 --save_interval=10
# Use ALiBi positional bias instead of wpe:
python train.py --pos_encoding=alibi --n_layer=8 --n_head=8 --n_embd=512 --max_iters=200 --batch_size=4 --save_interval=10
```
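To get a feel for the model size these flags produce, here is a back-of-the-envelope parameter count. It assumes a GPT-2-style block (fused Q/K/V projection, 4× MLP, LayerNorms with biases), the GPT-2 vocabulary of 50,257 tokens, a 1024-token context, and an output head tied to the token embedding. train.py may differ on any of these, so treat `count_params` as a hypothetical estimate rather than the repo's exact figure.

```python
def count_params(n_layer, n_head, n_embd, vocab_size=50257, block_size=1024,
                 pos_encoding="wpe"):
    """Approximate parameter count of an assumed GPT-2-style model."""
    assert n_embd % n_head == 0, "n_embd must be divisible by n_head"
    per_layer = (
        n_embd * 3 * n_embd + 3 * n_embd      # attention Q/K/V projection + bias
        + n_embd * n_embd + n_embd            # attention output projection + bias
        + n_embd * 4 * n_embd + 4 * n_embd    # MLP up-projection + bias
        + 4 * n_embd * n_embd + n_embd        # MLP down-projection + bias
        + 2 * 2 * n_embd                      # two LayerNorms (weight + bias each)
    )
    total = n_layer * per_layer
    total += 2 * n_embd                       # final LayerNorm
    total += vocab_size * n_embd              # token embedding (tied with output head)
    if pos_encoding == "wpe":
        total += block_size * n_embd          # learned absolute position table
    elif pos_encoding != "alibi":
        raise ValueError(f"unknown pos_encoding: {pos_encoding}")
    return total

for pe in ("wpe", "alibi"):
    print(pe, f"{count_params(8, 8, 512, pos_encoding=pe):,}")
# wpe 51,475,968
# alibi 50,951,680
```

Under these assumptions the baseline has roughly 51M parameters, about half of them in the token embedding. `n_head` does not change the count; it only sets how `n_embd` is split across heads. Switching to `alibi` drops the `block_size × n_embd` position table.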
`--pos_encoding` choices:

| Value | Description |
|---|---|
| `wpe` (default) | Learned absolute positional embeddings (GPT-2 style) |
| `alibi` | ALiBi: no learned position parameters; a fixed linear bias is added to the attention logits |
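The `alibi` option replaces the position table with a fixed, head-specific penalty that grows linearly with the distance between query and key. Below is a plain-Python sketch of the standard bias from the original ALiBi paper, where the slopes form a geometric sequence starting at 2^(-8/n_head). It is written for clarity and is not taken from train.py, whose implementation may differ; in a PyTorch model the same matrix would typically be a tensor added to the logits before the softmax.

```python
import math

def alibi_slopes(n_head):
    """Head-specific slopes: a geometric sequence with ratio 2**(-8 / n_head)."""
    def power_of_2_slopes(n):
        start = 2 ** (-8 / n)
        return [start ** (i + 1) for i in range(n)]
    if math.log2(n_head).is_integer():
        return power_of_2_slopes(n_head)
    # Non-power-of-2 head counts: use the closest power of 2, then fill in
    # with every other slope from the next power of 2.
    closest = 2 ** math.floor(math.log2(n_head))
    extra = alibi_slopes(2 * closest)[0::2][: n_head - closest]
    return power_of_2_slopes(closest) + extra

def alibi_bias(n_head, seq_len):
    """bias[h][i][j] is added to head h's attention logit for query i, key j."""
    bias = []
    for m in alibi_slopes(n_head):
        rows = []
        for i in range(seq_len):
            # Penalty -m * distance for past keys; future keys get -inf (causal mask).
            rows.append([-m * (i - j) if j <= i else -math.inf for j in range(seq_len)])
        bias.append(rows)
    return bias

bias = alibi_bias(n_head=8, seq_len=4)
print(bias[0][3][:3])  # head 0 (slope 0.5), query 3 -> [-1.5, -1.0, -0.5]
```

Because nothing here is learned, the same bias formula applies to any sequence length, which is the usual motivation for ALiBi over a fixed-size learned table.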
I had to use a small `batch_size` on my machine (a Titan GPU).
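A small `batch_size` also means each step sees few tokens. The arithmetic below assumes a context length (`block_size`) of 1024 tokens and exactly one batch per optimizer step; both are assumptions about train.py, and `tokens_seen` is a hypothetical helper.

```python
def tokens_seen(batch_size, block_size, max_iters):
    """Tokens per step and over a full run, assuming one batch per step."""
    per_step = batch_size * block_size
    return per_step, per_step * max_iters

per_step, total = tokens_seen(batch_size=4, block_size=1024, max_iters=200)
fraction = total / 10e9  # share of the 10B-token FineWeb-Edu sample
print(f"{per_step:,} tokens/step, {total:,} tokens total, {fraction:.4%} of the sample")
# 4,096 tokens/step, 819,200 tokens total, 0.0082% of the sample
```

So the 200-iteration baseline touches well under 0.01% of the sample, which is enough to check that the pipeline runs end to end.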
## 3. Evaluate on the val set
Copy the `val.bin` file from the contest repo into this directory.
```bash
# Local eval during development
python evaluate.py --model_dir . --data val.bin --checkpoint_filename log/model_00199.pt
```
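The checkpoint path follows from the training flags: with `--max_iters=200`, steps presumably run from 0 to 199, so the final checkpoint is `model_00199.pt`. The sketch below lists the files you might expect under a hypothetical save rule (every `save_interval` steps after step 0, plus the last step, with the step zero-padded to five digits); check train.py for the rule it actually uses.

```python
import os

def expected_checkpoints(max_iters, save_interval, log_dir="log"):
    """Hypothetical save rule: every save_interval steps after step 0, plus the final step."""
    last_step = max_iters - 1  # steps numbered 0 .. max_iters - 1
    steps = [s for s in range(max_iters)
             if (s > 0 and s % save_interval == 0) or s == last_step]
    return [os.path.join(log_dir, f"model_{s:05d}.pt") for s in steps]

ckpts = expected_checkpoints(max_iters=200, save_interval=10)
print(len(ckpts), ckpts[-1])  # 20 checkpoints; the last is log/model_00199.pt on POSIX
```

Because of the zero-padding, sorting the filenames lexicographically also sorts them by step, so `sorted(glob.glob("log/model_*.pt"))[-1]` picks the most recent checkpoint.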