CSE-251B-Project

1. Get training data

Download and tokenize FineWeb-Edu:

```bash
# Using Karpathy's build-nanogpt data script:
git clone https://github.com/karpathy/build-nanogpt.git
cd build-nanogpt
python fineweb.py
```

This downloads the FineWeb-Edu 10B-token sample and tokenizes it into binary shards. You can also use nanoGPT's data preparation scripts for other datasets like OpenWebText.

The download may take about 10 minutes.
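Each shard is a flat sequence of token ids. A nanoGPT-style loader turns such a sequence into next-token-prediction batches by taking `batch_size * block_size + 1` tokens and using the same span, shifted by one position, as the targets. The sketch below shows that idea on a toy list. `iter_batches` is a hypothetical helper, not the loader used by this repo's `train.py`.

```python
# Minimal sketch, NOT this repo's data loader. Assumes a shard is a flat
# sequence of token ids and that training uses next-token prediction.
from typing import Iterator, List, Tuple


def iter_batches(
    tokens: List[int], batch_size: int, block_size: int
) -> Iterator[Tuple[List[List[int]], List[List[int]]]]:
    """Yield (inputs, targets) batches; targets are inputs shifted left by one."""
    step = batch_size * block_size
    pos = 0
    # Each batch needs step + 1 tokens so that the last input has a target.
    while pos + step + 1 <= len(tokens):
        chunk = tokens[pos : pos + step + 1]
        x = [chunk[i * block_size : (i + 1) * block_size] for i in range(batch_size)]
        y = [chunk[i * block_size + 1 : (i + 1) * block_size + 1] for i in range(batch_size)]
        yield x, y
        pos += step  # advance by exactly one batch worth of inputs


if __name__ == "__main__":
    toy_shard = list(range(20))  # stand-in for a real token shard
    for x, y in iter_batches(toy_shard, batch_size=2, block_size=4):
        print("x:", x)
        print("y:", y)
```

On the 20-token toy shard this yields two batches; the leftover tokens that cannot fill a full batch are skipped.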

2. Train a baseline

Train a small model to verify everything works:

```bash
# Default (learned positional embeddings, wpe):
python train.py --n_layer=8 --n_head=8 --n_embd=512 --max_iters=200 --batch_size=4 --save_interval=10

# Use ALiBi positional bias instead of wpe:
python train.py --pos_encoding=alibi --n_layer=8 --n_head=8 --n_embd=512 --max_iters=200 --batch_size=4 --save_interval=10
```
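To get a feel for the size of this baseline and what switching `--pos_encoding` changes, here is a rough parameter count. It is a sketch under assumptions not confirmed by this repo's `train.py`: a GPT-2-style block with biases in every Linear and LayerNorm, a 4x MLP expansion, tied input/output embeddings, a 50257-token GPT-2 vocabulary (nanoGPT-derived code sometimes pads this to 50304), and `block_size=1024`. `gpt_param_count` is a hypothetical helper.

```python
# Rough parameter count for a GPT-2-style model. All architectural details
# below are assumptions, not read from this repo's train.py.


def gpt_param_count(n_layer: int, n_embd: int, vocab_size: int = 50257,
                    block_size: int = 1024, pos_encoding: str = "wpe") -> int:
    d = n_embd
    per_block = (
        2 * d                  # ln_1 weight + bias
        + d * 3 * d + 3 * d    # attention q/k/v projection
        + d * d + d            # attention output projection
        + 2 * d                # ln_2 weight + bias
        + d * 4 * d + 4 * d    # MLP up-projection
        + 4 * d * d + d        # MLP down-projection
    )
    # Token embedding (shared with the output head) + blocks + final LayerNorm.
    total = vocab_size * d + n_layer * per_block + 2 * d
    if pos_encoding == "wpe":
        total += block_size * d  # learned absolute position table
    # ALiBi adds no learned parameters: its per-head slopes are fixed constants.
    return total


if __name__ == "__main__":
    print(gpt_param_count(8, 512))                        # 51475968
    print(gpt_param_count(8, 512, pos_encoding="alibi"))  # 50951680
```

Note that `n_head` does not appear: splitting the same `n_embd` across more heads does not change the parameter count. Under these assumptions the two runs differ by exactly the 1024 x 512 position table (524,288 parameters).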

`--pos_encoding` choices:

| Value | Description |
|---|---|
| `wpe` (default) | Learned absolute positional embeddings (GPT-2 style) |
| `alibi` | ALiBi: no learned position parameters; a linear bias is added to the attention logits |
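The `alibi` option can be illustrated in plain Python using the slope recipe from the ALiBi paper: each head gets a fixed slope from a geometric sequence, and a query at position i attending to a key at position j <= i receives the bias -slope * (i - j), with future keys masked out. This is a sketch for intuition, not this repo's implementation; `alibi_slopes` and `alibi_bias` are hypothetical names.

```python
import math
from typing import List


def alibi_slopes(n_heads: int) -> List[float]:
    """Per-head ALiBi slopes: a geometric sequence starting at 2^(-8/n)."""
    def power_of_2_slopes(n: int) -> List[float]:
        start = 2 ** (-8 / n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(n_heads).is_integer():
        return power_of_2_slopes(n_heads)
    # Non-power-of-2 head counts: use the closest lower power of 2, then fill
    # the remaining heads with every other slope from the next power of 2.
    closest = 2 ** math.floor(math.log2(n_heads))
    extra = alibi_slopes(2 * closest)[0::2][: n_heads - closest]
    return power_of_2_slopes(closest) + extra


def alibi_bias(slope: float, seq_len: int) -> List[List[float]]:
    """Causal bias for one head: -slope * (i - j) for keys j <= i, -inf for j > i."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    print(slopes)  # 1/2, 1/4, ..., 1/256
    for row in alibi_bias(slopes[0], 4):
        print(row)
```

With `--n_head=8` the slopes run from 1/2 down to 1/256, so steep heads focus on nearby tokens while shallow heads can look further back. In a model, each head's bias matrix is added to its attention scores before the softmax, which is why no positional embedding table is needed.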

I had to use a small `--batch_size` on my machine (a Titan GPU).
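A quick back-of-the-envelope check shows how little data the smoke-test run consumes. This assumes each iteration processes `batch_size` sequences of `block_size` tokens with no gradient accumulation; `block_size=1024` is a guess (GPT-2's context length), not a value read from `train.py`, and `tokens_processed` is a hypothetical helper.

```python
# Token budget for the baseline run. block_size=1024 and "no gradient
# accumulation" are assumptions; check train.py for the real defaults.


def tokens_processed(max_iters: int, batch_size: int, block_size: int) -> int:
    """Tokens consumed if each iteration sees batch_size sequences of block_size tokens."""
    return max_iters * batch_size * block_size


if __name__ == "__main__":
    seen = tokens_processed(max_iters=200, batch_size=4, block_size=1024)
    print(seen)                                          # 819200
    print(f"{seen / 10e9:.2e} of the 10B-token sample")  # 8.19e-05 of the 10B-token sample
```

Under those assumptions the 200-iteration baseline sees about 0.8M tokens, which is enough to verify the pipeline but far from a real training run. Activation memory grows roughly linearly with `batch_size`, which is why lowering it helps on a smaller GPU.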

3. Evaluate on the val set

Copy the `val.bin` file from the contest repo into the working directory.

```bash
# Local eval during development
python evaluate.py --model_dir . --data val.bin --checkpoint_filename log/model_00199.pt
```
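To sanity-check `val.bin` before running `evaluate.py`, the sketch below reads it as a flat array of uint16 token ids. That layout (nanoGPT's raw `.bin` format, little-endian) is an assumption about the contest file, not something this README confirms. `checkpoint_name` is a hypothetical helper that reproduces the `log/model_00199.pt` naming, assuming zero-indexed steps so that `--max_iters=200` ends at step 199.

```python
import os
import sys
from array import array


def parse_uint16_tokens(data: bytes) -> array:
    """Decode raw bytes into uint16 token ids, assuming little-endian order."""
    tokens = array("H")  # 'H' = unsigned short, 2 bytes on mainstream platforms
    if tokens.itemsize != 2:
        raise RuntimeError("this sketch expects 2-byte unsigned shorts")
    tokens.frombytes(data)  # raises ValueError if len(data) is odd
    if sys.byteorder == "big":
        tokens.byteswap()  # the file is assumed to be little-endian
    return tokens


def load_uint16_tokens(path: str) -> array:
    with open(path, "rb") as f:
        return parse_uint16_tokens(f.read())


def checkpoint_name(step: int, log_dir: str = "log") -> str:
    """Hypothetical helper: 5-digit zero-padded step, as in log/model_00199.pt."""
    return f"{log_dir}/model_{step:05d}.pt"


if __name__ == "__main__":
    if os.path.exists("val.bin"):
        val = load_uint16_tokens("val.bin")
        print(f"val.bin: {len(val):,} tokens")
    print(checkpoint_name(200 - 1))  # log/model_00199.pt
```

If the token count looks implausible (for example, the file size is odd or the largest id exceeds the tokenizer's vocabulary), the file probably uses a different format than assumed here.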
