HuggingFaceFW/fineweb
CinnabarLM is a tiny, 4M-parameter LLM trained for ~33 minutes on a T4 GPU (on Colab)! It's only 16 MB in size!
I built it because making tiny LLMs is a good idea. Others have already done so with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't!
| Parameter | Value |
|---|---|
| Tokenizer | Custom BPE tokenizer |
| Vocabulary Size | 4096 tokens |
| Batch Size | 64 |
| Context Window | 256 tokens |
| `n_embed` | 192 |
| `n_head` | 8 |
| `n_layer` | 6 |
| Dropout | 0.1 |
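
As a rough sanity check on the table above, the sketch below estimates the parameter count from the listed values. It assumes a GPT-2-style decoder (learned positional embeddings, a 4x MLP expansion, biases and LayerNorm weights ignored); the card does not confirm the architecture, so treat the formula and the `estimate_params` helper as hypothetical.

```python
# Values taken from the table above.
vocab_size = 4096
block_size = 256   # context window
n_embed = 192
n_head = 8
n_layer = 6
batch_size = 64


def estimate_params(vocab_size, block_size, n_embed, n_layer, tied_head=True):
    """Approximate parameter count for an ASSUMED GPT-2-style decoder.

    Biases and LayerNorm parameters are ignored, so the real count
    would be slightly higher.
    """
    tok_emb = vocab_size * n_embed          # token embedding table
    pos_emb = block_size * n_embed          # learned positional embeddings (assumption)
    # Per block: attention (QKV + output projection) = 4*d^2,
    # MLP with 4x expansion (assumption) = 8*d^2.
    per_block = 12 * n_embed ** 2
    # An untied output head adds a second vocab_size x n_embed matrix.
    head = 0 if tied_head else vocab_size * n_embed
    return tok_emb + pos_emb + n_layer * per_block + head


tied = estimate_params(vocab_size, block_size, n_embed, n_layer, tied_head=True)
untied = estimate_params(vocab_size, block_size, n_embed, n_layer, tied_head=False)
head_dim = n_embed // n_head                # width of each attention head
tokens_per_step = batch_size * block_size   # tokens processed per iteration

print(f"tied head:   {tied:,} params, {tied * 4 / 1e6:.1f} MB in fp32")
print(f"untied head: {untied:,} params, {untied * 4 / 1e6:.1f} MB in fp32")
print(f"head_dim={head_dim}, tokens per step={tokens_per_step:,}")
```

Under these assumptions the estimate lands between roughly 3.5M and 4.3M parameters, which is in the same neighborhood as the stated ~4M parameters and 16 MB size if the weights are stored in 32-bit floats.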

| Hyperparameter | Value |
|---|---|
| `max_iters` | 10000 |
| `eval_interval` | 500 |
| `learning_rate` | 6e-4 |
| `min_lr` | 6e-5 |
| `warmup_iters` | 500 |
| `weight_decay` | 0.1 |
| `beta1`, `beta2` | 0.9, 0.95 |
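
The `learning_rate`, `min_lr`, and `warmup_iters` values suggest a warmup-then-decay schedule. The sketch below shows one common way such values are used: linear warmup followed by cosine decay down to `min_lr`. The card does not state which schedule was actually used, so the cosine shape, the decay ending at `max_iters`, and the `get_lr` helper are all assumptions.

```python
import math

# Values taken from the hyperparameter table.
learning_rate = 6e-4
min_lr = 6e-5
warmup_iters = 500
max_iters = 10000


def get_lr(it):
    """Learning rate at iteration `it` (ASSUMED warmup + cosine schedule)."""
    # 1) Linear warmup; the +1 terms avoid a learning rate of exactly 0 at step 0.
    if it < warmup_iters:
        return learning_rate * (it + 1) / (warmup_iters + 1)
    # 2) After the final iteration, hold at the floor.
    if it >= max_iters:
        return min_lr
    # 3) Cosine decay from learning_rate down to min_lr in between.
    decay_ratio = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes from 1 to 0
    return min_lr + coeff * (learning_rate - min_lr)


for step in (0, 250, 500, 5250, 10000):
    print(step, f"{get_lr(step):.2e}")
```

In a PyTorch training loop, the returned value would typically be written into each optimizer parameter group's `lr` before the step, with the optimizer itself configured with `weight_decay=0.1` and `betas=(0.9, 0.95)` from the table.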