smol_llama-220M-GQA-fineweb-edu-10BT

This model is a continuously pretrained version of BEE-spoke-data/smol_llama-220M-GQA, trained on the 10BT-sample subset of HuggingFaceFW/fineweb-edu.

It achieves the following results on the evaluation set:

Loss: 2.7416
Accuracy: 0.4560
Num Input Tokens Seen: 10810818560
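
For readers who prefer perplexity, the reported loss can be converted with a short calculation. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card); under that assumption the perplexity comes out at roughly 15.5.

```python
import math

# Evaluation loss reported above.
# Assumption: mean cross-entropy per token, measured in nats.
eval_loss = 2.7416

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss} -> perplexity of about {perplexity:.2f}")
```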

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 80085
gradient_accumulation_steps: 32
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1.0
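
The relationship between these settings can be sketched in plain Python. The block below shows how the total batch size follows from the per-device batch size and gradient accumulation, and what a cosine schedule with linear warmup looks like for these values. Two things here are assumptions rather than facts from the card: `TOTAL_STEPS` is inferred from the token counts reported in this card (it is not listed among the hyperparameters), and `lr_at` is a hypothetical helper that mirrors the common warmup-plus-cosine shape, not the exact scheduler code used for this run. The batch-size product also assumes a single device.

```python
import math

PEAK_LR = 5e-05        # learning_rate
WARMUP_RATIO = 0.05    # lr_scheduler_warmup_ratio
PER_DEVICE_BATCH = 8   # train_batch_size
GRAD_ACCUM_STEPS = 32  # gradient_accumulation_steps
TOTAL_STEPS = 20_620   # ASSUMPTION: inferred from token counts, not listed above

# Sequences per optimizer step (assumes a single device).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS


def lr_at(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Hypothetical helper: linear warmup followed by half-cosine decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch)
for s in (0, 500, 1031, 10_000, 20_620):
    print(f"step {s:>6}: lr = {lr_at(s):.3e}")
```

Under these assumptions the learning rate peaks at the end of warmup and decays smoothly to zero by the final step.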

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Input Tokens Seen
2.8567	0.0145	300	2.8291	0.4450	157286400
2.8517	0.0291	600	2.8153	0.4465	314572800
2.8224	0.0436	900	2.8025	0.4481	471859200
2.8178	0.0582	1200	2.7912	0.4495	629145600
2.8001	0.0727	1500	2.7832	0.4505	786432000
2.8045	0.0873	1800	2.7772	0.4512	943718400
2.8019	0.1018	2100	2.7729	0.4516	1101004800
2.7995	0.1164	2400	2.7691	0.4522	1258291200
2.8006	0.1309	2700	2.7657	0.4526	1415577600
2.7886	0.1455	3000	2.7631	0.4528	1572864000
2.7907	0.1600	3300	2.7606	0.4532	1730150400
2.7907	0.1746	3600	2.7588	0.4536	1887436800
2.7788	0.1891	3900	2.7569	0.4537	2044723200
2.7942	0.2037	4200	2.7552	0.4540	2202009600
2.793	0.2182	4500	2.7538	0.4543	2359296000
2.7958	0.2328	4800	2.7526	0.4544	2516582400
2.78	0.2473	5100	2.7515	0.4547	2673868800
2.7937	0.2619	5400	2.7506	0.4548	2831155200
2.7717	0.2764	5700	2.7498	0.4548	2988441600
2.7832	0.2910	6000	2.7490	0.4548	3145728000
2.768	0.3055	6300	2.7482	0.4550	3303014400
2.7653	0.3201	6600	2.7476	0.4551	3460300800
2.7843	0.3346	6900	2.7470	0.4551	3617587200
2.7765	0.3492	7200	2.7464	0.4550	3774873600
2.7778	0.3637	7500	2.7460	0.4552	3932160000
2.7655	0.3783	7800	2.7455	0.4553	4089446400
2.7943	0.3928	8100	2.7449	0.4554	4246732800
2.7715	0.4074	8400	2.7447	0.4552	4404019200
2.7828	0.4219	8700	2.7443	0.4554	4561305600
2.7883	0.4365	9000	2.7440	0.4556	4718592000
2.7627	0.4510	9300	2.7437	0.4556	4875878400
2.7841	0.4656	9600	2.7435	0.4557	5033164800
2.7734	0.4801	9900	2.7433	0.4557	5190451200
2.7829	0.4947	10200	2.7430	0.4557	5347737600
2.781	0.5092	10500	2.7429	0.4557	5505024000
2.7757	0.5238	10800	2.7428	0.4557	5662310400
2.779	0.5383	11100	2.7426	0.4559	5819596800
2.7771	0.5529	11400	2.7425	0.4559	5976883200
2.7828	0.5674	11700	2.7424	0.4560	6134169600
2.7814	0.5820	12000	2.7423	0.4558	6291456000
2.7735	0.5965	12300	2.7422	0.4559	6448742400
2.7848	0.6111	12600	2.7420	0.4559	6606028800
2.7748	0.6256	12900	2.7420	0.4559	6763315200
2.7697	0.6402	13200	2.7419	0.4560	6920601600
2.7689	0.6547	13500	2.7419	0.4560	7077888000
2.7747	0.6692	13800	2.7419	0.4559	7235174400
2.786	0.6838	14100	2.7418	0.4561	7392460800
2.7801	0.6983	14400	2.7417	0.4560	7549747200
2.7658	0.7129	14700	2.7417	0.4561	7707033600
2.7717	0.7274	15000	2.7417	0.4560	7864320000
2.7717	0.7420	15300	2.7417	0.4560	8021606400
2.777	0.7565	15600	2.7417	0.4559	8178892800
2.7793	0.7711	15900	2.7416	0.4560	8336179200
2.7718	0.7856	16200	2.7416	0.4559	8493465600
2.7757	0.8002	16500	2.7416	0.4560	8650752000
2.7763	0.8147	16800	2.7416	0.4559	8808038400
2.7581	0.8293	17100	2.7416	0.4559	8965324800
2.7719	0.8438	17400	2.7416	0.4560	9122611200
2.7609	0.8584	17700	2.7416	0.4560	9279897600
2.7753	0.8729	18000	2.7416	0.4559	9437184000
2.7674	0.8875	18300	2.7415	0.4560	9594470400
2.7601	0.9020	18600	2.7416	0.4560	9751756800
2.7823	0.9166	18900	2.7416	0.4560	9909043200
2.7767	0.9311	19200	2.7416	0.4560	10066329600
2.7759	0.9457	19500	2.7416	0.4560	10223616000
2.7722	0.9602	19800	2.7415	0.4560	10380902400
2.7764	0.9748	20100	2.7416	0.4560	10538188800
2.7724	0.9893	20400	2.7416	0.4559	10695475200
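
The token counts in this table grow by a constant amount per step, which allows a quick consistency check. The sketch below derives the tokens consumed per optimizer step from the first row, then the sequence length and total step count that this implies. The sequence length is not stated anywhere in this card; the value computed here is an inference that assumes every sequence in a batch is fully packed, with no padding.

```python
TOTAL_BATCH = 256  # total_train_batch_size from the hyperparameters

# First logged row of the table: step 300, input tokens seen 157286400.
first_row_step, first_row_tokens = 300, 157_286_400

# Final "Num Input Tokens Seen" reported for the evaluation set.
final_tokens = 10_810_818_560

# Tokens consumed per optimizer step.
tokens_per_step = first_row_tokens // first_row_step

# Implied tokens per sequence (ASSUMPTION: fixed-length, fully packed sequences).
implied_seq_len = tokens_per_step // TOTAL_BATCH

# Implied number of optimizer steps in the full epoch.
implied_total_steps = final_tokens // tokens_per_step

print("tokens per step:   ", tokens_per_step)
print("implied seq length:", implied_seq_len)
print("implied total steps:", implied_total_steps)
```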

Framework versions

Transformers 4.41.1
Pytorch 2.3.1+cu118
Datasets 2.19.1
Tokenizers 0.19.1

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	6.52
IFEval (0-Shot)	19.88
BBH (3-Shot)	2.31
MATH Lvl 5 (4-Shot)	0.00
GPQA (0-shot)	1.23
MuSR (0-shot)	14.26
MMLU-PRO (5-shot)	1.41
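
The Avg. row appears to be the unweighted mean of the six benchmark scores. The following sketch checks that reading against the rounded values in the table; it is an assumption about how the average is formed, and the leaderboard may compute it from unrounded scores, so a small difference in the last digit is possible.

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 19.88,
    "BBH (3-Shot)": 2.31,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 1.23,
    "MuSR (0-shot)": 14.26,
    "MMLU-PRO (5-shot)": 1.41,
}

# Unweighted mean across benchmarks (ASSUMPTION about how Avg. is computed).
average = sum(scores.values()) / len(scores)

print(f"mean of table values: {average:.3f} (reported Avg.: 6.52)")
```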
