
Baby-Llama-58M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 6.7221
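
For language models, the evaluation loss is typically the mean per-token cross-entropy (in nats), so it can be converted to perplexity with `exp(loss)`. A minimal sketch, assuming the reported value is indeed a mean per-token cross-entropy:

```python
import math

eval_loss = 6.7221  # reported evaluation loss

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 830.6
```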

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00025
  • train_batch_size: 128
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 80
  • mixed_precision_training: Native AMP
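
The schedule implied by the hyperparameters above (linear warmup for 50 steps, then cosine decay) can be sketched in plain Python. The 640 total steps are taken from the results table below (80 epochs × 8 steps per epoch); this mirrors the shape of Transformers' `get_cosine_schedule_with_warmup` rather than reproducing its exact code:

```python
import math

def learning_rate(step, base_lr=2.5e-4, warmup_steps=50, total_steps=640):
    """Linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Cosine-anneal from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate peaks at 2.5e-4 at step 50 and decays smoothly to zero by step 640.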

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 135.1538      | 1.0   | 8    | 118.8448        |
| 112.3406      | 2.0   | 16   | 102.1364        |
| 107.9124      | 3.0   | 24   | 86.8275         |
| 85.5837       | 4.0   | 32   | 71.8709         |
| 82.7059       | 5.0   | 40   | 60.4278         |
| 62.0973       | 6.0   | 48   | 51.7763         |
| 56.6325       | 7.0   | 56   | 44.4392         |
| 46.5864       | 8.0   | 64   | 39.5206         |
| 40.749        | 9.0   | 72   | 36.8323         |
| 34.1225       | 10.0  | 80   | 30.4178         |
| 26.3662       | 11.0  | 88   | 25.6518         |
| 21.4543       | 12.0  | 96   | 21.5034         |
| 17.4064       | 13.0  | 104  | 18.2917         |
| 14.5338       | 14.0  | 112  | 16.0543         |
| 12.8652       | 15.0  | 120  | 14.5666         |
| 11.1266       | 16.0  | 128  | 13.6536         |
| 9.5181        | 17.0  | 136  | 12.6228         |
| 8.0769        | 18.0  | 144  | 11.2297         |
| 7.3252        | 19.0  | 152  | 10.6871         |
| 6.7225        | 20.0  | 160  | 10.5576         |
| 6.1834        | 21.0  | 168  | 9.6600          |
| 6.0954        | 22.0  | 176  | 9.5832          |
| 5.715         | 23.0  | 184  | 9.4159          |
| 5.5297        | 24.0  | 192  | 8.8495          |
| 5.1538        | 25.0  | 200  | 8.6964          |
| 5.0472        | 26.0  | 208  | 8.4671          |
| 5.0581        | 27.0  | 216  | 8.3979          |
| 4.6914        | 28.0  | 224  | 8.2086          |
| 4.6117        | 29.0  | 232  | 8.2212          |
| 4.5157        | 30.0  | 240  | 8.1633          |
| 4.1918        | 31.0  | 248  | 8.1399          |
| 4.5274        | 32.0  | 256  | 7.7368          |
| 4.0493        | 33.0  | 264  | 7.7647          |
| 4.2799        | 34.0  | 272  | 7.8127          |
| 4.5331        | 35.0  | 280  | 7.6971          |
| 4.5937        | 36.0  | 288  | 7.6908          |
| 3.9957        | 37.0  | 296  | 7.6509          |
| 4.3035        | 38.0  | 304  | 7.5682          |
| 4.2626        | 39.0  | 312  | 7.4550          |
| 3.7238        | 40.0  | 320  | 7.4516          |
| 3.9562        | 41.0  | 328  | 7.2862          |
| 3.8612        | 42.0  | 336  | 7.3332          |
| 3.6178        | 43.0  | 344  | 7.3013          |
| 3.7672        | 44.0  | 352  | 7.2144          |
| 3.715         | 45.0  | 360  | 7.2103          |
| 3.7594        | 46.0  | 368  | 7.2457          |
| 4.3614        | 47.0  | 376  | 7.1274          |
| 4.0406        | 48.0  | 384  | 7.0472          |
| 3.5213        | 49.0  | 392  | 6.9963          |
| 3.7373        | 50.0  | 400  | 7.0503          |
| 3.7399        | 51.0  | 408  | 6.9916          |
| 3.8109        | 52.0  | 416  | 6.9899          |
| 3.3897        | 53.0  | 424  | 6.9132          |
| 3.2456        | 54.0  | 432  | 6.9393          |
| 3.8682        | 55.0  | 440  | 6.9017          |
| 3.3904        | 56.0  | 448  | 6.8995          |
| 3.8449        | 57.0  | 456  | 6.8478          |
| 3.6319        | 58.0  | 464  | 6.8388          |
| 3.4726        | 59.0  | 472  | 6.8123          |
| 3.5895        | 60.0  | 480  | 6.8452          |
| 3.4           | 61.0  | 488  | 6.7875          |
| 3.6904        | 62.0  | 496  | 6.7963          |
| 3.3957        | 63.0  | 504  | 6.7976          |
| 3.4602        | 64.0  | 512  | 6.8317          |
| 3.2714        | 65.0  | 520  | 6.8063          |
| 3.5695        | 66.0  | 528  | 6.7709          |
| 3.1538        | 67.0  | 536  | 6.7849          |
| 3.5586        | 68.0  | 544  | 6.7565          |
| 3.194         | 69.0  | 552  | 6.7629          |
| 3.0488        | 70.0  | 560  | 6.7462          |
| 3.6931        | 71.0  | 568  | 6.7269          |
| 3.7324        | 72.0  | 576  | 6.7367          |
| 3.2075        | 73.0  | 584  | 6.7460          |
| 3.3394        | 74.0  | 592  | 6.7111          |
| 3.4074        | 75.0  | 600  | 6.7456          |
| 3.3679        | 76.0  | 608  | 6.7225          |
| 3.2689        | 77.0  | 616  | 6.7234          |
| 3.6886        | 78.0  | 624  | 6.7247          |
| 3.4587        | 79.0  | 632  | 6.7224          |
| 3.6444        | 80.0  | 640  | 6.7221          |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0
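
To reproduce the training environment, the reported versions can be pinned with pip (a sketch; the reported `+cu121` build of PyTorch assumes installation from the matching CUDA wheel index, so the plain `torch==2.1.2` pin is shown here):

```shell
pip install transformers==4.39.1 torch==2.1.2 datasets==2.16.1 tokenizers==0.15.0
```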

Model details

  • Format: Safetensors
  • Model size: 54.5M params
  • Tensor type: F32