# Baby-Llama-58M
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.7109
## Model description
More information needed
## Intended uses & limitations
More information needed
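
The card does not ship a usage snippet. As a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub (the repo id below is a placeholder, not the actual path), it can be loaded with the standard `transformers` causal-LM API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Hub path of this checkpoint.
repo_id = "your-username/Baby-Llama-58M"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```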
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 0.00025
- train_batch_size: 128
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 80
- mixed_precision_training: Native AMP
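
The training script itself is not public, so here is a minimal sketch of how these values might map onto the `transformers` `Trainer` configuration; `output_dir` is a placeholder, and `fp16=True` is an assumption based on the "Native AMP" note above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="baby-llama-58m",   # placeholder path, not from the card
    learning_rate=2.5e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=80,
    fp16=True,                     # assumed interpretation of "Native AMP"
)
```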
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 311.1646 | 1.0 | 3 | 287.5772 |
| 309.9048 | 2.0 | 6 | 282.5104 |
| 295.7833 | 3.0 | 9 | 266.8010 |
| 269.5852 | 4.0 | 12 | 247.3416 |
| 250.6772 | 5.0 | 15 | 231.4105 |
| 243.0754 | 6.0 | 18 | 224.6885 |
| 235.779 | 7.0 | 21 | 217.7554 |
| 235.8358 | 8.0 | 24 | 211.6984 |
| 224.1199 | 9.0 | 27 | 204.9522 |
| 216.0247 | 10.0 | 30 | 197.5209 |
| 206.4354 | 11.0 | 33 | 189.5172 |
| 189.1456 | 12.0 | 36 | 179.2765 |
| 181.0333 | 13.0 | 39 | 157.3401 |
| 152.062 | 14.0 | 42 | 137.4234 |
| 132.3128 | 15.0 | 45 | 120.5469 |
| 118.0474 | 16.0 | 48 | 106.6884 |
| 107.6354 | 17.0 | 51 | 97.7495 |
| 98.2458 | 18.0 | 54 | 88.4898 |
| 86.4009 | 19.0 | 57 | 77.8249 |
| 75.9386 | 20.0 | 60 | 67.9337 |
| 65.627 | 21.0 | 63 | 58.1877 |
| 53.5903 | 22.0 | 66 | 49.0234 |
| 47.114 | 23.0 | 69 | 41.2838 |
| 38.9667 | 24.0 | 72 | 34.4503 |
| 32.8846 | 25.0 | 75 | 29.7438 |
| 27.1886 | 26.0 | 78 | 24.2863 |
| 23.0713 | 27.0 | 81 | 20.1505 |
| 18.9003 | 28.0 | 84 | 16.9556 |
| 15.9133 | 29.0 | 87 | 14.4738 |
| 13.5544 | 30.0 | 90 | 12.6399 |
| 11.6834 | 31.0 | 93 | 11.1016 |
| 10.2371 | 32.0 | 96 | 9.9052 |
| 9.2371 | 33.0 | 99 | 8.9413 |
| 8.352 | 34.0 | 102 | 8.1600 |
| 7.5322 | 35.0 | 105 | 7.6794 |
| 7.0653 | 36.0 | 108 | 7.3031 |
| 6.6853 | 37.0 | 111 | 6.9564 |
| 6.3257 | 38.0 | 114 | 6.7247 |
| 5.9869 | 39.0 | 117 | 6.4649 |
| 5.8618 | 40.0 | 120 | 6.2734 |
| 5.6025 | 41.0 | 123 | 6.1253 |
| 5.4913 | 42.0 | 126 | 6.0822 |
| 5.3086 | 43.0 | 129 | 5.8575 |
| 5.1904 | 44.0 | 132 | 5.6860 |
| 5.1193 | 45.0 | 135 | 5.6821 |
| 5.0846 | 46.0 | 138 | 5.5831 |
| 5.017 | 47.0 | 141 | 5.5245 |
| 4.7435 | 48.0 | 144 | 5.3877 |
| 4.7546 | 49.0 | 147 | 5.3523 |
| 4.8606 | 50.0 | 150 | 5.3845 |
| 4.7146 | 51.0 | 153 | 5.2239 |
| 4.6273 | 52.0 | 156 | 5.1927 |
| 4.4469 | 53.0 | 159 | 5.1898 |
| 4.5135 | 54.0 | 162 | 5.0846 |
| 4.4061 | 55.0 | 165 | 5.0756 |
| 4.3577 | 56.0 | 168 | 5.0474 |
| 4.2169 | 57.0 | 171 | 5.0125 |
| 4.3001 | 58.0 | 174 | 4.9770 |
| 4.2399 | 59.0 | 177 | 4.9469 |
| 4.3372 | 60.0 | 180 | 4.9162 |
| 4.2669 | 61.0 | 183 | 4.9166 |
| 4.2394 | 62.0 | 186 | 4.8618 |
| 4.2965 | 63.0 | 189 | 4.8595 |
| 4.1188 | 64.0 | 192 | 4.8285 |
| 4.2886 | 65.0 | 195 | 4.8265 |
| 4.2688 | 66.0 | 198 | 4.8103 |
| 4.2429 | 67.0 | 201 | 4.7904 |
| 3.9653 | 68.0 | 204 | 4.7787 |
| 4.2676 | 69.0 | 207 | 4.7604 |
| 4.2029 | 70.0 | 210 | 4.7588 |
| 4.0962 | 71.0 | 213 | 4.7560 |
| 4.0643 | 72.0 | 216 | 4.7449 |
| 4.0713 | 73.0 | 219 | 4.7341 |
| 4.1192 | 74.0 | 222 | 4.7275 |
| 4.135 | 75.0 | 225 | 4.7186 |
| 3.9914 | 76.0 | 228 | 4.7135 |
| 4.0225 | 77.0 | 231 | 4.7144 |
| 3.9907 | 78.0 | 234 | 4.7152 |
| 4.0444 | 79.0 | 237 | 4.7123 |
| 4.0321 | 80.0 | 240 | 4.7109 |
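
For context, assuming the reported validation loss is the standard per-token cross-entropy (natural log) computed by `Trainer` for causal language models, the final value corresponds to a perplexity of roughly exp(4.7109) ≈ 111:

```python
import math

# Perplexity from cross-entropy loss (assumes token-level, natural-log loss).
final_val_loss = 4.7109
print(math.exp(final_val_loss))  # ≈ 111.2
```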
### Framework versions
- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0