prm800k_mistral_full_1203_re

This model is a fine-tuned version of peiyi9979/math-shepherd-mistral-7b-prm on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.6381
Accuracy: 0.7273
Precision: 0.5455
Recall: 0.2202
F1: 0.3137
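
The reported F1 is the harmonic mean of precision and recall, so it can be rechecked from the two values above. The following is a minimal sketch; the helper name `f1_score` is illustrative and not taken from the training code, and a small difference in the last digit is possible because the inputs are already rounded to four decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Final evaluation precision and recall as reported in this card.
final_f1 = f1_score(0.5455, 0.2202)
print(f"{final_f1:.3f}")
```

The low recall (0.2202) dominates the harmonic mean, which is why F1 sits well below the precision of 0.5455.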

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 4
seed: 908932403
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 8
total_train_batch_size: 128
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.3
num_epochs: 1
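
The two total batch sizes follow from the per-device sizes, the device count, and the gradient accumulation steps listed above. The sketch below reproduces that arithmetic and also illustrates the shape of a cosine schedule with a 0.3 warmup ratio. Two things are assumptions rather than facts from this card: the total step count of 3614 is only an estimate inferred from the results table (step 3600 at epoch 0.9961), and `lr_at_step` is a hypothetical helper that approximates a linear-warmup, half-cosine-decay schedule rather than the exact scheduler the run used.

```python
import math

# Values copied from the hyperparameter list in this card.
per_device_train_batch_size = 2
per_device_eval_batch_size = 4
num_devices = 8
gradient_accumulation_steps = 8

# Effective batch sizes across all devices.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def lr_at_step(step, total_steps, peak_lr=1e-5, warmup_ratio=0.3):
    """Linear warmup to peak_lr, then half-cosine decay to zero (a sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: estimated from the results table, not stated in the card.
ASSUMED_TOTAL_STEPS = 3614

print(total_train_batch_size, total_eval_batch_size)
print(lr_at_step(500, ASSUMED_TOTAL_STEPS))
```

Under these assumptions, roughly the first 30% of the run is spent ramping the learning rate up to 1e-05 before the cosine decay begins.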

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
No log	0	0	0.7264	0.5221	0.3206	0.6147	0.4214
0.4487	0.0277	100	0.5639	0.7481	0.5882	0.3670	0.4520
0.4543	0.0553	200	0.5885	0.7532	0.5729	0.5046	0.5366
0.5093	0.0830	300	0.6134	0.7247	0.5224	0.3211	0.3977
0.4375	0.1107	400	0.5853	0.7299	0.5385	0.3211	0.4023
0.5172	0.1383	500	0.5865	0.7273	0.5270	0.3578	0.4262
0.4233	0.1660	600	0.6196	0.7091	0.4867	0.5046	0.4955
0.4578	0.1937	700	0.7282	0.6753	0.3667	0.2018	0.2604
0.5126	0.2213	800	0.6442	0.7299	0.6	0.1376	0.2239
0.497	0.2490	900	0.5881	0.7039	0.4742	0.4220	0.4466
0.5491	0.2767	1000	0.5973	0.7169	0.5	0.4128	0.4523
0.443	0.3044	1100	0.6581	0.7221	0.5333	0.1468	0.2302
0.5123	0.3320	1200	0.6223	0.6831	0.4330	0.3853	0.4078
0.5508	0.3597	1300	0.6529	0.7169	0.5	0.2018	0.2876
0.4592	0.3874	1400	0.6542	0.7325	0.5517	0.2936	0.3832
0.4795	0.4150	1500	0.6218	0.7013	0.4318	0.1743	0.2484
0.4955	0.4427	1600	0.7782	0.7247	0.6364	0.0642	0.1167
0.5032	0.4704	1700	0.5619	0.7169	0.5	0.6697	0.5725
0.5327	0.4980	1800	0.6404	0.7299	0.5556	0.2294	0.3247
0.508	0.5257	1900	0.6181	0.7299	0.5352	0.3486	0.4222
0.4908	0.5534	2000	0.6056	0.7481	0.6071	0.3119	0.4121
0.4834	0.5810	2100	0.6065	0.7429	0.5758	0.3486	0.4343
0.506	0.6087	2200	0.6348	0.7481	0.6	0.3303	0.4260
0.4991	0.6364	2300	0.6207	0.7506	0.6327	0.2844	0.3924
0.3951	0.6640	2400	0.6659	0.7221	0.5385	0.1284	0.2074
0.4769	0.6917	2500	0.6327	0.7143	0.4925	0.3028	0.375
0.4661	0.7194	2600	0.6489	0.7247	0.5306	0.2385	0.3291
0.4985	0.7470	2700	0.6353	0.7273	0.5435	0.2294	0.3226
0.4713	0.7747	2800	0.6370	0.7299	0.5455	0.2752	0.3659
0.3479	0.8024	2900	0.6417	0.7273	0.5333	0.2936	0.3787
0.377	0.8300	3000	0.6435	0.7273	0.54	0.2477	0.3396
0.4907	0.8577	3100	0.6059	0.7221	0.5156	0.3028	0.3815
0.3445	0.8854	3200	0.6503	0.7351	0.5714	0.2569	0.3544
0.4569	0.9131	3300	0.6253	0.7221	0.5185	0.2569	0.3436
0.3566	0.9407	3400	0.6265	0.7247	0.5283	0.2569	0.3457
0.3483	0.9684	3500	0.6361	0.7247	0.5333	0.2202	0.3117
0.4514	0.9961	3600	0.6381	0.7273	0.5455	0.2202	0.3137
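
The validation metrics fluctuate considerably between evaluations, so the final checkpoint is not necessarily the strongest one. The sketch below shows one way to pick a checkpoint from the logged metrics. It uses only a subset of rows copied from the table above, and the variable names are illustrative.

```python
# (step, validation_loss, f1) copied from a subset of rows in the results table.
rows = [
    (0, 0.7264, 0.4214),
    (200, 0.5885, 0.5366),
    (1700, 0.5619, 0.5725),
    (2000, 0.6056, 0.4121),
    (3600, 0.6381, 0.3137),
]

# Highest F1 and lowest validation loss among the listed rows.
best_by_f1 = max(rows, key=lambda row: row[2])
best_by_loss = min(rows, key=lambda row: row[1])

print("best F1 at step", best_by_f1[0])
print("lowest validation loss at step", best_by_loss[0])
```

Across the full table, step 1700 also has both the highest F1 (0.5725) and the lowest validation loss (0.5619), while the final evaluation at step 3600 reports an F1 of 0.3137.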

Framework versions

Transformers 4.46.0
Pytorch 2.4.0+cu118
Datasets 3.0.0
Tokenizers 0.20.1
