BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e

Second epoch of fine-tuning on the same dataset, with a different random seed.

This model is a fine-tuned version of BEE-spoke-data/tFINE-900m-e16-d32-instruct on the pszemraj/infinity-instruct-7m-T2T_en dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1159
  • Num Input Tokens Seen: 810839096
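A minimal inference sketch with 🤗 Transformers is shown below. It assumes the checkpoint loads with the standard seq2seq auto classes and that a plain instruction string is an acceptable prompt; the exact prompt template and generation settings used by the authors are not documented here, so the values below are illustrative.

```python
# Minimal inference sketch (assumptions: standard seq2seq auto classes work for
# this checkpoint; the prompt format and sampling settings are illustrative).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "Explain the difference between a list and a tuple in Python."
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a response; max_new_tokens/temperature/top_p are example values,
# not settings confirmed by the model authors.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```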

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 6969
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
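For reference, here is a sketch of how these values map onto `Seq2SeqTrainingArguments`, assuming the run used the Hugging Face `Trainer` across 2 GPUs; the actual training script is not reproduced here, and `output_dir` is a placeholder.

```python
# Hypothetical reconstruction of the listed hyperparameters as Trainer arguments.
# Effective train batch size: 4 (per device) x 2 (GPUs) x 16 (grad accumulation) = 128.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tFINE-900m-e16-d32-instruct_2e",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1.0,
    seed=6969,
    optim="adamw_torch",  # Adam with betas=(0.9, 0.999), epsilon=1e-08 (defaults)
)
```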

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 1.234         | 0.0969 | 2000  | 1.2439          | 78067836          |
| 1.2248        | 0.1938 | 4000  | 1.2256          | 156868756         |
| 1.2024        | 0.2907 | 6000  | 1.2009          | 235148092         |
| 1.2074        | 0.3876 | 8000  | 1.1777          | 313452856         |
| 1.1617        | 0.4845 | 10000 | 1.1597          | 392316428         |
| 1.1755        | 0.5815 | 12000 | 1.1437          | 471101508         |
| 1.1473        | 0.6784 | 14000 | 1.1321          | 549831184         |
| 1.1743        | 0.7753 | 16000 | 1.1244          | 628937800         |
| 1.137         | 0.8722 | 18000 | 1.1179          | 707117360         |
| 1.0713        | 0.9691 | 20000 | 1.1160          | 785755388         |
Model size: 887M parameters (F32, safetensors)
