cllm-1.0.0

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. Its last logged validation loss on the evaluation set is 2.7403, at step 58000.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 4
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 8
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 50
num_epochs: 1
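
The batch-size and scheduler entries above are related by simple arithmetic, which the following minimal sketch spells out. The function names are hypothetical, and `total_steps=450_000` is an assumption: the card does not state the planned number of optimizer steps, so the value is only a rough estimate inferred from the epoch column of the training log. The schedule shape (linear warmup from zero, then linear decay to zero) is what `lr_scheduler_type: linear` usually means, but the exact training code is not shown in this card.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Samples contributing to one optimizer (or evaluation) step."""
    return per_device * num_devices * grad_accum


def linear_lr(step: int,
              base_lr: float = 3e-4,
              warmup_steps: int = 50,
              total_steps: int = 450_000) -> float:
    """Linear warmup to base_lr, then linear decay to zero.

    total_steps is an assumed value, not one reported in the card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# train: 4 per device * 8 devices * 2 accumulation steps
print(effective_batch_size(4, 8, 2))   # 64, matches total_train_batch_size
# eval: 1 per device * 8 devices, no accumulation
print(effective_batch_size(1, 8, 1))   # 8, matches total_eval_batch_size
```

With only 50 warmup steps, the learning rate reaches its peak of 0.0003 almost immediately relative to the length of the run, so nearly all logged steps fall in the decay phase under this assumed schedule.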

Training Loss	Epoch	Step	Validation Loss
No log	0.0044	2000	3.8105
4.2179	0.0089	4000	3.4461
4.2179	0.0133	6000	3.2612
3.2341	0.0178	8000	3.1610
3.2341	0.0222	10000	3.0834
3.0422	0.0267	12000	3.0260
3.0422	0.0311	14000	2.9872
2.9347	0.0356	16000	2.9444
2.9347	0.0400	18000	2.9090
2.874	0.0445	20000	2.8854
2.874	0.0489	22000	2.8585
2.8204	0.0534	24000	2.8405
2.8204	0.0578	26000	2.8245
2.7825	0.0622	28000	2.8106
2.7825	0.0667	30000	2.7993
2.7555	0.0711	32000	2.7867
2.7555	0.0756	34000	2.7738
2.7285	0.0800	36000	2.7700
2.7285	0.0845	38000	2.7597
2.7179	0.0889	40000	2.7591
2.7179	0.0934	42000	2.7488
2.7119	0.0978	44000	2.7512
2.7119	0.1023	46000	2.7487
2.7043	0.1067	48000	2.7436
2.7043	0.1111	50000	2.7422
2.7013	0.1156	52000	2.7435
2.7013	0.1200	54000	2.7388
2.7029	0.1245	56000	2.7380
2.7029	0.1289	58000	2.7403
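
As a reading aid for the table above, here is a small sketch that picks the lowest validation loss, converts it to perplexity, and estimates the number of steps per epoch. It assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not confirm; if the loss is defined differently, the perplexity figure does not apply. The `rows` list is a hand-copied subset of the table, not an export from the training run.

```python
import math

# (step, epoch, validation_loss): a subset of rows copied from the table
rows = [
    (2000, 0.0044, 3.8105),
    (20000, 0.0445, 2.8854),
    (40000, 0.0889, 2.7591),
    (54000, 0.1200, 2.7388),
    (56000, 0.1245, 2.7380),
    (58000, 0.1289, 2.7403),
]

# Lowest validation loss among the copied rows (also the lowest in the full table)
best_step, _, best_loss = min(rows, key=lambda r: r[2])

# Perplexity, assuming the loss is mean token-level cross-entropy in nats
perplexity = math.exp(best_loss)

# Rough steps-per-epoch estimate from the last row; the epoch column is
# rounded to four decimals, so this is only approximate
last_step, last_epoch, _ = rows[-1]
steps_per_epoch = last_step / last_epoch

print(best_step, best_loss)
print(round(perplexity, 2))
print(round(steps_per_epoch))
```

The estimate suggests the log covers roughly 13% of one epoch, and the validation loss changes by less than 0.01 over the last several evaluations.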