ModernBERT_category

This model is a fine-tuned version of CocoRoF/ModernBERT-SimCSE-multitask_v03-distill on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3204
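Since the card does not yet document the task, below is a minimal, hypothetical usage sketch. It assumes the checkpoint carries a sequence-classification head (suggested by the "_category" name but not confirmed anywhere on this card); the model ID is taken from this page, and the input text and label handling are purely illustrative.

```python
# Hypothetical usage sketch; assumes a sequence-classification head,
# which this card does not confirm.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "CocoRoF/ModernBERT_category"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer("Example sentence to categorize.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to a label name if the config has one.
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label.get(predicted_id, predicted_id))
```

If the checkpoint is instead a plain encoder (the base model is a SimCSE-style embedding model), load it with AutoModel and use the token embeddings directly.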

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 1024
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
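For reference, these values map onto a transformers TrainingArguments configuration roughly as sketched below. This is a reconstruction from the list above, not the published training script; output_dir is a placeholder, and distributed-training launch flags are omitted.

```python
# Hypothetical reconstruction of the training configuration listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="ModernBERT_category",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=4,   # 4 x 8 GPUs x 32 accumulation steps = 1024 effective
    per_device_eval_batch_size=4,    # 4 x 8 GPUs = 32 effective
    gradient_accumulation_steps=32,
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 are the AdamW defaults
    seed=42,
)
```

Note that the total batch sizes reported above are derived, not set directly: the per-device batch size is multiplied by the number of devices and, for training, by the gradient accumulation steps.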

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 15.608        | 0.0444 | 1000  | 0.4726          |
| 14.207        | 0.0888 | 2000  | 0.4257          |
| 12.9331       | 0.1331 | 3000  | 0.4045          |
| 12.5004       | 0.1775 | 4000  | 0.3896          |
| 12.5729       | 0.2219 | 5000  | 0.3786          |
| 12.2146       | 0.2663 | 6000  | 0.3713          |
| 11.8243       | 0.3107 | 7000  | 0.3632          |
| 11.3651       | 0.3550 | 8000  | 0.3578          |
| 11.7742       | 0.3994 | 9000  | 0.3524          |
| 11.022        | 0.4438 | 10000 | 0.3483          |
| 10.871        | 0.4882 | 11000 | 0.3453          |
| 11.24         | 0.5326 | 12000 | 0.3404          |
| 10.6222       | 0.5769 | 13000 | 0.3380          |
| 10.9927       | 0.6213 | 14000 | 0.3354          |
| 10.8912       | 0.6657 | 15000 | 0.3330          |
| 10.7683       | 0.7101 | 16000 | 0.3311          |
| 10.4059       | 0.7545 | 17000 | 0.3286          |
| 10.4617       | 0.7988 | 18000 | 0.3258          |
| 10.5632       | 0.8432 | 19000 | 0.3247          |
| 9.9193        | 0.8876 | 20000 | 0.3231          |
| 9.7854        | 0.9320 | 21000 | 0.3205          |
| 10.3546       | 0.9764 | 22000 | 0.3204          |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.3.1
  • Tokenizers 0.21.0
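To approximate this environment, the pinned versions can be installed as below. This is a sketch: it assumes the +cu124 PyTorch build comes from the CUDA 12.4 wheel index; adjust for your hardware.

```
pip install transformers==4.49.0 datasets==3.3.1 tokenizers==0.21.0
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```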