ModernBERT_category

This model is a fine-tuned version of CocoRoF/ModernBERT-SimCSE-multitask_v03-distill on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3204
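Since the card does not yet document the task, below is a minimal, hypothetical usage sketch. It assumes the checkpoint carries a sequence-classification head (suggested by the "_category" name but not confirmed anywhere on this card); the model ID is taken from this page, and the input text and label handling are purely illustrative.

```python
# Hypothetical usage sketch; assumes a sequence-classification head,
# which this card does not confirm.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "CocoRoF/ModernBERT_category"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer("Example sentence to categorize.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to a label name if the config has one.
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label.get(predicted_id, predicted_id))
```

If the checkpoint is instead a plain encoder (the base model is a SimCSE-style embedding model), load it with AutoModel and use the token embeddings directly.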

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 1024
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
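For reference, these values map onto a transformers TrainingArguments configuration roughly as sketched below. This is a reconstruction from the list above, not the published training script; output_dir is a placeholder, and distributed-training launch flags are omitted.

```python
# Hypothetical reconstruction of the training configuration listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="ModernBERT_category",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=4,   # 4 x 8 GPUs x 32 accumulation steps = 1024 effective
    per_device_eval_batch_size=4,    # 4 x 8 GPUs = 32 effective
    gradient_accumulation_steps=32,
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 are the AdamW defaults
    seed=42,
)
```

Note that the total batch sizes reported above are derived, not set directly: the per-device batch size is multiplied by the number of devices and, for training, by the gradient accumulation steps.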

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 15.608        | 0.0444 | 1000  | 0.4726          |
| 14.207        | 0.0888 | 2000  | 0.4257          |
| 12.9331       | 0.1331 | 3000  | 0.4045          |
| 12.5004       | 0.1775 | 4000  | 0.3896          |
| 12.5729       | 0.2219 | 5000  | 0.3786          |
| 12.2146       | 0.2663 | 6000  | 0.3713          |
| 11.8243       | 0.3107 | 7000  | 0.3632          |
| 11.3651       | 0.3550 | 8000  | 0.3578          |
| 11.7742       | 0.3994 | 9000  | 0.3524          |
| 11.022        | 0.4438 | 10000 | 0.3483          |
| 10.871        | 0.4882 | 11000 | 0.3453          |
| 11.24         | 0.5326 | 12000 | 0.3404          |
| 10.6222       | 0.5769 | 13000 | 0.3380          |
| 10.9927       | 0.6213 | 14000 | 0.3354          |
| 10.8912       | 0.6657 | 15000 | 0.3330          |
| 10.7683       | 0.7101 | 16000 | 0.3311          |
| 10.4059       | 0.7545 | 17000 | 0.3286          |
| 10.4617       | 0.7988 | 18000 | 0.3258          |
| 10.5632       | 0.8432 | 19000 | 0.3247          |
| 9.9193        | 0.8876 | 20000 | 0.3231          |
| 9.7854        | 0.9320 | 21000 | 0.3205          |
| 10.3546       | 0.9764 | 22000 | 0.3204          |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.3.1
  • Tokenizers 0.21.0
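To approximate this environment, the pinned versions can be installed as below. This is a sketch: it assumes the +cu124 PyTorch build comes from the CUDA 12.4 wheel index; adjust for your hardware.

```
pip install transformers==4.49.0 datasets==3.3.1 tokenizers==0.21.0
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```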