moe_train_run

This model is a fine-tuned version of ModernBERT-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.9874
  • Model Preparation Time: 0.0047
  • F1: 0.8876
  • Precision: 0.8509
  • Recall: 0.9275
  • Threshold: 0.7668
  • Sim Ratio: 1.4762
  • Pos Sim: 0.8878
  • Neg Sim: 0.6014
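The Threshold, Pos Sim, and Neg Sim metrics suggest a pairwise-similarity setup: a pair is predicted positive when its embedding similarity clears the learned threshold (≈0.7668). A minimal stdlib-only sketch of that decision rule and how precision/recall/F1 follow from it — the similarity scores and helper names below are illustrative, not from this model's evaluation set:

```python
# Hypothetical (similarity, label) pairs; label 1 = positive pair.
# These numbers are made up for illustration, not from the eval set.
pairs = [
    (0.92, 1), (0.85, 1), (0.71, 1), (0.64, 1),  # positive pairs
    (0.81, 0), (0.60, 0), (0.55, 0), (0.48, 0),  # negative pairs
]

THRESHOLD = 0.7668  # decision threshold reported above


def classify(sim: float, threshold: float = THRESHOLD) -> int:
    """Predict 1 (related pair) when similarity clears the threshold."""
    return int(sim >= threshold)


def prf1(pairs, threshold=THRESHOLD):
    """Precision, recall, and F1 for the threshold decision rule."""
    tp = sum(1 for s, y in pairs if classify(s, threshold) == 1 and y == 1)
    fp = sum(1 for s, y in pairs if classify(s, threshold) == 1 and y == 0)
    fn = sum(1 for s, y in pairs if classify(s, threshold) == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


precision, recall, f1 = prf1(pairs)
```

Lowering the threshold trades precision for recall, which is consistent with the reported recall (0.9275) sitting above the precision (0.8509).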

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 1
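These settings map directly onto Hugging Face `TrainingArguments` fields. A hedged reconstruction of the equivalent configuration (the keys follow the `transformers` API of the same names; this is a sketch, not the original training script):

```python
# Reconstructed from the hyperparameter list above; kept as a plain dict so it
# can be inspected without transformers installed. Each key matches a
# transformers.TrainingArguments parameter of the same name.
training_args = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
}

# To actually launch training (requires transformers):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="moe_train_run", **training_args)
```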

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Model Preparation Time | F1     | Precision | Recall | Threshold | Sim Ratio | Pos Sim | Neg Sim |
|:-------------:|:------:|:------:|:---------------:|:----------------------:|:------:|:---------:|:------:|:---------:|:---------:|:-------:|:-------:|
| 0.7183        | 0.0821 | 10000  | 3.5469          | 0.0047                 | 0.8386 | 0.7972    | 0.8846 | 0.8755    | 1.2415    | 0.9408  | 0.7578  |
| 0.7053        | 0.1643 | 20000  | 3.6924          | 0.0047                 | 0.8496 | 0.7963    | 0.9104 | 0.8043    | 1.383     | 0.9156  | 0.6621  |
| 0.6003        | 0.2464 | 30000  | 3.9111          | 0.0047                 | 0.862  | 0.8148    | 0.9151 | 0.7832    | 1.437     | 0.9048  | 0.6296  |
| 0.5856        | 0.3286 | 40000  | 3.9771          | 0.0047                 | 0.8628 | 0.822     | 0.9079 | 0.7718    | 1.4877    | 0.894   | 0.6009  |
| 0.5801        | 0.4107 | 50000  | 3.9434          | 0.0047                 | 0.8704 | 0.8277    | 0.9178 | 0.7749    | 1.4477    | 0.8995  | 0.6214  |
| 0.562         | 0.4929 | 60000  | 3.6962          | 0.0047                 | 0.8685 | 0.8232    | 0.9192 | 0.7930    | 1.4037    | 0.9064  | 0.6457  |
| 0.5307        | 0.5750 | 70000  | 3.8964          | 0.0047                 | 0.875  | 0.839     | 0.9142 | 0.7807    | 1.4542    | 0.8973  | 0.617   |
| 0.4793        | 0.6572 | 80000  | 4.0046          | 0.0047                 | 0.8779 | 0.8429    | 0.916  | 0.7706    | 1.4946    | 0.8912  | 0.5963  |
| 0.4978        | 0.7393 | 90000  | 4.0062          | 0.0047                 | 0.8796 | 0.8395    | 0.9239 | 0.7598    | 1.4979    | 0.8879  | 0.5927  |
| 0.4934        | 0.8215 | 100000 | 3.9771          | 0.0047                 | 0.885  | 0.8522    | 0.9204 | 0.7734    | 1.478     | 0.89    | 0.6022  |
| 0.4757        | 0.9036 | 110000 | 4.0861          | 0.0047                 | 0.884  | 0.8489    | 0.9221 | 0.7636    | 1.5028    | 0.8859  | 0.5895  |
| 0.4773        | 0.9858 | 120000 | 3.9877          | 0.0047                 | 0.8874 | 0.8558    | 0.9215 | 0.7711    | 1.4765    | 0.8877  | 0.6012  |

Framework versions

  • Transformers 4.48.3
  • Pytorch 2.5.1
  • Datasets 3.2.0
  • Tokenizers 0.21.0
Model details

  • Model size: 384M params
  • Tensor type: F32
  • Format: Safetensors