MT5_large_A_art

This model is a fine-tuned version of ai-forever/sage-mt5-large on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2006
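
Since the base model is an mT5-large checkpoint, the fine-tuned weights can be loaded with the standard Transformers seq2seq classes. Below is a minimal inference sketch, assuming the checkpoint is published on the Hub as mika5883/MT5_large_A_art and follows the base model's plain text-to-text format; the input sentence is purely illustrative.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed Hub repository id; use a local path instead if loading from disk.
model_id = "mika5883/MT5_large_A_art"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Example input; the exact task format depends on the (unspecified) training data.
text = "Example input sentence"
inputs = tokenizer(text, return_tensors="pt")

# Generate the output sequence and decode it back to text.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```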

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 3.83229e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3300
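
For reference, these values map onto a Hugging Face Seq2SeqTrainingArguments configuration roughly as follows. This is a reconstruction sketch, not the actual training script; the output directory name and the 100-step evaluation/logging interval (inferred from the results table below) are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative reconstruction of the reported hyperparameters.
training_args = Seq2SeqTrainingArguments(
    output_dir="MT5_large_A_art",      # hypothetical output directory
    learning_rate=3.83229e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,     # 16 * 4 = 64 total train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3300,
    seed=42,
    eval_strategy="steps",
    eval_steps=100,                    # matches the 100-step validation entries below
    logging_steps=100,
)
```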

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9979        | 0.0303 | 100  | 0.2649          |
| 0.5176        | 0.0606 | 200  | 0.2170          |
| 0.3916        | 0.0909 | 300  | 0.1973          |
| 0.3356        | 0.1212 | 400  | 0.1928          |
| 0.2993        | 0.1515 | 500  | 0.1937          |
| 0.2783        | 0.1818 | 600  | 0.1919          |
| 0.268         | 0.2121 | 700  | 0.1907          |
| 0.2697        | 0.2424 | 800  | 0.1914          |
| 0.2491        | 0.2726 | 900  | 0.1901          |
| 0.2488        | 0.3029 | 1000 | 0.1888          |
| 0.238         | 0.3332 | 1100 | 0.1861          |
| 0.2414        | 0.3635 | 1200 | 0.1872          |
| 0.2378        | 0.3938 | 1300 | 0.1857          |
| 0.2286        | 0.4241 | 1400 | 0.1842          |
| 0.2201        | 0.4544 | 1500 | 0.1849          |
| 0.2217        | 0.4847 | 1600 | 0.1845          |
| 0.2195        | 0.5150 | 1700 | 0.1835          |
| 0.2137        | 0.5453 | 1800 | 0.1818          |
| 0.2147        | 0.5756 | 1900 | 0.1822          |
| 0.2246        | 0.6059 | 2000 | 0.1806          |
| 0.2151        | 0.6362 | 2100 | 0.1806          |
| 0.2179        | 0.6665 | 2200 | 0.1805          |
| 0.2219        | 0.6968 | 2300 | 0.1806          |
| 0.2126        | 0.7271 | 2400 | 0.1808          |
| 0.2149        | 0.7573 | 2500 | 0.1802          |
| 0.2137        | 0.7876 | 2600 | 0.1806          |
| 0.2146        | 0.8179 | 2700 | 0.1803          |
| 0.2078        | 0.8482 | 2800 | 0.1803          |
| 0.2084        | 0.8785 | 2900 | 0.1805          |
| 0.2153        | 0.9088 | 3000 | 0.1801          |
| 0.2134        | 0.9391 | 3100 | 0.1799          |
| 0.2169        | 0.9694 | 3200 | 0.1799          |
| 0.2181        | 0.9997 | 3300 | 0.1799          |

Framework versions

  • Transformers 4.48.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.21.0
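
To reproduce the results, the installed versions can be checked against the list above with a quick sketch like the following (package names follow the list; adjust as needed):

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions to compare against those reported above.
print("Transformers:", transformers.__version__)  # 4.48.1
print("PyTorch:", torch.__version__)              # 2.5.1+cu124
print("Datasets:", datasets.__version__)          # 3.0.1
print("Tokenizers:", tokenizers.__version__)      # 0.21.0
```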
Safetensors

  • Model size: 1.23B params
  • Tensor type: F32

Model tree for mika5883/MT5_large_A_art

Fine-tuned from ai-forever/sage-mt5-large