# tst-translation

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 2.8421
- Bleu: 13.1948
- Gen Len: 49.9179
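
Since the usage sections of this card are still placeholders, here is a minimal inference sketch. The checkpoint id is taken from the model page; the input sentence, beam size, and `max_new_tokens` value are illustrative assumptions, not documented settings.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "pritamdeka/mt5-base-en-bgc-different-version"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Placeholder input; the card does not document the source/target languages.
text = "Replace this with a sentence in the source language."
inputs = tokenizer(text, return_tensors="pt")

# Beam size and generation length are assumptions chosen to roughly match
# the ~50-token Gen Len reported on the evaluation set.
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```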

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0005
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20.0
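
As a sketch of how these values map onto the `transformers` training API: the settings below mirror the list above, while `output_dir` and `predict_with_generate` are assumptions (the latter inferred from the fact that Bleu and Gen Len are reported during evaluation).

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tst-translation",   # assumed; matches the model name
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20.0,
    # Assumed: generation-based metrics (Bleu, Gen Len) require this flag.
    predict_with_generate=True,
)
```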

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Bleu    | Gen Len |
|:-------------:|:-------:|:----:|:---------------:|:-------:|:-------:|
| 3.4257        | 1.9900  | 400  | 2.1087          | 4.1008  | 77.8284 |
| 1.8571        | 3.9801  | 800  | 1.9292          | 8.6198  | 61.1418 |
| 1.2467        | 5.9701  | 1200 | 1.9779          | 10.7074 | 48.3184 |
| 0.8749        | 7.9602  | 1600 | 2.0539          | 11.8538 | 49.3483 |
| 0.6141        | 9.9502  | 2000 | 2.1948          | 12.4452 | 51.1269 |
| 0.4446        | 11.9403 | 2400 | 2.3902          | 12.3052 | 48.0995 |
| 0.3251        | 13.9303 | 2800 | 2.5698          | 12.5824 | 49.1244 |
| 0.2501        | 15.9204 | 3200 | 2.6631          | 13.0619 | 50.6095 |
| 0.1986        | 17.9104 | 3600 | 2.7877          | 13.0557 | 51.1443 |
| 0.1692        | 19.9005 | 4000 | 2.8421          | 13.1948 | 49.9179 |
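
For reference, a hedged sketch of how the Bleu and Gen Len columns are conventionally computed with the `evaluate` library; the prediction and reference strings below are placeholders, not data from the actual evaluation set.

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

# Placeholder outputs; in practice these come from model.generate().
predictions = ["the model output goes here"]
references = [["the reference translation goes here"]]

result = sacrebleu.compute(predictions=predictions, references=references)
print(f"Bleu: {result['score']:.4f}")

# Gen Len is typically the mean length of the generated sequences
# (approximated here by whitespace tokens rather than tokenizer tokens).
gen_len = sum(len(p.split()) for p in predictions) / len(predictions)
print(f"Gen Len: {gen_len:.4f}")
```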

### Framework versions

- Transformers 4.43.0.dev0
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1