iva_mt_wslot-m2m100_418M-en-zh

This model is a fine-tuned version of facebook/m2m100_418M on the iva_mt_wslot dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0120
  • Bleu: 69.4383
  • Gen Len: 19.4038

Model description

This is an English-to-Chinese machine translation model for virtual-assistant utterances, fine-tuned from facebook/m2m100_418M on the iva_mt_wslot dataset so that slot annotations present in the source are transferred to the translated output (see the citation below).

Intended uses & limitations

The model is intended for translating English virtual-assistant utterances into Chinese while carrying slot annotations across to the target side. Detailed limitations have not been documented.
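
A minimal inference sketch is shown below. The model id is taken from the card title and is an assumption; replace it with the full "<user>/<repo>" Hub path of the published checkpoint.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_id = "iva_mt_wslot-m2m100_418M-en-zh"  # assumed id; use the actual "<user>/<repo>" path
tokenizer = M2M100Tokenizer.from_pretrained(model_id, src_lang="en", tgt_lang="zh")
model = M2M100ForConditionalGeneration.from_pretrained(model_id)

text = "turn on the lights in the kitchen"
inputs = tokenizer(text, return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("zh"),  # force Chinese as the target language
    max_length=128,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```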

Training and evaluation data

The model was fine-tuned and evaluated on the English-Chinese (en-zh) portion of the iva_mt_wslot dataset; further details on data preparation have not been documented.
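
A sketch of loading that data with the datasets library; the dataset path and the "en-zh" configuration name are assumptions, so adjust them to the actual Hub location.

```python
from datasets import load_dataset

# Dataset path and config name are assumed from the card; adjust to the actual Hub id.
dataset = load_dataset("iva_mt_wslot", "en-zh")
print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # one translation pair with slot markup
```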

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 7
  • mixed_precision_training: Native AMP
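
Expressed as transformers Seq2SeqTrainingArguments, the settings above correspond roughly to the sketch below; output_dir and the per-epoch evaluation strategy are assumptions, not taken from the original training script.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="iva_mt_wslot-m2m100_418M-en-zh",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    num_train_epochs=7,
    lr_scheduler_type="linear",
    fp16=True,                    # Native AMP mixed precision
    evaluation_strategy="epoch",  # assumed: the table below reports metrics per epoch
    predict_with_generate=True,   # required to report BLEU and Gen Len
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer's default AdamW settings.
```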

Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu    | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
| 0.0155        | 1.0   | 2109  | 0.0132          | 66.1893 | 19.117  |
| 0.011         | 2.0   | 4218  | 0.0120          | 66.5023 | 19.2003 |
| 0.0084        | 3.0   | 6327  | 0.0116          | 68.2038 | 19.4521 |
| 0.0061        | 4.0   | 8436  | 0.0115          | 69.129  | 19.2181 |
| 0.0046        | 5.0   | 10545 | 0.0117          | 69.3609 | 19.3212 |
| 0.0035        | 6.0   | 12654 | 0.0119          | 69.1841 | 19.3972 |
| 0.0028        | 7.0   | 14763 | 0.0120          | 69.4383 | 19.4038 |
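
The Bleu and Gen Len columns are the kind of figures produced by a Seq2SeqTrainer metric hook. The function below is an illustrative sketch (not the author's script) of how such numbers are typically computed with sacrebleu.

```python
import numpy as np
import evaluate

sacrebleu = evaluate.load("sacrebleu")  # BLEU implementation used for illustration

def compute_metrics(eval_preds, tokenizer):
    """Illustrative metric hook; bind `tokenizer` with functools.partial when
    passing this to Seq2SeqTrainer(compute_metrics=...)."""
    preds, labels = eval_preds
    # Labels use -100 for padding; restore pad tokens before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    bleu = sacrebleu.compute(
        predictions=[p.strip() for p in decoded_preds],
        references=[[l.strip()] for l in decoded_labels],
    )
    gen_len = np.mean(
        [np.count_nonzero(p != tokenizer.pad_token_id) for p in preds]
    )
    return {"bleu": bleu["score"], "gen_len": gen_len}
```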

Framework versions

  • Transformers 4.28.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3

Citation

If you use this model, please cite the following:

@article{Sowanski2023SlotLI,
  title={Slot Lost in Translation? Not Anymore: A Machine Translation Model for Virtual Assistants with Type-Independent Slot Transfer},
  author={Marcin Sowanski and Artur Janicki},
  journal={2023 30th International Conference on Systems, Signals and Image Processing (IWSSIP)},
  year={2023},
  pages={1-5}
}