metadata

license: cc-by-nc-4.0
language:
  - hyw
datasets:
  - mozilla-foundation/common_voice_16_1
  - google/fleurs
  - ReRooted
pipeline_tag: automatic-speech-recognition
tags:
  - audio-to-audio
  - text-to-speech
  - seamless_communication

SeamlessM4T v2 ASR for Western Armenian

This model is a fine-tuned version of facebook/seamless-m4t-v2-large for Western Armenian automatic speech recognition. It was first fine-tuned on the Common Voice 16.1 and Google Fleurs datasets, then further fine-tuned on the ReRooted corpus. It achieves the following word error rate (WER) and character error rate (CER) on the Common Voice (CV) and Google Fleurs (GF) test sets:

CV_wer: 0.308
CV_cer: 0.07
GF_wer: 0.311
GF_cer: 0.094
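
For readers unfamiliar with these metrics, the following is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length in words or characters. The function names are illustrative, and the text normalization and aggregation used to produce the scores above are not stated in this card, so this is not necessarily the exact evaluation procedure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Corpus-level scores are usually obtained by summing edit distances and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.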

After fine-tuning on Western Armenian data, the model occasionally translates Eastern Armenian speech into Western Armenian.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-6
train_batch_size: 4
eval_batch_size: 1
seed: 43
optimizer: Adam with betas=(0.9, 0.98) and epsilon=1e-08
lr_scheduler_type: MyleLR
lr_scheduler_warmup_steps: 100
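
The scheduler name appears to refer to fairseq2's `MyleLR`, which is understood to be an inverse square-root schedule with linear warmup that peaks at the optimizer's base learning rate. The sketch below illustrates that shape in plain Python with the values listed above; the function name and the exact formula are assumptions for illustration, not the fairseq2 implementation.

```python
import math


def inverse_sqrt_lr(step, base_lr=1e-6, warmup_steps=100):
    """Learning rate at a given 1-based training step.

    Assumed shape: linear warmup to base_lr over warmup_steps,
    then decay proportional to 1 / sqrt(step).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    warmup_factor = step / warmup_steps            # rises linearly to 1.0
    decay_factor = math.sqrt(warmup_steps / step)  # falls below 1.0 after warmup
    return base_lr * min(warmup_factor, decay_factor)


# Print the schedule at a few steps around the warmup boundary.
for step in (1, 50, 100, 400, 1600):
    print(step, inverse_sqrt_lr(step))
```

Under this assumption the learning rate reaches its peak of 1e-6 at step 100 and is halved each time the step count quadruples after that.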

Framework versions

PyTorch 2.1.1
fairseq2==0.2.0