Model description

We continued the self-supervised pre-training of a wav2vec 2.0 large XLSR-53 checkpoint on 842 hours of unlabelled Luxembourgish speech collected from RTL.lu. The model was then fine-tuned on 14 hours of labelled Luxembourgish speech from the same domain. Additionally, the output transcriptions are rescored with a 5-gram language model trained on text corpora from the same domain.
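The card does not specify how the 5-gram language model is combined with the acoustic scores. The Python below is a minimal sketch of the general idea only. It assumes a log-linear combination of three terms: an acoustic log-probability, an LM log-probability weighted by alpha, and a word-count bonus weighted by beta. It uses these to rescore an n-best list with a tiny add-one-smoothed n-gram model. The toy corpus, the hypotheses and their scores, the smoothing, and the values of alpha and beta are all hypothetical. A real setup would more likely use an n-gram toolkit such as KenLM with better smoothing and apply the LM inside beam search rather than over a fixed n-best list.

```python
import math
from collections import Counter

# Hypothetical toy illustration of n-best rescoring with an n-gram LM.
# The add-one smoothing, the weights (alpha, beta) and all data are
# assumptions, not the settings used for this model.


def train_ngram(sentences, n=5):
    """Count n-grams and their (n-1)-word contexts, padding each sentence."""
    ngrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            ngrams[context + (tokens[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts, vocab


def lm_logprob(sentence, model, n=5):
    """Add-one-smoothed natural-log probability of a whole sentence."""
    ngrams, contexts, vocab = model
    v = len(vocab)
    tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
    total = 0.0
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        count = ngrams[context + (tokens[i],)]
        total += math.log((count + 1) / (contexts[context] + v))
    return total


def rescore(nbest, model, n=5, alpha=0.5, beta=1.0):
    """Pick the hypothesis maximising acoustic + alpha * LM + beta * #words."""
    def combined(hypothesis):
        text, acoustic_logprob = hypothesis
        return (acoustic_logprob
                + alpha * lm_logprob(text, model, n)
                + beta * len(text.split()))
    return max(nbest, key=combined)[0]


corpus = ["moien wéi geet et", "wéi geet et dir", "moien zesummen"]
lm = train_ngram(corpus, n=5)

# (transcription, acoustic log-probability) pairs, e.g. from a CTC beam search
nbest = [("moien wéi get et", -3.8), ("moien wéi geet et", -4.0)]
print(rescore(nbest, lm, n=5))             # → moien wéi geet et
print(rescore(nbest, lm, n=5, alpha=0.0))  # → moien wéi get et
```

With alpha set to 0, the acoustically stronger but misspelled hypothesis wins. Once the LM term is added, the word sequence seen in the text corpus is preferred.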

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7.5e-05
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2000
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
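Two of these values are easier to read with the arithmetic spelled out. Assuming training ran on a single device, total_train_batch_size is train_batch_size × gradient_accumulation_steps = 3 × 4 = 12 examples per optimiser update. The linear scheduler with 2000 warm-up steps raises the learning rate from 0 to 7.5e-05 and then decays it linearly to 0 by the final step. The helper below is a sketch that mirrors the formula used by get_linear_schedule_with_warmup in Hugging Face Transformers. The card does not give the total number of optimisation steps, so TOTAL_STEPS is a hypothetical value.

```python
# Hypothetical helpers illustrating two of the hyperparameters above.
# TOTAL_STEPS is an assumption: the card does not state the number of
# optimisation steps, which depends on the size of the 14h training set.


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimiser update."""
    return per_device_batch * accumulation_steps * num_devices


def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warm-up from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


print(effective_batch_size(3, 4))  # → 12, matching total_train_batch_size

TOTAL_STEPS = 20_000  # hypothetical
for step in (0, 1_000, 2_000, 11_000, 20_000):
    print(step, linear_warmup_lr(step, 7.5e-5, 2_000, TOTAL_STEPS))
```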

Framework versions

  • Transformers 4.20.0.dev0
  • PyTorch 1.11.0+cu113
  • Datasets 2.2.1
  • Tokenizers 0.12.1

Citation

This model is the result of our paper "Improving Luxembourgish Speech Recognition with Cross-Lingual Speech Representations", submitted to the IEEE Spoken Language Technology (SLT) 2022 Workshop.

@misc{lb-wav2vec2,
  author = {Nguyen, Le Minh and Nayak, Shekhar and Coler, Matt},
  keywords = {Luxembourgish, multilingual speech recognition, language modelling, wav2vec 2.0 XLSR-53, under-resourced language},
  title = {Improving {L}uxembourgish Speech Recognition with Cross-Lingual Speech Representations},
  year = {2022},
  copyright = {2023 IEEE}
}