Model description
We fine-tuned a wav2vec 2.0 large XLSR-53 checkpoint with 842h of unlabelled Luxembourgish speech collected from RTL.lu. Then the model was fine-tuned on 14h of labelled Luxembourgish speech from the same domain. Additionally, we rescore the output transcription with a 5-gram language model trained on text corpora from the same domain.
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.5e-05
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 50.0
- mixed_precision_training: Native AMP
Framework versions
- Transformers 4.20.0.dev0
- Pytorch 1.11.0+cu113
- Datasets 2.2.1
- Tokenizers 0.12.1
Citation
This model is a result of our paper IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS
submitted to the IEEE SLT 2022 workshop
@misc{lb-wav2vec2,
author = {Nguyen, Le Minh and Nayak, Shekhar and Coler, Matt.},
keywords = {Luxembourgish, multilingual speech recognition, language modelling, wav2vec 2.0 XLSR-53, under-resourced language},
title = {IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS},
year = {2022},
copyright = {2023 IEEE}
}
- Downloads last month
- 160
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Space using Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-14h-with-lm 1
Evaluation results
- Dev WERself-reported9.500
- Test WERself-reported9.300
- Dev CERself-reported2.170
- Test CERself-reported2.080