⚠️ FAILED TRAINING: Swapped Malay Tokenizer ASR Model

This repository contains a failed training run of fine-tuning nvidia/parakeet-tdt-0.6b-v3 on the Malay BabelSpeech dataset.

🔴 Failure Reason: Token Mapping Collapse

During initialization, the pre-trained English/multilingual SentencePiece tokenizer was swapped with a custom Malay SentencePiece Tokenizer using NeMo's change_vocabulary() method.

This action completely reset and randomly re-initialized the weights of the decoder and joint network. Because the training corpus was relatively small (50 hours of audio), the randomly initialized decoder could not align with the frozen pre-trained encoder features.

Consequently, the model collapsed during the second stage of fine-tuning (when the encoder was unfrozen) and got trapped in a local minimum, repeating the most common subword tokens indefinitely:

Validation WER: 109.5%
Output collapse pattern: 'so dia dia dia dia dia dia dia...'

💡 Recommendation

For speech corpuses smaller than 1000 hours, do not swap the pre-trained SentencePiece tokenizer vocabulary. Instead, keep the standard multilingual/English pre-trained tokenizer and fine-tune it directly. This preserves the pre-trained alignment weights and avoids representation collapse, as demonstrated in our successful run: Kim-el/parakeet-0.6-tdt-malay-english-vocab.

Downloads last month: -