Omni-temporal Classification (OTC)

We propose BTC/OTC to train an ASR system directly from weak supervision, i.e., speech paired with non-verbatim transcripts. This is achieved by using a special token to model the uncertainty in the transcripts (substitution, insertion, and deletion errors) within the WFST framework during training.
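
In practice such graphs are compiled with a WFST toolkit (e.g. k2 or OpenFst); the snippet below is only a toy, self-contained sketch of the idea: a wildcard ("star") token is placed in parallel with each transcript token (substitutions), as a self-loop on each state (insertions), and next to an epsilon skip arc (deletions). The function name `build_otc_graph`, the token ids, and the penalty values are illustrative assumptions, not the actual implementation.

```python
# Toy sketch of an OTC-style transcript graph (not the official implementation).
BLANK = 0   # CTC blank id (assumed)
STAR = 1    # special wildcard/bypass token id (assumed)

def build_otc_graph(transcript_ids, sub_penalty=-2.0, ins_penalty=-2.0, del_penalty=-2.0):
    """Return arcs (src, dst, label, weight) of a linear transcript FSA
    augmented with error-modelling arcs:

    - substitution: a parallel STAR arc emitted instead of the token
    - insertion:    a STAR self-loop on every state
    - deletion:     an epsilon (-1) skip arc over the token

    Penalties are log-space costs that discourage overuse of the bypass paths.
    """
    arcs = []
    for i, tok in enumerate(transcript_ids):
        arcs.append((i, i + 1, tok, 0.0))           # verbatim token
        arcs.append((i, i + 1, STAR, sub_penalty))  # substitution via wildcard
        arcs.append((i, i, STAR, ins_penalty))      # insertion self-loop
        arcs.append((i, i + 1, -1, del_penalty))    # deletion (epsilon skip)
    final = len(transcript_ids)
    arcs.append((final, final, STAR, ins_penalty))  # insertions after the last token
    return arcs, final

if __name__ == "__main__":
    # Toy transcript "3 5 7"; in practice this graph would be compiled and
    # composed with the CTC topology to form the training graph.
    arcs, final_state = build_otc_graph([3, 5, 7])
    for arc in arcs:
        print(arc)
```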

OTC maintains reasonable ASR performance even when the training transcripts contain up to 70% errors of various types.

The results below are for a transcript error rate of 0.5.

Results (WER %), ctc-greedy-search decoding:

| Training Criterion | ssl test-clean | ssl test-other | fbank test-clean | fbank test-other |
|--------------------|----------------|----------------|------------------|------------------|
| CTC                | 100.0          | 100.0          | 99.89            | 99.98            |
| OTC                | 11.89          | 25.46          | 20.14            | 44.24            |
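
The "ctc-greedy-search" method above is plain frame-wise argmax decoding. As a reference, here is a minimal sketch of that procedure for a single utterance; the function name and tensor layout are assumptions, not the recipe's actual decoding code.

```python
import torch

def ctc_greedy_search(log_probs: torch.Tensor, blank_id: int = 0) -> list:
    """Greedy CTC decoding for one utterance.

    log_probs: (T, V) tensor of per-frame log-probabilities.
    Take the argmax per frame, collapse repeated labels, then drop blanks.
    """
    best = torch.argmax(log_probs, dim=-1).tolist()
    hyp, prev = [], None
    for t in best:
        if t != prev and t != blank_id:
            hyp.append(t)
        prev = t
    return hyp
```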

Results (WER %), 1best decoding with blank_bias = -4:

| Training Criterion | ssl test-clean | ssl test-other | fbank test-clean | fbank test-other |
|--------------------|----------------|----------------|------------------|------------------|
| CTC                | 98.40          | 98.68          | 99.79            | 99.86            |
| OTC                | 6.59           | 15.98          | 11.78            | 32.38            |
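
Here `blank_bias = -4` presumably denotes a constant added to the blank log-probability of the network output before the 1best search; a negative value lowers the blank score and encourages the decoder to emit more non-blank tokens. Below is a minimal sketch of applying such a bias; the function name and tensor layout are illustrative, and the lattice-based 1best search itself is omitted.

```python
import torch

def apply_blank_bias(log_probs: torch.Tensor,
                     blank_bias: float = -4.0,
                     blank_id: int = 0) -> torch.Tensor:
    """Add a constant bias to the blank column of (N, T, V) network outputs
    before decoding; a negative value penalises blank."""
    biased = log_probs.clone()
    biased[..., blank_id] += blank_bias
    return biased
```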