
Fine-tuned Wav2Vec2 large model for English ASR

Data used for fine-tuning

| Dataset      | Duration (hours) |
|--------------|------------------|
| Common Voice | 1667             |
| Europarl     | 85               |
| How2         | 356              |
| Librispeech  | 936              |
| MuST-C v1    | 407              |
| MuST-C v2    | 482              |
| Tedlium      | 482              |

Evaluation results

| Dataset     | Duration (hours) | WER w/o LM | WER with LM |
|-------------|------------------|------------|-------------|
| Librispeech | 5.4              | 2.9        | 1.1         |
| Tedlium     | 2.6              | 7.9        | 5.4         |
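
Word error rate counts the word-level substitutions (S), deletions (D), and insertions (I) needed to turn the hypothesis into the reference, divided by the number of reference words (N): WER = (S + D + I) / N. As a minimal sketch, the metric can be reproduced with the jiwer library (an assumed choice for illustration; this card does not name the evaluation tooling):

# Minimal WER sketch using jiwer (an assumed tool, not
# necessarily what produced the numbers in the table above)
import jiwer

reference = "and of course there are teams that have a lot more structure"
hypothesis = "and of course there's teams that have a lot more structure"

# jiwer.wer returns the word error rate as a float
print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")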

Usage


from importlib.machinery import SourceFileLoader
from huggingface_hub import hf_hub_download
from transformers import Wav2Vec2ProcessorWithLM
import torchaudio
import torch

# Load model & processor. The custom model class is defined in
# model_handling.py, which ships with the checkpoint in the model repo.
# (hf_hub_download replaces the hf_bucket_url/cached_path helpers,
# which were removed from recent transformers releases.)
model_name = "nguyenvulebinh/iwslt-asr-wav2vec-large-4500h"
model_handling = SourceFileLoader(
    "model", hf_hub_download(repo_id=model_name, filename="model_handling.py")
).load_module()
model = model_handling.Wav2Vec2ForCTC.from_pretrained(model_name)
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_name)

# Load an example audio file (already sampled at 16 kHz)
audio, sample_rate = torchaudio.load(hf_hub_download(repo_id=model_name, filename="tst_2010_sample.wav"))
input_data = processor.feature_extractor(audio[0], sampling_rate=16000, return_tensors='pt')

# Infer (no gradient tracking needed at inference time)
with torch.no_grad():
    output = model(**input_data)

# Transcript without LM: greedy argmax over the CTC logits
print(processor.tokenizer.decode(output.logits.argmax(dim=-1)[0].detach().cpu().numpy()))
# and of course there's teams that have a lot more tada structures and among the best are recent graduates of kindergarten

# Transcript with LM: beam-search decoding rescored by the n-gram LM
print(processor.decode(output.logits.cpu().detach().numpy()[0], beam_width=100).text)
# and of course there are teams that have a lot more ta da structures and among the best are recent graduates of kindergarten
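
To transcribe your own recordings, note that the model expects 16 kHz mono input. A minimal resampling sketch with torchaudio, reusing the processor loaded above (the file name is a placeholder):

import torchaudio

# Placeholder path; replace with your own recording
audio, sample_rate = torchaudio.load("my_recording.wav")

# Downmix multi-channel audio to mono, then resample to 16 kHz if needed
audio = audio.mean(dim=0, keepdim=True)
if sample_rate != 16000:
    audio = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)(audio)

input_data = processor.feature_extractor(audio[0], sampling_rate=16000, return_tensors='pt')

The resulting input_data can be passed to the model exactly as in the example above.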

Model Parameters License

The ASR model parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode

Contact

nguyenvulebinh@gmail.com
