---
language: en
datasets:
- timit_asr
tags:
- audio
- automatic-speech-recognition
license: apache-2.0
widget:
- label: Sample 1 (from LibriSpeech)
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
---

# Wav2Vec2-Base-TIMIT

Fine-tuned [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the [timit_asr dataset](https://huggingface.co/datasets/timit_asr).
When using this model, make sure that your speech input is sampled at 16kHz.

## Usage

The model can be used directly (without a language model) as follows:

```python
import torch
from datasets import load_dataset
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_name = "elgeish/wav2vec2-base-timit"
processor = Wav2Vec2Processor.from_pretrained(model_name, do_lower_case=True)
model = Wav2Vec2ForCTC.from_pretrained(model_name)

dataset = load_dataset("timit_asr", split="test[:10]")

def prepare_example(example):
    example["speech"], _ = sf.read(example["file"])
    return example

dataset = dataset.map(prepare_example, remove_columns=["file"])
inputs = processor(dataset["speech"], sampling_rate=16000, return_tensors="pt", padding="longest")

with torch.no_grad():
    predicted_ids = torch.argmax(model(inputs.input_values).logits, dim=-1)
predicted_transcripts = processor.tokenizer.batch_decode(predicted_ids)

for reference, predicted in zip(dataset["text"], predicted_transcripts):
    print("reference:", reference)
    print("predicted:", predicted)
    print("--")
```

Here's the output:

```
reference: The bungalow was pleasantly situated near the shore.
predicted: the bunglow was plesntly situated near the shor
--
reference: Don't ask me to carry an oily rag like that.
predicted: don't ask me to carry an oily rag like that
--
reference: Are you looking for employment?
predicted: are you oking for employment
--
reference: She had your dark suit in greasy wash water all year.
predicted: she had your dark suit in greasy wash water all year
--
reference: At twilight on the twelfth day we'll have Chablis.
predicted: at twilight on the twelfth day we'll have shiple
--
reference: Eating spinach nightly increases strength miraculously.
predicted: eating spanage nightly increases strength moraculously
--
reference: Got a heck of a buy on this, dirt cheap.
predicted: got a heck of a by on this dert cheep
--
reference: The scalloped edge is particularly appealing.
predicted: the scaliped edge iuse particularly appeling
--
reference: A big goat idly ambled through the farmyard.
predicted: a big goat idely ambled through the farmyard
--
reference: This group is secularist and their program tends to be technological.
predicted: this croup is secularist and their program tens to be technological
--
```

## Fine-Tuning Script

You can find the script used to produce this model [here](https://github.com/elgeish/transformers/blob/f2b98f876b040bab3c3db8561ec39c1abb2c733c/examples/research_projects/wav2vec2/finetune_base_timit_asr.sh).
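
## Scoring the Sample Output

The reference/predicted pairs printed in the Usage section can be summarized as a word error rate (WER). The snippet below is a minimal sketch, not the evaluation used to train or report on this model: `normalize`, `edit_distance`, and `word_error_rate` are hypothetical helper names, and the normalization (lowercasing, dropping punctuation other than apostrophes) is an assumption chosen to match the lowercase, unpunctuated predictions shown above.

```python
import re

# First three reference/predicted pairs, copied from the sample output above.
PAIRS = [
    ("The bungalow was pleasantly situated near the shore.",
     "the bunglow was plesntly situated near the shor"),
    ("Don't ask me to carry an oily rag like that.",
     "don't ask me to carry an oily rag like that"),
    ("Are you looking for employment?",
     "are you oking for employment"),
]

def normalize(text):
    # Assumption: lowercase and keep only letters, apostrophes, and spaces,
    # so references look like the model's lowercase, unpunctuated output.
    return re.sub(r"[^a-z' ]", "", text.lower()).split()

def edit_distance(ref, hyp):
    # Word-level Levenshtein distance using a single rolling row.
    row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        prev, row[0] = row[0], i
        for j, hyp_word in enumerate(hyp, start=1):
            cur = row[j]
            row[j] = min(
                row[j] + 1,                    # deletion
                row[j - 1] + 1,                # insertion
                prev + (ref_word != hyp_word)  # substitution or match
            )
            prev = cur
    return row[-1]

def word_error_rate(pairs):
    # Total word edits divided by total reference words across all pairs.
    edits = sum(edit_distance(normalize(r), normalize(p)) for r, p in pairs)
    words = sum(len(normalize(r)) for r, _ in pairs)
    return edits / words

print(f"WER: {word_error_rate(PAIRS):.3f}")
```

On these three pairs the sketch counts 4 word errors against 23 reference words. A different normalization, or scoring the full test split, would give a different figure.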