Whisper Model for Incorrect English Phrases

Overview

This model is a fine-tuned version of OpenAI’s Whisper, trained to handle incorrect English phrases. It transcribes non-standard or erroneous English input, including mispronunciations, grammatical mistakes, slang, and non-native speaker errors. It is intended to improve transcription accuracy when speakers use incorrect or informal English, which makes it useful for language learning, transcription of casual conversations, and analysis of speech from non-native English speakers.

Usage Guide

This project was executed on an Ubuntu 22.04.3 system running Linux kernel 6.8.0-40-generic.

Whisper large-v3 is supported in Hugging Face Transformers. To run the model, first install the Transformers library. For this example, we'll also install Hugging Face Datasets to load a toy audio dataset from the Hugging Face Hub, and Hugging Face Accelerate to reduce the model loading time:

pip install --upgrade pip
pip install --upgrade transformers "datasets[audio]" accelerate
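
After installing, it can help to confirm which versions ended up in the environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_versions` is hypothetical and not part of any of the packages above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Package list taken from the install command, plus torch, which the example imports.
    wanted = ["transformers", "datasets", "accelerate", "torch"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}: {ver or 'not installed'}")
```

A `None` entry means the corresponding `pip install` step did not take effect in the interpreter you are running.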

The model can be used with the pipeline class to transcribe audio of arbitrary length:

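import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use the GPU with half precision when available; otherwise fall back to CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Checkpoint used in this example.
model_id = "openai/whisper-large-v3"

# Load the weights in the chosen precision, then move the model to the target device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

# Build the speech recognition pipeline from the model and processor parts.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Load one long-form sample from a toy LibriSpeech dataset on the Hugging Face Hub.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

# Transcribe the sample and print the recognized text.
result = pipe(sample)
print(result["text"])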
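
To get a rough sense of how close a transcript is to what was actually said, one common measure is word error rate (WER): the word-level edit distance divided by the number of reference words. This card does not say which metric was used for the model, so treating WER as the measure is an assumption, as are the helper name `word_error_rate` and the example sentences. The sketch lowercases and splits on whitespace only; it applies no punctuation or text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences: the speaker's actual words versus a "corrected" transcript.
    spoken = "she don't like apples"
    transcript = "she doesn't like apples"
    print(word_error_rate(spoken, transcript))  # one substitution over four words: 0.25
```

For a model aimed at incorrect English, the reference should be a verbatim record of what the speaker said, errors included; otherwise a transcript that faithfully keeps the speaker's mistakes is counted as wrong. In the example above, `result["text"]` could be passed as the hypothesis.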

