Whisper Model for Incorrect English Phrases

Overview

This fine-tuned version of OpenAI’s Whisper model is specifically trained to handle incorrect English phrases. It is designed to transcribe and process non-standard or erroneous English input, including mispronunciations, grammatical mistakes, slang, and non-native speaker errors. This model helps improve transcription accuracy in scenarios where speakers use incorrect or informal English, making it useful in language learning, transcription of casual conversations, or analyzing spoken communication from non-native English speakers.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 28
  • eval_batch_size: 28
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • training_steps: 100000
  • mixed_precision_training: Native AMP

Training results

Training Loss Epoch Step Validation Loss Wer
1.5189 0.4444 500 1.1913 25.9108
1.1727 0.8889 1000 0.9531 24.5396
1.1341 1.3333 1500 0.8688 22.2761
1.0152 1.7778 2000 0.8174 20.8792
1.0589 2.2222 2500 0.7855 20.7595
0.9793 2.6667 3000 0.7611 22.2846
0.9594 3.1111 3500 0.7442 20.3860
1.0031 3.5556 4000 0.7303 18.5045
0.9525 4.0 4500 0.7199 18.1054
0.8729 4.4444 5000 0.7105 19.3170
1.0031 4.8889 5500 0.7028 19.7446
0.9273 5.3333 6000 0.6966 19.7189
0.9174 5.7778 6500 0.6896 18.4475
0.8842 6.2222 7000 0.6839 18.4361

Usage Guide

This project was executed on an Ubuntu 22.04.3 system running Linux kernel 6.8.0-40-generic.

Whisper large-v3 is supported in Hugging Face Transformers. To run the model, first install the Transformers library. For this example, we'll also install Hugging Face Datasets to load toy audio dataset from the Hugging Face Hub, and Hugging Face Accelerate to reduce the model loading time:

pip install --upgrade pip
pip install --upgrade transformers datasets[audio] accelerate

The model can be used with the pipeline class to transcribe audios of arbitrary length:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

def download_adapter_model():
  model_name = "whisper-v3-LoRA-en_students"
  print(f"Downloading the adapter model '{model_name}' from the Hugging Face Hub.", flush=True)

  # Define the path for the directory
  local_directory = os.path.expanduser("~/.cache/huggingface/hub")

  # Check if the directory exists
  if not os.path.exists(local_directory):
    # If it doesn't exist, create it
      os.makedirs(local_directory)
      print(f"Directory '{local_directory}' created.", flush=True)
  else:
    print(f"Directory '{local_directory}' already exists.", flush=True)

  repo_id = f"Transducens/{model_name}"
  repo_adapter_dir = f"{model_name}/checkpoint-5000/adapter_model"
  repo_filename_config = f"{repo_adapter_dir}/adapter_config.json"
  repo_filename_tensors = f"{repo_adapter_dir}/adapter_model.safetensors"

  adapter_config = hf_hub_download(repo_id=repo_id, filename=repo_filename_config, local_dir=local_directory)
  adapter_model_tensors = hf_hub_download(repo_id=repo_id, filename=repo_filename_tensors, local_dir=local_directory)

  print(f"Dowloaded the adapter model '{model_name}' from the Hugging Face Hub.", flush=True)

  return os.path.join(local_directory, repo_adapter_dir)

peft_model_id = adapter_path # Use the same model ID as before.
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
peft_config.base_model_name_or_path, load_in_8bit=False)

model = PeftModel.from_pretrained(model, peft_model_id)
model.generation_config.language = "<|en|>"
model.generation_config.task = "transcribe"

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v3", task="transcribe")
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v3")

pipe = pipeline(model=model, tokenizer=tokenizer, feature_extractor=feature_extractor, task="automatic-speech-recognition", device=device)

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.4
  • Pytorch 2.1.0+cu118
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference API
Unable to determine this model's library. Check the docs .

Model tree for Transducens/error-preserving-whisper-distilled

Finetuned
(11)
this model

Collection including Transducens/error-preserving-whisper-distilled