
JPS-ASR Gold v3: LoRA Adapter for H.H. Jayapataka Swami Gurumaharaj

A LoRA adapter for openai/whisper-large-v3, fine-tuned to transcribe the voice of H.H. Jayapataka Swami Gurumaharaj, a senior Vaishnava spiritual leader whose post-stroke speech patterns are challenging for standard ASR systems.

WER on gold validation set: 30.6% (vs. 42.5% for Gold v2 and over 100% for base whisper-large-v3 without fine-tuning)
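For readers unfamiliar with the metric, the following is a minimal sketch of word error rate as it is conventionally defined: word-level edit distance divided by the number of reference words. This card does not say which tool or text normalization was used for the figures above, so the function below is an illustration, not the project's evaluation script. The example sentences are made up.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: every word is wrong and two extra words are inserted.
print(wer("hare krishna", "hurry christmas now please"))
```

Because insertions count as errors but do not lengthen the reference, a hypothesis with many spurious words can score above 100%, which is how an untuned model can exceed that mark.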

Key Improvement over Gold v2

Gold v2 was trained on data where OCR subtitle timestamps (which appear 1-3 seconds after speech) were used as audio boundaries. This caused every training sample to pair the wrong audio with its text label.

Gold v3 was trained on the same corrected transcripts but with timestamps from stable_whisper.align(), which uses forced alignment to find the precise moment each word was spoken. This eliminated the systematic audio-text mismatch and produced the WER improvement.
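To make the effect of delayed boundaries concrete, here is a small illustrative sketch. The utterance times, the delays, and the helper name overlap_fraction are hypothetical and not taken from the training data. It also assumes the subtitle stays on screen for as long as the utterance lasts, so the labelled window is simply the true window shifted later.

```python
def overlap_fraction(true_span, label_span):
    """Fraction of the true speech span that falls inside the labelled audio window."""
    t0, t1 = true_span
    l0, l1 = label_span
    overlap = max(0.0, min(t1, l1) - max(t0, l0))
    return overlap / (t1 - t0)


# Hypothetical utterance spoken from 4.0 s to 6.5 s.
true_span = (4.0, 6.5)
for delay in (0.0, 1.0, 2.0, 3.0):
    # A subtitle appearing `delay` seconds late shifts both boundaries.
    shifted = (true_span[0] + delay, true_span[1] + delay)
    coverage = overlap_fraction(true_span, shifted)
    print(f"delay={delay:.1f}s  coverage={coverage:.0%}")
```

Under these assumptions, a 2.5-second utterance loses most of its audio at a 2-second delay and all of it at 3 seconds, so the clip would contain mostly or entirely different speech than its label.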

Model Details

  • Base model: openai/whisper-large-v3 (1.55B params)
  • Method: LoRA (r=16, alpha=32)
  • Target modules: q_proj, v_proj, k_proj, out_proj
  • Trainable parameters: ~15.7M of 1.55B
  • Training data: 116 YouTube shorts of Gurumaharaj with human-corrected transcripts and forced-alignment timestamps (~85 minutes)
  • Training precision: float32
  • Hardware: Google Colab T4 (16GB)
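The trainable-parameter figure above can be reproduced with a quick back-of-the-envelope calculation. This sketch assumes the published whisper-large-v3 dimensions (hidden size 1280, 32 encoder layers, 32 decoder layers), that each decoder layer has both a self-attention and a cross-attention block, and that LoRA adds two bias-free matrices (r x d_in and d_out x r) per adapted projection. The function name is illustrative.

```python
def lora_param_count(d_model=1280, enc_layers=32, dec_layers=32, rank=16,
                     targets=("q_proj", "k_proj", "v_proj", "out_proj")):
    """Count LoRA parameters when every attention projection in `targets` is adapted."""
    # One attention block per encoder layer; two (self + cross) per decoder layer.
    attn_blocks = enc_layers + 2 * dec_layers
    adapted_modules = attn_blocks * len(targets)
    # Each adapted d_model x d_model projection gets A (rank x d_model) and B (d_model x rank).
    params_per_module = rank * (d_model + d_model)
    return adapted_modules * params_per_module


total = lora_param_count()
print(f"{total:,} trainable parameters (~{total / 1e6:.1f}M)")
```

With r=16 this gives 384 adapted projections at 40,960 parameters each, which matches the ~15.7M listed above and is roughly 1% of the base model.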

Usage

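import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

BASE_MODEL   = "openai/whisper-large-v3"
LORA_ADAPTER = "JPSProject/jps-asr-gold-v3"

# Use half precision on GPU; fall back to float32 on CPU, where float16 inference is poorly supported.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype  = torch.float16 if device == "cuda" else torch.float32

processor = WhisperProcessor.from_pretrained(BASE_MODEL, language="en", task="transcribe")
base      = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL, torch_dtype=dtype)
model     = PeftModel.from_pretrained(base, LORA_ADAPTER).to(device).eval()

# Whisper expects 16 kHz mono audio; the processor pads or truncates each clip to a 30-second window.
audio, _ = librosa.load("voice_note.mp3", sr=16000)
inputs    = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
inputs    = inputs.to(device, dtype=dtype)  # match the model's device and dtype

with torch.no_grad():
    ids  = model.generate(input_features=inputs, language="en", task="transcribe",
                          no_repeat_ngram_size=4, repetition_penalty=1.2)
text = processor.decode(ids[0], skip_special_tokens=True)
print(text)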

Live Demo

JPS-ASR Space on HuggingFace

A Note on this Project

This is a seva (act of devotional service) for H.H. Jayapataka Swami Gurumaharaj. After a stroke in 2008, Gurumaharaj's speech patterns changed significantly, making standard ASR systems largely unusable. This model is a step toward enabling Gurumaharaj to send voice notes to disciples with accurate automatic transcription.

Jai Srila Prabhupada! Jai Srila Gurumaharaj! Haribol!
