JPS-ASR Gold v3: LoRA Adapter for H.H. Jayapataka Swami Gurumaharaj
A LoRA adapter for openai/whisper-large-v3, fine-tuned to transcribe the voice of
H.H. Jayapataka Swami Gurumaharaj, a senior Vaishnava spiritual leader whose
post-stroke speech patterns are challenging for standard ASR systems.
WER on the gold validation set: 30.6% (vs. 42.5% for Gold v2, and over 100% for the base whisper-large-v3 without fine-tuning)
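The figures above are word error rates. WER aligns the model's output with the reference transcript and counts substitutions (S), deletions (D) and insertions (I) against the N reference words, so WER = (S + D + I) / N. S + D can never exceed N, so a WER above 100% means the model inserted extra words, for example by hallucinating or repeating phrases. The snippet below is a minimal sketch of the standard word-level edit-distance computation. The card does not say which WER implementation or text normalization produced its numbers, and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    The reference must contain at least one word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("please chant hare krishna", "please chant krishna"))  # 0.25: one deletion
print(word_error_rate("hare krishna", "how are you christian today"))       # 2.5: 2 substitutions + 3 insertions
```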
Key Improvement over Gold v2
Gold v2 was trained on data that used OCR subtitle timestamps as audio boundaries. Because those subtitles appear 1-3 seconds after the corresponding speech, every training sample paired its text label with misaligned audio.
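To see how much a 1-3 second lag costs on short clips, compare the interval in which a phrase is actually spoken with the interval cut from its subtitle timestamps. This sketch uses a hypothetical 3-second phrase and assumes the subtitle stays on screen for exactly as long as the speech, just shifted later. Real subtitle durations vary.

```python
def overlap_fraction(speech_start, speech_end, clip_start, clip_end):
    """Fraction of the spoken interval that actually lands inside the cut clip."""
    overlap = max(0.0, min(speech_end, clip_end) - max(speech_start, clip_start))
    return overlap / (speech_end - speech_start)


# Hypothetical phrase spoken from 10.0 s to 13.0 s.
speech_start, speech_end = 10.0, 13.0
for lag in (1.0, 2.0, 3.0):
    # Clip boundaries taken from a subtitle that appears `lag` seconds late.
    frac = overlap_fraction(speech_start, speech_end,
                            speech_start + lag, speech_end + lag)
    print(f"lag {lag:.0f} s: {frac:.0%} of the phrase is inside the clip")
```

With a 2-second lag, only a third of the phrase lands in the clip. The other two thirds of the clip contain whatever follows, such as silence or the next phrase, yet the clip is still labeled with this phrase's text.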
Gold v3 was trained on the same corrected transcripts but with timestamps from
stable_whisper.align(), which uses forced alignment to find the precise moment
each word was spoken. This eliminated the systematic audio-text mismatch and
produced the WER improvement.
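Once every word has its own start and end time, a training clip can be cut from the first word's start to the last word's end. The sketch below is hypothetical. It assumes the word timings have already been pulled out of the aligner's result into (word, start_s, end_s) tuples. The helper name `cut_clip` and the 0.1-second `pad_s` margin are illustrative choices, not documented settings of this project.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz audio


def cut_clip(audio, words, pad_s=0.1, sr=SAMPLE_RATE):
    """Cut the span covered by aligned words, plus a small safety margin.

    `words` is a list of (text, start_s, end_s) tuples in spoken order.
    Returns the audio slice and the text label it should be paired with.
    """
    start_s = max(0.0, words[0][1] - pad_s)              # never before t = 0
    end_s = min(len(audio) / sr, words[-1][2] + pad_s)   # never past the end
    clip = audio[int(round(start_s * sr)):int(round(end_s * sr))]
    label = " ".join(w for w, _, _ in words)
    return clip, label


# Hypothetical 20-second recording with two aligned words.
audio = np.zeros(20 * SAMPLE_RATE, dtype=np.float32)
clip, label = cut_clip(audio, [("hare", 10.0, 10.4), ("krishna", 10.45, 11.0)])
print(len(clip) / SAMPLE_RATE, label)
```

Because the boundaries come from the words themselves rather than from subtitles, the clip and its label describe the same speech. The small pad keeps soft word onsets and offsets from being clipped.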
Model Details
- Base model: openai/whisper-large-v3 (1.55B params)
- Method: LoRA (r=16, alpha=32)
- Target modules: q_proj, v_proj, k_proj, out_proj
- Trainable parameters: ~15.7M of 1.55B
- Training data: 116 YouTube shorts of Gurumaharaj with human-corrected transcripts and forced-alignment timestamps (~85 minutes)
- Training precision: float32
- Hardware: Google Colab T4 (16GB)
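The ~15.7M trainable-parameter figure can be checked by hand. LoRA adds two low-rank matrices to each targeted linear layer, A (r × d_in) and B (d_out × r), so each adapted layer contributes r(d_in + d_out) trainable weights while the original weight stays frozen. The arithmetic below assumes the published whisper-large-v3 dimensions (d_model = 1280, 32 encoder and 32 decoder layers). It also assumes the four target names match every self-attention projection and every decoder cross-attention projection, which is how PEFT's name-based matching treats Whisper's module names. In PEFT terms, the settings correspond to `LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj", "k_proj", "out_proj"])`. Other settings such as dropout are not stated in this card.

```python
# Assumed whisper-large-v3 dimensions (from its published config).
D_MODEL = 1280
ENCODER_LAYERS = 32
DECODER_LAYERS = 32

# LoRA settings listed in this card.
R = 16
ALPHA = 32
TARGETS = ("q_proj", "k_proj", "v_proj", "out_proj")


def lora_params_per_linear(d_in, d_out, r):
    # A is r x d_in and B is d_out x r; the frozen base weight is not counted.
    return r * d_in + d_out * r


# Encoder layers have one attention block (self-attention);
# decoder layers have two (self-attention and cross-attention).
attention_blocks = ENCODER_LAYERS * 1 + DECODER_LAYERS * 2
adapted_linears = attention_blocks * len(TARGETS)
trainable = adapted_linears * lora_params_per_linear(D_MODEL, D_MODEL, R)
scaling = ALPHA / R  # the LoRA update B @ A is multiplied by alpha / r

print(adapted_linears, trainable, scaling)  # 384 15728640 2.0
```

The result, 15,728,640 weights, matches the ~15.7M stated above and is roughly 1% of the base model's 1.55B parameters.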
Usage
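import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

BASE_MODEL = "openai/whisper-large-v3"
LORA_ADAPTER = "JPSProject/jps-asr-gold-v3"

# float16 is only practical on a GPU; fall back to float32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = WhisperProcessor.from_pretrained(BASE_MODEL, language="en", task="transcribe")
base = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL, torch_dtype=dtype)
# Attach the LoRA adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, LORA_ADAPTER).to(device).eval()

# Whisper expects 16 kHz mono audio; librosa resamples on load.
audio, _ = librosa.load("voice_note.mp3", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
# Match the model's device and dtype, otherwise generate() fails on a dtype mismatch.
inputs = inputs.to(device, dtype=dtype)

with torch.no_grad():
    ids = model.generate(inputs, language="en", task="transcribe",
                         no_repeat_ngram_size=4, repetition_penalty=1.2)
text = processor.decode(ids[0], skip_special_tokens=True)
print(text)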
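Whisper reads a fixed 30-second window, and WhisperProcessor pads or truncates its input to that length by default. For this reason, the snippet above transcribes only the first 30 seconds of a longer voice note. One simple workaround is to split the audio into consecutive 30-second chunks and transcribe each chunk separately. The sketch below shows only the splitting step. `split_into_windows` is a hypothetical helper, not part of this project.

```python
import numpy as np

SAMPLE_RATE = 16_000
WINDOW_S = 30  # length of Whisper's fixed input window, in seconds


def split_into_windows(audio, sr=SAMPLE_RATE, window_s=WINDOW_S):
    """Split a 1-D audio array into consecutive chunks of at most window_s seconds."""
    step = window_s * sr
    return [audio[i:i + step] for i in range(0, len(audio), step)]


# Hypothetical 75-second voice note -> chunks of 30 s, 30 s and 15 s.
audio = np.zeros(75 * SAMPLE_RATE, dtype=np.float32)
print([len(c) / SAMPLE_RATE for c in split_into_windows(audio)])
```

Each chunk can then go through the same processor and model.generate calls shown above, and the decoded texts can be joined with spaces. Fixed cuts can split a word across two chunks. Overlapping the windows or cutting at pauses would reduce this problem, at the cost of extra bookkeeping.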
Live Demo
A Note on this Project
This is a seva (act of devotional service) for H.H. Jayapataka Swami Gurumaharaj. After a stroke in 2008, Gurumaharaj's speech patterns changed significantly, making standard ASR systems largely unusable. This model is a step toward enabling Gurumaharaj to send voice notes to disciples with accurate automatic transcription.
Jai Srila Prabhupada! Jai Srila Gurumaharaj! Haribol!