Whisper fine-tuned for audio transcription

Fine-tuned from Ari/whisper-small-es to transcribe audio clips.

  • Base model: Ari/whisper-small-es
  • Train / eval clips: 95 / 10

Eval

  • exact match: 0.900
  • char accuracy: 0.975
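
The card does not say how these two metrics were computed. The sketch below shows one common way to define them, under stated assumptions: exact match is taken as the fraction of clips whose normalized prediction equals the reference, and char accuracy as 1 minus the character error rate (Levenshtein distance divided by total reference length). The helper names (`normalize`, `edit_distance`, `exact_match`, `char_accuracy`) and the lowercase/whitespace normalization are hypothetical and may differ from what produced the numbers above. Under this definition, an exact match of 0.900 on 10 eval clips would correspond to 9 clips transcribed exactly.

```python
def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def edit_distance(a: str, b: str) -> int:
    # Character-level Levenshtein distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def exact_match(preds: list[str], refs: list[str]) -> float:
    # Fraction of clips whose normalized prediction equals the reference.
    pairs = list(zip(preds, refs))
    return sum(normalize(p) == normalize(r) for p, r in pairs) / len(pairs)


def char_accuracy(preds: list[str], refs: list[str]) -> float:
    # 1 - CER, with errors and reference lengths pooled over all clips.
    errors = 0
    total = 0
    for p, r in zip(preds, refs):
        p, r = normalize(p), normalize(r)
        errors += edit_distance(p, r)
        total += len(r)
    return max(0.0, 1.0 - errors / total)


if __name__ == "__main__":
    # Made-up strings for illustration only; not taken from the eval set.
    preds = ["hola mundo", "adios"]
    refs = ["hola mundo", "adiós"]
    print(exact_match(preds, refs), char_accuracy(preds, refs))
```

Pooling errors across clips, rather than averaging per-clip scores, keeps short clips from dominating the result. Either choice is defensible with only 10 eval clips.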

Usage

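import numpy as np
from pydub import AudioSegment
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Hub repo id; a path to a local checkpoint directory also works here.
MODEL_ID = "vk496/whisper-small-esl"
processor = WhisperProcessor.from_pretrained(MODEL_ID)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)

# Decode to 16 kHz mono 16-bit PCM so the int16 read below is valid (pydub needs ffmpeg for mp3).
audio = AudioSegment.from_file("audio.mp3").set_frame_rate(16000).set_channels(1).set_sample_width(2)
# Scale int16 samples to float32 in [-1, 1).
samples = np.frombuffer(audio.raw_data, np.int16).astype(np.float32) / 32768.0
# Convert the waveform to log-mel input features, generate token ids, and decode them to text.
features = processor(samples, sampling_rate=16000, return_tensors="pt").input_features
text = processor.batch_decode(model.generate(features), skip_special_tokens=True)[0]
print(text)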

Model details

  • Repository: vk496/whisper-small-esl
  • Model size: 0.2B params
  • Tensor type: F16 (Safetensors)