Whisper fine-tuned for audio transcription

Fine-tuned from Ari/whisper-small-es to transcribe audio clips.

Base model: Ari/whisper-small-es
Train / eval clips: 95 / 10

Eval

exact match: 0.900
char accuracy: 0.975
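
The card does not say how these two scores were computed. The sketch below shows one plausible reading and is an assumption, not the author's evaluation script: exact match as the fraction of clips whose transcript equals the reference after trimming whitespace, and char accuracy as one minus the character error rate (Levenshtein distance divided by the total reference length). The function names are hypothetical. Under this reading, an exact match of 0.900 on 10 eval clips would correspond to 9 exactly matching transcripts.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def exact_match(predictions, references):
    """Assumed definition: share of clips transcribed exactly."""
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def char_accuracy(predictions, references):
    """Assumed definition: 1 - (total edit distance / total reference characters)."""
    errors = sum(levenshtein(p.strip(), r.strip())
                 for p, r in zip(predictions, references))
    total = sum(len(r.strip()) for r in references)
    return 1.0 - errors / total


# Toy data for illustration only, not the model's eval set.
predictions = ["hola mundo", "adios"]
references = ["hola mundo", "adiós"]
print(exact_match(predictions, references))
print(char_accuracy(predictions, references))
```

Other definitions are possible, for example normalizing case and punctuation before comparing, or averaging per-clip accuracy instead of pooling characters, and each would give slightly different numbers.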

Usage

import numpy as np
from pydub import AudioSegment
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor and model from the Hub repository.
processor = WhisperProcessor.from_pretrained("vk496/whisper-small-esl")
model = WhisperForConditionalGeneration.from_pretrained("vk496/whisper-small-esl")

# Decode to 16 kHz, mono, 16-bit PCM so the int16 view below is valid (pydub needs ffmpeg for mp3).
audio = AudioSegment.from_file("audio.mp3").set_frame_rate(16000).set_channels(1).set_sample_width(2)
# Scale int16 samples to float32 in [-1, 1).
samples = np.frombuffer(audio.raw_data, np.int16).astype(np.float32) / 32768.0
features = processor(samples, sampling_rate=16000, return_tensors="pt").input_features
text = processor.batch_decode(model.generate(features), skip_special_tokens=True)[0]
print(text)
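
Whisper works on 30-second input windows, and by default the processor pads or truncates each input to that length, so the snippet above only covers the first 30 seconds of a longer file. A minimal sketch of one workaround, assuming fixed, non-overlapping chunks are acceptable for your audio (the helper name `split_into_chunks` is hypothetical):

```python
import numpy as np

SAMPLE_RATE = 16000   # same rate as in the usage snippet
CHUNK_SECONDS = 30    # Whisper's input window


def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a 1-D waveform into consecutive chunks of at most chunk_seconds each."""
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# Example with 70 seconds of silence standing in for real audio.
waveform = np.zeros(70 * SAMPLE_RATE, dtype=np.float32)
chunks = split_into_chunks(waveform)
print([len(c) / SAMPLE_RATE for c in chunks])
```

Each chunk can then be passed through `processor` and `model.generate` as in the usage snippet, and the decoded strings joined. Cutting at fixed boundaries can split a word in two, so overlapping chunks or silence-based splitting may work better for long recordings.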

Model size: 0.2B params
Tensor type: F16 (Safetensors)

Model tree for vk496/whisper-small-esl: base model Ari/whisper-small-es, fine-tuned into this model.