Introduction

This model is OpenAI Whisper large-v3-turbo, fine-tuned on ~770 hours of manually created subtitles from Estonian TV (ETV). As a result, it does not always produce verbatim (word-for-word) subtitles; it often rephrases sentences and compresses the text, especially for spontaneous speech with hesitations, repetitions, etc. However, the length of the generated text chunks almost always conforms to the ETV subtitle requirements (48 characters per line).
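
The 48-character limit is easy to verify on any subtitle text. The following is a minimal sketch, not part of the model or of any ETV tooling: the helper name `overlong_lines` and the sample text are hypothetical, and it assumes that subtitle lines are separated by newlines, which the description above does not specify.

```python
MAX_LINE_CHARS = 48  # per-line limit quoted above for ETV subtitles


def overlong_lines(subtitle_text: str, limit: int = MAX_LINE_CHARS) -> list[str]:
    """Return the lines of a subtitle block that exceed the character limit."""
    return [line for line in subtitle_text.splitlines() if len(line) > limit]


# Hand-written sample text for illustration, not actual model output.
example = "Tere õhtust, head vaatajad.\nTänases saates räägime ilmast."
print(overlong_lines(example))
```

An empty list means every line fits within the limit; any returned lines would need to be re-split or shortened.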

Usage

It is a fine-tuned version of Whisper large-v3-turbo and can therefore be used via Hugging Face 🤗 Transformers. To run the model, first install the Transformers library. For this example, we'll also install 🤗 Accelerate to reduce the model loading time:

pip install --upgrade pip
pip install --upgrade transformers accelerate

The model can be used with the pipeline class to transcribe audio of arbitrary length:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "TalTechNLP/whisper-large-v3-turbo-et-subs"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

audio = "sample.mp3"  # path to a local audio file

# Pass the audio path defined above; force Estonian transcription
result = pipe(audio, generate_kwargs={"task": "transcribe", "language": "et"})
print(result)
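
Since the model is meant for subtitling, a natural next step is to turn timestamped output into a subtitle file. The Transformers ASR pipeline can return a `chunks` list of `{"timestamp": (start, end), "text": ...}` entries when called with `return_timestamps=True`. The sketch below converts such a list to SubRip (SRT) text. It assumes that this fine-tuned checkpoint still emits timestamp tokens, which the description above does not confirm. The helper names `format_srt_time` and `chunks_to_srt` and the sample chunks are hypothetical, hand-written for illustration.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list[dict]) -> str:
    """Build SRT text from pipeline-style chunks with (start, end) timestamps."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # the last chunk may lack an end time; fall back to start
            end = start
        times = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        entries.append(f"{index}\n{times}\n{chunk['text'].strip()}\n")
    return "\n".join(entries)


# Hand-written chunks in the pipeline's output shape, not real model output.
sample_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Tere õhtust."},
    {"timestamp": (2.5, 5.04), "text": " Algab saade."},
]
print(chunks_to_srt(sample_chunks))
```

With the pipeline from the example above, the input would come from something like `pipe(audio, return_timestamps=True, generate_kwargs={...})["chunks"]`.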

Evaluation results

TODO
