Model Details

Model Description

This model is a fine-tuned version of whisper-small for Malayalam speech-to-text translation.

Bias, Risks, and Limitations

This model is intended purely for learning purposes. In its present form it is not suitable for live translation: it was fine-tuned for only 4000 steps because of compute limitations (training ran on a GPU with 8 GB of VRAM). For better results, training on a larger dataset and for more steps, possibly up to 10000, is recommended.
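To put the step counts above in perspective, the sketch below estimates how many training examples are processed in 4000 versus 10000 optimizer steps. The batch size and gradient-accumulation values are assumptions chosen for illustration (this card does not state them), and `samples_seen` is a hypothetical helper, not part of any library.

```python
def samples_seen(steps: int, per_device_batch_size: int,
                 grad_accum_steps: int = 1, num_devices: int = 1) -> int:
    """Number of training examples processed after `steps` optimizer steps."""
    return steps * per_device_batch_size * grad_accum_steps * num_devices


# Assumed values, typical for a small-memory GPU; not taken from this card.
BATCH_SIZE = 8
GRAD_ACCUM = 2

current = samples_seen(4000, BATCH_SIZE, GRAD_ACCUM)
suggested = samples_seen(10000, BATCH_SIZE, GRAD_ACCUM)

print(f"4000 steps:  {current} examples")
print(f"10000 steps: {suggested} examples")
```

Under these assumed settings, going from 4000 to 10000 steps means the model sees 2.5 times as many examples, which only helps if the dataset is large enough to avoid heavy repetition.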

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "vishal98m/whisper-small-malayalam"

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Create pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0 if device == "cuda" else -1,
)

# Run inference (Malayalam speech → English translation)
result = pipe(
    "sample_audio.wav",
    generate_kwargs={
        "task": "translate",
        "language": "malayalam"
    }
)

print("Translation:")
print(result["text"])
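Whisper models operate on 16 kHz audio, so it can help to inspect a recording before debugging poor output. The following is a minimal sketch using only the Python standard library; `describe_wav` is a hypothetical helper, and it only handles uncompressed PCM WAV files.

```python
import wave

WHISPER_SAMPLE_RATE = 16_000  # Whisper's feature extractor works on 16 kHz audio


def describe_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
            "needs_resampling": rate != WHISPER_SAMPLE_RATE,
        }


# Example usage (assumes the file exists):
# info = describe_wav("sample_audio.wav")
# print(info)
```

If `needs_resampling` is true and you plan to pass raw audio arrays rather than a file path, resample the audio to 16 kHz first.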

Evaluation

Eval Loss: 1.384
Word Error Rate (WER): 60.8
Orthographic WER: 63.29
BLEU Score: 31.77
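For readers unfamiliar with the metric, the sketch below shows how word error rate is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is not the evaluation script used for this model. It assumes the common convention in which "Orthographic WER" is computed on raw text and plain "WER" on normalized text; the `normalize` function here is a simplified, hypothetical stand-in for a real text normalizer.

```python
import string


def normalize(text: str) -> str:
    """Illustrative normalizer: lowercase and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance over reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


reference = "Hello, how are you today?"
hypothesis = "hello how are you"

orthographic_wer = word_error_rate(reference, hypothesis)
normalized_wer = word_error_rate(normalize(reference), normalize(hypothesis))

print(f"Orthographic WER: {orthographic_wer:.1f}")
print(f"Normalized WER:   {normalized_wer:.1f}")
```

In this toy example the orthographic score is higher because casing and punctuation differences count as word errors, which is consistent with the orthographic figure above being slightly higher than the plain WER.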

Framework versions

Transformers 5.1.0
Pytorch 2.10.0+cu128
Datasets 2.16.1
Tokenizers 0.22.2


Safetensors

Model size: 0.2B params
Tensor type: F32

Model tree for vishal98m/whisper-small-malayalam

Base model: openai/whisper-small