Be-win/malayalam-speech-with-english-translation-10h
How to use vishal98m/whisper-small-malayalam with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="vishal98m/whisper-small-malayalam")

# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
processor = AutoProcessor.from_pretrained("vishal98m/whisper-small-malayalam")
model = AutoModelForSpeechSeq2Seq.from_pretrained("vishal98m/whisper-small-malayalam")

This model is a fine-tuned version of whisper-small for Malayalam speech-to-text translation.
This model is purely for learning purposes. In its present form it can't be used for live translation: it was trained for only 4000 steps because of compute limitations (fine-tuning was done on a GPU with 8 GB of VRAM). For better results, it is recommended to train with a larger dataset and possibly up to 10000 steps.
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Use the code below to get started with the model.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
model_id = "vishal98m/whisper-small-malayalam"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load model and processor
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)
# Create pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0 if device == "cuda" else -1,
)
# Run inference (Malayalam speech → English translation)
result = pipe(
    "sample_audio.wav",
    generate_kwargs={
        "task": "translate",
        "language": "malayalam",
    },
)
print("Translation:")
print(result["text"])
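To run the same call over several recordings, the inference step above can be wrapped in a small helper. This is only a sketch: `translate_files` is a hypothetical name, not part of Transformers or this model card. It assumes `pipe` behaves like the pipeline created above, meaning it accepts a file path plus `generate_kwargs` and returns a dict with a `"text"` key.

```python
from typing import Callable, Dict, Iterable


def translate_files(
    pipe: Callable,
    paths: Iterable[str],
    language: str = "malayalam",
) -> Dict[str, str]:
    """Run the speech pipeline on each path and collect the translated text.

    Hypothetical helper: `pipe` is assumed to be the
    "automatic-speech-recognition" pipeline built earlier.
    """
    # Same generation settings as the single-file example:
    # Malayalam speech in, English text out.
    generate_kwargs = {"task": "translate", "language": language}

    results: Dict[str, str] = {}
    for path in paths:
        output = pipe(path, generate_kwargs=generate_kwargs)
        # Strip surrounding whitespace from the decoded text.
        results[path] = output["text"].strip()
    return results
```

With the pipeline from the example above, `translate_files(pipe, ["sample_audio.wav"])` would return a dict that maps each file path to its translation.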
Base model
openai/whisper-small