
Model Description

  • Developed by: Neura company
  • Funded by: Neura
  • Model type: Whisper Base
  • Language(s) (NLP): Persian

Model Architecture

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It is pre-trained for automatic speech recognition (ASR) and speech translation.
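To make the encoder side of this architecture concrete, the arithmetic below sketches the input shape the encoder sees. The constants are the standard Whisper front-end values for the base model (16 kHz audio, 30-second windows, a 160-sample hop, 80 mel bins); they are not stated in this card and should be treated as assumptions to check against the processor configuration.

```python
# Input-shape arithmetic for Whisper's encoder.
# Assumed constants: standard Whisper base front-end values, not taken from this card.
SAMPLE_RATE = 16_000   # Hz, the sampling rate Whisper expects
CHUNK_SECONDS = 30     # each input is padded or truncated to this length
HOP_LENGTH = 160       # samples between spectrogram frames (10 ms at 16 kHz)
N_MELS = 80            # mel-frequency bins in the log-mel spectrogram

n_samples = SAMPLE_RATE * CHUNK_SECONDS   # samples per 30 s window
n_frames = n_samples // HOP_LENGTH        # spectrogram frames per window
encoder_positions = n_frames // 2         # the encoder's second conv layer has stride 2

# Shape of input_features for a single audio clip: (batch, mel bins, frames)
input_features_shape = (1, N_MELS, n_frames)

print(n_samples, n_frames, encoder_positions, input_features_shape)
```

The decoder then attends over those encoder positions while generating text tokens one at a time.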


Check out the Google Colab demo to run NeuraSpeech ASR on a free-tier Google Colab instance.

Make sure the transformers, librosa, and IPython packages are installed, then run the following:

from IPython.display import Audio, display
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# play the input audio in the notebook (optional)
display(Audio('persian_audio.mp3', autoplay=True))

# load model and processor
processor = WhisperProcessor.from_pretrained("Neurai/NeuraSpeech_WhisperBase")
model = WhisperForConditionalGeneration.from_pretrained("Neurai/NeuraSpeech_WhisperBase")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="fa", task="transcribe")

# load the audio as mono, resampled to the 16 kHz rate Whisper expects
sr = 16000
array, _ = librosa.load('persian_audio.mp3', sr=sr)
# convert the waveform to log-mel input features
input_features = processor(array, sampling_rate=sr, return_tensors="pt").input_features

# generate token ids, forcing Persian transcription
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
# decode token ids to text, dropping special tokens
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

Transcribed text:

او خواهان آزاد کردن بردگان بود
(English: "He wanted to free the slaves.")
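The example above transcribes a single short clip. Whisper's feature extractor pads or truncates each input to 30 seconds, so a longer recording would need to be split first. The following is a minimal sketch of one way to do that, assuming 16 kHz mono samples as in the example; the helper name split_into_chunks and the simple non-overlapping strategy are illustrative assumptions, not part of the NeuraSpeech code.

```python
def split_into_chunks(samples, sample_rate=16_000, chunk_seconds=30):
    """Split a 1-D sequence of audio samples into consecutive, non-overlapping chunks.

    Hypothetical helper: works on any sliceable sequence (list, NumPy array).
    The last chunk may be shorter than chunk_seconds.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_size = sample_rate * chunk_seconds
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


# 70 seconds of silence at 16 kHz splits into 30 s, 30 s, and 10 s pieces
audio = [0.0] * (16_000 * 70)
chunks = split_into_chunks(audio)
print([len(c) / 16_000 for c in chunks])  # [30.0, 30.0, 10.0]
```

Each chunk could then be passed through the processor and model.generate as shown earlier and the resulting texts joined. Splitting at fixed boundaries can cut a word in half, so overlapping windows or silence-based splitting may give better results.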

More Information


Model Card Authors

Esmaeil Zahedi, Mohsen Yazdinejad

Model Card Contact


Model size
72.6M params