Whisper without numbers
Whisper models with number tokens removed (for normalized recognition results). These models will recognize numbers as spoken words.
This is a version of the openai/whisper-large-v3-turbo model with the number tokens removed (token ids corresponding to numbers are excluded). No fine-tuning was used.
Phrases with spoken numbers are transcribed with the numbers written out as words. This can be useful for TTS data preparation.
Example: instead of "25", this model transcribes the phrase as "twenty five".
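To make the idea of "excluding token ids corresponding to numbers" concrete, here is a minimal sketch on a toy vocabulary. It is not the procedure used to build this model: the vocabulary, the helper name `find_number_token_ids`, and the rule "any non-special token containing a digit counts as a number token" are all assumptions made for illustration.

```python
import re

# Hypothetical toy vocabulary (token string -> token id). A real Whisper
# vocabulary is far larger; these entries are for illustration only.
toy_vocab = {
    "Ġtwenty": 0,
    "Ġ25": 1,
    "5": 2,
    "Ġfive": 3,
    "Ġ3rd": 4,
    "<|en|>": 5,
    "<|0.00|>": 6,  # timestamp-style special token, kept on purpose
}

DIGIT = re.compile(r"\d")


def find_number_token_ids(vocab):
    """Return sorted ids of non-special tokens that contain a digit.

    Assumption: special tokens look like <|...|> and are never removed,
    even when they contain digits (e.g. timestamp tokens).
    """
    ids = []
    for token, token_id in vocab.items():
        is_special = token.startswith("<|") and token.endswith("|>")
        if not is_special and DIGIT.search(token):
            ids.append(token_id)
    return sorted(ids)


number_ids = find_number_token_ids(toy_vocab)
excluded = set(number_ids)
# Vocabulary that remains once the number tokens are excluded.
kept_vocab = {tok: i for tok, i in toy_vocab.items() if i not in excluded}
```

With the number tokens gone, the only way left to express "25" is through word tokens such as "twenty" and "five", which is why the output is spelled out.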
transformers version 4.45.2
The model can be used in the same way as the original Whisper:
>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> import torchaudio
>>> # load audio
>>> wav, sr = torchaudio.load("audio.wav")
>>> # Whisper expects 16 kHz audio, so resample if the file uses another rate
>>> wav = torchaudio.functional.resample(wav, sr, 16000)
>>> # load model and processor
>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-turbo-no-numbers")
>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-turbo-no-numbers")
>>> # compute input features from the first audio channel
>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features
>>> # generate token ids
>>> predicted_ids = model.generate(input_features)
>>> # decode token ids to text
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
>>> transcription
['<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Twenty seven years. <|endoftext|>']
The special tokens (the context tokens at the start and <|endoftext|> at the end) can be removed from the transcription by setting skip_special_tokens=True.
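If the transcription has already been decoded with the special tokens included, they can also be stripped afterwards. The sketch below is a plain regex alternative to skip_special_tokens=True, not part of the transformers API; the helper name `strip_special_tokens` and the assumption that every special token has the form `<|...|>` are illustrative.

```python
import re

# Assumption: special tokens are always written as <|...|> in decoded text.
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def strip_special_tokens(text):
    """Remove <|...|> markers and surrounding whitespace from a decoded string."""
    return SPECIAL_TOKEN.sub("", text).strip()


raw = "<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Twenty seven years. <|endoftext|>"
clean = strip_special_tokens(raw)
print(clean)
```

Passing skip_special_tokens=True to batch_decode remains the simpler option when the token ids are still available.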