Whisper-small-ru-pruned

Model info

This is a pruned version of the openai/whisper-small model in which only Russian tokens are kept. The pruning was done without any fine-tuning, using the method from this post.
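The referenced post is not reproduced here, so the following is only a minimal sketch of the general idea behind vocabulary pruning, not the author's exact procedure: keep the embedding rows of the retained tokens and renumber the token ids. The function name `prune_embedding` and the toy matrix are hypothetical, and plain Python lists stand in for real model tensors.

```python
def prune_embedding(embedding, kept_ids):
    """Keep only the rows of `embedding` whose index is in `kept_ids`.

    Returns the smaller matrix and an old-id -> new-id mapping, which is
    what a pruned tokenizer would need to stay in sync with the model.
    """
    kept_ids = sorted(set(kept_ids))
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    pruned = [embedding[old] for old in kept_ids]
    return pruned, old_to_new


# Toy 6-token vocabulary with 2-dimensional embeddings (illustrative only).
embedding = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1], [5.0, 5.1]]
pruned, old_to_new = prune_embedding(embedding, kept_ids=[5, 0, 3])

print(old_to_new)  # {0: 0, 3: 1, 5: 2}
print(pruned)      # [[0.0, 0.1], [3.0, 3.1], [5.0, 5.1]]
```

Because the surviving rows are copied unchanged, no retraining is needed for the tokens that remain, which fits the statement that the pruning was done without fine-tuning.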

Size

Fewer than 10% of the tokens (4207 of 51865) were kept: the special Whisper tokens (no language tokens except <|ru|> and <|en|>, and no timestamp tokens), the 200 most popular tokens from the tokenizer, and the 4000 most popular Russian tokens, determined by tokenizing a Russian text corpus.
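As a rough illustration of how such a token set could be assembled, the sketch below counts token ids over an already tokenized corpus and adds the most frequent ones to a set of ids that are always kept. The function name `select_kept_ids`, its parameters, and the toy ids are assumptions made for this example; the real corpus and counts used for this model are not given in this card.

```python
from collections import Counter


def select_kept_ids(always_keep, tokenized_corpus, n_top):
    """Return the sorted ids to keep.

    always_keep      -- ids kept unconditionally (e.g. special tokens)
    tokenized_corpus -- iterable of token-id sequences from a text corpus
    n_top            -- how many additional most frequent ids to add
    """
    counts = Counter(tid for seq in tokenized_corpus for tid in seq)
    kept = set(always_keep)
    added = 0
    for tid, _ in counts.most_common():
        if added == n_top:
            break
        if tid not in kept:  # do not count ids that are kept anyway
            kept.add(tid)
            added += 1
    return sorted(kept)


# Toy example: ids 0 and 1 play the role of special tokens.
corpus = [[10, 11, 10], [10, 12, 11], [13, 12, 12, 12]]
print(select_kept_ids(always_keep=[0, 1], tokenized_corpus=corpus, n_top=2))
# [0, 1, 10, 12]
```

With a real tokenizer, `tokenized_corpus` would be the ids produced by encoding Russian text, and `n_top` would be 4000 to match the description above.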

The model is about 15% smaller than the original whisper-small:

| | openai/whisper-small | waveletdeboshir/whisper-small-ru-pruned |
|---|---|---|
| Number of parameters | 242 M | 205 M |
| Number of parameters (with proj_out layer) | 281 M | 208 M |
| Model file size | 967 MB | 834 MB |
| vocab_size | 51865 | 4207 |
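The parameter counts above appear consistent with the savings coming from the vocabulary-sized matrices alone. The short calculation below is a sanity check rather than the author's own accounting; it assumes a hidden size of 768 for whisper-small, which is taken from the published model configuration and not from this card.

```python
D_MODEL = 768  # assumed hidden size of whisper-small


def embedding_params(vocab_size, d_model=D_MODEL):
    """Number of weights in one vocab_size x d_model matrix."""
    return vocab_size * d_model


original = embedding_params(51865)
pruned = embedding_params(4207)

print(original)           # 39832320  (about 39.8 M)
print(pruned)             # 3230976   (about 3.2 M)
print(original - pruned)  # 36601344  (about 36.6 M saved per matrix)
```

One such matrix accounts for the drop from 242 M to 205 M, and counting the proj_out layer as a second copy gives roughly 281 M and 208 M.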

Usage

The model can be used in the same way as the original Whisper:

>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> import torchaudio

>>> # load audio
>>> wav, sr = torchaudio.load("audio.wav")

>>> # load model and processor
>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-small-ru-pruned")
>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-small-ru-pruned")

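>>> # Whisper expects 16 kHz audio; resample in case the file uses another rate
>>> wav = torchaudio.functional.resample(wav, sr, 16000)
>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features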

>>> # generate token ids
>>> predicted_ids = model.generate(input_features)
>>> # decode token ids to text
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
>>> transcription
['<|startoftranscript|><|ru|><|transcribe|><|notimestamps|> Начинаем работу.<|endoftext|>']

The context tokens can be removed from the start of the transcription by setting skip_special_tokens=True.

Other pruned whisper models

Metrics

| Metric | Dataset | openai/whisper-small | waveletdeboshir/whisper-small-ru-pruned |
|---|---|---|---|
| WER* | golos-test-crowd | 0.3358 | 0.3471 |
| CER* | golos-test-crowd | 0.1561 | 0.1444 |
| WER* | common_voice_15_0_test | 0.1749 | 0.1748 |
| WER | common_voice_15_0_test | 0.2492 | 0.2498 |

*Metrics were computed after text normalization.
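The exact normalization used for the starred metrics is not specified here. As an illustration of why it matters, the sketch below computes WER and CER with a plain edit distance and a simple, assumed normalization (lowercasing and stripping punctuation). The helper names `normalize`, `wer`, and `cer` are hypothetical and not part of any library.

```python
import re


def normalize(text):
    """Assumed normalization: lowercase, drop punctuation, squeeze spaces."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


ref, hyp = "Начинаем работу.", "начинаем работы"
print(wer(ref, hyp))                        # 1.0
print(wer(normalize(ref), normalize(hyp)))  # 0.5
```

In this toy pair, casing and punctuation alone make both words count as errors, while after normalization only the genuinely different word remains.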

You can fine-tune this model on your own data to achieve better performance.

Colab for vocab pruning

Check https://github.com/waveletdeboshir/whisper-lang-remover

Evaluation results

  • WER on Common Voice 15.0 (Russian part, test), self-reported: 24.980
  • WER (without punctuation) on Common Voice 15.0 (Russian part, test), self-reported: 17.480