Whisper small adapters model for Greek transcription

We added adapter layers to the whisper-small model and fine-tuned them on Greek ASR. During training, the original model weights are frozen and only the adapters are trained. To transcribe Greek, activate the adapters; for any other language, leave them off and the model behaves like the original Whisper model.
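To make the frozen-model idea concrete, here is a minimal PyTorch sketch. It is not the released architecture: the residual bottleneck design, the layer sizes, and the class names `BottleneckAdapter` and `LayerWithAdapter` are assumptions made for illustration only. It shows how the base weights can be frozen while the adapter weights stay trainable, and how a `use_adapters` flag can bypass the adapter.

```python
import torch
from torch import nn


class BottleneckAdapter(nn.Module):
    """Hypothetical residual bottleneck adapter (illustrative, not the released design)."""

    def __init__(self, d_model, bottleneck):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden):
        # Residual connection: the adapter learns a correction on top of the input.
        return hidden + self.up(torch.relu(self.down(hidden)))


class LayerWithAdapter(nn.Module):
    """Toy layer: `base` stands in for a pretrained Whisper sub-layer."""

    def __init__(self, d_model=16, bottleneck=4):
        super().__init__()
        self.base = nn.Linear(d_model, d_model)
        self.adapter = BottleneckAdapter(d_model, bottleneck)

    def forward(self, hidden, use_adapters=True):
        hidden = self.base(hidden)
        # With use_adapters=False the layer returns the base output unchanged.
        return self.adapter(hidden) if use_adapters else hidden


layer = LayerWithAdapter()

# Freeze everything except the adapter parameters.
for name, param in layer.named_parameters():
    param.requires_grad = name.startswith("adapter.")

trainable = [name for name, param in layer.named_parameters() if param.requires_grad]
print(trainable)
```

Only the parameters whose names start with `adapter.` remain trainable, so an optimizer built from the parameters with `requires_grad=True` would leave the base weights untouched.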

How to use

Start by installing the version of transformers that includes the Whisper model with added adapters:

git clone https://gitlab.com/horizon-europe-voxreality/multilingual-translation/speech-translation-demo.git
cd speech-translation-demo
# You might need to switch to dev branch
pip install -e transformers

The use_adapters parameter controls whether the adapters are applied. Set it to True only when transcribing Greek.

from transformers import WhisperProcessor, WhisperForConditionalGenerationWithAdapters
from datasets import Audio, load_dataset

# load model and processor
processor = WhisperProcessor.from_pretrained("voxreality/whisper-small-el-adapters")
model = WhisperForConditionalGenerationWithAdapters.from_pretrained("voxreality/whisper-small-el-adapters")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="greek", task="transcribe")

# load streaming dataset and read first audio sample
ds = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]
input_features = processor(input_speech["array"], sampling_rate=input_speech["sampling_rate"], return_tensors="pt").input_features

# Set use_adapters to False for languages other than Greek.
# generate token ids
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids, use_adapters=True)

# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

You can also use an HF pipeline:

from transformers import WhisperForConditionalGenerationWithAdapters, pipeline
from datasets import Audio, load_dataset

ds = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]

model = WhisperForConditionalGenerationWithAdapters.from_pretrained("voxreality/whisper-small-el-adapters")

# Pass the tokenizer and feature extractor explicitly because the model is given as an object
pipe = pipeline("automatic-speech-recognition", model=model, tokenizer="voxreality/whisper-small-el-adapters",
                feature_extractor="voxreality/whisper-small-el-adapters", device='cpu', batch_size=32)

transcription = pipe(input_speech['array'], generate_kwargs={"language": "<|el|>", "task": "transcribe", "use_adapters": True})
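Because the adapters should be active only for Greek, it can be convenient to derive the flag from the requested language instead of hard-coding it. The sketch below is one possible way to do that; `GREEK_ALIASES`, `should_use_adapters`, and `build_generate_kwargs` are hypothetical helpers that are not part of the repository, and the set of accepted language spellings is an assumption.

```python
# Hypothetical helpers (not part of the repository); the accepted
# spellings of "Greek" below are an assumption for illustration.
GREEK_ALIASES = {"greek", "el", "<|el|>"}


def should_use_adapters(language):
    """Return True only when the requested language is Greek."""
    return language.strip().lower() in GREEK_ALIASES


def build_generate_kwargs(language, task="transcribe"):
    """Build a generate_kwargs dict with use_adapters set from the language."""
    return {
        "language": language,
        "task": task,
        "use_adapters": should_use_adapters(language),
    }


print(build_generate_kwargs("<|el|>"))
print(build_generate_kwargs("english"))
```

The resulting dictionary has the same keys as the generate_kwargs argument in the pipeline call above, so it could be passed there in place of the literal dictionary.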
