---
license: mit
datasets:
- mozilla-foundation/common_voice_13_0
language:
- ca
- cs
- gl
- hu
- pl
- ta
- th
- uk
tags:
- automatic-speech-recognition
inference: false
pipeline_tag: automatic-speech-recognition
---

## About

Multilingual DistilWhisper improves ASR performance in the target languages by adding lightweight conditional language-specific routing (CLSR) modules on top of whisper-small.
These modules are trained with a combination of a cross-entropy (ASR) loss and a knowledge distillation loss, with whisper-large-v2 as the teacher.
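
As a rough illustration of how a cross-entropy term and a distillation term can be combined for a single decoding step, here is a minimal, dependency-free sketch. It is not the released training code: the function names, the `kd_weight` parameter, and the choice of KL divergence as the distillation measure are assumptions made for this example, and the actual objective may use a different divergence, weighting, or additional terms.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """ASR term: negative log-probability the student assigns to the reference token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])


def kl_divergence(teacher_logits, student_logits):
    """Distillation term (assumed here): KL(teacher || student) over the vocabulary."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_loss(student_logits, teacher_logits, target_index, kd_weight=1.0):
    """Hypothetical combined objective: cross-entropy plus weighted distillation."""
    ce = cross_entropy(student_logits, target_index)
    kd = kl_divergence(teacher_logits, student_logits)
    return ce + kd_weight * kd


# Toy example: a 3-token vocabulary where the reference token has index 0.
student = [2.0, 0.5, -1.0]   # stands in for whisper-small + CLSR outputs
teacher = [3.0, 0.0, -2.0]   # stands in for whisper-large-v2 outputs
loss = combined_loss(student, teacher, target_index=0, kd_weight=1.0)
```

When the student's distribution matches the teacher's, the distillation term vanishes and only the cross-entropy term remains, so `kd_weight` controls how strongly the student is pulled toward the teacher relative to the reference transcript.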

## Inference

The loader will be made available soon at https://github.com/naver

## Citation
```
@inproceedings{ferraz2024distilwhisper,
  title={Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts},
  author={Ferraz, Thomas Palmeira and Boito, Marcely Zanon and Brun, Caroline and Nikoulina, Vassilina},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2024},
  organization={IEEE}
}
```