---
license: apache-2.0
language:
- zh
- aa
- af
metrics:
- accuracy
library_name: diffusers
pipeline_tag: text-to-image
tags:
- medical
- code
- suibian
---

# Whisper

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning.

Whisper was proposed in the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) by Alec Radford et al. from OpenAI. The original code repository can be found [here](https://github.com/openai/whisper).

Whisper `large-v3` has the same architecture as the previous large models, apart from the following minor differences:

1. The input uses 128 Mel frequency bins instead of 80.
2. A new language token for Cantonese.
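To make the first difference concrete, here is a minimal sketch of how the number of Mel bins changes the shape of the filterbank and of the model's input features. It assumes the audio front-end constants of the openai/whisper reference code (16 kHz audio, `n_fft=400`, hop length 160, 30-second chunks). The helper names `mel_filterbank` and `input_feature_shape` are hypothetical, and the filters use the simpler HTK mel formula rather than the Slaney-style filters Whisper derives from librosa, so only the shapes match the real model, not the exact weights.

```python
import math

# Assumed front-end constants, taken from the openai/whisper reference code.
SAMPLE_RATE = 16_000  # Hz
N_FFT = 400           # 25 ms analysis window
HOP_LENGTH = 160      # 10 ms hop between frames
CHUNK_SECONDS = 30    # Whisper processes fixed 30-second windows


def hz_to_mel(f: float) -> float:
    """HTK mel scale (an assumption: Whisper itself uses librosa's Slaney scale)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, sr: int = SAMPLE_RATE, n_fft: int = N_FFT) -> list[list[float]]:
    """Hypothetical helper: n_mels x (n_fft // 2 + 1) matrix of triangular filters."""
    n_bins = n_fft // 2 + 1
    fft_freqs = [k * sr / n_fft for k in range(n_bins)]
    # n_mels + 2 edge points, evenly spaced on the mel axis from 0 Hz to Nyquist.
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in fft_freqs:
            if lo <= f <= center:
                w = (f - lo) / (center - lo)   # rising edge of the triangle
            elif center < f <= hi:
                w = (hi - f) / (hi - center)   # falling edge of the triangle
            else:
                w = 0.0                        # outside this filter's band
            row.append(w)
        bank.append(row)
    return bank


def input_feature_shape(n_mels: int) -> tuple[int, int]:
    """Hypothetical helper: (mel bins, frames) for one 30-second chunk."""
    n_samples = CHUNK_SECONDS * SAMPLE_RATE
    return (n_mels, n_samples // HOP_LENGTH)


for n_mels in (80, 128):  # earlier large models vs. large-v3
    bank = mel_filterbank(n_mels)
    print(f"n_mels={n_mels}: filterbank {len(bank)}x{len(bank[0])}, "
          f"input features {input_feature_shape(n_mels)}")
# prints:
# n_mels=80: filterbank 80x201, input features (80, 3000)
# n_mels=128: filterbank 128x201, input features (128, 3000)
```

Under these assumptions only the first axis grows from 80 to 128, while the number of frames per 30-second chunk stays at 3000. This is why log-Mel features prepared for earlier checkpoints need to be recomputed before they can be fed to `large-v3`.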