---
license: apache-2.0
language:
- zh
- aa
- af
metrics:
- accuracy
library_name: diffusers
pipeline_tag: text-to-image
tags:
- medical
- code
- suibian
---

# Whisper

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning.

Whisper was proposed in the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) by Alec Radford et al. from OpenAI. The original code repository can be found [here](https://github.com/openai/whisper).

Whisper `large-v3` has the same architecture as the previous large models, except for the following minor differences:

1. The input uses 128 Mel frequency bins instead of 80
2. A new language token for Cantonese
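
Both differences can be illustrated with a minimal Python sketch, under stated assumptions. It assumes Whisper's 16 kHz sampling rate, 400-sample FFT window, 160-sample hop and 30-second input chunks (values taken from the openai/whisper reference code, not from this card), builds triangular Mel filters with the HTK Mel formula for simplicity, and composes a decoder prompt from Whisper-style special tokens. The names `mel_filterbank`, `input_shape`, `decoder_prompt` and the `LANGUAGE_CODES` subset are hypothetical helpers for illustration, not part of any library API.

```python
import math

# Front-end constants as used by the openai/whisper reference code (assumed here).
SAMPLE_RATE = 16000  # audio is resampled to 16 kHz
N_FFT = 400          # 25 ms analysis window
HOP_LENGTH = 160     # 10 ms hop between frames
CHUNK_SECONDS = 30   # the encoder sees fixed 30-second segments


def hz_to_mel(f):
    """HTK Mel scale (a simplification; the reference filters may use another scale)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, sr=SAMPLE_RATE, n_fft=N_FFT):
    """Hypothetical helper: n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    fft_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    mel_max = hz_to_mel(sr / 2)
    # n_mels + 2 equally spaced points on the Mel scale give left/center/right edges.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in fft_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))   # rising slope
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def input_shape(n_mels):
    """Hypothetical helper: (Mel bins, frames) of the log-Mel input for one chunk."""
    n_frames = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH
    return (n_mels, n_frames)


# Hypothetical subset of language codes; "yue" (Cantonese) is the token new in large-v3.
LANGUAGE_CODES = {"en": "english", "zh": "chinese", "yue": "cantonese"}


def decoder_prompt(language, task="transcribe"):
    """Hypothetical helper: special-token prefix that conditions the decoder."""
    if language not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {language}")
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task}")
    return ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>", "<|notimestamps|>"]


if __name__ == "__main__":
    for n_mels in (80, 128):  # previous large models vs. large-v3
        bank = mel_filterbank(n_mels)
        print(n_mels, "filters x", len(bank[0]), "FFT bins; input shape", input_shape(n_mels))
    print(decoder_prompt("yue"))
```

Moving from 80 to 128 bins changes only the first dimension of the log-Mel input, (80, 3000) versus (128, 3000) for a 30-second chunk, so a feature extractor configured for 80 bins cannot feed a `large-v3` encoder. With the HTK formula at this FFT resolution, some of the lowest 128-bin filters fall between FFT bins and come out all-zero; the filters shipped with the reference implementation may differ in scale and normalization.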