metadata
language: ca
datasets:
- projecte-aina/3catparla_asr
tags:
- audio
- automatic-speech-recognition
- catalan
- whisper-large-v3
- projecte-aina
- barcelona-supercomputing-center
- bsc
license: apache-2.0
model-index:
- name: whisper-large-v3-ca-3catparla
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: 3CatParla (Test)
type: projecte-aina/3catparla_asr
split: test
args:
language: ca
metrics:
- name: WER
type: wer
value: 0.96
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: 3CatParla (Dev)
type: projecte-aina/3catparla_asr
split: dev
args:
language: ca
metrics:
- name: WER
type: wer
value: 0.92
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Mozilla Common Voice 17.0 (Test)
type: mozilla-foundation/common_voice_17_0
split: test
args:
language: ca
metrics:
- name: WER
type: wer
value: 10.32
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Mozilla Common Voice 17.0 (Dev)
type: mozilla-foundation/common_voice_17_0
split: validation
args:
language: ca
metrics:
- name: WER
type: wer
value: 9.26
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Balearic fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Balearic female
args:
language: ca
metrics:
- name: WER
type: wer
value: 12.25
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Balearic male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Balearic male
args:
language: ca
metrics:
- name: WER
type: wer
value: 12.18
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Central fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Central female
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.51
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Central male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Central male
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.73
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Northern fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Northern female
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.09
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Northern male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Northern male
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.28
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Northwestern fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Northwestern female
args:
language: ca
metrics:
- name: WER
type: wer
value: 7.88
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Northwestern male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Northwestern male
args:
language: ca
metrics:
- name: WER
type: wer
value: 8.44
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Valencian fem)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Valencian female
args:
language: ca
metrics:
- name: WER
type: wer
value: 9.58
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: CV Benchmark Catalan Accents (Valencian male)
type: projecte-aina/commonvoice_benchmark_catalan_accents
split: Valencian male
args:
language: ca
metrics:
- name: WER
type: wer
value: 9.1
whisper-large-v3-ca-3catparla
Paper: 3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition
"whisper-large-v3-ca-3catparla" is an acoustic model suitable for Automatic Speech Recognition in Catalan. It is the result of fine-tuning the model "openai/whisper-large-v3" on 710 hours of Catalan data released by Projecte AINA (Barcelona, Spain).
The specific dataset used to create the model is called "3CatParla".
The fine-tuning process was performed in July 2024 on the servers of the Barcelona Supercomputing Center by Carlos Daniel Hernández Mena.
Evaluation
import torch
from datasets import Audio, load_dataset
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor and model.
MODEL_NAME = "projecte-aina/whisper-large-v3-ca-3catparla"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")

# Load the test split of the 3CatParla dataset.
ds = load_dataset("projecte-aina/3catparla_asr", split="test")

# Resample the audio to 16 kHz, the rate the processor is called with below.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Transcribe one example and normalize both the reference and the prediction.
def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    batch["reference"] = processor.tokenizer._normalize(batch["normalized_text"])
    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]
    transcription = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(transcription)
    return batch

# Run the evaluation over the whole split.
result = ds.map(map_to_pred)

# Compute the overall WER as a percentage.
wer = load("wer")
WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
print(WER)
Test Result: 0.96 (WER, %)
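For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of the standard word error rate definition: the word-level edit distance (substitutions, insertions, deletions) summed over all utterances, divided by the total number of reference words. It is not the implementation used by the `evaluate` library, and the function names `edit_distance` and `word_error_rate` as well as the two Catalan sentences are illustrative assumptions, not data from 3CatParla.

```python
def edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(references, predictions):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for reference, prediction in zip(references, predictions):
        ref_words = reference.split()
        total_edits += edit_distance(ref_words, prediction.split())
        total_words += len(ref_words)
    return total_edits / total_words


# Hypothetical example sentences (not taken from the dataset).
references = ["el gat dorm al sofà", "bon dia a tothom"]
predictions = ["el gat dorm al sofa", "bon dia tothom"]

# One substitution plus one deletion over nine reference words.
print(100 * word_error_rate(references, predictions))
```

Multiplying by 100, as the evaluation script above does, reports the rate as a percentage. Because the edit count is compared against reference words only, text normalization applied to both sides before scoring has a direct effect on the result.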
BibTeX entry and citation info
- When publishing results based on this model, please cite:
@misc{mena2024whisperlarge3catparla,
  title={Acoustic Model in Catalan: whisper-large-v3-ca-3catparla.},
  author={Hernandez Mena, Carlos Daniel},
  organization={Barcelona Supercomputing Center},
  url={https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla},
  year={2024}
}
Acknowledgements
This model has been promoted and financed by the Government of Catalonia through the Aina project.