---
language: ca
datasets:
  - projecte-aina/3catparla_asr
tags:
  - audio
  - automatic-speech-recognition
  - catalan
  - whisper-large-v3
  - projecte-aina
  - barcelona-supercomputing-center
  - bsc
license: apache-2.0
model-index:
  - name: whisper-large-v3-ca-3catparla
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: 3CatParla (Test)
          type: projecte-aina/3catparla_asr
          split: test
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 0.96
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: 3CatParla (Dev)
          type: projecte-aina/3catparla_asr
          split: dev
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 0.92
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Mozilla Common Voice 17.0 (Test)
          type: mozilla-foundation/common_voice_17_0
          split: test
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 10.32
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Mozilla Common Voice 17.0 (Dev)
          type: mozilla-foundation/common_voice_17_0
          split: validation
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 9.26
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Balearic female
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 12.25
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Balearic male
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 12.18
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Central female
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 8.51
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Central male
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 8.73
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Northern female
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 8.09
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Northern male
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 8.28
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Northwestern female
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 7.88
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Northwestern male
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 8.44
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Valencian female
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 9.58
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice Benchmark Catalan Accents
          type: projecte-aina/commonvoice_benchmark_catalan_accents
          split: Valencian male
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 9.10
---
# whisper-large-v3-ca-3catparla
**Paper:** [3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition](https://iberspeech.tech/)

`whisper-large-v3-ca-3catparla` is an acoustic model for automatic speech recognition (ASR) in Catalan. It was obtained by fine-tuning [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3) on 710 hours of Catalan data released by [Projecte AINA](https://projecteaina.cat/) from Barcelona, Spain.

The dataset used to train the model is [3CatParla](https://huggingface.co/datasets/projecte-aina/3catparla_asr).

The fine-tuning was performed in July 2024 on the servers of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Carlos Daniel Hernández Mena](https://huggingface.co/carlosdanielhernandezmena).
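For quick transcription outside the evaluation loop, the checkpoint can presumably be used through the generic `transformers` ASR `pipeline`, as with other Whisper fine-tunes. This is a minimal sketch under stated assumptions, not an official recipe: the `transcribe` helper is hypothetical, 30-second chunking is a choice made here for long broadcast-style recordings, and forcing `language="catalan"` assumes the checkpoint keeps Whisper's language tokens in its generation config.

```python
def transcribe(
    path: str,
    model_name: str = "projecte-aina/whisper-large-v3-ca-3catparla",
) -> str:
    """Hypothetical helper: transcribe a Catalan audio file with the ASR pipeline."""
    # Heavy dependencies are imported on first use so this module loads without them.
    import torch
    from transformers import pipeline

    use_gpu = torch.cuda.is_available()
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_name,
        # Half precision on GPU to save memory; full precision on CPU.
        torch_dtype=torch.float16 if use_gpu else torch.float32,
        device="cuda:0" if use_gpu else "cpu",
    )
    # Split long recordings into 30 s windows, Whisper's native input length.
    # Assumption: the fine-tuned checkpoint still accepts a forced language.
    result = asr(
        path,
        chunk_length_s=30,
        generate_kwargs={"language": "catalan", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("audio.wav")` (a placeholder path) would return the transcription as a string; decoding a file path this way relies on ffmpeg being installed. For many files, building the pipeline once and reusing it avoids reloading the model on every call.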

# Evaluation
```python
import torch
from datasets import Audio, load_dataset
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor and model.
MODEL_NAME = "projecte-aina/whisper-large-v3-ca-3catparla"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")

# Load the test split of the 3CatParla dataset.
ds = load_dataset("projecte-aina/3catparla_asr", split="test")

# Resample the audio to 16 kHz, the rate Whisper expects.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Transcribe one example and normalize both the reference and the prediction.
def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(
        audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt"
    ).input_features
    batch["reference"] = processor.tokenizer._normalize(batch["normalized_text"])

    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]

    # Drop special tokens such as <|startoftranscript|> before normalizing.
    transcription = processor.decode(predicted_ids, skip_special_tokens=True)
    batch["prediction"] = processor.tokenizer._normalize(transcription)
    return batch

# Run inference over the whole split.
result = ds.map(map_to_pred)

# Compute the overall WER as a percentage.
wer = load("wer")
WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
print(WER)
```
**Test Result** (WER %, 3CatParla test split): 0.96
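The reported figure is a word error rate expressed as a percentage, since the script multiplies the output of `wer.compute` by 100: `WER = 100 × (S + D + I) / N`, where S, D and I count word substitutions, deletions and insertions and N is the number of reference words. The plain-Python sketch below illustrates that computation. The helper names are hypothetical, it assumes both texts are already normalized and split on whitespace, and it pools errors over all utterances before dividing, which is our understanding of how the `evaluate` `wer` metric aggregates a corpus.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], predictions: list[str]) -> float:
    """Corpus-level WER in percent: total word errors divided by total reference words."""
    errors = 0
    ref_words = 0
    for ref, hyp in zip(references, predictions):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors += word_edit_distance(ref_tokens, hyp_tokens)
        ref_words += len(ref_tokens)
    return 100.0 * errors / ref_words


refs = ["el gat dorm al sofà", "bon dia a tothom"]
preds = ["el gat dorm sofà", "bon dia a tothom"]
print(f"{corpus_wer(refs, preds):.2f}")  # one deletion over 9 reference words -> 11.11
```

Because insertions are counted against the reference length, a WER computed this way can exceed 100 when a hypothesis contains many extra words.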

# BibTeX entry and citation info
* When publishing results based on this model, please cite:
```bibtex
@misc{mena2024whisperlarge3catparla,
      title={Acoustic Model in Catalan: whisper-large-v3-ca-3catparla.}, 
      author={Hernandez Mena, Carlos Daniel},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla},
      year={2024}
}
```
# Acknowledgements

This model has been promoted and financed by the Government of Catalonia through the Aina project.