---
language: es
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
---
# Wav2Vec2-Large-XLSR-53-Spanish-With-LM
This is a copy of [Wav2Vec2-Large-XLSR-53-Spanish](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-spanish)
with language model support added.
This model card can be seen as a demo of the [pyctcdecode](https://github.com/kensho-technologies/pyctcdecode) integration
with Transformers introduced in [this PR](https://github.com/huggingface/transformers/pull/14339). The PR explains in detail how the
integration works.
In a nutshell: the PR adds a new `Wav2Vec2ProcessorWithLM` class as a drop-in replacement for `Wav2Vec2Processor`.
The only change to the existing ASR pipeline is:
```diff
-from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
from datasets import load_dataset, Audio
import torch
ds = load_dataset("common_voice", "es", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # resample to the model's 16 kHz
sample = next(iter(ds))
model = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
-processor = Wav2Vec2Processor.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
+processor = Wav2Vec2ProcessorWithLM.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
input_values = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt").input_values
with torch.no_grad():
    logits = model(input_values).logits
-prediction_ids = torch.argmax(logits, dim=-1)
-transcription = processor.batch_decode(prediction_ids)
+transcription = processor.batch_decode(logits.numpy()).text
print(transcription)
```
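The two removed lines implement plain greedy CTC decoding: take the argmax token per frame, collapse repeats, and drop blanks. A self-contained numpy sketch of that baseline (toy vocabulary and one-hot logits are hypothetical, for illustration only):

```python
import numpy as np

def greedy_ctc_decode(logits: np.ndarray, vocab: list[str], blank: int = 0) -> str:
    """Pick the best token per frame, collapse repeats, and drop CTC blanks."""
    ids = logits.argmax(axis=-1)  # per-frame argmax, as torch.argmax does above
    chars, prev = [], None
    for i in ids:
        if i != prev and i != blank:
            chars.append(vocab[i])
        prev = i
    return "".join(chars)

# Toy 6-frame example over a 5-token vocabulary (hypothetical)
vocab = ["<blank>", "h", "o", "l", "a"]
logits = np.eye(5)[[1, 2, 2, 0, 3, 4]]  # one-hot frames: h, o, o, blank, l, a
print(greedy_ctc_decode(logits, vocab))  # -> "hola"
```

`Wav2Vec2ProcessorWithLM` replaces this greedy pass with a pyctcdecode beam search that rescores hypotheses with an n-gram language model, which is where the WER gains come from.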
Word error rate (WER) and character error rate (CER) on the Common Voice Spanish test set:

| Model | WER | CER |
| ------------- | ------------- | ------------- |
| jonatasgrosman/wav2vec2-large-xlsr-53-spanish | **8.81%** | **2.70%** |
| pcuenq/wav2vec2-large-xlsr-53-es | 10.55% | 3.20% |
| facebook/wav2vec2-large-xlsr-53-spanish | 16.99% | 5.40% |
| mrm8488/wav2vec2-large-xlsr-53-spanish | 19.20% | 5.96% |