---
language: en
datasets:
- librispeech_asr
tags:
- audio
- automatic-speech-recognition
license: apache-2.0
---

# Wav2Vec2-Base-100h

This is a fork of [```facebook/wav2vec2-base-100h```](https://huggingface.co/facebook/wav2vec2-base-100h).

### Changes & Notes

1. Documents a reproducible evaluation (below), updated for newer ```transformers``` and ```datasets``` versions.
2. Use a batch size of 1 to reproduce the reported results.
3. Validated with ```transformers v4.15.0``` and ```datasets 1.18.0```.
4. You may need to install the Python packages ```librosa``` and ```jiwer``` manually.

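
The version notes above can be checked before running the evaluation. The following is a minimal sketch, not part of the original model card: the helper names `check_environment` and `installed_version` are hypothetical, and since the notes give no versions for `librosa` and `jiwer`, those two are only checked for presence.

```python
from importlib import metadata

# Versions the notes above report as validated.
VALIDATED = {"transformers": "4.15.0", "datasets": "1.18.0"}
# Extra packages the notes mention without a version (presence check only).
EXTRAS = ("librosa", "jiwer")


def installed_version(name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(validated=VALIDATED, extras=EXTRAS):
    """Map each package name to (installed_version_or_None, validated_version_or_None)."""
    report = {name: (installed_version(name), expected)
              for name, expected in validated.items()}
    for name in extras:
        report[name] = (installed_version(name), None)
    return report


if __name__ == "__main__":
    for name, (found, expected) in check_environment().items():
        if found is None:
            print(f"{name}: not installed")
        elif expected is not None and found != expected:
            print(f"{name}: {found} installed, validated with {expected}")
        else:
            print(f"{name}: {found}")
```

A version mismatch does not necessarily mean the evaluation will fail; it only signals that the environment differs from the one the results were validated with.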
## Evaluation

The snippet below evaluates **facebook/wav2vec2-base-100h** on the LibriSpeech "clean" and "other" test sets.

```python
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
from jiwer import wer

# Evaluate on the "clean" test split; swap in the commented line for "other".
librispeech_eval = load_dataset("librispeech_asr", "clean", split="test")
# librispeech_eval = load_dataset("librispeech_asr", "other", split="test")

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-100h").to("cuda")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-100h")

def map_to_array(batch):
    # The decoded waveform is taken from the "audio" column.
    # Alternative that reads the file directly (needs `import soundfile as sf`):
    # speech, _ = sf.read(batch["file"])
    # batch["speech"] = speech
    batch["speech"] = batch["audio"]["array"]
    return batch

librispeech_eval = librispeech_eval.map(map_to_array)

def map_to_pred(batch):
    input_values = processor(batch["speech"], return_tensors="pt", padding="longest").input_values
    with torch.no_grad():
        logits = model(input_values.to("cuda")).logits

    # Greedy decoding: take the most likely token per frame, then decode to text.
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)
    batch["transcription"] = transcription
    return batch

# batch_size=1 (no padding) is the setting used to reproduce the reported WER.
result = librispeech_eval.map(map_to_pred, batched=True, batch_size=1, remove_columns=["speech"])

# jiwer returns WER as a fraction; multiply by 100 to compare with the reported percentages.
print("WER:", wer(result["text"], result["transcription"]))
```

*Result (WER, %)*:

| "clean/test" | "other/test" |
|--------------|--------------|
| 6.1          | 13.5         |
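
The figures in the table are percentages, whereas `jiwer.wer` returns a fraction. To make the metric concrete, here is a minimal sketch of the standard word-level definition, (substitutions + deletions + insertions) / reference words, pooled over all utterances. The helpers `word_error_rate` and `edit_distance` are hypothetical illustrations, not `jiwer`'s implementation, and the two sentences are made-up examples rather than LibriSpeech data.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance computed one row at a time."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER as a fraction: total edits / total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    return total_edits / total_words


# Made-up example: 2 substitutions over 8 reference words.
refs = ["THE CAT SAT ON THE MAT", "HELLO WORLD"]
hyps = ["THE CAT SAT ON A MAT", "HELLO WORD"]
score = word_error_rate(refs, hyps)
print(f"WER: {100 * score:.1f}%")  # WER: 25.0%
```

The sketch applies no text normalization; it splits on whitespace and compares words exactly.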