# Amharic ASR using fine-tuned Wav2Vec2 XLSR-53
This is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53), trained on the Amharic Speech Corpus produced by Abate et al. (2005) ([doi:10.21437/Interspeech.2005-467](https://doi.org/10.21437/Interspeech.2005-467)).
The model achieves a word error rate (WER) of 26% and a character error rate (CER) of 7% on the validation set of the Amharic read-speech data.
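For reference, both metrics can be computed from predicted and reference transcripts with a library such as `jiwer`. This is only an illustrative sketch; the original evaluation code may differ.

```python
# Minimal sketch of WER/CER computation, assuming the jiwer library
# (pip install jiwer). The transcripts below are placeholder examples.
import jiwer

references = ["ሰላም ለዓለም"]  # ground-truth transcripts
hypotheses = ["ሰላም ለአለም"]  # model predictions

print("WER:", jiwer.wer(references, hypotheses))
print("CER:", jiwer.cer(references, hypotheses))
```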
## Usage
The model can be used as follows:
```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")
processor = Wav2Vec2Processor.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")

# Load the audio resampled to the 16 kHz rate the model expects.
audio, _ = librosa.load("/path/to/audio.wav", sr=16000)
input_values = processor(
    audio.squeeze(),
    sampling_rate=16000,
    return_tensors="pt",
).input_values

model.eval()
with torch.no_grad():
    logits = model(input_values).logits

# Greedy CTC decoding: take the most likely token at each frame.
preds = logits.argmax(-1)
texts = processor.batch_decode(preds)
print(texts[0])
```
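Alternatively, the checkpoint should also work with the high-level `automatic-speech-recognition` pipeline from `transformers`, which bundles the loading, preprocessing, and decoding steps. This is a sketch, not part of the original card:

```python
# Hedged sketch: same model via the transformers pipeline API.
# Reading audio from a file path requires ffmpeg to be installed.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="agkphysics/wav2vec2-large-xlsr-53-amharic",
)
print(asr("/path/to/audio.wav")["text"])
```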
## Training
The code to train this model is available at https://github.com/agkphysics/amharic-asr.