Wav2Vec2-Large-XLSR-53-Swahili

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Swahili using the following datasets:

When using this model, make sure that your speech input is sampled at 16 kHz.
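One way to check the sampling-rate requirement before running inference is to read the rate from the file header. The following is a minimal sketch using only Python's standard `wave` module, so it assumes uncompressed PCM WAV input; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of this model or of any library.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # rate this model card asks for, in Hz


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, target_rate: int = TARGET_SAMPLE_RATE) -> bool:
    """Return True when the file's sampling rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```

Calling `needs_resampling("./demo.wav")` on a 48 kHz recording would return `True`, indicating that the audio should be resampled to 16 kHz before it is passed to the processor.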

Usage

The model can be used directly (without a language model) as follows:

import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Use the GPU when available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Wav2Vec2Processor.from_pretrained("alokmatta/wav2vec2-large-xlsr-53-sw")
model = Wav2Vec2ForCTC.from_pretrained("alokmatta/wav2vec2-large-xlsr-53-sw").to(device)

TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def load_file_to_data(file):
    # Load the audio and resample from the file's actual rate to 16 kHz.
    speech, sample_rate = torchaudio.load(file)
    speech = speech.squeeze(0)  # drop the channel dimension (assumes mono audio)
    if sample_rate != TARGET_SAMPLE_RATE:
        speech = torchaudio.functional.resample(speech, sample_rate, TARGET_SAMPLE_RATE)
    return {"speech": speech.numpy(), "sampling_rate": TARGET_SAMPLE_RATE}


def predict(data):
    features = processor(data["speech"], sampling_rate=data["sampling_rate"], padding=True, return_tensors="pt")
    input_values = features.input_values.to(device)
    attention_mask = features.attention_mask.to(device)
    with torch.no_grad():
        logits = model(input_values, attention_mask=attention_mask).logits
    # Greedy CTC decoding: take the most likely token at each frame.
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)


print(predict(load_file_to_data("./demo.wav")))

Test Result: 40 %
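The card does not state which metric the test result refers to. If it is word error rate (WER), which is common for speech recognition models but is an assumption here, the metric could be computed roughly as in the sketch below. The function `word_error_rate` is a hypothetical helper written for illustration, not the evaluation script used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: the reported test result is WER; the card does not confirm this.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)
```

For example, `word_error_rate("habari ya asubuhi", "habari za asubuhi")` counts one substituted word out of three reference words. Averaging over a test set is usually done by summing edit distances and reference lengths across all utterances rather than averaging per-utterance rates.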

Training

The script used for training can be found here
