Usage

The model can be used directly (without a language model) as follows:

language:
- ne
tags:
- speech-to-text

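import argparse

import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "shniranjan/wav2vec2-large-xlsr-300m-nepali"


def parse_transcription(wav_file):
    # load pretrained processor and model
    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()
    # load audio; the processor raises an error if sample_rate differs from
    # the rate it expects (16 kHz for wav2vec2 XLSR checkpoints)
    audio_input, sample_rate = sf.read(wav_file)
    # pad input values and return pt tensor
    input_values = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt").input_values
    # INFERENCE
    # retrieve logits (no gradients needed) & take argmax
    with torch.no_grad():
        logits = model(input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    # transcribe
    transcription = processor.decode(predicted_ids[0], skip_special_tokens=True)
    print(transcription)
    return transcription


if __name__ == "__main__":
    # read the path of the WAV file to transcribe from the command line
    parser = argparse.ArgumentParser(description="Transcribe a Nepali WAV file")
    parser.add_argument("wav_file", help="path to a mono WAV file")
    args = parser.parse_args()
    parse_transcription(args.wav_file)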
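The "take argmax" and "transcribe" steps above amount to greedy CTC decoding: pick the most likely token for each audio frame, merge consecutive repeats, then drop the blank token. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The toy vocabulary, `BLANK_ID`, `greedy_ctc_decode`, and `one_hot` are hypothetical names introduced for illustration; it assumes, as is common for wav2vec2 CTC tokenizers, that the padding token doubles as the CTC blank and that `|` marks word boundaries.

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration only; the real model's
# vocabulary holds Nepali characters and is loaded by the processor.
BLANK_ID = 0
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "a", 3: "b", 4: "c"}


def greedy_ctc_decode(frame_scores, vocab=TOY_VOCAB, blank_id=BLANK_ID):
    """Turn per-frame scores into text with greedy CTC decoding."""
    # 1. argmax over the vocabulary axis for every frame
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. merge runs of the same id (one spoken character spans many frames)
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. drop blanks and map the remaining ids to characters
    chars = [vocab[token_id] for token_id in collapsed if token_id != blank_id]
    # 4. "|" is assumed to be the word delimiter
    return "".join(chars).replace("|", " ").strip()


def one_hot(index, size=len(TOY_VOCAB)):
    """Build a fake score vector whose maximum sits at `index`."""
    return [1.0 if j == index else 0.0 for j in range(size)]


# Frames predicting: a a <blank> a b | | c <blank>
frames = [one_hot(i) for i in [2, 2, 0, 2, 3, 1, 1, 4, 0]]
print(greedy_ctc_decode(frames))  # prints "aab c"
```

The blank between the second and third frame groups is what lets the doubled "a" survive: without it, the repeat-merging step would collapse all three "a" frames into one.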
Model size: 316M params (Safetensors)
Tensor type: F32