Edit model card
YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Usage

The model can be used directly (without a language model) as follows:

language: - ne tags: - speech-to-text

import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import argparse
def parse_transcription(wav_file):
    # load pretrained model
    processor = Wav2Vec2Processor.from_pretrained("shniranjan/wav2vec2-large-xlsr-1b-nepali")
    model = Wav2Vec2ForCTC.from_pretrained("shniranjan/wav2vec2-large-xlsr-1b-nepali")
    # load audio
    audio_input, sample_rate = sf.read(wav_file)
    # pad input values and return pt tensor
    input_values = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt").input_values
    # INFERENCE
    # retrieve logits & take argmax
    logits = model(input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    # transcribe
    transcription = processor.decode(predicted_ids[0], skip_special_tokens=True)
    print(transcription)
Downloads last month
46
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.