Edit model card

wav2vec2-large-slavic-parlaspeech-hr-lm

This model for Croatian ASR is based on the facebook/wav2vec2-large-slavic-voxpopuli-v2 model and was fine-tuned with 300 hours of recordings and transcripts from the ASR Croatian parliament dataset ParlaSpeech-HR v1.0 and enhanced with a 5-gram language model based on the ParlaMint dataset.

If you use this model, please cite the following paper:

Nikola Ljubešić, Danijel Koržinek, Peter Rupnik, Ivo-Pavao Jazbec. ParlaSpeech-HR -- a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus. http://www.lrec-conf.org/proceedings/lrec2022/workshops/ParlaCLARINIII/pdf/2022.parlaclariniii-1.16.pdf

Metrics

Evaluation is performed on the dev and test portions of the ParlaSpeech-HR v1.0 dataset.

split CER WER
dev 0.0253 0.0556
test 0.0188 0.0430

Usage in transformers

Tested with transformers==4.18.0, torch==1.11.0, and SoundFile==0.10.3.post1.

from transformers import Wav2Vec2ProcessorWithLM, Wav2Vec2ForCTC
import soundfile as sf
import torch
import os
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# load model and tokenizer
processor = Wav2Vec2ProcessorWithLM.from_pretrained(
    "classla/wav2vec2-large-slavic-parlaspeech-hr-lm")
model = Wav2Vec2ForCTC.from_pretrained("classla/wav2vec2-large-slavic-parlaspeech-hr-lm")
# download the example wav files:
os.system("wget https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr-lm/raw/main/00020570a.flac.wav")
# read the wav file 
speech, sample_rate = sf.read("00020570a.flac.wav")
input_values = processor(speech, sampling_rate=sample_rate, return_tensors="pt").input_values.cuda()
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
transcription = processor.batch_decode(logits.numpy()).text[0]

# remove the raw wav file
os.system("rm 00020570a.flac.wav")

transcription # 'velik broj poslovnih subjekata poslao je sa minusom velik dio'

Training hyperparameters

In fine-tuning, the following arguments were used:

arg value
per_device_train_batch_size 16
gradient_accumulation_steps 4
num_train_epochs 8
learning_rate 3e-4
warmup_steps 500
Downloads last month
32
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.