---
language: hr
datasets:
- parlaspeech-hr
tags:
- audio
- automatic-speech-recognition
- parlaspeech
widget:
- example_title: example 1
  src: https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr/raw/main/1800.m4a
- example_title: example 2
  src: https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr/raw/main/00020578b.flac.wav
- example_title: example 3
  src: https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr/raw/main/00020570a.flac.wav
---

# wav2vec2-large-slavic-parlaspeech-hr-lm

This model for Croatian ASR is based on [facebook/wav2vec2-large-slavic-voxpopuli-v2](https://huggingface.co/facebook/wav2vec2-large-slavic-voxpopuli-v2). It was fine-tuned on 300 hours of recordings and transcripts from the Croatian parliamentary speech dataset [ParlaSpeech-HR v1.0](http://hdl.handle.net/11356/1494) and enhanced with a language model.

The work on this model was coordinated by Nikola Ljubešić. The rough manual data alignment was performed by Ivo-Pavao Jazbec, the method for fine automatic data alignment from [Plüss et al.](https://arxiv.org/abs/2010.02810) was applied by Vuk Batanović and Lenka Bajčetić, the transcripts were normalised by Danijel Korzinek, and the final modelling was performed by Peter Rupnik.

If you use this model, please cite the following paper:

Nikola Ljubešić, Danijel Koržinek, Peter Rupnik, Ivo-Pavao Jazbec. ParlaSpeech-HR -- a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus. Submitted to ParlaCLARIN@LREC.

## Metrics

Word and character error rates on the ParlaSpeech-HR dev and test splits:

|split|CER|WER|
|---|---|---|
|dev|0.0253|0.0556|
|test|0.0188|0.0430|
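
Scores like these can be computed with any standard ASR metric tool; below is a minimal sketch using the `jiwer` package, where the reference and hypothesis strings are illustrative placeholders rather than actual dataset content.

```python
# minimal sketch: computing WER and CER with jiwer (pip install jiwer);
# the strings below are placeholders, not actual ParlaSpeech-HR transcripts
import jiwer

reference = "veliki broj poslovnih subjekata posluje sa minusom"
hypothesis = "veliki broj poslovnih subjekata posluje s minusom"

print(jiwer.wer(reference, hypothesis))  # word error rate
print(jiwer.cer(reference, hypothesis))  # character error rate
```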

## Usage in `transformers`

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import soundfile as sf
import torch
import os

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# load the processor and model
processor = Wav2Vec2Processor.from_pretrained(
    "classla/wav2vec2-large-slavic-parlaspeech-hr")
model = Wav2Vec2ForCTC.from_pretrained(
    "classla/wav2vec2-large-slavic-parlaspeech-hr").to(device)

# download the example wav file
os.system("wget https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr/raw/main/00020570a.flac.wav")

# read the wav file and preprocess it
speech, sample_rate = sf.read("00020570a.flac.wav")
input_values = processor(speech, sampling_rate=sample_rate, return_tensors="pt").input_values.to(device)

# remove the downloaded wav file
os.system("rm 00020570a.flac.wav")

# retrieve logits without tracking gradients
with torch.no_grad():
    logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0]).lower()
# transcription: 'veliki broj poslovnih subjekata posluje sa minusom velik dio'
```
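
Note that the snippet above decodes greedily (plain argmax) and does not involve the language model. Assuming this repository ships the `pyctcdecode` files that `Wav2Vec2ProcessorWithLM` expects (an assumption about the repo layout, not verified here), LM-boosted decoding could look like the following sketch:

```python
# sketch of LM-boosted decoding via beam search with an n-gram LM;
# assumes the repo contains the files Wav2Vec2ProcessorWithLM needs
from transformers import Wav2Vec2ProcessorWithLM, Wav2Vec2ForCTC
import soundfile as sf
import torch

repo = "classla/wav2vec2-large-slavic-parlaspeech-hr-lm"  # assumed repo id
processor = Wav2Vec2ProcessorWithLM.from_pretrained(repo)
model = Wav2Vec2ForCTC.from_pretrained(repo)

speech, sample_rate = sf.read("00020570a.flac.wav")
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# batch_decode runs a beam search over the logits with the language model
transcription = processor.batch_decode(logits.numpy()).text[0].lower()
```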

## Training hyperparameters

In fine-tuning, the following arguments were used:

| arg | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 16    |
| `gradient_accumulation_steps` | 4     |
| `num_train_epochs`            | 8     |
| `learning_rate`               | 3e-4  |
| `warmup_steps`                | 500   |
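
These arguments map directly onto `transformers.TrainingArguments`; the sketch below shows that mapping, with everything other than the tabulated values (in particular `output_dir`) being an illustrative assumption rather than a setting reported here.

```python
# illustrative sketch: the tabulated hyperparameters expressed as
# transformers.TrainingArguments; output_dir is an assumed placeholder
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-parlaspeech-hr-finetune",  # assumed, not from the card
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,  # effective batch size: 16 * 4 = 64 per device
    num_train_epochs=8,
    learning_rate=3e-4,
    warmup_steps=500,
)
```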