5roop committed · Commit c41d4cd · 1 Parent(s): 246b570

Create README.md

Files changed (1)
  1. README.md +73 -0
README.md ADDED
@@ -0,0 +1,73 @@
+ ---
+ language: hr
+ datasets:
+ - parlaspeech-hr
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - parlaspeech
+ widget:
+ - example_title: example 1
+   src: https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr/raw/main/1800.m4a
+ - example_title: example 2
+   src: https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr/raw/main/00020578b.flac.wav
+
+ ---
+
+ # wav2vec2-xls-r-parlaspeech-hr
+
+ This Croatian ASR model is based on the [facebook/wav2vec2-large-slavic-voxpopuli-v2 model](https://huggingface.co/facebook/wav2vec2-large-slavic-voxpopuli-v2) and was fine-tuned on 300 hours of recordings and transcripts from the Croatian parliamentary ASR dataset [ParlaSpeech-HR v1.0](http://hdl.handle.net/11356/1494).
+
+ The efforts resulting in this model were coordinated by Nikola Ljubešić. The rough manual data alignment was performed by Ivo-Pavao Jazbec, the method for fine automatic data alignment from [Plüss et al.](https://arxiv.org/abs/2010.02810) was applied by Vuk Batanović and Lenka Bajčetić, the transcripts were normalised by Danijel Koržinek, and the final modelling was performed by Peter Rupnik.
+
+ If you use this model, please cite the following paper:
+
+ Nikola Ljubešić, Danijel Koržinek, Peter Rupnik, Ivo-Pavao Jazbec. ParlaSpeech-HR -- a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus. Submitted to ParlaCLARIN@LREC.
+
+ ## Metrics
+
+ |split|CER|WER|
+ |---|---|---|
+ |dev|0.0311|0.0921|
+ |test|0.0222|0.0679|
+
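For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how character error rate (CER) and word error rate (WER) are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. It is not the evaluation script used for the reported numbers. The helper names and the toy sentence pair are illustrative assumptions, and conventions such as whether spaces count as characters vary between toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length.
    Assumption: spaces are counted as characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical reference/hypothesis pair, made up for illustration only.
reference = "veliki broj poslovnih subjekata"
hypothesis = "veliki broj poslovnih subjekta"

print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
```

One wrong word out of four gives a WER of 0.25, while the same error costs only a single character at the character level, which is why CER is typically much lower than WER, as in the table above.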
+ ## Usage in `transformers`
+
+ ```python
+ from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+ import soundfile as sf
+ import torch
+ import os
+
+ device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+
+ # load the processor and the model, and move the model to the device
+ processor = Wav2Vec2Processor.from_pretrained(
+     "classla/wav2vec2-large-slavic-parlaspeech-hr")
+ model = Wav2Vec2ForCTC.from_pretrained(
+     "classla/wav2vec2-large-slavic-parlaspeech-hr").to(device)
+
+ # download the example wav file
+ os.system("wget https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr/raw/main/00020570a.flac.wav")
+
+ # read the wav file
+ speech, sample_rate = sf.read("00020570a.flac.wav")
+ input_values = processor(speech, sampling_rate=sample_rate, return_tensors="pt").input_values.to(device)
+
+ # remove the downloaded wav file
+ os.remove("00020570a.flac.wav")
+
+ # retrieve logits (gradients are not needed for inference)
+ with torch.no_grad():
+     logits = model(input_values).logits
+
+ # take argmax and decode
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.decode(predicted_ids[0]).lower()
+ # transcription: 'veliki broj poslovnih subjekata posluje sa minusom velik dio'
+ ```
+
+
+
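The "take argmax and decode" step in the usage snippet is greedy CTC decoding: pick the most likely token per audio frame, collapse consecutive repeats, drop the blank token, and turn the word delimiter into spaces. The sketch below illustrates that idea in plain Python. The function name, the toy vocabulary, and the frame IDs are hypothetical; the real vocabulary and IDs come from the model's tokenizer, and this is not the library's actual implementation.

```python
def ctc_greedy_decode(ids, vocab, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join tokens into text."""
    tokens = []
    prev = None
    for i in ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if i != prev and i != blank_id:
            tokens.append(vocab[i])
        prev = i
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary: ID 0 plays the role of the CTC blank (padding) token
# and "|" marks word boundaries.
vocab = {0: "<pad>", 1: "|", 2: "v", 3: "e", 4: "l", 5: "i", 6: "k", 7: "d", 8: "o"}

# Hypothetical per-frame argmax IDs, one per audio frame.
frame_ids = [2, 2, 3, 0, 4, 4, 5, 6, 6, 0, 1, 7, 7, 5, 0, 8, 8]

print(ctc_greedy_decode(frame_ids, vocab))
```

Note that a blank between two identical IDs keeps both tokens, which is how CTC represents genuinely doubled letters.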
+ ## Training hyperparameters
+
+ In fine-tuning, the following arguments were used:
+
+ | arg                           | value |
+ |-------------------------------|-------|
+ | `per_device_train_batch_size` | 16    |
+ | `gradient_accumulation_steps` | 4     |
+ | `num_train_epochs`            | 8     |
+ | `learning_rate`               | 3e-4  |
+ | `warmup_steps`                | 500   |
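
As a small worked example of how the batch-related arguments interact: with gradient accumulation, the number of samples contributing to each optimizer update is the per-device batch size times the accumulation steps times the number of devices. The sketch below computes that figure from the table's values. The device count is an assumption, since the README does not state how many GPUs were used.

```python
# Values taken from the hyperparameter table.
hyperparameters = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 8,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}

# Assumption: a single training device; the source does not specify this.
num_devices = 1

# Samples that contribute to one optimizer update.
effective_batch_size = (
    hyperparameters["per_device_train_batch_size"]
    * hyperparameters["gradient_accumulation_steps"]
    * num_devices
)

print(f"Effective batch size per optimizer step: {effective_batch_size}")
```

Under the single-device assumption this gives 64 samples per update; with more devices the figure scales linearly.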