nljubesi committed on
Commit
5f334ec
1 Parent(s): 550c088

Update README.md

  1. README.md +6 -4
README.md CHANGED
@@ -17,16 +17,18 @@ widget:
 
  # wav2vec2-large-slavic-parlaspeech-hr-lm
 
- This model for Croatian ASR is based on the [facebook/wav2vec2-large-slavic-voxpopuli-v2 model](facebook/wav2vec2-large-slavic-voxpopuli-v2) and was fine-tuned with 300 hours of recordings and transcripts from the ASR Croatian parliament dataset [ParlaSpeech-HR v1.0](http://hdl.handle.net/11356/1494) and enhanced with a language model.
-
- The efforts resulting in this model were coordinated by Nikola Ljubešić, the rough manual data alignment was performed by Ivo-Pavao Jazbec, the method for fine automatic data alignment from [Plüss et al.](https://arxiv.org/abs/2010.02810) was applied by Vuk Batanović and Lenka Bajčetić, the transcripts were normalised by Danijel Korzinek, while the final modelling was performed by Peter Rupnik.
+ This model for Croatian ASR is based on the [facebook/wav2vec2-large-slavic-voxpopuli-v2 model](https://huggingface.co/facebook/wav2vec2-large-slavic-voxpopuli-v2) and was fine-tuned with 300 hours of recordings and transcripts from the ASR Croatian parliament dataset [ParlaSpeech-HR v1.0](http://hdl.handle.net/11356/1494) and enhanced with a 5-gram language model based on the [ParlaMint dataset](http://hdl.handle.net/11356/1432).
 
  If you use this model, please cite the following paper:
 
- Nikola Ljubešić, Danijel Koržinek, Peter Rupnik, Ivo-Pavao Jazbec. ParlaSpeech-HR -- a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus. Submitted to ParlaCLARIN@LREC.
+ Nikola Ljubešić, Danijel Koržinek, Peter Rupnik, Ivo-Pavao Jazbec. ParlaSpeech-HR -- a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus. Accepted at ParlaCLARIN@LREC.
+
+ There are similarly performing models available, one [that does not use a language model](https://huggingface.co/classla/wav2vec2-slavic-parlaspeech-hr) and [another that is based on the XLS-R model](https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr).
 
  ## Metrics
 
+ Evaluation is performed on the dev and test portions of the [ParlaSpeech-HR v1.0](http://hdl.handle.net/11356/1494) dataset.
+
  |split|CER|WER|
  |---|---|---|
  |dev|0.0253|0.0556|