pklumpp committed on
Commit a099aaa
1 Parent(s): ef2dd4d

Updated Readme

Files changed (1)
  1. README.md +45 -1
README.md CHANGED
@@ -13,4 +13,48 @@ tags:
  - International Phonetic Alphabet
  - CTC
  - multilingual
- ---
+ ---
+ # Model Card for Wav2Vec2 Large with Common Phone
+
+ This is a multilingual phone recognition model fine-tuned on the [Common Phone](https://zenodo.org/records/5846137) dataset.
+ It was created as part of the PhD thesis of [Philipp Klumpp](https://scholar.google.com/citations?user=IWvgno4AAAAJ) to analyze pathological speech signals.
+
+ ## Model Details
+
+ A Wav2Vec2 model with a linear projection onto 102 output classes: the CTC blank token plus 101 phone symbols from the International Phonetic Alphabet (IPA).
+ The model takes 16 kHz audio as input and predicts the most probable sequence of uttered IPA phones.
+
+ ### Model Description
+
+ This model was created to analyze pathological speech signals. It was fine-tuned on Common Phone, a multilingual corpus for robust acoustic modelling. The corpus comprises more than 11,000 speakers, who were carefully selected from Mozilla's Common Voice dataset.
+ Test-set phone error rate (PER) per language, in percent:
+
+ | Language | Test PER |
+ |:---:|:---:|
+ | English | 11.0 |
+ | French | 9.9 |
+ | German | 9.8 |
+ | Italian | 9.1 |
+ | Russian | 6.6 |
+ | Spanish | 8.8 |
+ | **Average** | **9.2** |
+
+ - **Developed by:** [Philipp Klumpp](https://scholar.google.com/citations?user=IWvgno4AAAAJ)
+ - **Model type:** [Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)
+ - **Languages:** Multilingual (English, French, German, Italian, Russian, Spanish)
+ - **License:** [Creative Commons Zero 1.0 (CC0)](https://creativecommons.org/publicdomain/zero/1.0/deed.en)
+ - **Finetuned from model:** [Wav2Vec2 XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
+ - **Finetuning dataset:** [Common Phone](https://zenodo.org/records/5846137) as published in [**Common Phone: A Multilingual Dataset for Robust Acoustic Modelling**](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.81.pdf)
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [GitHub](https://github.com/PKlumpp/phd_model)
+ - **Paper:** The final print of the thesis will be linked here.
+
+ ## Contact
+
+ [Philipp Klumpp](mailto:philipp-klumpp@live.de)
+
+
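
The README added in this commit describes a CTC output layer (blank token plus 101 IPA phones) and reports results as phone error rate. The following is a minimal sketch of how such output is commonly post-processed and scored. It is not taken from the repository: `BLANK_ID`, `ctc_greedy_decode`, and `phone_error_rate` are hypothetical names, the blank index of 0 is an assumption, and PER is assumed here to be the Levenshtein distance divided by the reference length, which may differ in detail from the scoring used for the table.

```python
from typing import List, Sequence

# Assumption: the CTC blank token has index 0. The model's real vocabulary may differ.
BLANK_ID = 0


def ctc_greedy_decode(frame_ids: Sequence[int], blank_id: int = BLANK_ID) -> List[int]:
    """Collapse per-frame argmax labels: merge consecutive repeats, then drop blanks."""
    decoded: List[int] = []
    previous = None
    for label in frame_ids:
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded


def phone_error_rate(reference: Sequence[int], hypothesis: Sequence[int]) -> float:
    """Levenshtein distance between two phone sequences, in percent of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Single-row dynamic programming over the edit-distance table.
    row = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        diagonal, row[0] = row[0], i
        for j, hyp_phone in enumerate(hypothesis, start=1):
            substitution = diagonal + (ref_phone != hyp_phone)
            diagonal = row[j]
            row[j] = min(substitution, row[j] + 1, row[j - 1] + 1)
    return 100.0 * row[-1] / len(reference)


# Hypothetical per-frame label indices; a blank between two 5s keeps them separate.
frames = [0, 5, 5, 0, 5, 7, 7, 0]
hypothesis = ctc_greedy_decode(frames)
reference = [5, 5, 7, 9]
per = phone_error_rate(reference, hypothesis)
```

With these made-up inputs, the decoded sequence has three phones and misses one of the four reference phones. The average in the table is consistent with an unweighted mean over the six languages (55.2 / 6 = 9.2), although the README does not state how it was computed.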