Commit 0503bb2
Parent(s): a0402ee

Update README.md

README.md CHANGED
@@ -35,19 +35,36 @@ model-index:
|
|
35 |
## Overview
|
36 |
|
37 |
This folder contains a fully trained German speech recognition pipeline
|
38 |
-
consisting of an acoustic model using the new wav2vec 2.0 XLS-R 1B TEVR architecture
|
39 |
and a 5-gram KenLM language model.
|
40 |
For an explanation of the TEVR enhancements and their motivation, please see our paper:
|
41 |
-
TEVR: Improving
|
42 |
-
(Krabbenhöft et al., 2022).
|
43 |
|
44 |
|
45 |
This pipeline scores a very competitive (as of June 2022) **word error rate of 3.64%** on CommonVoice German.
|
46 |
|
47 |
To evaluate this pipeline yourself and/or on your own data, see the `HF Eval Script.ipynb` Jupyter Notebook
|
48 |
or use the following Python script:
|
49 |
|
50 |
-
|
51 |
|
52 |
```python
|
53 |
!pip install --quiet --root-user-action=ignore --upgrade pip
|
|
|
35 |
## Overview
|
36 |
|
37 |
This folder contains a fully trained German speech recognition pipeline
|
38 |
+
consisting of an acoustic model using the new wav2vec 2.0 XLS-R 1B **TEVR** architecture
|
39 |
and a 5-gram KenLM language model.
|
40 |
For an explanation of the TEVR enhancements and their motivation, please see our paper:
|
41 |
+
[TEVR: Improving Speech Recognition by Token Entropy Variance Reduction](https://arxiv.org/abs/2206.12693).
|
|
|
42 |
|
43 |
|
44 |
This pipeline scores a very competitive (as of June 2022) **word error rate of 3.64%** on CommonVoice German.
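
For reference, word error rate is conventionally the word-level edit distance between the recognized text and the reference transcript, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the evaluation code behind the 3.64% figure; the function name `word_error_rate` and the plain whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of four reference words.
print(word_error_rate("das ist ein test", "das ist test"))  # 0.25
```

Evaluation scripts usually normalize casing and punctuation before scoring, so values from this sketch are only comparable to a published number when the same normalization is applied.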
|
45 |
|
46 |
+
## Citation
|
47 |
+
|
48 |
+
If you use this ASR pipeline for research, please cite:
|
49 |
+
```bibtex
|
50 |
+
@misc{https://doi.org/10.48550/arxiv.2206.12693,
|
51 |
+
doi = {10.48550/ARXIV.2206.12693},
|
52 |
+
url = {https://arxiv.org/abs/2206.12693},
|
53 |
+
author = {Krabbenhöft, Hajo Nils and Barth, Erhardt},
|
54 |
+
keywords = {Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, F.2.1; I.2.6; I.2.7},
|
55 |
+
title = {TEVR: Improving Speech Recognition by Token Entropy Variance Reduction},
|
56 |
+
publisher = {arXiv},
|
57 |
+
year = {2022},
|
58 |
+
copyright = {Creative Commons Attribution 4.0 International}
|
59 |
+
}
|
60 |
+
```
|
61 |
+
|
62 |
+
## Evaluation
|
63 |
+
|
64 |
To evaluate this pipeline yourself and/or on your own data, see the `HF Eval Script.ipynb` Jupyter Notebook
|
65 |
or use the following Python script:
|
66 |
|
67 |
+
|
68 |
|
69 |
```python
|
70 |
!pip install --quiet --root-user-action=ignore --upgrade pip
|