Update README.md
README.md
@@ -9,39 +9,76 @@ datasets:
-##
-### `speechcatcher/
-This model was trained by bmilde using speechcatcher recipe in [espnet](https://github.com/
-### Demo: How to use
-if you haven't done that already.
-<!-- Generated by scripts/utils/show_asr_result.sh -->
-## Environments
-- date: `Mon Feb 20 01:09:18 UTC 2023`
-- python version: `3.10.8 (main, Nov 4 2022, 13:48:29) [GCC 11.2.0]`
-- espnet version: `espnet 202211`
-- pytorch version: `pytorch 1.12.1+cu116`
-- Git hash: `df10e664a3e1a3cbbe8363b1d93e94ad5d8b147f`
-- Commit date: `Fri Feb 3 13:38:18 2023 +0000`

license: mit
---

## Speechcatcher ESPnet streaming ASR model XL for German ASR

### `speechcatcher/speechcatcher_german_espnet_streaming_transformer_26k_train_size_xl_raw_de_bpe1024`

This model was trained by bmilde using the speechcatcher recipe in [espnet](https://github.com/speechcatcher-asr/espnet/tree/egs2-speechcatcher-de).

### Demo: How to use the model

Global installation:

```bash
# on Debian/Ubuntu:
sudo apt-get install portaudio19-dev python3.10-dev ffmpeg
# on macOS:
# brew install portaudio ffmpeg
pip3 install git+https://github.com/speechcatcher-asr/speechcatcher
speechcatcher -m de_streaming_transformer_xl mediafile.mp4
# or with a microphone:
speechcatcher -m de_streaming_transformer_xl -l
```
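
The installation steps assume that `python3.10`, `pip3` and `ffmpeg` are already on your `PATH`. A quick check before installing can save a failed build. This is a minimal sketch: `missing_commands` is a hypothetical helper name, not part of speechcatcher, and it only checks executables, so libraries such as portaudio are not covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of speechcatcher): print each command name
# passed as an argument that cannot be found on the PATH, one per line.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Executables used by the installation steps above; prints nothing if all are found.
missing_commands python3.10 pip3 ffmpeg
```
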

Virtual environment (the system packages from the global installation, such as ffmpeg and portaudio, are still required):

```bash
virtualenv -p python3.10 speechcatcher_env
source speechcatcher_env/bin/activate
pip3 install git+https://github.com/speechcatcher-asr/speechcatcher
speechcatcher -m de_streaming_transformer_xl mediafile.mp4
# or with a microphone:
speechcatcher -m de_streaming_transformer_xl -l
```
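
The `speechcatcher -m de_streaming_transformer_xl mediafile.mp4` call transcribes one file. To process a whole folder, a plain shell loop over that same invocation is enough. This is a sketch under stated assumptions: `transcribe_dir` and the `SPEECHCATCHER` override variable are hypothetical names introduced here, and the script does not control where speechcatcher writes its output (whatever the CLI does by default for a single file applies).

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the speechcatcher CLI, exactly as invoked in the
# README, once for every .mp4 file in a directory.
set -eu

# Assumption for illustration: the command can be swapped out, e.g.
# SPEECHCATCHER=echo for a dry run that only prints the arguments.
SPEECHCATCHER="${SPEECHCATCHER:-speechcatcher}"
MODEL="de_streaming_transformer_xl"

transcribe_dir() {
  local dir="$1" f
  for f in "$dir"/*.mp4; do
    # If no file matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    "$SPEECHCATCHER" -m "$MODEL" "$f"
  done
}
```

After sourcing the script, `SPEECHCATCHER=echo transcribe_dir ./recordings` prints the commands it would run, and `transcribe_dir ./recordings` runs them. Files are processed sequentially with one CLI call each, so the sketch does not depend on any batch options the CLI may or may not offer.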

# RESULTS

Tuda-de-raw: 2.76% CER (without LM)

Tuda-de-raw: 9.65% WER (without LM)

Note: Tuda-de-raw results are based on raw tuda-de test utterances without the normalization step. They may not be directly comparable to regular tuda-de results.
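
The CER and WER figures are error rates in the usual edit-distance sense. The README does not state which scoring tool produced them, so the following is the standard definition rather than the exact evaluation code. With $S$ substitutions, $D$ deletions and $I$ insertions obtained by aligning the hypothesis against a reference of $N$ words (characters, for CER):

```latex
\mathrm{WER} = \frac{S + D + I}{N} \times 100\,\%
```

As a worked example, 9.65% WER corresponds to about 96.5 word errors per 1,000 reference words, since 0.0965 × 1000 = 96.5.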

# Speechcatcher training

Speechcatcher models are trained using Whisper large as a teacher model:

![Speechcatcher Teacher/student training](https://github.com/speechcatcher-asr/speechcatcher/raw/main/speechcatcher_training.svg)

See [speechcatcher-data](https://github.com/speechcatcher-asr/speechcatcher-data) for code and more information on replicating the training process.

# Sponsors

Speechcatcher was graciously funded by

<a href="https://media-tech-lab.com">Media Tech Lab by Media Lab Bayern</a> (<a href="https://github.com/media-tech-lab">@media-tech-lab</a>)

<a href="https://media-tech-lab.com">
<img src="https://raw.githubusercontent.com/media-tech-lab/.github/main/assets/mtl-powered-by.png" width="240" title="Media Tech Lab powered by logo">
</a>

# Citing

```BibTex
@misc{milde2023speechcatcher,
  author = {Milde, Benjamin},
  title = {Speechcatcher},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/speechcatcher-asr/speechcatcher}},
}
```

## ASR config