milde committed on
Commit 758a005
1 Parent(s): c88533f

Update README.md

Files changed (1)
  1. README.md +59 -22
README.md CHANGED
@@ -9,39 +9,76 @@ datasets:
  license: mit
  ---
 
- ## ESPnet2 ASR model
 
- ### `speechcatcher/speechcatcher_german_espnet_streaming_transformer_13k_train_size_l_raw_de_bpe1024`
 
- This model was trained by bmilde using speechcatcher recipe in [espnet](https://github.com/espnet/espnet/).
 
- ### Demo: How to use in ESPnet2
 
- Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
- if you haven't done that already.
 
  ```bash
- cd espnet
- git checkout df10e664a3e1a3cbbe8363b1d93e94ad5d8b147f
- pip install -e .
- cd egs2/speechcatcher/asr1
- ./run.sh --skip_data_prep false --skip_train true --download_model speechcatcher/speechcatcher_german_espnet_streaming_transformer_13k_train_size_l_raw_de_bpe1024
  ```
 
- <!-- Generated by scripts/utils/show_asr_result.sh -->
  # RESULTS
- ## Environments
- - date: `Mon Feb 20 01:09:18 UTC 2023`
- - python version: `3.10.8 (main, Nov 4 2022, 13:48:29) [GCC 11.2.0]`
- - espnet version: `espnet 202211`
- - pytorch version: `pytorch 1.12.1+cu116`
- - Git hash: `df10e664a3e1a3cbbe8363b1d93e94ad5d8b147f`
- - Commit date: `Fri Feb 3 13:38:18 2023 +0000`
 
- ## asr_train_asr_streaming_transformer_size_l_raw_de_bpe1024
- ### WER
 
- Benchmarks coming soon!
 
  ## ASR config
 
 
  license: mit
  ---
 
+ ## Speechcatcher ESPnet streaming ASR model XL for German ASR
 
+ ### `speechcatcher/speechcatcher_german_espnet_streaming_transformer_26k_train_size_xl_raw_de_bpe1024`
 
+ This model was trained by bmilde using the speechcatcher recipe in [espnet](https://github.com/speechcatcher-asr/espnet/tree/egs2-speechcatcher-de).
 
+ ### Demo: How to use the model
 
+ Global installation:
 
  ```bash
+
+ sudo apt-get install portaudio19-dev python3.10-dev ffmpeg
+ # on mac:
+ # brew install portaudio ffmpeg
+ pip3 install git+https://github.com/speechcatcher-asr/speechcatcher
+ speechcatcher -m de_streaming_transformer_xl mediafile.mp4
+ # or with a microphone:
+ speechcatcher -m de_streaming_transformer_xl -l
+ ```
+
+ Virtual environment:
+
+ ```bash
+ virtualenv -p python3.10 speechcatcher_env
+ source speechcatcher_env/bin/activate
+ pip3 install git+https://github.com/speechcatcher-asr/speechcatcher
+ speechcatcher -m de_streaming_transformer_xl mediafile.mp4
+ # or with a microphone:
+ speechcatcher -m de_streaming_transformer_xl -l
  ```
 
  # RESULTS
 
+ Tuda-de-raw: 2.76% CER (without LM)
+
+ Tuda-de-raw: 9.65% WER (without LM)
+
+ Note: Tuda-de-raw results are computed on the raw Tuda-de test utterances, without the text normalization step. They may therefore not be directly comparable to regular Tuda-de results.
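The CER and WER figures above are edit-distance ratios. Each counts the minimum number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, then divides by the reference length: characters for CER, words for WER. The bash sketch below illustrates that standard definition with a Levenshtein table computed in awk. It is not the scoring tool that produced the numbers above. The helper names `error_rate` and `normalize` are hypothetical. The normalization shown (lowercasing, stripping ASCII punctuation) is an assumption, not the Tuda-de normalization procedure. The example shows why the raw/normalized distinction matters: on raw transcripts, casing and punctuation differences count as errors.

```bash
#!/usr/bin/env bash
# Hypothetical helper: error_rate REF HYP MODE, where MODE is "word" (WER)
# or "char" (CER). Prints the error rate in percent with two decimals.
# Assumption: CER here counts every character, including spaces.
error_rate() {
  awk -v ref="$1" -v hyp="$2" -v mode="$3" 'BEGIN {
    # Tokenize reference and hypothesis into r[1..n] and h[1..m].
    if (mode == "char") {
      # length/substr are character-aware in gawk under a UTF-8 locale,
      # but byte-based in some other awk implementations.
      n = length(ref); for (i = 1; i <= n; i++) r[i] = substr(ref, i, 1)
      m = length(hyp); for (j = 1; j <= m; j++) h[j] = substr(hyp, j, 1)
    } else {
      # split() with the default separator collapses runs of whitespace.
      n = split(ref, r); m = split(hyp, h)
    }
    if (n == 0) { print "error: empty reference" > "/dev/stderr"; exit 1 }

    # d[i, j] = edit distance between r[1..i] and h[1..j].
    for (i = 0; i <= n; i++) d[i, 0] = i   # i deletions
    for (j = 0; j <= m; j++) d[0, j] = j   # j insertions
    for (i = 1; i <= n; i++) {
      for (j = 1; j <= m; j++) {
        cost = ((r[i] "") == (h[j] "")) ? 0 : 1   # force string comparison
        best = d[i - 1, j - 1] + cost                        # match or substitution
        if (d[i - 1, j] + 1 < best) best = d[i - 1, j] + 1   # deletion
        if (d[i, j - 1] + 1 < best) best = d[i, j - 1] + 1   # insertion
        d[i, j] = best
      }
    }
    # (S + D + I) / N as a percentage of the reference length.
    printf "%.2f\n", 100 * d[n, m] / n
  }'
}

# Hypothetical normalization: lowercase and strip ASCII punctuation.
# tr works byte-wise in many locales, so umlauts such as "Ä" may stay uppercase.
normalize() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]'
}

# Raw comparison: casing and the trailing period both count as errors.
error_rate "Das ist gut." "das ist gut" word    # prints 66.67
# After normalizing both sides, the transcripts match exactly.
error_rate "$(normalize "Das ist gut.")" "$(normalize "das ist gut")" word    # prints 0.00
```

The dynamic-programming table costs O(n·m) time and memory. That is fine for single utterances, but corpus-level scores are normally accumulated as total edits over total reference tokens across all utterances, not as an average of per-utterance percentages.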
+
+ # Speechcatcher training
+
+ Speechcatcher models are trained using Whisper large as a teacher model:
+
+ ![Speechcatcher Teacher/student training](https://github.com/speechcatcher-asr/speechcatcher/raw/main/speechcatcher_training.svg)
 
+ See [speechcatcher-data](https://github.com/speechcatcher-asr/speechcatcher-data) for code and more information on replicating the training process.
+
+ # Sponsors
+
+ Speechcatcher was generously funded by
+
+ <a href="https://media-tech-lab.com">Media Tech Lab by Media Lab Bayern</a> (<a href="https://github.com/media-tech-lab">@media-tech-lab</a>)
+
+ <a href="https://media-tech-lab.com">
+ <img src="https://raw.githubusercontent.com/media-tech-lab/.github/main/assets/mtl-powered-by.png" width="240" title="Media Tech Lab powered by logo">
+ </a>
+
+ # Citing
+
+ ```BibTex
+ @misc{milde2023speechcatcher,
+ author = {Milde, Benjamin},
+ title = {Speechcatcher},
+ year = {2023},
+ publisher = {GitHub},
+ journal = {GitHub repository},
+ howpublished = {\url{https://github.com/speechcatcher-asr/speechcatcher}},
+ }
+ ```
 
  ## ASR config