speechcatcher/speechcatcher_german_espnet_streaming_transformer_13k_train_size_l_raw_de_bpe1024

Automatic Speech Recognition

ESPnet

German

audio


milde committed on Apr 18, 2023

Commit 758a005 (1 parent: c88533f)

Update README.md


Files changed (1)

README.md +59 -22


@@ -9,39 +9,76 @@ datasets:
 license: mit
 ---
-## ESPnet2 ASR model
-### `speechcatcher/speechcatcher_german_espnet_streaming_transformer_13k_train_size_l_raw_de_bpe1024`
-This model was trained by bmilde using speechcatcher recipe in [espnet](https://github.com/espnet/espnet/).
-### Demo: How to use in ESPnet2
-Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
-if you haven't done that already.
 ```bash
-cd espnet
-git checkout df10e664a3e1a3cbbe8363b1d93e94ad5d8b147f
-pip install -e .
-cd egs2/speechcatcher/asr1
-./run.sh --skip_data_prep false --skip_train true --download_model speechcatcher/speechcatcher_german_espnet_streaming_transformer_13k_train_size_l_raw_de_bpe1024
 ```
-<!-- Generated by scripts/utils/show_asr_result.sh -->
 # RESULTS
-## Environments
-- date: `Mon Feb 20 01:09:18 UTC 2023`
-- python version: `3.10.8 (main, Nov  4 2022, 13:48:29) [GCC 11.2.0]`
-- espnet version: `espnet 202211`
-- pytorch version: `pytorch 1.12.1+cu116`
-- Git hash: `df10e664a3e1a3cbbe8363b1d93e94ad5d8b147f`
-  - Commit date: `Fri Feb 3 13:38:18 2023 +0000`
-## asr_train_asr_streaming_transformer_size_l_raw_de_bpe1024
-### WER
-Benchmarks coming soon!
 ## ASR config

 license: mit
 ---
+## Speechcatcher ESPnet streaming ASR model XL for German ASR
+### `speechcatcher/speechcatcher_german_espnet_streaming_transformer_26k_train_size_xl_raw_de_bpe1024`
+This model was trained by bmilde using the speechcatcher recipe in [espnet](https://github.com/speechcatcher-asr/espnet/tree/egs2-speechcatcher-de).
+### Demo: How to use the model
+Global installation:
 ```bash
+sudo apt-get install portaudio19-dev python3.10-dev ffmpeg
+# on mac:
+# brew install portaudio ffmpeg
+pip3 install git+https://github.com/speechcatcher-asr/speechcatcher
+speechcatcher -m de_streaming_transformer_xl mediafile.mp4
+# or with a microphone:
+speechcatcher -m de_streaming_transformer_xl -l
+```
+Virtual environment:
+```bash
+virtualenv -p python3.10 speechcatcher_env
+source speechcatcher_env/bin/activate
+pip3 install git+https://github.com/speechcatcher-asr/speechcatcher
+speechcatcher -m de_streaming_transformer_xl mediafile.mp4
+# or with a microphone:
+speechcatcher -m de_streaming_transformer_xl -l
 ```
 # RESULTS
+Tuda-de-raw: 2.76% CER (without LM)
+Tuda-de-raw: 9.65% WER (without LM)
+Note: Tuda-de-raw results are based on raw tuda-de test utterances without the normalization step. They may not be directly comparable to regular tuda-de results.
+# Speechcatcher training
+Speechcatcher models are trained by using Whisper large as a teacher model:
+![Speechcatcher Teacher/student training](https://github.com/speechcatcher-asr/speechcatcher/raw/main/speechcatcher_training.svg)
+See [speechcatcher-data](https://github.com/speechcatcher-asr/speechcatcher-data) for code and more info on replicating the training process.
+# Sponsors
+Speechcatcher was graciously funded by
+<a href="https://media-tech-lab.com">Media Tech Lab by Media Lab Bayern</a> (<a href="https://github.com/media-tech-lab">@media-tech-lab</a>)
+<a href="https://media-tech-lab.com">
+    <img src="https://raw.githubusercontent.com/media-tech-lab/.github/main/assets/mtl-powered-by.png" width="240" title="Media Tech Lab powered by logo">
+</a>
+# Citing
+```BibTex
+@misc{milde2023speechcatcher,
+  author = {Milde, Benjamin},
+  title = {Speechcatcher},
+  year = {2023},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  howpublished = {\url{https://github.com/speechcatcher-asr/speechcatcher}},
+}
+```
 ## ASR config
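
The CER and WER figures that the updated README reports under RESULTS are edit-distance error rates: the number of character (or word) substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. The commit does not show the evaluation script, so the following is only a minimal sketch of the standard definition, not the scoring code used for this model. The function names `edit_distance`, `wer`, and `cer` are illustrative assumptions, and no text normalization is applied, loosely mirroring the "raw" setting the README describes.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (unit cost per edit)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (item missing in hypothesis)
                    curr[j - 1] + 1,     # insertion (extra item in hypothesis)
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, with the made-up reference `"das ist ein test"` and hypothesis `"das ist test"`, one of four words is deleted and four of sixteen characters are deleted, so both `wer` and `cer` return 0.25. Corpus-level scores such as those in the README are usually obtained by summing edit distances and reference lengths over all test utterances before dividing, rather than by averaging per-utterance rates.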