speechbrain/asr-crdnn-switchboard


dwgnr committed on Sep 19, 2022

Commit a5b3ccf, 1 parent: 18beeab

update readme


README.md +35 -27


@@ -1,7 +1,7 @@
 ---
 language:
 - en
-thumbnail:
 tags:
 - automatic-speech-recognition
 - CTC
@@ -13,38 +13,37 @@ datasets:
 - switchboard
 metrics:
 - wer
-- ser
 ---
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
-# CRDNN with CTC/Attention trained on Switchboard
-This repository provides all the necessary tools to perform automatic speech
-recognition from an end-to-end system pretrained on Switchboard (EN) within
-SpeechBrain. For a better experience we encourage you to learn more about
-[SpeechBrain](https://speechbrain.github.io).
 The performance of the model is the following:
-| Release  | Swbd SER | Callhome SER | Eval2000 SER | Swbd WER | Callhome WER | Eval2000 WER | GPUs        |
 |:--------:|:--------:|:------------:|:------------:|:--------:|:------------:|:------------:|:-----------:|
-| 17-09-22 |  61.93   |  65.89       |  64.44       |  16.01     |  25.12     |  20.71       | 1xA100 40GB |
 ## Pipeline description
 This ASR system is composed with 2 different but linked blocks:
-- Tokenizer (unigram) that transforms words into subword units and trained with
-the train transcriptions of Switchboard.
 - Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
 N blocks of convolutional neural networks with normalisation and pooling on the
 frequency domain. Then, a bidirectional LSTM is connected to a final DNN to obtain
 the final acoustic representation that is given to the CTC and attention decoders.
 The system is trained with recordings sampled at 16kHz (single channel).
-The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file* if needed.
 ## Install SpeechBrain
@@ -54,10 +53,10 @@ First of all, please install SpeechBrain with the following command:
 pip install speechbrain
 ```
-Please notice that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
-### Transcribing your own audio files (in English)
 ```python
 from speechbrain.pretrained import EncoderDecoderASR
@@ -67,21 +66,24 @@ asr_model.transcribe_file('path/to/your/audiofile')
 ```
-### Inference on GPU
 To perform inference on the GPU, add  `run_opts={"device":"cuda"}`  when calling the `from_hparams` method.
 ## Parallel Inference on a Batch
-Please, [see this Colab notebook](https://colab.research.google.com/drive/1hX5ZI9S4jHIjahFCZnhwwQmFoGAi3tmu?usp=sharing) to figure out how to transcribe in parallel a batch of input sentences using a pre-trained model.
-### Training
-The model was trained with SpeechBrain (Commit hash: '2abd9f01').
 To train it from scratch follow these steps:
 1. Clone SpeechBrain:
 ```bash
 git clone https://github.com/speechbrain/speechbrain/
 ```
 2. Install it:
 ```bash
 cd speechbrain
@@ -91,24 +93,30 @@ pip install -e .
 3. Run Training:
 ```bash
-cd recipes/Switchboard/ASR/seq2seq/
-python train.py hparams/train_BPE_1000.yaml --data_folder=your_data_folder
 ```
-You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1SAndjcThdkO-YQF8kvwPOXlQ6LMT71vt?usp=sharing).
-### Limitations
-The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
-# **About SpeechBrain**
 - Website: https://speechbrain.github.io/
-- Code: https://github.com/speechbrain/speechbrain/
 - HuggingFace: https://huggingface.co/speechbrain/
-# **Citing SpeechBrain**
-Please, cite SpeechBrain if you use it for your research or business.
 ```bibtex
 @misc{speechbrain,

 ---
 language:
 - en
+thumbnail: null
 tags:
 - automatic-speech-recognition
 - CTC
 - switchboard
 metrics:
 - wer
+- cer
 ---
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
+# CRDNN with CTC/Attention trained on Switchboard (No LM)
+This repository provides all the necessary tools to perform automatic speech recognition from an end-to-end system pretrained on Switchboard (EN) within SpeechBrain.
+For a better experience we encourage you to learn more about [SpeechBrain](https://speechbrain.github.io).
 The performance of the model is the following:
+| Release  | Swbd CER | Callhome CER | Eval2000 CER | Swbd WER | Callhome WER | Eval2000 WER | GPUs        |
 |:--------:|:--------:|:------------:|:------------:|:--------:|:------------:|:------------:|:-----------:|
+| 17-09-22 |  9.89   |  16.30       |  13.17       |  16.01     |  25.12     |  20.71       | 1xA100 40GB |
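The WER and CER columns above are edit-distance error rates, reported in percent. As a rough illustration of how such figures are commonly computed, here is a minimal sketch assuming a plain Levenshtein distance over words (WER) or characters (CER). The function names are illustrative, and this is not SpeechBrain's own scoring code, which may normalize transcripts differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumes whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (spaces count as characters here)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because a single wrong character makes a whole word wrong, CER is typically lower than WER on the same output, which matches the pattern in the table.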
 ## Pipeline description
 This ASR system is composed with 2 different but linked blocks:
+- Tokenizer (unigram) that transforms words into subword units, trained on
+the training transcriptions of the Switchboard and Fisher corpora.
 - Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
 N blocks of convolutional neural networks with normalisation and pooling on the
 frequency domain. Then, a bidirectional LSTM is connected to a final DNN to obtain
 the final acoustic representation that is given to the CTC and attention decoders.
 The system is trained with recordings sampled at 16kHz (single channel).
+The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling `transcribe_file` if needed.
 ## Install SpeechBrain
 pip install speechbrain
 ```
+Note that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
+## Transcribing Your Own Audio Files
 ```python
 from speechbrain.pretrained import EncoderDecoderASR
 ```
+## Inference on GPU
 To perform inference on the GPU, add  `run_opts={"device":"cuda"}`  when calling the `from_hparams` method.
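The GPU option can be sketched as follows. This is a minimal example under stated assumptions: the model identifier `speechbrain/asr-crdnn-switchboard` is taken from this repository's name, the `savedir` path is a hypothetical local cache directory, and the helper names `build_run_opts` and `load_asr_model` are illustrative rather than part of SpeechBrain.

```python
def build_run_opts(use_gpu):
    """Return the run_opts dict to pass to from_hparams."""
    return {"device": "cuda"} if use_gpu else {"device": "cpu"}


def load_asr_model(use_gpu=False):
    """Load the pretrained model on the requested device."""
    # Imported lazily so build_run_opts works without SpeechBrain installed.
    from speechbrain.pretrained import EncoderDecoderASR

    return EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-crdnn-switchboard",  # assumed from the repository name
        savedir="pretrained_models/asr-crdnn-switchboard",  # hypothetical cache directory
        run_opts=build_run_opts(use_gpu),
    )
```

With SpeechBrain installed and a CUDA device available, `load_asr_model(use_gpu=True).transcribe_file('path/to/your/audiofile')` would then run the transcription on the GPU.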
 ## Parallel Inference on a Batch
+Please [see this Colab notebook](https://colab.research.google.com/drive/1hX5ZI9S4jHIjahFCZnhwwQmFoGAi3tmu?usp=sharing) to learn how to transcribe a batch of input sentences in parallel using a pre-trained model.
+## Training
+The model was trained with SpeechBrain (commit hash: `70904d0`).
 To train it from scratch follow these steps:
 1. Clone SpeechBrain:
 ```bash
 git clone https://github.com/speechbrain/speechbrain/
 ```
 2. Install it:
 ```bash
 cd speechbrain
 3. Run Training:
 ```bash
+cd recipes/Switchboard/ASR/seq2seq
+python train.py hparams/train_BPE_2000.yaml --data_folder=your_data_folder
 ```
+## Limitations
+The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+## Credits
+This model was trained with resources provided by the [THN Center for AI](https://www.th-nuernberg.de/en/kiz).
+# About SpeechBrain
+SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly.
+Competitive or state-of-the-art performance is obtained in various domains.
 - Website: https://speechbrain.github.io/
+- GitHub: https://github.com/speechbrain/speechbrain/
 - HuggingFace: https://huggingface.co/speechbrain/
+# Citing SpeechBrain
+Please cite SpeechBrain if you use it for your research or business.
 ```bibtex
 @misc{speechbrain,