sdelangen committed on
Commit
db437df
1 Parent(s): 320f41b

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -80,7 +80,7 @@ With streaming, the results with different chunk sizes on test-clean are the fol
  This ASR system is a Conformer model trained with the RNN-T loss (with an auxiliary CTC loss to stabilize training). The model operates with a unigram tokenizer.
  Architecture details are described in the [training hyperparameters file](https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml).
 
- Streaming support makes use of Dynamic Chunk Training. Chunked attention is used for the multi-head attention module, and an implementation of [Dynamic Chunk Convolutions](https://www.amazon.science/publications/dynamic-chunk-convolution-for-unified-streaming-and-non-streaming-conformer-asr) were used.
+ Streaming support makes use of Dynamic Chunk Training. Chunked attention is used for the multi-head attention module, and an implementation of [Dynamic Chunk Convolutions](https://www.amazon.science/publications/dynamic-chunk-convolution-for-unified-streaming-and-non-streaming-conformer-asr) was used.
  The model was trained with support for different chunk sizes (and even full context), and so is suitable for various chunk sizes and offline transcription.
 
  The system is trained with recordings sampled at 16kHz (single channel).