speechbrainteam committed on
Commit
8e8e959
1 Parent(s): eca34f8

Update README.md

  1. README.md +11 -12
README.md CHANGED
@@ -6,6 +6,7 @@ tags:
  - CTC
  - Attention
  - Transformers
  - pytorch
  license: "apache-2.0"
  datasets:
@@ -18,10 +19,10 @@ metrics:
  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

- # Transformer for AISHELL (Mandarin Chinese)

  This repository provides all the necessary tools to perform automatic speech
- recognition from an end-to-end system pretrained on AISHELL (Mandarin Chinese)
  within SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io).

@@ -29,7 +30,7 @@ The performance of the model is the following:

  | Release | Dev CER | Test CER | GPUs | Full Results |
  |:-------------:|:--------------:|:--------------:|:--------:|:--------:|
- | 05-03-21 | 5.60 | 6.04 | 2xV100 32GB | [Google Drive](https://drive.google.com/drive/folders/1zlTBib0XEwWeyhaXDXnkqtPsIBI18Uzs?usp=sharing)|



@@ -38,10 +39,10 @@ The performance of the model is the following:
  This ASR system is composed of 2 different but linked blocks:
  - Tokenizer (unigram) that transforms words into subword units and trained with
  the train transcriptions of LibriSpeech.
- - Acoustic model made of a transformer encoder and a joint decoder with CTC +
  transformer. Hence, the decoding also incorporates the CTC probabilities.

- To Train this system from scratch, [see our SpeechBrain recipe](https://github.com/speechbrain/speechbrain/tree/develop/recipes/AISHELL-1).


  ## Install SpeechBrain
@@ -59,17 +60,15 @@ Please notice that we encourage you to read our tutorials and learn more about

  ```python
  from speechbrain.pretrained import EncoderDecoderASR
-
- asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-transformer-aishell", savedir="pretrained_models/asr-transformer-aishell")
- asr_model.transcribe_file("speechbrain/asr-transformer-aishell/example_mandarin.wav")
-
  ```

  ### Inference on GPU
  To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

  ### Training
- The model was trained with SpeechBrain (Commit hash: '986a2175').
  To train it from scratch follow these steps:
  1. Clone SpeechBrain:
  ```bash
@@ -85,10 +84,10 @@ pip install -e .
  3. Run Training:
  ```bash
  cd recipes/AISHELL-1/ASR/transformer/
- python train.py hparams/train_ASR_transformer.yaml --data_folder=your_data_folder
  ```

- You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1QU18YoauzLOXueogspT0CgR5bqJ6zFfu?usp=sharing).

  ### Limitations
  The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
  - CTC
  - Attention
  - Transformers
+ - wav2vec2
  - pytorch
  license: "apache-2.0"
  datasets:
 
  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

+ # Transformer for AISHELL + wav2vec2 (Mandarin Chinese)

  This repository provides all the necessary tools to perform automatic speech
+ recognition from an end-to-end system pretrained on AISHELL +wav2vec2 (Mandarin Chinese)
  within SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io).

 

  | Release | Dev CER | Test CER | GPUs | Full Results |
  |:-------------:|:--------------:|:--------------:|:--------:|:--------:|
+ | 05-03-21 | 5.19 | 5.58 | 2xV100 32GB | [Google Drive](https://drive.google.com/drive/folders/1zlTBib0XEwWeyhaXDXnkqtPsIBI18Uzs?usp=sharing)|



 
  This ASR system is composed of 2 different but linked blocks:
  - Tokenizer (unigram) that transforms words into subword units and trained with
  the train transcriptions of LibriSpeech.
+ - Acoustic model made of a wav2vec2 encoder and a joint decoder with CTC +
  transformer. Hence, the decoding also incorporates the CTC probabilities.
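
As a rough illustration of what "the decoding also incorporates the CTC probabilities" can mean, a common formulation of joint CTC/attention decoding interpolates the two log-probabilities of each hypothesis. The sketch below assumes that formulation. The names `joint_score`, `rerank`, and `ctc_weight`, the value 0.4, and the toy probabilities are hypothetical and are not taken from the SpeechBrain recipe.

```python
import math


def joint_score(att_logprob, ctc_logprob, ctc_weight=0.4):
    """Interpolate attention-decoder and CTC log-probabilities for one hypothesis."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return (1.0 - ctc_weight) * att_logprob + ctc_weight * ctc_logprob


def rerank(hypotheses, ctc_weight=0.4):
    """Sort (text, att_logprob, ctc_logprob) tuples by joint score, best first."""
    return sorted(
        hypotheses,
        key=lambda h: joint_score(h[1], h[2], ctc_weight),
        reverse=True,
    )


# Toy hypotheses: the attention decoder slightly prefers "hyp_a",
# while CTC strongly prefers "hyp_b".
hyps = [
    ("hyp_a", math.log(0.6), math.log(0.1)),
    ("hyp_b", math.log(0.5), math.log(0.4)),
]
best_joint = rerank(hyps)[0][0]
best_attention_only = rerank(hyps, ctc_weight=0.0)[0][0]
```

With the CTC term switched off (`ctc_weight=0.0`) the ranking follows the attention decoder alone. A non-zero weight lets the CTC score change which hypothesis comes first.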

+ To Train this system from scratch, [see our SpeechBrain recipe](https://github.com/speechbrain/speechbrain/tree/develop/recipes/AISHELL-1/ASR/transformer).


  ## Install SpeechBrain
 

  ```python
  from speechbrain.pretrained import EncoderDecoderASR
+ asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-wav2vec2-transformer-aishell", savedir="pretrained_models/asr-wav2vec2-transformer-aishell")
+ asr_model.transcribe_file("speechbrain/asr-wav2vec2-transformer-aishell/example_mandarin.wav")
  ```

  ### Inference on GPU
  To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
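
The GPU note above could be applied roughly as follows. This is a minimal sketch that reuses the `source` and `savedir` values from the updated README. The helper names `make_run_opts` and `transcribe_on` are hypothetical, and SpeechBrain is imported lazily so the helpers can be defined on a machine where it is not installed.

```python
def make_run_opts(use_gpu):
    """Build the run_opts dict passed to from_hparams (hypothetical helper)."""
    return {"device": "cuda" if use_gpu else "cpu"}


def transcribe_on(wav_path, use_gpu=True):
    """Load the pretrained model on the chosen device and transcribe one file."""
    # Lazy import: SpeechBrain is only needed when this function is called.
    from speechbrain.pretrained import EncoderDecoderASR

    asr_model = EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-wav2vec2-transformer-aishell",
        savedir="pretrained_models/asr-wav2vec2-transformer-aishell",
        run_opts=make_run_opts(use_gpu),
    )
    return asr_model.transcribe_file(wav_path)
```

Calling `transcribe_on("speechbrain/asr-wav2vec2-transformer-aishell/example_mandarin.wav")` would then run the README example on the GPU, assuming a CUDA device is available.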

  ### Training
+ The model was trained with SpeechBrain (Commit hash: '480dde87').
  To train it from scratch follow these steps:
  1. Clone SpeechBrain:
  ```bash
 
  3. Run Training:
  ```bash
  cd recipes/AISHELL-1/ASR/transformer/
+ python train.py hparams/train_ASR_transformer_with_wav2vect.yaml --data_folder=your_data_folder
  ```

+ You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1P3w5BnwLDxMHFQrkCZ5RYBZ1WsQHKFZr?usp=sharing).

  ### Limitations
  The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.