---
language: "en"
tags:
- icefall
- k2
- transducer
- aishell
- ASR
- stateless transducer
- PyTorch
license: "apache-2.0"
datasets:
- aishell
metrics:
- WER
---

# Introduction

This repo contains a pre-trained model trained using
<https://github.com/k2-fsa/icefall/pull/219>.

It is trained on the [AIShell](https://www.openslr.org/33/) dataset
using the modified transducer from [optimized_transducer](https://github.com/csukuangfj/optimized_transducer).

## How to clone this repo
```
sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-aishell-transducer-stateless-modified-2022-03-01

cd icefall-aishell-transducer-stateless-modified-2022-03-01
git lfs pull
```

**Caution**: You have to run `git lfs pull`. Otherwise, the large files in this repo will not be downloaded, and you will be SAD later.
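To check whether `git lfs pull` actually fetched the checkpoint, a minimal sketch like the one below can help. It relies on the fact that a Git LFS pointer file starts with the line `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` and the checkpoint path are assumptions made for illustration; they are not part of this repo or of icefall.

```python
from pathlib import Path

# First line of every Git LFS pointer file (Git LFS pointer spec, v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like a Git LFS pointer stub, not real data."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    # Assumed location of the checkpoint inside the cloned repo.
    ckpt = Path(
        "icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt"
    )
    if not ckpt.is_file():
        print(f"{ckpt} not found; clone the repo first")
    elif is_lfs_pointer(ckpt):
        print(f"{ckpt} is still an LFS pointer; run `git lfs pull`")
    else:
        print(f"{ckpt} looks like a real file ({ckpt.stat().st_size} bytes)")
```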

The model in this repo is trained using the commit `TODO`.

You can use

```
git clone https://github.com/k2-fsa/icefall
cd icefall
git checkout TODO
```
to download `icefall`.

You can find the model information by visiting <https://github.com/k2-fsa/icefall/blob/TODO/egs/aishell/ASR/transducer_stateless_modified/train.py#L232>.


In short, the encoder is a Conformer model with 8 heads, 12 encoder layers, 512-dim attention, 2048-dim feedforward;
the decoder contains a 512-dim embedding layer and a Conv1d with kernel size 2.

The decoder architecture is modified from
[Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419).
A Conv1d layer is placed right after the input embedding layer.
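To illustrate what "stateless" means here: with a Conv1d of kernel size 2, the decoder output depends only on the two most recently emitted tokens, not on the full history. The following is a minimal sketch of that context window in plain Python, not the icefall implementation. The function name `decoder_context`, the blank ID of 0, and the left-padding with blanks are assumptions made for illustration.

```python
from typing import List, Tuple

BLANK_ID = 0  # assumption: the blank symbol has ID 0


def decoder_context(hyp: List[int], context_size: int = 2) -> Tuple[int, ...]:
    """Return the only tokens a stateless decoder with this context size sees.

    The history is left-padded with blanks (an assumption) so that the
    window is always `context_size` tokens long, even at the start.
    """
    if context_size < 1:
        raise ValueError("context_size must be >= 1")
    padded = [BLANK_ID] * context_size + hyp
    return tuple(padded[-context_size:])


print(decoder_context([]))            # (0, 0)
print(decoder_context([7]))           # (0, 7)
print(decoder_context([7, 8, 9]))     # (8, 9)
print(decoder_context([1, 2, 8, 9]))  # (8, 9)
```

The last two calls return the same window even though the histories differ, which is exactly the property that distinguishes a stateless decoder from an RNN-based one.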

-----

## Description

This repo provides a pre-trained Conformer transducer model for the AIShell dataset
using [icefall][icefall]. There are no RNNs in the decoder. The decoder is stateless
and contains only an embedding layer and a Conv1d.

The commands for training are:

```bash
cd egs/aishell/ASR
./prepare.sh --stop-stage 6

export CUDA_VISIBLE_DEVICES="0,1,2"

./transducer_stateless_modified/train.py \
  --world-size 3 \
  --num-epochs 90 \
  --start-epoch 0 \
  --exp-dir transducer_stateless_modified/exp-4 \
  --max-duration 250 \
  --lr-factor 2.0 \
  --context-size 2 \
  --modified-transducer-prob 0.25
```

The tensorboard training log can be found at
<https://tensorboard.dev/experiment/C27M8YxRQCa1t2XglTqlWg>

The commands for decoding are:

```bash
# greedy search
for epoch in 64; do
  for avg in 33; do
    ./transducer_stateless_modified/decode.py \
      --epoch $epoch \
      --avg $avg \
      --exp-dir transducer_stateless_modified/exp-4 \
      --max-duration 100 \
      --context-size 2 \
      --decoding-method greedy_search \
      --max-sym-per-frame 1
  done
done

# modified beam search
for epoch in 64; do
  for avg in 33; do
    ./transducer_stateless_modified/decode.py \
      --epoch $epoch \
      --avg $avg \
      --exp-dir transducer_stateless_modified/exp-4 \
      --max-duration 100 \
      --context-size 2 \
      --decoding-method modified_beam_search \
      --beam-size 4
  done
done
```

You can find the decoding log for the above command in this
repo (in the folder [log][log]).
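The `--epoch` and `--avg` options select which checkpoints are averaged before decoding. The sketch below lists the files involved, assuming that `--epoch E --avg N` averages `epoch-(E-N+1).pt` through `epoch-E.pt` inclusive. That range is an assumption about the behaviour of `decode.py` at the commit used here, and the helper name `checkpoints_to_average` is hypothetical; check the script itself before relying on it.

```python
from pathlib import Path
from typing import List


def checkpoints_to_average(exp_dir: str, epoch: int, avg: int) -> List[Path]:
    """List the checkpoint files covered by `--epoch epoch --avg avg`.

    Assumes the averaged range is epoch-avg+1 .. epoch, inclusive.
    """
    if avg < 1:
        raise ValueError("avg must be >= 1")
    start = epoch - avg + 1
    if start < 0:
        raise ValueError("avg is too large for this epoch")
    return [Path(exp_dir) / f"epoch-{i}.pt" for i in range(start, epoch + 1)]


files = checkpoints_to_average(
    "transducer_stateless_modified/exp-4", epoch=64, avg=33
)
print(len(files), files[0].name, files[-1].name)  # 33 epoch-32.pt epoch-64.pt
```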

The WER (%) for the test dataset is:

|                        | test | comment                                                          |
|------------------------|------|------------------------------------------------------------------|
| greedy search          | 5.22 | --epoch 64, --avg 33, --max-duration 100, --max-sym-per-frame 1  |
| modified beam search   | 5.02 | --epoch 64, --avg 33, --max-duration 100, --beam-size 4          |
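To put the two numbers in perspective, the short calculation below computes the absolute and relative WER reduction of modified beam search over greedy search. It uses only the two values reported in the table; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0


greedy, beam = 5.22, 5.02  # WER (%) values from the table
print(
    f"absolute: {greedy - beam:.2f}, "
    f"relative: {relative_reduction(greedy, beam):.2f}%"
)
# prints: absolute: 0.20, relative: 3.83%
```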

# File description

- [log][log]: this directory contains the decoding log and decoding results
- [test_wavs][test_wavs]: this directory contains wave files for testing the pre-trained model
- [data][data]: this directory contains files generated by [prepare.sh][prepare]
- [exp][exp]: this directory contains only one file: `pretrained.pt`

`exp/pretrained.pt` is generated by the following command:

```bash
epoch=64
avg=33

./transducer_stateless_modified/export.py \
  --exp-dir ./transducer_stateless_modified/exp-4 \
  --lang-dir ./data/lang_char \
  --epoch $epoch \
  --avg $avg
```

**HINT**: To use `pretrained.pt` to compute the WER for the `test` dataset,
just do the following:

```bash
cp icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt \
  /path/to/icefall/egs/aishell/ASR/transducer_stateless_modified/exp/epoch-999.pt
```
and pass `--epoch 999 --avg 1` to `transducer_stateless_modified/decode.py`.
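The same steps can be scripted. The sketch below copies the checkpoint to `epoch-999.pt` and builds the argument list for `decode.py`, using only flags that appear in the decoding commands in this README. The helper names `stage_pretrained` and `decode_command` are hypothetical, and the paths in the commented usage are placeholders to adjust to your setup.

```python
import shutil
from pathlib import Path
from typing import List


def stage_pretrained(pretrained: Path, exp_dir: Path, epoch: int = 999) -> Path:
    """Copy `pretrained` to `exp_dir/epoch-<epoch>.pt` and return the new path."""
    exp_dir.mkdir(parents=True, exist_ok=True)
    target = exp_dir / f"epoch-{epoch}.pt"
    shutil.copyfile(pretrained, target)
    return target


def decode_command(exp_dir: Path, epoch: int = 999) -> List[str]:
    """Build a greedy-search decode command that loads a single checkpoint."""
    return [
        "./transducer_stateless_modified/decode.py",
        "--epoch", str(epoch),
        "--avg", "1",  # a single checkpoint, so nothing is averaged
        "--exp-dir", str(exp_dir),
        "--max-duration", "100",
        "--context-size", "2",
        "--decoding-method", "greedy_search",
        "--max-sym-per-frame", "1",
    ]


# Example usage (run from /path/to/icefall/egs/aishell/ASR; paths are placeholders):
#
#   import subprocess
#   exp_dir = Path("transducer_stateless_modified/exp")
#   stage_pretrained(Path("/path/to/this/repo/exp/pretrained.pt"), exp_dir)
#   subprocess.run(decode_command(exp_dir), check=True)
```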


[icefall]: https://github.com/k2-fsa/icefall
[prepare]: https://github.com/k2-fsa/icefall/blob/master/egs/aishell/ASR/prepare.sh
[exp]: https://huggingface.co/csukuangfj/icefall-aishell-transducer-stateless-modified-2022-03-01/tree/main/exp
[data]: https://huggingface.co/csukuangfj/icefall-aishell-transducer-stateless-modified-2022-03-01/tree/main/data
[test_wavs]: https://huggingface.co/csukuangfj/icefall-aishell-transducer-stateless-modified-2022-03-01/tree/main/test_wavs
[log]: https://huggingface.co/csukuangfj/icefall-aishell-transducer-stateless-modified-2022-03-01/tree/main/log