---
license: apache-2.0
datasets:
- tedlium3
language:
- en
metrics:
- wer
---
### TedLium3 Zipformer

**`rnnt_type=regular`**

The WERs (%) are:

|                                    |     dev    |    test    | comment                                  |
|------------------------------------|------------|------------|------------------------------------------|
|          greedy search             | 6.74       | 6.16       | --epoch 50, --avg 22, --max-duration 500 |
|      beam search (beam size 4)     | 6.56       | 5.95       | --epoch 50, --avg 22, --max-duration 500 |
| modified beam search (beam size 4) | 6.54       | 6.00       | --epoch 50, --avg 22, --max-duration 500 |
| fast beam search (set as default)  | 6.91       | 6.28       | --epoch 50, --avg 22, --max-duration 500 |

The training command to reproduce these results is:

```
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./zipformer/train.py \
  --use-fp16 true \
  --world-size 4 \
  --num-epochs 50 \
  --start-epoch 0 \
  --exp-dir zipformer/exp \
  --max-duration 1000
```

The tensorboard training log can be found at
https://tensorboard.dev/experiment/AKXbJha0S9aXyfmuvG4h5A/#scalars

The decoding commands are:
```
epoch=50
avg=22

## greedy search
./zipformer/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir zipformer/exp \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  --max-duration 500

## beam search
./zipformer/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir zipformer/exp \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  --max-duration 500 \
  --decoding-method beam_search \
  --beam-size 4

## modified beam search
./zipformer/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir zipformer/exp \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  --max-duration 500 \
  --decoding-method modified_beam_search \
  --beam-size 4

## fast beam search
./zipformer/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir ./zipformer/exp \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  --max-duration 1500 \
  --decoding-method fast_beam_search \
  --beam 4 \
  --max-contexts 4 \
  --max-states 8
```
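
The beam search and modified beam search commands above differ only in `--decoding-method`, so they can be generated from one template. The following is a minimal sketch, not part of the recipe: `decode_cmd` is a hypothetical helper that only prints the command (a dry run), and it assumes the same paths and flags as the commands above.

```shell
epoch=50
avg=22

# Hypothetical helper: prints the decode command for one decoding method
# instead of running it, so the flags can be checked before launching.
decode_cmd() {
  method="$1"
  echo ./zipformer/decode.py \
    --epoch "$epoch" \
    --avg "$avg" \
    --exp-dir zipformer/exp \
    --bpe-model ./data/lang_bpe_500/bpe.model \
    --max-duration 500 \
    --decoding-method "$method" \
    --beam-size 4
}

# Print one command per method; fast beam search is left out because it
# takes different flags (--beam, --max-contexts, --max-states).
for method in beam_search modified_beam_search; do
  decode_cmd "$method"
done
```

To run the commands rather than print them, drop the `echo` inside `decode_cmd`.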

**`rnnt_type=modified`**

This model uses the code from PR https://github.com/k2-fsa/icefall/pull/1125.

The WERs (%) are:

|                                    |     dev    |    test    | comment                                  |
|------------------------------------|------------|------------|------------------------------------------|
|          greedy search             | 6.32       | 5.83       | --epoch 50, --avg 22, --max-duration 500 |
| modified beam search (beam size 4) | 6.16       | 5.79       | --epoch 50, --avg 22, --max-duration 500 |
| fast beam search (set as default)  | 6.30       | 5.89       | --epoch 50, --avg 22, --max-duration 500 |

The training command to reproduce these results is:

```
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./zipformer/train.py \
  --use-fp16 true \
  --world-size 4 \
  --num-epochs 50 \
  --start-epoch 0 \
  --exp-dir zipformer/exp \
  --max-duration 1000 \
  --rnnt-type modified
```

The tensorboard training log can be found at
https://tensorboard.dev/experiment/3d4bYmbJTGiWQQaW88CVEQ/#scalars

The decoding commands are the same as above.
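
To compare the two `rnnt_type` settings, the relative WER reduction can be computed directly from the tables above. This is a small illustrative sketch: `rel_reduction` is a hypothetical helper, and the numbers are copied from the two tables (regular first, modified second).

```shell
# Hypothetical helper: relative WER reduction in percent when going from
# the baseline WER ($1) to the new WER ($2).
rel_reduction() {
  awk -v b="$1" -v m="$2" 'BEGIN { printf "%.2f\n", (b - m) / b * 100 }'
}

# WERs taken from the tables above: regular -> modified.
echo "greedy search, dev:         $(rel_reduction 6.74 6.32)%"
echo "greedy search, test:        $(rel_reduction 6.16 5.83)%"
echo "modified beam search, dev:  $(rel_reduction 6.54 6.16)%"
echo "modified beam search, test: $(rel_reduction 6.00 5.79)%"
echo "fast beam search, dev:      $(rel_reduction 6.91 6.30)%"
echo "fast beam search, test:     $(rel_reduction 6.28 5.89)%"
```

A positive value means `rnnt_type=modified` has the lower WER for that decoding method and data split.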