w11wo committed on
Commit
9a835b2
1 Parent(s): b82f66a

Added Model

This view is limited to 50 files because it contains too many changes. See raw diff
Files changed (50)
  1. README.md +142 -0
  2. data/lang_phone/L.pt +3 -0
  3. data/lang_phone/L_disambig.pt +3 -0
  4. data/lang_phone/Linv.pt +3 -0
  5. data/lang_phone/lexicon.txt +32 -0
  6. data/lang_phone/lexicon_disambig.txt +32 -0
  7. data/lang_phone/tokens.txt +34 -0
  8. data/lang_phone/words.txt +36 -0
  9. exp/cpu_jit.pt +3 -0
  10. exp/decoder_jit_trace-pnnx.pt +3 -0
  11. exp/decoder_jit_trace.pt +3 -0
  12. exp/encoder_jit_trace-pnnx.pt +3 -0
  13. exp/encoder_jit_trace.pt +3 -0
  14. exp/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  15. exp/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  16. exp/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  17. exp/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2023-06-21-09-40-15 +45 -0
  18. exp/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  19. exp/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  20. exp/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  21. exp/fast_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
  22. exp/fast_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
  23. exp/fast_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
  24. exp/greedy_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  25. exp/greedy_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  26. exp/greedy_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  27. exp/greedy_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model-2023-06-21-09-39-14 +39 -0
  28. exp/greedy_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  29. exp/greedy_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  30. exp/greedy_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  31. exp/greedy_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
  32. exp/greedy_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
  33. exp/greedy_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
  34. exp/joiner_jit_trace-pnnx.pt +3 -0
  35. exp/joiner_jit_trace.pt +3 -0
  36. exp/modified_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  37. exp/modified_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  38. exp/modified_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  39. exp/modified_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model-2023-06-21-09-41-35 +55 -0
  40. exp/modified_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  41. exp/modified_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  42. exp/modified_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  43. exp/modified_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
  44. exp/modified_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
  45. exp/modified_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
  46. exp/pretrained.pt +3 -0
  47. exp/streaming/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
  48. exp/streaming/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
  49. exp/streaming/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
  50. exp/streaming/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model-2023-06-21-10-04-38 +136 -0
README.md CHANGED
@@ -1,3 +1,145 @@
  ---
+ language: id
  license: apache-2.0
+ tags:
+ - icefall
+ - phoneme-recognition
+ - automatic-speech-recognition
+ datasets:
+ - mozilla-foundation/common_voice_13_0
+ - indonesian-nlp/librivox-indonesia
+ - google/fleurs
  ---
+
+ # Pruned Stateless Zipformer RNN-T Streaming ID
+
+ Pruned Stateless Zipformer RNN-T Streaming ID is an automatic speech recognition model trained on the following datasets:
+
+ - [Common Voice ID](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0)
+ - [LibriVox Indonesia](https://huggingface.co/datasets/indonesian-nlp/librivox-indonesia)
+ - [FLEURS ID](https://huggingface.co/datasets/google/fleurs)
+
+ Instead of being trained to predict sequences of words, this model was trained to predict sequences of phonemes, e.g. `['p', 'ə', 'r', 'b', 'u', 'a', 't', 'a', 'n', 'ɲ', 'a']`. The model's [vocabulary](https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id/blob/main/data/lang_phone/tokens.txt) therefore contains the IPA phonemes used by [g2p ID](https://github.com/bookbot-kids/g2p_id).
+
+ This model was trained with the [icefall](https://github.com/k2-fsa/icefall) framework. All training was done on a Google Cloud Compute Engine VM with a Tesla A100 GPU. All scripts needed for training can be found in the [Files and versions](https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id/tree/main) tab, and the [Training metrics](https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id/tensorboard) were logged via TensorBoard.
+
+ ## Evaluation Results
+
+ ### Simulated Streaming
+
+ ```sh
+ for m in greedy_search fast_beam_search modified_beam_search; do
+   ./pruned_transducer_stateless7_streaming/decode.py \
+     --epoch 30 \
+     --avg 9 \
+     --exp-dir ./pruned_transducer_stateless7_streaming/exp \
+     --max-duration 600 \
+     --decode-chunk-len 32 \
+     --decoding-method $m
+ done
+ ```
+
+ The model achieves the following phoneme error rates on the different test sets:
+
+ | Decoding             | LibriVox | FLEURS | Common Voice |
+ | -------------------- | :------: | :----: | :----------: |
+ | Greedy Search        |  4.87%   | 11.45% |    14.97%    |
+ | Modified Beam Search |  4.71%   | 11.25% |    14.31%    |
+ | Fast Beam Search     |  4.85%   | 12.55% |    14.89%    |
+
+ ### Chunk-wise Streaming
+
+ ```sh
+ for m in greedy_search fast_beam_search modified_beam_search; do
+   ./pruned_transducer_stateless7_streaming/streaming_decode.py \
+     --epoch 30 \
+     --avg 9 \
+     --exp-dir ./pruned_transducer_stateless7_streaming/exp \
+     --decoding-method $m \
+     --decode-chunk-len 32 \
+     --num-decode-streams 1500
+ done
+ ```
+
+ The model achieves the following phoneme error rates on the different test sets:
+
+ | Decoding             | LibriVox | FLEURS | Common Voice |
+ | -------------------- | :------: | :----: | :----------: |
+ | Greedy Search        |  5.12%   | 12.74% |    15.78%    |
+ | Modified Beam Search |  4.78%   | 11.83% |    14.54%    |
+ | Fast Beam Search     |  4.81%   | 12.93% |    14.96%    |
+
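The tables above report phoneme error rates (PER). As an illustrative sketch (not icefall's actual scorer, which additionally breaks the edits out into insertions, deletions, and substitutions), a PER can be computed as the Levenshtein distance between the reference and hypothesis phoneme sequences, divided by the reference length:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[m][n]


def per(ref, hyp):
    """Phoneme error rate: total edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


ref = ['p', 'ə', 'r', 'b', 'u', 'a', 't', 'a', 'n']
hyp = ['p', 'ə', 'r', 'b', 'u', 't', 'a', 'n']  # one phoneme deleted
print(f"{per(ref, hyp):.2%}")  # 1 edit / 9 phonemes = 11.11%
```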
+ ## Usage
+
+ ### Download Pre-trained Model
+
+ ```sh
+ cd egs/bookbot/ASR
+ mkdir tmp
+ cd tmp
+ git lfs install
+ git clone https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id
+ ```
+
+ ### Inference
+
+ To decode with greedy search, run:
+
+ ```sh
+ ./pruned_transducer_stateless7_streaming/jit_pretrained.py \
+   --nn-model-filename ./tmp/pruned-transducer-stateless7-streaming-id/exp/cpu_jit.pt \
+   --lang-dir ./tmp/pruned-transducer-stateless7-streaming-id/data/lang_phone \
+   ./tmp/pruned-transducer-stateless7-streaming-id/test_waves/sample1.wav
+ ```
+
+ <details>
+ <summary>Decoding Output</summary>
+
+ ```
+ 2023-06-21 10:19:18,563 INFO [jit_pretrained.py:217] device: cpu
+ 2023-06-21 10:19:19,231 INFO [lexicon.py:168] Loading pre-compiled tmp/pruned-transducer-stateless7-streaming-id/data/lang_phone/Linv.pt
+ 2023-06-21 10:19:19,232 INFO [jit_pretrained.py:228] Constructing Fbank computer
+ 2023-06-21 10:19:19,233 INFO [jit_pretrained.py:238] Reading sound files: ['./tmp/pruned-transducer-stateless7-streaming-id/test_waves/sample1.wav']
+ 2023-06-21 10:19:19,234 INFO [jit_pretrained.py:244] Decoding started
+ 2023-06-21 10:19:20,090 INFO [jit_pretrained.py:271]
+ ./tmp/pruned-transducer-stateless7-streaming-id/test_waves/sample1.wav:
+ p u l a ŋ | s ə k o l a h | p i t ə r i | s a ŋ a t | l a p a r
+
+ 2023-06-21 10:19:20,090 INFO [jit_pretrained.py:273] Decoding Done
+ ```
+
+ </details>
+
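In the decoding output above, the model emits a flat phoneme sequence in which `|` marks word boundaries. A small illustrative helper (hypothetical, not part of the repo) that regroups such output into per-word phoneme lists:

```python
def group_words(decoded: str):
    """Split a space-separated phoneme string into per-word phoneme lists,
    treating "|" as the word boundary token."""
    words, current = [], []
    for token in decoded.split():
        if token == "|":  # word boundary
            if current:
                words.append(current)
            current = []
        else:
            current.append(token)
    if current:
        words.append(current)
    return words


out = "p u l a ŋ | s ə k o l a h | p i t ə r i | s a ŋ a t | l a p a r"
print(group_words(out))
# [['p', 'u', 'l', 'a', 'ŋ'], ['s', 'ə', 'k', 'o', 'l', 'a', 'h'], ...]
```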
+ ## Training procedure
+
+ ### Install icefall
+
+ ```sh
+ git clone https://github.com/bookbot-hive/icefall
+ cd icefall
+ export PYTHONPATH=`pwd`:$PYTHONPATH
+ ```
+
+ ### Prepare Data
+
+ ```sh
+ cd egs/bookbot_id/ASR
+ ./prepare.sh
+ ```
+
+ ### Train
+
+ ```sh
+ export CUDA_VISIBLE_DEVICES="0"
+ ./pruned_transducer_stateless7_streaming/train.py \
+   --num-epochs 30 \
+   --use-fp16 1 \
+   --max-duration 400
+ ```
+
+ ## Frameworks
+
+ - [k2](https://github.com/k2-fsa/k2)
+ - [icefall](https://github.com/bookbot-hive/icefall)
+ - [lhotse](https://github.com/bookbot-hive/lhotse)
data/lang_phone/L.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e67299c15c8faa128dd7317d652619b51f28b431cec64fd3b8338daf9762fc4
+ size 1551
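The `.pt` files in this commit are stored as Git LFS pointer files like the one above (three `key value` lines: spec version, content hash, and size in bytes), not as the raw weights. A minimal, illustrative parser for this format (the `parse_lfs_pointer` helper is hypothetical, not part of the repo):

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5e67299c15c8faa128dd7317d652619b51f28b431cec64fd3b8338daf9762fc4
size 1551
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a {key: value} dict."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


info = parse_lfs_pointer(POINTER)
print(info["size"])  # 1551 (bytes)
# info["oid"] holds "<algorithm>:<hex digest>", e.g. "sha256:5e67..."
```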
data/lang_phone/L_disambig.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42d1a58e242b3f7799fffda803fa17ada3112ae71be2556665c910051d25a7d7
+ size 1715
data/lang_phone/Linv.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88935261f84d15c344a6adc9ac289b6d58acd18085a6900d5e5124866b5dc0ee
+ size 1627
data/lang_phone/lexicon.txt ADDED
@@ -0,0 +1,32 @@
+ a a
+ b b
+ d d
+ dʒ dʒ
+ e e
+ f f
+ h h
+ i i
+ j j
+ k k
+ l l
+ m m
+ n n
+ o o
+ p p
+ r r
+ s s
+ t t
+ tʃ tʃ
+ u u
+ v v
+ w w
+ x x
+ z z
+ | |
+ ŋ ŋ
+ ə ə
+ ɡ ɡ
+ ɲ ɲ
+ ʃ ʃ
+ ʔ ʔ
+ <UNK> <UNK>
data/lang_phone/lexicon_disambig.txt ADDED
@@ -0,0 +1,32 @@
+ a a
+ b b
+ d d
+ dʒ dʒ
+ e e
+ f f
+ h h
+ i i
+ j j
+ k k
+ l l
+ m m
+ n n
+ o o
+ p p
+ r r
+ s s
+ t t
+ tʃ tʃ
+ u u
+ v v
+ w w
+ x x
+ z z
+ | |
+ ŋ ŋ
+ ə ə
+ ɡ ɡ
+ ɲ ɲ
+ ʃ ʃ
+ ʔ ʔ
+ <UNK> <UNK>
data/lang_phone/tokens.txt ADDED
@@ -0,0 +1,34 @@
+ <eps> 0
+ ɡ 1
+ o 2
+ d 3
+ ʃ 4
+ v 5
+ t 6
+ <UNK> 7
+ x 8
+ r 9
+ ʔ 10
+ b 11
+ s 12
+ p 13
+ i 14
+ dʒ 15
+ | 16
+ ə 17
+ z 18
+ f 19
+ n 20
+ m 21
+ ɲ 22
+ tʃ 23
+ ŋ 24
+ k 25
+ j 26
+ l 27
+ h 28
+ w 29
+ a 30
+ u 31
+ e 32
+ #0 33
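`tokens.txt` above maps each symbol (the IPA phonemes plus `<eps>`, `<UNK>`, the word separator `|`, and the disambiguation symbol `#0`) to an integer id, one `<symbol> <id>` pair per line. A minimal, illustrative parser for this format (the `load_tokens` helper is hypothetical, not an icefall API):

```python
# A short excerpt of the tokens.txt format shown above.
TOKENS_TXT = """\
<eps> 0
ɡ 1
o 2
<UNK> 7
| 16
a 30
#0 33
"""


def load_tokens(text: str) -> dict:
    """Parse "<symbol> <id>" lines into a {symbol: id} dict."""
    token2id = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right: the symbol itself may be multi-char (e.g. "dʒ").
        symbol, idx = line.rsplit(maxsplit=1)
        token2id[symbol] = int(idx)
    return token2id


token2id = load_tokens(TOKENS_TXT)
print(token2id["<UNK>"])  # 7
```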
data/lang_phone/words.txt ADDED
@@ -0,0 +1,36 @@
+ <eps> 0
+ <UNK> 1
+ a 2
+ b 3
+ d 4
+ dʒ 5
+ e 6
+ f 7
+ h 8
+ i 9
+ j 10
+ k 11
+ l 12
+ m 13
+ n 14
+ o 15
+ p 16
+ r 17
+ s 18
+ t 19
+ tʃ 20
+ u 21
+ v 22
+ w 23
+ x 24
+ z 25
+ | 26
+ ŋ 27
+ ə 28
+ ɡ 29
+ ɲ 30
+ ʃ 31
+ ʔ 32
+ #0 33
+ <s> 34
+ </s> 35
exp/cpu_jit.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1584f55881aead89f3bdd8d7dab007479a61e5cbf4eff83a4b95a68eba2b9160
+ size 354961726
exp/decoder_jit_trace-pnnx.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94e7e3bb9002ab8808c9d194a0cea7bb8bf1526f6ca0d8dcf9dcfd52229e4709
+ size 89773
exp/decoder_jit_trace.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a924947cdac6dd4d74cea0d5976637ed57c01950c543ba77f9417d3e5f35e23
+ size 89590
exp/encoder_jit_trace-pnnx.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f3e371e7c9fdfb44343e037fbfe7e4e1404a3d8e421ac17ddacbb58e3983a9d
+ size 278155657
exp/encoder_jit_trace.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d25b36b251e67544a850505a1655f3e26e1f309e43bc51f5ee10a7c510125ed7
+ size 354193226
exp/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2023-06-21-09-40-15 ADDED
@@ -0,0 +1,45 @@
+ 2023-06-21 09:40:15,150 INFO [decode.py:654] Decoding started
+ 2023-06-21 09:40:15,151 INFO [decode.py:660] Device: cuda:0
+ 2023-06-21 09:40:15,152 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+ 2023-06-21 09:40:15,155 INFO [decode.py:668] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'fast_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/fast_beam_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
+ 2023-06-21 09:40:15,155 INFO [decode.py:670] About to create model
+ 2023-06-21 09:40:15,733 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+ 2023-06-21 09:40:15,737 INFO [decode.py:741] Calculating the averaged model over epoch range from 21 (excluded) to 30
+ 2023-06-21 09:40:19,291 INFO [decode.py:774] Number of model parameters: 69471350
+ 2023-06-21 09:40:19,291 INFO [multidataset.py:122] About to get LibriVox test cuts
+ 2023-06-21 09:40:19,291 INFO [multidataset.py:124] Loading LibriVox in lazy mode
+ 2023-06-21 09:40:19,292 INFO [multidataset.py:133] About to get FLEURS test cuts
+ 2023-06-21 09:40:19,292 INFO [multidataset.py:135] Loading FLEURS in lazy mode
+ 2023-06-21 09:40:19,292 INFO [multidataset.py:144] About to get Common Voice test cuts
+ 2023-06-21 09:40:19,292 INFO [multidataset.py:146] Loading Common Voice in lazy mode
+ 2023-06-21 09:40:22,208 INFO [decode.py:565] batch 0/?, cuts processed until now is 44
+ 2023-06-21 09:40:28,732 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2023-06-21 09:40:28,779 INFO [utils.py:561] [test-librivox-beam_20.0_max_contexts_8_max_states_64] %WER 4.85% [1773 / 36594, 295 ins, 904 del, 574 sub ]
+ 2023-06-21 09:40:28,860 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2023-06-21 09:40:28,860 INFO [decode.py:604]
+ For test-librivox, WER of different settings are:
+ beam_20.0_max_contexts_8_max_states_64 4.85 best for test-librivox
+
+ 2023-06-21 09:40:30,839 INFO [decode.py:565] batch 0/?, cuts processed until now is 38
+ 2023-06-21 09:41:00,055 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2023-06-21 09:41:00,146 INFO [utils.py:561] [test-fleurs-beam_20.0_max_contexts_8_max_states_64] %WER 12.55% [11748 / 93580, 1672 ins, 5414 del, 4662 sub ]
+ 2023-06-21 09:41:00,362 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2023-06-21 09:41:00,362 INFO [decode.py:604]
+ For test-fleurs, WER of different settings are:
+ beam_20.0_max_contexts_8_max_states_64 12.55 best for test-fleurs
+
+ 2023-06-21 09:41:01,414 INFO [zipformer.py:2441] attn_weights_entropy = tensor([1.1632, 1.0353, 1.2741, 0.9735, 1.1847, 1.2830, 1.1450, 1.0967],
+ device='cuda:0'), covar=tensor([0.0547, 0.0601, 0.0483, 0.0755, 0.0373, 0.0368, 0.0490, 0.0569],
+ device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0019, 0.0019, 0.0021, 0.0018, 0.0017, 0.0019, 0.0019],
+ device='cuda:0'), out_proj_covar=tensor([1.3702e-05, 1.4294e-05, 1.3432e-05, 1.4389e-05, 1.2265e-05, 1.4168e-05,
+ 1.2323e-05, 1.3747e-05], device='cuda:0')
+ 2023-06-21 09:41:02,049 INFO [decode.py:565] batch 0/?, cuts processed until now is 121
+ 2023-06-21 09:41:22,562 INFO [decode.py:565] batch 20/?, cuts processed until now is 2809
+ 2023-06-21 09:41:31,340 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2023-06-21 09:41:31,464 INFO [utils.py:561] [test-commonvoice-beam_20.0_max_contexts_8_max_states_64] %WER 14.89% [19770 / 132787, 2851 ins, 9210 del, 7709 sub ]
+ 2023-06-21 09:41:31,757 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2023-06-21 09:41:31,757 INFO [decode.py:604]
+ For test-commonvoice, WER of different settings are:
+ beam_20.0_max_contexts_8_max_states_64 14.89 best for test-commonvoice
+
+ 2023-06-21 09:41:31,758 INFO [decode.py:809] Done!
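Each decode log above summarizes a test set in a `%WER ...` line, e.g. `%WER 4.85% [1773 / 36594, 295 ins, 904 del, 574 sub ]` (insertions + deletions + substitutions = total errors). A small illustrative regex (hypothetical, not an icefall utility) for pulling those statistics out of a log line:

```python
import re

LINE = ("[test-librivox-beam_20.0_max_contexts_8_max_states_64] "
        "%WER 4.85% [1773 / 36594, 295 ins, 904 del, 574 sub ]")

# Named groups for the rate, error counts, and reference token count.
PATTERN = re.compile(
    r"%WER (?P<wer>[\d.]+)% "
    r"\[(?P<errs>\d+) / (?P<total>\d+), "
    r"(?P<ins>\d+) ins, (?P<dels>\d+) del, (?P<subs>\d+) sub \]"
)

m = PATTERN.search(LINE)
stats = {k: float(v) if k == "wer" else int(v) for k, v in m.groupdict().items()}
print(stats)
# e.g. {'wer': 4.85, 'errs': 1773, 'total': 36594, 'ins': 295, 'dels': 904, 'subs': 574}
```

Note that `ins + del + sub` equals the total error count, a quick sanity check on any parsed line.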
exp/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/fast_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ beam_20.0_max_contexts_8_max_states_64 14.89
exp/fast_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ beam_20.0_max_contexts_8_max_states_64 12.55
exp/fast_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ beam_20.0_max_contexts_8_max_states_64 4.85
exp/greedy_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/greedy_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/greedy_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/greedy_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model-2023-06-21-09-39-14 ADDED
@@ -0,0 +1,39 @@
+ 2023-06-21 09:39:14,130 INFO [decode.py:654] Decoding started
+ 2023-06-21 09:39:14,130 INFO [decode.py:660] Device: cuda:0
+ 2023-06-21 09:39:14,131 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+ 2023-06-21 09:39:14,134 INFO [decode.py:668] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'greedy_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/greedy_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
+ 2023-06-21 09:39:14,135 INFO [decode.py:670] About to create model
+ 2023-06-21 09:39:14,915 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+ 2023-06-21 09:39:14,921 INFO [decode.py:741] Calculating the averaged model over epoch range from 21 (excluded) to 30
+ 2023-06-21 09:39:20,667 INFO [decode.py:774] Number of model parameters: 69471350
+ 2023-06-21 09:39:20,668 INFO [multidataset.py:122] About to get LibriVox test cuts
+ 2023-06-21 09:39:20,668 INFO [multidataset.py:124] Loading LibriVox in lazy mode
+ 2023-06-21 09:39:20,671 INFO [multidataset.py:133] About to get FLEURS test cuts
+ 2023-06-21 09:39:20,671 INFO [multidataset.py:135] Loading FLEURS in lazy mode
+ 2023-06-21 09:39:20,673 INFO [multidataset.py:144] About to get Common Voice test cuts
+ 2023-06-21 09:39:20,673 INFO [multidataset.py:146] Loading Common Voice in lazy mode
+ 2023-06-21 09:39:24,965 INFO [decode.py:565] batch 0/?, cuts processed until now is 44
+ 2023-06-21 09:39:29,616 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/greedy_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2023-06-21 09:39:29,662 INFO [utils.py:561] [test-librivox-greedy_search] %WER 4.87% [1783 / 36594, 317 ins, 868 del, 598 sub ]
+ 2023-06-21 09:39:29,742 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/greedy_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2023-06-21 09:39:29,742 INFO [decode.py:604]
+ For test-librivox, WER of different settings are:
+ greedy_search 4.87 best for test-librivox
+
+ 2023-06-21 09:39:31,511 INFO [decode.py:565] batch 0/?, cuts processed until now is 38
+ 2023-06-21 09:39:50,011 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/greedy_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2023-06-21 09:39:50,138 INFO [utils.py:561] [test-fleurs-greedy_search] %WER 11.45% [10718 / 93580, 1850 ins, 3733 del, 5135 sub ]
+ 2023-06-21 09:39:50,453 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/greedy_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2023-06-21 09:39:50,453 INFO [decode.py:604]
+ For test-fleurs, WER of different settings are:
+ greedy_search 11.45 best for test-fleurs
+
+ 2023-06-21 09:39:52,522 INFO [decode.py:565] batch 0/?, cuts processed until now is 121
+ 2023-06-21 09:40:11,369 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/greedy_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2023-06-21 09:40:11,489 INFO [utils.py:561] [test-commonvoice-greedy_search] %WER 14.97% [19873 / 132787, 3792 ins, 7589 del, 8492 sub ]
+ 2023-06-21 09:40:11,787 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/greedy_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2023-06-21 09:40:11,788 INFO [decode.py:604]
+ For test-commonvoice, WER of different settings are:
+ greedy_search 14.97 best for test-commonvoice
+
+ 2023-06-21 09:40:11,788 INFO [decode.py:809] Done!
exp/greedy_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/greedy_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/greedy_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/greedy_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings	WER
+ greedy_search	14.97
exp/greedy_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings	WER
+ greedy_search	11.45
exp/greedy_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings	WER
+ greedy_search	4.87
exp/joiner_jit_trace-pnnx.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b2772b338d03c7ebea5247337cf50fffb91a7950c351622a320ad4fc38b393ec
+ size 1914564
exp/joiner_jit_trace.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1ffb093a638ecdd5a015aff5d9c6ae62a7dddc815e18dd46ca19a46976367ce
+ size 1914479
exp/modified_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/modified_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/modified_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/modified_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model-2023-06-21-09-41-35 ADDED
@@ -0,0 +1,55 @@
+ 2023-06-21 09:41:35,276 INFO [decode.py:654] Decoding started
+ 2023-06-21 09:41:35,276 INFO [decode.py:660] Device: cuda:0
+ 2023-06-21 09:41:35,277 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+ 2023-06-21 09:41:35,280 INFO [decode.py:668] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'modified_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/modified_beam_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
+ 2023-06-21 09:41:35,281 INFO [decode.py:670] About to create model
+ 2023-06-21 09:41:35,838 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+ 2023-06-21 09:41:35,843 INFO [decode.py:741] Calculating the averaged model over epoch range from 21 (excluded) to 30
+ 2023-06-21 09:41:39,380 INFO [decode.py:774] Number of model parameters: 69471350
+ 2023-06-21 09:41:39,380 INFO [multidataset.py:122] About to get LibriVox test cuts
+ 2023-06-21 09:41:39,380 INFO [multidataset.py:124] Loading LibriVox in lazy mode
+ 2023-06-21 09:41:39,381 INFO [multidataset.py:133] About to get FLEURS test cuts
+ 2023-06-21 09:41:39,381 INFO [multidataset.py:135] Loading FLEURS in lazy mode
+ 2023-06-21 09:41:39,381 INFO [multidataset.py:144] About to get Common Voice test cuts
+ 2023-06-21 09:41:39,381 INFO [multidataset.py:146] Loading Common Voice in lazy mode
+ 2023-06-21 09:41:43,886 INFO [decode.py:565] batch 0/?, cuts processed until now is 44
+ 2023-06-21 09:41:46,269 INFO [zipformer.py:2441] attn_weights_entropy = tensor([1.3801, 1.7156, 1.0930, 1.5632, 1.3604, 1.3437, 1.7393, 0.6970],
+ device='cuda:0'), covar=tensor([0.4497, 0.2012, 0.2669, 0.2689, 0.2707, 0.2909, 0.1440, 0.5122],
+ device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0053, 0.0059, 0.0067, 0.0065, 0.0064, 0.0051, 0.0077],
+ device='cuda:0'), out_proj_covar=tensor([5.5637e-05, 3.5992e-05, 4.1115e-05, 4.8266e-05, 4.8700e-05, 4.4501e-05,
+ 3.4417e-05, 7.3250e-05], device='cuda:0')
+ 2023-06-21 09:42:00,403 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/modified_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2023-06-21 09:42:00,449 INFO [utils.py:561] [test-librivox-beam_size_4] %WER 4.71% [1725 / 36594, 309 ins, 836 del, 580 sub ]
+ 2023-06-21 09:42:00,531 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/modified_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2023-06-21 09:42:00,531 INFO [decode.py:604]
+ For test-librivox, WER of different settings are:
+ beam_size_4	4.71	best for test-librivox
+
+ 2023-06-21 09:42:01,464 INFO [zipformer.py:2441] attn_weights_entropy = tensor([2.1911, 1.2934, 2.0949, 2.2245, 2.1813, 2.1569, 1.7841, 1.7188],
+ device='cuda:0'), covar=tensor([0.1696, 0.4060, 0.1661, 0.1975, 0.1970, 0.2132, 0.1748, 0.3224],
+ device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0040, 0.0028, 0.0028, 0.0029, 0.0030, 0.0027, 0.0034],
+ device='cuda:0'), out_proj_covar=tensor([1.8266e-05, 3.2097e-05, 1.7461e-05, 1.6755e-05, 1.8651e-05, 1.9838e-05,
+ 1.5794e-05, 2.3433e-05], device='cuda:0')
+ 2023-06-21 09:42:04,999 INFO [decode.py:565] batch 0/?, cuts processed until now is 38
+ 2023-06-21 09:43:09,460 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/modified_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2023-06-21 09:43:09,552 INFO [utils.py:561] [test-fleurs-beam_size_4] %WER 11.25% [10525 / 93580, 1811 ins, 3811 del, 4903 sub ]
+ 2023-06-21 09:43:09,853 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/modified_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2023-06-21 09:43:09,853 INFO [decode.py:604]
+ For test-fleurs, WER of different settings are:
+ beam_size_4	11.25	best for test-fleurs
+
+ 2023-06-21 09:43:14,023 INFO [decode.py:565] batch 0/?, cuts processed until now is 121
+ 2023-06-21 09:43:47,394 INFO [zipformer.py:2441] attn_weights_entropy = tensor([2.5738, 2.5492, 3.0284, 2.4510, 1.3782, 3.0004, 2.8027, 1.4081],
+ device='cuda:0'), covar=tensor([0.1153, 0.1301, 0.0459, 0.0990, 0.4808, 0.0525, 0.0757, 0.4425],
+ device='cuda:0'), in_proj_covar=tensor([0.0071, 0.0071, 0.0055, 0.0070, 0.0106, 0.0057, 0.0058, 0.0105],
+ device='cuda:0'), out_proj_covar=tensor([5.9638e-05, 6.0235e-05, 4.2007e-05, 5.4275e-05, 1.0845e-04, 4.2491e-05,
+ 4.5487e-05, 9.8369e-05], device='cuda:0')
+ 2023-06-21 09:44:30,935 INFO [decode.py:565] batch 20/?, cuts processed until now is 2809
+ 2023-06-21 09:44:57,467 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/modified_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2023-06-21 09:44:57,589 INFO [utils.py:561] [test-commonvoice-beam_size_4] %WER 14.31% [19002 / 132787, 3318 ins, 7575 del, 8109 sub ]
+ 2023-06-21 09:44:57,887 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/modified_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2023-06-21 09:44:57,888 INFO [decode.py:604]
+ For test-commonvoice, WER of different settings are:
+ beam_size_4	14.31	best for test-commonvoice
+
+ 2023-06-21 09:44:57,888 INFO [decode.py:809] Done!
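The log line "Calculating the averaged model over epoch range from 21 (excluded) to 30" means the checkpoints for epochs 22 through 30 are combined, i.e. avg=9. As a rough illustration only (icefall's `--use-averaged-model` path computes this from stored running averages rather than by reloading every checkpoint), uniform parameter averaging looks like:

```python
# Toy sketch of uniform checkpoint averaging: each "checkpoint" is a
# plain list of floats standing in for its parameter tensors.
# Epochs 22..30 give 9 checkpoints, matching "--epoch 30 --avg 9"
# with epoch 21 excluded.
checkpoints = {
    epoch: [float(epoch), float(epoch) * 2.0]  # toy "parameters"
    for epoch in range(22, 31)
}

n = len(checkpoints)  # number of checkpoints being averaged
averaged = [
    sum(params[i] for params in checkpoints.values()) / n
    for i in range(2)
]
print(n)         # 9
print(averaged)  # [26.0, 52.0] -> element-wise mean over epochs 22..30
```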
exp/modified_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/modified_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/modified_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/modified_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings	WER
+ beam_size_4	14.31
exp/modified_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings	WER
+ beam_size_4	11.25
exp/modified_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings	WER
+ beam_size_4	4.71
exp/pretrained.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d7fb8734cd4c8edd2c360ad93343bbbb755b3195eb27e2871e37dc7be6293a4f
+ size 278176561
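The `exp/*.pt` entries in this commit are Git LFS pointer files, not the weights themselves: each is a tiny key-value text file (`version`, `oid`, `size`) that tells Git LFS which blob to fetch. A minimal parser sketch, using the `pretrained.pt` pointer above:

```python
# Parse a Git LFS v1 pointer file: each line is "key value", so we
# split on the first space and collect the pairs into a dict.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d7fb8734cd4c8edd2c360ad93343bbbb755b3195eb27e2871e37dc7be6293a4f
size 278176561
"""

def parse_lfs_pointer(text: str) -> dict:
    """Turn a Git LFS pointer's 'key value' lines into a dict."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

pointer = parse_lfs_pointer(pointer_text)
print(pointer["size"])  # "278176561" -> the real checkpoint is ~278 MB
```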
exp/streaming/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/streaming/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/streaming/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp/streaming/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model-2023-06-21-10-04-38 ADDED
@@ -0,0 +1,136 @@
+ 2023-06-21 10:04:38,023 INFO [streaming_decode.py:483] Decoding started
+ 2023-06-21 10:04:38,023 INFO [streaming_decode.py:489] Device: cuda:0
+ 2023-06-21 10:04:38,024 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+ 2023-06-21 10:04:38,027 INFO [streaming_decode.py:497] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'fast_beam_search', 'num_active_paths': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 32, 'context_size': 2, 'num_decode_streams': 1500, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
+ 2023-06-21 10:04:38,027 INFO [streaming_decode.py:499] About to create model
+ 2023-06-21 10:04:38,604 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+ 2023-06-21 10:04:38,608 INFO [streaming_decode.py:566] Calculating the averaged model over epoch range from 21 (excluded) to 30
+ 2023-06-21 10:04:42,203 INFO [streaming_decode.py:588] Number of model parameters: 69471350
+ 2023-06-21 10:04:42,204 INFO [multidataset.py:122] About to get LibriVox test cuts
+ 2023-06-21 10:04:42,204 INFO [multidataset.py:124] Loading LibriVox in lazy mode
+ 2023-06-21 10:04:42,204 INFO [multidataset.py:133] About to get FLEURS test cuts
+ 2023-06-21 10:04:42,204 INFO [multidataset.py:135] Loading FLEURS in lazy mode
+ 2023-06-21 10:04:42,205 INFO [multidataset.py:144] About to get Common Voice test cuts
+ 2023-06-21 10:04:42,205 INFO [multidataset.py:146] Loading Common Voice in lazy mode
+ 2023-06-21 10:04:42,471 INFO [streaming_decode.py:380] Cuts processed until now is 0.
+ 2023-06-21 10:04:42,786 INFO [streaming_decode.py:380] Cuts processed until now is 50.
+ 2023-06-21 10:04:43,098 INFO [streaming_decode.py:380] Cuts processed until now is 100.
+ 2023-06-21 10:04:43,444 INFO [streaming_decode.py:380] Cuts processed until now is 150.
+ 2023-06-21 10:04:43,770 INFO [streaming_decode.py:380] Cuts processed until now is 200.
+ 2023-06-21 10:04:44,092 INFO [streaming_decode.py:380] Cuts processed until now is 250.
+ 2023-06-21 10:04:44,416 INFO [streaming_decode.py:380] Cuts processed until now is 300.
+ 2023-06-21 10:04:44,756 INFO [streaming_decode.py:380] Cuts processed until now is 350.
+ 2023-06-21 10:04:45,079 INFO [streaming_decode.py:380] Cuts processed until now is 400.
+ 2023-06-21 10:04:45,405 INFO [streaming_decode.py:380] Cuts processed until now is 450.
+ 2023-06-21 10:04:45,734 INFO [streaming_decode.py:380] Cuts processed until now is 500.
+ 2023-06-21 10:04:46,071 INFO [streaming_decode.py:380] Cuts processed until now is 550.
+ 2023-06-21 10:04:46,405 INFO [streaming_decode.py:380] Cuts processed until now is 600.
+ 2023-06-21 10:04:57,029 INFO [streaming_decode.py:425] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+ 2023-06-21 10:04:57,063 INFO [utils.py:561] [test-librivox-beam_4_max_contexts_4_max_states_32] %WER 4.81% [1759 / 36594, 280 ins, 892 del, 587 sub ]
+ 2023-06-21 10:04:57,144 INFO [streaming_decode.py:436] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+ 2023-06-21 10:04:57,145 INFO [streaming_decode.py:450]
+ For test-librivox, WER of different settings are:
+ beam_4_max_contexts_4_max_states_32	4.81	best for test-librivox
+
+ 2023-06-21 10:04:57,149 INFO [streaming_decode.py:380] Cuts processed until now is 0.
+ 2023-06-21 10:04:57,332 INFO [streaming_decode.py:380] Cuts processed until now is 50.
+ 2023-06-21 10:04:57,494 INFO [streaming_decode.py:380] Cuts processed until now is 100.
+ 2023-06-21 10:04:57,663 INFO [streaming_decode.py:380] Cuts processed until now is 150.
+ 2023-06-21 10:04:57,833 INFO [streaming_decode.py:380] Cuts processed until now is 200.
+ 2023-06-21 10:04:58,000 INFO [streaming_decode.py:380] Cuts processed until now is 250.
+ 2023-06-21 10:04:58,161 INFO [streaming_decode.py:380] Cuts processed until now is 300.
+ 2023-06-21 10:04:58,323 INFO [streaming_decode.py:380] Cuts processed until now is 350.
+ 2023-06-21 10:04:58,488 INFO [streaming_decode.py:380] Cuts processed until now is 400.
+ 2023-06-21 10:04:58,656 INFO [streaming_decode.py:380] Cuts processed until now is 450.
+ 2023-06-21 10:04:58,819 INFO [streaming_decode.py:380] Cuts processed until now is 500.
+ 2023-06-21 10:04:58,993 INFO [streaming_decode.py:380] Cuts processed until now is 550.
+ 2023-06-21 10:04:59,176 INFO [streaming_decode.py:380] Cuts processed until now is 600.
+ 2023-06-21 10:04:59,364 INFO [streaming_decode.py:380] Cuts processed until now is 650.
+ 2023-06-21 10:05:34,495 INFO [streaming_decode.py:425] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+ 2023-06-21 10:05:34,590 INFO [utils.py:561] [test-fleurs-beam_4_max_contexts_4_max_states_32] %WER 12.93% [12100 / 93580, 1706 ins, 5594 del, 4800 sub ]
+ 2023-06-21 10:05:34,813 INFO [streaming_decode.py:436] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+ 2023-06-21 10:05:34,814 INFO [streaming_decode.py:450]
+ For test-fleurs, WER of different settings are:
+ beam_4_max_contexts_4_max_states_32	12.93	best for test-fleurs
+
+ 2023-06-21 10:05:34,820 INFO [streaming_decode.py:380] Cuts processed until now is 0.
+ 2023-06-21 10:05:35,059 INFO [streaming_decode.py:380] Cuts processed until now is 50.
+ 2023-06-21 10:05:35,308 INFO [streaming_decode.py:380] Cuts processed until now is 100.
+ 2023-06-21 10:05:35,583 INFO [streaming_decode.py:380] Cuts processed until now is 150.
+ 2023-06-21 10:05:35,829 INFO [streaming_decode.py:380] Cuts processed until now is 200.
+ 2023-06-21 10:05:36,082 INFO [streaming_decode.py:380] Cuts processed until now is 250.
+ 2023-06-21 10:05:36,315 INFO [streaming_decode.py:380] Cuts processed until now is 300.
+ 2023-06-21 10:05:36,537 INFO [streaming_decode.py:380] Cuts processed until now is 350.
+ 2023-06-21 10:05:36,797 INFO [streaming_decode.py:380] Cuts processed until now is 400.
+ 2023-06-21 10:05:37,028 INFO [streaming_decode.py:380] Cuts processed until now is 450.
+ 2023-06-21 10:05:37,263 INFO [streaming_decode.py:380] Cuts processed until now is 500.
+ 2023-06-21 10:05:37,499 INFO [streaming_decode.py:380] Cuts processed until now is 550.
+ 2023-06-21 10:05:37,720 INFO [streaming_decode.py:380] Cuts processed until now is 600.
+ 2023-06-21 10:05:37,959 INFO [streaming_decode.py:380] Cuts processed until now is 650.
+ 2023-06-21 10:05:38,182 INFO [streaming_decode.py:380] Cuts processed until now is 700.
+ 2023-06-21 10:05:38,406 INFO [streaming_decode.py:380] Cuts processed until now is 750.
+ 2023-06-21 10:05:38,664 INFO [streaming_decode.py:380] Cuts processed until now is 800.
+ 2023-06-21 10:05:38,913 INFO [streaming_decode.py:380] Cuts processed until now is 850.
+ 2023-06-21 10:05:39,251 INFO [streaming_decode.py:380] Cuts processed until now is 900.
+ 2023-06-21 10:05:39,493 INFO [streaming_decode.py:380] Cuts processed until now is 950.
+ 2023-06-21 10:05:39,726 INFO [streaming_decode.py:380] Cuts processed until now is 1000.
+ 2023-06-21 10:05:39,959 INFO [streaming_decode.py:380] Cuts processed until now is 1050.
+ 2023-06-21 10:05:40,192 INFO [streaming_decode.py:380] Cuts processed until now is 1100.
+ 2023-06-21 10:05:40,436 INFO [streaming_decode.py:380] Cuts processed until now is 1150.
+ 2023-06-21 10:05:40,709 INFO [streaming_decode.py:380] Cuts processed until now is 1200.
+ 2023-06-21 10:05:40,959 INFO [streaming_decode.py:380] Cuts processed until now is 1250.
+ 2023-06-21 10:05:41,199 INFO [streaming_decode.py:380] Cuts processed until now is 1300.
+ 2023-06-21 10:05:41,448 INFO [streaming_decode.py:380] Cuts processed until now is 1350.
+ 2023-06-21 10:05:41,697 INFO [streaming_decode.py:380] Cuts processed until now is 1400.
+ 2023-06-21 10:05:41,938 INFO [streaming_decode.py:380] Cuts processed until now is 1450.
+ 2023-06-21 10:05:51,050 INFO [streaming_decode.py:380] Cuts processed until now is 1500.
+ 2023-06-21 10:05:53,941 INFO [streaming_decode.py:380] Cuts processed until now is 1550.
+ 2023-06-21 10:05:55,569 INFO [streaming_decode.py:380] Cuts processed until now is 1600.
+ 2023-06-21 10:05:55,799 INFO [streaming_decode.py:380] Cuts processed until now is 1650.
+ 2023-06-21 10:05:57,493 INFO [streaming_decode.py:380] Cuts processed until now is 1700.
+ 2023-06-21 10:05:57,735 INFO [streaming_decode.py:380] Cuts processed until now is 1750.
+ 2023-06-21 10:05:57,961 INFO [streaming_decode.py:380] Cuts processed until now is 1800.
+ 2023-06-21 10:05:59,694 INFO [streaming_decode.py:380] Cuts processed until now is 1850.
+ 2023-06-21 10:05:59,923 INFO [streaming_decode.py:380] Cuts processed until now is 1900.
+ 2023-06-21 10:06:00,151 INFO [streaming_decode.py:380] Cuts processed until now is 1950.
+ 2023-06-21 10:06:01,771 INFO [streaming_decode.py:380] Cuts processed until now is 2000.
+ 2023-06-21 10:06:01,997 INFO [streaming_decode.py:380] Cuts processed until now is 2050.
+ 2023-06-21 10:06:02,241 INFO [streaming_decode.py:380] Cuts processed until now is 2100.
+ 2023-06-21 10:06:02,465 INFO [streaming_decode.py:380] Cuts processed until now is 2150.
+ 2023-06-21 10:06:04,249 INFO [streaming_decode.py:380] Cuts processed until now is 2200.
+ 2023-06-21 10:06:04,478 INFO [streaming_decode.py:380] Cuts processed until now is 2250.
+ 2023-06-21 10:06:04,710 INFO [streaming_decode.py:380] Cuts processed until now is 2300.
+ 2023-06-21 10:06:06,461 INFO [streaming_decode.py:380] Cuts processed until now is 2350.
+ 2023-06-21 10:06:06,697 INFO [streaming_decode.py:380] Cuts processed until now is 2400.
+ 2023-06-21 10:06:06,931 INFO [streaming_decode.py:380] Cuts processed until now is 2450.
+ 2023-06-21 10:06:08,726 INFO [streaming_decode.py:380] Cuts processed until now is 2500.
+ 2023-06-21 10:06:08,950 INFO [streaming_decode.py:380] Cuts processed until now is 2550.
+ 2023-06-21 10:06:09,187 INFO [streaming_decode.py:380] Cuts processed until now is 2600.
+ 2023-06-21 10:06:10,940 INFO [streaming_decode.py:380] Cuts processed until now is 2650.
+ 2023-06-21 10:06:11,165 INFO [streaming_decode.py:380] Cuts processed until now is 2700.
+ 2023-06-21 10:06:12,942 INFO [streaming_decode.py:380] Cuts processed until now is 2750.
+ 2023-06-21 10:06:13,183 INFO [streaming_decode.py:380] Cuts processed until now is 2800.
+ 2023-06-21 10:06:14,919 INFO [streaming_decode.py:380] Cuts processed until now is 2850.
+ 2023-06-21 10:06:16,667 INFO [streaming_decode.py:380] Cuts processed until now is 2900.
+ 2023-06-21 10:06:18,270 INFO [streaming_decode.py:380] Cuts processed until now is 2950.
+ 2023-06-21 10:06:19,990 INFO [streaming_decode.py:380] Cuts processed until now is 3000.
+ 2023-06-21 10:06:20,222 INFO [streaming_decode.py:380] Cuts processed until now is 3050.
+ 2023-06-21 10:06:21,952 INFO [streaming_decode.py:380] Cuts processed until now is 3100.
+ 2023-06-21 10:06:22,202 INFO [streaming_decode.py:380] Cuts processed until now is 3150.
+ 2023-06-21 10:06:23,959 INFO [streaming_decode.py:380] Cuts processed until now is 3200.
+ 2023-06-21 10:06:24,183 INFO [streaming_decode.py:380] Cuts processed until now is 3250.
+ 2023-06-21 10:06:25,951 INFO [streaming_decode.py:380] Cuts processed until now is 3300.
+ 2023-06-21 10:06:26,203 INFO [streaming_decode.py:380] Cuts processed until now is 3350.
+ 2023-06-21 10:06:27,984 INFO [streaming_decode.py:380] Cuts processed until now is 3400.
+ 2023-06-21 10:06:28,228 INFO [streaming_decode.py:380] Cuts processed until now is 3450.
+ 2023-06-21 10:06:28,468 INFO [streaming_decode.py:380] Cuts processed until now is 3500.
+ 2023-06-21 10:06:30,266 INFO [streaming_decode.py:380] Cuts processed until now is 3550.
+ 2023-06-21 10:06:30,497 INFO [streaming_decode.py:380] Cuts processed until now is 3600.
+ 2023-06-21 10:06:45,693 INFO [streaming_decode.py:425] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+ 2023-06-21 10:06:45,825 INFO [utils.py:561] [test-commonvoice-beam_4_max_contexts_4_max_states_32] %WER 14.96% [19859 / 132787, 3004 ins, 8788 del, 8067 sub ]
+ 2023-06-21 10:06:46,126 INFO [streaming_decode.py:436] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+ 2023-06-21 10:06:46,126 INFO [streaming_decode.py:450]
+ For test-commonvoice, WER of different settings are:
+ beam_4_max_contexts_4_max_states_32	14.96	best for test-commonvoice
+
+ 2023-06-21 10:06:46,127 INFO [streaming_decode.py:618] Done!
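Taken together, the wer-summary files in this commit allow a quick comparison of decoding methods per test set. A small sketch (WER values transcribed from the summaries above; the fast_beam_search figures are from the simulated-streaming run with beam 4, max-contexts 4, max-states 32):

```python
# WER (%) per test set and decoding method, transcribed from the
# wer-summary files in this commit; pick the best method per test set.
wer = {
    "test-librivox":    {"greedy_search": 4.87,  "modified_beam_search": 4.71,  "fast_beam_search": 4.81},
    "test-fleurs":      {"greedy_search": 11.45, "modified_beam_search": 11.25, "fast_beam_search": 12.93},
    "test-commonvoice": {"greedy_search": 14.97, "modified_beam_search": 14.31, "fast_beam_search": 14.96},
}

best = {
    test_set: min(methods, key=methods.get)  # method with lowest WER
    for test_set, methods in wer.items()
}
print(best)  # modified_beam_search has the lowest WER on all three sets
```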