peteralexandercharles csukuangfj committed on
Commit
2fe5632
0 Parent(s):

Duplicate from EuroPython2022/automatic-speech-recognition-with-next-gen-kaldi

Browse files
Files changed (44) hide show
  1. .gitattributes +27 -0
  2. README.md +14 -0
  3. app.py +331 -0
  4. decode.py +121 -0
  5. examples.py +256 -0
  6. giga-tokens.txt +500 -0
  7. model.py +585 -0
  8. requirements.txt +11 -0
  9. test_wavs/aidatatang_200zh/README.md +2 -0
  10. test_wavs/aidatatang_200zh/T0055G0036S0002.wav +0 -0
  11. test_wavs/aidatatang_200zh/T0055G0036S0003.wav +0 -0
  12. test_wavs/aidatatang_200zh/T0055G0036S0004.wav +0 -0
  13. test_wavs/aishell2/ID0012W0030.wav +0 -0
  14. test_wavs/aishell2/ID0012W0162.wav +0 -0
  15. test_wavs/aishell2/ID0012W0215.wav +0 -0
  16. test_wavs/aishell2/README.md +2 -0
  17. test_wavs/aishell2/trans.txt +3 -0
  18. test_wavs/arabic/a.wav +0 -0
  19. test_wavs/arabic/b.wav +0 -0
  20. test_wavs/arabic/c.wav +0 -0
  21. test_wavs/arabic/trans.txt +3 -0
  22. test_wavs/german/20120315-0900-PLENARY-14-de_20120315.wav +0 -0
  23. test_wavs/german/20170517-0900-PLENARY-16-de_20170517.wav +0 -0
  24. test_wavs/gigaspeech/1-minute-audiobook.opus +0 -0
  25. test_wavs/gigaspeech/100-seconds-podcast.opus +0 -0
  26. test_wavs/gigaspeech/100-seconds-youtube.opus +0 -0
  27. test_wavs/librispeech/1089-134686-0001.wav +0 -0
  28. test_wavs/librispeech/1221-135766-0001.wav +0 -0
  29. test_wavs/librispeech/1221-135766-0002.wav +0 -0
  30. test_wavs/librispeech/README.md +2 -0
  31. test_wavs/librispeech/trans.txt +3 -0
  32. test_wavs/tal_csasr/0.wav +0 -0
  33. test_wavs/tal_csasr/210_36476_210_8341_1_1533271973_7057520_132.wav +0 -0
  34. test_wavs/tal_csasr/210_36476_210_8341_1_1533271973_7057520_138.wav +0 -0
  35. test_wavs/tal_csasr/210_36476_210_8341_1_1533271973_7057520_145.wav +0 -0
  36. test_wavs/tal_csasr/README.md +2 -0
  37. test_wavs/tibetan/a_0_cacm-A70_31116.wav +0 -0
  38. test_wavs/tibetan/a_0_cacm-A70_31117.wav +0 -0
  39. test_wavs/tibetan/a_0_cacm-A70_31118.wav +0 -0
  40. test_wavs/tibetan/trans.txt +3 -0
  41. test_wavs/wenetspeech/DEV_T0000000000.opus +0 -0
  42. test_wavs/wenetspeech/DEV_T0000000001.opus +0 -0
  43. test_wavs/wenetspeech/DEV_T0000000002.opus +0 -0
  44. test_wavs/wenetspeech/README.md +2 -0
.gitattributes ADDED
@@ -0,0 +1,27 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ftz filter=lfs diff=lfs merge=lfs -text
6
+ *.gz filter=lfs diff=lfs merge=lfs -text
7
+ *.h5 filter=lfs diff=lfs merge=lfs -text
8
+ *.joblib filter=lfs diff=lfs merge=lfs -text
9
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
10
+ *.model filter=lfs diff=lfs merge=lfs -text
11
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
12
+ *.onnx filter=lfs diff=lfs merge=lfs -text
13
+ *.ot filter=lfs diff=lfs merge=lfs -text
14
+ *.parquet filter=lfs diff=lfs merge=lfs -text
15
+ *.pb filter=lfs diff=lfs merge=lfs -text
16
+ *.pt filter=lfs diff=lfs merge=lfs -text
17
+ *.pth filter=lfs diff=lfs merge=lfs -text
18
+ *.rar filter=lfs diff=lfs merge=lfs -text
19
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
20
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
21
+ *.tflite filter=lfs diff=lfs merge=lfs -text
22
+ *.tgz filter=lfs diff=lfs merge=lfs -text
23
+ *.wasm filter=lfs diff=lfs merge=lfs -text
24
+ *.xz filter=lfs diff=lfs merge=lfs -text
25
+ *.zip filter=lfs diff=lfs merge=lfs -text
26
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
27
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: Automatic Speech Recognition
3
+ emoji: 🌖
4
+ colorFrom: yellow
5
+ colorTo: green
6
+ sdk: gradio
7
+ sdk_version: 3.0.26
8
+ app_file: app.py
9
+ pinned: false
10
+ license: apache-2.0
11
+ duplicated_from: EuroPython2022/automatic-speech-recognition-with-next-gen-kaldi
12
+ ---
13
+
14
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,331 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ #
3
+ # Copyright 2022 Xiaomi Corp. (authors: Fangjun Kuang)
4
+ #
5
+ # See LICENSE for clarification regarding multiple authors
6
+ #
7
+ # Licensed under the Apache License, Version 2.0 (the "License");
8
+ # you may not use this file except in compliance with the License.
9
+ # You may obtain a copy of the License at
10
+ #
11
+ # http://www.apache.org/licenses/LICENSE-2.0
12
+ #
13
+ # Unless required by applicable law or agreed to in writing, software
14
+ # distributed under the License is distributed on an "AS IS" BASIS,
15
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16
+ # See the License for the specific language governing permissions and
17
+ # limitations under the License.
18
+
19
+ # References:
20
+ # https://gradio.app/docs/#dropdown
21
+
22
+ import logging
23
+ import os
24
+ import time
25
+ from datetime import datetime
26
+
27
+ import gradio as gr
28
+ import torch
29
+ import torchaudio
30
+
31
+ from examples import examples
32
+ from model import get_pretrained_model, language_to_models, sample_rate
33
+
34
+ languages = list(language_to_models.keys())
35
+
36
+
37
def convert_to_wav(in_filename: str) -> str:
    """Convert the given audio file to a 16 kHz wave file with ffmpeg.

    Args:
      in_filename:
        Path of the input audio file. Any format ffmpeg understands
        is accepted.
    Returns:
      Return the path of the generated wave file, which is the input
      path with ".wav" appended. The file is created as a side effect.
    """
    out_filename = in_filename + ".wav"
    logging.info(f"Converting '{in_filename}' to '{out_filename}'")
    # NOTE(review): the filename is interpolated into a shell command.
    # Gradio-generated temp paths are safe, but arbitrary user-chosen
    # names would not be -- consider subprocess.run with an arg list.
    ret = os.system(f"ffmpeg -hide_banner -i '{in_filename}' -ar 16000 '{out_filename}'")
    if ret != 0:
        # Keep the original best-effort contract (still return the
        # expected output path), but surface the failure instead of
        # silently discarding ffmpeg's exit status.
        logging.warning(f"ffmpeg exited with status {ret} for '{in_filename}'")
    return out_filename
43
+
44
+
45
def build_html_output(s: str, style: str = "result_item_success"):
    """Wrap *s* in the nested result <div> markup styled by the page CSS.

    Args:
      s:
        The HTML fragment to display.
      style:
        CSS class selecting the look of the inner box
        (success green or error red).
    """
    inner = f"<div class='result_item {style}'>"
    return (
        "\n"
        "    <div class='result'>\n"
        f"      {inner}\n"
        f"        {s}\n"
        "      </div>\n"
        "    </div>\n"
        "    "
    )
53
+
54
+
55
def process_uploaded_file(
    language: str,
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
    in_filename: str,
):
    """Recognize speech from a file uploaded in the "Upload from disk" tab.

    Returns a (text, html_info) pair; on error the text is empty and the
    HTML block carries the error message.
    """
    if not in_filename:
        # Nothing was uploaded yet -- tell the user what to do.
        msg = (
            "Please first upload a file and then click "
            'the button "submit for recognition"'
        )
        return "", build_html_output(msg, "result_item_error")

    logging.info(f"Processing uploaded file: {in_filename}")
    try:
        return process(
            in_filename=in_filename,
            language=language,
            repo_id=repo_id,
            decoding_method=decoding_method,
            num_active_paths=num_active_paths,
        )
    except Exception as e:
        # Report failures in the UI instead of crashing the app.
        logging.info(str(e))
        return "", build_html_output(str(e), "result_item_error")
81
+
82
+
83
def process_microphone(
    language: str,
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
    in_filename: str,
):
    """Recognize speech from a recording made in the microphone tab.

    Returns a (text, html_info) pair; on error the text is empty and the
    HTML block carries the error message.
    """
    if not in_filename:
        # No recording available yet -- walk the user through the steps.
        msg = (
            "Please first click 'Record from microphone', speak, "
            "click 'Stop recording', and then "
            "click the button 'submit for recognition'"
        )
        return "", build_html_output(msg, "result_item_error")

    logging.info(f"Processing microphone: {in_filename}")
    try:
        return process(
            in_filename=in_filename,
            language=language,
            repo_id=repo_id,
            decoding_method=decoding_method,
            num_active_paths=num_active_paths,
        )
    except Exception as e:
        # Report failures in the UI instead of crashing the app.
        logging.info(str(e))
        return "", build_html_output(str(e), "result_item_error")
110
+
111
+
112
@torch.no_grad()
def process(
    language: str,
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
    in_filename: str,
):
    """Run speech recognition on ``in_filename``.

    Args:
      language:
        Display language of the selected model (used only for logging).
      repo_id:
        Hugging Face repo id of the pretrained model to load.
      decoding_method:
        Either "greedy_search" or "modified_beam_search".
      num_active_paths:
        Beam size; used only for modified_beam_search.
      in_filename:
        Path of the audio file to transcribe.
    Returns:
      A (text, html_info) pair: the recognized text plus an HTML block
      with timing statistics (duration, processing time, RTF).
    """
    logging.info(f"language: {language}")
    logging.info(f"repo_id: {repo_id}")
    logging.info(f"decoding_method: {decoding_method}")
    logging.info(f"num_active_paths: {num_active_paths}")
    logging.info(f"in_filename: {in_filename}")

    filename = convert_to_wav(in_filename)

    now = datetime.now()
    date_time = now.strftime("%Y-%m-%d %H:%M:%S.%f")
    logging.info(f"Started at {date_time}")

    start = time.time()

    recognizer = get_pretrained_model(
        repo_id,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    s = recognizer.create_stream()

    s.accept_wave_file(filename)
    recognizer.decode_stream(s)

    text = s.result.text

    # Bug fix: take a *fresh* timestamp here. The original reused the
    # `now` captured before decoding, so the "Finished at" log line
    # actually showed the start time.
    date_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")
    end = time.time()

    metadata = torchaudio.info(filename)
    duration = metadata.num_frames / sample_rate
    rtf = (end - start) / duration

    logging.info(f"Finished at {date_time} s. Elapsed: {end - start: .3f} s")

    info = f"""
    Wave duration  : {duration: .3f} s <br/>
    Processing time: {end - start: .3f} s <br/>
    RTF: {end - start: .3f}/{duration: .3f} = {rtf:.3f} <br/>
    """
    if rtf > 1:
        # The first call downloads/loads the model, which dominates the
        # measured time; tell the user the RTF is not representative.
        info += (
            "<br/>We are loading the model for the first run. "
            "Please run again to measure the real RTF.<br/>"
        )

    logging.info(info)
    logging.info(f"\nrepo_id: {repo_id}\nhyp: {text}")

    return text, build_html_output(info)
170
+
171
+
172
+ title = "# Automatic Speech Recognition with Next-gen Kaldi"
173
+ description = """
174
+ This space shows how to do automatic speech recognition with Next-gen Kaldi.
175
+
176
+ It is running on CPU within a docker container provided by Hugging Face.
177
+
178
+ See more information by visiting the following links:
179
+
180
+ - <https://github.com/k2-fsa/icefall>
181
+ - <https://github.com/k2-fsa/sherpa>
182
+ - <https://github.com/k2-fsa/k2>
183
+ - <https://github.com/lhotse-speech/lhotse>
184
+
185
+ If you want to deploy it locally, please see
186
+ <https://k2-fsa.github.io/sherpa/>
187
+ """
188
+
189
+ # css style is copied from
190
+ # https://huggingface.co/spaces/alphacep/asr/blob/main/app.py#L113
191
+ css = """
192
+ .result {display:flex;flex-direction:column}
193
+ .result_item {padding:15px;margin-bottom:8px;border-radius:15px;width:100%}
194
+ .result_item_success {background-color:mediumaquamarine;color:white;align-self:start}
195
+ .result_item_error {background-color:#ff7070;color:white;align-self:start}
196
+ """
197
+
198
+
199
def update_model_dropdown(language: str):
    """Refresh the model dropdown after the language radio changes.

    Selects the first model of the newly chosen language.
    """
    try:
        choices = language_to_models[language]
    except KeyError:
        raise ValueError(f"Unsupported language: {language}") from None
    return gr.Dropdown.update(choices=choices, value=choices[0])
205
+
206
+
207
+ demo = gr.Blocks(css=css)
208
+
209
+
210
+ with demo:
211
+ gr.Markdown(title)
212
+ language_choices = list(language_to_models.keys())
213
+
214
+ language_radio = gr.Radio(
215
+ label="Language",
216
+ choices=language_choices,
217
+ value=language_choices[0],
218
+ )
219
+ model_dropdown = gr.Dropdown(
220
+ choices=language_to_models[language_choices[0]],
221
+ label="Select a model",
222
+ value=language_to_models[language_choices[0]][0],
223
+ )
224
+
225
+ language_radio.change(
226
+ update_model_dropdown,
227
+ inputs=language_radio,
228
+ outputs=model_dropdown,
229
+ )
230
+
231
+ decoding_method_radio = gr.Radio(
232
+ label="Decoding method",
233
+ choices=["greedy_search", "modified_beam_search"],
234
+ value="greedy_search",
235
+ )
236
+
237
+ num_active_paths_slider = gr.Slider(
238
+ minimum=1,
239
+ value=4,
240
+ step=1,
241
+ label="Number of active paths for modified_beam_search",
242
+ )
243
+
244
+ with gr.Tabs():
245
+ with gr.TabItem("Upload from disk"):
246
+ uploaded_file = gr.Audio(
247
+ source="upload", # Choose between "microphone", "upload"
248
+ type="filepath",
249
+ optional=False,
250
+ label="Upload from disk",
251
+ )
252
+ upload_button = gr.Button("Submit for recognition")
253
+ uploaded_output = gr.Textbox(label="Recognized speech from uploaded file")
254
+ uploaded_html_info = gr.HTML(label="Info")
255
+
256
+ gr.Examples(
257
+ examples=examples,
258
+ inputs=[
259
+ language_radio,
260
+ model_dropdown,
261
+ decoding_method_radio,
262
+ num_active_paths_slider,
263
+ uploaded_file,
264
+ ],
265
+ outputs=[uploaded_output, uploaded_html_info],
266
+ fn=process_uploaded_file,
267
+ )
268
+
269
+ with gr.TabItem("Record from microphone"):
270
+ microphone = gr.Audio(
271
+ source="microphone", # Choose between "microphone", "upload"
272
+ type="filepath",
273
+ optional=False,
274
+ label="Record from microphone",
275
+ )
276
+
277
+ record_button = gr.Button("Submit for recognition")
278
+ recorded_output = gr.Textbox(label="Recognized speech from recordings")
279
+ recorded_html_info = gr.HTML(label="Info")
280
+
281
+ gr.Examples(
282
+ examples=examples,
283
+ inputs=[
284
+ language_radio,
285
+ model_dropdown,
286
+ decoding_method_radio,
287
+ num_active_paths_slider,
288
+ microphone,
289
+ ],
290
+ outputs=[recorded_output, recorded_html_info],
291
+ fn=process_microphone,
292
+ )
293
+
294
+ upload_button.click(
295
+ process_uploaded_file,
296
+ inputs=[
297
+ language_radio,
298
+ model_dropdown,
299
+ decoding_method_radio,
300
+ num_active_paths_slider,
301
+ uploaded_file,
302
+ ],
303
+ outputs=[uploaded_output, uploaded_html_info],
304
+ )
305
+
306
+ record_button.click(
307
+ process_microphone,
308
+ inputs=[
309
+ language_radio,
310
+ model_dropdown,
311
+ decoding_method_radio,
312
+ num_active_paths_slider,
313
+ microphone,
314
+ ],
315
+ outputs=[recorded_output, recorded_html_info],
316
+ )
317
+ gr.Markdown(description)
318
+
319
+ torch.set_num_threads(1)
320
+ torch.set_num_interop_threads(1)
321
+
322
+ torch._C._jit_set_profiling_executor(False)
323
+ torch._C._jit_set_profiling_mode(False)
324
+ torch._C._set_graph_executor_optimize(False)
325
+
326
+ if __name__ == "__main__":
327
+ formatter = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
328
+
329
+ logging.basicConfig(format=formatter, level=logging.INFO)
330
+
331
+ demo.launch()
decode.py ADDED
@@ -0,0 +1,121 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright 2022 Xiaomi Corp. (authors: Fangjun Kuang)
2
+ #
3
+ # Copied from https://github.com/k2-fsa/sherpa/blob/master/sherpa/bin/conformer_rnnt/decode.py
4
+ #
5
+ # See LICENSE for clarification regarding multiple authors
6
+ #
7
+ # Licensed under the Apache License, Version 2.0 (the "License");
8
+ # you may not use this file except in compliance with the License.
9
+ # You may obtain a copy of the License at
10
+ #
11
+ # http://www.apache.org/licenses/LICENSE-2.0
12
+ #
13
+ # Unless required by applicable law or agreed to in writing, software
14
+ # distributed under the License is distributed on an "AS IS" BASIS,
15
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16
+ # See the License for the specific language governing permissions and
17
+ # limitations under the License.
18
+
19
+ import math
20
+ from typing import List
21
+
22
+ import torch
23
+ from sherpa import RnntConformerModel, greedy_search, modified_beam_search
24
+ from torch.nn.utils.rnn import pad_sequence
25
+
26
+ LOG_EPS = math.log(1e-10)
27
+
28
+
29
@torch.no_grad()
def run_model_and_do_greedy_search(
    model: RnntConformerModel,
    features: List[torch.Tensor],
) -> List[List[int]]:
    """Run RNN-T model with the given features and use greedy search
    to decode the output of the model.

    Args:
      model:
        The RNN-T model.
      features:
        A list of 2-D tensors. Each entry is of shape
        (num_frames, feature_dim).
    Returns:
      Return a list-of-list containing the decoding token IDs.
    """
    device = model.device

    # Batch the per-utterance features, remembering each true length.
    feature_lens = torch.tensor(
        [f.size(0) for f in features],
        dtype=torch.int64,
    ).to(device)
    padded = pad_sequence(
        features,
        batch_first=True,
        padding_value=LOG_EPS,
    ).to(device)

    encoder_out, encoder_out_length = model.encoder(
        features=padded,
        features_length=feature_lens,
    )

    return greedy_search(
        model=model,
        encoder_out=encoder_out,
        encoder_out_length=encoder_out_length.cpu(),
    )
71
+
72
+
73
@torch.no_grad()
def run_model_and_do_modified_beam_search(
    model: RnntConformerModel,
    features: List[torch.Tensor],
    num_active_paths: int,
) -> List[List[int]]:
    """Run RNN-T model with the given features and use modified beam search
    to decode the output of the model.

    (Doc fix: the original docstring said "greedy search", copied from
    the sibling function above.)

    Args:
      model:
        The RNN-T model.
      features:
        A list of 2-D tensors. Each entry is of shape
        (num_frames, feature_dim).
      num_active_paths:
        It specifies the number of active paths for each utterance. Due to
        merging paths with identical token sequences, the actual number
        may be less than "num_active_paths".
    Returns:
      Return a list-of-list containing the decoding token IDs.
    """
    features_length = torch.tensor(
        [f.size(0) for f in features],
        dtype=torch.int64,
    )
    features = pad_sequence(
        features,
        batch_first=True,
        padding_value=LOG_EPS,
    )

    device = model.device
    features = features.to(device)
    features_length = features_length.to(device)

    encoder_out, encoder_out_length = model.encoder(
        features=features,
        features_length=features_length,
    )

    hyp_tokens = modified_beam_search(
        model=model,
        encoder_out=encoder_out,
        encoder_out_length=encoder_out_length.cpu(),
        num_active_paths=num_active_paths,
    )
    return hyp_tokens
examples.py ADDED
@@ -0,0 +1,256 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ #
3
+ # Copyright 2022 Xiaomi Corp. (authors: Fangjun Kuang)
4
+ #
5
+ # See LICENSE for clarification regarding multiple authors
6
+ #
7
+ # Licensed under the Apache License, Version 2.0 (the "License");
8
+ # you may not use this file except in compliance with the License.
9
+ # You may obtain a copy of the License at
10
+ #
11
+ # http://www.apache.org/licenses/LICENSE-2.0
12
+ #
13
+ # Unless required by applicable law or agreed to in writing, software
14
+ # distributed under the License is distributed on an "AS IS" BASIS,
15
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16
+ # See the License for the specific language governing permissions and
17
+ # limitations under the License.
18
+ examples = [
19
+ [
20
+ "Chinese+English",
21
+ "ptrnull/icefall-asr-conv-emformer-transducer-stateless2-zh",
22
+ "greedy_search",
23
+ 4,
24
+ "./test_wavs/tal_csasr/0.wav",
25
+ ],
26
+ [
27
+ "English",
28
+ "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13",
29
+ "greedy_search",
30
+ 4,
31
+ "./test_wavs/librispeech/1089-134686-0001.wav",
32
+ ],
33
+ [
34
+ "Chinese",
35
+ "luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2",
36
+ "greedy_search",
37
+ 4,
38
+ "./test_wavs/wenetspeech/DEV_T0000000000.opus",
39
+ ],
40
+ [
41
+ "German",
42
+ "csukuangfj/wav2vec2.0-torchaudio",
43
+ "greedy_search",
44
+ 4,
45
+ "./test_wavs/german/20170517-0900-PLENARY-16-de_20170517.wav",
46
+ ],
47
+ [
48
+ "Arabic",
49
+ "AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06",
50
+ "greedy_search",
51
+ 4,
52
+ "./test_wavs/arabic/a.wav",
53
+ ],
54
+ [
55
+ "Tibetan",
56
+ "syzym/icefall-asr-xbmu-amdo31-pruned-transducer-stateless7-2022-12-02",
57
+ "greedy_search",
58
+ 4,
59
+ "./test_wavs/tibetan/a_0_cacm-A70_31117.wav",
60
+ ],
61
+ # librispeech
62
+ # https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13/tree/main/test_wavs
63
+ [
64
+ "English",
65
+ "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13",
66
+ "greedy_search",
67
+ 4,
68
+ "./test_wavs/librispeech/1089-134686-0001.wav",
69
+ ],
70
+ [
71
+ "English",
72
+ "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13",
73
+ "greedy_search",
74
+ 4,
75
+ "./test_wavs/librispeech/1221-135766-0001.wav",
76
+ ],
77
+ [
78
+ "English",
79
+ "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13",
80
+ "greedy_search",
81
+ 4,
82
+ "./test_wavs/librispeech/1221-135766-0002.wav",
83
+ ],
84
+ # gigaspeech
85
+ [
86
+ "English",
87
+ "wgb14/icefall-asr-gigaspeech-pruned-transducer-stateless2",
88
+ "greedy_search",
89
+ 4,
90
+ "./test_wavs/gigaspeech/1-minute-audiobook.opus",
91
+ ],
92
+ [
93
+ "English",
94
+ "wgb14/icefall-asr-gigaspeech-pruned-transducer-stateless2",
95
+ "greedy_search",
96
+ 4,
97
+ "./test_wavs/gigaspeech/100-seconds-podcast.opus",
98
+ ],
99
+ [
100
+ "English",
101
+ "wgb14/icefall-asr-gigaspeech-pruned-transducer-stateless2",
102
+ "greedy_search",
103
+ 4,
104
+ "./test_wavs/gigaspeech/100-seconds-youtube.opus",
105
+ ],
106
+ # wenetspeech
107
+ # https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2/tree/main/test_wavs
108
+ [
109
+ "Chinese",
110
+ "luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2",
111
+ "greedy_search",
112
+ 4,
113
+ "./test_wavs/wenetspeech/DEV_T0000000000.opus",
114
+ ],
115
+ [
116
+ "Chinese",
117
+ "luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2",
118
+ "greedy_search",
119
+ 4,
120
+ "./test_wavs/wenetspeech/DEV_T0000000001.opus",
121
+ ],
122
+ [
123
+ "Chinese",
124
+ "luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2",
125
+ "greedy_search",
126
+ 4,
127
+ "./test_wavs/wenetspeech/DEV_T0000000002.opus",
128
+ ],
129
+ # aishell2-A
130
+ # https://huggingface.co/yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-A-2022-07-12/tree/main/test_wavs
131
+ [
132
+ "Chinese",
133
+ "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-A-2022-07-12",
134
+ "greedy_search",
135
+ 4,
136
+ "./test_wavs/aishell2/ID0012W0030.wav",
137
+ ],
138
+ [
139
+ "Chinese",
140
+ "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-A-2022-07-12",
141
+ "greedy_search",
142
+ 4,
143
+ "./test_wavs/aishell2/ID0012W0162.wav",
144
+ ],
145
+ [
146
+ "Chinese",
147
+ "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-A-2022-07-12",
148
+ "greedy_search",
149
+ 4,
150
+ "./test_wavs/aishell2/ID0012W0215.wav",
151
+ ],
152
+ # aishell2-B
153
+ # https://huggingface.co/yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-A-2022-07-12/tree/main/test_wavs
154
+ [
155
+ "Chinese",
156
+ "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-B-2022-07-12",
157
+ "greedy_search",
158
+ 4,
159
+ "./test_wavs/aishell2/ID0012W0030.wav",
160
+ ],
161
+ [
162
+ "Chinese",
163
+ "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-B-2022-07-12",
164
+ "greedy_search",
165
+ 4,
166
+ "./test_wavs/aishell2/ID0012W0162.wav",
167
+ ],
168
+ [
169
+ "Chinese",
170
+ "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-B-2022-07-12",
171
+ "greedy_search",
172
+ 4,
173
+ "./test_wavs/aishell2/ID0012W0215.wav",
174
+ ],
175
+ # aidatatang_200zh
176
+ # https://huggingface.co/luomingshuang/icefall_asr_aidatatang-200zh_pruned_transducer_stateless2/tree/main/test_wavs
177
+ [
178
+ "Chinese",
179
+ "luomingshuang/icefall_asr_aidatatang-200zh_pruned_transducer_stateless2",
180
+ "greedy_search",
181
+ 4,
182
+ "./test_wavs/aidatatang_200zh/T0055G0036S0002.wav",
183
+ ],
184
+ [
185
+ "Chinese",
186
+ "luomingshuang/icefall_asr_aidatatang-200zh_pruned_transducer_stateless2",
187
+ "greedy_search",
188
+ 4,
189
+ "./test_wavs/aidatatang_200zh/T0055G0036S0003.wav",
190
+ ],
191
+ [
192
+ "Chinese",
193
+ "luomingshuang/icefall_asr_aidatatang-200zh_pruned_transducer_stateless2",
194
+ "greedy_search",
195
+ 4,
196
+ "./test_wavs/aidatatang_200zh/T0055G0036S0004.wav",
197
+ ],
198
+ # tal_csasr
199
+ [
200
+ "Chinese+English",
201
+ "ptrnull/icefall-asr-conv-emformer-transducer-stateless2-zh",
202
+ "greedy_search",
203
+ 4,
204
+ "./test_wavs/tal_csasr/210_36476_210_8341_1_1533271973_7057520_132.wav",
205
+ ],
206
+ [
207
+ "Chinese+English",
208
+ "ptrnull/icefall-asr-conv-emformer-transducer-stateless2-zh",
209
+ "greedy_search",
210
+ 4,
211
+ "./test_wavs/tal_csasr/210_36476_210_8341_1_1533271973_7057520_138.wav",
212
+ ],
213
+ [
214
+ "Chinese+English",
215
+ "ptrnull/icefall-asr-conv-emformer-transducer-stateless2-zh",
216
+ "greedy_search",
217
+ 4,
218
+ "./test_wavs/tal_csasr/210_36476_210_8341_1_1533271973_7057520_145.wav",
219
+ ],
220
+ [
221
+ "Tibetan",
222
+ "syzym/icefall-asr-xbmu-amdo31-pruned-transducer-stateless7-2022-12-02",
223
+ "greedy_search",
224
+ 4,
225
+ "./test_wavs/tibetan/a_0_cacm-A70_31116.wav",
226
+ ],
227
+ [
228
+ "Tibetan",
229
+ "syzym/icefall-asr-xbmu-amdo31-pruned-transducer-stateless7-2022-12-02",
230
+ "greedy_search",
231
+ 4,
232
+ "./test_wavs/tibetan/a_0_cacm-A70_31118.wav",
233
+ ],
234
+ # arabic
235
+ [
236
+ "Arabic",
237
+ "AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06",
238
+ "greedy_search",
239
+ 4,
240
+ "./test_wavs/arabic/b.wav",
241
+ ],
242
+ [
243
+ "Arabic",
244
+ "AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06",
245
+ "greedy_search",
246
+ 4,
247
+ "./test_wavs/arabic/c.wav",
248
+ ],
249
+ [
250
+ "German",
251
+ "csukuangfj/wav2vec2.0-torchaudio",
252
+ "greedy_search",
253
+ 4,
254
+ "./test_wavs/german/20120315-0900-PLENARY-14-de_20120315.wav",
255
+ ],
256
+ ]
giga-tokens.txt ADDED
@@ -0,0 +1,500 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <blk> 0
2
+ <sos/eos> 1
3
+ <unk> 2
4
+ S 3
5
+ T 4
6
+ ▁THE 5
7
+ ▁A 6
8
+ E 7
9
+ ▁AND 8
10
+ ▁TO 9
11
+ N 10
12
+ D 11
13
+ ▁OF 12
14
+ ' 13
15
+ ING 14
16
+ ▁I 15
17
+ Y 16
18
+ ▁IN 17
19
+ ED 18
20
+ ▁THAT 19
21
+ ▁ 20
22
+ P 21
23
+ R 22
24
+ ▁YOU 23
25
+ M 24
26
+ RE 25
27
+ ER 26
28
+ C 27
29
+ O 28
30
+ ▁IT 29
31
+ L 30
32
+ A 31
33
+ U 32
34
+ G 33
35
+ ▁WE 34
36
+ ▁IS 35
37
+ ▁SO 36
38
+ AL 37
39
+ I 38
40
+ ▁S 39
41
+ ▁RE 40
42
+ AR 41
43
+ B 42
44
+ ▁FOR 43
45
+ ▁C 44
46
+ ▁BE 45
47
+ LE 46
48
+ F 47
49
+ W 48
50
+ ▁E 49
51
+ ▁HE 50
52
+ LL 51
53
+ ▁WAS 52
54
+ LY 53
55
+ OR 54
56
+ IN 55
57
+ ▁F 56
58
+ VE 57
59
+ ▁THIS 58
60
+ TH 59
61
+ K 60
62
+ ▁ON 61
63
+ IT 62
64
+ ▁B 63
65
+ ▁WITH 64
66
+ ▁BUT 65
67
+ EN 66
68
+ CE 67
69
+ RI 68
70
+ ▁DO 69
71
+ UR 70
72
+ ▁HAVE 71
73
+ ▁DE 72
74
+ ▁ME 73
75
+ ▁T 74
76
+ ENT 75
77
+ CH 76
78
+ ▁THEY 77
79
+ ▁NOT 78
80
+ ES 79
81
+ V 80
82
+ ▁AS 81
83
+ RA 82
84
+ ▁P 83
85
+ ON 84
86
+ TER 85
87
+ ▁ARE 86
88
+ ▁WHAT 87
89
+ IC 88
90
+ ▁ST 89
91
+ ▁LIKE 90
92
+ ATION 91
93
+ ▁OR 92
94
+ ▁CA 93
95
+ ▁AT 94
96
+ H 95
97
+ ▁KNOW 96
98
+ ▁G 97
99
+ AN 98
100
+ ▁CON 99
101
+ IL 100
102
+ ND 101
103
+ RO 102
104
+ ▁HIS 103
105
+ ▁CAN 104
106
+ ▁ALL 105
107
+ TE 106
108
+ ▁THERE 107
109
+ ▁SU 108
110
+ ▁MO 109
111
+ ▁MA 110
112
+ LI 111
113
+ ▁ONE 112
114
+ ▁ABOUT 113
115
+ LA 114
116
+ ▁CO 115
117
+ - 116
118
+ ▁MY 117
119
+ ▁HAD 118
120
+ CK 119
121
+ NG 120
122
+ ▁NO 121
123
+ MENT 122
124
+ AD 123
125
+ LO 124
126
+ ME 125
127
+ ▁AN 126
128
+ ▁FROM 127
129
+ NE 128
130
+ ▁IF 129
131
+ VER 130
132
+ ▁JUST 131
133
+ ▁PRO 132
134
+ ION 133
135
+ ▁PA 134
136
+ ▁WHO 135
137
+ ▁SE 136
138
+ EL 137
139
+ IR 138
140
+ ▁US 139
141
+ ▁UP 140
142
+ ▁YOUR 141
143
+ CI 142
144
+ RY 143
145
+ ▁GO 144
146
+ ▁SHE 145
147
+ ▁LE 146
148
+ ▁OUT 147
149
+ ▁PO 148
150
+ ▁HO 149
151
+ ATE 150
152
+ ▁BO 151
153
+ ▁BY 152
154
+ ▁FA 153
155
+ ▁MI 154
156
+ AS 155
157
+ MP 156
158
+ ▁HER 157
159
+ VI 158
160
+ ▁THINK 159
161
+ ▁SOME 160
162
+ ▁WHEN 161
163
+ ▁AH 162
164
+ ▁PEOPLE 163
165
+ IG 164
166
+ ▁WA 165
167
+ ▁TE 166
168
+ ▁LA 167
169
+ ▁WERE 168
170
+ ▁LI 169
171
+ ▁WOULD 170
172
+ ▁SEE 171
173
+ ▁WHICH 172
174
+ DE 173
175
+ GE 174
176
+ ▁K 175
177
+ IGHT 176
178
+ ▁HA 177
179
+ ▁OUR 178
180
+ UN 179
181
+ ▁HOW 180
182
+ ▁GET 181
183
+ IS 182
184
+ UT 183
185
+ Z 184
186
+ CO 185
187
+ ET 186
188
+ UL 187
189
+ IES 188
190
+ IVE 189
191
+ AT 190
192
+ ▁O 191
193
+ ▁DON 192
194
+ LU 193
195
+ ▁TIME 194
196
+ ▁WILL 195
197
+ ▁MORE 196
198
+ ▁SP 197
199
+ ▁NOW 198
200
+ RU 199
201
+ ▁THEIR 200
202
+ ▁UN 201
203
+ ITY 202
204
+ OL 203
205
+ X 204
206
+ TI 205
207
+ US 206
208
+ ▁VERY 207
209
+ TION 208
210
+ ▁FI 209
211
+ ▁SAY 210
212
+ ▁BECAUSE 211
213
+ ▁EX 212
214
+ ▁RO 213
215
+ ERS 214
216
+ IST 215
217
+ ▁DA 216
218
+ TING 217
219
+ ▁EN 218
220
+ OM 219
221
+ ▁BA 220
222
+ ▁BEEN 221
223
+ ▁LO 222
224
+ ▁UM 223
225
+ AGE 224
226
+ ABLE 225
227
+ ▁WO 226
228
+ ▁RA 227
229
+ ▁OTHER 228
230
+ ▁REALLY 229
231
+ ENCE 230
232
+ ▁GOING 231
233
+ ▁HIM 232
234
+ ▁HAS 233
235
+ ▁THEM 234
236
+ ▁DIS 235
237
+ ▁WANT 236
238
+ ID 237
239
+ TA 238
240
+ ▁LOOK 239
241
+ KE 240
242
+ ▁DID 241
243
+ ▁SA 242
244
+ ▁VI 243
245
+ ▁SAID 244
246
+ ▁RIGHT 245
247
+ ▁THESE 246
248
+ ▁WORK 247
249
+ ▁COM 248
250
+ ALLY 249
251
+ FF 250
252
+ QU 251
253
+ AC 252
254
+ ▁DR 253
255
+ ▁WAY 254
256
+ ▁INTO 255
257
+ MO 256
258
+ TED 257
259
+ EST 258
260
+ ▁HERE 259
261
+ OK 260
262
+ ▁COULD 261
263
+ ▁WELL 262
264
+ MA 263
265
+ ▁PRE 264
266
+ ▁DI 265
267
+ MAN 266
268
+ ▁COMP 267
269
+ ▁THEN 268
270
+ IM 269
271
+ ▁PER 270
272
+ ▁NA 271
273
+ ▁WHERE 272
274
+ ▁TWO 273
275
+ ▁WI 274
276
+ ▁FE 275
277
+ INE 276
278
+ ▁ANY 277
279
+ TURE 278
280
+ ▁OVER 279
281
+ BO 280
282
+ ACH 281
283
+ OW 282
284
+ ▁MAKE 283
285
+ ▁TRA 284
286
+ HE 285
287
+ UND 286
288
+ ▁EVEN 287
289
+ ANCE 288
290
+ ▁YEAR 289
291
+ HO 290
292
+ AM 291
293
+ ▁CHA 292
294
+ ▁BACK 293
295
+ VO 294
296
+ ANT 295
297
+ DI 296
298
+ ▁ALSO 297
299
+ ▁THOSE 298
300
+ ▁MAN 299
301
+ CTION 300
302
+ ICAL 301
303
+ ▁JO 302
304
+ ▁OP 303
305
+ ▁NEW 304
306
+ ▁MU 305
307
+ ▁HU 306
308
+ ▁KIND 307
309
+ ▁NE 308
310
+ CA 309
311
+ END 310
312
+ TIC 311
313
+ FUL 312
314
+ ▁YEAH 313
315
+ SH 314
316
+ ▁APP 315
317
+ ▁THINGS 316
318
+ SIDE 317
319
+ ▁GOOD 318
320
+ ONE 319
321
+ ▁TAKE 320
322
+ CU 321
323
+ ▁EVERY 322
324
+ ▁MEAN 323
325
+ ▁FIRST 324
326
+ OP 325
327
+ ▁TH 326
328
+ ▁MUCH 327
329
+ ▁PART 328
330
+ UGH 329
331
+ ▁COME 330
332
+ J 331
333
+ ▁THAN 332
334
+ ▁EXP 333
335
+ ▁AGAIN 334
336
+ ▁LITTLE 335
337
+ MB 336
338
+ ▁NEED 337
339
+ ▁TALK 338
340
+ IF 339
341
+ FOR 340
342
+ ▁SH 341
343
+ ISH 342
344
+ ▁STA 343
345
+ ATED 344
346
+ ▁GU 345
347
+ ▁LET 346
348
+ IA 347
349
+ ▁MAR 348
350
+ ▁DOWN 349
351
+ ▁DAY 350
352
+ ▁GA 351
353
+ ▁SOMETHING 352
354
+ ▁BU 353
355
+ DUC 354
356
+ HA 355
357
+ ▁LOT 356
358
+ ▁RU 357
359
+ ▁THOUGH 358
360
+ ▁GREAT 359
361
+ AIN 360
362
+ ▁THROUGH 361
363
+ ▁THING 362
364
+ OUS 363
365
+ ▁PRI 364
366
+ ▁GOT 365
367
+ ▁SHOULD 366
368
+ ▁AFTER 367
369
+ ▁HEAR 368
370
+ ▁TA 369
371
+ ▁ONLY 370
372
+ ▁CHI 371
373
+ IOUS 372
374
+ ▁SHA 373
375
+ ▁MOST 374
376
+ ▁ACTUALLY 375
377
+ ▁START 376
378
+ LIC 377
379
+ ▁VA 378
380
+ ▁RI 379
381
+ DAY 380
382
+ IAN 381
383
+ ▁DOES 382
384
+ ROW 383
385
+ ▁GRA 384
386
+ ITION 385
387
+ ▁MANY 386
388
+ ▁BEFORE 387
389
+ ▁GIVE 388
390
+ PORT 389
391
+ QUI 390
392
+ ▁LIFE 391
393
+ ▁WORLD 392
394
+ ▁PI 393
395
+ ▁LONG 394
396
+ ▁THREE 395
397
+ IZE 396
398
+ NESS 397
399
+ ▁SHOW 398
400
+ PH 399
401
+ ▁WHY 400
402
+ ▁QUESTION 401
403
+ WARD 402
404
+ ▁THANK 403
405
+ ▁PH 404
406
+ ▁DIFFERENT 405
407
+ ▁OWN 406
408
+ ▁FEEL 407
409
+ ▁MIGHT 408
410
+ ▁HAPPEN 409
411
+ ▁MADE 410
412
+ ▁BRO 411
413
+ IBLE 412
414
+ ▁HI 413
415
+ ▁STATE 414
416
+ ▁HAND 415
417
+ ▁NEVER 416
418
+ ▁PLACE 417
419
+ ▁LOVE 418
420
+ ▁DU 419
421
+ ▁POINT 420
422
+ ▁HELP 421
423
+ ▁COUNT 422
424
+ ▁STILL 423
425
+ ▁MR 424
426
+ ▁FIND 425
427
+ ▁PERSON 426
428
+ ▁CAME 427
429
+ ▁SAME 428
430
+ ▁LAST 429
431
+ ▁HIGH 430
432
+ ▁OLD 431
433
+ ▁UNDER 432
434
+ ▁FOUR 433
435
+ ▁AROUND 434
436
+ ▁SORT 435
437
+ ▁CHANGE 436
438
+ ▁YES 437
439
+ SHIP 438
440
+ ▁ANOTHER 439
441
+ ATIVE 440
442
+ ▁FOUND 441
443
+ ▁JA 442
444
+ ▁ALWAYS 443
445
+ ▁NEXT 444
446
+ ▁TURN 445
447
+ ▁JU 446
448
+ ▁SIX 447
449
+ ▁FACT 448
450
+ ▁INTEREST 449
451
+ ▁WORD 450
452
+ ▁THOUSAND 451
453
+ ▁HUNDRED 452
454
+ ▁NUMBER 453
455
+ ▁IDEA 454
456
+ ▁PLAN 455
457
+ ▁COURSE 456
458
+ ▁SCHOOL 457
459
+ ▁HOUSE 458
460
+ ▁TWENTY 459
461
+ ▁JE 460
462
+ ▁PLAY 461
463
+ ▁AWAY 462
464
+ ▁LEARN 463
465
+ ▁HARD 464
466
+ ▁WEEK 465
467
+ ▁BETTER 466
468
+ ▁WHILE 467
469
+ ▁FRIEND 468
470
+ ▁OKAY 469
471
+ ▁NINE 470
472
+ ▁UNDERSTAND 471
473
+ ▁KEEP 472
474
+ ▁GONNA 473
475
+ ▁SYSTEM 474
476
+ ▁AMERICA 475
477
+ ▁POWER 476
478
+ ▁IMPORTANT 477
479
+ ▁WITHOUT 478
480
+ ▁MAYBE 479
481
+ ▁SEVEN 480
482
+ ▁BETWEEN 481
483
+ ▁BUILD 482
484
+ ▁CERTAIN 483
485
+ ▁PROBLEM 484
486
+ ▁MONEY 485
487
+ ▁BELIEVE 486
488
+ ▁SECOND 487
489
+ ▁REASON 488
490
+ ▁TOGETHER 489
491
+ ▁PUBLIC 490
492
+ ▁ANYTHING 491
493
+ ▁SPEAK 492
494
+ ▁BUSINESS 493
495
+ ▁EVERYTHING 494
496
+ ▁CLOSE 495
497
+ ▁QUITE 496
498
+ ▁ANSWER 497
499
+ ▁ENOUGH 498
500
+ Q 499
model.py ADDED
@@ -0,0 +1,585 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright 2022 Xiaomi Corp. (authors: Fangjun Kuang)
2
+ #
3
+ # See LICENSE for clarification regarding multiple authors
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+
17
+ from huggingface_hub import hf_hub_download
18
+ from functools import lru_cache
19
+ import os
20
+
21
# HACK: the k2 wheel bundles shared libraries (*.so) that the sherpa wheel
# expects to be able to load; copy them into sherpa's lib directory so the
# `import sherpa` below can resolve its dynamic dependencies.
# NOTE(review): the path hard-codes Python 3.8 and the HF Spaces home
# directory — confirm it matches the runtime image this Space runs on.
os.system(
    "cp -v /home/user/.local/lib/python3.8/site-packages/k2/lib/*.so /home/user/.local/lib/python3.8/site-packages/sherpa/lib/"
)

import k2
import sherpa


# Sampling rate (Hz) expected by every model configured in this module.
sample_rate = 16000
30
+
31
+
32
@lru_cache(maxsize=30)
def get_pretrained_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
) -> sherpa.OfflineRecognizer:
    """Build (or fetch from cache) the offline recognizer for *repo_id*.

    Args:
      repo_id: Hugging Face repository id identifying the model.
      decoding_method: Decoding method name passed through to the factory.
      num_active_paths: Beam size used by beam-search decoding methods.

    Returns:
      A configured ``sherpa.OfflineRecognizer``.

    Raises:
      ValueError: If *repo_id* is not found in any language registry.
    """
    # Probe each per-language registry in turn; each one maps a repo_id
    # to the factory function that downloads and configures that model.
    registries = (
        chinese_models,
        english_models,
        chinese_english_mixed_models,
        tibetan_models,
        arabic_models,
        german_models,
    )
    for registry in registries:
        factory = registry.get(repo_id)
        if factory is not None:
            return factory(
                repo_id,
                decoding_method=decoding_method,
                num_active_paths=num_active_paths,
            )
    raise ValueError(f"Unsupported repo_id: {repo_id}")
64
+
65
+
66
def _get_nn_model_filename(
    repo_id: str,
    filename: str,
    subfolder: str = "exp",
) -> str:
    """Download a model checkpoint from the Hub and return its local path.

    The download is skipped when the file is already in the local HF cache.
    """
    return hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        subfolder=subfolder,
    )
77
+
78
+
79
def _get_bpe_model_filename(
    repo_id: str,
    filename: str = "bpe.model",
    subfolder: str = "data/lang_bpe_500",
) -> str:
    """Download a BPE model file from the Hub and return its local path."""
    return hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        subfolder=subfolder,
    )
90
+
91
+
92
def _get_token_filename(
    repo_id: str,
    filename: str = "tokens.txt",
    subfolder: str = "data/lang_char",
) -> str:
    """Download a token table from the Hub and return its local path."""
    return hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        subfolder=subfolder,
    )
103
+
104
+
105
@lru_cache(maxsize=10)
def _get_aishell2_pretrained_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
) -> sherpa.OfflineRecognizer:
    """Create an offline recognizer for the AISHELL-2 transducer models."""
    supported = (
        # context-size 1
        "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-A-2022-07-12",  # noqa
        # context-size 2
        "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-B-2022-07-12",  # noqa
    )
    assert repo_id in supported, repo_id

    model_path = _get_nn_model_filename(repo_id=repo_id, filename="cpu_jit.pt")
    tokens_path = _get_token_filename(repo_id=repo_id)

    # 80-dim fbank features at the module-wide sample rate; dithering is
    # disabled so decoding results are deterministic.
    fbank = sherpa.FeatureConfig()
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
141
+
142
+
143
@lru_cache(maxsize=10)
def _get_gigaspeech_pre_trained_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
) -> sherpa.OfflineRecognizer:
    """Create an offline recognizer for the GigaSpeech transducer model."""
    assert repo_id in (
        "wgb14/icefall-asr-gigaspeech-pruned-transducer-stateless2",
    ), repo_id

    model_path = _get_nn_model_filename(
        repo_id=repo_id,
        filename="cpu_jit-iter-3488000-avg-20.pt",
    )
    # This repo does not ship a tokens.txt in the expected subfolder, so a
    # local copy bundled with the Space is used instead.
    tokens_path = "./giga-tokens.txt"

    # 80-dim fbank features; dithering disabled for deterministic output.
    fbank = sherpa.FeatureConfig()
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
176
+
177
+
178
@lru_cache(maxsize=10)
def _get_librispeech_pre_trained_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
) -> sherpa.OfflineRecognizer:
    """Create an offline recognizer for the LibriSpeech transducer models."""
    supported = (
        "WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02",  # noqa
        "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13",  # noqa
        "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11",  # noqa
        "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14",  # noqa
    )
    assert repo_id in supported, repo_id

    # A couple of repos ship their checkpoint under a non-default filename.
    filename_overrides = {
        "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11": "cpu_jit-torch-1.10.0.pt",  # noqa
        "WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02": "cpu_jit-torch-1.10.pt",  # noqa
    }
    filename = filename_overrides.get(repo_id, "cpu_jit.pt")

    model_path = _get_nn_model_filename(repo_id=repo_id, filename=filename)
    tokens_path = _get_token_filename(repo_id=repo_id, subfolder="data/lang_bpe_500")

    # 80-dim fbank features; dithering disabled for deterministic output.
    fbank = sherpa.FeatureConfig()
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
227
+
228
+
229
@lru_cache(maxsize=10)
def _get_wenetspeech_pre_trained_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
):
    """Create an offline recognizer for the WenetSpeech transducer model."""
    assert repo_id in (
        "luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2",
    ), repo_id

    model_path = _get_nn_model_filename(
        repo_id=repo_id,
        filename="cpu_jit_epoch_10_avg_2_torch_1.7.1.pt",
    )
    tokens_path = _get_token_filename(repo_id=repo_id)

    # 80-dim fbank features; dithering disabled for deterministic output.
    fbank = sherpa.FeatureConfig()
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
262
+
263
+
264
@lru_cache(maxsize=10)
def _get_chinese_english_mixed_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
):
    """Create an offline recognizer for the mixed Chinese+English models."""
    # Each repo keeps its checkpoint and token table in different locations.
    repo_settings = {
        "luomingshuang/icefall_asr_tal-csasr_pruned_transducer_stateless5": (
            "cpu_jit.pt",
            "data/lang_char",
        ),
        "ptrnull/icefall-asr-conv-emformer-transducer-stateless2-zh": (
            "cpu_jit-epoch-11-avg-1.pt",
            "data/lang_char_bpe",
        ),
    }
    assert repo_id in repo_settings, repo_id
    filename, subfolder = repo_settings[repo_id]

    model_path = _get_nn_model_filename(repo_id=repo_id, filename=filename)
    tokens_path = _get_token_filename(repo_id=repo_id, subfolder=subfolder)

    # 80-dim fbank features; dithering disabled for deterministic output.
    fbank = sherpa.FeatureConfig()
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
305
+
306
+
307
@lru_cache(maxsize=10)
def _get_alimeeting_pre_trained_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
):
    """Create an offline recognizer for the AliMeeting transducer model."""
    assert repo_id in (
        "luomingshuang/icefall_asr_alimeeting_pruned_transducer_stateless2",
    ), repo_id

    model_path = _get_nn_model_filename(
        repo_id=repo_id,
        filename="cpu_jit_torch_1.7.1.pt",
    )
    tokens_path = _get_token_filename(repo_id=repo_id)

    # 80-dim fbank features; dithering disabled for deterministic output.
    fbank = sherpa.FeatureConfig()
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
340
+
341
+
342
@lru_cache(maxsize=10)
def _get_wenet_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
):
    """Create an offline recognizer for the WeNet Chinese/English models."""
    assert repo_id in (
        "csukuangfj/wenet-chinese-model",
        "csukuangfj/wenet-english-model",
    ), repo_id

    # WeNet repos keep everything at the repo root, not under exp/.
    model_path = _get_nn_model_filename(
        repo_id=repo_id,
        filename="final.zip",
        subfolder=".",
    )
    tokens_path = _get_token_filename(
        repo_id=repo_id,
        filename="units.txt",
        subfolder=".",
    )

    # WeNet models were trained on un-normalized samples, hence
    # normalize_samples=False; 80-dim fbank, no dithering.
    fbank = sherpa.FeatureConfig(normalize_samples=False)
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
381
+
382
+
383
@lru_cache(maxsize=10)
def _get_aidatatang_200zh_pretrained_mode(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
):
    """Create an offline recognizer for the aidatatang_200zh model.

    NOTE(review): the function name says "mode" (likely a typo for
    "model"); it is referenced by the registry below, so renaming would
    need a coordinated change.
    """
    assert repo_id in (
        "luomingshuang/icefall_asr_aidatatang-200zh_pruned_transducer_stateless2",
    ), repo_id

    model_path = _get_nn_model_filename(
        repo_id=repo_id,
        filename="cpu_jit_torch.1.7.1.pt",
    )
    tokens_path = _get_token_filename(repo_id=repo_id)

    # 80-dim fbank features; dithering disabled for deterministic output.
    fbank = sherpa.FeatureConfig()
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
416
+
417
+
418
@lru_cache(maxsize=10)
def _get_tibetan_pre_trained_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
):
    """Create an offline recognizer for the Tibetan (XBMU-AMDO31) models."""
    assert repo_id in (
        "syzym/icefall-asr-xbmu-amdo31-pruned-transducer-stateless7-2022-12-02",
        "syzym/icefall-asr-xbmu-amdo31-pruned-transducer-stateless5-2022-11-29",
    ), repo_id

    # The stateless5 repo ships its checkpoint under a non-default filename.
    filename_overrides = {
        "syzym/icefall-asr-xbmu-amdo31-pruned-transducer-stateless5-2022-11-29": "cpu_jit-epoch-28-avg-23-torch-1.10.0.pt",  # noqa
    }
    filename = filename_overrides.get(repo_id, "cpu_jit.pt")

    model_path = _get_nn_model_filename(repo_id=repo_id, filename=filename)
    tokens_path = _get_token_filename(repo_id=repo_id, subfolder="data/lang_bpe_500")

    # 80-dim fbank features; dithering disabled for deterministic output.
    fbank = sherpa.FeatureConfig()
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
460
+
461
+
462
@lru_cache(maxsize=10)
def _get_arabic_pre_trained_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
):
    """Create an offline recognizer for the Arabic (MGB2) conformer model."""
    assert repo_id in (
        "AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06",
    ), repo_id

    model_path = _get_nn_model_filename(repo_id=repo_id, filename="cpu_jit.pt")
    # This model uses a 5000-piece BPE vocabulary.
    tokens_path = _get_token_filename(repo_id=repo_id, subfolder="data/lang_bpe_5000")

    # 80-dim fbank features; dithering disabled for deterministic output.
    fbank = sherpa.FeatureConfig()
    fbank.fbank_opts.frame_opts.samp_freq = sample_rate
    fbank.fbank_opts.mel_opts.num_bins = 80
    fbank.fbank_opts.frame_opts.dither = 0

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        feat_config=fbank,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
496
+
497
+
498
@lru_cache(maxsize=10)
def _get_german_pre_trained_model(
    repo_id: str,
    decoding_method: str,
    num_active_paths: int,
):
    """Create an offline recognizer for the German wav2vec2 model.

    Unlike the transducer models above, no fbank feature config is set —
    the torchaudio wav2vec2 model consumes raw waveforms.
    """
    assert repo_id in (
        "csukuangfj/wav2vec2.0-torchaudio",
    ), repo_id

    model_path = _get_nn_model_filename(
        repo_id=repo_id,
        filename="voxpopuli_asr_base_10k_de.pt",
        subfolder=".",
    )
    tokens_path = _get_token_filename(
        repo_id=repo_id,
        filename="tokens-de.txt",
        subfolder=".",
    )

    recognizer_config = sherpa.OfflineRecognizerConfig(
        nn_model=model_path,
        tokens=tokens_path,
        use_gpu=False,
        decoding_method=decoding_method,
        num_active_paths=num_active_paths,
    )
    return sherpa.OfflineRecognizer(recognizer_config)
531
+
532
+
533
# --- Model registries -------------------------------------------------------
# Each registry maps a Hugging Face repo_id to the factory function that
# downloads and configures that model; get_pretrained_model() searches them.

# Chinese-only models.
chinese_models = {
    "luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2": _get_wenetspeech_pre_trained_model,  # noqa
    "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-A-2022-07-12": _get_aishell2_pretrained_model,  # noqa
    "yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-B-2022-07-12": _get_aishell2_pretrained_model,  # noqa
    "luomingshuang/icefall_asr_aidatatang-200zh_pruned_transducer_stateless2": _get_aidatatang_200zh_pretrained_mode,  # noqa
    "luomingshuang/icefall_asr_alimeeting_pruned_transducer_stateless2": _get_alimeeting_pre_trained_model,  # noqa
    "csukuangfj/wenet-chinese-model": _get_wenet_model,
}

# English-only models.
english_models = {
    "wgb14/icefall-asr-gigaspeech-pruned-transducer-stateless2": _get_gigaspeech_pre_trained_model,  # noqa
    "WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02": _get_librispeech_pre_trained_model,  # noqa
    "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14": _get_librispeech_pre_trained_model,  # noqa
    "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11": _get_librispeech_pre_trained_model,  # noqa
    "csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13": _get_librispeech_pre_trained_model,  # noqa
    "csukuangfj/wenet-english-model": _get_wenet_model,
}

# Code-switching (mixed Chinese + English) models.
chinese_english_mixed_models = {
    "ptrnull/icefall-asr-conv-emformer-transducer-stateless2-zh": _get_chinese_english_mixed_model,
    "luomingshuang/icefall_asr_tal-csasr_pruned_transducer_stateless5": _get_chinese_english_mixed_model,  # noqa
}

# Tibetan models.
tibetan_models = {
    "syzym/icefall-asr-xbmu-amdo31-pruned-transducer-stateless7-2022-12-02": _get_tibetan_pre_trained_model,  # noqa
    "syzym/icefall-asr-xbmu-amdo31-pruned-transducer-stateless5-2022-11-29": _get_tibetan_pre_trained_model,  # noqa
}

# Arabic models.
arabic_models = {
    "AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06": _get_arabic_pre_trained_model,  # noqa
}

# German models.
german_models = {
    "csukuangfj/wav2vec2.0-torchaudio": _get_german_pre_trained_model,
}

# Union of all registries (repo_ids are unique across languages).
all_models = {
    **chinese_models,
    **english_models,
    **chinese_english_mixed_models,
    **tibetan_models,
    **arabic_models,
    **german_models,
}

# Display-language name -> list of repo_ids, used to populate the UI dropdown.
language_to_models = {
    "Chinese": list(chinese_models.keys()),
    "English": list(english_models.keys()),
    "Chinese+English": list(chinese_english_mixed_models.keys()),
    "Tibetan": list(tibetan_models.keys()),
    "Arabic": list(arabic_models.keys()),
    "German": list(german_models.keys()),
}
requirements.txt ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ https://download.pytorch.org/whl/cpu/torch-1.13.0%2Bcpu-cp38-cp38-linux_x86_64.whl
2
+ https://download.pytorch.org/whl/cpu/torchaudio-0.13.0%2Bcpu-cp38-cp38-linux_x86_64.whl
3
+
4
+ https://huggingface.co/csukuangfj/wheels/resolve/main/k2-1.23.2.dev20221204%2Bcpu.torch1.13.0-cp38-cp38-linux_x86_64.whl
5
+ https://huggingface.co/csukuangfj/wheels/resolve/main/k2_sherpa-1.1-cp38-cp38-linux_x86_64.whl
6
+ https://huggingface.co/csukuangfj/wheels/resolve/main/kaldifeat-1.22-cp38-cp38-linux_x86_64.whl
7
+
8
+ sentencepiece>=0.1.96
9
+ numpy
10
+
11
+ huggingface_hub
test_wavs/aidatatang_200zh/README.md ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Files are downloaded from
2
+ https://huggingface.co/luomingshuang/icefall_asr_aidatatang-200zh_pruned_transducer_stateless2/tree/main/test_wavs
test_wavs/aidatatang_200zh/T0055G0036S0002.wav ADDED
Binary file (67.6 kB). View file
 
test_wavs/aidatatang_200zh/T0055G0036S0003.wav ADDED
Binary file (94.2 kB). View file
 
test_wavs/aidatatang_200zh/T0055G0036S0004.wav ADDED
Binary file (70.5 kB). View file
 
test_wavs/aishell2/ID0012W0030.wav ADDED
Binary file (113 kB). View file
 
test_wavs/aishell2/ID0012W0162.wav ADDED
Binary file (114 kB). View file
 
test_wavs/aishell2/ID0012W0215.wav ADDED
Binary file (104 kB). View file
 
test_wavs/aishell2/README.md ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Files are downloaded from
2
+ https://huggingface.co/yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-B-2022-07-12/tree/main/test_wavs
test_wavs/aishell2/trans.txt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ ID0012W0162 立法机关采纳了第二种意见
2
+ ID0012W0215 大家都愿意牺牲自己的生命
3
+ ID0012W0030 完全是典型的军事侵略
test_wavs/arabic/a.wav ADDED
Binary file (253 kB). View file
 
test_wavs/arabic/b.wav ADDED
Binary file (243 kB). View file
 
test_wavs/arabic/c.wav ADDED
Binary file (150 kB). View file
 
test_wavs/arabic/trans.txt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ 94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0053813:0054281 بعد أن عجز وبدأ يصدر مشكلات شعبه ومشكلات مصر
2
+ 94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0051454:0052244 وهؤلاء أولياء الشيطان ها هو ذا أحدهم الآن ضيفا عليكم على قناة الجزيرة ولا يستحي في ذلك
3
+ 94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0052244:0053004 عندما استغاث الليبيون بالعالم استغاثوا لرفع الظلم وليس لقهر إرادة الأمة ومصادرة الحياة الدستورية
test_wavs/german/20120315-0900-PLENARY-14-de_20120315.wav ADDED
Binary file (381 kB). View file
 
test_wavs/german/20170517-0900-PLENARY-16-de_20170517.wav ADDED
Binary file (282 kB). View file
 
test_wavs/gigaspeech/1-minute-audiobook.opus ADDED
Binary file (580 kB). View file
 
test_wavs/gigaspeech/100-seconds-podcast.opus ADDED
Binary file (955 kB). View file
 
test_wavs/gigaspeech/100-seconds-youtube.opus ADDED
Binary file (948 kB). View file
 
test_wavs/librispeech/1089-134686-0001.wav ADDED
Binary file (212 kB). View file
 
test_wavs/librispeech/1221-135766-0001.wav ADDED
Binary file (535 kB). View file
 
test_wavs/librispeech/1221-135766-0002.wav ADDED
Binary file (154 kB). View file
 
test_wavs/librispeech/README.md ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Files are downloaded from
2
+ https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13/tree/main/test_wavs
test_wavs/librispeech/trans.txt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ 1089-134686-0001 AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
2
+ 1221-135766-0001 GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
3
+ 1221-135766-0002 YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
test_wavs/tal_csasr/0.wav ADDED
Binary file (259 kB). View file
 
test_wavs/tal_csasr/210_36476_210_8341_1_1533271973_7057520_132.wav ADDED
Binary file (163 kB). View file
 
test_wavs/tal_csasr/210_36476_210_8341_1_1533271973_7057520_138.wav ADDED
Binary file (150 kB). View file
 
test_wavs/tal_csasr/210_36476_210_8341_1_1533271973_7057520_145.wav ADDED
Binary file (283 kB). View file
 
test_wavs/tal_csasr/README.md ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Files are downloaded from
2
+ https://huggingface.co/luomingshuang/icefall_asr_tal-csasr_pruned_transducer_stateless5/tree/main/test_wavs
test_wavs/tibetan/a_0_cacm-A70_31116.wav ADDED
Binary file (97.4 kB). View file
 
test_wavs/tibetan/a_0_cacm-A70_31117.wav ADDED
Binary file (128 kB). View file
 
test_wavs/tibetan/a_0_cacm-A70_31118.wav ADDED
Binary file (87.1 kB). View file
 
test_wavs/tibetan/trans.txt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ a_0_cacm-A70_31116.wav ལོ བཅུ ཙམ མ འདང བའི དུས སྐབས ནང
2
+ a_0_cacm-A70_31117.wav དྲག པོའི ངོ ལོག ཟིང འཁྲུག སྒྲིག འཛུགས དང ངན བཀོད བྱས ཡོད
3
+ a_0_cacm-A70_31118.wav གནས བབ འདིའི རིགས གང མགྱོགས འགྱུར བ གཏོང དགོས
test_wavs/wenetspeech/DEV_T0000000000.opus ADDED
Binary file (23.1 kB). View file
 
test_wavs/wenetspeech/DEV_T0000000001.opus ADDED
Binary file (21.5 kB). View file
 
test_wavs/wenetspeech/DEV_T0000000002.opus ADDED
Binary file (18.8 kB). View file
 
test_wavs/wenetspeech/README.md ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Files are downloaded from
2
+ https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2/tree/main/test_wavs