---
license: apache-2.0
tags:
- speech
- audio
- lang-id
- langid
- language-recognition
- language-identification
- language-detection
- tflite
library_name: sidlingvo
---

# Conformer-based spoken language identification model

## Summary

This is a conformer-based streaming language identification model with attentive temporal pooling.

The model was trained with public data only.

The paper: https://arxiv.org/abs/2202.12163

```
@inproceedings{wang2022attentive,
  title={Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech},
  author={Quan Wang and Yang Yu and Jason Pelecanos and Yiling Huang and Ignacio Lopez Moreno},
  booktitle={Odyssey: The Speaker and Language Recognition Workshop},
  year={2022}
}
```

## Usage

To use this model, you will need the `sidlingvo` library: https://github.com/google/speaker-id/tree/master/lingvo

Since lingvo does not support Python 3.11 yet, make sure your Python version is 3.10 or earlier.

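The Python version constraint above can be checked before installing. The following is a minimal sketch under that assumption; `python_is_supported` is a hypothetical helper written for illustration and is not part of `sidlingvo` or lingvo:

```python
import sys


def python_is_supported(version_info=sys.version_info, max_minor=10):
    """Returns True for Python 3.x with x <= max_minor (3.10 by default)."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor <= max_minor


if not python_is_supported():
    # Hypothetical warning text; adjust to your own setup.
    print("Warning: this Python version may be too new for lingvo; use 3.10 or earlier.")
```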
Install the library:

```
pip install sidlingvo
```

Example usage:

```Python
import os
from sidlingvo import wav_to_lang
from huggingface_hub import hf_hub_download

# Download the VAD and language identification model files from the Hugging Face Hub.
repo_id = "tflite-hub/conformer-lang-id"
model_path = "models"
hf_hub_download(repo_id=repo_id, filename="vad_short_model.tflite", local_dir=model_path)
hf_hub_download(repo_id=repo_id, filename="vad_short_mean_stddev.csv", local_dir=model_path)
hf_hub_download(repo_id=repo_id, filename="conformer_langid_medium.tflite", local_dir=model_path)

# Build the runner from the downloaded files, then predict the language of a WAV file.
wav_file = "your_wav_file.wav"
runner = wav_to_lang.WavToLangRunner(
    vad_model_file=os.path.join(model_path, "vad_short_model.tflite"),
    vad_mean_stddev_file=os.path.join(model_path, "vad_short_mean_stddev.csv"),
    langid_model_file=os.path.join(model_path, "conformer_langid_medium.tflite"))
top_lang, _ = runner.wav_to_lang(wav_file)
print("Predicted language:", top_lang)
```
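
To process several recordings, the runner from the example above can be created once and reused. The sketch below assumes only what that example shows: that `runner.wav_to_lang(path)` returns a pair whose first element is the top language. `identify_languages` is a hypothetical helper, not part of `sidlingvo`.

```python
from typing import Any, Dict, Iterable


def identify_languages(runner: Any, wav_files: Iterable[str]) -> Dict[str, str]:
    """Maps each WAV path to its top predicted language."""
    results = {}
    for wav_file in wav_files:
        # As in the example above, wav_to_lang returns a pair;
        # only the first element (the top language) is kept here.
        top_lang, _ = runner.wav_to_lang(wav_file)
        results[wav_file] = top_lang
    return results
```

With a `runner` built as in the example, `identify_languages(runner, ["a.wav", "b.wav"])` returns a dictionary keyed by file path.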