tflite-hub/conformer-speaker-encoder


wq2012 committed on Sep 15

Commit 09597a3 • 1 Parent(s): 0bbf2b2

Update README.md

Files changed (1)

README.md +57 -3


@@ -1,3 +1,57 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# Conformer-based multilingual speaker encoder
+## Summary
+This is a massively multilingual conformer-based speaker recognition model.
+The model was trained with public data only.
+The paper: https://arxiv.org/abs/2104.02125
+```
+@inproceedings{chojnacka2021speakerstew,
+  title={{SpeakerStew: Scaling to many languages with a triaged multilingual text-dependent and text-independent speaker verification system}},
+  author={Chojnacka, Roza and Pelecanos, Jason and Wang, Quan and Moreno, Ignacio Lopez},
+  booktitle={Proc. Interspeech},
+  year={2021}
+}
+```
+## Usage
+To use this model, you will need the `sidlingvo` library: https://github.com/google/speaker-id/tree/master/lingvo
+Since lingvo does not support Python 3.11 yet, make sure your Python version is 3.10 or earlier.
+Install the library:
+```
+pip install sidlingvo
+```
+Example usage:
+```Python
+import os
+from sidlingvo import wav_to_dvector
+from huggingface_hub import hf_hub_download
+# Download the VAD model, its normalization statistics, and the speaker encoder.
+repo_id = "tflite-hub/conformer-speaker-encoder"
+model_path = "models"
+hf_hub_download(repo_id=repo_id, filename="vad_long_model.tflite", local_dir=model_path)
+hf_hub_download(repo_id=repo_id, filename="vad_long_mean_stddev.csv", local_dir=model_path)
+hf_hub_download(repo_id=repo_id, filename="conformer_tisid_medium.tflite", local_dir=model_path)
+# Replace these placeholders with paths to your own recordings.
+enroll_wav_files = ["your_first_wav_file.wav"]
+test_wav_file = "your_second_wav_file.wav"
+# Build the runner from the downloaded model files.
+runner = wav_to_dvector.WavToDvectorRunner(
+    vad_model_file=os.path.join(model_path, "vad_long_model.tflite"),
+    vad_mean_stddev_file=os.path.join(model_path, "vad_long_mean_stddev.csv"),
+    tisid_model_file=os.path.join(model_path, "conformer_tisid_medium.tflite"))
+# Score the test recording against the enrollment recordings.
+score = runner.compute_score(enroll_wav_files, test_wav_file)
+print("Speaker similarity score:", score)
+```
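The README does not say how `compute_score` derives its similarity score. As a rough illustration only, the sketch below assumes a common approach in speaker verification: average the enrollment embeddings, then take the cosine similarity with the test embedding. The helper names `average_embedding` and `cosine_similarity` are hypothetical and are not part of `sidlingvo`.

```python
import math
from typing import List, Sequence


def average_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several equal-length embeddings (hypothetical helper)."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors standing in for real speaker embeddings.
enroll = average_embedding([[1.0, 0.0], [0.0, 1.0]])
print("Toy similarity:", cosine_similarity(enroll, [1.0, 1.0]))
```

Under this assumption, scores near 1 suggest the same speaker and lower scores suggest different speakers. Any decision threshold would need to be tuned on your own data.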