MMS-TTS Fine-tuned for Kabardian (Speaker: Sokhov Murat)
This repository contains a fine-tuned version of Facebook's MMS-TTS model, adapted for generating speech in the Kabardian language. The model is trained on a dataset of audio recordings by the speaker Sokhov Murat.
Model Details
- Base Model: facebook/mms-tts
- Fine-tuned on: anzorq/kbd_speech dataset
- Training steps: 5,100
- Speaker: Sokhov Murat
- Language: Circassian (Kabardian)
Usage
To generate speech with this model, use the text-to-speech pipeline from the Transformers library. Here's an example:
from transformers import pipeline
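import scipy.io.wavfile  # import the submodule explicitly; a bare "import scipy" does not load it on older SciPy versions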
model_id = "anzorq/mms_finetune_kbd_murat"
synthesiser = pipeline("text-to-speech", model_id, device=0) # device=0 selects the first GPU; omit it to run on the CPU
text = "дауэ ущыт?"
speech = synthesiser(text)
# Save the generated audio to a file
scipy.io.wavfile.write("finetuned_output.wav", rate=speech["sampling_rate"], data=speech["audio"][0])
This code writes the synthesized speech for the provided Kabardian text to the file finetuned_output.wav.
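The example above passes the pipeline's floating-point audio straight to scipy, which stores it as a floating-point WAV file. Some players handle 16-bit PCM more reliably, so the following is a minimal sketch of an alternative writer that uses only the Python standard library. The helper names float_to_pcm16 and write_pcm16_wav are hypothetical and not part of this repository, and the sketch assumes mono audio with samples in the range [-1.0, 1.0].

```python
import struct
import wave


def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes.

    Values outside the range are clipped rather than wrapped.
    """
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def write_pcm16_wav(path, samples, rate):
    """Write mono float samples to `path` as a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # assumption: single-channel output
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(float_to_pcm16(samples))
```

With the variables from the example above, this could be called as write_pcm16_wav("finetuned_output_pcm16.wav", speech["audio"][0], speech["sampling_rate"]).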
Notes
- Fine-tuned following the guide at https://github.com/ylacombe/finetune-hf-vits
- Since no pre-trained MMS-TTS checkpoint was available for Kabardian, we started from the Chechen checkpoint, whose character set is the closest to Kabardian's, and fine-tuned it on Kabardian data.
- Do not use in production. This model's performance is considerably worse than that of the fine-tuned VITS model anzorq/kbd-vits-tts-male for Kabardian text-to-speech.
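Because the base checkpoint was chosen for its character set, it can help to check whether an input text contains characters the tokenizer does not know. The following is a minimal sketch of such a check. The function name uncovered_chars and the toy vocabulary are hypothetical and for illustration only; in practice the vocabulary would presumably come from the model's tokenizer, for example via the get_vocab() method of a tokenizer loaded with Transformers.

```python
def uncovered_chars(text, vocab):
    """Return the sorted non-whitespace characters of `text` missing from `vocab`."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in vocab})


# Toy vocabulary for illustration only, not the model's real vocabulary.
toy_vocab = set("дауэщыт")

print(uncovered_chars("дауэ ущыт?", toy_vocab))  # → ['?']
```

Characters reported by such a check may be dropped or mishandled during synthesis, so it can be worth normalizing or removing them before calling the pipeline.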
License
The original MMS-TTS model by Meta is licensed under the CC-BY-NC-4.0 License. This fine-tuned version inherits the same license.
Acknowledgments
- AI at Meta for the original MMS-TTS model.
- Sokhov Murat for providing the audio recordings used for fine-tuning.