# haitian_creole
This model is a fine-tuned version of microsoft/speecht5_tts on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.4117
## Usage
Set up the imports and a helper that spells digit runs out as Haitian Creole number words, since raw digits are not handled by the model's text input:

```python
import re

import soundfile as sf
import torch
from datasets import load_dataset
from transformers import pipeline
from IPython.display import Audio

# Haitian Creole number words
number_words = {
    0: "zewo", 1: "en", 2: "de", 3: "twa", 4: "kat", 5: "senk", 6: "sis", 7: "sèt", 8: "uit", 9: "nèf",
    10: "dis", 11: "onz", 12: "douz", 13: "trez", 14: "katorz", 15: "kenz", 16: "sèz", 17: "dis sèt",
    18: "dis uit", 19: "dis nèf", 20: "vent", 30: "trant", 40: "karant", 50: "senkant", 60: "swasant",
    70: "swasant diz", 80: "katreven", 90: "katreven diz", 100: "san", 1000: "mil"
}

def number_to_words(number):
    if number < 20:
        return number_words[number]
    elif number < 100:
        tens, unit = divmod(number, 10)
        return number_words[tens * 10] + (" " + number_words[unit] if unit else "")
    elif number < 1000:
        hundreds, remainder = divmod(number, 100)
        return (number_words[hundreds] + " san" if hundreds > 1 else "san") + (" " + number_to_words(remainder) if remainder else "")
    elif number < 1_000_000:
        thousands, remainder = divmod(number, 1000)
        return (number_to_words(thousands) + " mil" if thousands > 1 else "mil") + (" " + number_to_words(remainder) if remainder else "")
    elif number < 1_000_000_000:
        millions, remainder = divmod(number, 1_000_000)
        return number_to_words(millions) + " milyon" + (" " + number_to_words(remainder) if remainder else "")
    elif number < 1_000_000_000_000:
        billions, remainder = divmod(number, 1_000_000_000)
        return number_to_words(billions) + " milya" + (" " + number_to_words(remainder) if remainder else "")
    else:
        return str(number)

def replace_numbers_with_words2(text):
    # Replace every standalone digit run with its spelled-out form.
    def replace(match):
        return number_to_words(int(match.group()))
    return re.sub(r'\b\d+\b', replace, text)
```
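The `\b\d+\b` pattern converts each standalone digit run independently. A minimal, self-contained sketch of the same regex pass, using a hypothetical reduced word map (`tiny_words` is illustrative, not part of the model card):

```python
import re

# Hypothetical reduced word map, standing in for the full number_words dict.
tiny_words = {3: "twa", 12: "douz"}

def spell_digits(text):
    # \b\d+\b matches whole digit runs, so "3" and "12" convert separately.
    return re.sub(r"\b\d+\b", lambda m: tiny_words[int(m.group())], text)

print(spell_digits("chapit 3 paj 12"))  # → chapit twa paj douz
```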
Next, a basic normalization pass:

```python
def normalize_text2(text):
    # Lowercase, strip punctuation (except apostrophes), collapse whitespace.
    text = text.lower()
    text = re.sub(r"[^\w\s']", '', text)
    return ' '.join(text.split())
```
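For example, the normalization steps behave as follows (standalone sketch mirroring `normalize_text2`; note that Python's `\w` is Unicode-aware, so accented letters like `ò` survive the punctuation strip):

```python
import re

# Same three steps as normalize_text2 above.
def normalize(text):
    text = text.lower()                   # lowercase
    text = re.sub(r"[^\w\s']", "", text)  # drop punctuation, keep apostrophes
    return " ".join(text.split())         # collapse whitespace

print(normalize("Bonjou,   Ayiti! Se l'ekòl."))  # → bonjou ayiti se l'ekòl
```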
Then map Haitian Creole graphemes to the phone symbols the model expects:

```python
# Grapheme → phone replacement pairs (most are identity mappings).
replacements = [
    ("b", "b"), ("d", "d"), ("f", "f"), ("g", "ɡ"), ("h", "h"),
    ("j", "ʒ"), ("k", "k"), ("l", "l"), ("m", "m"), ("n", "n"),
    ("p", "p"), ("r", "r"), ("s", "s"), ("t", "t"), ("v", "v"),
    ("w", "w"), ("y", "y"), ("z", "z"),
    ("a", "a"), ("e", "e"), ("è", "ɛ"), ("i", "i"), ("o", "o"),
    ("ò", "ɔ")
]

# Clean up text using the replacement pairs, applied in order.
def cleanup_text2(cleaned_text):
    for src, dst in replacements:
        cleaned_text = cleaned_text.replace(src, dst)
    return cleaned_text
```
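Only four pairs in the list change anything (`è`→`ɛ`, `ò`→`ɔ`, `j`→`ʒ`, and Latin `g`→IPA `ɡ`); a trimmed, self-contained sketch of the effect:

```python
# The four non-identity pairs from the replacements list above.
pairs = [("è", "ɛ"), ("ò", "ɔ"), ("j", "ʒ"), ("g", "ɡ")]

def to_ipa(text):
    # Apply each substitution across the whole string.
    for src, dst in pairs:
        text = text.replace(src, dst)
    return text

print(to_ipa("jòn bò lekòl"))  # → ʒɔn bɔ lekɔl
```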
Finally, load the pipeline, pick a speaker embedding, and synthesize:

```python
# Load the text-to-speech pipeline and a speaker embedding
synthesiser = pipeline("text-to-speech", "jsbeaudry/haitian_creole")
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[7106]["xvector"]).unsqueeze(0)

def generate_audio(text):
    converted_text = replace_numbers_with_words2(text)
    # Normalize before the grapheme mapping so accented capitals (e.g. "È")
    # are lowercased in time to be converted to their phone symbols.
    normalized_text = normalize_text2(converted_text)
    final_text = cleanup_text2(normalized_text)
    print(final_text)
    speech = synthesiser(final_text, forward_params={"speaker_embeddings": speaker_embedding})
    sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])
    return "speech.wav"

generate_audio("Kalkile koefisyan regresyon ak entèsepsyon yo lè l sèvi avèk metòd kare ki pi piti.")
Audio("speech.wav")
```
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- training_steps: 2000
- mixed_precision_training: Native AMP
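The reported `total_train_batch_size` follows directly from the per-device batch size and gradient accumulation:

```python
# Effective batch size per optimizer step = per-device batch × accumulation steps
train_batch_size = 4
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 32
```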
### Training results
| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.5635        | 2.5552  | 100  | 0.4883          |
| 0.4911        | 5.1262  | 200  | 0.4521          |
| 0.4715        | 7.6814  | 300  | 0.4418          |
| 0.4615        | 10.2524 | 400  | 0.4246          |
| 0.4358        | 12.8076 | 500  | 0.4190          |
| 0.4323        | 15.3785 | 600  | 0.4205          |
| 0.4161        | 17.9338 | 700  | 0.4242          |
| 0.4196        | 20.5047 | 800  | 0.4156          |
| 0.4122        | 23.0757 | 900  | 0.4154          |
| 0.4102        | 25.6309 | 1000 | 0.4192          |
| 0.4005        | 28.2019 | 1100 | 0.4121          |
| 0.3891        | 30.7571 | 1200 | 0.4159          |
| 0.3878        | 33.3281 | 1300 | 0.4216          |
| 0.3816        | 35.8833 | 1400 | 0.4113          |
| 0.3827        | 38.4543 | 1500 | 0.4059          |
| 0.388         | 41.0252 | 1600 | 0.4036          |
| 0.379         | 43.5804 | 1700 | 0.4157          |
| 0.3758        | 46.1514 | 1800 | 0.4081          |
| 0.3659        | 48.7066 | 1900 | 0.4068          |
| 0.3714        | 51.2776 | 2000 | 0.4117          |
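Validation loss bottoms out before the final step; a quick scan over the table's (step, validation loss) pairs, transcribed verbatim, finds the best checkpoint:

```python
# (step, validation loss) pairs transcribed from the table above
val_loss = {100: 0.4883, 200: 0.4521, 300: 0.4418, 400: 0.4246, 500: 0.4190,
            600: 0.4205, 700: 0.4242, 800: 0.4156, 900: 0.4154, 1000: 0.4192,
            1100: 0.4121, 1200: 0.4159, 1300: 0.4216, 1400: 0.4113, 1500: 0.4059,
            1600: 0.4036, 1700: 0.4157, 1800: 0.4081, 1900: 0.4068, 2000: 0.4117}

best_step = min(val_loss, key=val_loss.get)
print(best_step, val_loss[best_step])  # → 1600 0.4036
```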
### Framework versions
- Transformers 4.50.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1