haitian_creole

This model is a fine-tuned version of microsoft/speecht5_tts on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4117

Usage

import torch
import soundfile as sf
import re
from transformers import pipeline
from datasets import load_dataset
from IPython.display import Audio

number_words = {
    0: "zewo", 1: "en", 2: "de", 3: "twa", 4: "kat", 5: "senk", 6: "sis", 7: "sèt", 8: "uit", 9: "nèf",
    10: "dis", 11: "onz", 12: "douz", 13: "trez", 14: "katorz", 15: "kenz", 16: "sèz", 17: "dis sèt",
    18: "dis uit", 19: "dis nèf", 20: "vent", 30: "trant", 40: "karant", 50: "senkant", 60: "swasant",
    70: "swasant diz", 80: "katreven", 90: "katreven diz", 100: "san", 1000: "mil"
}

def number_to_words(number):
    if number < 20:
        return number_words[number]
    elif number < 100:
        tens, unit = divmod(number, 10)
        return number_words[tens * 10] + (" " + number_words[unit] if unit else "")
    elif number < 1000:
        hundreds, remainder = divmod(number, 100)
        return (number_words[hundreds] + " san" if hundreds > 1 else "san") + (" " + number_to_words(remainder) if remainder else "")
    elif number < 1000000:
        thousands, remainder = divmod(number, 1000)
        return (number_to_words(thousands) + " mil" if thousands > 1 else "mil") + (" " + number_to_words(remainder) if remainder else "")
    elif number < 1000000000:
        millions, remainder = divmod(number, 1000000)
        return number_to_words(millions) + " milyon" + (" " + number_to_words(remainder) if remainder else "")
    elif number < 1000000000000:
        billions, remainder = divmod(number, 1000000000)
        return number_to_words(billions) + " milya" + (" " + number_to_words(remainder) if remainder else "")
    else:
        return str(number)

def replace_numbers_with_words2(text):
    def replace(match):
        number = int(match.group())
        return number_to_words(number)

    return re.sub(r'\b\d+\b', replace, text)
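As a quick check of the substitution logic, the same regex approach can be exercised on a trimmed copy of the mapping (only the entries this example needs):

```python
import re

# Trimmed copy of the number_words mapping above, just enough for the demo.
words = {2: "de", 5: "senk", 20: "vent"}

def to_words(n):
    # Direct lookup, else tens + unit, mirroring the branch for numbers under 100.
    if n in words:
        return words[n]
    tens, unit = divmod(n, 10)
    return words[tens * 10] + (" " + words[unit] if unit else "")

def replace_numbers(text):
    # \b\d+\b matches standalone digit runs only, so digits inside words are left alone.
    return re.sub(r"\b\d+\b", lambda m: to_words(int(m.group())), text)

print(replace_numbers("Mwen gen 25 liv"))  # -> "Mwen gen vent senk liv"
```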

# Normalize text: lowercase, strip punctuation, collapse whitespace
def normalize_text2(text):
    # Convert to lowercase
    text = text.lower()

    # Remove punctuation (except apostrophes)
    text = re.sub(r'[^\w\s\']', '', text)

    # Remove extra whitespace
    text = ' '.join(text.split())

    return text
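A self-contained sketch of the normalization step (a trimmed copy of the function above). Note that Python's `\w` is Unicode-aware, so accented Creole letters like è and ò survive the punctuation strip:

```python
import re

def normalize(text):
    text = text.lower()
    # Keep word characters (Unicode-aware, so è/ò survive), whitespace, and apostrophes.
    text = re.sub(r"[^\w\s']", "", text)
    # Collapse runs of whitespace to single spaces.
    return " ".join(text.split())

print(normalize("Bonjou,  Ayiti!"))  # -> "bonjou ayiti"
```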

replacements = [
    ("b", "b"), ("d", "d"), ("f", "f"), ("g", "ɡ"), ("h", "h"),
    ("j", "ʒ"), ("k", "k"), ("l", "l"), ("m", "m"), ("n", "n"),
    ("p", "p"), ("r", "r"), ("s", "s"), ("t", "t"), ("v", "v"),
    ("w", "w"), ("y", "y"), ("z", "z"),
    
    ("a", "a"), ("e", "e"), ("è", "ɛ"), ("i", "i"), ("o", "o"),
    ("ò", "ɔ")
]

def cleanup_text2(cleaned_text):
    for src, dst in replacements:
        cleaned_text = cleaned_text.replace(src, dst)
    return cleaned_text
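Only four of the pairs above actually change anything (g → ɡ, j → ʒ, è → ɛ, ò → ɔ); the rest map characters to themselves. A self-contained sketch of the effect, keeping just the non-identity pairs:

```python
# The four substitutions that differ; identity pairs are omitted here.
replacements = [("g", "ɡ"), ("j", "ʒ"), ("è", "ɛ"), ("ò", "ɔ")]

def cleanup(text):
    # Apply each pair in order, exactly as cleanup_text2 does.
    for src, dst in replacements:
        text = text.replace(src, dst)
    return text

print(cleanup("jou fèt"))  # -> "ʒou fɛt"
```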

# Load the text-to-speech pipeline and speaker embedding
synthesiser = pipeline("text-to-speech", "jsbeaudry/haitian_creole")
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[7106]["xvector"]).unsqueeze(0)

def generate_audio(text):
    converted_text = replace_numbers_with_words2(text)
    # Lowercase before applying the replacement pairs, so uppercase letters match too
    normalized_text = normalize_text2(converted_text)
    final_text = cleanup_text2(normalized_text)
    print(final_text)
    speech = synthesiser(final_text, forward_params={"speaker_embeddings": speaker_embedding})
    sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])
    return "speech.wav"

generate_audio("Kalkile koefisyan regresyon ak entèsepsyon yo lè l sèvi avèk metòd kare ki pi piti.")
  
Audio("speech.wav")

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 2000
  • mixed_precision_training: Native AMP
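The total train batch size follows from the per-device batch size and the gradient accumulation steps:

```python
# Effective batch size = per-device batch size x gradient accumulation steps
train_batch_size = 4
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # -> 32
```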

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.5635        | 2.5552  | 100  | 0.4883          |
| 0.4911        | 5.1262  | 200  | 0.4521          |
| 0.4715        | 7.6814  | 300  | 0.4418          |
| 0.4615        | 10.2524 | 400  | 0.4246          |
| 0.4358        | 12.8076 | 500  | 0.4190          |
| 0.4323        | 15.3785 | 600  | 0.4205          |
| 0.4161        | 17.9338 | 700  | 0.4242          |
| 0.4196        | 20.5047 | 800  | 0.4156          |
| 0.4122        | 23.0757 | 900  | 0.4154          |
| 0.4102        | 25.6309 | 1000 | 0.4192          |
| 0.4005        | 28.2019 | 1100 | 0.4121          |
| 0.3891        | 30.7571 | 1200 | 0.4159          |
| 0.3878        | 33.3281 | 1300 | 0.4216          |
| 0.3816        | 35.8833 | 1400 | 0.4113          |
| 0.3827        | 38.4543 | 1500 | 0.4059          |
| 0.388         | 41.0252 | 1600 | 0.4036          |
| 0.379         | 43.5804 | 1700 | 0.4157          |
| 0.3758        | 46.1514 | 1800 | 0.4081          |
| 0.3659        | 48.7066 | 1900 | 0.4068          |
| 0.3714        | 51.2776 | 2000 | 0.4117          |

Framework versions

  • Transformers 4.50.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1