BananaMind TTS V1 is a small English single-speaker text-to-speech (TTS) acoustic model, trained from scratch on LJSpeech. It speaks in one fixed voice and is not a voice-cloning system.
The repository contains:

- `model.safetensors`: Tacotron-lite acoustic model weights
- `config.json`: Hugging Face custom model config
- `configuration_bananamind_tts.py`: custom `AutoConfig` implementation
- `modeling_bananamind_tts.py`: custom `AutoModel` implementation
- `model_config.json`: sidecar metadata with the model config, tokenizer, epoch, and step

Use with Transformers remote code:
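```python
from transformers import AutoModel

# trust_remote_code=True is required: the model class is defined in
# modeling_bananamind_tts.py inside the repository.
model = AutoModel.from_pretrained(
    "Banaxi-Tech/BananaMind-TTS-V1",
    trust_remote_code=True,
)

# Synthesize speech from English text (write numbers as words).
out = model.tts(
    "Hello from Banana TTS. This is a simple speech test.",
    normalize_wav=True,
)

# Write the result to a WAV file with the model's helper.
model.save_wav("sample.wav", out.waveform, out.sample_rate)
```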
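`save_wav` comes from the repository's remote code. If you prefer to write the file yourself (for example, to control the sample format), the sketch below uses only NumPy and Python's standard-library `wave` module. It assumes, without confirmation from this card, that `out.waveform` is a mono 1-D float array or PyTorch tensor with samples roughly in [-1, 1] and that `out.sample_rate` is an integer. `write_wav_pcm16` and its `peak_normalize` option are hypothetical and are not guaranteed to match what `normalize_wav=True` does.

```python
import wave

import numpy as np


def write_wav_pcm16(path, waveform, sample_rate, peak_normalize=False):
    """Write a mono float waveform to a 16-bit PCM WAV file (hypothetical helper).

    Assumes `waveform` is 1-D (or has a single leading batch dimension) with
    samples roughly in [-1, 1]. `peak_normalize` rescales so the loudest sample
    reaches full scale; it is an assumption, not the model's `normalize_wav`.
    """
    # Accept PyTorch tensors as well as NumPy arrays (assumed output type).
    if hasattr(waveform, "detach"):
        waveform = waveform.detach().cpu().numpy()
    audio = np.asarray(waveform, dtype=np.float32).reshape(-1)

    if peak_normalize and audio.size:
        peak = float(np.max(np.abs(audio)))
        if peak > 0.0:
            audio = audio / peak

    # Clip first so out-of-range samples cannot wrap around, then scale to int16.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767.0).astype("<i2")

    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(int(sample_rate))
        wf.writeframes(pcm.tobytes())
```

For example: `write_wav_pcm16("sample.wav", out.waveform, out.sample_rate)`. 16-bit PCM is readable by virtually any audio tool, and clipping before the integer conversion avoids wrap-around distortion on samples that exceed full scale.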
Install runtime dependencies:

```bash
pip install torch numpy safetensors transformers
```
Digits are stripped by the current tokenizer, so write numbers as words:

- Write: `one plus one is two`
- Not: `1 plus 1 is 2`

This model is intended for English single-speaker text-to-speech generation with the included local inference script.
Do not present this model as a voice-cloning model or use it to impersonate any person. It has no voice-cloning capability.
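Because the current tokenizer strips digits, text that contains numbers can be spelled out before it is passed to `model.tts`. The sketch below is not part of the model repository: `int_to_words` and `spell_out_numbers` are hypothetical helper names, and they only handle non-negative integers (no decimals, ordinals, currency, or thousands separators).

```python
import re

# Words for 0-19, and for multiples of ten (index = tens digit).
_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]
# Large scales, checked from largest to smallest.
_SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def int_to_words(n: int) -> str:
    """Spell out a non-negative integer in English (no hyphens, no "and")."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return _TENS[tens] + (" " + _ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return _ONES[hundreds] + " hundred" + (" " + int_to_words(rest) if rest else "")
    for value, name in _SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            words = int_to_words(head) + " " + name
            return words + (" " + int_to_words(rest) if rest else "")
    raise AssertionError("unreachable: n >= 1000 always matches a scale")


def spell_out_numbers(text: str) -> str:
    """Replace every run of ASCII digits with its English spelling.

    Limitations: "3.5" becomes "three.five" and "1,000" becomes "one,zero",
    so decimals, separators, ordinals, and years still need manual rewriting.
    """
    return re.sub(r"[0-9]+", lambda m: int_to_words(int(m.group())), text)
```

Usage: `out = model.tts(spell_out_numbers("I counted 3 bananas."), normalize_wav=True)`. The helper joins words with spaces rather than hyphens ("twenty one"), since this card does not say whether the tokenizer keeps hyphens.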