Can you share the training script?
Hello reenigne314, really impressive work. Warm-initializing the embedding layers from similar-sounding tokens is very clever.
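To check my understanding of the warm initialization, here is a minimal PyTorch sketch of what I think is going on. It is my guess, not your actual code: the function name `warm_init_new_tokens`, the donor mapping, and the choice to average several donors are all assumptions on my part.

```python
import torch
import torch.nn as nn


def warm_init_new_tokens(
    embedding: nn.Embedding,
    num_new: int,
    donors: dict[int, list[int]],
) -> nn.Embedding:
    """Grow an embedding table and seed new rows from existing ones.

    `donors` maps each new token id to the ids of existing tokens that
    sound similar (a hypothetical, hand-built mapping). New ids that are
    not listed in `donors` keep PyTorch's default random initialization.
    """
    old_vocab, dim = embedding.weight.shape
    resized = nn.Embedding(old_vocab + num_new, dim)
    with torch.no_grad():
        # Preserve every pretrained row unchanged.
        resized.weight[:old_vocab].copy_(embedding.weight)
        # Seed each new row with the mean of its donor rows.
        for new_id, donor_ids in donors.items():
            resized.weight[new_id].copy_(embedding.weight[donor_ids].mean(dim=0))
    return resized
```

For example, `warm_init_new_tokens(old_emb, 2, {old_vocab: [12, 40], old_vocab + 1: [7]})` would add two tokens, the first seeded from the average of rows 12 and 40 and the second copied from row 7 (the ids here are placeholders). Please correct me if the real approach differs.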
I don’t have much experience with LoRA adaptation yet; most of my experience has been with full fine-tuning. So I’ve been trying to wrap my head around this by reading your Substack post and the HF repo.
I’d really like to experiment with this further using larger multi-speaker datasets like IndicVoices, and see how well the generalization scales.
Would you be open to sharing the training script, or even a rough outline of the training pipeline? The T3 LoRA setup in particular would be super helpful for reproducing and building on this work.
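For context, this is roughly how I picture the LoRA part, written as a generic PyTorch sketch rather than anything specific to T3. The class name `LoRALinear`, the rank and alpha defaults, and the idea of wrapping individual `nn.Linear` layers are my assumptions; I don't know which modules you actually adapted or which library you used.

```python
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (generic sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only the adapter is trained.
        for param in self.base.parameters():
            param.requires_grad_(False)
        # A gets a random init, B starts at zero, so the wrapped layer
        # initially produces exactly the same output as the base layer.
        self.lora_a = nn.Parameter(torch.empty(rank, base.in_features))
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base(x) + scale * x A^T B^T
        update = (x @ self.lora_a.T) @ self.lora_b.T
        return self.base(x) + self.scale * update
```

If your setup looks different (for example, different target modules, or the embedding layers trained in full alongside the adapters), that is exactly the kind of detail I'm hoping the script would clarify.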