How to use Joeyfully/Voice-Generation with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-to-speech", model="Joeyfully/Voice-Generation")

# Load model directly
from transformers import AutoProcessor, AutoModelForTextToSpectrogram
processor = AutoProcessor.from_pretrained("Joeyfully/Voice-Generation")
model = AutoModelForTextToSpectrogram.from_pretrained("Joeyfully/Voice-Generation")
Radio TTS Quick Start
This workspace fine-tunes a text-to-speech model on the dataset in data/ and writes all outputs to output/. The training and testing scripts are available at https://github.com/Lambchem/Voice-Generation
What you get
- train.py for training and automatic checkpoint saving
- eval.py for synthesizing one WAV file from text
- output/model/ for saved model checkpoints
- output/rollout/ for per-epoch audio samples
- loss.csv and loss.png in the project root for loss tracking
Requirements
You need Python with these packages available:
- torch
- torchaudio
- transformers
- datasets
- pandas
- soundfile
- matplotlib
The first run will also download the pretrained SpeechT5 model and vocoder from Hugging Face.
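Before the first run, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository; the helper name `missing_packages` is hypothetical, and it assumes each package's import name matches the name in the list.

```python
import importlib.util

# Package names copied from the requirements list above.
REQUIRED = [
    "torch",
    "torchaudio",
    "transformers",
    "datasets",
    "pandas",
    "soundfile",
    "matplotlib",
]


def missing_packages(names=REQUIRED):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks the modules up; it does not import them, so it stays fast even when torch is installed.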
Data layout
The dataset should stay in the provided structure:
data/
train-00000-of-00010.parquet
train-00001-of-00010.parquet
...
The dataset metadata is described in dataset_README.md.
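A quick way to verify the layout is to list the shards and compare them against the total encoded in the file names. This is a sketch under the assumption that every shard follows the `train-NNNNN-of-NNNNN.parquet` pattern shown above; `find_shards` is a hypothetical helper, not something the training script provides.

```python
import re
from pathlib import Path

# Matches names such as train-00000-of-00010.parquet.
SHARD_PATTERN = re.compile(r"^train-(\d{5})-of-(\d{5})\.parquet$")


def find_shards(data_dir="data"):
    """Return (sorted shard paths, missing shard indices) for data_dir."""
    found = {}
    expected_total = None
    for path in Path(data_dir).glob("train-*.parquet"):
        match = SHARD_PATTERN.match(path.name)
        if match:
            found[int(match.group(1))] = path
            expected_total = int(match.group(2))
    if expected_total is None:
        # No shard matched the naming pattern.
        return [], []
    missing = [i for i in range(expected_total) if i not in found]
    return [found[i] for i in sorted(found)], missing


if __name__ == "__main__":
    shards, missing = find_shards("data")
    print(f"Found {len(shards)} shard(s); missing indices: {missing}")
```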
Train
Start training with the default settings:
python train.py
Recommended first run on Windows:
python train.py --epochs 1 --batch-size 2
Useful options:
- --max-examples limits the dataset for a quick smoke test
- --sample-text adds custom rollout prompts and may be passed multiple times
- --output-dir changes where checkpoints and samples are written
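To make the flag semantics concrete, here is one way such options could be declared with argparse. It is an illustrative sketch only and not the actual parser in train.py: the defaults are assumptions (the real defaults are not documented here), and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Sketch of a parser for the options described above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Illustrative train.py options")
    parser.add_argument("--epochs", type=int, default=None)
    parser.add_argument("--batch-size", type=int, default=None)
    # Cap on the number of dataset examples, for quick smoke tests.
    parser.add_argument("--max-examples", type=int, default=None)
    # action="append" lets the flag be repeated, one prompt per occurrence.
    parser.add_argument("--sample-text", action="append", default=[])
    # Assumed default, since outputs are written to output/.
    parser.add_argument("--output-dir", default="output")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this shape, `--sample-text "Hello." --sample-text "Testing."` yields a list of two rollout prompts.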
Resume training
Training resumes automatically from the latest checkpoint in output/model/ when one exists.
To force a fresh run, use:
python train.py --no-resume
Outputs after each epoch
Each epoch writes:
- output/model/epoch_XXXX/ for the checkpoint
- output/rollout/epoch_XXXX/ for generated audio samples
- loss.csv and loss.png updated in the project root
- output/model/last_checkpoint.txt pointing to the most recent checkpoint
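Given this layout, the most recent checkpoint can be located from outside the training script. The sketch below is an assumption about how one might do that, not the resume logic in train.py: it reads last_checkpoint.txt first and falls back to the highest-numbered epoch_XXXX directory. The format of the pointer file is not documented here, so the code accepts either a full path or a bare directory name.

```python
import re
from pathlib import Path

EPOCH_DIR = re.compile(r"^epoch_(\d+)$")


def latest_checkpoint(model_dir="output/model"):
    """Return the most recent checkpoint directory, or None if none exists."""
    model_dir = Path(model_dir)
    pointer = model_dir / "last_checkpoint.txt"
    if pointer.is_file():
        text = pointer.read_text(encoding="utf-8").strip()
        if text:
            candidate = Path(text)
            # Assumption: the pointer holds a path or just the directory name.
            for option in (candidate, model_dir / candidate.name):
                if option.is_dir():
                    return option
    # Fallback: pick the highest epoch number among epoch_XXXX directories.
    epochs = []
    if model_dir.is_dir():
        for child in model_dir.iterdir():
            match = EPOCH_DIR.match(child.name)
            if match and child.is_dir():
                epochs.append((int(match.group(1)), child))
    if not epochs:
        return None
    return max(epochs, key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(latest_checkpoint())
```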
Evaluate
Generate a WAV file from text:
python eval.py --text "Hello, this is a test."
The output is written to output/eval/.
Long text
eval.py supports long text by splitting the input into chunks, re-encoding each chunk through the tokenizer, generating audio chunk by chunk, and stitching the result into one WAV file.
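The splitting step can be sketched as greedy sentence packing under a token budget. This is an illustration of the general technique, not the code in eval.py: `split_into_chunks` is a hypothetical helper, and the default token counter simply counts whitespace-separated words, whereas a real run would count tokens with the model's tokenizer.

```python
import re


def split_into_chunks(text, token_limit, count_tokens=lambda s: len(s.split())):
    """Greedily pack sentences into chunks of at most token_limit tokens.

    A single sentence longer than token_limit is kept as its own
    oversized chunk rather than being cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > token_limit:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = "One two three. Four five. Six seven eight nine."
    for chunk in split_into_chunks(demo, token_limit=5):
        print(repr(chunk))
```

Splitting on sentence boundaries keeps each chunk prosodically complete, which tends to make the seams between generated chunks less audible.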
If you want to tune long-form generation, try:
python eval.py --text "your long text" --chunk-token-limit 480 --chunk-gap-seconds 0.05 --maxlenratio 60
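The name --chunk-gap-seconds suggests a short silence placed between consecutive chunks when they are stitched together. The sketch below illustrates that idea on plain Python lists of samples; it is an assumption about the behavior, not the stitching code in eval.py, and `stitch_chunks` is a hypothetical helper. It assumes 16 kHz audio, which is what the default microsoft/speecht5_hifigan vocoder produces, so a 0.05 s gap corresponds to 800 samples.

```python
def stitch_chunks(chunks, sample_rate=16000, gap_seconds=0.05):
    """Concatenate audio chunks with silence between consecutive chunks."""
    gap = [0.0] * round(sample_rate * gap_seconds)
    stitched = []
    for index, chunk in enumerate(chunks):
        if index > 0:
            # Silence goes only between chunks, never before the first one.
            stitched.extend(gap)
        stitched.extend(chunk)
    return stitched


if __name__ == "__main__":
    audio = stitch_chunks([[0.1] * 10, [0.2] * 10])
    print(len(audio))
```

In practice the same logic would more likely operate on NumPy arrays or torch tensors before the result is written to a WAV file.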
Notes
- The scripts use the pretrained microsoft/speecht5_tts model and microsoft/speecht5_hifigan vocoder by default.
- The dataset README says attribution is required when using the generated voice in interfaces that generate audio in response to user action. Refer to the voice as Jenny, and where practical, Jenny (Dioco).