v0.2.0 Edison Singing: new branch shipped today
Hi all,
Quick announcement: I've pushed a new branch on this repo called v0.2.0-edison-singing containing an F5-TTS fine-tune trained on Edison wax-cylinder recordings, 1900–1925 (569 rows, ~2.26 hours, 6,800 updates over 50 epochs).
Branch: https://huggingface.co/AutomatedJanitor/vintage-voice/tree/v0.2.0-edison-singing
GitHub release: https://github.com/Scottcjn/vintage-voice/releases/tag/v0.2.0
main is unchanged and still serves the v0.1.0 transatlantic weights as the default download. Both versions live side-by-side.
Honest note on what this model actually is
Mid-training we noticed the samples sounded musical, not spoken. An audit pipeline (librosa music/speech classifier → Whisper language detection → text grep) confirmed 60–70% of the dataset is sung material: vaudeville, parlor song, opera, lieder. True modern-sounding spoken cylinders amount to about 40 rows after filtering, and even those register as musical because 1900s recording technique placed the speaker close to the horn and asked them to project.
Rather than fight the data, we shipped what the data is: a singing model that teaches any clean modern reference voice to perform in 1910s theatrical cadence under wax-cylinder acoustic character (band-limited ~300–3,000 Hz, horn-resonance-colored).
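For readers who want to run a similar audit on their own dataset, here is a minimal sketch of the three-stage cascade described above. It is not the pipeline used for this release: the stage predicates are hypothetical stand-ins for the librosa classifier, Whisper language detection, and text grep, the rule that the first flagging stage wins is an assumption, and the rows are made-up toy data.

```python
from collections import Counter
from typing import Callable, Iterable, List, Mapping, Tuple

Row = Mapping[str, str]


def audit(rows: Iterable[Row], stages: List[Tuple[str, Callable[[Row], bool]]]) -> Counter:
    """Tally rows by the first stage that flags them as sung.

    Rows that no stage flags are counted under "spoken".
    """
    tally: Counter = Counter()
    for row in rows:
        label = next((name for name, flags in stages if flags(row)), "spoken")
        tally[label] += 1
    return tally


# Hypothetical stand-ins: a real audit would compute these fields from audio
# (music/speech classifier, Whisper language ID) and from transcripts.
stages = [
    ("music_classifier", lambda r: r["audio_label"] == "music"),
    ("non_english", lambda r: r["language"] != "en"),
    ("text_grep", lambda r: any(w in r["text"].lower() for w in ("song", "aria", "waltz"))),
]

# Made-up toy rows, not taken from the actual training set.
rows = [
    {"audio_label": "music", "language": "en", "text": "a parlor ballad"},
    {"audio_label": "speech", "language": "de", "text": "ein lied"},
    {"audio_label": "speech", "language": "en", "text": "a comic song"},
    {"audio_label": "speech", "language": "en", "text": "a spoken address"},
]

tally = audit(rows, stages)
sung_fraction = 1 - tally["spoken"] / sum(tally.values())
print(tally, f"sung fraction: {sung_fraction:.0%}")
```

Keeping each stage as a plain predicate makes it easy to swap in a real classifier later and to see which stage caught each row.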
When to use which branch
- main (v0.1.0 transatlantic): for transatlantic-cadence spoken delivery (newsreel, radio drama).
- v0.2.0-edison-singing (this branch): for sung output with wax-cylinder acoustic character.
- For modern-sounding spoken anything: use SWivid/F5-TTS directly. Neither of our fine-tunes is the right tool for clean modern speech.
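If you script your downloads, the choice above can be encoded as a small lookup. This is only a sketch: the use-case keys and the `pick_weights` helper are hypothetical names made up for illustration, and `None` simply means "use that repo's default revision".

```python
from typing import Optional, Tuple

# Hypothetical use-case keys mapped to (repo_id, revision).
WEIGHTS = {
    "transatlantic-speech": ("AutomatedJanitor/vintage-voice", "main"),
    "cylinder-singing": ("AutomatedJanitor/vintage-voice", "v0.2.0-edison-singing"),
    # Clean modern speech: neither fine-tune fits, so fall back to the base model.
    "modern-speech": ("SWivid/F5-TTS", None),
}


def pick_weights(use_case: str) -> Tuple[str, Optional[str]]:
    """Return (repo_id, revision) for a use case, or raise on an unknown one."""
    try:
        return WEIGHTS[use_case]
    except KeyError:
        raise ValueError(f"unknown use case {use_case!r}; expected one of {sorted(WEIGHTS)}")


repo_id, revision = pick_weights("cylinder-singing")
print(repo_id, revision)
```

The returned pair can be passed straight to `snapshot_download(repo_id, revision=revision)`.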
How to pull
huggingface-cli download AutomatedJanitor/vintage-voice \
    --revision v0.2.0-edison-singing \
    --local-dir vintage-voice-edison-singing
Or in Python:
from huggingface_hub import snapshot_download

snapshot_download(
    "AutomatedJanitor/vintage-voice",
    revision="v0.2.0-edison-singing",
    local_dir="vintage-voice-edison-singing",
)
The branch includes 8 gen+ref sample pairs at updates 3000/5000/6000/6500 if you want to hear how the cylinder character emerges across training.
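To place those checkpoints within the run, the figures from the announcement (569 rows, ~2.26 hours, 6,800 updates over 50 epochs) can be turned into approximate epochs. This is back-of-the-envelope arithmetic that assumes updates are spread evenly across epochs; it is not read from the training logs.

```python
# Figures quoted in the announcement.
TOTAL_UPDATES = 6_800
EPOCHS = 50
ROWS = 569
HOURS = 2.26
CHECKPOINTS = [3000, 5000, 6000, 6500]

# Assumption: every epoch takes the same number of optimizer updates.
updates_per_epoch = TOTAL_UPDATES / EPOCHS

# Average clip length implied by the dataset size.
mean_clip_seconds = HOURS * 3600 / ROWS

# Approximate epoch reached at each sample checkpoint.
checkpoint_epochs = {step: step / updates_per_epoch for step in CHECKPOINTS}

print(f"updates per epoch: {updates_per_epoch:.0f}")
print(f"mean clip length: {mean_clip_seconds:.1f} s")
for step, epoch in checkpoint_epochs.items():
    print(f"update {step}: ~epoch {epoch:.1f} of {EPOCHS}")
```

Under that assumption the first sample pair (update 3000) lands a little under halfway through training and the last (update 6500) close to the end.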
License is unchanged: CC-BY-NC-4.0 on weights (inherited from F5-TTS base), MIT on the surrounding scripts, public-domain on the training audio (Internet Archive Edison cylinders).
If you try it and it does something interesting (or breaks), I'd love to hear about it. Drop a reply here or open another discussion. This is the first community-tab entry, so the bar is very low.
-- Scott / Sophia Elya