v0.2.0 Edison Singing — new branch shipped today

Owner May 18

Hi all,

Quick announcement: I've pushed a new branch on this repo called v0.2.0-edison-singing containing an F5-TTS fine-tune trained on Edison wax-cylinder recordings, 1900–1925 (569 rows, ~2.26 hours, 6,800 updates over 50 epochs).
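As a quick sanity check on those figures, the arithmetic below derives updates per epoch, mean clip length, and audio seen per update. It is a sketch that assumes each epoch is one full pass over all 569 rows and that ~2.26 hours is the total audio duration; the variable names are illustrative and not taken from the training scripts.

```python
# Back-of-envelope figures derived from the numbers quoted in the announcement.
rows = 569
hours = 2.26
updates = 6800
epochs = 50

total_seconds = hours * 3600              # total audio, in seconds
updates_per_epoch = updates / epochs      # optimizer updates in one epoch
mean_clip_seconds = total_seconds / rows  # average length of one row
# Assumption: one epoch covers the whole dataset exactly once.
audio_per_update = total_seconds / updates_per_epoch

print(f"updates per epoch: {updates_per_epoch:.0f}")
print(f"mean clip length:  {mean_clip_seconds:.1f} s")
print(f"audio per update:  {audio_per_update:.1f} s")
```

Under those assumptions the run works out to 136 updates per epoch, clips averaging roughly 14 seconds, and about a minute of audio per update.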

🔗 Branch: https://huggingface.co/AutomatedJanitor/vintage-voice/tree/v0.2.0-edison-singing
🔗 GitHub release: https://github.com/Scottcjn/vintage-voice/releases/tag/v0.2.0

main is unchanged and still serves the v0.1.0 transatlantic weights as the default download. Both versions live side-by-side.

Honest note on what this model actually is

Mid-training we noticed the samples sounded musical, not spoken. An audit pipeline (librosa-based music/speech classifier → Whisper language detection → text grep) confirmed that 60–70% of the dataset is sung material: vaudeville, parlor song, opera, lieder. True modern-sounding spoken cylinders amount to about 40 rows after filtering, and even those register as musical because 1900s recording technique placed the speaker close to the horn and asked them to project.
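For anyone who wants to run a similar audit on their own data, the three stages can be chained roughly as below. This is a minimal sketch, not the pipeline we ran: `Row`, `audit`, and the `LYRIC_HINTS` pattern are hypothetical names, the two classifiers are passed in as plain callables (stand-ins for a librosa-based music/speech check and Whisper language detection), and treating non-English rows as their own bucket is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Row:
    audio_path: str
    transcript: str


# Hypothetical lyric markers; the real grep patterns are not listed in this post.
LYRIC_HINTS = re.compile(r"\b(chorus|refrain|la la|tra la)\b", re.IGNORECASE)


def audit(
    rows: Iterable[Row],
    is_music: Callable[[str], bool],
    detect_language: Callable[[str], str],
) -> Dict[str, str]:
    """Label each row 'sung', 'non_english', or 'spoken' via three cascaded checks."""
    labels = {}
    for row in rows:
        if is_music(row.audio_path):                      # stage 1: audio classifier
            labels[row.audio_path] = "sung"
        elif detect_language(row.audio_path) != "en":     # stage 2: language detection
            labels[row.audio_path] = "non_english"
        elif LYRIC_HINTS.search(row.transcript):          # stage 3: text grep
            labels[row.audio_path] = "sung"
        else:
            labels[row.audio_path] = "spoken"
    return labels
```

Passing the classifiers in as callables keeps the cascade testable with simple stubs before any real model is wired in.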

Rather than fight the data, we shipped what the data is: a singing model that teaches any clean modern reference voice to perform in 1910s theatrical cadence under wax-cylinder acoustic character (band-limited ~300–3,000 Hz, horn-resonance-colored).
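To get a rough feel for what a ~300–3,000 Hz band limit alone does to a signal, a single band-pass biquad is enough. The sketch below uses the standard RBJ audio-EQ-cookbook band-pass formula in pure Python; it only illustrates the band limit, is far gentler than a real cylinder chain, and is not how the model produces its output (the model also learns horn resonance and cadence). The function names and the 24 kHz sample rate in the usage note are assumptions.

```python
import math


def bandpass_coeffs(low_hz, high_hz, sample_rate):
    """RBJ-cookbook band-pass biquad (0 dB peak gain), normalized so a[0] == 1."""
    f0 = math.sqrt(low_hz * high_hz)   # geometric centre of the band
    q = f0 / (high_hz - low_hz)        # centre frequency over bandwidth
    w0 = 2 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def biquad(samples, b, a):
    """Apply the filter sample by sample (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

For example, `biquad(samples, *bandpass_coeffs(300, 3000, 24000))` passes a 1 kHz tone almost unchanged while strongly attenuating a 50 Hz hum.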

When to use which branch

main (v0.1.0 transatlantic) — for transatlantic-cadence spoken delivery (newsreel, radio drama).
v0.2.0-edison-singing (this branch) — for sung output with wax-cylinder acoustic character.
For modern-sounding speech of any kind, use SWivid/F5-TTS directly. Neither of our fine-tunes is the right tool for clean modern speech.

How to pull

huggingface-cli download AutomatedJanitor/vintage-voice \
  --revision v0.2.0-edison-singing \
  --local-dir vintage-voice-edison-singing

Or in Python:

from huggingface_hub import snapshot_download
snapshot_download(
    "AutomatedJanitor/vintage-voice",
    revision="v0.2.0-edison-singing",
    local_dir="vintage-voice-edison-singing",
)

The branch includes 8 generated/reference (gen+ref) sample pairs from updates 3000, 5000, 6000, and 6500 if you want to hear how the cylinder character emerges across training.
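If you would rather locate the samples before downloading everything, something like the following should list the audio files on the branch. It is a sketch: `audio_files` and `list_branch_audio` are hypothetical helpers, the set of audio extensions is a guess because the sample format and folder layout are not stated here, and only `huggingface_hub.list_repo_files` is a real library call.

```python
from pathlib import PurePosixPath

# Assumed extensions; the actual sample format on the branch is not stated above.
AUDIO_SUFFIXES = {".wav", ".flac", ".mp3", ".ogg"}


def audio_files(paths):
    """Keep only paths that look like audio, sorted for stable output."""
    return sorted(
        p for p in paths if PurePosixPath(p).suffix.lower() in AUDIO_SUFFIXES
    )


def list_branch_audio(
    repo_id="AutomatedJanitor/vintage-voice",
    revision="v0.2.0-edison-singing",
):
    """List audio files on the given branch (needs network access)."""
    # Imported here so audio_files() works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return audio_files(list_repo_files(repo_id, revision=revision))


if __name__ == "__main__":
    for path in list_branch_audio():
        print(path)
```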

License is unchanged — CC-BY-NC-4.0 on weights (inherited from F5-TTS base), MIT on the surrounding scripts, public-domain on the training audio (Internet Archive Edison cylinders).

If you try it and it does something interesting (or breaks), I'd love to hear about it — drop a reply here or open another discussion. This is the first community-tab entry, so the bar is very low.

— Scott / Sophia Elya
