ArchiTTS VAE 12.5 Hz

A packaged version of the ArchiTTS 24 kHz continuous VAE, for convenient audio encoding and decoding.

  • Sample rate: 24 kHz
  • Latent frame rate: 12.5 Hz
  • Downsampling ratio: 1920 samples
  • Latent dimension: 64
  • Default encoder output: deterministic posterior mean
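The numbers in this list fit together: 24,000 samples per second divided by a 1,920-sample hop gives 24000 / 1920 = 12.5 latent frames per second, and each frame is a 64-dimensional vector. The sketch below turns that into a small size calculator. It is an illustration, not part of architts_vae12_5hz.py. The helper names are hypothetical, and rounding a partial final frame up (as if the encoder pads the tail of the clip) is an assumption, because the card does not say how clip lengths that are not multiples of 1920 are handled.

```python
import math

# Constants taken from the spec list above.
SAMPLE_RATE = 24_000    # Hz
HOP = 1_920             # audio samples per latent frame (downsampling ratio)
LATENT_DIM = 64         # values per latent frame

FRAME_RATE = SAMPLE_RATE / HOP  # 12.5 latent frames per second


def latent_frames(num_samples: int) -> int:
    """Latent frame count T for a clip of num_samples at 24 kHz.

    Assumption (not stated on the card): a partial final frame is
    padded and counted, hence the ceiling.
    """
    return math.ceil(num_samples / HOP)


def latent_shape(num_samples: int) -> tuple[int, int, int]:
    """Expected (batch, T, dim) latent shape for a single clip."""
    return (1, latent_frames(num_samples), LATENT_DIM)


def compression_ratio() -> float:
    """Audio samples per latent value: 1920 / 64 = 30."""
    return HOP / LATENT_DIM


if __name__ == "__main__":
    ten_seconds = 10 * SAMPLE_RATE    # 240,000 samples
    print(FRAME_RATE)                 # 12.5
    print(latent_shape(ten_seconds))  # (1, 125, 64)
    print(compression_ratio())        # 30.0
```

For example, a 10-second clip (240,000 samples) maps to a (1, 125, 64) latent, which holds 30 times fewer values than the waveform it encodes. For clip lengths that are exact multiples of 1920 samples, the result does not depend on the rounding assumption.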

Files:

  • architts_vae12_5hz.py: single-file loader and CLI
  • architts_vae12_5hz.pt: packaged checkpoint containing model source, config, and state_dict

Install

pip install torch torchaudio einops huggingface_hub

CLI

python architts_vae12_5hz.py info
python architts_vae12_5hz.py encode input.wav latent.pt
python architts_vae12_5hz.py decode latent.pt recon.wav
python architts_vae12_5hz.py reconstruct input.wav recon.wav
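To process many files, the CLI can be driven from a script. The sketch below builds `encode` command lines using only the argument order shown above (input file, then output file) and runs them with `subprocess`. The helper names, the output folder, and the naming scheme (speech.wav becomes speech.pt) are assumptions for illustration. No flags beyond those shown on this card are used, and the script is assumed to sit in the working directory.

```python
import subprocess
import sys
from pathlib import Path

# Loader/CLI script from the Files list; assumed to be in the working directory.
SCRIPT = "architts_vae12_5hz.py"


def encode_commands(wav_paths, out_dir) -> list[list[str]]:
    """Build one `encode <input.wav> <latent.pt>` command per WAV file.

    Hypothetical helper: each latent is written to out_dir and named
    after its source file (speech.wav -> speech.pt).
    """
    out = Path(out_dir)
    cmds = []
    for wav in sorted(Path(p) for p in wav_paths):
        if wav.suffix.lower() != ".wav":
            continue  # skip anything that is not a WAV file
        latent = out / (wav.stem + ".pt")
        cmds.append([sys.executable, SCRIPT, "encode", str(wav), str(latent)])
    return cmds


def run_all(cmds: list[list[str]]) -> None:
    """Run each command in order; check=True raises on a non-zero exit."""
    for cmd in cmds:
        # Make sure the output folder exists before the CLI writes to it.
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

A call such as `run_all(encode_commands(Path("wavs").glob("*.wav"), "latents"))` would encode every WAV file in a `wavs` folder. Building the command lists separately from running them makes it easy to inspect or log them before anything executes.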

Python

from architts_vae12_5hz import ArchiTTSVAE12Hz

vae = ArchiTTSVAE12Hz(device="cuda")
audio = vae.load_audio("input.wav")      # (1, 1, samples), 24 kHz
latent = vae.encode(audio)               # (1, T, 64)
recon = vae.decode(latent)               # (1, 1, samples)
vae.save_audio(recon, "recon.wav")
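The spec list says that encoding returns the deterministic posterior mean by default. In a typical VAE the encoder predicts a mean and a log-variance for each latent dimension, and a sample is drawn as mean + exp(0.5 * logvar) * eps with eps ~ N(0, 1). Returning the mean skips the noise, so encoding the same audio twice yields identical latents. The sketch below shows that distinction in plain Python on one toy latent frame. The diagonal-Gaussian parameterization and the function name `posterior_latent` are assumptions about VAEs in general, not a description of this checkpoint's internals or of `ArchiTTSVAE12Hz.encode`.

```python
import math
import random


def posterior_latent(mean, logvar, sample=False, rng=None):
    """Return one latent frame from an assumed diagonal Gaussian posterior.

    sample=False -> the posterior mean (deterministic).
    sample=True  -> mean + std * eps, eps ~ N(0, 1) (reparameterization).
    """
    if not sample:
        return list(mean)
    rng = rng or random.Random()
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, logvar)]


# Toy 4-dim frame (the real latent has 64 values per frame).
mean = [0.5, -1.0, 0.0, 2.0]
logvar = [-2.0, -2.0, -2.0, -2.0]

a = posterior_latent(mean, logvar)
b = posterior_latent(mean, logvar)
print(a == b)  # True: the mean is reproducible

# Sampling adds noise scaled by exp(0.5 * logvar); a seeded RNG makes it repeatable.
s = posterior_latent(mean, logvar, sample=True, rng=random.Random(0))
```

Deterministic latents are usually the convenient choice when caching latents as training targets, since the same file always maps to the same tensor.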