ArchiTTS VAE 12.5 Hz
A packaged version of the ArchiTTS 24 kHz continuous VAE, for convenient audio encoding and decoding.
- Sample rate: 24 kHz
- Latent frame rate: 12.5 Hz
- Downsampling ratio: 1920 samples per latent frame (24,000 / 12.5)
- Latent dimension: 64
- Default encoder output: deterministic posterior mean
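The numbers above are tied together: 24,000 samples/s divided by 1920 samples per frame gives 12.5 latent frames per second. The sketch below works through that arithmetic. The helper names are hypothetical, and `num_latent_frames` assumes the encoder pads a partial trailing frame up to a whole frame; the packaged loader may trim or pad differently.

```python
import math

SAMPLE_RATE = 24_000  # Hz, from the spec list
HOP = 1920            # samples per latent frame (downsampling ratio)
LATENT_DIM = 64       # values per latent frame

# 24_000 / 1920 = 12.5 latent frames per second
FRAME_RATE = SAMPLE_RATE / HOP


def num_latent_frames(num_samples: int) -> int:
    """Latent frames for a clip. ASSUMPTION: a partial last frame is padded up."""
    return math.ceil(num_samples / HOP)


def latent_floats_per_second() -> float:
    """Scalar latent values per second of audio (frame rate x latent dim)."""
    return FRAME_RATE * LATENT_DIM


if __name__ == "__main__":
    print(FRAME_RATE)                            # 12.5
    print(num_latent_frames(10 * SAMPLE_RATE))   # 125
    print(latent_floats_per_second())            # 800.0
```

In other words, 24,000 audio samples per second become 800 latent values per second, a 30x reduction in the number of scalars.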
Files:
- architts_vae12_5hz.py: single-file loader and CLI
- architts_vae12_5hz.pt: packaged checkpoint containing the model source, config, and state_dict
Install
```shell
pip install torch torchaudio einops huggingface_hub
```
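To confirm that the dependencies are visible to the interpreter that will run the loader, a quick check like the following can help. It is a hypothetical helper, not part of the package; it relies only on the fact that each pip package listed above imports under the same name.

```python
import importlib.util

# Import names for the packages in the pip command above
# (each matches its pip distribution name).
REQUIRED = ["torch", "torchaudio", "einops", "huggingface_hub"]


def missing_packages(names=REQUIRED):
    """Return the top-level modules that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", " ".join(missing))
    else:
        print("All dependencies found.")
```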
CLI
```shell
python architts_vae12_5hz.py info
python architts_vae12_5hz.py encode input.wav latent.pt
python architts_vae12_5hz.py decode latent.pt recon.wav
python architts_vae12_5hz.py reconstruct input.wav recon.wav
```
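Each CLI call handles a single file. To reconstruct a whole folder, a thin wrapper can invoke the documented `reconstruct` subcommand once per `.wav` file. This is a minimal sketch; `reconstruct_commands` and `run_all` are hypothetical names, and it assumes `architts_vae12_5hz.py` sits in the working directory (pass `script=` otherwise).

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = "architts_vae12_5hz.py"  # path to the single-file loader/CLI


def reconstruct_commands(wav_files, out_dir, script=SCRIPT):
    """Build one `reconstruct` argv list per input file, keeping file names."""
    out_dir = Path(out_dir)
    return [
        [sys.executable, script, "reconstruct", str(wav), str(out_dir / Path(wav).name)]
        for wav in wav_files
    ]


def run_all(in_dir, out_dir, script=SCRIPT):
    """Reconstruct every .wav in in_dir into out_dir, stopping on the first failure."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    wavs = sorted(Path(in_dir).glob("*.wav"))
    for cmd in reconstruct_commands(wavs, out, script):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Because every invocation starts a new process and presumably reloads the checkpoint, the Python API is likely the better choice for large batches.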
Python
```python
from architts_vae12_5hz import ArchiTTSVAE12Hz

vae = ArchiTTSVAE12Hz(device="cuda")
audio = vae.load_audio("input.wav")  # (1, 1, samples), 24 kHz
latent = vae.encode(audio)           # (1, T, 64), T latent frames at 12.5 Hz
recon = vae.decode(latent)           # (1, 1, samples)
vae.save_audio(recon, "recon.wav")
```
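Since the latent has shape (1, T, 64) at 12.5 Hz, a time interval in the audio maps to a range of latent frames. The helpers below are a hypothetical sketch of that mapping. They assume latent frame i corresponds to samples [i * 1920, (i + 1) * 1920), which is only approximate for a convolutional encoder and decoder.

```python
# Hypothetical helpers: map time intervals to latent frame indices.
# ASSUMPTION: latent frame i covers audio samples [i * HOP, (i + 1) * HOP).
SAMPLE_RATE = 24_000  # Hz
HOP = 1920            # samples per latent frame (12.5 Hz)


def frame_range(start_s: float, end_s: float) -> tuple[int, int]:
    """Half-open latent frame range [start, end) covering [start_s, end_s) seconds."""
    # Round to whole samples first to keep float error out of floor/ceil.
    start_sample = round(start_s * SAMPLE_RATE)
    end_sample = round(end_s * SAMPLE_RATE)
    start = start_sample // HOP    # floor: frame containing the start
    end = -(-end_sample // HOP)    # ceil: include a partially covered last frame
    return start, end


def frame_to_seconds(index: int) -> float:
    """Start time, in seconds, of latent frame `index`."""
    return index * HOP / SAMPLE_RATE
```

For example, `frame_range(1.0, 2.0)` returns `(12, 25)`, so `vae.decode(latent[:, 12:25])` would decode roughly that second of audio. Decoding a slice on its own drops surrounding context, so some artifacts near the segment edges are possible.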