AIDEN: Acoustic Intonation Decoder + Expressive Neural
Single-speaker RVC v2 model trained on Jaiden (17-year-old male) for the Paion project.
Architecture
- Base: Applio 3.6.2, RVC v2, HiFi-GAN vocoder, ContentVec embedder
- Pretrain: TITAN Medium 32k (blaise-tk/TITAN)
- F0: rmvpe
- Sample rate: 32kHz mono
Training
- Dataset: 537 clips, ~45 min total, 15 emotional categories
- Hardware: RTX PRO 6000 Blackwell (96GB)
- Hyperparameters: batch 16, 300 epochs, save every 25
- Wall time: ~30 min
- Best loss: g_loss 24.6 at epoch 50
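The save schedule above can be sketched in a few lines. This is only an illustration: `checkpoint_epochs` is a hypothetical helper, and it assumes a checkpoint is written at every exact multiple of the save interval, which may not match Applio's behaviour in every detail.

```python
# Hypothetical helper (not part of Applio): list the epochs at which a
# checkpoint would be written, assuming one save per multiple of save_every.
def checkpoint_epochs(total_epochs: int, save_every: int) -> list[int]:
    return list(range(save_every, total_epochs + 1, save_every))


# Values taken from the hyperparameters listed above.
epochs = checkpoint_epochs(total_epochs=300, save_every=25)

print(len(epochs))   # 12 periodic saves
print(50 in epochs)  # True: the best-loss epoch falls on a save point
```

Under that assumption, the epoch with the best g_loss (50) coincides with a periodic save.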
Files
- aiden_300e_19200s.pth: final epoch 300 checkpoint
- aiden_300e_19200s_best_epoch.pth: best loss checkpoint
- aiden.index: FAISS retrieval index for ContentVec features
- config.json: training config
- model_info.json: dataset metadata
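The checkpoint names appear to encode the epoch and step counts. A minimal sketch for reading them back, assuming the pattern `<name>_<epochs>e_<steps>s[_best_epoch].pth` (inferred from the two file names above, not from Applio documentation); `parse_checkpoint_name` is a hypothetical helper:

```python
import re

# Assumed naming pattern, inferred from the two checkpoint names in this card.
_CKPT_RE = re.compile(
    r"(?P<name>.+?)_(?P<epochs>\d+)e_(?P<steps>\d+)s(?P<best>_best_epoch)?\.pth"
)


def parse_checkpoint_name(filename: str) -> dict:
    """Split a checkpoint file name into model name, epochs, steps, best flag."""
    match = _CKPT_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {filename}")
    return {
        "name": match.group("name"),
        "epochs": int(match.group("epochs")),
        "steps": int(match.group("steps")),
        "best": match.group("best") is not None,
    }


final = parse_checkpoint_name("aiden_300e_19200s.pth")
best = parse_checkpoint_name("aiden_300e_19200s_best_epoch.pth")

# 19200 steps over 300 epochs works out to 64 steps per epoch.
print(final["steps"] / final["epochs"])  # 64.0
```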
Inference parameters (calibrated)
- index_rate: 0.4-0.7 (higher = more Jaiden timbre, less source emotion)
- protect: 0.33
- volume_envelope (rms_mix_rate): 0.25
- f0_method: rmvpe
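The calibrated values can be kept together in a small settings object. The sketch below is illustrative only: `InferenceSettings` and `blend_features` are hypothetical names, not Applio APIs, and the linear mix in `blend_features` is an assumption about how index_rate is generally described as working in RVC retrieval (a weight between index-retrieved features and the source features).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceSettings:
    """Hypothetical container for the calibrated values listed above."""
    index_rate: float
    protect: float = 0.33
    rms_mix_rate: float = 0.25  # shown as volume_envelope in the list above
    f0_method: str = "rmvpe"

    def __post_init__(self):
        # Enforce the calibrated range from this card, not a library limit.
        if not 0.4 <= self.index_rate <= 0.7:
            raise ValueError("index_rate outside the calibrated 0.4-0.7 range")


def blend_features(source, retrieved, index_rate):
    """Assumed linear mix: higher index_rate weights the retrieved features more."""
    return [
        index_rate * r + (1.0 - index_rate) * s
        for s, r in zip(source, retrieved)
    ]


settings = InferenceSettings(index_rate=0.5)
mixed = blend_features([0.0, 1.0], [1.0, 0.0], settings.index_rate)
print(mixed)  # [0.5, 0.5]
```

Read this way, moving index_rate toward 0.7 pulls the features toward the trained voice, which is consistent with the "more Jaiden timbre, less source emotion" note above.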
Status
Trained 2026-05-06. The project subsequently pivoted to Kokoro/Chatterbox-only (without a voice transfer step), since voice cloning to Jaiden's specific timbre wasn't a hard requirement. AIDEN is preserved here as a reference and future option.