AIDEN: Acoustic Intonation Decoder + Expressive Neural

Single-speaker RVC v2 model trained on the voice of Jaiden (a 17-year-old male speaker) for the Paion project.

Architecture

  • Base: Applio 3.6.2, RVC v2, HiFi-GAN vocoder, ContentVec embedder
  • Pretrain: TITAN Medium 32k (blaise-tk/TITAN)
  • F0: rmvpe
  • Sample rate: 32 kHz mono

Training

  • Dataset: 537 clips, ~45 min total, 15 emotional categories
  • Hardware: RTX PRO 6000 Blackwell (96 GB)
  • Hyperparameters: batch 16, 300 epochs, save every 25
  • Wall time: ~30 min
  • Best loss: g_loss 24.6 at epoch 50

Files

  • aiden_300e_19200s.pth: final checkpoint (epoch 300)
  • aiden_300e_19200s_best_epoch.pth: best-loss checkpoint
  • aiden.index β€” FAISS retrieval index for ContentVec features
  • config.json β€” training config
  • model_info.json β€” dataset metadata
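
The checkpoint names and the training settings together suggest a simple schedule, sketched below. This assumes that the `19200s` suffix in the filenames denotes the total number of optimizer steps, which the card does not state explicitly; the variable names are illustrative only.

```python
# Values taken from the card.
TOTAL_EPOCHS = 300
SAVE_EVERY = 25
BEST_EPOCH = 50

# Assumption: "19200s" in the checkpoint filename is the global step count.
TOTAL_STEPS = 19_200

# Steps per epoch implied by the filename under that assumption.
steps_per_epoch = TOTAL_STEPS // TOTAL_EPOCHS

# Epochs at which a periodic save would be written with "save every 25".
save_epochs = list(range(SAVE_EVERY, TOTAL_EPOCHS + 1, SAVE_EVERY))

# Approximate global step of the best-loss checkpoint.
best_step = BEST_EPOCH * steps_per_epoch

print(steps_per_epoch)   # 64
print(len(save_epochs))  # 12
print(best_step)         # 3200
```

Under this reading, the best-loss checkpoint at epoch 50 coincides with the second periodic save point.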

Inference parameters (calibrated)

  • index_rate: 0.4-0.7 (higher = more Jaiden timbre, less source emotion)
  • protect: 0.33
  • volume_envelope (rms_mix_rate): 0.25
  • f0_method: rmvpe
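
The calibrated values above can be captured in a small settings object, sketched below. The class and function names are hypothetical and are not part of Applio or RVC. The validation bounds (0 to 1 for `index_rate` and `rms_mix_rate`, 0 to 0.5 for `protect`) and the linear blend in `blend_features` are assumptions based on how RVC-style pipelines commonly mix index-retrieved features with source features; they illustrate why a higher `index_rate` pulls the output toward Jaiden's timbre.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AidenInferenceSettings:
    """Hypothetical container for the calibrated values from this card."""

    index_rate: float = 0.5     # card recommends 0.4-0.7
    protect: float = 0.33
    rms_mix_rate: float = 0.25  # listed as volume_envelope on the card
    f0_method: str = "rmvpe"

    def __post_init__(self) -> None:
        # Assumed valid ranges; adjust if your tooling differs.
        if not 0.0 <= self.index_rate <= 1.0:
            raise ValueError("index_rate must be in [0, 1]")
        if not 0.0 <= self.protect <= 0.5:
            raise ValueError("protect must be in [0, 0.5]")
        if not 0.0 <= self.rms_mix_rate <= 1.0:
            raise ValueError("rms_mix_rate must be in [0, 1]")

    def in_recommended_range(self) -> bool:
        """True when index_rate lies in the 0.4-0.7 band given on the card."""
        return 0.4 <= self.index_rate <= 0.7


def blend_features(source, retrieved, index_rate):
    """Linear mix of source features with index-retrieved features.

    index_rate = 0 keeps the source features unchanged; index_rate = 1
    uses only the features retrieved from the trained speaker's index.
    """
    return [
        index_rate * r + (1.0 - index_rate) * s
        for s, r in zip(source, retrieved)
    ]


settings = AidenInferenceSettings()
mixed = blend_features([0.0, 1.0], [1.0, 0.0], settings.index_rate)
print(settings.in_recommended_range())  # True
print(mixed)                            # [0.5, 0.5]
```

The resulting values would then be passed to whatever inference entry point is in use; the exact argument names depend on the tool.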

Status

Trained 2026-05-06. The project subsequently pivoted to a Kokoro/Chatterbox-only pipeline with no voice-transfer step, because cloning Jaiden's specific timbre was not a hard requirement. AIDEN is preserved here as a reference and a possible future option.
