AIDEN — Acoustic Intonation Decoder + Expressive Neural

Single-speaker RVC v2 model trained on recordings of Jaiden (17-year-old male) for the Paion project.

Architecture

Base: Applio 3.6.2, RVC v2, HiFi-GAN vocoder, ContentVec embedder
Pretrain: TITAN Medium 32k (blaise-tk/TITAN)
F0: rmvpe
Sample rate: 32 kHz mono

Training

Dataset: 537 clips, ~45 min total, 15 emotional categories
Hardware: RTX PRO 6000 Blackwell (96GB)
Hyperparameters: batch 16, 300 epochs, save every 25
Wall time: ~30 min
Best loss: g_loss 24.6 at epoch 50
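
As a sanity check on the figures above, the short sketch below derives a few per-epoch numbers. It assumes that the "19200s" suffix in the checkpoint filename listed under Files is the total optimizer step count; the card does not state this, so treat it as an assumption. The constant names are illustrative only.

```python
# Figures copied from this card. TOTAL_STEPS is read from the checkpoint
# filename suffix "19200s" (ASSUMPTION: the suffix is the global step count).
EPOCHS = 300
TOTAL_STEPS = 19200
BATCH_SIZE = 16
SAVE_EVERY = 25
WALL_TIME_MIN = 30  # approximate, per the card

# Optimizer steps taken in each epoch.
steps_per_epoch = TOTAL_STEPS // EPOCHS

# Upper bound on training segments seen per epoch (the last batch may be partial).
max_segments_per_epoch = steps_per_epoch * BATCH_SIZE

# Number of periodic checkpoints written over the whole run.
periodic_saves = EPOCHS // SAVE_EVERY

# Rough average time per optimizer step, in seconds.
seconds_per_step = WALL_TIME_MIN * 60 / TOTAL_STEPS

print(f"steps per epoch:        {steps_per_epoch}")
print(f"max segments per epoch: {max_segments_per_epoch}")
print(f"periodic checkpoints:   {periodic_saves}")
print(f"seconds per step:       {seconds_per_step:.3f}")
```

Under that assumption, the implied number of training segments per epoch is larger than the 537 clips in the dataset, which would be consistent with clips being split into shorter segments during preprocessing. That is an inference from the numbers, not something recorded in this card.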

Files

aiden_300e_19200s.pth — final epoch 300 checkpoint
aiden_300e_19200s_best_epoch.pth — best loss checkpoint
aiden.index — FAISS retrieval index for ContentVec features
config.json — training config
model_info.json — dataset metadata

Inference parameters (calibrated)

index_rate: 0.4-0.7 (higher = more Jaiden timbre, less source emotion)
protect: 0.33
volume_envelope (rms_mix_rate): 0.25
f0_method: rmvpe
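
One way to keep these calibrated values together is a small settings object that rejects an index_rate outside the calibrated range. This is only a sketch: the class name AidenInferenceParams and the 0.55 default (the midpoint of 0.4-0.7) are hypothetical, and the object is not tied to any particular Applio or RVC function signature.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AidenInferenceParams:
    """Hypothetical container for the calibrated values listed in this card."""

    index_rate: float = 0.55    # ASSUMPTION: midpoint of the calibrated 0.4-0.7 range
    protect: float = 0.33       # from the card
    rms_mix_rate: float = 0.25  # "volume_envelope" in the card
    f0_method: str = "rmvpe"    # from the card

    def __post_init__(self) -> None:
        # Reject values outside the range this model was calibrated for.
        if not 0.4 <= self.index_rate <= 0.7:
            raise ValueError(
                f"index_rate {self.index_rate} is outside the calibrated range 0.4-0.7"
            )


# Lean toward Jaiden's timbre at the cost of some source emotion.
timbre_heavy = AidenInferenceParams(index_rate=0.7)
print(asdict(timbre_heavy))
```

The resulting dictionary can then be mapped by hand onto whichever inference entry point is in use, since parameter names differ between tools.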

Status

Trained 2026-05-06. The project subsequently pivoted to a Kokoro/Chatterbox-only pipeline with no voice-transfer step, because cloning Jaiden's specific timbre was not a hard requirement. AIDEN is preserved here as a reference and a possible future option.
