This directory contains the local model files used by viitor-ai/viitor-voice-nar.
ViiTorVoice-NAR is a non-autoregressive speech generation model for voice cloning, local speech editing, and emotion / paralinguistic speech control. The files in this directory are split by function so each model component can be loaded independently.
local_models/
├── aligner/
│   └── Qwen3-ForcedAligner-0.6B/
├── assets/
│   └── dualcodec_silence_2s.pt
├── dualcodec/
│   ├── dualcodec_ckpts/
│   └── w2v-bert-2.0/
└── llm/
    └── 0p6_emotion/
| Component | Path | Purpose |
|---|---|---|
| ViiTorVoice-NAR LLM | llm/0p6_emotion/ | Generates target speech tokens from text, prompt speech tokens, edit masks, duration conditions, and emotion or non-verbal tags. |
| DualCodec | dualcodec/dualcodec_ckpts/ | Converts waveform audio into discrete speech codebook tokens and decodes generated tokens back into waveform audio. |
| W2V-BERT 2.0 | dualcodec/w2v-bert-2.0/ | Extracts semantic speech features used by the DualCodec encoder. |
| Qwen3 Forced Aligner | aligner/Qwen3-ForcedAligner-0.6B/ | Aligns speech audio with text and provides timestamps for local speech editing. |
| Runtime Assets | assets/ | Stores small auxiliary files, such as precomputed silence tokens used during generation or padding. |
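
Because each component lives in its own subdirectory, it can be useful to confirm the layout before loading anything. The following is a minimal sketch, not part of the ViiTorVoice-NAR codebase: the relative paths come from the table above, while the dictionary keys and the helper names `resolve_components` and `missing_components` are hypothetical and chosen only for illustration. It uses nothing beyond the Python standard library and does not load any model.

```python
from pathlib import Path

# Relative paths copied from the component table. The dictionary keys are
# hypothetical labels chosen for this example, not names used by the project.
COMPONENT_PATHS = {
    "llm": "llm/0p6_emotion",
    "dualcodec": "dualcodec/dualcodec_ckpts",
    "w2v_bert": "dualcodec/w2v-bert-2.0",
    "aligner": "aligner/Qwen3-ForcedAligner-0.6B",
    "assets": "assets",
}


def resolve_components(root):
    """Return a mapping from component label to its resolved path under root."""
    root = Path(root)
    return {name: (root / rel).resolve() for name, rel in COMPONENT_PATHS.items()}


def missing_components(root):
    """List the labels of components whose directory is absent under root."""
    return [
        name
        for name, path in resolve_components(root).items()
        if not path.is_dir()
    ]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains local_models/.
    missing = missing_components("local_models")
    print("missing components:", missing or "none")
```

A check like this only verifies that the directories exist; it says nothing about whether the checkpoints inside them are complete or compatible.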