ViiTorVoice-NAR Local Models

This directory contains the local model files used by viitor-ai/viitor-voice-nar.

ViiTorVoice-NAR is a non-autoregressive speech generation model for voice cloning, local speech editing, and emotion / paralinguistic speech control. The files in this directory are split by function so each model component can be loaded independently.

Model Components

| Component | Path | Purpose |
| --- | --- | --- |
| ViiTorVoice-NAR LLM | `llm/0p6_emotion/` | Generates target speech tokens from text, prompt speech tokens, edit masks, duration conditions, and emotion or non-verbal tags. |
| DualCodec | `dualcodec/dualcodec_ckpts/` | Converts waveform audio into discrete speech codebook tokens and decodes generated tokens back into waveform audio. |
| W2V-BERT 2.0 | `dualcodec/w2v-bert-2.0/` | Extracts semantic speech features used by the DualCodec encoder. |
| Qwen3 Forced Aligner | `aligner/Qwen3-ForcedAligner-0.6B/` | Aligns speech audio with text and provides timestamps for local speech editing. |
| Runtime Assets | `assets/` | Stores small auxiliary files, such as precomputed silence tokens used during generation or padding. |
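
Because the loading code depends on this layout, it can help to check that every component directory is present before loading anything. The following is a minimal sketch, not part of the upstream project: the relative paths are copied from the table, while the dictionary keys and the helper names `resolve_components` and `missing_components` are hypothetical.

```python
from pathlib import Path

# Relative paths copied from the component table. The keys are illustrative
# names only, not identifiers used by the upstream loading code.
COMPONENT_PATHS = {
    "llm": "llm/0p6_emotion",
    "dualcodec": "dualcodec/dualcodec_ckpts",
    "w2v_bert": "dualcodec/w2v-bert-2.0",
    "aligner": "aligner/Qwen3-ForcedAligner-0.6B",
    "assets": "assets",
}


def resolve_components(root):
    """Return {name: absolute Path} for every component under root."""
    root = Path(root)
    return {name: (root / rel).resolve() for name, rel in COMPONENT_PATHS.items()}


def missing_components(root):
    """List the component names whose directory does not exist under root."""
    return [
        name
        for name, path in resolve_components(root).items()
        if not path.is_dir()
    ]
```

Calling `missing_components(".")` from this directory should return an empty list when the layout is intact. The check covers only directory presence and does not validate the weight files inside.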

Main Uses

- Voice cloning: synthesize new speech from target text while preserving the speaker characteristics of prompt audio.
- Local speech editing: replace only the changed region of an utterance while keeping the rest of the audio stable.
- Emotion and paralinguistic control: condition generation with tags such as emotion labels or non-verbal vocal events.
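
To illustrate how aligner timestamps could translate into an edit region for local speech editing, here is a small sketch. It is an assumption-laden illustration and not the project's actual mask format: the function name `edit_mask`, the boolean-list representation, and the `frame_rate_hz` value are all hypothetical, and the real token frame rate should be taken from the codec configuration.

```python
import math


def edit_mask(word_spans, changed_words, n_frames, frame_rate_hz):
    """Mark the token frames that overlap any changed word.

    word_spans: list of (start_sec, end_sec) per word, e.g. from a forced aligner.
    changed_words: indices into word_spans for the words being replaced.
    Returns a list of n_frames booleans, True where audio would be regenerated.
    """
    mask = [False] * n_frames
    for idx in changed_words:
        start_sec, end_sec = word_spans[idx]
        # Round outward so the mask fully covers the changed word,
        # then clamp to the valid frame range.
        first = max(0, math.floor(start_sec * frame_rate_hz))
        last = min(n_frames, math.ceil(end_sec * frame_rate_hz))
        for frame in range(first, last):
            mask[frame] = True
    return mask


# Three words of 0.5 s each; only the middle word is being replaced.
spans = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5)]
mask = edit_mask(spans, changed_words=[1], n_frames=15, frame_rate_hz=10)
```

Frames outside the mask correspond to the audio that stays untouched, which is what keeps the rest of the utterance stable.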

Notes

- Keep the directory structure unchanged unless the loading code is updated as well.
- Model weights are large binary files and are usually stored outside normal git tracking.
- Check the upstream project and each submodel for license and usage terms.
