ViiTorVoice-NAR Local Models

This directory contains the local model files used by viitor-ai/viitor-voice-nar.

ViiTorVoice-NAR is a non-autoregressive speech generation model for voice cloning, local speech editing, and emotion / paralinguistic speech control. The files in this directory are split by function so each model component can be loaded independently.

Directory

local_models/
├── aligner/
│   └── Qwen3-ForcedAligner-0.6B/
├── assets/
│   └── dualcodec_silence_2s.pt
├── dualcodec/
│   ├── dualcodec_ckpts/
│   └── w2v-bert-2.0/
└── llm/
    └── 0p6_emotion/
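
A quick way to confirm that a download is complete is to check for the expected entries before loading anything. The sketch below is not part of the upstream project: the function name `missing_paths` and the root location `local_models` are assumptions made for illustration, and only the relative paths are taken from the tree above.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = [
    "aligner/Qwen3-ForcedAligner-0.6B",
    "assets/dualcodec_silence_2s.pt",
    "dualcodec/dualcodec_ckpts",
    "dualcodec/w2v-bert-2.0",
    "llm/0p6_emotion",
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # "local_models" is an assumed location; point it at your own copy.
    missing = missing_paths("local_models")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

An empty result only shows that the top-level entries exist; it does not verify the contents of each model directory.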

Model Components

| Component | Path | Purpose |
| --- | --- | --- |
| ViiTorVoice-NAR LLM | llm/0p6_emotion/ | Generates target speech tokens from text, prompt speech tokens, edit masks, duration conditions, and emotion or non-verbal tags. |
| DualCodec | dualcodec/dualcodec_ckpts/ | Converts waveform audio into discrete speech codebook tokens and decodes generated tokens back into waveform audio. |
| W2V-BERT 2.0 | dualcodec/w2v-bert-2.0/ | Extracts semantic speech features used by the DualCodec encoder. |
| Qwen3 Forced Aligner | aligner/Qwen3-ForcedAligner-0.6B/ | Aligns speech audio with text and provides timestamps for local speech editing. |
| Runtime Assets | assets/ | Stores small auxiliary files, such as precomputed silence tokens used during generation or padding. |
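
Because each component lives in its own subdirectory, loading code can resolve one path at a time. The following is a minimal sketch of that idea. `COMPONENT_DIRS`, its short keys, and `component_path` are hypothetical names rather than the project's API, and the actual loaders in viitor-ai/viitor-voice-nar may be organized differently.

```python
from pathlib import Path

# Hypothetical short keys mapped to the component paths listed above.
COMPONENT_DIRS = {
    "llm": "llm/0p6_emotion",
    "dualcodec": "dualcodec/dualcodec_ckpts",
    "w2v_bert": "dualcodec/w2v-bert-2.0",
    "aligner": "aligner/Qwen3-ForcedAligner-0.6B",
    "assets": "assets",
}


def component_path(root, name):
    """Resolve one component directory under root, failing early if absent."""
    if name not in COMPONENT_DIRS:
        raise KeyError(f"unknown component: {name!r}")
    path = Path(root) / COMPONENT_DIRS[name]
    if not path.is_dir():
        raise FileNotFoundError(f"{name} not found at {path}")
    return path
```

For example, `component_path("local_models", "aligner")` would return the aligner directory without touching the other components, assuming the files sit under a root named `local_models`.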

Main Uses

  • Voice cloning: synthesize new speech from target text while preserving the speaker characteristics of prompt audio.
  • Local speech editing: replace only the changed region of an utterance while keeping the rest of the audio stable.
  • Emotion and paralinguistic control: condition generation with tags such as emotion labels or non-verbal vocal events.
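
For local speech editing, the aligner's timestamps have to be mapped onto the codec's token frames so that only the changed region is regenerated. The sketch below illustrates that mapping under stated assumptions: the `frames_per_second` value, the `(start, end)` span in seconds, and the function name `edit_mask` are all hypothetical, and the project's own masking logic may differ.

```python
import math


def edit_mask(num_frames, span_seconds, frames_per_second):
    """Return a list of booleans, True for frames inside the edited span.

    span_seconds is a (start, end) pair in seconds, e.g. from a forced aligner.
    frames_per_second is the codec token rate (an assumed parameter here).
    """
    start_s, end_s = span_seconds
    if start_s < 0 or end_s < start_s:
        raise ValueError("span must satisfy 0 <= start <= end")
    # Round outward so the mask fully covers the edited region.
    first = min(num_frames, math.floor(start_s * frames_per_second))
    last = min(num_frames, math.ceil(end_s * frames_per_second))
    return [first <= i < last for i in range(num_frames)]


# Example: a 100-frame clip at an assumed 25 frames/s, editing 1.0 s to 1.5 s.
mask = edit_mask(num_frames=100, span_seconds=(1.0, 1.5), frames_per_second=25)
print(sum(mask))  # number of frames marked for regeneration
```

In this sketch, frames marked `True` would be regenerated while the remaining frames are kept as context, which is what keeps the rest of the audio stable.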

Notes

  • Keep the directory structure unchanged unless the loading code is updated as well.
  • Model weights are large binary files and are usually stored outside normal git tracking.
  • Check the upstream project and each submodel for license and usage terms.