rhy-TTS-v1

rhy-TTS-v1 is a PromptTTS release for Vocence (Bittensor SN78; see the subnet repo): it takes a natural-language voice instruction plus the text to speak and returns mono speech. The weights and runtime code live in this repo for deployment on Chutes.

Hub: KGSS/rhy-TTS-v1. See VOCENCE_HF.md for how to pin the revision at deploy time and in the chain commit.
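VOCENCE_HF.md is not reproduced here. As a general sketch, pinning usually means referencing an immutable full 40-character commit SHA rather than a moving branch name such as `main`. The helper names `is_pinned_revision` and `download_pinned` are hypothetical; `snapshot_download(repo_id=..., revision=...)` is from the `huggingface_hub` library and is imported lazily so the check itself runs without it.

```python
import re

# A full git commit SHA is 40 hex characters (lowercase, as shown on the Hub).
_SHA_RE = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision: str) -> bool:
    """Hypothetical helper: True only for a full commit SHA, never a branch or tag."""
    return bool(_SHA_RE.fullmatch(revision))


def download_pinned(revision: str, repo_id: str = "KGSS/rhy-TTS-v1") -> str:
    """Hypothetical helper: fetch an exact snapshot, refusing unpinned revisions."""
    if not is_pinned_revision(revision):
        raise ValueError(f"not a full commit SHA: {revision!r}")
    # Lazy import so the validation above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, revision=revision)
```

Rejecting branch names up front means a deploy cannot silently pick up a newer upload than the one committed on chain.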

What it does

Input: instruction (how to sound: gender, accent, emotion, pace, age, tone, etc.) and text (exact words to speak).
Output: a mono float32 waveform and its sample rate (typically 24 kHz), returned by the Miner class in miner.py.
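Before returning audio, a miner might sanity-check it against this contract. A minimal sketch, assuming the waveform is a 1-D NumPy array; `check_output` is a hypothetical helper, not part of miner.py.

```python
import numpy as np


def check_output(wav: np.ndarray, sr: int) -> float:
    """Hypothetical check: validate mono float32 audio and return its duration in seconds."""
    if wav.ndim != 1:
        raise ValueError(f"expected mono 1-D audio, got shape {wav.shape}")
    if wav.dtype != np.float32:
        raise TypeError(f"expected float32 samples, got {wav.dtype}")
    if sr <= 0:
        raise ValueError(f"invalid sample rate: {sr}")
    if not np.all(np.isfinite(wav)):
        raise ValueError("waveform contains NaN or Inf")
    return wav.shape[0] / sr
```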

Validators score script accuracy, naturalness, and trait alignment (gender, speed, emotion, age, pitch, accent, tone), so following the instruction closely matters.

Repo layout (Vocence miner)

| File | Role |
|---|---|
| `miner.py` | `Miner(path_hf_repo)` → `warmup()` → `generate_wav(instruction, text)` |
| `chute_config.yml` | Chutes image, GPU class, pip stack |
| `vocence_config.yaml` | Optional limits, flash-attn toggle, default language |
| `model.safetensors` | Main acoustic LM weights |
| `speech_tokenizer/` | Tokenizer weights + configs |
| Tokenizer text files | `vocab.json`, `merges.txt`, `tokenizer_config.json`, etc. |
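The call sequence in the table can be written down as a structural type. This is a sketch: the method names come from the table, but the return type (a NumPy float32 array plus an integer sample rate) is an assumption based on the output description. `MinerLike`, `SilentMiner`, and `run` are hypothetical; the stub enforces warmup-before-generate only as an illustrative assumption, and it lets you test a harness without a GPU.

```python
from pathlib import Path
from typing import Protocol, Tuple

import numpy as np


class MinerLike(Protocol):
    """Assumed shape of the Miner interface: construct, warm up, generate."""

    def warmup(self) -> None: ...

    def generate_wav(self, instruction: str, text: str) -> Tuple[np.ndarray, int]: ...


class SilentMiner:
    """Hypothetical stand-in that returns 60 ms of silence per input character."""

    def __init__(self, path_hf_repo: Path, sample_rate: int = 24_000) -> None:
        self.path_hf_repo = path_hf_repo
        self.sample_rate = sample_rate
        self.ready = False

    def warmup(self) -> None:
        # The real Miner would load weights here; the stub just flips a flag.
        self.ready = True

    def generate_wav(self, instruction: str, text: str) -> Tuple[np.ndarray, int]:
        if not self.ready:
            raise RuntimeError("call warmup() before generate_wav()")
        # Integer arithmetic avoids float rounding in the sample count.
        n = len(text) * self.sample_rate * 60 // 1000
        return np.zeros(n, dtype=np.float32), self.sample_rate


def run(miner: MinerLike, instruction: str, text: str) -> Tuple[np.ndarray, int]:
    """Drive any Miner-like object through the documented call order."""
    miner.warmup()
    return miner.generate_wav(instruction, text)
```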

Model

Family: 12 Hz acoustic tokenizer with a ~1.7B-parameter instruction-conditioned TTS model (a discrete multi-codebook LM); English-first, with multi-language support in the stack.
Runtime: the qwen-tts package (Qwen3TTSModel, generate_custom_voice), using the model's built-in speaker timbres, selected from your instruction and the detected language.
Base lineage: built on the open Qwen3-TTS research line and tokenizer; this HF revision is the rhy-TTS-v1 distribution (including the Vocence packaging and safetensors metadata), not an official Qwen model card.
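For a sense of scale, the nominal 12 Hz frame rate combined with the typical 24 kHz output rate gives the back-of-envelope arithmetic below. It assumes the 12 Hz figure is exact (the tokenizer's true frame rate may differ slightly) and leaves the number of codebooks, which this card does not state, as a symbol $K$; $T$ is the utterance length in seconds.

$$
\begin{aligned}
\text{samples per frame} &= \frac{24\,000\ \text{Hz}}{12\ \text{Hz}} = 2000 \\
N_{\text{frames}}(T) &= 12\,T \quad\Rightarrow\quad N_{\text{frames}}(10\ \text{s}) = 120 \\
N_{\text{tokens}}(T) &= K \cdot N_{\text{frames}}(T) = 12\,K\,T
\end{aligned}
$$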

Languages & voices (summary)

Language is inferred from the text and the instruction (CJK script detection plus keywords), falling back to a configurable default (see vocence_config.yaml).
Timbre is chosen from the model's fixed speaker set (e.g. English-capable voices, or native Chinese, Japanese, and Korean voices as appropriate). Your instruction drives the style: the engine maps it to the closest speaker and passes the full instruction through as instruct.
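The exact detection rules live in miner.py and are not reproduced here. Below is a minimal sketch of script-plus-keyword language inference, assuming Unicode block checks for kana, Hangul, and Han characters and a hypothetical keyword table. The language labels and the `default` parameter (standing in for the vocence_config.yaml setting) are illustrative.

```python
import re
from typing import Optional

# Unicode blocks: Hiragana + Katakana, Hangul syllables, CJK Unified Ideographs.
_KANA = re.compile(r"[\u3040-\u30ff]")
_HANGUL = re.compile(r"[\uac00-\ud7af]")
_HAN = re.compile(r"[\u4e00-\u9fff]")

# Hypothetical keywords: only explicit "in <language>" requests count, so an
# instruction like "a Japanese accent" does not switch the spoken language.
_KEYWORDS = {
    "Chinese": re.compile(r"\bin (chinese|mandarin)\b", re.IGNORECASE),
    "Japanese": re.compile(r"\bin japanese\b", re.IGNORECASE),
    "Korean": re.compile(r"\bin korean\b", re.IGNORECASE),
}


def script_language(s: str) -> Optional[str]:
    # Check kana first: Japanese text mixes kana with Han characters (kanji).
    if _KANA.search(s):
        return "Japanese"
    if _HANGUL.search(s):
        return "Korean"
    if _HAN.search(s):
        return "Chinese"
    return None


def infer_language(instruction: str, text: str, default: str = "English") -> str:
    """Script in the text wins, then script or keywords in the instruction, then default."""
    detected = script_language(text) or script_language(instruction)
    if detected:
        return detected
    for name, pattern in _KEYWORDS.items():
        if pattern.search(instruction):
            return name
    return default
```

Letting the text's script win keeps the spoken language tied to the words actually being read; accent requests stay in the instruction, where the model handles them as style.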

For exact speaker names and upstream API details, see the qwen-tts docs for generate_custom_voice and get_supported_speakers().
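A sketch of the speaker mapping described above, under loud assumptions: the speaker IDs and attributes are placeholders (query `get_supported_speakers()` for the real set), the gender keywords are a hypothetical heuristic, and the dict returned by the hypothetical `build_request` only mirrors the idea of passing the untouched instruction through as `instruct`; its key names are not a confirmed call signature.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Speaker:
    name: str      # placeholder ID, not a real qwen-tts speaker name
    language: str  # language the timbre is native to
    gender: str    # "female" or "male"


# Hypothetical speaker table; the real set comes from get_supported_speakers().
SPEAKERS = [
    Speaker("en_female_1", "English", "female"),
    Speaker("en_male_1", "English", "male"),
    Speaker("zh_female_1", "Chinese", "female"),
    Speaker("ja_female_1", "Japanese", "female"),
    Speaker("ko_female_1", "Korean", "female"),
]


def guess_gender(instruction: str) -> Optional[str]:
    """Whole-word keyword match, so "female" is never counted as "male"."""
    words = set(re.findall(r"[a-z]+", instruction.lower()))
    if words & {"female", "woman", "girl", "she", "her"}:
        return "female"
    if words & {"male", "man", "boy", "he", "his"}:
        return "male"
    return None


def pick_speaker(instruction: str, language: str) -> Speaker:
    """Prefer a native speaker of `language`, else fall back to English voices."""
    pool = [s for s in SPEAKERS if s.language == language]
    if not pool:
        pool = [s for s in SPEAKERS if s.language == "English"]
    gender = guess_gender(instruction)
    for s in pool:
        if gender is None or s.gender == gender:
            return s
    return pool[0]  # no gender match: first timbre in the language pool


def build_request(instruction: str, text: str, language: str) -> dict:
    """Assemble generation arguments; the key names are assumptions."""
    speaker = pick_speaker(instruction, language)
    # The speaker only fixes the base timbre; the full instruction still drives style.
    return {"text": text, "language": language, "speaker": speaker.name, "instruct": instruction}
```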

Local quick check

```bash
pip install qwen-tts torch torchaudio  # plus deps from chute_config.yml
python -c "
from pathlib import Path
from miner import Miner
m = Miner(Path('.'))
m.warmup()
wav, sr = m.generate_wav('A calm female voice with a British accent.', 'Hello from rhy-TTS-v1.')
print(wav.shape, sr)
"
```
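To listen to the result, the float32 waveform can be encoded as 16-bit PCM WAV using only NumPy and the standard-library `wave` module. A sketch, assuming `wav` is a 1-D array with samples in [-1, 1] (values outside are clipped); `to_wav_bytes` is a hypothetical helper name. If `generate_wav` returns a torch tensor rather than a NumPy array, convert it first with `.cpu().numpy()`.

```python
import io
import wave

import numpy as np


def to_wav_bytes(wav: np.ndarray, sr: int) -> bytes:
    """Encode mono float audio in [-1, 1] as an in-memory 16-bit PCM WAV file."""
    # Clip to [-1, 1] and scale to little-endian int16, the WAV sample layout.
    pcm = (np.clip(np.asarray(wav, dtype=np.float32), -1.0, 1.0) * 32767.0).astype("<i2")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        f.setframerate(sr)
        f.writeframes(pcm.tobytes())
    return buf.getvalue()
```

Usage after the quick check: `Path("out.wav").write_bytes(to_wav_bytes(wav, sr))`.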

License

Apache-2.0 (see repository headers). Third-party components (e.g. tokenizer and architecture lineage) remain under their respective licenses; this card describes the rhy-TTS-v1 distribution only.

Safetensors metadata: model size 2B params, tensor type BF16.