rhy-TTS-v1
rhy-TTS-v1 is a PromptTTS release for Vocence (subnet repo; Bittensor SN78): natural-language voice instructions plus text in, mono speech out. Weights and runtime live in this repo for Chutes deployment.
Hub: KGSS/rhy-TTS-v1 — see VOCENCE_HF.md for revision pinning on deploy / chain commit.
What it does
- Input: instruction (how to sound: gender, accent, emotion, pace, age, tone, etc.) and text (exact words to speak).
- Output: mono float32 waveform and sample rate (typically 24 kHz), via the Miner class in miner.py.
Validators score script accuracy, naturalness, and trait alignment (gender, speed, emotion, age, pitch, accent, tone), so instruction adherence matters.
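To listen to a result, the mono float32 waveform and sample rate described in this section can be written to a WAV file. The sketch below is a minimal example, not part of the repo: save_wav is a hypothetical helper, and it assumes the waveform is a NumPy-compatible array with values in [-1, 1].

```python
import wave

import numpy as np


def save_wav(path, wav, sample_rate):
    """Write a mono float waveform to a 16-bit PCM WAV file.

    Hypothetical helper: assumes `wav` converts cleanly via np.asarray
    and that its values lie in [-1, 1] (out-of-range values are clipped).
    """
    samples = np.asarray(wav, dtype=np.float32).squeeze()
    if samples.ndim != 1:
        raise ValueError(f"expected mono audio, got shape {samples.shape}")
    # Scale to signed 16-bit little-endian integers, the usual WAV PCM format.
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)              # mono
        f.setsampwidth(2)              # 2 bytes = 16 bits per sample
        f.setframerate(int(sample_rate))
        f.writeframes(pcm.tobytes())
```

With a Miner instance m, this would be called as wav, sr = m.generate_wav(instruction, text) followed by save_wav('out.wav', wav, sr). If the waveform comes back as a torch tensor rather than an array, moving it to CPU first is likely needed.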
Repo layout (Vocence miner)
| File | Role |
|---|---|
| miner.py | Miner(path_hf_repo) → warmup() → generate_wav(instruction, text) |
| chute_config.yml | Chutes image, GPU class, pip stack |
| vocence_config.yaml | Optional limits, flash-attn toggle, default language |
| model.safetensors | Main acoustic LM weights |
| speech_tokenizer/ | Tokenizer weights + configs |
| Tokenizer text files | vocab.json, merges.txt, tokenizer_config.json, etc. |
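Before deploying, it can help to confirm that a local checkout contains the entries listed in the table. The sketch below is a hypothetical check, not something shipped in the repo: the names come from the table, and missing_entries is an illustrative helper. The table's "etc." means further tokenizer files may exist that this list does not cover.

```python
from pathlib import Path

# Names taken from the repo layout table; the check itself is illustrative.
EXPECTED_FILES = [
    "miner.py",
    "chute_config.yml",
    "vocence_config.yaml",
    "model.safetensors",
    "vocab.json",
    "merges.txt",
    "tokenizer_config.json",
]
EXPECTED_DIRS = ["speech_tokenizer"]


def missing_entries(repo_dir):
    """Return the expected files and directories absent from repo_dir."""
    root = Path(repo_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling missing_entries('.') from the repo root returns an empty list when everything in the table is present.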
Model
- Family: 12 Hz acoustic tokenizer, ~1.7B parameter instruction-conditioned TTS (discrete multi-codebook LM), English-forward with multi-language support in the stack.
- Runtime: qwen-tts (Qwen3TTSModel), generate_custom_voice, with built-in ensemble timbres (mapped from your instruction + language).
- Base lineage: Built on the open Qwen3-TTS research line and tokenizer; this HF revision is the rhy-TTS-v1 distribution (including Vocence packaging and safetensors metadata), not an official Qwen model card.
Languages & voices (summary)
- Language is inferred from text and instruction (CJK scripts and keywords), with a configurable default (see vocence_config.yaml).
- Timbre is chosen from the model’s fixed speaker set (e.g. English-capable voices and Chinese/Japanese/Korean natives as appropriate). Your instruction drives style; the engine maps to the closest speaker and passes the full instruction through as instruct.
For exact speaker names and upstream API details, see the qwen-tts docs for generate_custom_voice and get_supported_speakers().
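The script-based part of the language inference described above can be pictured with a small sketch. This is not the logic in miner.py: guess_language, its label strings, and its default are assumptions made for illustration, and the keyword matching on the instruction is left out entirely.

```python
def guess_language(text, default="english"):
    """Guess a language label from the Unicode scripts present in text.

    Illustrative only: label strings and the default are hypothetical.
    """
    has_han = False
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3 or 0x1100 <= code <= 0x11FF:
            return "korean"    # Hangul syllables or jamo
        if 0x3040 <= code <= 0x30FF:
            return "japanese"  # Hiragana or Katakana
        if 0x4E00 <= code <= 0x9FFF:
            has_han = True     # CJK ideographs, shared by Chinese and Japanese
    # Ideographs with no kana are treated as Chinese, so kanji-only
    # Japanese text would be misclassified by this simple rule.
    return "chinese" if has_han else default
```

A rule like this falls back to the default for any text without CJK characters, which is why a configurable default matters for Latin-script languages other than English.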
Local quick check
```bash
pip install qwen-tts torch torchaudio  # plus deps from chute_config.yml
python -c "
from pathlib import Path
from miner import Miner
m = Miner(Path('.'))
m.warmup()
wav, sr = m.generate_wav('A calm female voice with a British accent.', 'Hello from rhy-TTS-v1.')
print(wav.shape, sr)
"
```
License
Apache-2.0 (see repository headers). Third-party components (e.g. tokenizer and architecture lineage) remain under their respective licenses; this card describes the rhy-TTS-v1 distribution only.