Qwen3-TTS - Custom Voice Clone

A fine-tuned version of Qwen3-TTS-12Hz-1.7B-Base trained on a custom speaker dataset using supervised fine-tuning (SFT).

Requirements

pip install qwen-tts torch torchaudio soundfile
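
Before loading the model, it can help to confirm that the packages used in the Usage section are importable. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the module names are assumed from the install command and the imports in the Usage section.

```python
from importlib.util import find_spec

# Import names assumed from the install command and the Usage imports.
REQUIRED_MODULES = ["qwen_tts", "torch", "torchaudio", "soundfile"]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

An empty list means every module was found. It does not verify versions or GPU support.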

Usage

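import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

# Load the fine-tuned checkpoint onto the first GPU in bfloat16.
tts = Qwen3TTSModel.from_pretrained(
    "thunk6/qwen3-tts-custom-voice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)

# Synthesize speech with the custom speaker.
wavs, sr = tts.generate_custom_voice(
    text="Hello, this is my cloned voice.",
    speaker="my_custom_voice",
)

# Write the first generated waveform to disk at the returned sample rate.
sf.write("output.wav", wavs[0], sr)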
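
For longer passages, one possible approach is to split the text into sentence-sized chunks and synthesize them one at a time. The sketch below shows only the splitting step, using the standard library. The helper name `split_into_chunks` and the `max_chars=200` default are hypothetical, and whether chunking improves output quality for this model is an untested assumption.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed as `text` to `tts.generate_custom_voice(...)` with the same `speaker`, with the resulting waveforms written to separate files or concatenated.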

Training Details

| Setting            | Value                    |
|--------------------|--------------------------|
| Base model         | Qwen3-TTS-12Hz-1.7B-Base |
| Fine-tuning method | SFT                      |
| Learning rate      | 2e-6                     |
| Batch size         | 4                        |
| Epochs             | 10                       |
| Hardware           | A100 80GB                |
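
To make the settings above concrete, the sketch below records them in a small dataclass and estimates the number of optimizer steps. The dataset size is not stated in this card, so `num_samples=1000` is a purely hypothetical value. The calculation also assumes a single GPU with no gradient accumulation. The class name `TrainingConfig` is illustrative and not part of any training script.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the Training Details table.
    base_model: str = "Qwen3-TTS-12Hz-1.7B-Base"
    method: str = "SFT"
    learning_rate: float = 2e-6
    batch_size: int = 4
    epochs: int = 10

    def total_steps(self, num_samples: int) -> int:
        """Optimizer steps over all epochs, counting a partial final batch."""
        steps_per_epoch = math.ceil(num_samples / self.batch_size)
        return steps_per_epoch * self.epochs


config = TrainingConfig()
# Hypothetical dataset of 1000 samples: 250 steps per epoch over 10 epochs.
print(config.total_steps(num_samples=1000))  # prints 2500
```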