Qwen3-TTS — Custom Voice Clone

A fine-tuned version of Qwen3-TTS-12Hz-1.7B-Base trained on a custom speaker dataset using supervised fine-tuning (SFT).

Requirements

pip install qwen-tts torch torchaudio soundfile

Usage

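import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

# Load the fine-tuned checkpoint onto the first GPU in bfloat16.
tts = Qwen3TTSModel.from_pretrained(
    "thunk6/qwen3-tts-custom-voice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)

# Synthesize speech with the fine-tuned speaker.
# Returns a list of waveforms and their sample rate.
wavs, sr = tts.generate_custom_voice(
    text="Hello, this is my cloned voice.",
    speaker="my_custom_voice",
)

# Write the first generated waveform to disk.
sf.write("output.wav", wavs[0], sr)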
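
The example above hard-codes "cuda:0" and bfloat16. On machines without a GPU, or with a GPU that lacks bfloat16 support, a small fallback helper may be useful. The following is a minimal sketch, not part of qwen-tts: the helper name choose_device_and_dtype is hypothetical, and whether this checkpoint runs acceptably in float16 or on CPU is an assumption that has not been verified here.

```python
def choose_device_and_dtype(cuda_available: bool, bf16_supported: bool) -> tuple[str, str]:
    """Pick a device string and a dtype name for loading the model.

    Hypothetical helper: prefers bfloat16 on GPU (as in the usage example),
    falls back to float16 on GPUs without bfloat16, and to float32 on CPU.
    """
    if cuda_available and bf16_supported:
        return "cuda:0", "bfloat16"
    if cuda_available:
        return "cuda:0", "float16"
    return "cpu", "float32"


# Example with the capability flags passed in explicitly.
device, dtype_name = choose_device_and_dtype(cuda_available=False, bf16_supported=False)
print(device, dtype_name)
```

With PyTorch installed, the two flags can be taken from torch.cuda.is_available() and torch.cuda.is_bf16_supported() (call the latter only when CUDA is available), and the dtype name can be turned into a dtype with getattr(torch, dtype_name) before passing it to from_pretrained.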

Training Details

Setting	Value
Base model	Qwen3-TTS-12Hz-1.7B-Base
Fine-tuning method	SFT
Learning rate	2e-6
Batch size	4
Epochs	10
Hardware	A100 80GB
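
To get a feel for what these settings imply, the number of optimizer steps can be estimated from the batch size and epoch count. This is only a sketch: the dataset size is not stated in this card, so the 1000 samples below are a hypothetical figure, and the calculation assumes a single GPU with no gradient accumulation unless grad_accum is changed.

```python
import math


def total_optimizer_steps(num_samples: int, batch_size: int = 4,
                          epochs: int = 10, grad_accum: int = 1) -> int:
    """Estimate optimizer steps for a run (defaults mirror the table above)."""
    # The last partial batch of each epoch still counts as one step.
    steps_per_epoch = math.ceil(num_samples / (batch_size * grad_accum))
    return steps_per_epoch * epochs


# Hypothetical dataset of 1000 utterances: 250 steps per epoch, 10 epochs.
print(total_optimizer_steps(1000))  # 2500
```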
