fxstar1128/talentdev_01

A Qwen3-TTS model fine-tuned with LoRA by fxstar1128 on a custom voice dataset. This checkpoint adapts the Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign base model using Low-Rank Adaptation (LoRA) to capture specific voice characteristics.

Model Overview

This model was fine-tuned with:

LoRA rank: 32
LoRA alpha: 64
Training strategy: Single-epoch targeted voice optimization
Base architecture: Qwen3-TTS with merged LoRA weights
Output format: 24kHz mono WAV

The fine-tuning focused on capturing specific voice characteristics while maintaining the naturalness and expressiveness of the base Qwen3-TTS architecture.

Quick Start

Installation

pip install qwen-tts transformers torch soundfile

Basic Usage

from qwen_tts import Qwen3TTSModel
import soundfile as sf

# Load the fine-tuned model
model = Qwen3TTSModel.from_pretrained("fxstar1128/talentdev_01")

# Generate speech
audio, sample_rate = model.generate_voice_design(
    text="Hello, this is a demonstration of the fine-tuned voice model.",
    instruct="A natural speaking voice, clear and expressive.",
    language="english",
)

# Save output
sf.write("output.wav", audio[0], sample_rate)
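
To confirm that a saved file matches the 24kHz mono format stated above, the header can be inspected with Python's standard-library wave module. This is a minimal sketch: the helper names describe_wav and is_expected_format are illustrative and not part of qwen-tts, and it assumes the file was written as integer PCM, since the wave module does not read floating-point WAV files.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate
    return channels, sample_rate, duration


def is_expected_format(path, sample_rate=24_000, channels=1):
    """Check a file against the 24 kHz mono format stated on this card."""
    found_channels, found_rate, _ = describe_wav(path)
    return found_channels == channels and found_rate == sample_rate
```

Calling is_expected_format("output.wav") after the Basic Usage snippet should return True if the file was written as described.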

Training Details

This model was trained using the following configuration:

Optimizer: AdamW (lr=2.5e-9)
Batch size: 2
Gradient accumulation: 4 steps
Max gradient norm: 1.0
Trainable parameters: ~46.8M (2.37% of total)
Dataset: Custom voice dataset with Qwen audio codes
Speaker embedding: Custom projection layer (1024 → 2048)
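
Two derived quantities follow from the figures listed above: the effective batch size per optimizer step, and the total parameter count implied by the trainable-parameter figures. The sketch below is plain arithmetic on the reported values and assumes single-device training, which the card does not state.

```python
# Values taken from the configuration list above.
batch_size = 2
grad_accum_steps = 4
trainable_params = 46.8e6
trainable_fraction = 0.0237

# Samples contributing to each optimizer step (single device assumed).
effective_batch_size = batch_size * grad_accum_steps

# Total parameter count implied by the two reported figures.
implied_total_params = trainable_params / trainable_fraction

print(effective_batch_size)                   # 8
print(f"{implied_total_params / 1e9:.2f}B")   # 1.97B
```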

The LoRA adapters were merged back into the base weights, so this model runs at full inference speed with no PEFT overhead.
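
Why merging removes the adapter overhead can be shown on a toy layer. The sketch below assumes the standard LoRA scaling of alpha / rank (the card does not say whether a variant was used) and uses hypothetical 2x2 weights in pure Python. It checks that the merged matrix W + (alpha / rank) * B A gives the same output as running the base layer and the adapter separately.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, rank, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def linear(weight, x):
    """Apply y = weight @ x for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Toy values (hypothetical). rank=1, alpha=2 keeps the same alpha / rank
# ratio of 2 as the rank 32, alpha 64 reported on this card.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]        # shape (rank, in_features)
B = [[1.0], [2.0]]       # shape (out_features, rank)
x = [3.0, 1.0]

merged = merge_lora(W, A, B, rank=1, alpha=2)

# Unmerged path: base output plus the scaled adapter output.
adapter_out = linear(B, linear(A, x))
unmerged = [b + 2.0 * a for b, a in zip(linear(W, x), adapter_out)]

print(linear(merged, x) == unmerged)  # True
```

After merging, inference needs only one matrix multiply per adapted layer, which is the property the paragraph above describes.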

Prompt Engineering

The model inherits the prompt-following capabilities of the base Qwen3-TTS model. Effective prompts typically include:

Voice characteristics:

Gender and age indicators ("a young woman", "an older man")
Speaking style ("conversational", "professional", "warm")
Emotional tone ("calm", "enthusiastic", "thoughtful")

Example prompts:

A clear, natural voice speaking conversationally.
A professional speaker with measured pacing.
A warm, friendly voice with subtle expressiveness.
A calm narrator with natural intonation.
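
The three attribute groups above can be combined programmatically when many prompts are needed. The helper below is a hypothetical convenience, not part of qwen-tts, and the sentence template it uses is only one plausible phrasing. Its output is intended to be passed as the instruct argument shown in Basic Usage.

```python
def _with_article(phrase):
    """Prefix a phrase with 'a' or 'an' based on its first letter."""
    article = "an" if phrase[0].lower() in "aeiou" else "a"
    return f"{article} {phrase}"


def build_instruct(speaker, style=None, tone=None):
    """Compose a voice description from speaker, style, and tone.

    speaker is required, e.g. "a young woman"; style and tone are optional.
    """
    parts = [speaker.strip()]
    if style:
        parts.append("speaking in " + _with_article(f"{style.strip()} style"))
    if tone:
        parts.append("with " + _with_article(f"{tone.strip()} tone"))
    sentence = " ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."


print(build_instruct("a young woman", style="conversational", tone="calm"))
# A young woman speaking in a conversational style with a calm tone.
```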

Model Architecture

Base: Qwen3-TTS Voice Design model
LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Modules saved: talker.model.codec_embedding, speaker_embedding_projection
Attention: SDPA (Scaled Dot-Product Attention)
Mixed precision: bfloat16
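
For readers who want to set up a comparable adapter, the values above map naturally onto the fields of PEFT's LoraConfig. This is a sketch under assumptions: the card does not include the training script, so the use of LoraConfig itself and the mapping of "Modules saved" to modules_to_save are guesses. Settings the card does not report, such as dropout, are left at library defaults.

```python
# Adapter settings as reported on this card, keyed by PEFT LoraConfig field names.
lora_settings = {
    "r": 32,
    "lora_alpha": 64,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Assumption: "Modules saved" corresponds to PEFT's modules_to_save.
    "modules_to_save": [
        "talker.model.codec_embedding",
        "speaker_embedding_projection",
    ],
}

# Build the config object only if PEFT is installed.
try:
    from peft import LoraConfig
    lora_config = LoraConfig(**lora_settings)
except ImportError:
    lora_config = None
```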

Performance Characteristics

Strengths:

Natural voice quality with fine-tuned characteristics
Fast inference (merged weights, no adapter overhead)
Consistent voice across different prompts
Maintains base model's expressiveness

Considerations:

Optimized for specific voice characteristics learned during training
Best results with prompts similar to training style
Focused on English-language speech

Files Included

model.safetensors            # Merged model weights (base + LoRA)
config.json                  # Model configuration
tokenizer.json               # Text tokenizer
speech_tokenizer/            # Audio codec components
vocence_config.yaml          # Runtime configuration
chute_config.yml             # Deployment configuration
miner.py                     # Vocence integration
demo.py                      # Example inference script
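
After downloading the repository, a quick check that the listed files are present can catch an incomplete download before loading. The sketch below uses only the standard library. The function name missing_files is illustrative, and the local directory path is whatever location the snapshot was saved to.

```python
from pathlib import Path

# File and directory names as listed above.
EXPECTED_ENTRIES = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "speech_tokenizer",
    "vocence_config.yaml",
    "chute_config.yml",
    "miner.py",
    "demo.py",
]


def missing_files(model_dir):
    """Return the expected entries that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list from missing_files means every entry in the listing was found.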

Deployment

This model is compatible with:

Bittensor SN78 (Vocence) subnet miners
Chutes TEE deployment framework
Standard Hugging Face Transformers pipeline
Direct qwen-tts inference

License & Attribution

License: CC BY-NC-SA 4.0 (Non-Commercial)

Base model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
Fine-tuned by: fxstar1128
Framework: Qwen3-TTS by Alibaba

This model is intended for research and non-commercial applications only.

Citation

@misc{fxstar1128_talentdev01,
  author = {fxstar1128},
  title = {talentdev_01: LoRA Fine-tuned Qwen3-TTS Voice Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/fxstar1128/talentdev_01}},
}

Built with Qwen3-TTS • Trained with LoRA • Deployed on Vocence
