fxstar1128/talentdev_01
A custom-voice Qwen3-TTS model, fine-tuned with LoRA by fxstar1128 on specialized voice data. This checkpoint adapts the Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign base model with Low-Rank Adaptation (LoRA) to capture the voice characteristics of that data.
Model Overview
This model was fine-tuned with:
- LoRA rank: 32
- LoRA alpha: 64
- Training strategy: Single-epoch targeted voice optimization
- Base architecture: Qwen3-TTS with merged LoRA weights
- Output format: 24 kHz mono WAV
The fine-tuning focused on capturing specific voice characteristics while maintaining the naturalness and expressiveness of the base Qwen3-TTS architecture.
Quick Start
Installation
pip install qwen-tts transformers torch soundfile
Basic Usage
from qwen_tts import Qwen3TTSModel
import soundfile as sf

# Load the fine-tuned model
model = Qwen3TTSModel.from_pretrained("fxstar1128/talentdev_01")

# Generate speech
audio, sample_rate = model.generate_voice_design(
    text="Hello, this is a demonstration of the fine-tuned voice model.",
    instruct="A natural speaking voice, clear and expressive.",
    language="english",
)

# Save output
sf.write("output.wav", audio[0], sample_rate)
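To check that a saved file matches the stated output format (24 kHz, mono), a minimal sketch using only Python's standard `wave` module might look like the following. The helper name `describe_wav` is hypothetical and not part of qwen-tts, and the sketch assumes the file was written as integer PCM, which the `wave` module can read.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        # Frames are per-channel sample positions, so frames / rate is seconds.
        duration = wav.getnframes() / sample_rate
    return channels, sample_rate, duration


# Example (assumes "output.wav" exists from the Basic Usage snippet):
#   channels, sample_rate, seconds = describe_wav("output.wav")
#   A file in the stated format would report channels == 1 and sample_rate == 24000.
```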
Training Details
This model was trained using the following configuration:
- Optimizer: AdamW (lr=2.5e-9)
- Batch size: 2
- Gradient accumulation: 4 steps
- Max gradient norm: 1.0
- Trainable parameters: ~46.8M (2.37% of total)
- Dataset: Custom voice dataset with Qwen audio codes
- Speaker embedding: Custom projection layer (1024 → 2048)
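The numbers in this list imply a few derived quantities. The sketch below works them out in plain Python: the effective batch size per optimizer step, a rough total parameter count implied by the trainable fraction, and a simplified version of global-norm gradient clipping. It assumes single-device training, which the card does not state, and the clipping function is an illustration of the idea rather than the training code actually used.

```python
import math

BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
MAX_GRAD_NORM = 1.0
TRAINABLE_PARAMS = 46.8e6
TRAINABLE_FRACTION = 0.0237


def effective_batch_size(batch_size, accum_steps):
    """Samples contributing to each optimizer step (single device assumed)."""
    return batch_size * accum_steps


def estimated_total_params(trainable, fraction):
    """Total parameter count implied by the reported trainable share."""
    return trainable / fraction


def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


print(effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS))  # 2 * 4 samples per step
print(round(estimated_total_params(TRAINABLE_PARAMS, TRAINABLE_FRACTION) / 1e9, 2))
clipped, norm_before = clip_by_global_norm([3.0, 4.0], MAX_GRAD_NORM)
print(norm_before, clipped)
```

With these settings each optimizer step sees 8 samples, and the reported trainable share corresponds to roughly 2 billion parameters in total.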
The LoRA adapters were merged back into the base weights, so this model runs at full inference speed with no PEFT overhead.
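Merging works because a LoRA update is just a low-rank matrix added to the frozen weight: W' = W + (alpha / r) · B · A. The pure-Python sketch below illustrates this on toy matrices and checks that the merged weight gives the same output as running the base weight and the adapter separately. The matrices, the toy rank of 2, and the helper names are assumptions for illustration; only the scale factor is chosen to match this model (alpha / r = 64 / 32 = 2).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy shapes: W is 2x3, A is 2x3 (rank x d_in), B is 2x2 (d_out x rank).
W = [[1, 0, 2], [0, 1, 0]]
A = [[1, 0, 0], [0, 1, 1]]
B = [[1, 2], [0, 1]]
x = [[1], [2], [3]]  # column vector input

merged = merge_lora(W, A, B, alpha=4, rank=2)  # same scale as alpha=64, r=32

# Adapter path: base output plus the scaled low-rank correction.
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]

print(matmul(merged, x) == unmerged)  # both paths produce the same output
```

After merging, inference needs a single matrix multiply per layer instead of one for the base weight plus two for the adapter, which is where the "no PEFT overhead" claim comes from.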
Prompt Engineering
The model inherits the prompt-following capabilities of the base Qwen3-TTS model. Effective prompts typically include:
Voice characteristics:
- Gender and age indicators ("a young woman", "an older man")
- Speaking style ("conversational", "professional", "warm")
- Emotional tone ("calm", "enthusiastic", "thoughtful")
Example prompts:
A clear, natural voice speaking conversationally.
A professional speaker with measured pacing.
A warm, friendly voice with subtle expressiveness.
A calm narrator with natural intonation.
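If prompts are assembled programmatically, the three ingredients listed above (speaker, style, tone) can be combined with a small helper. The function `build_instruct` below is hypothetical and not part of qwen-tts; it only builds a string to pass as the `instruct` argument, and whether this exact phrasing matches the training prompts is unknown.

```python
def build_instruct(speaker, style=None, tone=None):
    """Compose a one-sentence voice description from a speaker and optional traits."""
    traits = ", ".join(t for t in (tone, style) if t)
    if traits:
        # Pick "a" or "an" based on the first letter of the first trait.
        article = "an" if traits[0].lower() in "aeiou" else "a"
        sentence = f"{speaker} with {article} {traits} voice"
    else:
        sentence = speaker
    return sentence[0].upper() + sentence[1:] + "."


print(build_instruct("a young woman", style="conversational", tone="calm"))
print(build_instruct("an older man", tone="enthusiastic"))
```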
Model Architecture
- Base: Qwen3-TTS Voice Design model
- LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Modules saved: talker.model.codec_embedding, speaker_embedding_projection
- Attention: SDPA (Scaled Dot-Product Attention)
- Mixed precision: bfloat16
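For reference, the adapter settings above can be written down as a plain dictionary. The keys below mirror the field names of PEFT's `LoraConfig` (`r`, `lora_alpha`, `target_modules`, `modules_to_save`), but whether the author used PEFT with exactly this configuration is an assumption. The helper `lora_params_per_layer` is hypothetical and shows how the rank determines the number of parameters a single adapted linear layer adds; the 2048 × 2048 layer size is an illustrative guess, not a figure from this card.

```python
LORA_SETTINGS = {
    "r": 32,
    "lora_alpha": 64,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Modules trained and saved in full rather than through low-rank adapters.
    "modules_to_save": [
        "talker.model.codec_embedding",
        "speaker_embedding_projection",
    ],
}


def lora_params_per_layer(d_in, d_out, rank):
    """Parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative only: a square 2048 x 2048 projection (assumed size).
print(lora_params_per_layer(2048, 2048, LORA_SETTINGS["r"]))
```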
Performance Characteristics
Strengths:
- Natural voice quality with fine-tuned characteristics
- Fast inference (merged weights, no adapter overhead)
- Consistent voice across different prompts
- Maintains base model's expressiveness
Considerations:
- Optimized for specific voice characteristics learned during training
- Best results with prompts similar to training style
- Focused on English
Files Included
model.safetensors     # Merged model weights (base + LoRA)
config.json           # Model configuration
tokenizer.json        # Text tokenizer
speech_tokenizer/     # Audio codec components
vocence_config.yaml   # Runtime configuration
chute_config.yml      # Deployment configuration
miner.py              # Vocence integration
demo.py               # Example inference script
Deployment
This model is compatible with:
- Bittensor SN78 (Vocence) subnet miners
- Chutes TEE deployment framework
- Standard Hugging Face Transformers pipeline
- Direct qwen-tts inference
License & Attribution
License: CC BY-NC-SA 4.0 (Non-Commercial)
Base model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
Fine-tuned by: fxstar1128
Framework: Qwen3-TTS by Alibaba
This model is intended for research and non-commercial applications only.
Citation
@misc{fxstar1128_talentdev01,
  author = {fxstar1128},
  title = {talentdev_01: LoRA Fine-tuned Qwen3-TTS Voice Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/fxstar1128/talentdev_01}},
}
Built with Qwen3-TTS • Trained with LoRA • Deployed on Vocence