arxiv:2501.14273

Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning

Published on Mar 6

Authors:

Abstract

A novel partial fine-tuning approach for speech synthesis models that selectively updates specific layers based on their contribution to emotion and speaker characteristics, achieving faster training and better preservation of pronunciation accuracy compared to traditional full fine-tuning methods.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

While LLM-based TTS models exhibit zero-shot emotion and speaker cloning, their cloning fidelity and pronunciation clarity degrade on unseen domains. Fine-tuning is essential for adaptation, yet uniform approaches overlook specific parameter contributions. Uniform tuning on limited data causes slow training and catastrophic forgetting, leading to degraded pronunciation accuracy. To address this, we propose CSP-FT, a characteristic-specific partial fine-tuning strategy. By dynamically analyzing layer contributions via a weighted sum, we selectively fine-tune only the two layers capturing the most and least emotion and speaker information, maximizing the utility of the former while explicitly strengthening the capacity of the latter. Experiments on a combined corpus of 11 datasets show CSP-FT matches or exceeds the fidelity and intelligibility of full fine-tuning while updating only ~8% of parameters, accelerating training by ~2x, and significantly mitigating catastrophic forgetting.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2501.14273

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2501.14273 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2501.14273 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.