Joycent: Diffusion-based Accent TTS without Accented Phone Prediction
Paper: arXiv 2606.16417
Model repository: walston/cosyvoice3-sg (for use with CosyVoice).
This repository contains the llm.pt checkpoint (the CosyVoice3 language-model component) fine-tuned on the Singapore Mandarin subset, as presented in the paper Joycent: Diffusion-based Accent TTS without Accented Phone Prediction.
The remaining CosyVoice3 components are loaded from FunAudioLLM/Fun-CosyVoice3-0.5B-2512. The checkpoint is intended to replace the base model's llm.pt; it does not include flow.pt, hift.pt, the tokenizer, or any ONNX files.
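Because only llm.pt is shipped here, one way to assemble a usable model directory is to download the full base model and overwrite its llm.pt with this checkpoint. The sketch below is a minimal illustration, not the project's official loader. It assumes that llm.pt sits at the root of walston/cosyvoice3-sg, that the base snapshot contains flow.pt and hift.pt at its root, and that the huggingface_hub package is installed. The local path and the helper names overlay_llm and download_and_overlay are hypothetical.

```python
import shutil
from pathlib import Path

BASE_REPO = "FunAudioLLM/Fun-CosyVoice3-0.5B-2512"
FINETUNED_REPO = "walston/cosyvoice3-sg"

# Assumption: these base-model files live at the root of the base snapshot.
# They are not included in the fine-tuned repository.
REQUIRED_BASE_FILES = ("flow.pt", "hift.pt")


def overlay_llm(base_dir: Path, finetuned_llm: Path) -> Path:
    """Hypothetical helper: replace base_dir/llm.pt with the fine-tuned checkpoint."""
    base_dir = Path(base_dir)
    missing = [name for name in REQUIRED_BASE_FILES if not (base_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f"base model directory is missing: {missing}")

    target = base_dir / "llm.pt"
    backup = base_dir / "llm.base.pt"
    # Back up the original base weights once, so repeated runs never
    # overwrite the backup with the fine-tuned checkpoint.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)

    shutil.copy2(finetuned_llm, target)
    return target


def download_and_overlay(local_dir: str = "pretrained_models/cosyvoice3-sg") -> Path:
    """Hypothetical helper: fetch the base model, then swap in the fine-tuned llm.pt."""
    # Imported here so overlay_llm can be used without huggingface_hub or network access.
    from huggingface_hub import hf_hub_download, snapshot_download

    base_dir = Path(snapshot_download(repo_id=BASE_REPO, local_dir=local_dir))
    # Assumption: the checkpoint is stored as "llm.pt" at the repository root.
    llm_path = hf_hub_download(repo_id=FINETUNED_REPO, filename="llm.pt")
    return overlay_llm(base_dir, Path(llm_path))
```

Copying rather than symlinking keeps the assembled directory self-contained, and the one-time backup makes it easy to switch back to the base llm.pt.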
The inference wrapper for this model is available in the Joycent project as joycent/inference_cosyvoice.py.
If you find this work useful, please cite:
@misc{wang2026joycentdiffusionbasedaccenttts,
title={Joycent: Diffusion-based Accent TTS without Accented Phone Prediction},
author={Xintong Wang and Ye Wang},
year={2026},
eprint={2606.16417},
archivePrefix={arXiv},
primaryClass={cs.SD},
}
Base model: FunAudioLLM/Fun-CosyVoice3-0.5B-2512