F5-TTS-RKNN2
Run the ultra-high-quality F5-TTS text-to-speech / zero-shot voice cloning model on RK3588!
- Inference Speed (on RK3588, generating 9 seconds of audio): 11s per iteration, 32 iterations, total time ~352s
- Memory Usage (on RK3588): 2.2GB
Usage
Clone or download this repository locally. The models are large, so ensure you have sufficient disk space.
Install dependencies:
pip install "numpy<2" rknn-toolkit-lite2 jieba torch onnxruntime soundfile pydub pypinyin tqdm
Run:
python F5-TTS-ONNX-Inference-rknn2.py
You can modify parameters such as the text within F5-TTS-ONNX-Inference-rknn2.py to generate different audio.
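The editable section of a zero-shot cloning script typically looks something like the sketch below. The variable names here are hypothetical, for illustration only; check the actual names inside F5-TTS-ONNX-Inference-rknn2.py.

```python
# Hypothetical sketch: the real variable names in
# F5-TTS-ONNX-Inference-rknn2.py may differ.
ref_audio_path = "reference.wav"   # short clip of the voice to clone
ref_text = "Transcript of what is said in reference.wav."
gen_text = "The text you want the cloned voice to speak."
```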
Model Conversion
Download the ONNX model files.
Install dependencies:
pip install "numpy<2" rknn-toolkit2==2.3.0 onnx onnxruntime
Convert the models:
python convert_opset.py
python convert_F5_Transformer_opset19.py
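The conversion scripts are not reproduced here, but a minimal rknn-toolkit2 conversion of a fixed-shape ONNX model looks roughly like the sketch below. The file names, target platform, and fp16 (non-quantized) build are assumptions; the actual input shapes and settings come from the two scripts above.

```python
# Minimal sketch of an rknn-toolkit2 conversion (not the actual script).
from rknn.api import RKNN

rknn = RKNN(verbose=True)
rknn.config(target_platform="rk3588")                         # assumed target
if rknn.load_onnx(model="F5_Transformer_opset19.onnx") != 0:  # illustrative filename
    raise RuntimeError("load_onnx failed")
if rknn.build(do_quantization=False) != 0:                    # fp16 build, no quantization (assumed)
    raise RuntimeError("build failed")
rknn.export_rknn("F5_Transformer.rknn")
rknn.release()
```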
Known Issues
- RKNN2 does not support dynamic input shapes, so the sequence length is fixed at 1536 and the audio speed is scaled to compensate. The result is acceptable as long as the required length is not far from the fixed one (see the first sketch after this list).
- A Transpose operation within the RoPE (Rotary Positional Embedding) part of the DiT (Diffusion Transformer) cannot run on the NPU, causing an approximate 15% decrease in inference speed. This could probably be fixed by modifying the original model, but I haven't bothered: even with the fix, inference would still be very slow because the sequence length is so long.
- Only the DiT part runs on the NPU; everything else runs on the CPU. Those parts are fast, however, and do not significantly affect overall inference speed (see the second sketch after this list).
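To make the fixed-length constraint concrete, here is a minimal sketch of the compensation idea, assuming the simplest possible time-stretch (plain linear resampling, which shifts pitch along with speed); the actual script may use a different method.

```python
import numpy as np

# The RKNN model always emits a fixed number of frames, so the waveform
# is stretched/compressed to the duration the text actually needs.
def rescale_duration(wav: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly resample wav to target_len samples (changes speed and pitch)."""
    x_old = np.linspace(0.0, 1.0, num=len(wav))
    x_new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(x_new, x_old, wav).astype(wav.dtype)
```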
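For reference, a hybrid NPU/CPU pipeline of this kind is usually wired up as sketched below: the DiT runs through rknn-toolkit-lite2 and the remaining ONNX graphs through onnxruntime. The model file names and the loop structure are assumptions, not the script's actual code.

```python
import onnxruntime as ort
from rknnlite.api import RKNNLite

# DiT denoiser on the NPU (illustrative filename)
dit = RKNNLite()
dit.load_rknn("F5_Transformer.rknn")
dit.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)  # spread across all 3 NPU cores

# Pre/post-processing stays on the CPU via onnxruntime (illustrative filename)
decoder = ort.InferenceSession("F5_Decode.onnx", providers=["CPUExecutionProvider"])

# Inside the 32-step sampling loop, each step would call the NPU, e.g.:
#   outputs = dit.inference(inputs=[...])   # hypothetical input list
```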
Base model: H5N1AIDS/F5-TTS-ONNX