Commit 003d053 by fcyai
1 Parent(s): 4dd22f9
ApiTest.html ADDED
@@ -0,0 +1,99 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+     <meta charset="UTF-8">
+     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+     <title>TTS Test</title>
+     <style>
+         body {
+             font-family: Arial, sans-serif;
+             max-width: 600px;
+             margin: 20px auto; /* 20px top/bottom, centered horizontally */
+         }
+         label, button {
+             display: block;
+             margin-top: 10px;
+         }
+         textarea, select, input[type="number"], input[type="text"] {
+             width: calc(100% - 10px);
+             margin-top: 5px;
+             padding: 5px;
+             font-size: 16px;
+         }
+         button {
+             width: 100%;
+             padding: 10px;
+             background-color: #4CAF50;
+             color: white;
+             border: none;
+             font-size: 16px;
+             cursor: pointer;
+         }
+         button:hover {
+             background-color: #45a049;
+         }
+         h2, pre {
+             margin-top: 20px;
+         }
+         audio {
+             width: 100%;
+             margin-top: 10px;
+         }
+     </style>
+ </head>
+ <body>
+     <h1>ChatTTS API Test</h1>
+     <label for="api_url">API URL:</label>
+     <input type="text" id="api_url" value="http://localhost:9880/">
+
+     <label for="text">Test text:</label>
+     <textarea id="text" rows="10">I am an energetic person who loves sports, travel, and trying new things. I like to challenge myself, keep pushing past my limits, and become stronger. I am an energetic person who loves sports, travel, and trying new things. I like to challenge myself, keep pushing past my limits, and become stronger.</textarea>
+
+     <label for="media_type">Media type:</label>
+     <select id="media_type">
+         <option value="wav">wav</option>
+         <option value="mp3">mp3</option>
+         <option value="flac">flac</option>
+     </select>
+
+     <label for="seed">Seed:</label>
+     <input type="number" id="seed" value="2581">
+
+     <label for="streaming">Streaming output:</label>
+     <input type="checkbox" id="streaming" checked>
+
+     <button onclick="sendRequest()">Send request</button>
+
+     <h2>Output</h2>
+     <pre id="output"></pre>
+     <audio id="audio" controls></audio>
+
+     <script>
+         async function sendRequest() {
+             const apiUrl = document.getElementById('api_url').value;
+             const text = document.getElementById('text').value;
+             const media_type = document.getElementById('media_type').value;
+             const seed = document.getElementById('seed').value;
+             const streaming = document.getElementById('streaming').checked ? 1 : 0;
+
+             const output = document.getElementById('output');
+             output.textContent = "Requesting...\n";
+
+             const audioElement = document.getElementById('audio');
+
+             try {
+                 const params = new URLSearchParams({ text, media_type, seed, streaming }); // encodes every value
+                 const url = `${apiUrl}?${params}`;
+                 output.textContent += `Request URL: ${url}\n`;
+
+                 audioElement.src = url;
+                 await audioElement.play(); // play() returns a promise; await it so a rejection reaches catch
+
+                 output.textContent += "Audio playback started.\n";
+             } catch (error) {
+                 output.textContent += `Request error: ${error}\n`;
+             }
+         }
+     </script>
+ </body>
+ </html>
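Note on the request format: the page above sends a plain GET request whose query string carries `text`, `media_type`, `seed`, and `streaming`. The following is a minimal Python sketch of the same URL construction, useful for checking the endpoint outside the browser. The helper name `build_tts_url` is hypothetical, the defaults are simply the form's initial values, and how the server interprets each parameter is an assumption, since the server code is not part of this diff.

```python
from urllib.parse import urlencode


def build_tts_url(api_url: str, text: str, media_type: str = "wav",
                  seed: int = 2581, streaming: bool = True) -> str:
    """Build the GET URL that ApiTest.html assigns to the <audio> element.

    Hypothetical helper: parameter names mirror the query string built by the
    page; the defaults are the form's initial values.
    """
    query = urlencode({
        "text": text,                        # percent-encoded by urlencode
        "media_type": media_type,            # "wav", "mp3" or "flac" in the form
        "seed": seed,
        "streaming": 1 if streaming else 0,  # the page sends 1 or 0
    })
    return f"{api_url}?{query}"


if __name__ == "__main__":
    print(build_tts_url("http://localhost:9880/", "Hello, world"))
    # http://localhost:9880/?text=Hello%2C+world&media_type=wav&seed=2581&streaming=1
```

`urlencode` writes spaces as `+`, whereas `encodeURIComponent` writes `%20`; both forms are standard in query strings and decode to the same text.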
README.md CHANGED
@@ -1,13 +1,121 @@
- ---
- title: ChatTTS Story Telling
- emoji: 🏆
- colorFrom: blue
- colorTo: green
- sdk: gradio
- sdk_version: 4.36.1
- app_file: app.py
- pinned: false
- license: apache-2.0
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # ChatTTS_colab
+
+ 🚀 One-click deployment (including an offline all-in-one package for Windows)! Built on [ChatTTS](https://github.com/2noise/ChatTTS), with support for random voice sampling (voice "gacha"), long-audio generation, and multi-role reading. Simple to use, with no complicated installation.
+
+ **🏆 A library of 2,000 speaker voices is now open source 🏆** Project: [ChatTTS_Speaker](https://github.com/6drf21e/ChatTTS_Speaker)
+
+ > Supports searching for stable voices by gender, age, and characteristics.
+
+ # Downloads
+
+ | Version | Link | Description |
+ |---------|------|-------------|
+ | Online Colab version | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/6drf21e/ChatTTS_colab/blob/main/chattts_webui_mix.ipynb) | Runs on Google Colab with one click; requires a Google account. Colab comes with a 15 GB GPU |
+ | Offline all-in-one package | [Baidu Netdisk](https://pan.baidu.com/s/1-hGiPLs6ORM8sZv0xTdxFA?pwd=h3c5) Extraction code: h3c5 | Download and run locally; supports GPU/CPU; for Windows 10 and later |
+ | Offline all-in-one package | [Quark Netdisk](https://pan.quark.cn/s/c963e147f204) | Download and run locally; supports GPU/CPU; for Windows 10 and later |
+
+ # Demo video
+
+ [![Demo video](https://img.youtube.com/vi/199fyU7NfUQ/0.jpg)](https://www.youtube.com/watch?v=199fyU7NfUQ)
+
+ Follow the [氪学家 channel](https://www.youtube.com/@kexue) for more fun tech videos.
+
+ ## Features
+
+ - **One-click run on Colab**: no complicated environment setup; just click the Colab button above to run the project directly in your browser.
+ - **Voice sampling ("gacha")**: generate multiple voices in a batch and save the ones you like.
+ - **Long-audio generation**: suited to generating longer speech content.
+ - **Character handling**: basic preprocessing of digits and of punctuation that tends to be read incorrectly.
+ - **Multi-role reading**: read the text of different characters in different voices, with one-click script generation by an LLM.
+
+ ## Feature showcase
+
+ ### Multi-role reading
+
+ ![Multi-role reading](assets/shot3.png)
+
+ ### Voice sampling
+
+ ![Voice sampling](assets/shot1.png)
+
+ ### Long-audio generation
+
+ ![Long-audio generation](assets/shot2.png)
+
+ ## Quick start
+
+ ### Running on Colab
+
+ 1. Click the "Open In Colab" button at the top to open the Colab notebook.
+ 2. In the menu bar, choose Runtime > Run all.
+ 3. After it finishes, look in the log below for a line like
+    `Running on public URL: https://**********.gradio.live`
+ 4. `https://**********.gradio.live` is the public URL you can visit.
+
+ ### Running on macOS
+
+ 1. Install [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/macos.html) (if it is not installed yet).
+ 2. Open a terminal and create a new conda environment:
+    ```bash
+    conda create -n "ChatTTS_colab" python=3.11
+    ```
+ 3. Activate the environment you just created:
+    ```bash
+    conda activate ChatTTS_colab
+    ```
+ 4. Clone this repository locally:
+    ```bash
+    git clone git@github.com:6drf21e/ChatTTS_colab.git
+    ```
+ 5. Manually install the ChatTTS dependency into the project directory:
+    ```bash
+    cd ChatTTS_colab
+    git clone https://github.com/2noise/ChatTTS
+    cd ChatTTS
+    git checkout -q f4c8329
+    cd ..
+    mv ChatTTS temp
+    mv temp/ChatTTS ./ChatTTS
+    rm -rf temp
+    ```
+ 6. In the project directory, install the dependencies required by ChatTTS_colab:
+    ```bash
+    pip install -r requirements-macos.txt
+    ```
+ 7. Run the project and wait for the model to download automatically:
+    ```bash
+    python webui_mix.py
+    # Loading ChatTTS model...
+    ```
+    If everything works, a browser window opens automatically.
+
+ ## FAQ
+
+ 1. On the first run, ChatTTS automatically downloads the model from Hugging Face. If the download fails because of a network problem, ChatTTS cannot re-download it on its own; you need to clear the cache and trigger the download again.
+    Example error message:
+    ```log
+    FileNotFoundError: [Errno 2] No such file or directory: '~/.cache/huggingface/hub/models--2Noise--ChatTTS/snapshots/d7474137acb4f988874e5d57ad88d81bcb7e10b6/asset/Vocos.pt'
+    ```
+    To clear the cache:
+    ```bash
+    rm -rf ~/.cache/huggingface/hub/models--2Noise--ChatTTS
+    ```
+    After clearing the cache, run `python webui_mix.py` again and the model will be downloaded again.
+
+    If the download keeps failing, you can manually copy the `models` directory from the **offline package** into the project directory and load the model locally:
+    ```bash
+    python webui_mix.py --source local --local_path models
+    ```
+ 2. If the model downloads slowly, consider using the mirror https://hf-mirror.com/ kindly provided by [@padeoe](https://github.com/padeoe) to speed it up:
+    ```bash
+    export HF_ENDPOINT=https://hf-mirror.com
+    ```
+
+ ## Contributing
+
+ Suggestions and code contributions are welcome. Please open a GitHub issue to report problems, or submit a pull request.
+
+ ## License
+
+ This project is released under the MIT License.
+
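Note on the cache-clearing step in the FAQ: the `rm -rf` command can also be expressed as a small, portable Python sketch. The function name `clear_chattts_cache` is hypothetical, and the default hub location used in the usage part (`~/.cache/huggingface/hub`) is taken from the error message in the README; it is an assumption that no environment variable has moved the cache elsewhere.

```python
import shutil
from pathlib import Path


def clear_chattts_cache(hub_dir: Path) -> bool:
    """Delete the cached ChatTTS download under hub_dir.

    Returns True if a cache directory was removed, False if none existed.
    Hypothetical helper; the directory name comes from the README's error message.
    """
    target = hub_dir / "models--2Noise--ChatTTS"
    if target.is_dir():
        shutil.rmtree(target)  # same effect as `rm -rf` on that directory
        return True
    return False


if __name__ == "__main__":
    # Assumed default location of the Hugging Face hub cache.
    default_hub = Path.home() / ".cache" / "huggingface" / "hub"
    removed = clear_chattts_cache(default_hub)
    print("cache removed" if removed else "nothing to remove")
```

Returning a boolean rather than raising keeps the helper safe to call repeatedly before re-running `python webui_mix.py`.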
abc/.gitignore ADDED
@@ -0,0 +1,163 @@
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.ckpt
+ # C extensions
+ *.so
+ *.pt
+
+ # Distribution / packaging
+ .Python
+ outputs/
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ asset/*
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+ cover/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+ db.sqlite3-journal
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ .pybuilder/
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # IPython
+ profile_default/
+ ipython_config.py
+
+ # pyenv
+ # For a library or package, you might want to ignore these files since the code is
+ # intended to run in multiple environments; otherwise, check them in:
+ # .python-version
+
+ # pipenv
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
+ # install all needed dependencies.
+ #Pipfile.lock
+
+ # poetry
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
+ # commonly ignored for libraries.
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+ #poetry.lock
+
+ # pdm
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+ #pdm.lock
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+ # in version control.
+ # https://pdm.fming.dev/#use-with-ide
+ .pdm.toml
+
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+ __pypackages__/
+
+ # Celery stuff
+ celerybeat-schedule
+ celerybeat.pid
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Pyre type checker
+ .pyre/
+
+ # pytype static type analyzer
+ .pytype/
+
+ # Cython debug symbols
+ cython_debug/
+
+ # PyCharm
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
+ #.idea/
abc/LICENSE ADDED
@@ -0,0 +1,157 @@
1
+ # Attribution-NonCommercial-NoDerivatives 4.0 International
2
+
3
+ > *Creative Commons Corporation (“Creative Commons”) is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an “as-is” basis. Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible.*
4
+ >
5
+ > ### Using Creative Commons Public Licenses
6
+ >
7
+ > Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses.
8
+ >
9
+ > * __Considerations for licensors:__ Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC-licensed material, or material used under an exception or limitation to copyright. [More considerations for licensors](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensors).
10
+ >
11
+ > * __Considerations for the public:__ By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor’s permission is not necessary for any reason–for example, because of any applicable exception or limitation to copyright–then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. [More considerations for the public](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensees).
12
+
13
+ ## Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License
14
+
15
+ By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.
16
+
+ ### Section 1 – Definitions.
+
+ a. __Adapted Material__ means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.
+
+ b. __Copyright and Similar Rights__ means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.
+
+ c. __Effective Technological Measures__ means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.
+
+ d. __Exceptions and Limitations__ means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.
+
+ e. __Licensed Material__ means the artistic or literary work, database, or other material to which the Licensor applied this Public License.
+
+ f. __Licensed Rights__ means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.
+
+ g. __Licensor__ means the individual(s) or entity(ies) granting rights under this Public License.
+
+ h. __NonCommercial__ means not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange.
+
+ i. __Share__ means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.
+
+ j. __Sui Generis Database Rights__ means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.
+
+ k. __You__ means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning.
+
41
+ ### Section 2 – Scope.
42
+
43
+ a. ___License grant.___
44
+
45
+ 1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:
46
+
47
+ A. reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only; and
48
+
49
+ B. produce and reproduce, but not Share, Adapted Material for NonCommercial purposes only.
50
+
51
+ 2. __Exceptions and Limitations.__ For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.
52
+
53
+ 3. __Term.__ The term of this Public License is specified in Section 6(a).
54
+
55
+ 4. __Media and formats; technical modifications allowed.__ The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material.
56
+
57
+ 5. __Downstream recipients.__
58
+
59
+ A. __Offer from the Licensor – Licensed Material.__ Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.
60
+
61
+ B. __No downstream restrictions.__ You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
62
+
63
+ 6. __No endorsement.__ Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).
64
+
65
+ b. ___Other rights.___
66
+
67
+ 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.
68
+
69
+ 2. Patent and trademark rights are not licensed under this Public License.
70
+
71
+ 3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties, including when the Licensed Material is used other than for NonCommercial purposes.
72
+
73
+ ### Section 3 – License Conditions.
74
+
75
+ Your exercise of the Licensed Rights is expressly made subject to the following conditions.
76
+
77
+ a. ___Attribution.___
78
+
79
+ 1. If You Share the Licensed Material, You must:
80
+
81
+ A. retain the following if it is supplied by the Licensor with the Licensed Material:
82
+
83
+ i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
84
+
85
+ ii. a copyright notice;
86
+
87
+ iii. a notice that refers to this Public License;
88
+
89
+ iv. a notice that refers to the disclaimer of warranties;
90
+
91
+ v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;
92
+
93
+ B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and
94
+
95
+ C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.
96
+
97
+ For the avoidance of doubt, You do not have permission under this Public License to Share Adapted Material.
98
+
99
+ 2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.
100
+
101
+ 3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.
102
+
103
+ ### Section 4 – Sui Generis Database Rights.
104
+
105
+ Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:
106
+
107
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only and provided You do not Share Adapted Material;
108
+
109
+ b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and
110
+
111
+ c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.
112
+
113
+ For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.
114
+
115
+ ### Section 5 – Disclaimer of Warranties and Limitation of Liability.
116
+
117
+ a. __Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.__
118
+
119
+ b. __To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.__
120
+
121
+ c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.
122
+
123
+ ### Section 6 – Term and Termination.
124
+
125
+ a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.
126
+
127
+ b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:
128
+
129
+ 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or
130
+
131
+ 2. upon express reinstatement by the Licensor.
132
+
133
+ For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.
134
+
135
+ c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.
136
+
137
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License.
138
+
139
+ ### Section 7 – Other Terms and Conditions.
140
+
141
+ a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.
142
+
143
+ b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.
144
+
145
+ ### Section 8 – Interpretation.
146
+
147
+ a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.
148
+
149
+ b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.
150
+
151
+ c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.
152
+
153
+ d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.
154
+
155
+ > Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at [creativecommons.org/policies](http://creativecommons.org/policies), Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses.
156
+ >
157
+ > Creative Commons may be contacted at [creativecommons.org](http://creativecommons.org).
abc/README.md ADDED
@@ -0,0 +1,131 @@
+ # ChatTTS
+ [**English**](./README.md) | [**中文简体**](./README_CN.md)
+
+ ChatTTS is a text-to-speech model designed specifically for dialogue scenarios such as LLM assistants. It supports both English and Chinese. Our model is trained on 100,000+ hours of Chinese and English data. The open-source version on HuggingFace is a 40,000-hour pretrained model without SFT.
+
+ For formal inquiries about the model and roadmap, please contact us at open-source@2noise.com. You can join our QQ group (808364215) for discussion. GitHub issues are always welcome.
+
+ ---
+ ## Highlights
+ 1. **Conversational TTS**: ChatTTS is optimized for dialogue-based tasks, enabling natural and expressive speech synthesis. It supports multiple speakers, facilitating interactive conversations.
+ 2. **Fine-grained Control**: The model can predict and control fine-grained prosodic features, including laughter, pauses, and interjections.
+ 3. **Better Prosody**: ChatTTS surpasses most open-source TTS models in terms of prosody. We provide pretrained models to support further research and development.
+
+ For a detailed description of the model, see the [video on Bilibili](https://www.bilibili.com/video/BV1zn4y1o7iV).
+
+ ---
+
+ ## Disclaimer
+
+ This repo is for academic purposes only. It is intended for educational and research use, and should not be used for any commercial or legal purposes. The authors do not guarantee the accuracy, completeness, or reliability of the information. The information and data used in this repo are for academic and research purposes only. The data was obtained from publicly available sources, and the authors do not claim any ownership or copyright over the data.
+
+ ChatTTS is a powerful text-to-speech system. However, it is very important to use this technology responsibly and ethically. To limit the use of ChatTTS, we added a small amount of high-frequency noise during the training of the 40,000-hour model, and compressed the audio quality as much as possible using MP3 format, to prevent malicious actors from potentially using it for criminal purposes. At the same time, we have internally trained a detection model and plan to open-source it in the future.
+
+
+ ---
26
+ ## Usage
27
+
28
+ <h4>basic usage</h4>
29
+
30
+ ```python
31
+ import ChatTTS
32
+ from IPython.display import Audio
33
+
34
+ chat = ChatTTS.Chat()
35
+ chat.load_models()
36
+
37
+ texts = ["<PUT YOUR TEXT HERE>",]
38
+
39
+ wavs = chat.infer(texts, use_decoder=True)
40
+ Audio(wavs[0], rate=24_000, autoplay=True)
41
+ ```
42
+
43
+ <h4>advanced usage</h4>
44
+
45
+ ```python
46
+ ###################################
47
+ # Sample a speaker from Gaussian.
48
+ import torch
49
+ std, mean = torch.load('ChatTTS/asset/spk_stat.pt').chunk(2)
50
+ rand_spk = torch.randn(768) * std + mean
51
+
52
+ params_infer_code = {
53
+ 'spk_emb': rand_spk, # add sampled speaker
54
+ 'temperature': .3, # using custom temperature
55
+ 'top_P': 0.7, # top P decode
56
+ 'top_K': 20, # top K decode
57
+ }
58
+
59
+ ###################################
60
+ # For sentence level manual control.
61
+
62
+ # use oral_(0-9), laugh_(0-2), break_(0-7)
63
+ # to generate special token in text to synthesize.
64
+ params_refine_text = {
65
+ 'prompt': '[oral_2][laugh_0][break_6]'
66
+ }
67
+
68
+ wav = chat.infer("<PUT YOUR TEXT HERE>", params_refine_text=params_refine_text, params_infer_code=params_infer_code)
69
+
70
+ ###################################
71
+ # For word level manual control.
72
+ text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
73
+ wav = chat.infer(text, skip_refine_text=True, params_infer_code=params_infer_code)
74
+
75
+ ```
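
The sentence-level prompt in the advanced usage snippet follows the pattern `[oral_i][laugh_j][break_k]`, with the ranges given in its comment (oral 0-9, laugh 0-2, break 0-7). As a minimal sketch, the helper below validates those ranges and builds the prompt string. The name `build_refine_prompt` is hypothetical and is not part of the ChatTTS API.

```python
def build_refine_prompt(oral: int = 2, laugh: int = 0, brk: int = 6) -> str:
    """Build a refine-text prompt such as '[oral_2][laugh_0][break_6]'.

    Hypothetical helper, not part of ChatTTS. The upper bounds come from
    the comment in the usage example: oral 0-9, laugh 0-2, break 0-7.
    """
    limits = {"oral": (oral, 9), "laugh": (laugh, 2), "break": (brk, 7)}
    for name, (value, upper) in limits.items():
        if not 0 <= value <= upper:
            raise ValueError(f"{name}_{value} out of range 0-{upper}")
    # dicts preserve insertion order, so the tokens come out as oral, laugh, break
    return "".join(f"[{name}_{value}]" for name, (value, _) in limits.items())


params_refine_text = {"prompt": build_refine_prompt(oral=2, laugh=0, brk=6)}
print(params_refine_text["prompt"])
```

The resulting dictionary can be passed as `params_refine_text` in the same way as the hand-written one in the snippet.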
76
+
77
+ <details open>
78
+ <summary><h4>Example: self introduction</h4></summary>
79
+
80
+ ```python
81
+ inputs_en = """
82
+ chat T T S is a text to speech model designed for dialogue applications.
83
+ [uv_break]it supports mixed language input [uv_break]and offers multi speaker
84
+ capabilities with precise control over prosodic elements [laugh]like like
85
+ [uv_break]laughter[laugh], [uv_break]pauses, [uv_break]and intonation.
86
+ [uv_break]it delivers natural and expressive speech,[uv_break]so please
87
+ [uv_break] use the project responsibly at your own risk.[uv_break]
88
+ """.replace('\n', '') # English is still experimental.
89
+
90
+ params_refine_text = {
91
+ 'prompt': '[oral_2][laugh_0][break_4]'
92
+ }
93
+ # audio_array_cn = chat.infer(inputs_cn, params_refine_text=params_refine_text)  # inputs_cn is not defined in this snippet; define it before uncommenting
94
+ audio_array_en = chat.infer(inputs_en, params_refine_text=params_refine_text)
95
+ ```
96
+ [male speaker](https://github.com/2noise/ChatTTS/assets/130631963/e0f51251-db7f-4d39-a0e9-3e095bb65de1)
97
+
98
+ [female speaker](https://github.com/2noise/ChatTTS/assets/130631963/f5dcdd01-1091-47c5-8241-c4f6aaaa8bbd)
99
+ </details>
100
+
101
+ ---
102
+ ## Roadmap
103
+ - [x] Open-source the 40k hour base model and spk_stats file
104
+ - [ ] Open-source VQ encoder and Lora training code
105
+ - [ ] Streaming audio generation without refining the text*
106
+ - [ ] Open-source the 40k hour version with multi-emotion control
107
+ - [ ] ChatTTS.cpp maybe? (PRs or a new repo are welcome.)
108
+
109
+ ----
110
+ ## FAQ
111
+
112
+ ##### How much VRAM do I need? How about inference speed?
113
+ For a 30-second audio clip, at least 4GB of GPU memory is required. On a 4090D GPU, it can generate audio corresponding to approximately 7 semantic tokens per second. The Real-Time Factor (RTF) is around 0.65.
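
To make the RTF figure concrete, here is a small sketch that assumes the common definition of RTF as synthesis time divided by the duration of the generated audio (the README does not spell out its definition). The function names are hypothetical.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF under the assumed definition: time spent synthesizing / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float = 0.65) -> float:
    """Rough wall-clock estimate for a clip of the given length at a given RTF."""
    return audio_seconds * rtf


# Estimate for the 30-second clip mentioned above at the quoted RTF of 0.65.
print(f"{estimated_synthesis_seconds(30.0):.1f} s")
```

Under that definition, an RTF below 1 means audio is produced faster than it plays back.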
114
+
115
+ ##### Model stability is not good enough, with issues such as multiple speakers or poor audio quality.
116
+
117
+ This is a problem that typically occurs with autoregressive models (such as bark and valle). It's generally difficult to avoid. One can try multiple samples to find a suitable result.
118
+
119
+ ##### Besides laughter, can we control anything else? Can we control other emotions?
120
+
121
+ In the current released model, the only token-level control units are [laugh], [uv_break], and [lbreak]. In future versions, we may open-source models with additional emotional control capabilities.
122
+
123
+ ---
124
+ ## Acknowledgements
125
+ - [bark](https://github.com/suno-ai/bark), [XTTSv2](https://github.com/coqui-ai/TTS) and [valle](https://arxiv.org/abs/2301.02111) demonstrate remarkable TTS results with an autoregressive-style system.
126
+ - [fish-speech](https://github.com/fishaudio/fish-speech) reveals the capability of GVQ as an audio tokenizer for LLM modeling.
127
+ - [vocos](https://github.com/gemelo-ai/vocos), which is used as a pretrained vocoder.
128
+
129
+ ---
130
+ ## Special Appreciation
131
+ - [wlu-audio lab](https://audio.westlake.edu.cn/) for early algorithm experiments.
abc/README_CN.md ADDED
@@ -0,0 +1,133 @@
1
+ # ChatTTS
2
+ [**English**](./README.md) | [**中文简体**](./README_CN.md)
3
+
4
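+ ChatTTS is a text-to-speech model designed specifically for dialogue scenarios such as LLM assistants. It supports both English and Chinese. The largest model was trained on more than 100,000 hours of Chinese and English data. The version open-sourced on HuggingFace was trained on 40,000 hours and has not undergone SFT.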
5
+
6
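+ For formal business inquiries about the model, please email open-source@2noise.com. Chinese-speaking users can join our QQ group (808364215) for discussion. Issues on GitHub are also welcome. If HuggingFace is not accessible, the model can be downloaded from [modelscope](https://www.modelscope.cn/models/pzc163/chatTTS).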
7
+
8
+ ---
9
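+ ## Highlights
10
+ 1. **Conversational TTS**: ChatTTS is optimized for dialogue-based tasks, producing natural and fluent speech, and supports multiple speakers.
11
+ 2. **Fine-grained Control**: The model can predict and control fine-grained prosodic features, including laughter, pauses, and interjections.
12
+ 3. **Better Prosody**: ChatTTS surpasses most open-source TTS models in prosody. Pretrained models are provided to support further research.
13
+
14
+ For a detailed introduction to the model, see the [promotional video](https://www.bilibili.com/video/BV1zn4y1o7iV) on Bilibili.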
15
+
16
+ ---
17
+
18
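+ ## Disclaimer
19
+ The information in this document is for academic exchange only. It is intended for education and research and must not be used for any commercial or legal purpose. The authors do not guarantee the accuracy, completeness, or reliability of the information. The information and data used in this document are for academic research purposes only. The data come from publicly available sources, and the authors claim no ownership or copyright over them.
20
+
21
+ ChatTTS is a powerful text-to-speech system. However, it is very important to use this technology responsibly and ethically. To limit the use of ChatTTS, we added a small amount of extra high-frequency noise during training of the 40,000-hour model and degraded the audio quality as much as possible using the MP3 format, to prevent malicious actors from using it for potential crimes. We have also trained a detection model internally and plan to release it in the future.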
22
+
23
+ ---
24
+ ## Usage
25
+
26
+ <h4>Basic usage</h4>
27
+
28
+ ```python
29
+ import ChatTTS
30
+ from IPython.display import Audio
31
+
32
+ chat = ChatTTS.Chat()
33
+ chat.load_models()
34
+
35
+ texts = ["<PUT YOUR TEXT HERE>",]
36
+
37
+ wavs = chat.infer(texts, use_decoder=True)
38
+ Audio(wavs[0], rate=24_000, autoplay=True)
39
+ ```
40
+
41
+ <h4>Advanced usage</h4>
42
+
43
+ ```python
44
+ ###################################
45
+ # Sample a speaker from Gaussian.
46
+ import torch
47
+ std, mean = torch.load('ChatTTS/asset/spk_stat.pt').chunk(2)
48
+ rand_spk = torch.randn(768) * std + mean
49
+
50
+ params_infer_code = {
51
+ 'spk_emb': rand_spk, # add sampled speaker
52
+ 'temperature': .3, # using custom temperature
53
+ 'top_P': 0.7, # top P decode
54
+ 'top_K': 20, # top K decode
55
+ }
56
+
57
+ ###################################
58
+ # For sentence level manual control.
59
+
60
+ # use oral_(0-9), laugh_(0-2), break_(0-7)
61
+ # to generate special token in text to synthesize.
62
+ params_refine_text = {
63
+ 'prompt': '[oral_2][laugh_0][break_6]'
64
+ }
65
+
66
+ wav = chat.infer("<PUT YOUR TEXT HERE>", params_refine_text=params_refine_text, params_infer_code=params_infer_code)
67
+
68
+ ###################################
69
+ # For word level manual control.
70
+ # use_decoder=False to infer faster with a bit worse quality
71
+ text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
72
+ wav = chat.infer(text, skip_refine_text=True, params_infer_code=params_infer_code, use_decoder=False)
73
+
74
+ ```
75
+
76
+ <details open>
77
+ <summary><h4>Example: self introduction</h4></summary>
78
+
79
+ ```python
80
+ inputs_cn = """
81
+ chat T T S 是一款强大的对话式文本转语音模型。它有中英混读和多说话人的能力。
82
+ chat T T S 不仅能够生成自然流畅的语音,还能控制[laugh]笑声啊[laugh],
83
+ 停顿啊[uv_break]语气词啊等副语言现象[uv_break]。这个韵律超越了许多开源模型[uv_break]。
84
+ 请注意,chat T T S 的使用应遵守法律和伦理准则,避免滥用的安全风险。[uv_break]'
85
+ """.replace('\n', '')
86
+
87
+ params_refine_text = {
88
+ 'prompt': '[oral_2][laugh_0][break_4]'
89
+ }
90
+ audio_array_cn = chat.infer(inputs_cn, params_refine_text=params_refine_text)
91
+ # audio_array_en = chat.infer(inputs_en, params_refine_text=params_refine_text)  # inputs_en is not defined in this snippet; define it before uncommenting
92
+ ```
93
+ [male speaker](https://github.com/2noise/ChatTTS/assets/130631963/bbfa3b83-2b67-4bb6-9315-64c992b63788)
94
+
95
+ [female speaker](https://github.com/2noise/ChatTTS/assets/130631963/e061f230-0e05-45e6-8e4e-0189f2d260c4)
96
+ </details>
97
+
98
+
99
+ ---
100
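+ ## Roadmap
101
+ - [x] Open-source the 40k-hour base model and spk_stats file
102
+ - [ ] Open-source the VQ encoder and Lora training code
103
+ - [ ] Streaming audio generation without refining the text*
104
+ - [ ] Open-source the 40k-hour version with multi-emotion control
105
+ - [ ] ChatTTS.cpp maybe? (Community PRs or an independent new repo are welcome)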
106
+
107
+ ---
108
+ ## FAQ
109
+
110
+ ##### Cannot connect to HuggingFace
111
+ Use the [modelscope](https://www.modelscope.cn/models/pzc163/chatTTS) version and set the cache location:
112
+ ```python
113
+
114
+ ```
115
+
116
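+ ##### How much VRAM do I need? How fast is inference?
117
+ For 30 s of audio, at least 4 GB of VRAM is required. On a 4090D, audio corresponding to about 7 characters is generated per second. The RTF is about 0.65.
118
+
119
+ ##### Model stability does not seem good enough: other speakers appear or audio quality is poor.
120
+ This is a problem that autoregressive models commonly have. The speaker may change midway, and a sample may come out with very poor audio quality; this is generally hard to avoid. Sampling several times can help find a suitable result.
121
+
122
+ ##### Besides laughter, can anything else be controlled? Can other emotions be controlled?
123
+ In the currently released model, the only character-level control units are [laugh], [uv_break], and [lbreak]. In future versions we may open-source versions with additional emotional control.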
124
+
125
+ ---
126
+ ## Acknowledgements
127
+ - [bark](https://github.com/suno-ai/bark), [XTTSv2](https://github.com/coqui-ai/TTS) and [valle](https://arxiv.org/abs/2301.02111) showed that autoregressive modeling is viable for TTS.
128
+ - [fish-speech](https://github.com/fishaudio/fish-speech), an excellent autoregressive TTS model, revealed the potential of GVQ for LLM tasks.
129
+ - [vocos](https://github.com/gemelo-ai/vocos) serves as the vocoder in the model.
130
+
131
+ ---
132
+ ## Special Appreciation
133
+ - [wlu-audio lab](https://audio.westlake.edu.cn/) supported our early algorithm experiments.
abc/infer.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
abc/requirements.txt ADDED
@@ -0,0 +1,7 @@
1
+ omegaconf~=2.3.0
2
+ torch~=2.0
3
+ tqdm
4
+ einops
5
+ vector_quantize_pytorch
6
+ transformers~=4.41.1
7
+ vocos
api.py ADDED
@@ -0,0 +1,361 @@
1
+ import os
2
+ import sys
3
+ sys.path.insert(0, os.getcwd())
4
+ import ChatTTS
5
+ import re
6
+ import time
7
+ import io
8
+ from io import BytesIO
9
+ import pandas
10
+ import numpy as np
11
+ from tqdm import tqdm
12
+ import random
13
+ import subprocess  # required by pack_aac (replaces a duplicate "import os")
14
+ import json
15
+ from utils import batch_split,normalize_zh
16
+ import torch
17
+ import soundfile as sf
18
+ import wave
19
+
20
+ from fastapi import FastAPI, Request, HTTPException, Response
21
+ from fastapi.responses import StreamingResponse, JSONResponse
22
+
23
+ from starlette.middleware.cors import CORSMiddleware # import the CORS middleware
24
+
25
+ # Origins that are allowed to access the API
26
+ origins = ["*"] # "*" allows all origins
27
+
28
+ from pydantic import BaseModel
29
+
30
+ import uvicorn
31
+
32
+
33
+ from typing import Generator
34
+
35
+
36
+
37
+ chat = ChatTTS.Chat()
38
+ def clear_cuda_cache():
39
+ """
40
+ Clear CUDA cache
41
+ :return:
42
+ """
43
+ torch.cuda.empty_cache()
44
+
45
+
46
+ def deterministic(seed=0):
47
+ """
48
+ Set random seed for reproducibility
49
+ :param seed:
50
+ :return:
51
+ """
52
+ # ref: https://github.com/Jackiexiao/ChatTTS-api-ui-docker/blob/main/api.py#L27
53
+ torch.manual_seed(seed)
54
+ np.random.seed(seed)
55
+ torch.cuda.manual_seed(seed)
56
+ torch.backends.cudnn.deterministic = True
57
+ torch.backends.cudnn.benchmark = False
58
+
59
+
60
+ class TTS_Request(BaseModel):
61
+ text: str = None
62
+ seed: int = 2581
63
+ speed: int = 3
64
+ media_type: str = "wav"
65
+ streaming: int = 0
66
+
67
+
68
+
69
+
70
+
71
+
72
+ app = FastAPI()
73
+
74
+ app.add_middleware(
75
+ CORSMiddleware,
76
+ allow_origins=origins, # allowed origins
77
+ allow_credentials=True,
78
+ allow_methods=["*"], # HTTP methods allowed for cross-origin requests, e.g. GET, POST, PUT
79
+ allow_headers=["*"]) # headers allowed for cross-origin requests; can be used to identify the origin
80
+
81
+
82
+ def cut5(inp):
83
+ # if not re.search(r'[^\w\s]', inp[-1]):
84
+ # inp += '。'
85
+ inp = inp.strip("\n")
86
+ punds = r'[,.;?!、,。?!;:…]'
87
+ items = re.split(f'({punds})', inp)
88
+ mergeitems = ["".join(group) for group in zip(items[::2], items[1::2])]
89
+ # Keep the trailing text when the sentence has no punctuation or does not end with punctuation
90
+ if len(items)%2 == 1:
91
+ mergeitems.append(items[-1])
92
+ # opt = "\n".join(mergeitems)
93
+ return mergeitems
94
+
95
+ # from https://huggingface.co/spaces/coqui/voice-chat-with-mistral/blob/main/app.py
96
+ def wave_header_chunk(frame_input=b"", channels=1, sample_width=2, sample_rate=24000):
97
+ # This will create a wave header then append the frame input
98
+ # It should be first on a streaming wav file
99
+ # Other frames should not include it (otherwise you will hear artifacts at the start of each chunk)
100
+ wav_buf = BytesIO()
101
+ with wave.open(wav_buf, "wb") as vfout:
102
+ vfout.setnchannels(channels)
103
+ vfout.setsampwidth(sample_width)
104
+ vfout.setframerate(sample_rate)
105
+ vfout.writeframes(frame_input)
106
+
107
+ wav_buf.seek(0)
108
+ return wav_buf.read()
109
+
110
+
111
+
112
+ ### modify from https://github.com/RVC-Boss/GPT-SoVITS/pull/894/files
113
+ def pack_ogg(io_buffer:BytesIO, data:np.ndarray, rate:int):
114
+
115
+ with sf.SoundFile(io_buffer, mode='w',samplerate=rate, channels=1, format='ogg') as audio_file:
116
+ audio_file.write(data)
117
+ return io_buffer
118
+
119
+
120
+ def pack_raw(io_buffer:BytesIO, data:np.ndarray, rate:int):
121
+ io_buffer.write(data.tobytes())
122
+ return io_buffer
123
+
124
+
125
+ def pack_wav(io_buffer:BytesIO, data:np.ndarray, rate:int):
126
+ io_buffer = BytesIO()
127
+ sf.write(io_buffer, data, rate, format='wav')
128
+ return io_buffer
129
+
130
+
131
+ def pack_aac(io_buffer:BytesIO, data:np.ndarray, rate:int):
132
+ process = subprocess.Popen([
133
+ 'ffmpeg',
134
+ '-f', 's16le', # input: 16-bit signed little-endian PCM
135
+ '-ar', str(rate), # sample rate
136
+ '-ac', '1', # mono
137
+ '-i', 'pipe:0', # read input from the pipe
138
+ '-c:a', 'aac', # audio encoder: AAC
139
+ '-b:a', '192k', # bitrate
140
+ '-vn', # no video
141
+ '-f', 'adts', # output as an AAC ADTS stream
142
+ 'pipe:1' # write output to the pipe
143
+ ], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
144
+ out, _ = process.communicate(input=data.tobytes())
145
+ io_buffer.write(out)
146
+ return io_buffer
147
+
148
+ def pack_audio(io_buffer:BytesIO, data:np.ndarray, rate:int, media_type:str):
149
+
150
+ if media_type == "ogg":
151
+ io_buffer = pack_ogg(io_buffer, data, rate)
152
+ elif media_type == "aac":
153
+ io_buffer = pack_aac(io_buffer, data, rate)
154
+ elif media_type == "wav":
155
+ io_buffer = pack_wav(io_buffer, data, rate)
156
+ else:
157
+ io_buffer = pack_raw(io_buffer, data, rate)
158
+ io_buffer.seek(0)
159
+ return io_buffer
160
+
161
+
162
+ def generate_tts_audio(text_file,seed=2581,speed=1, oral=0, laugh=0, bk=4, min_length=80, batch_size=5, temperature=0.01, top_P=0.7,
163
+ top_K=20,streaming=0,cur_tqdm=None):
164
+
165
+ from utils import combine_audio, save_audio, batch_split
166
+
167
+ from utils import split_text, replace_tokens, restore_tokens
168
+
169
+
170
+ if seed in [0, -1, None]:
171
+ seed = random.randint(1, 9999)
172
+
173
+
174
+ content = text_file
175
+ # texts = split_text(content, min_length=min_length)
176
+
177
+
178
+ # if oral < 0 or oral > 9 or laugh < 0 or laugh > 2 or bk < 0 or bk > 7:
179
+ # raise ValueError("oral_(0-9), laugh_(0-2), break_(0-7) out of range")
180
+
181
+ # refine_text_prompt = f"[oral_{oral}][laugh_{laugh}][break_{bk}]"
182
+
183
+ # Replace [uv_break] and [laugh] with _uv_break_ and _laugh_ for processing, then restore them afterwards
184
+ content = replace_tokens(content)
185
+ texts = split_text(content, min_length=min_length)
186
+ for i, text in enumerate(texts):
187
+ texts[i] = restore_tokens(text)
188
+
189
+ if oral < 0 or oral > 9 or laugh < 0 or laugh > 2 or bk < 0 or bk > 7:
190
+ raise ValueError("oral_(0-9), laugh_(0-2), break_(0-7) out of range")
191
+
192
+ refine_text_prompt = f"[oral_{oral}][laugh_{laugh}][break_{bk}]"
193
+
194
+
195
+ deterministic(seed)
196
+ rnd_spk_emb = chat.sample_random_speaker()
197
+ params_infer_code = {
198
+ 'spk_emb': rnd_spk_emb,
199
+ 'prompt': f'[speed_{speed}]',
200
+ 'top_P': top_P,
201
+ 'top_K': top_K,
202
+ 'temperature': temperature
203
+ }
204
+ params_refine_text = {
205
+ 'prompt': refine_text_prompt,
206
+ 'top_P': top_P,
207
+ 'top_K': top_K,
208
+ 'temperature': temperature
209
+ }
210
+
211
+
212
+
213
+ if not cur_tqdm:
214
+ cur_tqdm = tqdm
215
+
216
+ start_time = time.time()
217
+
218
+ if not streaming:
219
+
220
+ all_wavs = []
221
+
222
+
223
+ for batch in cur_tqdm(batch_split(texts, batch_size), desc=f"Inferring audio for seed={seed}"):
224
+
225
+ print(batch)
226
+ wavs = chat.infer(batch, params_infer_code=params_infer_code, params_refine_text=params_refine_text,use_decoder=True, skip_refine_text=True)
227
+ audio_data = wavs[0][0]
228
+ audio_data = audio_data / np.max(np.abs(audio_data))
229
+
230
+
231
+ all_wavs.append(audio_data)
232
+
233
+ # all_wavs.extend(wavs)
234
+
235
+ clear_cuda_cache()
236
+
237
+
238
+
239
+ audio = (np.concatenate(all_wavs) * 32768).astype(
240
+ np.int16
241
+ )
242
+
243
+ # end_time = time.time()
244
+ # elapsed_time = end_time - start_time
245
+ # print(f"Saving audio for seed {seed}, took {elapsed_time:.2f}s")
246
+
247
+ yield audio
248
+
249
+
250
+ else:
251
+
252
+ print("Streaming generation")
253
+
254
+ texts = [normalize_zh(_) for _ in content.split('\n') if _.strip()]
255
+
256
+
257
+ for text in texts:
258
+
259
+ wavs_gen = chat.infer(text, params_infer_code=params_infer_code, params_refine_text=params_refine_text,use_decoder=True, skip_refine_text=True,stream=True)
260
+
261
+ for gen in wavs_gen:
262
+ wavs = [np.array([[]])]
263
+ wavs[0] = np.hstack([wavs[0], np.array(gen[0])])
264
+ audio_data = wavs[0][0]
265
+
266
+ audio_data = audio_data / np.max(np.abs(audio_data))
267
+
268
+
269
+
270
+ yield (audio_data * 32767).astype(np.int16)
271
+
272
+ # clear_cuda_cache()
273
+
274
+
275
+
276
+
277
+
278
+ async def tts_handle(req:dict):
279
+
280
+ media_type = req["media_type"]
281
+
282
+ print(req["streaming"])
283
+ print(req["media_type"])
284
+
285
+ if not req["streaming"]:
286
+
287
+ audio_data = next(generate_tts_audio(req["text"],req["seed"]))
288
+
289
+ # print(audio_data)
290
+
291
+ sr = 24000
292
+
293
+ audio_data = pack_audio(BytesIO(), audio_data, sr, media_type).getvalue()
294
+
295
+
296
+ return Response(audio_data, media_type=f"audio/{media_type}")
297
+
298
+
299
+ # return FileResponse(f"./{audio_data}", media_type="audio/wav")
300
+
301
+ else:
302
+
303
+ tts_generator = generate_tts_audio(req["text"],req["seed"],streaming=1)
304
+
305
+ sr = 24000
306
+
307
+ def streaming_generator(tts_generator:Generator, media_type:str):
308
+ if media_type == "wav":
309
+ yield wave_header_chunk()
310
+ media_type = "raw"
311
+ for chunk in tts_generator:
312
+ print(chunk)
313
+ yield pack_audio(BytesIO(), chunk, sr, media_type).getvalue()
314
+
315
+ return StreamingResponse(streaming_generator(tts_generator, media_type), media_type=f"audio/{media_type}")
316
+
317
+
318
+
319
+ @app.get("/")
320
+ async def tts_get(text: str = None,media_type:str = "wav",seed:int = 2581,streaming:int = 0):
321
+ req = {
322
+ "text": text,
323
+ "media_type": media_type,
324
+ "seed": seed,
325
+ "streaming": streaming,
326
+ }
327
+ return await tts_handle(req)
328
+
329
+
330
+ @app.get("/speakers")
331
+ def speakers_endpoint():
332
+ return JSONResponse([{"name":"default","vid":1}], status_code=200)
333
+
334
+
335
+ @app.get("/speakers_list")
336
+ def speakerlist_endpoint():
337
+ return JSONResponse(["female_calm","female","male"], status_code=200)
338
+
339
+
340
+ @app.post("/")
341
+ async def tts_post_endpoint(request: TTS_Request):
342
+ req = request.dict()
343
+ return await tts_handle(req)
344
+
345
+
346
+ @app.post("/tts_to_audio/")
347
+ async def tts_to_audio(request: TTS_Request):
348
+ req = request.dict()
349
+ from config import llama_seed
350
+
351
+ req["seed"] = llama_seed
352
+
353
+ return await tts_handle(req)
354
+
355
+ if __name__ == "__main__":
356
+
357
+ chat.load_models(source="local", local_path="models")
358
+
359
+ # chat = load_chat_tts_model(source="local", local_path="models")
360
+
361
+ uvicorn.run(app,host='0.0.0.0',port=9880,workers=1)
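
The streaming path in `api.py` sends one WAV header first and then raw 16-bit PCM for every chunk. The sketch below illustrates that idea using only the standard library; it is not the code from `api.py`, and the names `wav_header`, `float_to_pcm16` and `stream_wav` are hypothetical. It rounds samples instead of truncating them, and the guard for an all-zero chunk is an addition that the original normalization does not have.

```python
import io
import struct
import wave


def wav_header(channels: int = 1, sample_width: int = 2, sample_rate: int = 24000) -> bytes:
    """Return a PCM WAV header that declares zero audio frames."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"")  # forces the header to be written
    return buf.getvalue()


def float_to_pcm16(samples) -> bytes:
    """Peak-normalize float samples and pack them as little-endian int16."""
    peak = max((abs(s) for s in samples), default=0.0)
    scale = 32767.0 / peak if peak > 0 else 0.0  # silent chunk stays silent
    ints = [int(round(s * scale)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def stream_wav(chunks):
    """Yield the header once, then headerless PCM for each chunk of floats."""
    yield wav_header()
    for chunk in chunks:
        yield float_to_pcm16(chunk)


parts = list(stream_wav([[0.0, 1.0, -1.0], [0.0, 0.0]]))
print([len(p) for p in parts])
```

Because the header is written before any audio exists, its declared data length is zero; whether a given player keeps reading past that length is something to verify for the client in use.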
assets/shot1.png ADDED
assets/shot2.png ADDED
assets/shot3.png ADDED
chattts_webui_mix.ipynb ADDED
@@ -0,0 +1,65 @@
1
+ {
2
+ "nbformat": 4,
3
+ "nbformat_minor": 0,
4
+ "metadata": {
5
+ "colab": {
6
+ "provenance": [],
7
+ "gpuType": "T4"
8
+ },
9
+ "kernelspec": {
10
+ "name": "python3",
11
+ "display_name": "Python 3"
12
+ },
13
+ "language_info": {
14
+ "name": "python"
15
+ },
16
+ "accelerator": "GPU"
17
+ },
18
+ "cells": [
19
+ {
20
+ "cell_type": "markdown",
21
+ "source": [
22
+ "> 🌟 If you find the ChatTTS and ChatTTS_colab projects helpful, please visit the links below and give them a star! 🌟\n",
23
+ "\n",
24
+ "- [ChatTTS project](https://github.com/2noise/ChatTTS)\n",
25
+ "\n",
26
+ "- [ChatTTS_colab project](https://github.com/6drf21e/ChatTTS_colab)\n",
27
+ "\n",
28
+ "Thank you for your support!\n",
29
+ "\n",
30
+ "# How to run\n",
31
+ "\n",
32
+ "- In the menu bar, click Runtime -> Run all\n",
33
+ "- After it runs, look in the log below for a line like\n",
34
+ "\n",
35
+ " Running on public URL: https://**************.gradio.live <- this is the publicly accessible URL\n",
36
+ "\n",
37
+ "If you are prompted to restart while packages are installing, click **\"No\"**"
38
+ ],
39
+ "metadata": {
40
+ "id": "Xo3k5XsTzWK6"
41
+ }
42
+ },
43
+ {
44
+ "cell_type": "code",
45
+ "source": [
46
+ "!git clone -q https://github.com/6drf21e/ChatTTS_colab\n",
47
+ "%cd ChatTTS_colab\n",
48
+ "!git clone -q https://github.com/2noise/ChatTTS\n",
49
+ "%cd ChatTTS\n",
50
+ "!git checkout -q e6412b1\n",
51
+ "%cd ..\n",
52
+ "!mv ChatTTS abc\n",
53
+ "!mv abc/* /content/ChatTTS_colab/\n",
54
+ "!pip install -q omegaconf vocos vector_quantize_pytorch gradio cn2an pypinyin openai jieba WeTextProcessing python-dotenv\n",
55
+ "# Launch Gradio with a public URL\n",
56
+ "!python webui_mix.py --share\n"
57
+ ],
58
+ "metadata": {
59
+ "id": "hNDl-5muR77-"
60
+ },
61
+ "execution_count": null,
62
+ "outputs": []
63
+ }
64
+ ]
65
+ }
cli.py ADDED
@@ -0,0 +1,24 @@
1
+ import argparse
2
+ import os
3
+ from tts_model import load_chat_tts_model, tts
4
+ from config import DEFAULT_SPEED, DEFAULT_ORAL, DEFAULT_LAUGH, DEFAULT_BK, DEFAULT_SEG_LENGTH, DEFAULT_BATCH_SIZE
5
+
6
+ if __name__ == "__main__":
7
+ parser = argparse.ArgumentParser(description="Generate TTS audio from text file.")
8
+ parser.add_argument("--text_file", type=str, required=True, help="Path to the text file to convert.")
9
+ parser.add_argument("--seed", type=int,
10
+ help="Specific seed for generating audio. If not provided, seeds will be random.")
11
+ parser.add_argument("--speed", type=int, default=DEFAULT_SPEED, help="Speed of generated audio.")
12
+ parser.add_argument("--oral", type=int, default=DEFAULT_ORAL, help="Oral")
13
+ parser.add_argument("--laugh", type=int, default=DEFAULT_LAUGH, help="Laugh")
14
+ parser.add_argument("--bk", type=int, default=DEFAULT_BK, help="Break")
15
+ parser.add_argument("--seg", type=int, default=DEFAULT_SEG_LENGTH, help="Max len of text segments.")
16
+ parser.add_argument("--batch", type=int, default=DEFAULT_BATCH_SIZE, help="Batch size for TTS inference.")
17
+ parser.add_argument("--source", type=str, default="huggingface", help="Model source: 'huggingface' or 'local'.")
18
+ parser.add_argument("--local_path", type=str, help="Path to local model if source is 'local'.")
19
+
20
+ args = parser.parse_args()
21
+ chat = load_chat_tts_model(source=args.source, local_path=args.local_path)
22
+ # chat = None
23
+ tts(chat, args.text_file, args.seed, args.speed, args.oral, args.laugh, args.bk, args.seg,
24
+ args.batch)
config.py ADDED
@@ -0,0 +1,46 @@
1
+ # Description: Configuration file for the project
2
+ llama_seed = 2581
3
+ DEFAULT_DIR = "output"
4
+ DEFAULT_SPEED = 5
5
+ DEFAULT_ORAL = 2
6
+ DEFAULT_LAUGH = 0
7
+ DEFAULT_BK = 4
8
+ # Paragraph segmentation
9
+ DEFAULT_SEG_LENGTH = 80
10
+ DEFAULT_BATCH_SIZE = 3
11
+ # Temperature
12
+ DEFAULT_TEMPERATURE = 0.1
13
+ # top_P
14
+ DEFAULT_TOP_P = 0.7
15
+ # top_K
16
+ DEFAULT_TOP_K = 20
17
+ # LLM settings
18
+ LLM_RETRIES = 1
19
+ LLM_REQUEST_INTERVAL = 0.5
20
+ LLM_RETRY_DELAY = 1.1
21
+ LLM_MAX_TEXT_LENGTH = 2000
22
+ LLM_PROMPT = """
23
+ 角色: 你是一位专业的剧本编辑,擅长将故事文本转化为适合舞台或屏幕的剧本格式。
24
+ 技能: 剧本编辑、角色分析、文本转换、JSON格式处理。
25
+ 目标: 你需要将一个故事转换成旁白和各个角色的文本,并且希望最终的输出格式是JSON。
26
+ 限制条件: 确保转换的文本保留故事的原意,并且角色对话清晰、易于理解。
27
+ 输出格式: JSON格式(python可解析),包含旁白和各个角色的对话。
28
+ 工作流程:
29
+ - 阅读并理解原始故事文本。
30
+ - 将故事文本分解为大段的旁白和丰富角色对话。旁白应确保听众能够理解故事,包含细节、引人入胜。角色分配的 character 要符合角色身份。
31
+ - 将旁白和角色对话格式化为JSON。
32
+ 示例:
33
+ 故事文本: "在一个遥远的王国里,有一位勇敢的骑士和一位美丽的公主。有一天骑士遇到了公主。骑士说道:公主你真漂亮!。“谢谢你 亲爱的骑士先生”"
34
+ 转换后的JSON格式:
35
+ ```
36
+ [
37
+ {"txt": "在一个遥远的王国里,有一位勇敢的骑士和一位美丽的公主。有一天骑士遇到了公主。", "character": "旁白"},
38
+ {"txt": "骑士说道", "character": "旁白"},
39
+ {"txt": "公主你真漂亮!", "character": "年轻男性"},
40
+ {"txt": "谢谢你 亲爱的骑士先生", "character": "年轻女性"}
41
+ ]
42
+ ```
43
+ 注意: character 字段的值需要使用类似 "旁白"、"年轻男性"、"年轻女性" 等角色身份。如果有多个角色,可以使用 "年轻男性1"、"年轻男性2" 等。
44
+
45
+ --故事文本--
46
+ """
llm_utils.py ADDED
@@ -0,0 +1,119 @@
1
+ try:
2
+ import openai
3
+ except ImportError:
4
+ print("The 'openai' module is not installed. Please install it using 'pip install openai'.")
5
+ exit(1)
6
+ import json
7
+ import re
8
+ import time
9
+ from tqdm import tqdm
10
+ from config import LLM_RETRIES, LLM_REQUEST_INTERVAL, LLM_RETRY_DELAY, LLM_MAX_TEXT_LENGTH, LLM_PROMPT
11
+
12
+
13
+ def send_request(client, prompt, text, model):
14
+ text = remove_json_escape_characters(text)
15
+ messages = [{"role": "user", "content": f"{prompt}\n\n{text}"}]
16
+ try:
17
+ response = client.chat.completions.create(model=model, messages=messages, max_tokens=4096)
18
+ print(response)
19
+ return response.choices[0].message.content
20
+ except openai.OpenAIError as e:
21
+ print(f"OpenAI API error: {e}")
22
+ return None
23
+
24
+
25
+ def clean_text(text):
26
+ import re
27
+ if isinstance(text, str):
28
+ # Remove ASCII control characters (0-31 and 127)
29
+ text = re.sub(r'[\x00-\x1F\x7F]', '', text)
30
+ return text
31
+
32
+
33
+ def extract_json(response_text):
34
+ with open("debug.txt", "w", encoding="utf8") as f:
35
+ f.write(response_text)
36
+ pattern = re.compile(r'((\[[^\}]{3,})?\{\s*[^\}\{]{3,}?:.*\}([^\{]+\])?)', re.M | re.S)
37
+ match = re.search(pattern, response_text)
38
+ if match:
39
+ return match.group(0)
40
+ return None
41
+
42
+
43
+ def clean_and_load_json(json_string):
44
+ try:
45
+ cleaned_json_string = json_string.replace("'", '"')
46
+ cleaned_json_string = clean_text(cleaned_json_string)
47
+ # Write the cleaned text to a file for debugging
48
+ with open("debug.json", "w", encoding="utf8") as f:
49
+ f.write(cleaned_json_string)
50
+ json_obj = json.loads(cleaned_json_string)
51
+ return json_obj
52
+ except json.JSONDecodeError as e:
53
+ print(f"JSON decode error: {e}")
54
+ return None
55
+
56
+
57
+ def validate_json(json_obj, required_keys):
58
+ return isinstance(json_obj, list)
59
+ print(json_obj)
60
+ return True
61
+ if json_obj and all(key in json_obj for key in required_keys):
62
+ return True
63
+ return False
64
+
65
+
66
+ def process_text(client, prompt, text, model, required_keys):
67
+ parts = [text[i:i + LLM_MAX_TEXT_LENGTH] for i in range(0, len(text), LLM_MAX_TEXT_LENGTH)]
68
+ results = []
69
+
70
+ for part in tqdm(parts, desc="Processing text"):
71
+ for attempt in range(LLM_RETRIES + 1):
72
+ response = send_request(client, prompt, part, model)
73
+ if response:
74
+ json_string = extract_json(response)
75
+ if json_string:
76
+ json_obj = clean_and_load_json(json_string)
77
+ if validate_json(json_obj, required_keys):
78
+ results.extend(json_obj)
79
+ break
80
+ else:
81
+ print(f"Invalid JSON structure. Retrying ({attempt + 1}/{LLM_RETRIES})...")
82
+ else:
83
+ print(f"No JSON found in response. Retrying ({attempt + 1}/{LLM_RETRIES})...")
84
+ else:
85
+ print(f"API request failed. Retrying ({attempt + 1}/{LLM_RETRIES})...")
86
+ time.sleep(LLM_RETRY_DELAY)
87
+ time.sleep(LLM_REQUEST_INTERVAL)
88
+
89
+ return results
90
+
91
+
92
+ def llm_operation(api_base, api_key, model, prompt, text, required_keys):
93
+ client = openai.OpenAI(api_key=api_key, base_url=api_base)
94
+ return process_text(client, prompt, text, model, required_keys)
95
+
96
+
97
+ def remove_json_escape_characters(s):
98
+ """
99
+ Remove characters from user-submitted text that tend to break JSON validation of the LLM output
100
+ :param s:
101
+ :return:
102
+ """
103
+ # Characters to remove
104
+ escape_chars = {
105
+ '"': '',
106
+ '\\': '',
107
+ '/': '',
108
+ '\b': '',
109
+ '\f': '',
110
+ '\n': '',
111
+ '\r': '',
112
+ '\t': '',
113
+ }
114
+ escape_re = re.compile('|'.join(re.escape(key) for key in escape_chars.keys()))
115
+
116
+ def replace(match):
117
+ return escape_chars[match.group(0)]
118
+
119
+ return escape_re.sub(replace, s)
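
`extract_json` in `llm_utils.py` uses a regular expression to locate the JSON payload in the model's reply. As an alternative sketch, assuming the reply contains exactly one top-level JSON array (which is what the prompt in `config.py` asks for), one can slice from the first `[` to the last `]` and let `json.loads` do the validation. The name `extract_json_array` is hypothetical and this is not the repository's method.

```python
import json
from typing import Optional


def extract_json_array(response_text: str) -> Optional[list]:
    """Return the JSON array embedded in an LLM reply, or None if parsing fails.

    Assumes one top-level array; surrounding prose or code fences are ignored.
    """
    start = response_text.find("[")
    end = response_text.rfind("]")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, list) else None


reply = 'Here is the script:\n```\n[{"txt": "Hello", "character": "narrator"}]\n```'
print(extract_json_array(reply))
```

A limitation of this approach is that any square bracket in the text before or after the array (for example a `[laugh]` token in the surrounding prose) makes the slice invalid, in which case the function returns `None` and the caller can retry.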
requirements-macos.txt ADDED
@@ -0,0 +1,97 @@
1
+ aiofiles
2
+ altair
3
+ annotated-types
4
+ antlr4-python3-runtime
5
+ anyio
6
+ attrs
7
+ certifi
8
+ charset-normalizer
9
+ click
10
+ cn2an
11
+ contourpy
12
+ cycler
13
+ distro
14
+ dnspython
15
+ einops
16
+ einx
17
+ email_validator
18
+ encodec
19
+ fastapi
20
+ fastapi-cli
21
+ ffmpy
22
+ filelock
23
+ fonttools
24
+ frozendict
25
+ fsspec
26
+ gradio
27
+ gradio_client
28
+ h11
29
+ httpcore
30
+ httptools
31
+ httpx
32
+ huggingface-hub
33
+ idna
34
+ importlib_resources
35
+ jieba
36
+ Jinja2
37
+ jsonschema
38
+ jsonschema-specifications
39
+ kiwisolver
40
+ markdown-it-py
41
+ MarkupSafe
42
+ matplotlib
43
+ mdurl
44
+ mpmath
45
+ networkx
46
+ numpy
47
+ omegaconf
48
+ openai
49
+ orjson
50
+ packaging
51
+ pandas
52
+ pillow
53
+ proces
54
+ pydantic
55
+ pydantic_core
56
+ pydub
57
+ Pygments
58
+ pyparsing
59
+ pypinyin
60
+ python-dateutil
61
+ python-dotenv
62
+ python-multipart
63
+ pytz
64
+ PyYAML
65
+ referencing
66
+ regex
67
+ requests
68
+ rich
69
+ rpds-py
70
+ ruff
71
+ safetensors
72
+ scipy
73
+ semantic-version
74
+ shellingham
75
+ six
76
+ sniffio
77
+ socksio
78
+ starlette
79
+ sympy
80
+ tokenizers
81
+ tomlkit
82
+ toolz
83
+ torch
84
+ torchaudio
85
+ tqdm
86
+ transformers
87
+ typer
88
+ typing_extensions
89
+ tzdata
90
+ ujson
91
+ urllib3
92
+ uvicorn
93
+ uvloop
94
+ vector-quantize-pytorch
95
+ vocos
96
+ watchfiles
97
+ websockets
requirements.txt ADDED
@@ -0,0 +1,4 @@
1
+ cn2an
2
+ pypinyin
3
+ openai
4
+ WeTextProcessing
slct_voice_240605.json ADDED
The diff for this file is too large to render. See raw diff
 
tts_model.py ADDED
@@ -0,0 +1,175 @@
+ import datetime
+ import json
+ import os
+ import re
+ import time
+
+ import numpy as np
+ import torch
+ from tqdm import tqdm
+
+ import ChatTTS
+ from config import DEFAULT_TEMPERATURE, DEFAULT_TOP_P, DEFAULT_TOP_K
+
+
+ def load_chat_tts_model(source='huggingface', force_redownload=False, local_path=None):
+     """
+     Load the ChatTTS model.
+     :param source: model source, 'huggingface' or 'local'
+     :param force_redownload: download the model again even if it is already present
+     :param local_path: path to the local model, passed to ChatTTS as custom_path
+     :return: the loaded ChatTTS.Chat instance
+     """
+     print("Loading ChatTTS model...")
+     chat = ChatTTS.Chat()
+     chat.load_models(source=source, force_redownload=force_redownload, custom_path=local_path, compile=False)
+     return chat
+
+
+ def clear_cuda_cache():
+     """
+     Clear the CUDA cache.
+     :return: None
+     """
+     torch.cuda.empty_cache()
+
+
+ def deterministic(seed=0):
+     """
+     Set the random seeds for reproducibility.
+     :param seed: seed applied to torch, numpy and CUDA
+     :return: None
+     """
+     # ref: https://github.com/Jackiexiao/ChatTTS-api-ui-docker/blob/main/api.py#L27
+     torch.manual_seed(seed)
+     np.random.seed(seed)
+     torch.cuda.manual_seed(seed)
+     torch.backends.cudnn.deterministic = True
+     torch.backends.cudnn.benchmark = False
+
+
+ def generate_audio_for_seed(chat, seed, texts, batch_size, speed, refine_text_prompt, roleid=None,
+                             temperature=DEFAULT_TEMPERATURE,
+                             top_P=DEFAULT_TOP_P, top_K=DEFAULT_TOP_K, cur_tqdm=None, skip_save=False,
+                             skip_refine_text=False, speaker_type="seed", pt_file=None):
+     from utils import combine_audio, save_audio, batch_split
+     print(f"speaker_type: {speaker_type}")
+     if speaker_type == "seed":
+         if seed in [None, -1, 0, "", "random"]:
+             seed = np.random.randint(0, 9999)
+         deterministic(seed)
+         rnd_spk_emb = chat.sample_random_speaker()
+     elif speaker_type == "role":
+         # Read the preset voices from the JSON file
+         with open('./slct_voice_240605.json', 'r', encoding='utf-8') as json_file:
+             slct_idx_loaded = json.load(json_file)
+         # Convert the stored tensor data back into Tensor objects
+         for key in slct_idx_loaded:
+             tensor_list = slct_idx_loaded[key]["tensor"]
+             slct_idx_loaded[key]["tensor"] = torch.tensor(tensor_list)
+         # Put the voice tensor into params_infer_code so this voice is always used; lower the temperature
+         rnd_spk_emb = slct_idx_loaded[roleid]["tensor"]
+         # temperature = 0.001
+     elif speaker_type == "pt":
+         print(pt_file)
+         rnd_spk_emb = torch.load(pt_file)
+         print(rnd_spk_emb.shape)
+         if rnd_spk_emb.shape != (768,):
+             raise ValueError("The speaker embedding must have dimension 768.")
+     else:
+         raise ValueError(f"Invalid speaker_type: {speaker_type}. ")
+
+     params_infer_code = {
+         'spk_emb': rnd_spk_emb,
+         'prompt': f'[speed_{speed}]',
+         'top_P': top_P,
+         'top_K': top_K,
+         'temperature': temperature
+     }
+     params_refine_text = {
+         'prompt': refine_text_prompt,
+         'top_P': top_P,
+         'top_K': top_K,
+         'temperature': temperature
+     }
+     all_wavs = []
+     start_time = time.time()
+     total = len(texts)
+     flag = 0
+     if not cur_tqdm:
+         cur_tqdm = tqdm
+
+     if re.search(r'\[uv_break\]|\[laugh\]', ''.join(texts)) is not None:
+         if not skip_refine_text:
+             print("Detected [uv_break] or [laugh] in text, skipping refine_text")
+         skip_refine_text = True
+
+     for batch in cur_tqdm(batch_split(texts, batch_size), desc=f"Inferring audio for seed={seed}"):
+         flag += len(batch)
+         _params_infer_code = {**params_infer_code}
+         wavs = chat.infer(batch, params_infer_code=_params_infer_code, params_refine_text=params_refine_text,
+                           use_decoder=True, skip_refine_text=skip_refine_text)
+         all_wavs.extend(wavs)
+     clear_cuda_cache()
+     if skip_save:
+         return all_wavs
+     combined_audio = combine_audio(all_wavs)
+     end_time = time.time()
+     elapsed_time = end_time - start_time
+     print(f"Saving audio for seed {seed}, took {elapsed_time:.2f}s")
+     timestamp = datetime.datetime.now().strftime('%Y-%m-%d_%H%M%S')
+     wav_filename = f"chattts-[seed_{seed}][speed_{speed}]{refine_text_prompt}[{timestamp}].wav"
+     return save_audio(wav_filename, combined_audio)
+
+
+ def generate_refine_text(chat, seed, text, refine_text_prompt, temperature=DEFAULT_TEMPERATURE,
+                          top_P=DEFAULT_TOP_P, top_K=DEFAULT_TOP_K):
+     if seed in [None, -1, 0, "", "random"]:
+         seed = np.random.randint(0, 9999)
+
+     deterministic(seed)
+
+     params_refine_text = {
+         'prompt': refine_text_prompt,
+         'top_P': top_P,
+         'top_K': top_K,
+         'temperature': temperature
+     }
+     print('text:', text)
+     print('refine_text_prompt:', refine_text_prompt)
+     refine_text = chat.infer(text, params_refine_text=params_refine_text, refine_text_only=True, skip_refine_text=False)
+     print('refine_text:', refine_text)
+     return refine_text
+
+
+ def tts(chat, text_file, seed, speed, oral, laugh, bk, seg, batch, progres=None):
+     """
+     Text-to-Speech
+     :param chat: ChatTTS model
+     :param text_file: Text file or string
+     :param seed: Seed
+     :param speed: Speed
+     :param oral: Oral level (0-9)
+     :param laugh: Laugh level (0-2)
+     :param bk: Break level (0-7)
+     :param seg: Minimum segment length passed to split_text
+     :param batch: Batch size
+     :param progres: Unused
+     :return: Path of the saved audio file
+     """
+     from utils import read_long_text, split_text
+
+     if os.path.isfile(text_file):
+         content = read_long_text(text_file)
+     elif isinstance(text_file, str):
+         content = text_file
+     else:
+         raise ValueError("text_file must be a file path or a string")
+     texts = split_text(content, min_length=seg)
+
+     print(texts)
+     # exit()
+
+     if oral < 0 or oral > 9 or laugh < 0 or laugh > 2 or bk < 0 or bk > 7:
+         raise ValueError("oral_(0-9), laugh_(0-2), break_(0-7) out of range")
+
+     refine_text_prompt = f"[oral_{oral}][laugh_{laugh}][break_{bk}]"
+     return generate_audio_for_seed(chat, seed, texts, batch, speed, refine_text_prompt)
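
`tts()` validates the `oral`, `laugh` and `break` levels and formats them into the refine-text prompt, and `webui_mix.py` repeats the same check. The following is a minimal sketch of how that step could be factored into one helper. The name `build_refine_prompt` is hypothetical; the ranges and the prompt format are taken from the code above.

```python
def build_refine_prompt(oral, laugh, bk):
    """Validate the three levels and return the refine-text prompt string."""
    # Upper bounds as checked in tts(): oral 0-9, laugh 0-2, break 0-7.
    limits = {"oral": (oral, 9), "laugh": (laugh, 2), "break": (bk, 7)}
    for name, (value, upper) in limits.items():
        if not 0 <= value <= upper:
            raise ValueError(f"{name}_{value} out of range (0-{upper})")
    return f"[oral_{oral}][laugh_{laugh}][break_{bk}]"


if __name__ == "__main__":
    print(build_refine_prompt(2, 0, 4))
```

Unlike the original single check, this variant names the offending parameter in the error message.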
utils.py ADDED
@@ -0,0 +1,333 @@
+ try:
+     import cn2an
+ except ImportError:
+     print("The 'cn2an' module is not installed. Please install it using 'pip install cn2an'.")
+     raise SystemExit(1)
+
+ try:
+     import jieba
+ except ImportError:
+     print("The 'jieba' module is not installed. Please install it using 'pip install jieba'.")
+     raise SystemExit(1)
+
+ import re
+ import numpy as np
+ import wave
+ import jieba.posseg as pseg
+
+
+ def save_audio(file_name, audio, rate=24000):
+     """
+     Save audio to a WAV file.
+     :param file_name: output file name, created inside DEFAULT_DIR
+     :param audio: float audio array with values in [-1, 1]
+     :param rate: sample rate in Hz
+     :return: full path of the written file
+     """
+     import os
+     from config import DEFAULT_DIR
+     audio = (audio * 32767).astype(np.int16)
+
+     # Make sure the default directory exists
+     if not os.path.exists(DEFAULT_DIR):
+         os.makedirs(DEFAULT_DIR)
+     full_path = os.path.join(DEFAULT_DIR, file_name)
+     with wave.open(full_path, "w") as wf:
+         wf.setnchannels(1)
+         wf.setsampwidth(2)
+         wf.setframerate(rate)
+         wf.writeframes(audio.tobytes())
+     return full_path
+
+
+ def combine_audio(wavs):
+     """
+     Combine several audio segments into one.
+     :param wavs: list of audio arrays
+     :return: combined, normalized audio array
+     """
+     wavs = [normalize_audio(w) for w in wavs]  # Normalize each segment first
+     combined_audio = np.concatenate(wavs, axis=1)  # Concatenate along the time axis
+     return normalize_audio(combined_audio)  # Normalize again after combining
+
+
+ def normalize_audio(audio):
+     """
+     Normalize audio array to be between -1 and 1
+     :param audio: Input audio array
+     :return: Normalized audio array
+     """
+     audio = np.clip(audio, -1, 1)
+     max_val = np.max(np.abs(audio))
+     if max_val > 0:
+         audio = audio / max_val
+     return audio
+
+
+ def combine_audio_with_crossfade(audio_arrays, crossfade_duration=0.1, rate=24000):
+     """
+     Combine audio arrays with crossfade to avoid clipping noise at the junctions.
+     :param audio_arrays: List of audio arrays to combine
+     :param crossfade_duration: Duration of the crossfade in seconds
+     :param rate: Sample rate of the audio
+     :return: Combined audio array
+     """
+     crossfade_samples = int(crossfade_duration * rate)
+     combined_audio = np.array([], dtype=np.float32)
+
+     for i, audio in enumerate(audio_arrays):
+         audio = np.squeeze(audio)  # Ensure all arrays are 1D
+         if i == 0:
+             # Start with a copy of the first array so the caller's data is not modified in place
+             combined_audio = audio.copy()
+             continue
+         # The overlap cannot be longer than either segment
+         overlap = min(len(combined_audio), len(audio), crossfade_samples)
+         if overlap > 0:
+             # Crossfade by linearly blending the end of the combined audio with the start of the next array
+             t = np.linspace(0, 1, overlap)
+             combined_audio[-overlap:] = combined_audio[-overlap:] * (1 - t) + audio[:overlap] * t
+         # Append the rest of the new array
+         combined_audio = np.concatenate((combined_audio, audio[overlap:]))
+
+     return combined_audio
+
+
+ def remove_chinese_punctuation(text):
+     """
+     Replace the Chinese punctuation marks matched by the pattern below with a comma.
+     :param text: input text
+     :return: text with punctuation replaced
+     """
+     # The hyphen after 》 is escaped; unescaped, 》-‘ is read as a reversed character range and re raises an error
+     chinese_punctuation_pattern = r"[:;!(),【】『』「」《》\-‘“’”:,;!\(\)\[\]><\-·]"
+     text = re.sub(chinese_punctuation_pattern, ',', text)
+     # Collapse runs of consecutive periods and commas into a single period
+     text = re.sub(r'[。,]{2,}', '。', text)
+     # Strip a leading or trailing comma
+     text = re.sub(r'^,|,$', '', text)
+     return text
+
+ def remove_english_punctuation(text):
+     """
+     Replace the punctuation marks matched by the pattern below with a comma (English text variant).
+     :param text: input text
+     :return: text with punctuation replaced
+     """
+     # The hyphen after 》 is escaped; unescaped, 》-‘ is read as a reversed character range and re raises an error
+     chinese_punctuation_pattern = r"[:;!(),【】『』「」《》\-‘“’”:,;!\(\)\[\]><\-·]"
+     text = re.sub(chinese_punctuation_pattern, ',', text)
+     # Collapse runs of consecutive commas and periods into a single period
+     text = re.sub(r'[,\.]{2,}', '.', text)
+     # Strip a leading or trailing comma
+     text = re.sub(r'^,|,$', '', text)
+     return text
+
+
+ def text_normalize(text):
+     """
+     Normalize the text (PaddlePaddle version).
+     :param text: input text
+     :return: normalized text
+     """
+     from zh_normalization import TextNormalizer
+     # ref: https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization
+     tx = TextNormalizer()
+     sentences = tx.normalize(text)
+     _txt = ''.join(sentences)
+     return _txt
+
+
+ def convert_numbers_to_chinese(text):
+     """
+     Convert the numbers in the text to Chinese numerals, e.g. 123 -> 一百二十三.
+     :param text: input text
+     :return: text with numbers converted
+     """
+     return cn2an.transform(text, "an2cn")
+
+
+ def detect_language(sentence):
+     # ref: https://github.com/2noise/ChatTTS/blob/main/ChatTTS/utils/infer_utils.py#L55
+     chinese_char_pattern = re.compile(r'[\u4e00-\u9fff]')
+     english_word_pattern = re.compile(r'\b[A-Za-z]+\b')
+
+     chinese_chars = chinese_char_pattern.findall(sentence)
+     english_words = english_word_pattern.findall(sentence)
+
+     if len(chinese_chars) > len(english_words):
+         return "zh"
+     else:
+         return "en"
+
+
+ def split_text(text, min_length=60):
+     """
+     Split the text into segments that are at least min_length characters long where possible.
+     :param text: input text
+     :param min_length: minimum segment length
+     :return: list of text segments
+     """
+     # Sentence delimiters
+     sentence_delimiters = re.compile(r'([。?!\.]+)')
+     # Runs of consecutive line breaks mark paragraph boundaries and force a split
+     paragraph_delimiters = re.compile(r'(\s*\n\s*)+')
+
+     paragraphs = re.split(paragraph_delimiters, text)
+
+     result = []
+
+     for paragraph in paragraphs:
+         if not paragraph.strip():
+             continue  # Skip empty paragraphs
+         # Paragraphs shorter than the threshold become their own segment
+         if len(paragraph.strip()) < min_length:
+             result.append(paragraph.strip())
+             continue
+         # Longer paragraphs are split further
+         sentences = re.split(sentence_delimiters, paragraph)
+         current_sentence = ''
+         for sentence in sentences:
+             if re.match(sentence_delimiters, sentence):
+                 current_sentence += sentence.strip() + ''
+                 if len(current_sentence) >= min_length:
+                     result.append(current_sentence.strip())
+                     current_sentence = ''
+             else:
+                 current_sentence += sentence.strip()
+
+         if current_sentence:
+             if len(current_sentence) < min_length and len(result) > 0:
+                 result[-1] += current_sentence
+             else:
+                 result.append(current_sentence)
+     if detect_language(text[:1024]) == "zh":
+         result = [normalize_zh(_.strip()) for _ in result if _.strip()]
+     else:
+         result = [normalize_en(_.strip()) for _ in result if _.strip()]
+     return result
+
+
+ def normalize_en(text):
+     # Text is no longer normalized outside ChatTTS
+     # from tn.english.normalizer import Normalizer
+     # normalizer = Normalizer()
+     # text = normalizer.normalize(text)
+     # text = remove_english_punctuation(text)
+     return text
+
+
+ def normalize_zh(text):
+     # Text is no longer normalized outside ChatTTS
+     # from tn.chinese.normalizer import Normalizer
+     # normalizer = Normalizer()
+     # text = normalizer.normalize(text)
+     # text = remove_chinese_punctuation(text)
+     text = process_ddd(text)
+     return text
+
+
+ def batch_split(items, batch_size=5):
+     """
+     Split items into batches of size batch_size.
+     :param items: list of items
+     :param batch_size: number of items per batch
+     :return: list of batches
+     """
+     return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
+
+
+ # Read a txt file, detecting the file encoding automatically
+ def read_long_text(file_path):
+     """
+     Read a long text file, trying several encodings in turn.
+     :param file_path: file path
+     :return: text content
+     """
+     # Note: iso-8859-1 accepts any byte sequence, so utf-16 is never reached
+     encodings = ['utf-8', 'gbk', 'iso-8859-1', 'utf-16']
+
+     for encoding in encodings:
+         try:
+             with open(file_path, 'r', encoding=encoding) as file:
+                 return file.read()
+         except (UnicodeDecodeError, LookupError):
+             continue
+
+     raise ValueError("Unable to detect the file encoding")
+
+
+ def replace_tokens(text):
+     # Drop [UNK] tokens entirely
+     remove_tokens = ['UNK']
+     for token in remove_tokens:
+         text = re.sub(r'\[' + re.escape(token) + r'\]', '', text)
+
+     # Wrap control tokens in 'uu' markers so they survive splitting and normalization
+     tokens = ['uv_break', 'laugh','lbreak']
+     for token in tokens:
+         text = re.sub(r'\[' + re.escape(token) + r'\]', f'uu{token}uu', text)
+     # Note: this removes every underscore in the text, not only those inside tokens
+     text = text.replace('_', '')
+     return text
+
+
+ def restore_tokens(text):
+     # Turn the 'uu' markers back into bracketed control tokens
+     tokens = ['uvbreak', 'laugh', 'UNK', 'lbreak']
+     for token in tokens:
+         text = re.sub(r'uu' + re.escape(token) + r'uu', f'[{token}]', text)
+     text = text.replace('[uvbreak]', '[uv_break]')
+     return text
+
+
+ def process_ddd(text):
+     """
+     Handle the particles "地" and "得" by replacing both with "的".
+     Rationale: "地" and "得" mainly appear before or after verbs and adjectives. This method does not
+     follow the grammar strictly, because the particles are often misused in practice.
+     The result also depends on the accuracy of jieba's word segmentation, so some occurrences may be
+     missed, e.g. 小红帽疑惑地问
+     :param text: input text
+     :return: processed text
+     """
+     word_list = [(word, flag) for word, flag in pseg.cut(text, use_paddle=False)]
+     # print(word_list)
+     processed_words = []
+     for i, (word, flag) in enumerate(word_list):
+         if word in ["地", "得"]:
+             # Check previous and next word's flag
+             # prev_flag = word_list[i - 1][1] if i > 0 else None
+             # next_flag = word_list[i + 1][1] if i + 1 < len(word_list) else None
+
+             # if prev_flag in ['v', 'a'] or next_flag in ['v', 'a']:
+             if flag in ['uv', 'ud']:
+                 processed_words.append("的")
+             else:
+                 processed_words.append(word)
+         else:
+             processed_words.append(word)
+
+     return ''.join(processed_words)
+
+
+ def replace_space_between_chinese(text):
+     return re.sub(r'(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])', '', text)
+
+
+ if __name__ == '__main__':
+     # txts = [
+     #     "快速地跑过红色的大门",
+     #     "笑得很开心,学得很好",
+     #     "小红帽疑惑地问?",
+     #     "大灰狼慌张地回答",
+     #     "哦,这是为了更好地听你说话。",
+     #     "大灰狼不耐烦地说:“为了更好地抱你。”",
+     #     "他跑得很快,工作做得非常认真,这是他努力地结果。得到",
+     # ]
+     # for txt in txts:
+     #     print(txt, '-->', process_ddd(txt))
+
+     txts = [
+         "电影中梁朝伟扮演的陈永仁的编号27149",
+         "这块黄金重达324.75克 我们班的最高总分为583分",
+         "12~23 -1.5~2",
+         "居维埃·拉色别德①、杜梅里②、卡特法日③,"
+
+     ]
+     for txt in txts:
+         print(txt, '-->', text_normalize(txt))
+         # print(txt, '-->', convert_numbers_to_chinese(txt))
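
`replace_tokens` and `restore_tokens` protect control tokens such as `[uv_break]` while the text is split. The sketch below copies the two functions from `utils.py` above so the round trip can be run without `cn2an`, `jieba` or `numpy`; only the demo at the bottom is new. Two side effects are visible in the code: `[UNK]` is dropped, and every underscore in the text is removed.

```python
import re


def replace_tokens(text):
    # Drop [UNK] tokens entirely.
    for token in ['UNK']:
        text = re.sub(r'\[' + re.escape(token) + r'\]', '', text)
    # Wrap control tokens in 'uu' markers.
    for token in ['uv_break', 'laugh', 'lbreak']:
        text = re.sub(r'\[' + re.escape(token) + r'\]', f'uu{token}uu', text)
    # Removes all underscores, including those outside tokens.
    return text.replace('_', '')


def restore_tokens(text):
    # Turn the 'uu' markers back into bracketed tokens.
    for token in ['uvbreak', 'laugh', 'UNK', 'lbreak']:
        text = re.sub(r'uu' + re.escape(token) + r'uu', f'[{token}]', text)
    return text.replace('[uvbreak]', '[uv_break]')


if __name__ == "__main__":
    original = "a[uv_break]b[laugh]c[UNK]"
    protected = replace_tokens(original)
    print(protected)
    print(restore_tokens(protected))
```

Because the markers are plain letters, ordinary text that happens to contain a sequence like `uulaughuu` would also be turned into a token on restore; the scheme relies on such sequences not occurring in normal input.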
webui_mix.py ADDED
@@ -0,0 +1,1036 @@
+ import os
+ import sys
+
+ sys.path.insert(0, os.getcwd())
+ import argparse
+ import re
+ import time
+
+ import pandas
+ import numpy as np
+ from tqdm import tqdm
+ import random
+ import gradio as gr
+ import json
+ from utils import normalize_zh, batch_split, normalize_audio, combine_audio
+ from tts_model import load_chat_tts_model, clear_cuda_cache, generate_audio_for_seed
+ from config import DEFAULT_BATCH_SIZE, DEFAULT_SPEED, DEFAULT_TEMPERATURE, DEFAULT_TOP_K, DEFAULT_TOP_P, DEFAULT_ORAL, \
+     DEFAULT_LAUGH, DEFAULT_BK, DEFAULT_SEG_LENGTH
+ import torch
+
+ parser = argparse.ArgumentParser(description="Gradio ChatTTS MIX")
+ parser.add_argument("--source", type=str, default="huggingface", help="Model source: 'huggingface' or 'local'.")
+ parser.add_argument("--local_path", type=str, help="Path to local model if source is 'local'.")
+ parser.add_argument("--share", default=False, action="store_true", help="Share the server publicly.")
+
+ args = parser.parse_args()
+
+ # Directory for the saved audio seed files
+ SAVED_DIR = "saved_seeds"
+
+ # mkdir
+ if not os.path.exists(SAVED_DIR):
+     os.makedirs(SAVED_DIR)
+
+ # File path
+ SAVED_SEEDS_FILE = os.path.join(SAVED_DIR, "saved_seeds.json")
+
+ # Index of the selected seed
+ SELECTED_SEED_INDEX = -1
+
+ # Initialize the JSON file
+ if not os.path.exists(SAVED_SEEDS_FILE):
+     with open(SAVED_SEEDS_FILE, "w") as f:
+         f.write("[]")
+
+ chat = load_chat_tts_model(source=args.source, local_path=args.local_path)
+ # chat = None
+ # chat = load_chat_tts_model(source="local", local_path=r"models")
+
+ # Maximum number of voices generated in one draw
+ max_audio_components = 10
+
+ # Load
+ def load_seeds():
+     global saved_seeds
+     with open(SAVED_SEEDS_FILE, "r") as f:
+         seeds = json.load(f)
+
+         # Stay compatible with the old JSON format by adding the path field
+         for seed in seeds:
+             if 'path' not in seed:
+                 seed['path'] = None
+
+         saved_seeds = seeds
+     return saved_seeds
+
+
+ def display_seeds():
+     seeds = load_seeds()
+     # Convert to a List[List]
+     return [[i, s['seed'], s['name'], s['path']] for i, s in enumerate(seeds)]
+
+
+ saved_seeds = load_seeds()
+ num_seeds_default = 2
+
+
+ def save_seeds():
+     global saved_seeds
+     with open(SAVED_SEEDS_FILE, "w") as f:
+         json.dump(saved_seeds, f)
+     saved_seeds = load_seeds()
+
+
+ # Add a seed
+ def add_seed(seed, name, audio_path, save=True):
+     for s in saved_seeds:
+         if s['seed'] == seed:
+             return False
+     saved_seeds.append({
+         'seed': seed,
+         'name': name,
+         'path': audio_path
+     })
+     if save:
+         save_seeds()
+     return True
+
+
+ # Modify a seed
+ def modify_seed(seed, name, save=True):
+     for s in saved_seeds:
+         if s['seed'] == seed:
+             s['name'] = name
+             if save:
+                 save_seeds()
+             return True
+     return False
+
+
+ def delete_seed(seed, save=True):
+     for s in saved_seeds:
+         if s['seed'] == seed:
+             saved_seeds.remove(s)
+             if save:
+                 save_seeds()
+             return True
+     return False
+
+
+ def generate_seeds(num_seeds, texts, tq):
+     """
+     Generate random audio seeds and save the resulting audio.
+     :param num_seeds: number of seeds to generate
+     :param texts: test text, one segment per line
+     :param tq: tqdm-compatible progress wrapper (falls back to tqdm)
+     :return: list of (audio file path, seed) tuples
+     """
+     seeds = []
+     sample_rate = 24000
+     # Split the text by line and normalize each line
+     texts = [normalize_zh(_) for _ in texts.split('\n') if _.strip()]
+     print(texts)
+     if not tq:
+         tq = tqdm
+     for _ in tq(range(num_seeds), desc=f"随机音色生成中..."):
+         seed = np.random.randint(0, 9999)
+
+         filename = generate_audio_for_seed(chat, seed, texts, 1, 5, "[oral_2][laugh_0][break_4]", None, 0.3, 0.7, 20)
+         seeds.append((filename, seed))
+         clear_cuda_cache()
+
+     return seeds
+
+
+ # Save the selected audio seed
+ def do_save_seed(seed, audio_path):
+     print(f"Saving seed {seed} to {audio_path}")
+     seed = seed.replace('保存种子 ', '').strip()
+     if not seed:
+         return
+     add_seed(int(seed), seed, audio_path)
+     gr.Info(f"Seed {seed} has been saved.")
+
+
+ def do_save_seeds(seeds):
+     assert isinstance(seeds, pandas.DataFrame)
+
+     seeds = seeds.drop(columns=['Index'])
+
+     # Convert the DataFrame to a list of dicts with lower-case keys
+     result = [{k.lower(): v for k, v in row.items()} for row in seeds.to_dict(orient='records')]
+     print(result)
+     if result:
+         global saved_seeds
+         saved_seeds = result
+         save_seeds()
+         gr.Info(f"Seeds have been saved.")
+     return result
+
+
+ def do_delete_seed(val):
+     # Extract the index from val by matching [(\d+)]
+     index = re.search(r'\[(\d+)\]', val)
+     global saved_seeds
+     if index:
+         index = int(index.group(1))
+         seed = saved_seeds[index]['seed']
+         delete_seed(seed)
+         gr.Info(f"Seed {seed} has been deleted.")
+     return display_seeds()
+
+
+ # Play the audio of a saved seed
+ def do_play_seed(val):
+     # Extract the index from val by matching [(\d+)]
+     index = re.search(r'\[(\d+)\]', val)
+     if index:
+         index = int(index.group(1))
+         seed = saved_seeds[index]['seed']
+         audio_path = saved_seeds[index]['path']
+         if audio_path:
+             return gr.update(visible=True, value=audio_path)
+     return gr.update(visible=False, value=None)
+
+
+ def seed_change_btn():
+     global SELECTED_SEED_INDEX
+     if SELECTED_SEED_INDEX == -1:
+         return ['删除', '试听']
+     return [f'删除 idx=[{SELECTED_SEED_INDEX[0]}]', f'试听 idx=[{SELECTED_SEED_INDEX[0]}]']
+
+
+ def audio_interface(num_seeds, texts, progress=gr.Progress()):
+     """
+     Generate the audio samples.
+     :param num_seeds: number of seeds to generate
+     :param texts: test text
+     :param progress: Gradio progress tracker
+     :return: flat list of (audio path, seed label, audio path) triples, padded to max_audio_components
+     """
+     seeds = generate_seeds(num_seeds, texts, progress.tqdm)
+     wavs = [_[0] for _ in seeds]
+     seeds = [f"保存种子 {_[1]}" for _ in seeds]
+     # Pad the unused slots
+     all_wavs = wavs + [None] * (max_audio_components - len(wavs))
+     all_seeds = seeds + [''] * (max_audio_components - len(seeds))
+     return [item for pair in zip(all_wavs, all_seeds, all_wavs) for item in pair]
+
+
+ # Holds the file paths of the seeds that were just generated
+ audio_paths = [gr.State(value=None) for _ in range(max_audio_components)]
+
+
+ def audio_interface_with_paths(num_seeds, texts, progress=gr.Progress()):
+     """
+     Same as audio_interface, but also stores the audio paths.
+     """
+     results = audio_interface(num_seeds, texts, progress)
+     # Results are (wav, seed label, wav) triples, so every third item is an audio file path
+     wavs = results[::3]
+     for i, wav in enumerate(wavs):
+         audio_paths[i].value = wav  # Assign directly to the State component
+     return results
+
+
+ def audio_interface_empty(num_seeds, texts, progress=gr.Progress(track_tqdm=True)):
+     return [None, "", None] * max_audio_components
+
+
+ def update_audio_components(slider_value):
+     # Update the visibility of the Audio components according to the slider value
+     k = int(slider_value)
+     audios = [gr.Audio(visible=True)] * k + [gr.Audio(visible=False)] * (max_audio_components - k)
+     tbs = [gr.Textbox(visible=True)] * k + [gr.Textbox(visible=False)] * (max_audio_components - k)
+     stats = [gr.State(value=None)] * max_audio_components
+     print(f'k={k}, audios={len(audios)}')
+     return [item for pair in zip(audios, tbs, stats) for item in pair]
+
+
+ def seed_change(evt: gr.SelectData):
+     # print(f"You selected {evt.value} at {evt.index} from {evt.target}")
+     global SELECTED_SEED_INDEX
+     SELECTED_SEED_INDEX = evt.index
+     return evt.index
+
+
+ def generate_tts_audio(text_file, num_seeds, seed, speed, oral, laugh, bk, min_length, batch_size, temperature, top_P,
+                        top_K, roleid=None, refine_text=True, speaker_type="seed", pt_file=None, progress=gr.Progress()):
+     from tts_model import generate_audio_for_seed
+     from utils import split_text, replace_tokens, restore_tokens
+     if seed in [0, -1, None]:
+         seed = random.randint(1, 9999)
+     content = ''
+     if os.path.isfile(text_file):
+         content = ""
+     elif isinstance(text_file, str):
+         content = text_file
+     # Replace [uv_break] and [laugh] with placeholder tokens, then restore them after splitting
+     content = replace_tokens(content)
+     texts = split_text(content, min_length=min_length)
+     for i, text in enumerate(texts):
+         texts[i] = restore_tokens(text)
+
+     if oral < 0 or oral > 9 or laugh < 0 or laugh > 2 or bk < 0 or bk > 7:
+         raise ValueError("oral_(0-9), laugh_(0-2), break_(0-7) out of range")
+
+     refine_text_prompt = f"[oral_{oral}][laugh_{laugh}][break_{bk}]"
+     output_files = generate_audio_for_seed(
+         chat=chat,
+         seed=seed,
+         texts=texts,
+         batch_size=batch_size,
+         speed=speed,
+         refine_text_prompt=refine_text_prompt,
+         roleid=roleid,
+         temperature=temperature,
+         top_P=top_P,
+         top_K=top_K,
+         cur_tqdm=progress.tqdm,
+         skip_save=False,
+         skip_refine_text=not refine_text,
+         speaker_type=speaker_type,
+         pt_file=pt_file,
+     )
+     return output_files
+
+
+ def generate_tts_audio_stream(text_file, num_seeds, seed, speed, oral, laugh, bk, min_length, batch_size, temperature,
+                               top_P,
+                               top_K, roleid=None, refine_text=True, speaker_type="seed", pt_file=None,
+                               stream_mode="fake"):
+     from utils import split_text, replace_tokens, restore_tokens
+     from tts_model import deterministic
+     if seed in [0, -1, None]:
+         seed = random.randint(1, 9999)
+     content = ''
+     if os.path.isfile(text_file):
+         content = ""
+     elif isinstance(text_file, str):
+         content = text_file
+     # Replace [uv_break] and [laugh] with placeholder tokens, then restore them after splitting
+     content = replace_tokens(content)
+     # texts = [normalize_zh(_) for _ in content.split('\n') if _.strip()]
+     texts = split_text(content, min_length=min_length)
+
+     for i, text in enumerate(texts):
+         texts[i] = restore_tokens(text)
+
+     if oral < 0 or oral > 9 or laugh < 0 or laugh > 2 or bk < 0 or bk > 7:
+         raise ValueError("oral_(0-9), laugh_(0-2), break_(0-7) out of range")
+
+     refine_text_prompt = f"[oral_{oral}][laugh_{laugh}][break_{bk}]"
+
+     print(f"speaker_type: {speaker_type}")
+     if speaker_type == "seed":
+         if seed in [None, -1, 0, "", "random"]:
+             seed = np.random.randint(0, 9999)
+         deterministic(seed)
+         rnd_spk_emb = chat.sample_random_speaker()
+     elif speaker_type == "role":
+         # Read the preset voices from the JSON file
+         with open('./slct_voice_240605.json', 'r', encoding='utf-8') as json_file:
+             slct_idx_loaded = json.load(json_file)
+         # Convert the stored tensor data back into Tensor objects
+         for key in slct_idx_loaded:
+             tensor_list = slct_idx_loaded[key]["tensor"]
+             slct_idx_loaded[key]["tensor"] = torch.tensor(tensor_list)
+         # Put the voice tensor into params_infer_code so this voice is always used; lower the temperature
+         rnd_spk_emb = slct_idx_loaded[roleid]["tensor"]
+         # temperature = 0.001
+     elif speaker_type == "pt":
+         print(pt_file)
+         rnd_spk_emb = torch.load(pt_file)
+         print(rnd_spk_emb.shape)
+         if rnd_spk_emb.shape != (768,):
+             raise ValueError("The speaker embedding must have dimension 768.")
+     else:
+         raise ValueError(f"Invalid speaker_type: {speaker_type}. ")
+
+     params_infer_code = {
+         'spk_emb': rnd_spk_emb,
+         'prompt': f'[speed_{speed}]',
+         'top_P': top_P,
+         'top_K': top_K,
+         'temperature': temperature
+     }
+     params_refine_text = {
+         'prompt': refine_text_prompt,
+         'top_P': top_P,
+         'top_K': top_K,
+         'temperature': temperature
+     }
+
+     if stream_mode == "real":
+         for text in texts:
+             _params_infer_code = {**params_infer_code}
+             wavs_gen = chat.infer(text, params_infer_code=_params_infer_code, params_refine_text=params_refine_text,
+                                   use_decoder=True, skip_refine_text=True, stream=True)
+             for gen in wavs_gen:
+                 wavs = [np.array([[]])]
+                 wavs[0] = np.hstack([wavs[0], np.array(gen[0])])
+                 audio = wavs[0][0]
+                 yield 24000, normalize_audio(audio)
+
+             clear_cuda_cache()
+     else:
+         for text in batch_split(texts, batch_size):
+             _params_infer_code = {**params_infer_code}
+             wavs = chat.infer(text, params_infer_code=_params_infer_code, params_refine_text=params_refine_text,
+                               use_decoder=True, skip_refine_text=False, stream=False)
+             combined_audio = combine_audio(wavs)
+             yield 24000, combined_audio[0]
+
+
+ def generate_refine(text_file, oral, laugh, bk, temperature, top_P, top_K, progress=gr.Progress()):
+     from tts_model import generate_refine_text
+     from utils import split_text, replace_tokens, restore_tokens, replace_space_between_chinese
+     seed = random.randint(1, 9999)
+     refine_text_prompt = f"[oral_{oral}][laugh_{laugh}][break_{bk}]"
+     content = ''
+     if os.path.isfile(text_file):
+         content = ""
+     elif isinstance(text_file, str):
+         content = text_file
+     if re.search(r'\[uv_break\]|\[laugh\]', content) is not None:
+         gr.Info("检测到 [uv_break] [laugh],不能重复 refine ")
+         # print("检测到 [uv_break] [laugh],不能重复 refine ")
+         return content
+     batch_size = 5
+
+     content = replace_tokens(content)
+     texts = split_text(content, min_length=120)
+     print(texts)
+     for i, text in enumerate(texts):
+         texts[i] = restore_tokens(text)
+     txts = []
+     for batch in progress.tqdm(batch_split(texts, batch_size), desc=f"Refine Text Please Wait ..."):
+         txts.extend(generate_refine_text(chat, seed, batch, refine_text_prompt, temperature, top_P, top_K))
+     return replace_space_between_chinese('\n\n'.join(txts))
+
+
+ def generate_seed():
+     new_seed = random.randint(1, 9999)
+     return {
+         "__type__": "update",
+         "value": new_seed
+     }
+
+
+ def update_label(text):
+     word_count = len(text)
+     return gr.update(label=f"朗读文本({word_count} 字)")
+
+
+ def inser_token(text, btn):
+     if btn == "+笑声":
+         return gr.update(
+             value=text + "[laugh]"
+         )
+     elif btn == "+停顿":
+         return gr.update(
+             value=text + "[uv_break]"
+         )
+
+
439
+ with gr.Blocks() as demo:
440
+ # Project link
441
+ gr.Markdown("""
442
+ <div style='text-align: center; font-size: 16px;'>
443
+ 🌟 <a href='https://github.com/6drf21e/ChatTTS_colab'>项目地址 欢迎 star</a> 🌟
444
+ </div>
445
+ """)
446
+
447
+ with gr.Tab("音色抽卡"):
448
+ with gr.Row():
449
+ with gr.Column(scale=1):
450
+ texts = [
451
+ "四川美食确实以辣闻名,但也有不辣的选择。比如甜水面、赖汤圆、蛋烘糕、叶儿粑等,这些小吃口味温和,甜而不腻,也很受欢迎。",
452
+ "我是一个充满活力的人,喜欢运动,喜欢旅行,喜欢尝试新鲜事物。我喜欢挑战自己,不断突破自己的极限,让自己变得更加强大。",
453
+ "罗森宣布将于7月24日退市,在华门店超6000家!",
454
+ ]
455
+ # gr.Markdown("### 随机音色抽卡")
456
+ gr.Markdown("""
457
+ 免抽卡,直接找稳定音色👇
458
+
459
+ [ModelScope ChatTTS Speaker(国内)](https://modelscope.cn/studios/ttwwwaa/ChatTTS_Speaker) | [HuggingFace ChatTTS Speaker(国外)](https://huggingface.co/spaces/taa/ChatTTS_Speaker)
460
+
461
+ 在相同的 seed 和 温度等参数下,音色具有一定的一致性。点击下面的“随机音色生成”按钮将生成多个 seed。找到满意的音色后,点击音频下方“保存”按钮。
462
+ **注意:不同机器使用相同种子生成的音频音色可能不同,同一机器使用相同种子多次生成的音频音色也可能变化。**
463
+ """)
464
+ input_text = gr.Textbox(label="测试文本",
465
+ info="**每行文本**都会生成一段音频,最终输出的音频是将这些音频段合成后的结果。建议使用**多行文本**进行测试,以确保音色稳定性。",
466
+ lines=4, placeholder="请输入文本...", value='\n'.join(texts))
467
+
468
+ num_seeds = gr.Slider(minimum=1, maximum=max_audio_components, step=1, label="seed生成数量",
469
+ value=num_seeds_default)
470
+
471
+ generate_button = gr.Button("随机音色抽卡🎲", variant="primary")
472
+
473
+ # Saved seeds
474
+ gr.Markdown("### 种子管理界面")
475
+ seed_list = gr.DataFrame(
476
+ label="种子列表",
477
+ headers=["Index", "Seed", "Name", "Path"],
478
+ datatype=["number", "number", "str", "str"],
479
+ interactive=True,
480
+ col_count=(4, "fixed"),
481
+ value=display_seeds
482
+ )
483
+
484
+ with gr.Row():
485
+ refresh_button = gr.Button("刷新")
486
+ save_button = gr.Button("保存")
487
+ del_button = gr.Button("删除")
488
+ play_button = gr.Button("试听")
489
+
490
+ with gr.Row():
491
+ # Audio player for previewing saved seeds
492
+ audio_player = gr.Audio(label="播放已保存种子音频", visible=False)
493
+
494
+ # Bind buttons to their handlers
495
+ refresh_button.click(display_seeds, outputs=seed_list)
496
+ seed_list.select(seed_change).success(seed_change_btn, outputs=[del_button, play_button])
497
+ save_button.click(do_save_seeds, inputs=[seed_list], outputs=None)
498
+ del_button.click(do_delete_seed, inputs=del_button, outputs=seed_list)
499
+ play_button.click(do_play_seed, inputs=play_button, outputs=audio_player)
500
+
501
+ with gr.Column(scale=1):
502
+ audio_components = []
503
+ for i in range(max_audio_components):
504
+ visible = i < num_seeds_default
505
+ a = gr.Audio(f"Audio {i}", visible=visible)
506
+ t = gr.Button(f"Seed", visible=visible)
507
+ s = gr.State(value=None)
508
+ t.click(do_save_seed, inputs=[t, s], outputs=None).success(display_seeds, outputs=seed_list)
509
+ audio_components.append(a)
510
+ audio_components.append(t)
511
+ audio_components.append(s)
512
+
513
+ num_seeds.change(update_audio_components, inputs=num_seeds, outputs=audio_components)
514
+ # output = gr.Column()
515
+ # audio = gr.Audio(label="Output Audio")
516
+
517
+ generate_button.click(
518
+ audio_interface_empty,
519
+ inputs=[num_seeds, input_text],
520
+ outputs=audio_components
521
+ ).success(audio_interface, inputs=[num_seeds, input_text], outputs=audio_components)
522
+ with gr.Tab("长音频生成"):
523
+ with gr.Row():
524
+ with gr.Column():
525
+ gr.Markdown("### 文本")
526
+ # gr.Markdown("请上传要转换的文本文件(.txt 格式)。")
527
+ # text_file_input = gr.File(label="文本文件", file_types=[".txt"])
528
+ default_text = "四川美食确实以辣闻名,但也有不辣的选择。比如甜水面、赖汤圆、蛋烘糕、叶儿粑等,这些小吃口味温和,甜而不腻,也很受欢迎。"
529
+ text_file_input = gr.Textbox(label=f"朗读文本(字数: {len(default_text)})", lines=4,
530
+ placeholder="Please Input Text...", value=default_text)
531
+ # Call update_label whenever the textbox content changes
532
+ text_file_input.change(update_label, inputs=text_file_input, outputs=text_file_input)
533
+ # Buttons for inserting pause and laugh tokens
534
+ with gr.Row():
535
+ break_button = gr.Button("+停顿", variant="secondary")
536
+ laugh_button = gr.Button("+笑声", variant="secondary")
537
+ refine_button = gr.Button("Refine Text(预处理 加入停顿词、笑声等)", variant="secondary")
538
+
539
+ with gr.Column():
540
+ gr.Markdown("### 配置参数")
541
+ with gr.Row():
542
+ with gr.Column():
543
+ gr.Markdown("音色选择")
544
+ num_seeds_input = gr.Number(label="生成音频的数量", value=1, precision=0, visible=False)
545
+ speaker_stat = gr.State(value="seed")
546
+ tab_seed = gr.Tab(label="种子")
547
+ with tab_seed:
548
+ with gr.Row():
549
+ seed_input = gr.Number(label="指定种子", info="种子决定音色 0则随机", value=None,
550
+ precision=0)
551
+ generate_audio_seed = gr.Button("\U0001F3B2")
552
+ tab_roleid = gr.Tab(label="内置音色")
553
+ with tab_roleid:
554
+ roleid_input = gr.Dropdown(label="内置音色",
555
+ choices=[("发姐", "1"),
556
+ ("纯情男大学生", "2"),
557
+ ("阳光开朗大男孩", "3"),
558
+ ("知心小姐姐", "4"),
559
+ ("电视台女主持", "5"),
560
+ ("魅力大叔", "6"),
561
+ ("优雅甜美", "7"),
562
+ ("贴心男宝2", "21"),
563
+ ("正式打工人", "8"),
564
+ ("贴心男宝1", "9")],
565
+ value="1",
566
+ info="选择音色后会覆盖种子。感谢 @QuantumDriver 提供音色")
567
+ tab_pt = gr.Tab(label="上传.PT文件")
568
+ with tab_pt:
569
+ pt_input = gr.File(label="上传音色文件", file_types=[".pt"], height=100)
570
+
571
+ with gr.Row():
572
+ style_select = gr.Radio(label="预设参数", info="语速部分可自行更改",
573
+ choices=["小说朗读", "对话", "中英混合", "默认"], value="默认",
574
+ interactive=True, )
575
+ with gr.Row():
576
+ # refine
577
+ refine_text_input = gr.Checkbox(label="Refine",
578
+ info="打开后会自动根据下方参数添加笑声/停顿等。关闭后可自行添加 [uv_break] [laugh] 或者点击下方 Refine 按钮先行转换",
579
+ value=True)
580
+ speed_input = gr.Slider(label="语速", minimum=1, maximum=10, value=DEFAULT_SPEED, step=1)
581
+ with gr.Row():
582
+ oral_input = gr.Slider(label="口语化", minimum=0, maximum=9, value=DEFAULT_ORAL, step=1)
583
+ laugh_input = gr.Slider(label="笑声", minimum=0, maximum=2, value=DEFAULT_LAUGH, step=1)
584
+ bk_input = gr.Slider(label="停顿", minimum=0, maximum=7, value=DEFAULT_BK, step=1)
585
+ # gr.Markdown("### 文本参数")
586
+ with gr.Row():
587
+ min_length_input = gr.Number(label="文本分段长度", info="大于这个数值进行分段",
588
+ value=DEFAULT_SEG_LENGTH, precision=0)
589
+ batch_size_input = gr.Number(label="批大小", info="越高越快 太高爆显存 4G推荐3 其他酌情",
590
+ value=DEFAULT_BATCH_SIZE, precision=0)
591
+ with gr.Accordion("其他参数", open=False):
592
+ with gr.Row():
593
+ # Temperature, top_P, top_K
594
+ temperature_input = gr.Slider(label="温度", minimum=0.01, maximum=1.0, step=0.01,
595
+ value=DEFAULT_TEMPERATURE)
596
+ top_P_input = gr.Slider(label="top_P", minimum=0.1, maximum=0.9, step=0.05, value=DEFAULT_TOP_P)
597
+ top_K_input = gr.Slider(label="top_K", minimum=1, maximum=20, step=1, value=DEFAULT_TOP_K)
598
+ # Reset button
599
+ reset_button = gr.Button("重置")
600
+
601
+ with gr.Row():
602
+ with gr.Column():
603
+ generate_button = gr.Button("生成音频", variant="primary")
604
+ with gr.Column():
605
+ generate_button_stream = gr.Button("流式生成音频(一边播放一边推理)", variant="primary")
606
+ stream_select = gr.Radio(label="流输出方式",
607
+ info="真流式为实验功能,播放效果:卡播卡播卡播(⏳🎵⏳🎵⏳🎵);伪流式为分段推理后输出,播放效果:卡卡卡播播播播(⏳⏳🎵🎵🎵🎵)。伪流式批次建议4以上减少卡顿",
608
+ choices=[("真", "real"), ("伪", "fake")], value="fake", interactive=True, )
609
+
610
+ with gr.Row():
611
+ output_audio = gr.Audio(label="生成的音频文件")
612
+ output_audio_stream = gr.Audio(label="流式音频", value=None,
613
+ streaming=True,
614
+ autoplay=True,
615
+ # disable auto play for Windows, due to https://developer.chrome.com/blog/autoplay#webaudio
616
+ interactive=False,
617
+ show_label=True)
618
+
619
+ generate_audio_seed.click(generate_seed,
620
+ inputs=[],
621
+ outputs=seed_input)
622
+
623
+
624
+ def do_tab_change(evt: gr.SelectData):
625
+ print(evt.selected, evt.index, evt.value, evt.target)
626
+ kv = {
627
+ "种子": "seed",
628
+ "内置音色": "role",
629
+ "上传.PT文件": "pt"
630
+ }
631
+ return kv.get(evt.value, "seed")
632
+
633
+
634
+ tab_seed.select(do_tab_change, outputs=speaker_stat)
635
+ tab_roleid.select(do_tab_change, outputs=speaker_stat)
636
+ tab_pt.select(do_tab_change, outputs=speaker_stat)
637
+
638
+
639
+ def do_style_select(x):
640
+ if x == "小说朗读":
641
+ return [4, 0, 0, 2]
642
+ elif x == "对话":
643
+ return [5, 5, 1, 4]
644
+ elif x == "中英混合":
645
+ return [4, 1, 0, 3]
646
+ else:
647
+ return [DEFAULT_SPEED, DEFAULT_ORAL, DEFAULT_LAUGH, DEFAULT_BK]
648
+
649
+
650
+ # Handle style_select changes
651
+ style_select.change(
652
+ do_style_select,
653
+ inputs=style_select,
654
+ outputs=[speed_input, oral_input, laugh_input, bk_input]
655
+ )
656
+
657
+ # Refine button
658
+ refine_button.click(
659
+ generate_refine,
660
+ inputs=[text_file_input, oral_input, laugh_input, bk_input, temperature_input, top_P_input, top_K_input],
661
+ outputs=text_file_input
662
+ )
663
+ # Reset button: restores temperature and the other sampling parameters
664
+ reset_button.click(
665
+ lambda: [0.3, 0.7, 20],
666
+ inputs=None,
667
+ outputs=[temperature_input, top_P_input, top_K_input]
668
+ )
669
+
670
+ generate_button.click(
671
+ fn=generate_tts_audio,
672
+ inputs=[
673
+ text_file_input,
674
+ num_seeds_input,
675
+ seed_input,
676
+ speed_input,
677
+ oral_input,
678
+ laugh_input,
679
+ bk_input,
680
+ min_length_input,
681
+ batch_size_input,
682
+ temperature_input,
683
+ top_P_input,
684
+ top_K_input,
685
+ roleid_input,
686
+ refine_text_input,
687
+ speaker_stat,
688
+ pt_input
689
+ ],
690
+ outputs=[output_audio]
691
+ )
692
+
693
+ generate_button_stream.click(
694
+ fn=generate_tts_audio_stream,
695
+ inputs=[
696
+ text_file_input,
697
+ num_seeds_input,
698
+ seed_input,
699
+ speed_input,
700
+ oral_input,
701
+ laugh_input,
702
+ bk_input,
703
+ min_length_input,
704
+ batch_size_input,
705
+ temperature_input,
706
+ top_P_input,
707
+ top_K_input,
708
+ roleid_input,
709
+ refine_text_input,
710
+ speaker_stat,
711
+ pt_input,
712
+ stream_select
713
+ ],
714
+ outputs=[output_audio_stream]
715
+ )
716
+
717
+ break_button.click(
718
+ inser_token,
719
+ inputs=[text_file_input, break_button],
720
+ outputs=text_file_input
721
+ )
722
+
723
+ laugh_button.click(
724
+ inser_token,
725
+ inputs=[text_file_input, laugh_button],
726
+ outputs=text_file_input
727
+ )
728
+
729
+ with gr.Tab("角色扮演"):
730
+ def txt_2_script(text):
731
+ lines = text.split("\n")
732
+ data = []
733
+ for line in lines:
734
+ if not line.strip():
735
+ continue
736
+ parts = line.split("::")
737
+ if len(parts) != 2:
738
+ continue
739
+ data.append({
740
+ "character": parts[0],
741
+ "txt": parts[1]
742
+ })
743
+ return data
744
+
745
+
746
+ def script_2_txt(data):
747
+ assert isinstance(data, list)
748
+ result = []
749
+ for item in data:
750
+ txt = item['txt'].replace('\n', ' ')
751
+ result.append(f"{item['character']}::{txt}")
752
+ return "\n".join(result)
753
+
754
+
755
+ def get_characters(lines):
756
+ assert isinstance(lines, list)
757
+ characters = list([_["character"] for _ in lines])
758
+ unique_characters = list(dict.fromkeys(characters))
759
+ print([[character, 0] for character in unique_characters])
760
+ return [[character, 0, 5, 2, 0, 4] for character in unique_characters]
761
+
762
+
763
+ def get_txt_characters(text):
764
+ return get_characters(txt_2_script(text))
765
+
766
+
767
+ def llm_change(model):
768
+ llm_setting = {
769
+ "gpt-3.5-turbo-0125": ["https://api.openai.com/v1"],
770
+ "gpt-4o": ["https://api.openai.com/v1"],
771
+ "deepseek-chat": ["https://api.deepseek.com"],
772
+ "yi-large": ["https://api.lingyiwanwu.com/v1"]
773
+ }
774
+ if model in llm_setting:
775
+ return llm_setting[model][0]
776
+ else:
777
+ raise gr.Error("Model not found.")
778
+ return None
779
+
780
+
781
+ def ai_script_generate(model, api_base, api_key, text, progress=gr.Progress(track_tqdm=True)):
782
+ from llm_utils import llm_operation
783
+ from config import LLM_PROMPT
784
+ scripts = llm_operation(api_base, api_key, model, LLM_PROMPT, text, required_keys=["txt", "character"])
785
+ return script_2_txt(scripts)
786
+
787
+
788
+ def generate_script_audio(text, models_seeds, progress=gr.Progress()):
789
+ scripts = txt_2_script(text) # Convert the text into a script
790
+ characters = get_characters(scripts) # Extract the characters from the script
791
+
792
+ #
793
+ import pandas as pd
794
+ from collections import defaultdict
795
+ import itertools
796
+ from tts_model import generate_audio_for_seed
797
+ from utils import combine_audio, save_audio, normalize_zh
798
+
799
+ assert isinstance(models_seeds, pd.DataFrame)
800
+
801
+ # Batching helper
802
+ def batch(iterable, batch_size):
803
+ it = iter(iterable)
804
+ while True:
805
+ batch = list(itertools.islice(it, batch_size))
806
+ if not batch:
807
+ break
808
+ yield batch
809
+
810
+ column_mapping = {
811
+ '角色': 'character',
812
+ '种子': 'seed',
813
+ '语速': 'speed',
814
+ '口语': 'oral',
815
+ '笑声': 'laugh',
816
+ '停顿': 'break'
817
+ }
818
+ # Rename the DataFrame columns with rename()
819
+ models_seeds = models_seeds.rename(columns=column_mapping).to_dict(orient='records')
820
+ # models_seeds = models_seeds.to_dict(orient='records')
821
+
822
+ # Check that every character has a matching seed
823
+ print(models_seeds)
824
+ seed_lookup = {seed['character']: seed for seed in models_seeds}
825
+
826
+ character_seeds = {}
827
+ missing_seeds = []
828
+ # Iterate over all characters
829
+ for character in characters:
830
+ character_name = character[0]
831
+ seed_info = seed_lookup.get(character_name)
832
+ if seed_info:
833
+ character_seeds[character_name] = seed_info
834
+ else:
835
+ missing_seeds.append(character_name)
836
+
837
+ if missing_seeds:
838
+ missing_characters_str = ', '.join(missing_seeds)
839
+ gr.Info(f"以下角色没有种子,请先设置种子:{missing_characters_str}")
840
+ return None
841
+
842
+ print(character_seeds)
843
+ # return
844
+ refine_text_prompt = "[oral_2][laugh_0][break_4]"
845
+ all_wavs = []
846
+
847
+ # Group lines by character to speed up inference
848
+ grouped_lines = defaultdict(list)
849
+ for line in scripts:
850
+ grouped_lines[line["character"]].append(line)
851
+
852
+ batch_results = {character: [] for character in grouped_lines}
853
+
854
+ batch_size = 5 # Batch size
855
+ # Process one character at a time
856
+ for character, lines in progress.tqdm(grouped_lines.items(), desc="生成剧本音频"):
857
+ info = character_seeds[character]
858
+ seed = info["seed"]
859
+ speed = info["speed"]
860
+ oral = info["oral"]
+ laugh = info["laugh"]
+ bk = info["break"]
+
+ refine_text_prompt = f"[oral_{oral}][laugh_{laugh}][break_{bk}]"
865
+
866
+ # Process in batches
867
+ for batch_lines in batch(lines, batch_size):
868
+ texts = [normalize_zh(line["txt"]) for line in batch_lines]
869
+ print(f"seed={seed} t={texts} c={character} s={speed} r={refine_text_prompt}")
870
+ wavs = generate_audio_for_seed(chat, int(seed), texts, DEFAULT_BATCH_SIZE, speed,
871
+ refine_text_prompt, None, DEFAULT_TEMPERATURE, DEFAULT_TOP_P,
872
+ DEFAULT_TOP_K, skip_save=True) # Synthesize the batch of texts
873
+ batch_results[character].extend(wavs)
874
+
875
+ # Restore the original line order
876
+ for line in scripts:
877
+ character = line["character"]
878
+ all_wavs.append(batch_results[character].pop(0))
879
+
880
+ # Concatenate all audio clips
881
+ audio = combine_audio(all_wavs)
882
+ fname = f"script_{int(time.time())}.wav"
883
+ return save_audio(fname, audio)
884
+
885
+
886
+ script_example = {
887
+ "lines": [{
888
+ "txt": "在一个风和日丽的下午,小红帽准备去森林里看望她的奶奶。",
889
+ "character": "旁白"
890
+ }, {
891
+ "txt": "小红帽说",
892
+ "character": "旁白"
893
+ }, {
894
+ "txt": "我要给奶奶带点好吃的。",
895
+ "character": "年轻女性"
896
+ }, {
897
+ "txt": "在森林里,小红帽遇到了狡猾的大灰狼。",
898
+ "character": "旁白"
899
+ }, {
900
+ "txt": "大灰狼说",
901
+ "character": "旁白"
902
+ }, {
903
+ "txt": "小红帽,你的篮子里装的是什么?",
904
+ "character": "中年男性"
905
+ }, {
906
+ "txt": "小红帽回答",
907
+ "character": "旁白"
908
+ }, {
909
+ "txt": "这是给奶奶的蛋糕和果酱。",
910
+ "character": "年轻女性"
911
+ }, {
912
+ "txt": "大灰狼心生一计,决定先到奶奶家等待小红帽。",
913
+ "character": "旁白"
914
+ }, {
915
+ "txt": "当小红帽到达奶奶家时,她发现大灰狼伪装成了奶奶。",
916
+ "character": "旁白"
917
+ }, {
918
+ "txt": "小红帽疑惑的问",
919
+ "character": "旁白"
920
+ }, {
921
+ "txt": "奶奶,你的耳朵怎么这么尖?",
922
+ "character": "年轻女性"
923
+ }, {
924
+ "txt": "大灰狼慌张地回答",
925
+ "character": "旁白"
926
+ }, {
927
+ "txt": "哦,这是为了更好地听你说话。",
928
+ "character": "中年男性"
929
+ }, {
930
+ "txt": "小红帽越发觉得不对劲,最终发现了大灰狼的诡计。",
931
+ "character": "旁白"
932
+ }, {
933
+ "txt": "她大声呼救,森林里的猎人听到后赶来救了她和奶奶。",
934
+ "character": "旁白"
935
+ }, {
936
+ "txt": "从此,小红帽再也没有单独进入森林,而是和家人一起去看望奶奶。",
937
+ "character": "旁白"
938
+ }]
939
+ }
940
+
941
+ ai_text_default = "武侠小说《花木兰大战周树人》 要符合人物背景"
942
+
943
+ with gr.Row(equal_height=True):
944
+ with gr.Column(scale=2):
945
+ gr.Markdown("### AI脚本")
946
+ gr.Markdown("""
947
+ 为确保生成效果稳定,仅支持与 GPT-4 相当的模型,推荐使用 4o yi-large deepseek。
948
+ 如果没有反应,请检查日志中的错误信息。如果提示格式错误,请重试几次。国内模型可能会受到风控影响,建议更换文本内容后再试。
949
+
950
+ 申请渠道(免费额度):
951
+
952
+ - [https://platform.deepseek.com/](https://platform.deepseek.com/)
953
+ - [https://platform.lingyiwanwu.com/](https://platform.lingyiwanwu.com/)
954
+
955
+ """)
956
+ # Where to apply for API access
957
+
958
+ with gr.Row(equal_height=True):
959
+ # Model selection: only three options, gpt-4o, deepseek-chat and yi-large
960
+ model_select = gr.Radio(label="选择模型", choices=["gpt-4o", "deepseek-chat", "yi-large"],
961
+ value="gpt-4o", interactive=True, )
962
+ with gr.Row(equal_height=True):
963
+ openai_api_base_input = gr.Textbox(label="OpenAI API Base URL",
964
+ placeholder="请输入API Base URL",
965
+ value=r"https://api.openai.com/v1")
966
+ openai_api_key_input = gr.Textbox(label="OpenAI API Key", placeholder="请输入API Key",
967
+ value="sk-xxxxxxx", type="password")
968
+ # AI prompt
969
+ ai_text_input = gr.Textbox(label="剧情简介或者一段故事", placeholder="请输入文本...", lines=2,
970
+ value=ai_text_default)
971
+
972
+ # Button that generates the script
973
+ ai_script_generate_button = gr.Button("AI脚本生成")
974
+
975
+ with gr.Column(scale=3):
976
+ gr.Markdown("### 脚本")
977
+ gr.Markdown(
978
+ "脚本可以手工编写也可以从左侧的AI脚本生成按钮生成。脚本格式 **角色::文本** 一行为一句 注意是::")
979
+ script_text = "\n".join(
980
+ [f"{_.get('character', '')}::{_.get('txt', '')}" for _ in script_example['lines']])
981
+
982
+ script_text_input = gr.Textbox(label="脚本格式 “角色::文本 一行为一句” 注意是::",
983
+ placeholder="请输入文本...",
984
+ lines=12, value=script_text)
985
+ script_translate_button = gr.Button("步骤①:提取角色")
986
+
987
+ with gr.Column(scale=1):
988
+ gr.Markdown("### 角色种子")
989
+ # DataFrame that holds the converted script
990
+ # Default values: [speed_5][oral_2][laugh_0][break_4]
991
+ default_data = [
992
+ ["旁白", 2222, 3, 0, 0, 2],
993
+ ["年轻女性", 2, 5, 2, 0, 2],
994
+ ["中年男性", 2424, 5, 2, 0, 2]
995
+ ]
996
+
997
+ script_data = gr.DataFrame(
998
+ value=default_data,
999
+ label="角色对应的音色种子,从抽卡那获取",
1000
+ headers=["角色", "种子", "语速", "口语", "笑声", "停顿"],
1001
+ datatype=["str", "number", "number", "number", "number", "number"],
1002
+ interactive=True,
1003
+ col_count=(6, "fixed"),
1004
+ )
1005
+ # Generate-audio button
1006
+ script_generate_audio = gr.Button("步骤②:生成音频")
1007
+ # Output audio for the script
1008
+ script_audio = gr.Audio(label="AI生成的音频", interactive=False)
1009
+
1010
+ # Script-related events
1011
+ # Script conversion
1012
+ script_translate_button.click(
1013
+ get_txt_characters,
1014
+ inputs=[script_text_input],
1015
+ outputs=script_data
1016
+ )
1017
+ # Handle model switching
1018
+ model_select.change(
1019
+ llm_change,
1020
+ inputs=[model_select],
1021
+ outputs=[openai_api_base_input]
1022
+ )
1023
+ # AI script generation
1024
+ ai_script_generate_button.click(
1025
+ ai_script_generate,
1026
+ inputs=[model_select, openai_api_base_input, openai_api_key_input, ai_text_input],
1027
+ outputs=[script_text_input]
1028
+ )
1029
+ # Audio generation
1030
+ script_generate_audio.click(
1031
+ generate_script_audio,
1032
+ inputs=[script_text_input, script_data],
1033
+ outputs=[script_audio]
1034
+ )
1035
+
1036
+ demo.launch(share=args.share, inbrowser=True)
zh_normalization/README.md ADDED
@@ -0,0 +1,16 @@
+ ## Supported NSW (Non-Standard-Word) Normalization
+
+ |NSW type|raw|normalized|
+ |:--|:-|:-|
+ |serial number|电影中梁朝伟扮演的陈永仁的编号27149|电影中梁朝伟扮演的陈永仁的编号二七一四九|
+ |cardinal|这块黄金重达324.75克<br>我们班的最高总分为583分|这块黄金重达三百二十四点七五克<br>我们班的最高总分为五百八十三分|
+ |numeric range|12\~23<br>-1.5\~2|十二到二十三<br>负一点五到二|
+ |date|她出生于86年8月18日,她弟弟出生于1995年3月1日|她出生于八六年八月十八日, 她弟弟出生于一九九五年三月一日|
+ |time|等会请在12:05请通知我|等会请在十二点零五分请通知我|
+ |temperature|今天的最低气温达到-10°C|今天的最低气温达到零下十度|
+ |fraction|现场有7/12的观众投出了赞成票|现场有十二分之七的观众投出了赞成票|
+ |percentage|明天有62%的概率降雨|明天有百分之六十二的概率降雨|
+ |money|随便来几个价格12块5,34.5元,20.1万|随便来几个价格十二块五,三十四点五元,二十点一万|
+ |telephone|这是固话0421-33441122<br>这是手机+86 18544139121|这是固话零四二一三三四四一一二二<br>这是手机八六一八五四四一三九一二一|
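To make a few rows of the table concrete, here is a minimal sketch of three of the simpler cases: digit-by-digit reading (serial number, telephone), small cardinals, and percentages. It is an illustration only, not the package's implementation: the helper names `read_digits`, `cardinal_under_100` and `normalize_percentage` are hypothetical, and the cardinal helper is deliberately limited to 0..99.

```python
import re

# Chinese digit for each value 0..9, indexed by the digit itself.
DIGITS = "零一二三四五六七八九"


def read_digits(number: str) -> str:
    """Read a number one digit at a time, dropping separators such as '-' or spaces."""
    return "".join(DIGITS[int(ch)] for ch in number if "0" <= ch <= "9")


def cardinal_under_100(n: int) -> str:
    """Spell out a cardinal in the range 0..99 (assumption: larger values are out of scope here)."""
    if not 0 <= n < 100:
        raise ValueError("this sketch only handles 0..99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    # 10..19 are read as "十", "十一", ... without a leading "一".
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[ones] if ones else "")


def normalize_percentage(text: str) -> str:
    """Rewrite one- or two-digit percentages such as '62%' as '百分之六十二'.

    Longer numbers (for example '100%') are left untouched by this sketch.
    """
    return re.sub(
        r"(?<![0-9])([0-9]{1,2})%",
        lambda m: "百分之" + cardinal_under_100(int(m.group(1))),
        text,
    )
```

With these helpers, `read_digits("27149")` yields `二七一四九` and `normalize_percentage("明天有62%的概率降雨")` yields `明天有百分之六十二的概率降雨`, matching the serial number and percentage rows above. Decimals, dates, money and ranges need further rules that this sketch leaves out.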
+ ## References
+ [Pull request #658 of DeepSpeech](https://github.com/PaddlePaddle/DeepSpeech/pull/658/files)
zh_normalization/__init__.py ADDED
@@ -0,0 +1,14 @@
+ # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ from .text_normlization import *
zh_normalization/char_convert.py ADDED
@@ -0,0 +1,46 @@
+ # coding=utf-8
+ # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """Traditional and simplified Chinese conversion. A simplified character may correspond to multiple traditional characters.
+ """
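As a rough illustration of how two position-aligned strings can drive this kind of conversion, here is a minimal sketch. The tiny `SIMPLIFIED` and `TRADITIONAL` tables and the `to_simplified` helper are hypothetical names chosen for the example; the module's own lookup code is not shown in this excerpt and may work differently.

```python
# Hypothetical, tiny aligned tables: the character at index i in one string
# corresponds to the character at index i in the other.
SIMPLIFIED = "声盘点书"
TRADITIONAL = "聲盤點書"

# The tables must stay the same length, otherwise the pairing is broken.
assert len(SIMPLIFIED) == len(TRADITIONAL)

# Build a translation table mapping each traditional character to its
# simplified counterpart.
T2S = str.maketrans(TRADITIONAL, SIMPLIFIED)


def to_simplified(text: str) -> str:
    """Replace every known traditional character; leave everything else as is."""
    return text.translate(T2S)
```

Because one simplified character may stand for several traditional ones, the traditional-to-simplified direction is the unambiguous one. A reverse table built the same way would have to pick a single traditional form for each simplified character.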
17
+ simplified_charcters = '制咖片型超声盘鉴定仔点他命书歌粉巾字帐恤手指记忆棒形转弯沟光○〇㐄㐅㐆㐌㐖毒㐜㐡㐤㐰㐺㑇㑳㒳㒸㔾㗂㗎㝵㞎㞙㞞以㢲㢴㤅㥁㥯㨗㫺㬎㮎㮚㮸㲋㲱㲾㳮涧㵪㶸㷖㷭㹢㹴犬㺢狓㺵碗㽮㿝䍃䔢䖟䖸䗈䗥䗪䝓射䥯䦉䯝鲃鱼䲔䳗鹅䵹鼄䶑一对应映射丁不识下儿子做二休世丘之貉并中台原则串为甚谓干净了百事无成八变五十些人得道鸡升天代如并来去个国政策劲幽灵在欧洲游荡接样萝卜坑侧化传价元论醇共再准刀两断切分耕耘收获钱货物向看旧就绪险刻千金动劳永逸匙零夜半卡通回复返影踪反常态口咬气句话同吐快吹周味呼诺呜品红锅哄而散起唱和问三知生熟团漆黑火糟堆场空块面塌糊涂尘染壁厢夔已足多情露水大早到晚夫妻当关万莫开失古恨套所料既往孔见提师要家主审寸阴难买斗牛小撮部阵局展身层巴掌帆风顺席地带过年计于春头载四季期被蛇怕井绳度愿式份弹顷深前律径心意念差愁孤行俱全房厅交遮打技长把抓死拿眼泪鼻涕钥锁折段抿拍即合扫排掬挥拨拥上入击洞掷揽改故辙败文值名斑方面旁族日秋餐隔雅里终父旦时晌会霎间晃暴寒曝更月望垠际朝夕本正经利杯羹东西板枝独秀根筋杆进条龙服务概模次函数又性程总付步脚印趋登毛拔呵氧氮碳决雌雄波未平派谎言流清楚白准溜烟潭有获闻是处降琴鹤甲病发可拾沙目然了直以相眨穿睹瞥瞬矢的解石鸟神教秉虔诚秘种窝蜂穷窍笑置笔苟勾销抹杀煞等奖箍节吃箭仇双雕诗筹箩筐系列纸级士官统丝毫挂维网尽线微吭响股脑胎脉承腔臂力致效资源址器举功投般说讲规贸易叶障着慎满皆输号木电池衣倾钟高低视仁觉醒览遗角银币触溃九鼎蔽抄出驷马追重语破贫洗贯走路安蹴至几蹶振跃役胆汗较辈轮辞赞退六连遍递边针血锤音错门思闪真倒项栽雾类保护川先惊乍体哄鳞爪鸣滴泡邻域党专鼓作齐炒丑烯亥克内酯冬加奴卯肝炎基尺梁街裤镐客宠庭巳汝昌烷玲磊糖肇酉醛啷青县韪良香骨鲷丂七集河市弦喜嘴张舌堵区工业姊妹星架构巧彩扭歪拼凑余热曜武州爷浮屠美乡老阶树荤素碎落能魄鳃鳗珠丄丅丆万俟丈尚摸母娘量管群亚虎必我堂令申件装伏位博侠义界表女墟台戏臭皮匠胜诸葛亮赛顶倍催请运算包立叉戟离疫苗土史志演围揭瓦晒夷姑婆帝村宝烂尖杉碱屉桌山岔岛由纪峡坝库镇废从德后拗汤治旬食明昧曹朋友框栏极权幂曲归依猫民氟硼氯磷铁江侗自旅法司洋浦梅园温暖湾焦班幸用田略番叠皇炮捶硝苯酸腺苷棱草镜穗跳远索锦纲聚氰胺联店胚膲爱色堇紫罗兰芝茶饭菱云虫藏藩乱叛苏亲债凳学座恐恋柱测肌腹衩锥系貂企乌跪叩军车农题迭都甘油屯奏键短阿姨陪姐只顾茅庐槽驾魂鲜鹿页其菜单乘任供势午齿汉组织吊调泻唇坡城报坟外夸将尉建筑岸岗公床扬新剑升杭林栗校楼标款汽社浣海商馆剧院钢华港机械广媒环球融第医科证券综财乐育游涨犹岭疏瘾睑确兵领导缴肢膛船艾瑟尔苍蔡虞效衫覆访诉课谕议轨述野钩限敌鞋颌颔颚饶首龈站例修凡划垂届属崽颏厨拜挫摆放旋削棋榻槛礼沉注滑营狱画确仪聘花葬诏员跌辖周达酒锚闸陷陆雨雪飞威丌于丹久乏予理评产亢卑亦乎舞己悲矩圆词害志但住佞佳便俗信票案幅翁倦伦假偏倚斜亏鬼敲停备伤脾胃仅此像俭匮免宜穴焉戴兼容许冻伯仲负彼昼皂轩轾实刊划颠卫战哥比省非好黄饰别拘束掩奶睬选择摇扰烦苦枚写协厌及格受欢迎约只估侵犯割状告或缺抗拒挽撤救药喻磨灭端倪少逆逾越避靠适吉誉吝玉含延咎歹听啻渊善谋均匀堪忍够太惹妙妥妨孕症孝术室完纳推冠积宣疑辩栗碴称屈挠屑干涉衡待很忙恶忿怎么怠急耻恭息悦惑惜惟想愉愧怍慌愤启懂懈怀材才紧招认扣抵拉舍也罢插揣冒搭撞南墙扩核支攻敢雷攀敬里吗需景智暇曾罪遇朽枉止况竞争辱求愈渝溶济左右袒困补爽特寂寞示弱找谢畏强疾徐痛痒冤符眠睦瞅董何厚云措活疲羞者轻玻璃祥兆禁���稂莠稳佛换答简结果盟绝缕途给谈否羁翼耐肖胫毋宁兴舒若菲莱痕迹窠臼虚衰脸兔撒鹰棺范该详讳抬泰让须眉象众赀账费灰赖奇虑训辍辨菽麦辛近送透逞徒速续逮捕遂遑违逊斧钺艰醉锈随观弃显饱脂肪使丏丐帮丒且慢末丕替桃宗王尊凉爵各图屋脊粮署录坛吾禄职胄袭君厦丗北壑桐疹损逢陵鹬丙寅戌氨腈唑纶辰酮脱氢酶醚丞丢现掉纱帽弄扯炮碗丠両丣坐存激肩臻蒂莲悖序驱丨丩丫挺杈髻鬟细介俄伊犁京尼布订普渡央委监察检查剂圈设警队斯督剩震境航舶革防托播促质版蝾螈锋研艺历残消频谱精密制造陲邮候埔坚压坜凹汇执府究邦俘摄寮彬狼岳肺肿庸英讯诊埋粒胞括控码韩暑枪枢砥澳哇牟寿甸钻探篇签缀缝继耳肯照妇埃悬璧轴柜台辣搁浅邪跑纤阮阳私囊魔丮丰姿采丱烧丳丵丶丷丸参寨朗桂瑞砂衷霞貌凤仆舰因嫌宰峰干络牌持旨祭祷簿编罚宾办丼丿乀乂乃乄仰慕盛旷留考验阔乆乇么丑麽乊湖燃乑乒乓乕乖僻忤戾离谬迕乗危肥劫除隙浪婿乙炔肠酰吡咯盐乚乛乜嘢卿玄宫尾狐龟塔嶷兄弟泉章霄钉耙乞扎哀怜恕讨乢乣乤乥乧乨乩童乪乫乭乳晕汁液瑶浆牙癌突窦罩腐胶猪酪蛋糕菌瘤乴乵乶乷乸乹乺乼乾俸冰嘉哕嚎坤妈尸垒旱枯涸俐渴潮涩煸豆燥爹瘦瘪癣瞪袋脆姜贝隆馏乿亀亁叫咕攘扔搞男砸窜蓬麻亃亄亅却亇迟典今临繁累卵奉婚聪躬巨与迁添裂副宿岁怪恶尕仑愣杆硅硫钛铀锰芑杂异钠砷胂磺琥珀舱棍簧胡茬盗浩
盆贩郎腿亍洪亐互欠助勉惠操斥诿系户译亓墓碑刑铃卅渠缤纷斗米旗宪钒灯徽瘟祖拳福谷丰脏腑绑肉腌苓蕴桥铺霸颜闹判喷冈底蛙陉矿亖亘亜罕们娜桑那努哈喀弗烈曼松森杜氏杯奥琛敦戊穆圣裔汇薛孙亟亡佚虏羊牢奋释卷卸契媾感额睫缠谊趾塞挤纽阻还配驰庄亨洛祚亪享津沪畿郊慈菴枇杷膏亭阁锃丽亳亶亹诛初责翻疯偶杰丛稠妖拖寰居吸授慧蜗吞壮魅狗矛盾益渣患忧稀描猿梦暂涯畜祸缘沸搜引擎臣横纭谁混援蒸兽狮税剖亻亼亽亡什献刹邡么仂仃仄仆富怨仈仉毕昔晨壳绍仍仏仒仕宦仗欺恃腰叹叹炬梓讫施仙后琼逝仚仝仞仟悔仡佬偿填泊拓扑簇羔购顿钦佩发棻阃驭养亿儆尤借帧赈凌叙帖李柔刚沃眦睚戒讹取飨读仨仫仮著泳卧躺韶夏裁仳仵唯贤凭钓诞仿似宋佛讽伀硕盼鹅伄儅伈伉俪柯始娃迈戈坦堡帕茨萨庙玛莉莎藤霍姆伋伍奢胥廷芳豪伎俩侍汛勒希羲雏伐憩整谟闲闲伕伙伴颐伜伝伢叔恒兹恩翰伱伲侣伶俜悧鼬伸懒缩喇叭伹伺伻伽倻辐伾似佃伫布乔妮墨佉卢佌贷劣廉昂档浓矮伞洼缓耗胸谷迷挡率龋宅沫舍疗佐贰佑占优据铧尝呢须鲁晓佗佘余坪寺瓜铳僧蒙芒陀龛哼呕坊奸孽弊揖祟茧缚誓贼佝偻瞀佟你夺赶佡佢佣佤佧贾佪佫佯佰佱洁绩酿肴佴卷佶佷佸佹佺佻佼佽佾具唤窘坏娱怒慨硬习惯聋膨胀蔓骇贵痹侀侁侂侃侄侅鸿燕侇侈糜靡侉侌妾侏儒仓鼠侐侑侔仑侘侚链侜偎傍钴循柳葫芦附価侮骂蔑侯岩截蚀局贴壶嬛宴捷携桶笺酌俣狭膝狄俅俉俊俏俎俑俓俔谚俚俛黎健呈固墒增守康箱湿祐镖镳杠盒靖膜龄俞豹猎噪孚封札筒托衍鸽剪撰稿炼厂禊练缮葺俯瞰撑冲效俳俴俵俶俷俺备俾伥倂倅储卒惶敷猝逃颉蓄崇隐倌倏忽刺蜡烛噍嚼坍扁抽毙葱楣灌灶粪背薮卖赔闭霉腾倓倔幸倘倜傥倝借箸挹浇阅倡狂倢倣値倥偬倨傲倩匡嗣冲柝珍倬倭寇猩倮倶倷倹勤赞偁偃充伪吏嗓寐惺扮拱芫茜藉虢钞偈伟晶偌宕距析滤殿疼瘫注颇偓偕鸭歇滞偝偟偢忘怡旺偨偩逼偫偭偯偰偱偲侦缉蹄偷减惰漏窥窃偸偺迹傀儡傅傈僳骂篱傎奎琳迪叟芭傒傔傕伧悉荒傜傞傢傣芽逼佣婢傮睨寄檄诵谣颂伛担辜弓惨蒿悼疤傺傻屄臆巢泄箧羡盖轧颓傿㑩僄僇佥僊働僎侨僔僖僚僝伪僣僤侥僦猴偾僩僬僭僮僯僰雇僵殖签静僾僿征陇儁侬儃儇侩朴薄儊儋儌儍傧儓俦侪拟尽儜儞儤儦儩汰哉寡渥裕酷儭儱罐儳儵儹傩俨儽兀臬臲鹫允勋勋宙宵帅憝彝谐嫂阋畅沛溢盈饥赫凶悍狠猛顽愚妣斩秦遣鞭耀敏荣槃泽爆碟磁秃缆辉霁卤朵娄孜烽酱勃汀箕裘钳耶蒙蕾彻兑软遭黜兎児韵媳爸兕觥兖兙兛兜售鍪肚兝兞兟兡兢兣樽殓涅睡禀籍赘泌啡肽奸幕涵涝熵疚眷稃衬讧赴焕椒歼植跏没试误猜栖窗肋袖颊兪卦撇胡岐廓轿疸枫茴珑厕秩募勺吨寓斤历亩迫筷厘最淫螺韬兮宽匪筛襄赢轭复兲诈刃堰戎痞蚁饷它冀铸冂冃円冇冉册嫁厉砺竭醮冏牧冑冓冔冕冖冗冘冞冢窄抑诬冥冫烘菇蛰冷凝坨橇淇淋炭饼砖碛窖醋雕雹霜冱冶炉艳嘲峻滩淡漠煖飕饮冼冽凃凄怆梗凅凇净凊凋敝蒙凔凛遵汞脢凞几凢処凰凯凵凶焰凸折刷纹预丧喽奔巡榜殡芙蓉租笼辑鞘萃凼锯镬刁蛮刂娩崩批拆摊掰蘖骤歧颗秒袂赃勿嘱忌磋琢肤刈羽刎讼戮舂桨艇刓刖霹雳刜创犊刡恙墅帜筵致劫劫刨昏默攸尿欲熏润薰圭删刮痧铲刱刲刳刴刵踏磅戳柏槐绣芹苋猬舟铭鹄鹜劫剁剃辫刭锉履铅克剌姻咽哨廊掠桅沿召瞻翅赵卜渺茫郭剒剔剕沥剚愎毅讷才剜剥啄采剞剟剡剣剤䌽剐肾驶黏剰袍剀紊铲剸剺剽剿劁劂札劈啪柴扳啦刘奭姥夼昫涓熙禅禹锡翔雁鹗刽刿弩柄蜻蛉劒劓劖劘劙澜篑赏矶釜晋甜薪逐劦熔纣虐赤囚劬劭労劵效劻劼劾峭艮勅勇励勍勐腊脖庞漫饲荡粥辄勖勗勘骄馁碌泮雇捐竹骑殊阱绩朴恳谨剿勧勩勯勰劢勋勷劝惩慰诫谏勹芡践阑匁庇拯粟扎袱裹饺匆遽匈匉匊匋匍匐茎匏匕妆痰脓蛹斋苑烤蹈塘羌熊阀螳螂疆碚竿纬荷茵邙魏匚匜匝匟扶稷匣匦拢匸匹耦匽匾匿卂叮疮禧轸堤棚迢钧炼卄卆遐卉瓷盲瓶当胱腱裸卋卌卍卐怯污贱鄙龌龊陋卓溪唐梯渔陈枣泥漳浔涧梨芬谯赡辕迦郑単驴弈洽鳌卛占筮卝卞卟吩啉屎翠厄卣卨卪卬卮榫袄玺绶钮蚤惧殆笃耸卲帘帙绕恤卼卽厂厎厓厔厖厗奚厘厍厜厝谅厕厤厥厪腻孢厮厰厳厣厹厺粕垢芜菁厼厾叁悟茸薯叄吵笄悌哺讥坫垄弧芯杠潜婴刍袁诘贪谍煽馈驳収岳缔灾贿骗叚叡吻拦蘑蜜诀燧玩砚筝椎蔺铜逗骊另觅叨唠谒杵姓喊嚷嚣咚咛塑寻恼憎擦只泣渗蝠叱吒咄咤喝籀黛舵舷叵叶铎懿昭穰苴辽叻叼吁堑嫖赌瞧爬众抒吅吆夥卺橡涤抱纵摩郡唁坠扇篮膀袜颈吋忾谘酬哭妓媛暗表缰迩妃羿絮蕃浑拐葵暮隅吔吖啶嗪戚吜啬噬咽吟哦咏吠吧唧嗒咐吪隽咀征燐苞茹钙哧吮吰吱嘎吲哚吴栋娇窟孟箫忠晗淞阖闾趼宇呐睛嘘拂捧疵熄竽笛糠吼吽呀吕韦蒙呃呆笨呇贡呉罄呋喃呎呏呔呠呡痴呣呤呦呧瑛眩扒晬淑姬瑜璇鹃呪呫哔嚅嗫呬呯呰呱呲咧噌钝呴呶呷呸呺呻哱咻啸噜吁坎坷逻呿咁咂咆哮咇咈咋蟹煦珅蔼咍咑咒诅咔哒嚓咾哝哩喱咗咠咡咢咣咥咦咨嗟询咩咪咫啮啮咭咮咱咲咳呛嗽咴啕咸咹咺呙喉咿婉恸悯赋矜绿茗蓝哂抢瞒哆嗦啰噻啾滨彗哋哌哎唷哟哏哐哞哢哤哪里哫啼喘哰哲萎蚌哳咩哽哿呗唅唆唈唉唎唏哗尧棣殇璜睿肃唔睇唕吣唞唣喳唪唬唰喏唲唳唵嘛唶唸唹唻唼唾唿啁啃鹦鹉啅埠栈榷祺铺鞅飙啊啍啎啐啓啕啖啗啜哑祈啢衔啤啥啫啱啲啵啺饥啽噶昆沁喁喂喆裙喈咙喋喌喎喑喒喓喔粗喙幛庆滋鹊喟喣喤喥喦喧骚喨喩梆吃葡萄喭驼挑吓碰枞瓣
纯疱藻趟铬喵営喹喺喼喿嗀嗃嗄嗅嗈嗉嗊嗍嗐嗑嗔诟嗕嗖嗙嗛嗜痂癖嗝嗡嗤嗥嗨唢嗬嗯嗰嗲嗵叽嗷嗹嗾嗿嘀嘁嘂嘅惋嘈峪禾荫啀嘌嘏嘐嘒啯啧嘚唛嘞嘟囔嘣嘥嘦嘧嘬嘭这谑严敞馋松哓嘶嗥呒虾嘹嘻啴嘿噀噂噅噇噉噎噏噔噗噘噙噚咝噞噢噤蝉皿噩噫噭嗳噱哙噳嚏涌洒欲巫霏噷噼嚃嚄嚆抖哜尝嚔苏嚚嚜嚞嚟呖嚬嚭嚮嚯亸喾饬按竣苛嚵嘤啭冁呓膪谦囍囒囓囗囘萧酚飘溅谛囝溯眸纥銮鹘囟殉囡団囤囥囧囨囱囫囵囬囮囯囲図囶囷囸囹圄圉拟囻囿圀圂圃圊粹蠹赦圌垦圏滚鲱凿枘圕圛圜圞坯埂壤骸炕祠窑豚绅魠鲮鳖圧握圩圪垯圬圮圯炸岬幔毯祇窨菩溉圳圴圻圾坂坆沾坋坌舛壈昆垫墩椅坒坓坩埚坭坰坱坳坴坵坻坼杨挣涎帘垃垈垌垍垓垔垕垗垚垛垝垣垞垟垤垧垮垵垺垾垿埀畔埄埆埇埈埌殃隍埏埒埕埗埜垭埤埦埧埭埯埰埲埳埴埵埶绋埸培怖桩础辅埼埽堀诃侄庑堃堄摧磐贞韧砌堈堉垩堋堌堍堎垴堙堞堠礁堧堨舆堭堮蜓摘堲堳堽堿塁塄塈煤茔棵塍垲埘塓绸塕鸦沽虱塙冢塝缪塡坞埙塥塩塬塱场螨塼塽塾塿墀墁墈墉墐夯増毁墝墠墦渍钵墫墬堕墰墺墙橱壅壆壊壌壎壒榨蒜壔壕壖圹垆壜壝垅壡壬壭壱売壴壹壻壸寝壿夂夅夆変夊夌漱邑夓腕泄甥御骼夗夘夙衮瑙妊娠醣枭珊莺鹭戗幻魇夤蹀秘擂鸫姚宛闺屿庾挞拇賛蛤裨菠氅漓捞湄蚊霆鲨箐篆篷荆肆舅荔鲆巷惭骰辟邱镕镰阪漂烩鲵鲽鳄鸨胪鹏妒峨谭枰晏玑癸祝秤竺牡籁恢罡蝼蝎赐绒御梭夬夭砣榆怙枕夶夹馅奄崛葩谲奈贺祀赠奌奂奓奕䜣詝奘奜奠奡奣陶奨奁魁奫奬奰娲孩贬隶酥宄狡猾她姹嫣妁毡荼皋膻蝇嫔妄妍嫉媚娆妗趣妚妞妤碍妬娅妯娌妲妳妵妺姁姅姉姗姒姘姙姜姝姞姣姤姧姫姮娥姱姸姺姽婀娀诱慑胁娉婷娑娓娟娣娭娯娵娶娸娼婊婐婕婞婤婥溪孺婧婪婬婹婺婼婽媁媄媊媕媞媟媠媢媬媮妫媲媵媸媺媻媪眯媿嫄嫈袅嫏嫕妪嫘嫚嫜嫠嫡嫦嫩嫪毐嫫嫬嫰妩嫺娴嫽嫿妫嬃嬅嬉耍婵痴艳嬔嬖嬗嫱袅嫒嬢嬷嬦嬬嬭幼嬲嬴婶嬹嬾嬿孀娘孅娈孏曰癫屏孑孓雀孖斟篓谜摺孛矻鸠崮轲祜鸾孥邈毓棠膑孬孭孰孱孳孵泛罔衔孻孪宀宁冗拙株薇掣抚琪瓿榴谧弥宊濂祁瑕宍宏碁宓邸谳実潢町宥宧宨宬徵崎骏掖阙臊煮禽蚕宸豫寀寁寥寃檐庶寎暄碜寔寖寘寙寛寠苫寤肘洱滥蒗陕核寪弘绰螽宝擅疙瘩晷対檐専尃尅赎绌缭畴衅尌峙醌襟痲碧屁昊槌淘恵瀑牝畑莓缸羚觑蔻脏躁尔尓锐尗尙尜尟尢��尨尪尬尭尰擒尲尶尴尸尹潽蠖蛾尻扣梢蚴鳍脬蹲屇屌蚵屐屃挪屖屘屙屛屝屡屣峦嶂岩舄屧屦屩屪屃屮戍驻钾崖嵛巅旮旯楂榄榉芋茱萸靛麓屴屹屺屼岀岊岌岍阜岑彭巩岒岝岢岚岣岧岨岫岱岵岷峁峇峋峒峓峞峠嵋峨峰峱岘峹峿崀崁崆祯崋崌崃岖昆崒崔嵬巍萤颢崚崞崟崠峥巆崤崦崧殂岽崱崳崴崶崿嵂嵇嵊泗嵌嵎嵒嵓岁嵙嵞嵡嵩嵫嵯嵴嵼嵾嵝崭崭晴嶋嶌嶒嶓嵚崂嶙嶝嶞峤嶡嶢峄嶨嶭嶮嶰嶲岙嵘巂巃巇巉岿巌巓巘巛滇芎巟巠弋回巣巤炊擘蜥蟒蛊觋巰蜀彦淖杏茂甫楞巻巽帼巿帛斐鲫蕊帑帔帗帚琉汶帟帡帣帨裙帯帰帷帹暆帏幄帮幋幌幏帻幙帮幞幠幡幢幦幨幩幪帱幭幯幰遥蹉跎馀庚鉴幵幷稚邃庀庁広庄庈庉笠庋跋庖牺庠庤庥鲸庬庱庳庴庵馨衢庹庿廃厩廆廋廌廎廏廐廑廒荫廖廛厮搏锣廞弛袤廥廧廨廪廱绵踵髓廸迫瓯邺廻廼廾廿躔弁皱弇弌弍弎弐弑吊诡憾荐弝弢弣弤弨弭弮弰弪霖繇焘斌旭溥骞弶弸弼弾彀彄别累纠强彔彖彘彟彟陌彤贻彧绘虹彪炳雕蔚鸥彰瘅彲彳彴仿彷徉徨彸彽踩敛旆徂徇徊渭畲铉裼従筌徘徙徜徕膳苏萌渐徬徭醺徯徳徴潘徻徼忀瘁胖燎怦悸颤扉犀澎湃砰恍惚绞隘忉惮挨饿忐忑忒忖応忝忞耿忡忪忭忮忱忸怩忻悠懑怏遏怔怗怚怛怞怼黍讶怫怭懦怱怲恍怵惕怸怹恁恂恇恉恌恏恒恓恔恘恚恛恝恞恟恠恣恧眄恪恫恬澹恰恿悀悁悃悄悆悊悐悒晦悚悛悜悝悤您悩悪悮悰悱凄恻德悴怅惘闷悻悾惄愫钟蒐惆惇惌惎惏惓惔惙惛耄惝疟浊恿惦德恽惴蠢惸拈愀愃愆愈愊愍愐愑愒愓愔愕恪氓蠢騃昵惬赧悫愬愮愯恺愼慁恿慅慆慇霭慉慊愠慝慥怄怂慬慱悭慴慵慷戚焚憀灼郁憃惫憋憍眺捏轼愦憔憖憙憧憬憨憪憭怃憯憷憸憹憺懃懅懆邀懊懋怿懔懐懞懠懤懥恹懫懮懰懱毖懵遁梁雍忏懽戁戄戆戉戋戕戛戝戛戠戡戢戣戤戥戦戬戭戯轰戱披菊牖戸戹戺戻卯戽锹扂楔扃扆扈扊杖牵绢铐镯赉扐搂搅烊盹瞌跟趸镲靶鼾払扗玫腮扛扞扠扡扢盔押扤扦扱罾揄绥鞍郤窾扻扼扽抃抆抈抉抌抏瞎抔缳缢擞抜拗択抨摔歉蹿牾抶抻搐泵菸拃拄拊髀抛拌脯拎拏拑擢秧沓曳挛迂拚拝拠拡拫拭拮踢拴拶拷攒拽掇芥橐簪摹疔挈瓢骥捺蹻挌挍挎挐拣挓挖掘浚挙揍聩挲挶挟挿捂捃捄捅捆捉捋胳膊揎捌捍捎躯蛛捗捘捙捜捥捩扪捭据捱捻捼捽掀掂抡臀膘掊掎掏掐笙掔掗掞棉芍掤搪阐掫掮掯揉掱掲掽掾揃揅揆搓揌诨揕揗揘揜揝揞揠揥揩揪揫橥遒麈揰揲揵揶揸背揺搆搉搊搋搌搎搔搕撼橹捣搘搠搡搢搣搤搥搦搧搨搬楦裢讪赸掏搰搲搳搴揾搷搽搾搿摀摁摂摃摎掴摒摓跤摙摛掼摞摠摦喉羯摭摮挚摰摲抠摴抟摷掺摽撂撃撅稻撊撋挦锏泼撕撙撚㧑挢撢掸撦撅撩撬撱朔揿蚍蜉挝捡擀掳闯擉缶觚擐擕擖擗擡擣擤澡腚擧擨擩擫擭摈拧撷擸撸擽擿攃摅撵攉攥攐攓撄搀撺每攩攫辔澄攮攰攲攴轶攷砭讦攽碘敁敃敇敉叙敎筏敔敕敖闰诲敜煌敧敪敳敹敺敻敿斁衽斄牒绉诌斉斎斓鹑谰驳鳢斒筲斛斝斞斠斡斢斨斫斮晾沂潟颖绛邵斲斸釳於琅斾斿旀旗旃旄涡旌旎旐旒旓旖旛旝旟旡旣浴旰獭魃旴时旻旼旽昀昃昄昇昉晰躲澈熹皎皓矾昑昕
昜昝昞昡昤晖笋昦昨是昱昳昴昶昺昻晁蹇隧蔬髦晄晅晒晛晜晞晟晡晢晤晥曦晩萘莹顗晿暁暋暌暍暐暔暕煅旸暝暠暡曚暦暨暪朦胧昵暲殄冯暵暸暹暻暾曀晔昙曈曌曏曐暧曘曙曛叠昽曩骆曱甴肱曷牍禺锟曽沧耽朁朅朆杪栓夸竟粘绦朊膺朏朐朓朕朘朙瞄觐溘饔飧朠朢朣栅椆淀虱朩朮朰朱炆璋钰炽鹮朳槿朵朾朿杅杇杌陧欣钊湛漼楷瀍煜玟缨翱肇舜贽适逵杓杕杗杙荀蘅杝杞脩珓筊杰榔狍閦颦缅莞杲杳眇杴杶杸杻杼枋枌枒枓衾葄翘纾逋枙狸桠枟槁枲枳枴枵枷枸橼枹枻柁柂柃柅柈柊柎某柑橘柒柘柙柚柜柞栎柟柢柣柤柩柬柮柰柲橙柶柷柸柺査柿栃栄栒栔栘栝栟柏栩栫栭栱栲栳栴檀栵栻桀骜桁镁桄桉桋桎梏椹葚桓桔桕桜桟桫椤桭杯桯桲桴桷桹湘溟梃梊梍梐潼栀枧梜梠梡梣梧梩梱梲梳梴梵梹棁棃樱棐棑棕榈簑绷蓑枨棘棜棨棩棪棫棬棯棰棱棳棸棹椁棼碗椄苕椈椊椋椌椐椑椓椗検椤椪椰椳椴椵椷椸椽椿楀匾楅篪楋楍楎楗楘楙楛楝楟楠楢楥桢楩楪楫楬楮楯楰梅楸楹楻楽榀榃榊榎槺榕榖榘榛狉莽搒笞榠榡榤榥榦榧杩榭榰榱梿霰榼榾桤槊闩槎槑槔槖様槜槢槥椠槪槭椮槱槲槻槼槾樆樊樏樑樕樗樘樛樟樠樧樨権樲樴樵猢狲桦樻罍樾樿橁橄橆桡笥龠橕橚橛辆椭橤橧竖膈跨橾橿檩檃檇柽檍檎檑檖檗桧槚檠樯檨檫檬梼槟檴檵柠棹櫆櫌栉櫜椟櫡槠栌枥榇栊櫹棂茄櫽欀欂欃欐欑栾欙棂溴欨欬欱欵欶欷歔欸欹欻欼欿歁歃歆艎歈歊莳蝶歓歕歘歙歛歜欤歠蹦诠镶蹒跚升陟歩歮歯歰歳歴璞歺瞑歾殁夭殈殍殑殗殜殙殛殒殢殣殥殪殚僵殰殳荃殷殸殹蛟殻肴谤殴毈毉喂毎���蕈毗毘毚茛邓毧毬毳毷毹毽毾毵牦氄氆靴氉氊氇氍氐聊氕氖気氘氙氚氛氜氝氡汹焊痉氤氲氥氦铝锌氪烃氩铵痤汪浒漉痘盂碾菖蒲蕹蛭螅氵冰氹氺氽烫氾氿渚汆汊汋汍汎汏汐汔汕褟汙汚汜蓠沼秽蔑汧汨汩汭汲汳汴堤汾沄沅沆瀣沇沈葆浸沦湎溺痼疴沌沍沏沐沔沕沘浜畹砾沚沢沬沭沮沰沱灢沴沷籽沺烹濡洄泂肛泅泆涌肓泐泑泒泓泔泖泙泚泜泝泠漩馍涛粼泞藓鳅泩泫泭泯铢泱泲洇洊泾琵琶荽蓟箔洌洎洏洑潄濯洙洚洟洢洣洧洨洩痢滔洫洮洳洴洵洸洹洺洼洿淌蜚浄浉浙赣渫浠浡浤浥淼瀚浬浭翩萍浯浰蜃淀苔蛞蝓蜇螵蛸煲鲤浃浼浽溦涂涊涐涑涒涔滂莅涘涙涪涫涬涮涴涶涷涿淄淅淆淊凄黯淓淙涟淜淝淟淠淢淤渌淦淩猥藿亵淬淮淯淰淳诣涞纺淸淹炖癯绮渇済渉渋渓渕涣渟渢滓渤澥渧渨渮渰渲渶渼湅湉湋湍湑湓湔黔湜湝浈湟湢湣湩湫湮麟湱湲湴涅満沩溍溎溏溛舐漭溠溤溧驯溮溱溲溳溵溷溻溼溽溾滁滃滉滊荥滏稽滕滘汇滝滫滮羼耷卤滹浐煎漈漊漎绎漕漖漘漙沤漜漪漾漥漦漯漰溆漶漷濞潀颍潎潏潕潗潚潝潞潠潦祉疡潲潵滗潸潺潾涠澁澂澃澉澌澍澐澒澔澙渑澣澦澧澨澫澬浍澰澴澶澼熏郁濆濇濈濉濊貊濔疣濜濠濩觞浚濮盥潍濲泺瀁滢渎渖瀌浏瀒瀔濒泸瀛潇潆瀡潴泷濑瀬弥潋瀳瀵瀹瀺瀼沣滠灉灋灒漓灖灏灞灠滦灥灨滟灪蜴灮烬獴灴灸灺炁炅鱿炗炘炙炤炫疽烙钎炯炰炱炲炴炷毁炻烀烋瘴鲳烓烔焙烜烝烳饪烺焃焄耆焌焐焓焗焜焞焠焢焮焯焱焼煁煃煆煇煊熠煍熬煐炜煕暖熏硷霾煚煝煟煠茕矸煨琐炀萁煳煺煻熀熅熇熉罴荧穹炝熘熛熜稔谙烁熤熨熯熰眶蚂颎熳熸熿燀烨燂燄盏燊燋燏燔隼燖焖燠燡灿燨燮燹燻燽燿爇爊爓爚爝爟爨蟾爯爰为爻丬爿牀牁牂牄牋窗牏牓窗釉牚腩蒡虻牠虽蛎牣牤牮牯牲牳牴牷牸牼绊牿靬犂犄犆犇犉犍犎犒荦犗犛犟犠犨犩犪犮犰狳犴犵犺狁甩狃狆狎狒獾狘狙黠狨狩狫狴狷狺狻豕狈蜘猁猇猈猊猋猓猖獗猗猘狰狞犸猞猟獕猭猱猲猳猷猸猹猺玃獀獃獉獍獏獐獒毙獙獚獜獝獞獠獢獣獧鼇蹊狯猃獬豸狝獯鬻獳犷猕猡玁菟玅玆玈珉糁禛郅玍玎玓瓅玔玕玖玗玘玞玠玡玢玤玥玦珏瑰玭玳瑁玶玷玹玼珂珇珈瑚珌馐馔珔珖珙珛珞珡珣珥珧珩珪佩珶珷珺珽琀琁陨玡琇琖琚琠琤琦琨琫琬琭琮琯琰琱琲琅琴珐珲瑀瑂瑄瑉玮瑑瑔瑗瑢瑭瑱瑲瑳瑽瑾瑿璀璨璁璅璆璈琏璊璐璘璚璝璟璠璡璥瑷璩璪璫璯璲玙璸璺璿瓀璎瓖瓘瓒瓛脐瓞瓠瓤瓧瓩瓮瓰瓱瓴瓸瓻瓼甀甁甃甄甇甋甍甎甏甑甒甓甔瓮甖甗饴蔗甙诧钜粱盎锈团甡褥産甪甬甭甮宁铠甹甽甾甿畀畁畇畈畊畋畎畓畚畛畟鄂畤畦畧荻畯畳畵畷畸畽畾疃叠疋疍疎箪疐疒疕疘疝疢疥疧疳疶疿痁痄痊痌痍痏痐痒痔痗瘢痚痠痡痣痦痩痭痯痱痳痵痻痿瘀痖瘃瘈瘉瘊瘌瘏瘐痪瘕瘖瘙瘚瘛疭瘜瘝瘗瘠瘥瘨瘭瘆瘯瘰疬瘳疠瘵瘸瘺瘘瘼癃痨痫癈癎癐癔癙癜癠疖症癞蟆癪瘿痈発踔绀蔫酵皙砬砒翎翳蔹钨镴皑鹎驹暨粤褶皀皁荚皃镈皈皌皋皒朱皕皖皘皜皝皞皤皦皨皪皫皭糙绽皴皲皻皽盅盋碗盍盚盝踞盦盩秋千盬盭眦睁瞤盯盱眙裰盵盻睐眂眅眈眊県眑眕眚眛眞眢眣眭眳眴眵眹瞓眽郛睃睅睆睊睍睎困睒睖睙睟睠睢睥睪睾睯睽睾眯瞈瞋瞍逛瞏瞕瞖眍䁖瞟瞠瞢瞫瞭瞳瞵瞷瞹瞽阇瞿眬矉矍铄矔矗矙瞩矞矟矠矣矧矬矫矰矱硪碇磙罅舫阡、矼矽礓砃砅砆砉砍砑砕砝砟砠砢砦砧砩砫砮砳艏砵砹砼硇硌硍硎硏硐硒硜硖砗磲茚钡硭硻硾碃碉碏碣碓碔碞碡碪碫碬砀碯碲砜碻礴磈磉磎硙磔磕磖磛磟磠磡磤磥蹭磪磬磴磵磹磻硗礀硚礅礌礐礚礜礞礤礧礮砻礲礵礽礿祂祄祅祆禳祊祍祏祓祔祕祗祘祛祧祫祲祻祼饵脔锢禂禇禋祦禔祎隋禖禘禚禜禝禠祃禢禤禥禨禫祢禴禸秆秈秊闱飒秋秏秕笈蘵赁秠秣秪秫秬秭秷秸稊稌稍稑稗稙稛稞稬秸稲稹稼颡稿穂穄穇穈穉穋稣贮穏穜穟秾穑穣穤穧穨穭穮穵穸窿阒窀窂窅窆窈窕窊窋窌窒窗窔窞窣窬
黩蹙窑窳窴窵窭窸窗竁竃竈竑竜并竦竖篦篾笆鲛竾笉笊笎笏笐靥笓笤箓笪笫笭笮笰笱笲笳笵笸笻筀筅筇筈筎筑筘筠筤筥筦笕筒筭箸筰筱筳筴宴筸箂个箊箎箑箒箘箙箛箜篌箝箠箬镞箯箴箾篁筼筜篘篙篚篛篜篝篟篠篡篢篥篧篨篭篰篲筚篴篶篹篼箦簁簃簆簉簋簌簏簜簟簠簥簦簨簬簰簸簻籊藤籒籓籔签籚篯箨籣籥籧笾簖籫籯芾麴籵籸籹籼粁秕粋粑粔粝粛粞粢粧粨粲粳稗粻粽辟粿糅糆糈糌糍糒糔萼糗蛆蹋糢糨糬粽糯糱籴粜糸糺紃蹼鲣霉纡纨绔纫闽襻紑纰纮锭鸢鹞纴紞紟扎紩紬绂绁纻紽紾绐絁絃絅経絍绗絏缡褵絓絖絘絜绚絣螯絪絫聒絰絵绝絺絻絿綀绡綅绠绨绣綌綍綎捆綖綘継続缎绻綦綪线綮綯绾罟蝽綷縩绺绫緁绲緅緆缁绯緌緎総緑绱緖缃缄缂绵缗緤褓缌纂緪緰缑缈缏缇縁縃縄萦缙缒縏缣縕缞縚缜缟缛縠縡縢縦绦縯縰骋缧縳纤缦絷缥縻衙縿繄缫繈繊繋繐缯繖繘繙繠缋繣繨缰缲繸繻缱纁纆纇缬缵纩纑纕缵纙纚纛缾罃罆坛罋罂罎罏罖罘罛罝罠罣罥罦罨罫罭锾罳罶罹罻罽罿羂羃羇芈蕉51鸵羑羖羌羜羝羢羣羟羧羭羮羰羱羵羶羸藜鲐翀翃翅翊翌翏翕翛翟翡翣翥翦跹翪翫翚翮翯翱翽翾翿板饕鸹锨耋耇耎耏专耒耜耔耞耡耤耨耩耪耧耰鬓耵聍聃聆聎聝聡聦聱聴聂聼阈聿肄肏肐肕腋肙肜肟肧胛肫肬肭肰肴肵肸肼胊胍胏胑胔胗胙胝胠铨胤胦胩胬胭胯胰胲胴胹胻胼胾脇脘脝脞脡脣脤脥脧脰脲脳腆腊腌臜腍腒腓胨腜腠脶腥腧腬腯踝蹬镣腴腶蠕诽膂腽嗉膇膋膔腘膗膙膟黐膣膦膫膰膴膵膷脍臃臄臇臈臌臐臑臓膘臖臙臛臝臞臧蓐诩臽臾臿舀舁鳑鲏舋舎舔舗馆舝舠舡舢舨舭舲舳舴舸舺艁艄艅艉艋艑艕艖艗艘艚艜艟艣舣艨艩舻艬艭荏艴艳艸艹艻艿芃芄芊萰陂藭芏芔芘芚蕙芟芣芤茉芧芨芩芪芮芰鲢芴芷芸荛豢芼芿苄苒苘苙苜蓿苠苡苣荬苤苎苪镑苶苹苺苻苾茀茁范蠡萣茆茇茈茌茍茖茞茠茢茥茦菰茭茯茳藨茷藘茼荁荄荅荇荈菅蜢鸮荍荑荘豆荵荸荠莆莒莔莕莘莙莚莛莜莝莦莨菪莩莪莭莰莿菀菆菉菎菏菐菑菓菔芲菘菝菡菢菣菥蓂菧菫毂蓥菶菷菹醢菺菻菼菾萅萆苌萋萏萐萑萜萩萱萴莴扁萻葇葍葎葑荭葖葙葠葥苇葧葭药葳葴葶葸葹葽蒄蒎莼茏薹莅蒟蒻蒢蒦蒨蒭藁蒯蒱鉾蒴蒹蒺蒽荪蓁蓆蓇蓊蓌蓍蓏蓓蓖蓧蓪蓫荜跣藕苁蓰蓱莼蓷蓺蓼蔀蔂蔃蔆蔇蔉蔊蔋蔌蔎蔕蔘蔙蒌蔟锷蒋雯茑蔯蔳麻蔵蔸蔾荨蒇蕋蕍荞蕐蕑芸莸蕖蕗蕝蕞蕠蕡蒉蕣蕤蕨蕳蓣蕸蕺蕻薀薁薃薅薆荟薉芗薏薐蔷薖薘剃谔钗薜薠薢薤薧薨薫薬薳薶薷薸薽薾薿藄藇藋荩藐藙藚藟藦藳藴苈藷藾蘀蘁蕲苹蘗蘘蘝蘤蘧蘩蘸蘼虀虆虍蟠虒虓虖虡虣虥虩虬虰蛵蛇虷鳟虺虼蚆蚈蚋蚓蚔蚖蚘蚜蚡蚣蚧蚨蚩蚪蚯蚰蜒蚱蚳蚶蚹蚺蚻蚿蛀蛁蛄蛅蝮蛌蛍蛐蟮蛑蛓蛔蛘蛚蛜蛡蛣蜊蛩蛱蜕螫蜅蚬蜈蝣蜋蜍蜎蜑蠊蜛饯蜞蜣蜨蜩蜮蜱蜷蜺蜾蜿蝀蝃蝋蝌蝍蝎蝏蝗蝘蝙蝝鲼蝡蝤蝥猿蝰虻蝲蝴蝻螃蠏蛳螉螋螒螓螗螘螙螚蟥螟螣螥螬螭䗖螾螀蟀蟅蝈蟊蟋蟑蟓蟛蟜蟟蟢虮蟨蟪蟭蛲蟳蛏蟷蟺蟿蠁蠂蠃虿蠋蛴蠓蚝蠗蠙蠚蠛蠜蠧蟏蠩蜂蠮蠰蠲蠵蠸蠼蠽衁衄衄衇衈衉衋衎衒同衖胡衞裳钩衭衲衵衹衺衿袈裟袗袚袟袢袪袮袲袴袷袺袼褙袽裀裉袅裋夹裍裎裒裛裯裱裲裴裾褀褂褉褊裈褎褐褒褓褔褕袆褚褡褢褦褧褪褫袅褯褰褱裆褛褽褾襁褒襆裥襉襋襌襏襚襛襜裣襞襡襢褴襦襫襬襭襮襕襶襼襽襾覂覃覅霸覉覊覌覗觇覚覜觍觎覧覩觊觏覰観觌觔觕觖觜觽觝觡酲觩觫觭觱觳觯觷觼觾觿言赅讣訇訏訑訒诂讬訧訬訳訹证訾詀詅诋毁詈詊讵詑诒诐詗诎察詨诜詶詸詹詻诙诖誂誃诔锄诓誋诳诶悖誙诮诰誧説読誯谇訚谄谆諆諌诤诹诼諕谂谀諝谝諟喧谥諴諵谌谖誊謆謇歌謍謏謑谡谥謡謦謪谪讴謷謼谩哗譅譆譈譊讹譒撰谮鑫譞噪譩谵譬譱譲谴譸譹谫讅讆詟䜩雠讐谗谶讙谠讟谽豁豉豇岂豊豋豌豏豔豞豖豗豜豝豣豦豨豭豱豳豵豶豷豺豻貅貆狸猊貔貘䝙貜貤餍贳餸贶贲赂賏赊赇赒賝赓赕賨赍斗賮賵賸赚赙赜赟贉赆赑贕赝赬赭赱赳迄趁趂趄趐趑趒趔趡趦趫趮趯趱趴趵趷趹趺趿跁跂跅跆踬跄跐跕跖跗跙跛跦跧跩跫跬跮跱跲跴跺跼跽踅踆踈踉踊踒踖踘踜踟躇蹰踠踡踣踤踥踦踧跷踫踮逾踱踊踶踹踺踼踽躞蹁蹂躏蹎蹐蹓蹔跸蹚蹜蹝迹蹠蹡蹢跶蹧蹩蹪蹯鞠蹽躃躄躅踌跻躐踯跞躘躙躗躝躠蹑躜躧躩躭躰躬躶軃軆辊軏轫軘軜軝腭転軥軨軭軱轱辘軷轵轺軽軿輀輂辇辂辁輈挽輗辄辎辋輠輤輬輭輮辏輴輵輶輹輼辗辒轇轏轑轒辚轕轖轗轘轙轝轞轹轳罪辣辞辵辶辺込辿迅迋迍麿迓迣迤逦迥迨迮迸迺迻迿逄逅逌逍逑逓迳逖逡逭逯逴逶逹遄遅侦遘遛遝遢遨遫遯遰遴绕遹遻邂邅邉邋邎邕邗邘邛邠邢邧邨邯郸邰邲邳邴邶邷邽邾邿郃郄郇郈郔郕郗郙郚郜郝郞郏郠郢郪郫郯郰郲郳郴郷郹郾郿鄀鄄郓鄇鄈鄋鄍鄎鄏鄐鄑邹邬鄕郧鄗鄘鄚鄜鄞鄠鄢鄣鄤鄦鄩鄫鄬鄮鄯鄱郐鄷鄹邝鄻鄾鄿酃酅酆酇郦酊酋酎酏酐酣酔酕醄酖酗酞酡酢酤酩酴酹酺醁醅醆醊醍醐醑醓醖醝酝醡醤醨醪醭醯醰酦醲醴醵醸醹醼醽醾釂酾酽釆釈鲈镏阊钆钇钌钯钋鼢鼹钐钏釪釬釭釱钍釸钕钫鈃钭鈆鈇钚鈊鈌钤钣鈒鈤钬钪鈬铌铈钶铛钹铍钸钿鉄鉆铊铇鉌铋鉏铂钷铆钵鉥钲鉨钼钽鉱鉲鉶铰铒鉼铪銍銎铣銕镂铫铦铑铷銤铱铟銧铥铕铯銭銰焊銶锑锉汞鋂锒鋆鋈鋊铤鋍铗鋐鋑鋕鋘鋙锊锓锔锇铓鋭铖锆锂铽鋳鋹鋺鉴镚钎錀锞锖锫锩錍铔锕錔锱铮锛錞锬锜錤錩錬録铼錼锝钔锴鍉镀鍏鍐铡鍚锻锽锸锲锘鍫鍭鍱鍴锶鍹锗针锺锿镅鎉鎋鎌鎍鎏鎒鎓鎗镉鎚鎞镃鎤铩锼鎭鎯镒镍鎴镓
��鎹镎镟鏊镆镠镝鏖铿锵鏚镗镘镛鏠鏦錾镤鏸镪鏻鏽鏾铙鐄鐇鐏铹镦镡鐗馗镫镢镨鐡锎镄鐩镌鐬鐱镭鐶鐻鐽镱鑀鑅镔鑐鑕鑚鑛鑢鑤镥鑪镧鑯鑱鑴鑵镊镢钃镻闫闬闶闳閒闵閗閟阂関合閤哄阆閲阉閺阎阏阍阌暗闉阕阗闑闒闿闘闚阚闟闠闤闼阞阢阤阨阬阯阹阼阽陁陑陔陛陜陡陥陬骘陴険陼陾阴隃隈隒隗隞隠隣隤隩隮隰颧隳隷隹雂雈雉雊雎雑雒雗雘雚雝雟雩雰雱驿霂霅霈霊沾霒霓霙霝霢霣霤霨霩霪霫霮靁叇叆靑靓靣腼靪靮靰靳靷靸靺靼靿鞀鞃鞄鞍鞗鞙鞚鞝鞞鞡鞣鞨鞫鞬鞮鞶鞹鞾鞑韅鞯驮韍韎韔韖韘韝韫韡韣韭韭韱韹韺頀刮頄顸顼頍颀颃颁頖頞頠頫頬颅頯頲颕頼悴顋顑颙颛颜顕顚顜颟顣颥颞飐飑台飓颸飏飖颽颾颿飀飂飚飌翻飡飣饲飥饨饫飮飧飶餀餂饸饹餇餈饽哺馂餖餗餚馄馃餟餠餤餧餩餪餫糊餮糇餲饧馎糕饩馈馊馌馒饇馑馓膳饎饐饘饟馕馘馥馝馡馣骝骡馵馹駃駄駅駆駉駋驽駓驵駗骀驸駜骂骈駪駬骃駴骎駹駽駾騂騄骓騆騉騋骒骐麟騑騒験騕骛騠騢騣騤騧骧騵驺骟騺蓦骖骠骢驆驈骅驌骁驎骣驒驔驖驙驦驩驫骺鲠骫骭肮骱骴骶骷髅骾髁髂髄髆膀髇髑髌髋髙髝髞髟髡髣髧髪髫髭髯髲髳髹髺髽髾鬁鬃鬅鬈鬋鬎鬏鬐鬑鬒鬖鬗鬘鬙鬠鬣斗鬫鬬阄鬯鬰鬲鬵鬷魆魈魊魋魍魉魑魖鳔魛魟魣魦魨魬鲂魵魸鮀鲅鮆鲧鲇鲍鲋鮓鲒鲕鮟鱇鮠鮦鮨鲔鲑鮶鮸鮿鲧鯄鯆鲩鯈鲻鯕鲭鲞鯙鯠鲲鯥鲰鲶鳀鯸鳊鲗䲠鹣鳇鰋鳄鳆鰕鰛鰜鲥鰤鳏鰦鳎鳐鳁鳓鰶鲦鲡鰼鰽鱀鱄鳙鱆鳕鱎鱐鳝鳝鳜鲟鲎鱠鳣鱨鲚鱮鱲鱵鱻鲅鳦凫鳯鳲鳷鳻鴂鴃鴄鸩鴈鴎鸰鴔鴗鸳鸯鸲鹆鸱鴠鴢鸪鴥鸸鹋鴳鸻鴷鴽鵀鵁鸺鹁鵖鵙鹈鹕鹅鵟鵩鹌鵫鵵鵷鵻鹍鶂鶊鶏鶒鹙鶗鶡鶤鶦鶬鶱鹟鶵鶸鶹鹡鶿鹚鷁鷃鷄鷇䴘䴘鷊鷏鹧鷕鹥鸷鷞鷟鸶鹪鹩鷩鷫鷭鹇鹇鸴鷾䴙鸂鸇䴙鸏鸑鸒鸓鸬鹳鸜鹂鹸咸鹾麀麂麃麄麇麋麌麐麑麒麚麛麝麤麸面麫麮麯麰麺麾黁黈黉黢黒黓黕黙黝黟黥黦黧黮黰黱黪黶黹黻黼黾鼋鼂鼃鼅鼈鼍鼏鼐鼒冬鼖鼙鼚鼛鼡鼩鼱鼪鼫鼯鼷鼽齁齆齇齈齉齌赍齑龀齕齗龅齚龇齞龃龉龆齢出齧齩齮齯齰齱齵齾厐龑龒龚龖龘龝龡龢龤'
18
+
19
+ traditional_characters = '制咖片型超聲盤鑒定仔點他命書歌粉巾字帳恤手指記憶棒形轉彎溝光○〇㐄㐅㐆㐌㐖毒㐜㐡㐤㐰㐺㑇㑳㒳㒸㔾㗂㗎㝵㞎㞙㞞㠯㢲㢴㤅㥁㥯㨗㫺㬎㮎㮚㮸㲋㲱㲾㳮㵎㵪㶸㷖㷭㹢㹴犬㺢狓㺵㼝㽮㿝䍃䔢䖟䖸䗈䗥䗪䝓䠶䥯䦉䯝䰾魚䲔䳗䳘䵹鼄䶑一對應映射丁不識下兒子做二休世丘之貉並中台原則串為甚謂乾淨了百事無成八變五十些人得道雞升天代如併來去個國政策勁幽靈在歐洲遊蕩接樣蘿蔔坑側化傳價元論醇共再准刀兩斷切分耕耘收穫錢貨物向看舊就緒險刻千金動勞永逸匙零夜半卡通回復返影蹤反常態口咬氣句話同吐快吹周味呼諾嗚品紅鍋哄而散起唱和問三知生熟團漆黑火糟堆場空塊麵塌糊塗塵染壁廂夔已足多情露水大早到晚夫妻當關萬莫開失古恨套所料既往孔見提師要家主審寸陰難買鬥牛小撮部陣局展身層巴掌帆風順席地帶過年計於春頭載四季期被蛇怕井繩度願式份彈頃深前律徑心意念差愁孤行俱全房廳交遮打技長把抓死拿眼淚鼻涕鑰鎖折段抿拍即合掃排掬揮撥擁上入擊洞擲攬改故轍敗文值名斑方面旁族日秋餐隔雅里終父旦時晌會霎間晃暴寒曝更月望垠際朝夕本正經利杯羹東西板枝獨秀根筋桿進條龍服務概模次函數又性程總付步腳印趨登毛拔呵氧氮碳決雌雄波未平派謊言流清楚白準溜煙潭有獲聞是處降琴鶴甲病發可拾沙目然瞭直以相眨穿睹瞥瞬矢的解石鳥神教秉虔誠秘種窩蜂窮竅笑置筆苟勾銷抹殺煞等獎箍節吃箭仇雙鵰詩籌籮筐系列紙級士官統絲毫掛維網盡線微吭響股腦胎脈承腔臂力致效資源址器舉功投般說講規貿易葉障著慎滿皆輸號木電池衣傾鐘高低視仁覺醒覽遺角銀幣觸潰九鼎蔽抄出駟馬追重語破貧洗貫走路安蹴至幾蹶振躍役膽汗較輩輪辭贊退六連遍遞邊針血錘音錯門思閃真倒項栽霧類保護川先驚乍體鬨鱗爪鳴滴泡鄰域黨專鼓作齊炒丑烯亥克內酯冬加奴卯肝炎基尺梁街褲鎬客寵庭巳汝昌烷玲磊糖肇酉醛啷青縣韙良香骨鯛丂七集河市弦喜嘴張舌堵區工業姊妹星架構巧彩扭歪拼湊餘熱曜武州爺浮屠美鄉老階樹葷素碎落能魄鰓鰻珠丄丅丆万俟丈尚摸母娘量管群亞虎必我堂令申件裝伏位博俠義界表女墟臺戲臭皮匠勝諸葛亮賽頂倍催請運算包立叉戟離疫苗土史志演圍揭瓦曬夷姑婆帝村寶爛尖杉鹼屜桌山岔島由紀峽壩庫鎮廢從德後拗湯治旬食明昧曹朋友框欄極權冪曲歸依貓民氟硼氯磷鐵江侗自旅法司洋浦梅園溫暖灣焦班幸用田略番疊皇炮捶硝苯酸腺苷稜草鏡穗跳遠索錦綱聚氰胺聯店胚膲愛色堇紫羅蘭芝茶飯菱雲蟲藏藩亂叛蘇親債凳學座恐戀柱測肌腹衩錐係貂企烏跪叩軍車農題迭都甘油屯奏鍵短阿姨陪姐隻顧茅廬槽駕魂鮮鹿頁其菜單乘任供勢午齒漢組織吊調瀉唇坡城報墳外夸將尉建築岸崗公床揚新劍昇杭林栗校樓標款汽社浣海商館劇院鋼華港機械廣媒環球融第醫科證券綜財樂育游漲猶嶺疏癮瞼確兵領導繳肢膛船艾瑟爾蒼蔡虞傚衫覆訪訴課諭議軌述野鉤限敵鞋頜頷顎饒首齦站例修凡劃垂屆屬崽頦廚拜挫擺放旋削棋榻檻禮沉注滑營獄畫确儀聘花葬詔員跌轄週達酒錨閘陷陸雨雪飛威丌于丹久乏予理評產亢卑亦乎舞己悲矩圓詞害誌但住佞佳便俗信票案幅翁倦倫假偏倚斜虧鬼敲停備傷脾胃僅此像儉匱免宜穴焉戴兼容許凍伯仲負彼晝皂軒輊實刊划顛衛戰哥比省非好黃飾別拘束掩奶睬選擇搖擾煩苦枚寫協厭及格受歡迎約只估侵犯割狀告或缺抗拒挽撤救藥喻磨滅端倪少逆逾越避靠適吉譽吝玉含延咎歹聽啻淵善謀均勻堪忍夠太惹妙妥妨孕症孝術室完納推冠積宣疑辯慄碴稱屈撓屑干涉衡待很忙惡忿怎麼怠急恥恭息悅惑惜惟想愉愧怍慌憤啟懂懈懷材才緊招認扣抵拉捨也罷插揣冒搭撞南牆擴核支攻敢雷攀敬裡嗎需景智暇曾罪遇朽枉止況競爭辱求癒渝溶濟左右袒困補爽特寂寞示弱找謝畏強疾徐痛癢冤符眠睦瞅董何厚云措活疲羞者輕玻璃祥兆禁移稂莠穩佛換答簡結果盟絕縷途給談否羈翼耐肖脛毋寧興舒若菲萊痕跡窠臼虛衰臉兔撒鷹棺範該詳諱抬泰讓鬚眉象眾貲賬費灰賴奇慮訓輟辨菽麥辛近送透逞徒速續逮捕遂遑違遜斧鉞艱醉鏽隨觀棄顯飽脂肪使丏丐幫丒且慢末丕替桃宗王尊涼爵各圖屋脊糧署錄壇吾祿職胄襲君廈丗北壑桐疹損逢陵鷸丙寅戌氨腈唑綸辰酮脫氫酶醚丞丟現掉紗帽弄扯砲碗丠両丣坐存激肩臻蒂蓮悖序驅丨丩丫挺杈髻鬟細介俄伊犁京尼布訂普渡央委監察檢查劑圈設警隊斯督剩震境航舶革防托播促質版蠑螈鋒研藝歷殘消頻譜精密製造陲郵候埔堅壓壢凹匯執府究邦俘攝寮彬狼嶽肺腫庸英訊診埋粒胞括控碼韓暑槍樞砥澳哇牟壽甸鑽探篇簽綴縫繼耳肯照婦埃懸璧軸櫃檯辣擱淺邪跑纖阮陽私囊魔丮丰姿采丱燒丳丵丶丷丸參寨朗桂瑞砂衷霞貌鳳僕艦因嫌宰峰幹絡牌持旨祭禱簿編罰賓辦丼丿乀乂乃乄仰慕盛曠留考驗闊乆乇么醜麼乊湖燃乑乒乓乕乖僻忤戾离謬迕乗危肥劫除隙浪婿乙炔腸酰吡咯鹽乚乛乜嘢卿玄宮尾狐龜塔嶷兄弟泉章霄釘耙乞扎哀憐恕討乢乣乤乥乧乨乩童乪乫乭乳暈汁液瑤漿牙癌突竇罩腐膠豬酪蛋糕菌瘤乴乵乶乷乸乹乺乼乾俸冰嘉噦嚎坤媽屍壘旱枯涸俐渴潮澀煸豆燥爹瘦癟癬瞪袋脆薑貝隆餾乿亀亁叫咕攘扔搞男砸竄蓬麻亃亄亅卻亇遲典今臨繁累卵奉婚聰躬巨與遷添裂副宿歲怪噁尕崙愣杆硅硫鈦鈾錳芑雜異鈉砷胂磺琥珀艙棍簧胡茬盜浩
盆販郎腿亍洪亐互欠助勉惠操斥諉繫戶譯亓墓碑刑鈴卅渠繽紛斗米旗憲釩燈徽瘟祖拳福穀豐臟腑綁肉醃苓蘊橋鋪霸顏鬧判噴岡底蛙陘礦亖亙亜罕們娜桑那努哈喀弗烈曼松森杜氏盃奧琛敦戊穆聖裔彙薛孫亟亡佚虜羊牢奮釋卷卸契媾感額睫纏誼趾塞擠紐阻還配馳莊亨洛祚亪享津滬畿郊慈菴枇杷膏亭閣鋥麗亳亶亹誅初責翻瘋偶傑叢稠妖拖寰居吸授慧蝸吞壯魅狗矛盾益渣患憂稀描猿夢暫涯畜禍緣沸搜引擎臣橫紜誰混援蒸獸獅稅剖亻亼亽亾什獻剎邡麽仂仃仄仆富怨仈仉畢昔晨殼紹仍仏仒仕宦仗欺恃腰嘆歎炬梓訖施仙后瓊逝仚仝仞仟悔仡佬償填泊拓撲簇羔購頓欽佩髮棻閫馭養億儆尤藉幀賑凌敘帖李柔剛沃眥睚戒訛取饗讀仨仫仮著泳臥躺韶夏裁仳仵唯賢憑釣誕仿似宋彿諷伀碩盼鵝伄儅伈伉儷柯始娃邁戈坦堡帕茨薩廟瑪莉莎藤霍姆伋伍奢胥廷芳豪伎倆侍汛勒希羲雛伐憩整謨閑閒伕伙伴頤伜伝伢叔恆茲恩翰伱伲侶伶俜悧鼬伸懶縮喇叭伹伺伻伽倻輻伾佀佃佇佈喬妮墨佉盧佌貸劣廉昂檔濃矮傘窪緩耗胸谷迷擋率齲宅沫舍療佐貳佑佔優據鏵嘗呢須魯曉佗佘余坪寺瓜銃僧蒙芒陀龕哼嘔坊姦孽弊揖祟繭縛誓賊佝僂瞀佟你奪趕佡佢佣佤佧賈佪佫佯佰佱潔績釀餚佴捲佶佷佸佹佺佻佼佽佾具喚窘壞娛怒慨硬習慣聾膨脹蔓駭貴痺侀侁侂侃侄侅鴻燕侇侈糜靡侉侌妾侏儒倉鼠侐侑侔侖侘侚鏈侜偎傍鈷循柳葫蘆附価侮罵蔑侯岩截蝕侷貼壺嬛宴捷攜桶箋酌俁狹膝狄俅俉俊俏俎俑俓俔諺俚俛黎健呈固墒增守康箱濕祐鏢鑣槓盒靖膜齡俞豹獵噪孚封札筒託衍鴿剪撰稿煉廠禊練繕葺俯瞰撐衝俲俳俴俵俶俷俺俻俾倀倂倅儲卒惶敷猝逃頡蓄崇隱倌倏忽刺蠟燭噍嚼坍扁抽斃蔥楣灌灶糞背藪賣賠閉霉騰倓倔倖倘倜儻倝借箸挹澆閱倡狂倢倣値倥傯倨��倩匡嗣沖柝珍倬倭寇猩倮倶倷倹勤讚偁偃充偽吏嗓寐惺扮拱芫茜藉虢鈔偈偉晶偌宕距析濾殿疼癱註頗偓偕鴨歇滯偝偟偢忘怡旺偨偩偪偫偭偯偰偱偲偵緝蹄偷減惰漏窺竊偸偺迹傀儡傅傈僳傌籬傎奎琳迪叟芭傒傔傕傖悉荒傜傞傢傣芽逼傭婢傮睨寄檄誦謠頌傴擔辜弓慘蒿悼疤傺傻屄臆巢洩篋羨蓋軋頹傿儸僄僇僉僊働僎僑僔僖僚僝僞僣僤僥僦猴僨僩僬僭僮僯僰僱僵殖籤靜僾僿征隴儁儂儃儇儈朴薄儊儋儌儍儐儓儔儕儗儘儜儞儤儦儩汰哉寡渥裕酷儭儱罐儳儵儹儺儼儽兀臬臲鷲允勛勳宙宵帥憝彞諧嫂鬩暢沛溢盈飢赫兇悍狠猛頑愚妣斬秦遣鞭耀敏榮槃澤爆碟磁禿纜輝霽鹵朵婁孜烽醬勃汀箕裘鉗耶懞蕾徹兌軟遭黜兎児韻媳爸兕觥兗兙兛兜售鍪肚兝兞兟兡兢兣樽殮涅睡稟籍贅泌啡肽奸幕涵澇熵疚眷稃襯訌赴煥椒殲植跏沒試誤猜棲窗肋袖頰兪卦撇鬍岐廓轎疸楓茴瓏廁秩募勺噸寓斤曆畝迫筷釐最淫螺韜兮寬匪篩襄贏軛複兲詐刃堰戎痞蟻餉它冀鑄冂冃円冇冉冊嫁厲礪竭醮冏牧冑冓冔冕冖冗冘冞冢窄抑誣冥冫烘菇蟄冷凝坨橇淇淋炭餅磚磧窖醋雕雹霜冱冶爐艷嘲峻灘淡漠煖颼飲冼冽凃凄愴梗凅凇凈凊凋敝濛凔凜遵汞脢凞几凢処凰凱凵凶焰凸摺刷紋預喪嘍奔巡榜殯芙蓉租籠輯鞘萃凼鋸鑊刁蠻刂娩崩批拆攤掰櫱驟歧顆秒袂贓勿囑忌磋琢膚刈羽刎訟戮舂槳艇刓刖霹靂刜創犢刡恙墅幟筵緻刦刧刨昏默攸尿慾薰潤薰圭刪刮痧鏟刱刲刳刴刵踏磅戳柏槐繡芹莧蝟舟銘鵠鶩刼剁剃辮剄剉履鉛剋剌姻咽哨廊掠桅沿召瞻翅趙卜渺茫郭剒剔剕瀝剚愎毅訥纔剜剝啄採剞剟剡剣剤綵剮腎駛黏剰袍剴紊剷剸剺剽剿劁劂劄劈啪柴扳啦劉奭姥夼昫涓熙禪禹錫翔雁鶚劊劌弩柄蜻蛉劒劓劖劘劙瀾簣賞磯釜晉甜薪逐劦熔紂虐赤囚劬劭労劵効劻劼劾峭艮勅勇勵勍勐臘脖龐漫飼盪粥輒勖勗勘驕餒碌泮雇捐竹騎殊阱勣樸懇謹勦勧勩勯勰勱勲勷勸懲慰誡諫勹芡踐闌匁庇拯粟紮袱裹餃匆遽匈匉匊匋匍匐莖匏匕妝痰膿蛹齋苑烤蹈塘羌熊閥螳螂疆碚竿緯荷茵邙魏匚匜匝匟扶稷匣匭攏匸匹耦匽匾匿卂叮瘡禧軫堤棚迢鈞鍊卄卆遐卉瓷盲瓶噹胱腱裸卋卌卍卐怯污賤鄙齷齪陋卓溪唐梯漁陳棗泥漳潯澗梨芬譙贍轅迦鄭単驢弈洽鰲卛占筮卝卞卟吩啉屎翠厄卣卨卪卬卮榫襖璽綬鈕蚤懼殆篤聳卲帘帙繞卹卼卽厂厎厓厔厖厗奚厘厙厜厝諒厠厤厥厪膩孢厮厰厳厴厹厺粕垢蕪菁厼厾叁悟茸薯叄吵笄悌哺譏坫壟弧芯杠潛嬰芻袁詰貪諜煽饋駁収岳締災賄騙叚叡吻攔蘑蜜訣燧玩硯箏椎藺銅逗驪另覓叨嘮謁杵姓喊嚷囂咚嚀塑尋惱憎擦祇泣滲蝠叱吒咄咤喝籀黛舵舷叵叶鐸懿昭穰苴遼叻叼吁塹嫖賭瞧爬衆抒吅吆夥巹橡滌抱縱摩郡唁墜扇籃膀襪頸吋愾諮酬哭妓媛暗錶韁邇妃羿絮蕃渾拐葵暮隅吔吖啶嗪戚吜嗇噬嚥吟哦詠吠吧唧嗒咐吪雋咀徵燐苞茹鈣哧吮吰吱嘎吲哚吳棟嬌窟孟簫忠晗淞闔閭趼宇吶睛噓拂捧疵熄竽笛糠吼吽呀呂韋矇呃呆笨呇貢呉罄呋喃呎呏呔呠呡癡呣呤呦呧瑛眩扒晬淑姬瑜璇鵑呪呫嗶嚅囁呬呯呰呱呲咧噌鈍呴呶呷呸呺呻哱咻嘯嚕籲坎坷邏呿咁咂咆哮咇咈咋蟹煦珅藹咍咑咒詛咔噠嚓咾噥哩喱咗咠咡咢咣咥咦咨嗟詢咩咪咫嚙齧咭咮咱咲咳嗆嗽咴咷咸咹咺咼喉咿婉慟憫賦矜綠茗藍哂搶瞞哆嗦囉噻啾濱彗哋哌哎唷喲哏哐哞哢哤哪裏哫啼喘哰哲萎蚌哳哶哽哿唄唅唆唈唉唎唏嘩堯棣殤璜睿肅唔睇唕唚唞唣喳唪唬唰喏唲唳唵嘛唶唸唹唻唼唾唿啁啃鸚鵡啅埠棧榷祺舖鞅飆啊啍啎啐啓啕啖啗啜啞祈啢啣啤啥啫啱啲啵啺饑啽噶崑沁喁喂喆裙喈嚨喋喌喎喑喒喓喔粗喙幛慶滋鵲喟喣喤喥喦喧騷喨喩梆喫葡萄喭駝挑嚇碰樅
瓣純皰藻趟鉻喵営喹喺喼喿嗀嗃嗄嗅嗈嗉嗊嗍嗐嗑嗔詬嗕嗖嗙嗛嗜痂癖嗝嗡嗤嗥嗨嗩嗬嗯嗰嗲嗵嘰嗷嗹嗾嗿嘀嘁嘂嘅惋嘈峪禾蔭嘊嘌嘏嘐嘒嘓嘖嘚嘜嘞嘟囔嘣嘥嘦嘧嘬嘭這謔嚴敞饞鬆嘵嘶嘷嘸蝦嘹嘻嘽嘿噀噂噅噇噉噎噏噔噗噘噙噚噝噞噢噤蟬皿噩噫噭噯噱噲噳嚏涌灑欲巫霏噷噼嚃嚄嚆抖嚌嚐嚔囌嚚嚜嚞嚟嚦嚬嚭嚮嚯嚲嚳飭按竣苛嚵嚶囀囅囈膪謙囍囒囓囗囘蕭酚飄濺諦囝溯眸紇鑾鶻囟殉囡団囤囥囧囨囪囫圇囬囮囯囲図囶囷囸囹圄圉擬囻囿圀圂圃圊粹蠹赦圌墾圏滾鯡鑿枘圕圛圜圞坯埂壤骸炕祠窯豚紳魠鯪鱉圧握圩圪垯圬圮圯炸岬幔毯祇窨菩溉圳圴圻圾坂坆沾坋坌舛壈昆墊墩椅坒坓坩堝坭坰坱坳坴坵坻坼楊掙涎簾垃垈垌垍垓垔垕垗垚垛垝垣垞垟垤垧垮垵垺垾垿埀畔埄埆埇埈埌殃隍埏埒埕埗埜埡埤埦埧埭埯埰埲埳埴埵埶紼埸培怖樁礎輔埼埽堀訶姪廡堃堄摧磐貞韌砌堈堉堊堋堌堍堎堖堙堞堠礁堧堨輿堭堮蜓摘堲堳堽堿塁塄塈煤塋棵塍塏塒塓綢���鴉沽虱塙塚塝繆塡塢塤塥塩塬塱塲蟎塼塽塾塿墀墁墈墉墐夯増毀墝墠墦漬缽墫墬墮墰墺墻櫥壅壆壊壌壎壒榨蒜壔壕壖壙壚壜壝壠壡壬壭壱売壴壹壻壼寢壿夂夅夆変夊夌漱邑夓腕泄甥禦骼夗夘夙袞瑙妊娠醣梟珊鶯鷺戧幻魘夤蹀祕擂鶇姚宛閨嶼庾撻拇賛蛤裨菠氅漓撈湄蚊霆鯊箐篆篷荊肆舅荔鮃巷慚骰辟邱鎔鐮阪漂燴鯢鰈鱷鴇臚鵬妒峨譚枰晏璣癸祝秤竺牡籟恢罡螻蠍賜絨御梭夬夭砣榆怙枕夶夾餡奄崛葩譎奈賀祀贈奌奐奓奕訢詝奘奜奠奡奣陶奨奩魁奫奬奰媧孩貶隸酥宄狡猾她奼嫣妁氈荼皋膻蠅嬪妄妍嫉媚嬈妗趣妚妞妤礙妬婭妯娌妲妳妵妺姁姅姉姍姒姘姙姜姝姞姣姤姧姫姮娥姱姸姺姽婀娀誘懾脅娉婷娑娓娟娣娭娯娵娶娸娼婊婐婕婞婤婥谿孺婧婪婬婹婺婼婽媁媄媊媕媞媟媠媢媬媮媯媲媵媸媺媻媼眯媿嫄嫈嫋嫏嫕嫗嫘嫚嫜嫠嫡嫦嫩嫪毐嫫嫬嫰嫵嫺嫻嫽嫿嬀嬃嬅嬉耍嬋痴豔嬔嬖嬗嬙嬝嬡嬢嬤嬦嬬嬭幼嬲嬴嬸嬹嬾嬿孀孃孅孌孏曰癲屏孑孓雀孖斟簍謎摺孛矻鳩崮軻祜鸞孥邈毓棠臏孬孭孰孱孳孵泛罔銜孻孿宀宁宂拙株薇掣撫琪瓿榴謐彌宊濂祁瑕宍宏碁宓邸讞実潢町宥宧宨宬徵崎駿掖闕臊煮禽蠶宸豫寀寁寥寃簷庶寎暄磣寔寖寘寙寛寠苫寤肘洱濫蒗陝覈寪弘綽螽寳擅疙瘩晷対檐専尃尅贖絀繚疇釁尌峙醌襟痲碧屁昊槌淘恵瀑牝畑莓缸羚覷蔻髒躁尒尓銳尗尙尜尟尢尥尨尪尬尭尰擒尲尶尷尸尹潽蠖蛾尻釦梢蚴鰭脬蹲屇屌蚵屐屓挪屖屘屙屛屝屢屣巒嶂巖舄屧屨屩屪屭屮戍駐鉀崖嵛巔旮旯楂欖櫸芋茱萸靛麓屴屹屺屼岀岊岌岍阜岑彭鞏岒岝岢嵐岣岧岨岫岱岵岷峁峇峋峒峓峞峠嵋峩峯峱峴峹峿崀崁崆禎崋崌崍嶇崐崒崔嵬巍螢顥崚崞崟崠崢巆崤崦崧殂崬崱崳崴崶崿嵂嵇嵊泗嵌嵎嵒嵓嵗嵙嵞嵡嵩嵫嵯嵴嵼嵾嶁嶃嶄晴嶋嶌嶒嶓嶔嶗嶙嶝嶞嶠嶡嶢嶧嶨嶭嶮嶰嶲嶴嶸巂巃巇巉巋巌巓巘巛滇芎巟巠弋迴巣巤炊擘蜥蟒蠱覡巰蜀彥淖杏茂甫楞巻巽幗巿帛斐鯽蕊帑帔帗帚琉汶帟帡帣帨帬帯帰帷帹暆幃幄幇幋幌幏幘幙幚幞幠幡幢幦幨幩幪幬幭幯幰遙蹉跎餘庚鑑幵幷稚邃庀庁広庄庈庉笠庋跋庖犧庠庤庥鯨庬庱庳庴庵馨衢庹庿廃廄廆廋廌廎廏廐廑廒廕廖廛廝搏鑼廞弛袤廥廧廨廩廱綿踵髓廸廹甌鄴廻廼廾廿躔弁皺弇弌弍弎弐弒弔詭憾薦弝弢弣弤弨弭弮弰弳霖繇燾斌旭溥騫弶弸弼弾彀彄彆纍糾彊彔彖彘彟彠陌彤貽彧繪虹彪炳彫蔚鷗彰癉彲彳彴彷彷徉徨彸彽踩斂旆徂徇徊渭畬鉉裼従筌徘徙徜徠膳甦萌漸徬徭醺徯徳徴潘徻徼忀瘁胖燎怦悸顫扉犀澎湃砰恍惚絞隘忉憚挨餓忐忑忒忖応忝忞耿忡忪忭忮忱忸怩忻悠懣怏遏怔怗怚怛怞懟黍訝怫怭懦怱怲怳怵惕怸怹恁恂恇恉恌恏恒恓恔恘恚恛恝恞恟恠恣恧眄恪恫恬澹恰恿悀悁悃悄悆悊悐悒晦悚悛悜悝悤您悩悪悮悰悱悽惻悳悴悵惘悶悻悾惄愫鍾蒐惆惇惌惎惏惓惔惙惛耄惝瘧濁惥惦惪惲惴惷惸拈愀愃愆愈愊愍愐愑愒愓愔愕愙氓蠢騃昵愜赧愨愬愮愯愷愼慁慂慅慆慇靄慉慊慍慝慥慪慫慬慱慳慴慵慷慼焚憀灼鬱憃憊憋憍眺捏軾憒憔憖憙憧憬憨憪憭憮憯憷憸憹憺懃懅懆邀懊懋懌懍懐懞懠懤懥懨懫懮懰懱毖懵遁樑雍懺懽戁戄戇戉戔戕戛戝戞戠戡戢戣戤戥戦戩戭戯轟戱披菊牖戸戹戺戻戼戽鍬扂楔扃扆扈扊杖牽絹銬鐲賚扐摟攪烊盹瞌跟躉鑔靶鼾払扗玫腮扛扞扠扡扢盔押扤扦扱罾揄綏鞍郤窾扻扼扽抃抆抈抉抌抏瞎抔繯縊擻抜抝択抨摔歉躥牾抶抻搐泵菸拃拄拊髀拋拌脯拎拏拑擢秧沓曳攣迂拚拝拠拡拫拭拮踢拴拶拷攢拽掇芥橐簪摹疔挈瓢驥捺蹻挌挍挎挐揀挓挖掘浚挙揍聵挲挶挾挿捂捃捄捅捆捉捋胳膊揎捌捍捎軀蛛捗捘捙捜捥捩捫捭据捱捻捼捽掀掂掄臀膘掊掎掏掐笙掔掗掞棉芍掤搪闡掫掮掯揉掱掲掽掾揃揅揆搓揌諢揕揗揘揜揝揞揠揥揩揪揫櫫遒麈揰揲揵揶揸揹揺搆搉搊搋搌搎搔搕撼櫓搗搘搠搡搢搣搤搥搦搧搨搬楦褳訕赸搯搰搲搳搴搵搷搽搾搿摀摁摂摃摎摑摒摓跤摙摛摜摞摠摦睺羯摭摮摯摰摲摳摴摶摷摻摽撂撃撅稻撊撋撏鐧潑撕撙撚撝撟撢撣撦撧撩撬撱朔撳蚍蜉撾撿擀擄闖擉缶觚擐擕擖擗擡擣擤澡腚擧擨擩擫擭擯擰擷擸擼擽擿攃攄攆攉攥攐攓攖攙攛每攩攫轡澄攮攰攲攴軼攷砭訐攽碘敁敃敇敉敍敎筏敔敕敖閏誨敜煌敧敪敱敹敺敻敿斁衽斄牒縐謅斉斎斕鶉讕駮鱧斒筲斛斝斞斠斡斢斨斫斮晾沂潟穎絳邵斲斸釳於琅斾斿旀旂旃旄渦旌旎旐旒旓旖旛旝旟旡旣浴旰獺魃旴旹旻旼旽昀昃昄昇昉晰躲澈熹皎皓礬
昑昕昜昝昞昡昤暉筍昦昨昰昱昳昴昶昺昻晁蹇隧蔬髦晄晅晒晛晜晞晟晡晢晤晥曦晩萘瑩顗晿暁暋暌暍暐暔暕煅暘暝暠暡曚暦暨暪朦朧暱暲殄馮暵暸暹暻暾曀曄曇曈曌曏曐曖曘曙曛曡曨曩駱曱甴肱曷牘禺錕曽滄耽朁朅朆杪栓誇竟粘絛朊膺朏朐朓朕朘朙瞄覲溘饔飧朠朢朣柵椆澱蝨朩朮朰朱炆璋鈺熾鹮朳槿朶朾朿杅杇杌隉欣釗湛漼楷瀍煜玟纓翱肈舜贄适逵杓杕杗杙荀蘅杝杞脩珓筊杰榔狍閦顰緬莞杲杳眇杴杶杸杻杼枋枌枒枓衾葄翹紓逋枙狸椏枟槁枲枳枴枵枷枸櫞枹枻柁柂柃柅柈柊柎某柑橘柒柘柙柚柜柞櫟柟柢柣柤柩柬柮柰柲橙柶柷柸柺査柿栃栄栒栔栘栝栟栢栩栫栭栱栲栳栴檀栵栻桀驁桁鎂桄桉桋桎梏椹葚桓桔桕桜桟桫欏桭桮桯桲桴桷桹湘溟梃梊梍梐潼梔梘梜梠梡梣梧梩梱梲梳梴梵梹棁棃櫻棐棑棕櫚簑繃蓑棖棘棜棨棩棪棫棬棯棰棱棳棸棹槨棼椀椄苕椈椊椋椌椐椑椓椗検椤椪椰椳椴椵椷椸椽椿楀楄楅篪楋楍楎楗楘楙楛楝楟楠楢楥楨楩楪楫楬楮楯楰楳楸楹楻楽榀榃榊榎槺榕榖榘榛狉莽榜笞榠榡榤榥榦榧榪榭榰榱槤霰榼榾榿槊閂槎槑槔槖様槜槢槥槧槪槭槮槱槲槻槼槾樆樊樏樑樕樗樘樛樟樠樧樨権樲樴樵猢猻樺樻罍樾樿橁橄橆橈笥龠橕橚橛輛橢橤橧豎膈跨橾橿檁檃檇檉檍檎檑檖檗檜檟檠檣檨檫檬檮檳檴檵檸櫂櫆櫌櫛櫜櫝櫡櫧櫨櫪櫬櫳櫹櫺茄櫽欀欂欃欐欑欒欙欞溴欨欬欱欵欶欷歔欸欹欻欼欿歁歃歆艎歈歊蒔蝶歓歕歘歙歛歜歟歠蹦詮鑲蹣跚陞陟歩歮歯歰歳歴璞歺瞑歾歿殀殈殍殑殗殜殙殛殞殢殣殥殪殫殭殰殳荃殷殸殹蛟殻殽謗毆毈毉餵毎毑蕈毗毘毚茛鄧毧毬毳毷毹毽毾毿氂氄氆靴氉氊氌氍氐聊氕氖気氘氙氚氛氜氝氡洶焊痙氤氳氥氦鋁鋅氪烴氬銨痤汪滸漉痘盂碾菖蒲蕹蛭螅氵氷氹氺氽燙氾氿渚汆汊汋汍汎汏汐汔汕褟汙汚汜蘺沼穢衊汧汨汩汭汲汳汴隄汾沄沅沆瀣沇沈葆浸淪湎溺痼痾沌沍沏沐沔沕沘浜畹礫沚沢沬沭沮沰沱灢沴沷籽沺烹濡洄泂肛泅泆湧肓泐泑泒泓泔泖泙泚泜泝泠漩饃濤粼濘蘚鰍泩泫泭泯銖泱泲洇洊涇琵琶荽薊箔洌洎洏洑潄濯洙洚洟洢洣洧洨洩痢滔洫洮洳洴洵洸洹洺洼洿淌蜚浄浉浙贛渫浠浡浤浥淼瀚浬浭翩萍浯浰蜃淀苔蛞蝓蜇螵蛸煲鯉浹浼浽溦涂涊涐涑涒涔滂涖涘涙涪涫涬涮涴涶涷涿淄淅淆淊淒黯淓淙漣淜淝淟淠淢淤淥淦淩猥藿褻淬淮淯淰淳詣淶紡淸淹燉癯綺渇済渉渋渓渕渙渟渢滓渤澥渧渨渮渰渲渶渼湅湉湋湍湑湓湔黔湜湝湞湟湢湣湩湫湮麟湱湲湴湼満溈溍溎溏溛舐漭溠溤溧馴溮溱溲溳溵溷溻溼溽溾滁滃滉滊滎滏稽滕滘滙滝滫滮羼耷滷滹滻煎漈漊漎繹漕漖漘漙漚漜漪漾漥漦漯漰漵漶漷濞潀潁潎潏潕潗潚潝潞潠潦祉瘍潲潵潷潸潺潾潿澁澂澃澉澌澍澐澒澔澙澠澣澦澧澨澫澬澮澰澴澶澼熏郁濆濇濈濉濊貊濔疣濜濠濩觴濬濮盥濰濲濼瀁瀅瀆瀋瀌瀏瀒瀔瀕瀘瀛瀟瀠瀡瀦瀧瀨瀬瀰瀲瀳瀵瀹瀺瀼灃灄灉灋灒灕灖灝灞灠灤灥灨灩灪蜴灮燼獴灴灸灺炁炅魷炗炘炙炤炫疽烙釺炯炰炱炲炴炷燬炻烀烋瘴鯧烓烔焙烜烝烳飪烺焃焄耆焌焐焓焗焜焞焠焢焮焯焱焼煁煃煆煇煊熠煍熬煐煒煕煗燻礆霾煚煝煟煠煢矸煨瑣煬萁煳煺煻熀熅熇熉羆熒穹熗熘熛熜稔諳爍熤熨熯熰眶螞熲熳熸熿燀燁燂燄盞燊燋燏燔隼燖燜燠燡燦燨燮燹燻燽燿爇爊爓爚爝爟爨蟾爯爰爲爻爿爿牀牁牂牄牋牎牏牓牕釉牚腩蒡虻牠雖蠣牣牤牮牯牲牳牴牷牸牼絆牿靬犂犄犆犇犉犍犎犒犖犗犛犟犠犨犩犪犮犰狳犴犵犺狁甩狃狆狎狒獾狘狙黠狨狩狫狴狷狺狻豕狽蜘猁猇猈猊猋猓猖獗猗猘猙獰獁猞猟獕猭猱猲猳猷猸猹猺玃獀獃獉獍獏獐獒獘獙獚獜獝獞獠獢獣獧鼇蹊獪獫獬豸獮獯鬻獳獷獼玀玁菟玅玆玈珉糝禛郅玍玎玓瓅玔玕玖玗玘玞玠玡玢玤玥玦玨瑰玭玳瑁玶玷玹玼珂珇珈瑚珌饈饌珔珖珙珛珞珡珣珥珧珩珪珮珶珷珺珽琀琁隕琊琇琖琚琠琤琦琨琫琬琭琮琯琰琱琲瑯琹琺琿瑀瑂瑄瑉瑋瑑瑔瑗瑢瑭瑱瑲瑳瑽瑾瑿璀璨璁璅璆璈璉璊璐璘璚璝璟璠璡璥璦璩璪璫璯璲璵璸璺璿瓀瓔瓖瓘瓚瓛臍瓞瓠瓤瓧瓩瓮瓰瓱瓴瓸瓻瓼甀甁甃甄甇甋甍甎甏甑甒甓甔甕甖甗飴蔗甙詫鉅粱盎銹糰甡褥産甪甬甭甮甯鎧甹甽甾甿畀畁畇畈畊畋畎畓畚畛畟鄂畤畦畧荻畯畳畵畷畸畽畾疃疉疋疍疎簞疐疒疕疘疝疢疥疧疳疶疿痁痄痊痌痍痏痐痒痔痗瘢痚痠痡痣痦痩痭痯痱痳痵痻痿瘀瘂瘃瘈瘉瘊瘌瘏瘐瘓瘕瘖瘙瘚瘛瘲瘜瘝瘞瘠瘥瘨瘭瘮瘯瘰癧瘳癘瘵瘸瘺瘻瘼癃癆癇癈癎癐癔癙癜癠癤癥癩蟆癪癭癰発踔紺蔫酵皙砬砒翎翳蘞鎢鑞皚鵯駒鱀粵褶皀皁莢皃鎛皈皌皐皒硃皕皖皘皜皝皞皤皦皨皪皫皭糙綻皴皸皻皽盅盋盌盍盚盝踞盦盩鞦韆盬盭眦睜瞤盯盱眙裰盵盻睞眂眅眈眊県眑眕眚眛眞眢眣眭眳眴眵眹瞓眽郛睃睅睆睊睍睎睏睒睖睙睟睠睢睥睪睪睯睽睾瞇瞈瞋瞍逛瞏瞕瞖瞘瞜瞟瞠瞢瞫瞭瞳瞵瞷瞹瞽闍瞿矓矉矍鑠矔矗矙矚矞矟矠矣矧矬矯矰矱硪碇磙��舫阡、矼矽礓砃砅砆砉砍砑砕砝砟砠砢砦砧砩砫砮砳艏砵砹砼硇硌硍硎硏硐硒硜硤硨磲茚鋇硭硻硾碃碉碏碣碓碔碞碡碪碫碬碭碯碲碸碻礡磈磉磎磑磔磕磖磛磟磠磡磤磥蹭磪磬磴磵磹磻磽礀礄礅礌礐礚礜礞礤礧礮礱礲礵礽礿祂祄祅祆禳祊祍祏祓祔祕祗祘祛祧祫祲祻祼餌臠錮禂禇禋禑禔禕隋禖禘禚禜禝禠禡禢禤禥禨禫禰禴禸稈秈秊闈颯秌秏秕笈蘵賃秠秣秪秫秬秭秷秸稊稌稍稑稗稙稛稞稬稭稲稹稼顙稾穂穄穇穈穉穋穌貯穏穜穟穠穡穣穤穧穨穭穮穵穸窿闃窀窂窅窆窈窕窊窋窌窒窓窔窞窣
窬黷蹙窰窳窴窵窶窸窻竁竃竈竑竜竝竦竪篦篾笆鮫竾笉笊笎笏笐靨笓笤籙笪笫笭笮笰笱笲笳笵笸笻筀筅筇筈筎筑筘筠筤筥筦筧筩筭筯筰筱筳筴讌筸箂箇箊箎箑箒箘箙箛箜篌箝箠箬鏃箯箴箾篁篔簹篘篙篚篛篜篝篟篠篡篢篥篧篨篭篰篲篳篴篶篹篼簀簁簃簆簉簋簌簏簜簟簠簥簦簨簬簰簸簻籊籐籒籓籔籖籚籛籜籣籥籧籩籪籫籯芾麴籵籸籹籼粁粃粋粑粔糲粛粞粢粧粨粲粳粺粻粽闢粿糅糆糈糌糍糒糔萼糗蛆蹋糢糨糬糭糯糱糴糶糸糺紃蹼鰹黴紆紈絝紉閩襻紑紕紘錠鳶鷂紝紞紟紥紩紬紱紲紵紽紾紿絁絃絅経絍絎絏縭褵絓絖絘絜絢絣螯絪絫聒絰絵絶絺絻絿綀綃綅綆綈綉綌綍綎綑綖綘継続緞綣綦綪綫綮綯綰罟蝽綷縩綹綾緁緄緅緆緇緋緌緎総緑緔緖緗緘緙緜緡緤緥緦纂緪緰緱緲緶緹縁縃縄縈縉縋縏縑縕縗縚縝縞縟縠縡縢縦縧縯縰騁縲縳縴縵縶縹縻衙縿繄繅繈繊繋繐繒繖繘繙繠繢繣繨繮繰繸繻繾纁纆纇纈纉纊纑纕纘纙纚纛缾罃罆罈罋罌罎罏罖罘罛罝罠罣罥罦罨罫罭鍰罳罶罹罻罽罿羂羃羇羋蕉51鴕羑羖羗羜羝羢羣羥羧羭羮羰羱羵羶羸藜鮐翀翃翄翊翌翏翕翛翟翡翣翥翦躚翪翫翬翮翯翺翽翾翿闆饕鴰鍁耋耇耎耏耑耒耜耔耞耡耤耨耩耪耬耰鬢耵聹聃聆聎聝聡聦聱聴聶聼閾聿肄肏肐肕腋肙肜肟肧胛肫肬肭肰肴肵肸肼胊胍胏胑胔胗胙胝胠銓胤胦胩胬胭胯胰胲胴胹胻胼胾脇脘脝脞脡脣脤脥脧脰脲脳腆腊腌臢腍腒腓腖腜腠腡腥腧腬腯踝蹬鐐腴腶蠕誹膂膃膆膇膋膔膕膗膙膟黐膣膦膫膰膴膵膷膾臃臄臇臈臌臐臑臓臕臖臙臛臝臞臧蓐詡臽臾臿舀舁鰟鮍舋舎舔舗舘舝舠舡舢舨舭舲舳舴舸舺艁艄艅艉艋艑艕艖艗艘艚艜艟艣艤艨艩艫艬艭荏艴艶艸艹艻艿芃芄芊萰陂藭芏芔芘芚蕙芟芣芤茉芧芨芩芪芮芰鰱芴芷芸蕘豢芼芿苄苒苘苙苜蓿苠苡苣蕒苤苧苪鎊苶苹苺苻苾茀茁范蠡萣茆茇茈茌茍茖茞茠茢茥茦菰茭茯茳藨茷藘茼荁荄荅荇荈菅蜢鴞荍荑荘荳荵荸薺莆莒莔莕莘莙莚莛莜莝莦莨菪莩莪莭莰莿菀菆菉菎菏菐菑菓菔菕菘菝菡菢菣菥蓂菧菫轂鎣菶菷菹醢菺菻菼菾萅萆萇萋萏萐萑萜萩萱萴萵萹萻葇葍葎葑葒葖葙葠葥葦葧葭葯葳葴葶葸葹葽蒄蒎蒓蘢薹蒞蒟蒻蒢蒦蒨蒭藁蒯蒱鉾蒴蒹蒺蒽蓀蓁蓆蓇蓊蓌蓍蓏蓓蓖蓧蓪蓫蓽跣藕蓯蓰蓱蓴蓷蓺蓼蔀蔂蔃蔆蔇蔉蔊蔋蔌蔎蔕蔘蔙蔞蔟鍔蔣雯蔦蔯蔳蔴蔵蔸蔾蕁蕆蕋蕍蕎蕐蕑蕓蕕蕖蕗蕝蕞蕠蕡蕢蕣蕤蕨蕳蕷蕸蕺蕻薀薁薃薅薆薈薉薌薏薐薔薖薘薙諤釵薜薠薢薤薧薨薫薬薳薶薷薸薽薾薿藄藇藋藎藐藙藚藟藦藳藴藶藷藾蘀蘁蘄蘋蘗蘘蘝蘤蘧蘩蘸蘼虀虆虍蟠虒虓虖虡虣虥虩虯虰蛵虵虷鱒虺虼蚆蚈蚋蚓蚔蚖蚘蚜蚡蚣蚧蚨蚩蚪蚯蚰蜒蚱蚳蚶蚹蚺蚻蚿蛀蛁蛄蛅蝮蛌蛍蛐蟮蛑蛓蛔蛘蛚蛜蛡蛣蜊蛩蛺蛻螫蜅蜆蜈蝣蜋蜍蜎蜑蠊蜛餞蜞蜣蜨蜩蜮蜱蜷蜺蜾蜿蝀蝃蝋蝌蝍蝎蝏蝗蝘蝙蝝鱝蝡蝤蝥蝯蝰蝱蝲蝴蝻螃蠏螄螉螋螒螓螗螘螙螚蟥螟螣螥螬螭螮螾螿蟀蟅蟈蟊蟋蟑蟓蟛蟜蟟蟢蟣蟨蟪蟭蟯蟳蟶蟷蟺蟿蠁蠂蠃蠆蠋蠐蠓蠔蠗蠙蠚蠛蠜蠧蠨蠩蠭蠮蠰蠲蠵蠸蠼蠽衁衂衄衇衈衉衋衎衒衕衖衚衞裳鈎衭衲衵衹衺衿袈裟袗袚袟袢袪袮袲袴袷袺袼褙袽裀裉裊裋裌裍裎裒裛裯裱裲裴裾褀褂褉褊褌褎褐褒褓褔褕褘褚褡褢褦褧褪褫褭褯褰褱襠褸褽褾襁襃襆襇襉襋襌襏襚襛襜襝襞襡襢襤襦襫襬襭襮襴襶襼襽襾覂覃覅覇覉覊覌覗覘覚覜覥覦覧覩覬覯覰観覿觔觕觖觜觽觝觡酲觩觫觭觱觳觶觷觼觾觿言賅訃訇訏訑訒詁託訧訬訳訹証訾詀詅詆譭詈詊詎詑詒詖詗詘詧詨詵詶詸詹詻詼詿誂誃誄鋤誆誋誑誒誖誙誚誥誧説読誯誶誾諂諄諆諌諍諏諑諕諗諛諝諞諟諠諡諴諵諶諼謄謆謇謌謍謏謑謖謚謡謦謪謫謳謷謼謾譁譅譆譈譊譌譒譔譖鑫譞譟譩譫譬譱譲譴譸譹譾讅讆讋讌讎讐讒讖讙讜讟谽豁豉豇豈豊豋豌豏豔豞豖豗豜豝豣豦豨豭豱豳豵豶豷豺豻貅貆貍貎貔貘貙貜貤饜貰餸貺賁賂賏賒賕賙賝賡賧賨賫鬭賮賵賸賺賻賾贇贉贐贔贕贗赬赭赱赳迄趁趂趄趐趑趒趔趡趦趫趮趯趲趴趵趷趹趺趿跁跂跅跆躓蹌跐跕跖跗跙跛跦跧跩跫跬跮跱跲跴跺跼跽踅踆踈踉踊踒���踘踜踟躇躕踠踡踣踤踥踦踧蹺踫踮踰踱踴踶踹踺踼踽躞蹁蹂躪蹎蹐蹓蹔蹕蹚蹜蹝蹟蹠蹡蹢躂蹧蹩蹪蹯鞠蹽躃躄躅躊躋躐躑躒躘躙躛躝躠躡躦躧躩躭躰躳躶軃軆輥軏軔軘軜軝齶転軥軨軭軱軲轆軷軹軺軽軿輀輂輦輅輇輈輓輗輙輜輞輠輤輬輭輮輳輴輵輶輹輼輾轀轇轏轑轒轔轕轖轗轘轙轝轞轢轤辠辢辤辵辶辺込辿迅迋迍麿迓迣迤邐迥迨迮迸迺迻迿逄逅逌逍逑逓逕逖逡逭逯逴逶逹遄遅遉遘遛遝遢遨遫遯遰遴遶遹遻邂邅邉邋邎邕邗邘邛邠邢邧邨邯鄲邰邲邳邴邶邷邽邾邿郃郄郇郈郔郕郗郙郚郜郝郞郟郠郢郪郫郯郰郲郳郴郷郹郾郿鄀鄄鄆鄇鄈鄋鄍鄎鄏鄐鄑鄒鄔鄕鄖鄗鄘鄚鄜鄞鄠鄢鄣鄤鄦鄩鄫鄬鄮鄯鄱鄶鄷鄹鄺鄻鄾鄿酃酅酆酇酈酊酋酎酏酐酣酔酕醄酖酗酞酡酢酤酩酴酹酺醁醅醆醊醍醐醑醓醖醝醞醡醤醨醪醭醯醰醱醲醴醵醸醹醼醽醾釂釃釅釆釈鱸鎦閶釓釔釕鈀釙鼢鼴釤釧釪釬釭釱釷釸釹鈁鈃鈄鈆鈇鈈鈊鈌鈐鈑鈒鈤鈥鈧鈬鈮鈰鈳鐺鈸鈹鈽鈿鉄鉆鉈鉋鉌鉍鉏鉑鉕鉚鉢鉥鉦鉨鉬鉭鉱鉲鉶鉸鉺鉼鉿銍銎銑銕鏤銚銛銠銣銤銥銦銧銩銪銫銭銰銲銶銻銼銾鋂鋃鋆鋈鋊鋌鋍鋏鋐鋑鋕鋘鋙鋝鋟鋦鋨鋩鋭鋮鋯鋰鋱鋳鋹鋺鋻鏰鐱錀錁錆錇錈錍錏錒錔錙錚錛錞錟錡錤錩錬録錸錼鍀鍆鍇鍉鍍鍏鍐鍘鍚鍛鍠鍤鍥鍩鍫鍭鍱鍴鍶鍹鍺鍼鍾鎄鎇鎉鎋鎌鎍鎏鎒鎓鎗鎘鎚鎞鎡鎤鎩鎪鎭鎯鎰
鎳鎴鎵鎸鎹鎿鏇鏊鏌鏐鏑鏖鏗鏘鏚鏜鏝鏞鏠鏦鏨鏷鏸鏹鏻鏽鏾鐃鐄鐇鐏鐒鐓鐔鐗馗鐙鐝鐠鐡鐦鐨鐩鐫鐬鐱鐳鐶鐻鐽鐿鑀鑅鑌鑐鑕鑚鑛鑢鑤鑥鑪鑭鑯鑱鑴鑵鑷钁钃镻閆閈閌閎閒閔閗閟閡関閤閤閧閬閲閹閺閻閼閽閿闇闉闋闐闑闒闓闘闚闞闟闠闤闥阞阢阤阨阬阯阹阼阽陁陑陔陛陜陡陥陬騭陴険陼陾隂隃隈隒隗隞隠隣隤隩隮隰顴隳隷隹雂雈雉雊雎雑雒雗雘雚雝雟雩雰雱驛霂霅霈霊霑霒霓霙霝霢霣霤霨霩霪霫霮靁靆靉靑靚靣靦靪靮靰靳靷靸靺靼靿鞀鞃鞄鞌鞗鞙鞚鞝鞞鞡鞣鞨鞫鞬鞮鞶鞹鞾韃韅韉馱韍韎韔韖韘韝韞韡韣韭韮韱韹韺頀颳頄頇頊頍頎頏頒頖頞頠頫頬顱頯頲頴頼顇顋顑顒顓顔顕顚顜顢顣顬顳颭颮颱颶颸颺颻颽颾颿飀飂飈飌飜飡飣飤飥飩飫飮飱飶餀餂餄餎餇餈餑餔餕餖餗餚餛餜餟餠餤餧餩餪餫餬餮餱餲餳餺餻餼餽餿饁饅饇饉饊饍饎饐饘饟饢馘馥馝馡馣騮騾馵馹駃駄駅駆駉駋駑駓駔駗駘駙駜駡駢駪駬駰駴駸駹駽駾騂騄騅騆騉騋騍騏驎騑騒験騕騖騠騢騣騤騧驤騵騶騸騺驀驂驃驄驆驈驊驌驍驎驏驒驔驖驙驦驩驫骺鯁骫骭骯骱骴骶骷髏骾髁髂髄髆髈髐髑髕髖髙髝髞髟髡髣髧髪髫髭髯髲髳髹髺髽髾鬁鬃鬅鬈鬋鬎鬏鬐鬑鬒鬖鬗鬘鬙鬠鬣鬪鬫鬬鬮鬯鬰鬲鬵鬷魆魈魊魋魍魎魑魖鰾魛魟魣魦魨魬魴魵魸鮀鮁鮆鮌鮎鮑鮒鮓鮚鮞鮟鱇鮠鮦鮨鮪鮭鮶鮸鮿鯀鯄鯆鯇鯈鯔鯕鯖鯗鯙鯠鯤鯥鯫鯰鯷鯸鯿鰂鰆鶼鰉鰋鰐鰒鰕鰛鰜鰣鰤鰥鰦鰨鰩鰮鰳鰶鰷鱺鰼鰽鱀鱄鱅鱆鱈鱎鱐鱓鱔鱖鱘鱟鱠鱣鱨鱭鱮鱲鱵鱻鲅鳦鳧鳯鳲鳷鳻鴂鴃鴄鴆鴈鴎鴒鴔鴗鴛鴦鴝鵒鴟鴠鴢鴣鴥鴯鶓鴳鴴鴷鴽鵀鵁鵂鵓鵖鵙鵜鶘鵞鵟鵩鵪鵫鵵鵷鵻鵾鶂鶊鶏鶒鶖鶗鶡鶤鶦鶬鶱鶲鶵鶸鶹鶺鶿鷀鷁鷃鷄鷇鷈鷉鷊鷏鷓鷕鷖鷙鷞鷟鷥鷦鷯鷩鷫鷭鷳鷴鷽鷾鷿鸂鸇鸊鸏鸑鸒鸓鸕鸛鸜鸝鹸鹹鹺麀麂麃麄麇麋麌麐麑麒麚麛麝麤麩麪麫麮麯麰麺麾黁黈黌黢黒黓黕黙黝黟黥黦黧黮黰黱黲黶黹黻黼黽黿鼂鼃鼅鼈鼉鼏鼐鼒鼕鼖鼙鼚鼛鼡鼩鼱鼪鼫鼯鼷鼽齁齆齇齈齉齌齎齏齔齕齗齙齚齜齞齟齬齠齢齣齧齩齮齯齰齱齵齾龎龑龒龔龖龘龝龡龢龤'
20
+
21
+ assert len(simplified_charcters) == len(traditional_characters)
22
+
23
+ s2t_dict = {}
24
+ t2s_dict = {}
25
+ for i, item in enumerate(simplified_charcters):
26
+ s2t_dict[item] = traditional_characters[i]
27
+ t2s_dict[traditional_characters[i]] = item
28
+
29
+
30
+ def tranditional_to_simplified(text: str) -> str:
31
+ return "".join(
32
+ [t2s_dict[item] if item in t2s_dict else item for item in text])
33
+
34
+
35
+ def simplified_to_traditional(text: str) -> str:
36
+ return "".join(
37
+ [s2t_dict[item] if item in s2t_dict else item for item in text])
38
+
39
+
40
+ if __name__ == "__main__":
41
+ text = "一般是指存取一個應用程式啟動時始終顯示在網站或網頁瀏覽器中的一個或多個初始網頁等畫面存在的站點"
42
+ print(text)
43
+ text_simple = tranditional_to_simplified(text)
44
+ print(text_simple)
45
+ text_traditional = simplified_to_traditional(text_simple)
46
+ print(text_traditional)
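The two lookup tables above are built by pairing characters at the same index in two long strings. A minimal sketch of the same idea, assuming tiny hypothetical tables (`SIMPLIFIED`, `TRADITIONAL`) and hypothetical helper names (`build_maps`, `convert`) that are not part of the commit:

```python
# Hypothetical miniature tables; the real module uses much longer strings.
SIMPLIFIED = "发现门"
TRADITIONAL = "發現門"


def build_maps(simplified: str, traditional: str):
    """Pair characters position by position and build both directions."""
    if len(simplified) != len(traditional):
        raise ValueError("character tables must have equal length")
    s2t = dict(zip(simplified, traditional))
    t2s = dict(zip(traditional, simplified))
    return s2t, t2s


def convert(text: str, table: dict) -> str:
    """Map each character through the table, leaving unknown ones as-is."""
    return "".join(table.get(ch, ch) for ch in text)


s2t, t2s = build_maps(SIMPLIFIED, TRADITIONAL)
print(convert("發現一扇門", t2s))
print(convert("发现", s2t))
```

Characters that are missing from the table pass through unchanged, which matches the `item in t2s_dict` fallback used in the functions above.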
zh_normalization/chronology.py ADDED
@@ -0,0 +1,134 @@
1
+ # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ import re
15
+
16
+ from .num import DIGITS
17
+ from .num import num2str
18
+ from .num import verbalize_cardinal
19
+ from .num import verbalize_digit
20
+
21
+
22
+ def _time_num2str(num_string: str) -> str:
23
+ """A special case for verbalizing number in time."""
24
+ result = num2str(num_string.lstrip('0'))
25
+ if num_string.startswith('0'):
26
+ result = DIGITS['0'] + result
27
+ return result
28
+
29
+
30
+ # Time-of-day expression
31
+ RE_TIME = re.compile(r'([0-1]?[0-9]|2[0-3])'
32
+ r':([0-5][0-9])'
33
+ r'(:([0-5][0-9]))?')
34
+
35
+ # Time range, e.g. 8:30-12:30
36
+ RE_TIME_RANGE = re.compile(r'([0-1]?[0-9]|2[0-3])'
37
+ r':([0-5][0-9])'
38
+ r'(:([0-5][0-9]))?'
39
+ r'(~|-)'
40
+ r'([0-1]?[0-9]|2[0-3])'
41
+ r':([0-5][0-9])'
42
+ r'(:([0-5][0-9]))?')
43
+
44
+
45
+ def replace_time(match) -> str:
46
+ """
47
+ Args:
48
+ match (re.Match)
49
+ Returns:
50
+ str
51
+ """
52
+
53
+ is_range = len(match.groups()) > 5
54
+
55
+ hour = match.group(1)
56
+ minute = match.group(2)
57
+ second = match.group(4)
58
+
59
+ if is_range:
60
+ hour_2 = match.group(6)
61
+ minute_2 = match.group(7)
62
+ second_2 = match.group(9)
63
+
64
+ result = f"{num2str(hour)}点"
65
+ if minute.lstrip('0'):
66
+ if int(minute) == 30:
67
+ result += "半"
68
+ else:
69
+ result += f"{_time_num2str(minute)}分"
70
+ if second and second.lstrip('0'):
71
+ result += f"{_time_num2str(second)}秒"
72
+
73
+ if is_range:
74
+ result += "至"
75
+ result += f"{num2str(hour_2)}点"
76
+ if minute_2.lstrip('0'):
77
+ if int(minute_2) == 30:
78
+ result += "半"
79
+ else:
80
+ result += f"{_time_num2str(minute_2)}分"
81
+ if second_2 and second_2.lstrip('0'):
82
+ result += f"{_time_num2str(second_2)}秒"
83
+
84
+ return result
85
+
86
+
87
+ RE_DATE = re.compile(r'(\d{4}|\d{2})年'
88
+ r'((0?[1-9]|1[0-2])月)?'
89
+ r'(((0?[1-9])|((1|2)[0-9])|30|31)([日号]))?')
90
+
91
+
92
+ def replace_date(match) -> str:
93
+ """
94
+ Args:
95
+ match (re.Match)
96
+ Returns:
97
+ str
98
+ """
99
+ year = match.group(1)
100
+ month = match.group(3)
101
+ day = match.group(5)
102
+ result = ""
103
+ if year:
104
+ result += f"{verbalize_digit(year)}年"
105
+ if month:
106
+ result += f"{verbalize_cardinal(month)}月"
107
+ if day:
108
+ result += f"{verbalize_cardinal(day)}{match.group(9)}"
109
+ return result
110
+
111
+
112
+ # Dates written as YYYY/MM/DD, YYYY-MM-DD, YYYY.MM.DD or YYYY MM DD (the same separator must appear twice)
113
+ RE_DATE2 = re.compile(
114
+ r'(\d{4})([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])')
115
+
116
+
117
+ def replace_date2(match) -> str:
118
+ """
119
+ Args:
120
+ match (re.Match)
121
+ Returns:
122
+ str
123
+ """
124
+ year = match.group(1)
125
+ month = match.group(3)
126
+ day = match.group(4)
127
+ result = ""
128
+ if year:
129
+ result += f"{verbalize_digit(year)}年"
130
+ if month:
131
+ result += f"{verbalize_cardinal(month)}月"
132
+ if day:
133
+ result += f"{verbalize_cardinal(day)}日"
134
+ return result
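`replace_time` tells a single time from a range by counting capture groups. A small sketch of why that works, with the two patterns copied from the file above (the sample sentences are made up for illustration):

```python
import re

# Patterns copied from the diff above.
RE_TIME = re.compile(r'([0-1]?[0-9]|2[0-3])'
                     r':([0-5][0-9])'
                     r'(:([0-5][0-9]))?')
RE_TIME_RANGE = re.compile(r'([0-1]?[0-9]|2[0-3])'
                           r':([0-5][0-9])'
                           r'(:([0-5][0-9]))?'
                           r'(~|-)'
                           r'([0-1]?[0-9]|2[0-3])'
                           r':([0-5][0-9])'
                           r'(:([0-5][0-9]))?')

single = RE_TIME.search("开会时间是8:30")
ranged = RE_TIME_RANGE.search("营业时间8:30-12:05")

# groups() always has one entry per capture group in the pattern,
# whether or not the optional seconds part matched.
print(len(single.groups()))  # hour, minute, ":ss", seconds
print(len(ranged.groups()))  # the same four, the separator, and four more
print(ranged.group(1), ranged.group(2), ranged.group(6), ranged.group(7))
```

Because the group count is a property of the pattern and not of the matched text, `len(match.groups()) > 5` reliably identifies a match produced by `RE_TIME_RANGE`.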
zh_normalization/constants.py ADDED
@@ -0,0 +1,62 @@
1
+ # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ import re
15
+ import string
16
+
17
+ from pypinyin.constants import SUPPORT_UCS4
18
+
19
+ # Full-width / half-width conversion
20
+ # ASCII letters: full-width -> half-width mapping (52 entries)
21
+ F2H_ASCII_LETTERS = {
22
+ ord(char) + 65248: ord(char)
23
+ for char in string.ascii_letters
24
+ }
25
+
26
+ # ASCII letters: half-width -> full-width mapping
27
+ H2F_ASCII_LETTERS = {value: key for key, value in F2H_ASCII_LETTERS.items()}
28
+
29
+ # Digits: full-width -> half-width mapping (10 entries)
30
+ F2H_DIGITS = {ord(char) + 65248: ord(char) for char in string.digits}
31
+ # Digits: half-width -> full-width mapping
32
+ H2F_DIGITS = {value: key for key, value in F2H_DIGITS.items()}
33
+
34
+ # Punctuation: full-width -> half-width mapping (32 entries)
35
+ F2H_PUNCTUATIONS = {ord(char) + 65248: ord(char) for char in string.punctuation}
36
+ # Punctuation: half-width -> full-width mapping
37
+ H2F_PUNCTUATIONS = {value: key for key, value in F2H_PUNCTUATIONS.items()}
38
+
39
+ # Space (1 entry)
40
+ F2H_SPACE = {'\u3000': ' '}
41
+ H2F_SPACE = {' ': '\u3000'}
42
+
43
+ # Runs of characters that are not Chinese characters with a pinyin reading; usable for extracting non-standard words (NSW)
44
+ if SUPPORT_UCS4:
45
+ RE_NSW = re.compile(r'(?:[^'
46
+ r'\u3007' # 〇
47
+ r'\u3400-\u4dbf' # CJK Extension A: [3400-4DBF]
47
+ r'\u4e00-\u9fff' # CJK Unified Ideographs: [4E00-9FFF]
48
+ r'\uf900-\ufaff' # CJK Compatibility Ideographs: [F900-FAFF]
49
+ r'\U00020000-\U0002A6DF' # CJK Extension B: [20000-2A6DF]
50
+ r'\U0002A703-\U0002B73F' # CJK Extension C: [2A700-2B73F]
51
+ r'\U0002B740-\U0002B81D' # CJK Extension D: [2B740-2B81D]
52
+ r'\U0002F80A-\U0002FA1F' # CJK Compatibility Supplement: [2F800-2FA1F]
54
+ r'])+')
55
+ else:
56
+ RE_NSW = re.compile( # pragma: no cover
57
+ r'(?:[^'
58
+ r'\u3007' # 〇
59
+ r'\u3400-\u4dbf' # CJK Extension A: [3400-4DBF]
59
+ r'\u4e00-\u9fff' # CJK Unified Ideographs: [4E00-9FFF]
60
+ r'\uf900-\ufaff' # CJK Compatibility Ideographs: [F900-FAFF]
62
+ r'])+')
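The full-width to half-width tables above rely on the fixed offset of 65248 (0xFEE0) between an ASCII character and its full-width form. A minimal sketch of how such a table might be applied with `str.translate`; the combined table `F2H` and the helper `to_halfwidth` are hypothetical names, not part of the commit:

```python
import string

# Full-width forms sit exactly 65248 code points above their ASCII counterparts.
OFFSET = 65248

# Hypothetical combined table keyed by code point, as str.translate expects.
F2H = {ord(c) + OFFSET: ord(c) for c in string.ascii_letters + string.digits}
F2H[0x3000] = 0x20  # ideographic space -> ordinary space


def to_halfwidth(text: str) -> str:
    """Convert full-width letters, digits and spaces to half-width."""
    return text.translate(F2H)


print(to_halfwidth("\uff21\uff22\uff23\u3000\uff11\uff12\uff13"))
```

Note that `str.translate` looks characters up by integer code point, so a table keyed by one-character strings would need to go through `str.maketrans` first.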
zh_normalization/num.py ADDED
@@ -0,0 +1,238 @@
1
+ # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ """
15
+ Rules to verbalize numbers into Chinese characters.
16
+ https://zh.wikipedia.org/wiki/中文数字#現代中文
17
+ """
18
+ import re
19
+ from collections import OrderedDict
20
+ from typing import List
21
+
22
+ DIGITS = {str(i): tran for i, tran in enumerate('零一二三四五六七八九')}
23
+ UNITS = OrderedDict({
24
+ 1: '十',
25
+ 2: '百',
26
+ 3: '千',
27
+ 4: '万',
28
+ 8: '亿',
29
+ })
30
+
31
+ COM_QUANTIFIERS = '(封|艘|把|目|套|段|人|所|朵|匹|张|座|回|场|尾|条|个|首|阙|阵|网|炮|顶|丘|棵|只|支|袭|辆|挑|担|颗|壳|窠|曲|墙|群|腔|砣|座|客|贯|扎|捆|刀|令|打|手|罗|坡|山|岭|江|溪|钟|队|单|双|对|出|口|头|脚|板|跳|枝|件|贴|针|线|管|名|位|身|堂|课|本|页|家|户|层|丝|毫|厘|分|钱|两|斤|担|铢|石|钧|锱|忽|(千|毫|微)克|毫|厘|(公)分|分|寸|尺|丈|里|寻|常|铺|程|(千|分|厘|毫|微)米|米|撮|勺|合|升|斗|石|盘|碗|碟|叠|桶|笼|盆|盒|杯|钟|斛|锅|簋|篮|盘|桶|罐|瓶|壶|卮|盏|箩|箱|煲|啖|袋|钵|年|月|日|季|刻|时|周|天|秒|分|小时|旬|纪|岁|世|更|夜|春|夏|秋|冬|代|伏|辈|丸|泡|粒|颗|幢|堆|条|根|支|道|面|片|张|颗|块|元|(亿|千万|百万|万|千|百)|(亿|千万|百万|万|千|百|美|)元|(亿|千万|百万|万|千|百|十|)吨|(亿|千万|百万|万|千|百|)块|角|毛|分)'
32
+
33
+ # Fraction expression
34
+ RE_FRAC = re.compile(r'(-?)(\d+)/(\d+)')
35
+
36
+
37
+ def replace_frac(match) -> str:
38
+ """
39
+ Args:
40
+ match (re.Match)
41
+ Returns:
42
+ str
43
+ """
44
+ sign = match.group(1)
45
+ nominator = match.group(2)
46
+ denominator = match.group(3)
47
+ sign: str = "负" if sign else ""
48
+ nominator: str = num2str(nominator)
49
+ denominator: str = num2str(denominator)
50
+ result = f"{sign}{denominator}分之{nominator}"
51
+ return result
52
+
53
+
54
+ # Percentage expression
55
+ RE_PERCENTAGE = re.compile(r'(-?)(\d+(\.\d+)?)%')
56
+
57
+
58
+ def replace_percentage(match) -> str:
59
+ """
60
+ Args:
61
+ match (re.Match)
62
+ Returns:
63
+ str
64
+ """
65
+ sign = match.group(1)
66
+ percent = match.group(2)
67
+ sign: str = "负" if sign else ""
68
+ percent: str = num2str(percent)
69
+ result = f"{sign}百分之{percent}"
70
+ return result
71
+
72
+
73
+ # Integer expression
74
+ # Negative integer, e.g. -10
75
+ RE_INTEGER = re.compile(r'(-)' r'(\d+)')
76
+
77
+
78
+ def replace_negative_num(match) -> str:
79
+ """
80
+ Args:
81
+ match (re.Match)
82
+ Returns:
83
+ str
84
+ """
85
+ sign = match.group(1)
86
+ number = match.group(2)
87
+ sign: str = "负" if sign else ""
88
+ number: str = num2str(number)
89
+ result = f"{sign}{number}"
90
+ return result
91
+
92
+
93
+ # Serial numbers: unsigned integers read digit by digit
94
+ # e.g. 00078
95
+ RE_DEFAULT_NUM = re.compile(r'\d{3}\d*')
96
+
97
+
98
+ def replace_default_num(match):
99
+ """
100
+ Args:
101
+ match (re.Match)
102
+ Returns:
103
+ str
104
+ """
105
+ number = match.group(0)
106
+ return verbalize_digit(number, alt_one=True)
107
+
108
+
109
+ # Number expression
110
+ # Pure decimal
111
+ RE_DECIMAL_NUM = re.compile(r'(-?)((\d+)(\.\d+))' r'|(\.(\d+))')
112
+ # Positive integer + quantifier (measure word)
113
+ RE_POSITIVE_QUANTIFIERS = re.compile(r"(\d+)([多余几\+])?" + COM_QUANTIFIERS)
114
+ RE_NUMBER = re.compile(r'(-?)((\d+)(\.\d+)?)' r'|(\.(\d+))')
115
+
116
+
117
+ def replace_positive_quantifier(match) -> str:
118
+ """
119
+ Args:
120
+ match (re.Match)
121
+ Returns:
122
+ str
123
+ """
124
+ number = match.group(1)
125
+ match_2 = match.group(2)
126
+ if match_2 == "+":
127
+ match_2 = "多"
128
+ match_2: str = match_2 if match_2 else ""
129
+ quantifiers: str = match.group(3)
130
+ number: str = num2str(number)
131
+ result = f"{number}{match_2}{quantifiers}"
132
+ return result
133
+
134
+
135
+ def replace_number(match) -> str:
136
+ """
137
+ Args:
138
+ match (re.Match)
139
+ Returns:
140
+ str
141
+ """
142
+ sign = match.group(1)
143
+ number = match.group(2)
144
+ pure_decimal = match.group(5)
145
+ if pure_decimal:
146
+ result = num2str(pure_decimal)
147
+ else:
148
+ sign: str = "负" if sign else ""
149
+ number: str = num2str(number)
150
+ result = f"{sign}{number}"
151
+ return result
152
+
153
+
154
+ # Range expression
155
+ # match.group(1) and match.group(8) are copied from RE_NUMBER
156
+
157
+ RE_RANGE = re.compile(
158
+ r'((-?)((\d+)(\.\d+)?)|(\.(\d+)))[-~]((-?)((\d+)(\.\d+)?)|(\.(\d+)))')
159
+
160
+
161
+ def replace_range(match) -> str:
162
+ """
163
+ Args:
164
+ match (re.Match)
165
+ Returns:
166
+ str
167
+ """
168
+ first, second = match.group(1), match.group(8)
169
+ first = RE_NUMBER.sub(replace_number, first)
170
+ second = RE_NUMBER.sub(replace_number, second)
171
+ result = f"{first}到{second}"
172
+ return result
173
+
174
+
175
+ def _get_value(value_string: str, use_zero: bool=True) -> List[str]:
176
+ stripped = value_string.lstrip('0')
177
+ if len(stripped) == 0:
178
+ return []
179
+ elif len(stripped) == 1:
180
+ if use_zero and len(stripped) < len(value_string):
181
+ return [DIGITS['0'], DIGITS[stripped]]
182
+ else:
183
+ return [DIGITS[stripped]]
184
+ else:
185
+ largest_unit = next(
186
+ power for power in reversed(UNITS.keys()) if power < len(stripped))
187
+ first_part = value_string[:-largest_unit]
188
+ second_part = value_string[-largest_unit:]
189
+ return _get_value(first_part) + [UNITS[largest_unit]] + _get_value(
190
+ second_part)
191
+
192
+
193
+ def verbalize_cardinal(value_string: str) -> str:
194
+ if not value_string:
195
+ return ''
196
+
197
+ # 000 -> '零' , 0 -> '零'
198
+ value_string = value_string.lstrip('0')
199
+ if len(value_string) == 0:
200
+ return DIGITS['0']
201
+
202
+ result_symbols = _get_value(value_string)
203
+ # verbalized number starting with '一十*' is abbreviated as `十*`
204
+ if len(result_symbols) >= 2 and result_symbols[0] == DIGITS[
205
+ '1'] and result_symbols[1] == UNITS[1]:
206
+ result_symbols = result_symbols[1:]
207
+ return ''.join(result_symbols)
208
+
209
+
210
+ def verbalize_digit(value_string: str, alt_one=False) -> str:
211
+ result_symbols = [DIGITS[digit] for digit in value_string]
212
+ result = ''.join(result_symbols)
213
+ if alt_one:
214
+ result = result.replace("一", "幺")
215
+ return result
216
+
217
+
218
+ def num2str(value_string: str) -> str:
219
+ integer_decimal = value_string.split('.')
220
+ if len(integer_decimal) == 1:
221
+ integer = integer_decimal[0]
222
+ decimal = ''
223
+ elif len(integer_decimal) == 2:
224
+ integer, decimal = integer_decimal
225
+ else:
226
+ raise ValueError(
227
+ f"The value string: '{value_string}' has more than one point in it."
228
+ )
229
+
230
+ result = verbalize_cardinal(integer)
231
+
232
+ decimal = decimal.rstrip('0')
233
+ if decimal:
234
+ # '.22' is verbalized as '零点二二'
235
+ # '3.20' is verbalized as '三点二'
236
+ result = result if result else "零"
237
+ result += '点' + verbalize_digit(decimal)
238
+ return result
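The module distinguishes two ways of reading a digit string: as a cardinal (`verbalize_cardinal`) and digit by digit (`verbalize_digit`). The digit-by-digit path is simple enough to sketch in isolation; `read_digits` below is a hypothetical stand-in that mirrors `verbalize_digit`, written only for illustration:

```python
DIGITS = {str(i): ch for i, ch in enumerate('零一二三四五六七八九')}


def read_digits(value: str, alt_one: bool = False) -> str:
    """Read a digit string one digit at a time.

    With alt_one=True, 一 is replaced by 幺, the reading commonly used
    for the digit 1 in phone numbers and serial numbers.
    """
    result = ''.join(DIGITS[d] for d in value)
    return result.replace('一', '幺') if alt_one else result


print(read_digits("2021"))                # year-style reading
print(read_digits("110", alt_one=True))   # phone-style reading
```

This is the reading applied to years in `replace_date` and, with `alt_one=True`, to serial and phone numbers.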
zh_normalization/phonecode.py ADDED
@@ -0,0 +1,63 @@
1
+ # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ import re
15
+
16
+ from .num import verbalize_digit
17
+
18
+ # Normalize landline / mobile phone numbers
19
+ # Mobile
20
+ # http://www.jihaoba.com/news/show/13680
21
+ # China Mobile: 139, 138, 137, 136, 135, 134, 159, 158, 157, 150, 151, 152, 188, 187, 182, 183, 184, 178, 198
22
+ # China Unicom: 130, 131, 132, 156, 155, 186, 185, 176
23
+ # China Telecom: 133, 153, 189, 180, 181, 177
24
+ RE_MOBILE_PHONE = re.compile(
25
+ r"(?<!\d)((\+?86 ?)?1([38]\d|5[0-35-9]|7[678]|9[89])\d{8})(?!\d)")
26
+ RE_TELEPHONE = re.compile(
27
+ r"(?<!\d)((0(10|2[1-3]|[3-9]\d{2})-?)?[1-9]\d{7,8})(?!\d)")
28
+
29
+ # Nationwide uniform numbers starting with 400
30
+ RE_NATIONAL_UNIFORM_NUMBER = re.compile(r"(400)(-)?\d{3}(-)?\d{4}")
31
+
32
+
33
+ def phone2str(phone_string: str, mobile=True) -> str:
34
+ if mobile:
35
+ sp_parts = phone_string.strip('+').split()
36
+ result = ','.join(
37
+ [verbalize_digit(part, alt_one=True) for part in sp_parts])
38
+ return result
39
+ else:
40
+ sil_parts = phone_string.split('-')
41
+ result = ','.join(
42
+ [verbalize_digit(part, alt_one=True) for part in sil_parts])
43
+ return result
44
+
45
+
46
+ def replace_phone(match) -> str:
47
+ """
48
+ Args:
49
+ match (re.Match)
50
+ Returns:
51
+ str
52
+ """
53
+ return phone2str(match.group(0), mobile=False)
54
+
55
+
56
+ def replace_mobile(match) -> str:
57
+ """
58
+ Args:
59
+ match (re.Match)
60
+ Returns:
61
+ str
62
+ """
63
+ return phone2str(match.group(0))
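The mobile pattern uses a lookbehind and a lookahead so that it only fires on a standalone 11-digit number, optionally prefixed with `86` or `+86`. A short sketch of that behaviour, with the pattern copied from the file above; `find_mobile` is a hypothetical helper added only for the demonstration:

```python
import re

# Pattern copied from the diff above.
RE_MOBILE_PHONE = re.compile(
    r"(?<!\d)((\+?86 ?)?1([38]\d|5[0-35-9]|7[678]|9[89])\d{8})(?!\d)")


def find_mobile(text: str):
    """Return the first mobile number found in text, or None."""
    m = RE_MOBILE_PHONE.search(text)
    return m.group(0) if m else None


print(find_mobile("电话13912345678"))   # a plain 11-digit number
print(find_mobile("+86 13912345678"))   # with country-code prefix
print(find_mobile("12345678901"))       # prefix 12x is not in the pattern
print(find_mobile("139123456789"))      # 12 digits: rejected by the lookahead
```

Without `(?<!\d)` and `(?!\d)`, the last input would match on its first eleven digits, and a longer ID or order number would be misread as a phone number.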
zh_normalization/quantifier.py ADDED
@@ -0,0 +1,63 @@
1
+ # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ import re
15
+
16
+ from .num import num2str
17
+
18
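+ # Temperature expression; in a temperature the minus sign is read differently
19
+ # e.g. -3°C is read as 零下三度 (three degrees below zero)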
20
+ RE_TEMPERATURE = re.compile(r'(-?)(\d+(\.\d+)?)(°C|℃|度|摄氏度)')
21
+ measure_dict = {
22
+ "cm2": "平方厘米",
23
+ "cm²": "平方厘米",
24
+ "cm3": "立方厘米",
25
+ "cm³": "立方厘米",
26
+ "cm": "厘米",
27
+ "db": "分贝",
28
+ "ds": "毫秒",
29
+ "kg": "千克",
30
+ "km": "千米",
31
+ "m2": "平方米",
32
+ "m²": "平方米",
33
+ "m³": "立方米",
34
+ "m3": "立方米",
35
+ "ml": "毫升",
36
+ "mm": "毫米",
37
+ "m": "米",
38
+ "s": "秒"
39
+ }
40
+
41
+
42
+ def replace_temperature(match) -> str:
43
+ """
44
+ Args:
45
+ match (re.Match)
46
+ Returns:
47
+ str
48
+ """
49
+ sign = match.group(1)
50
+ temperature = match.group(2)
51
+ unit = match.group(4)
52
+ sign: str = "零下" if sign else ""
53
+ temperature: str = num2str(temperature)
54
+ unit: str = "摄氏度" if unit == "摄氏度" else "度"
55
+ result = f"{sign}{temperature}{unit}"
56
+ return result
57
+
58
+
59
+ def replace_measure(sentence) -> str:
60
+ for q_notation in measure_dict:
61
+ if q_notation in sentence:
62
+ sentence = sentence.replace(q_notation, measure_dict[q_notation])
63
+ return sentence
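`replace_measure` performs plain substring replacement in dictionary order, so the result depends on which keys are tried first. One possible variant, sketched here under the assumption that longer unit names should always win, sorts the keys longest-first; the table `MEASURES` and the function `replace_units` are hypothetical and use only a few of the units listed above:

```python
# Hypothetical subset of the unit table, deliberately listed short-key-first.
MEASURES = {"m": "米", "mm": "毫米", "cm": "厘米", "km": "千米"}


def replace_units(sentence: str) -> str:
    """Replace unit abbreviations, trying longer abbreviations first."""
    for unit in sorted(MEASURES, key=len, reverse=True):
        sentence = sentence.replace(unit, MEASURES[unit])
    return sentence


print(replace_units("5mm"))
print(replace_units("3km"))
print(replace_units("2m"))
```

If `"m"` were tried before `"mm"`, the input `5mm` would come out as `5米米`; sorting by length removes the dependence on how the table happens to be written.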
zh_normalization/text_normlization.py ADDED
@@ -0,0 +1,154 @@
1
+ # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
+ import re
+ from typing import List
+
+ from .char_convert import tranditional_to_simplified
+ from .chronology import RE_DATE
+ from .chronology import RE_DATE2
+ from .chronology import RE_TIME
+ from .chronology import RE_TIME_RANGE
+ from .chronology import replace_date
+ from .chronology import replace_date2
+ from .chronology import replace_time
+ from .constants import F2H_ASCII_LETTERS
+ from .constants import F2H_DIGITS
+ from .constants import F2H_SPACE
+ from .num import RE_DECIMAL_NUM
+ from .num import RE_DEFAULT_NUM
+ from .num import RE_FRAC
+ from .num import RE_INTEGER
+ from .num import RE_NUMBER
+ from .num import RE_PERCENTAGE
+ from .num import RE_POSITIVE_QUANTIFIERS
+ from .num import RE_RANGE
+ from .num import replace_default_num
+ from .num import replace_frac
+ from .num import replace_negative_num
+ from .num import replace_number
+ from .num import replace_percentage
+ from .num import replace_positive_quantifier
+ from .num import replace_range
+ from .phonecode import RE_MOBILE_PHONE
+ from .phonecode import RE_NATIONAL_UNIFORM_NUMBER
+ from .phonecode import RE_TELEPHONE
+ from .phonecode import replace_mobile
+ from .phonecode import replace_phone
+ from .quantifier import RE_TEMPERATURE
+ from .quantifier import replace_measure
+ from .quantifier import replace_temperature
+
+
+ class TextNormalizer():
+     def __init__(self):
+         self.SENTENCE_SPLITOR = re.compile(r'([:、,;。?!,;?!][”’]?)')
+
+     def _split(self, text: str, lang="zh") -> List[str]:
+         """Split long text into sentences with sentence-splitting punctuations.
+         Args:
+             text (str): The input text.
+             lang (str): Language code; spaces and special characters are
+                 stripped only when it is "zh".
+         Returns:
+             List[str]: Sentences.
+         """
+         # Only for pure Chinese here
+         if lang == "zh":
+             text = text.replace(" ", "")
+             # Filter out special characters
+             text = re.sub(r'[——《》【】<=>{}()()#&@“”^_|…\\]', '', text)
+         text = self.SENTENCE_SPLITOR.sub(r'\1\n', text)
+         text = text.strip()
+         sentences = [sentence.strip() for sentence in re.split(r'\n+', text)]
+         return sentences
+
+     def _post_replace(self, sentence: str) -> str:
+         sentence = sentence.replace('/', '每')
+         sentence = sentence.replace('~', '至')
+         sentence = sentence.replace('~', '至')
+         sentence = sentence.replace('①', '一')
+         sentence = sentence.replace('②', '二')
+         sentence = sentence.replace('③', '三')
+         sentence = sentence.replace('④', '四')
+         sentence = sentence.replace('⑤', '五')
+         sentence = sentence.replace('⑥', '六')
+         sentence = sentence.replace('⑦', '七')
+         sentence = sentence.replace('⑧', '八')
+         sentence = sentence.replace('⑨', '九')
+         sentence = sentence.replace('⑩', '十')
+         sentence = sentence.replace('α', '阿尔法')
+         sentence = sentence.replace('β', '贝塔')
+         sentence = sentence.replace('γ', '伽玛').replace('Γ', '伽玛')
+         sentence = sentence.replace('δ', '德尔塔').replace('Δ', '德尔塔')
+         sentence = sentence.replace('ε', '艾普西龙')
+         sentence = sentence.replace('ζ', '捷塔')
+         sentence = sentence.replace('η', '依塔')
+         sentence = sentence.replace('θ', '西塔').replace('Θ', '西塔')
+         sentence = sentence.replace('ι', '艾欧塔')
+         sentence = sentence.replace('κ', '喀帕')
+         sentence = sentence.replace('λ', '拉姆达').replace('Λ', '拉姆达')
+         sentence = sentence.replace('μ', '缪')
+         sentence = sentence.replace('ν', '拗')
+         sentence = sentence.replace('ξ', '克西').replace('Ξ', '克西')
+         sentence = sentence.replace('ο', '欧米克伦')
+         sentence = sentence.replace('π', '派').replace('Π', '派')
+         sentence = sentence.replace('ρ', '肉')
+         sentence = sentence.replace('ς', '西格玛').replace('Σ', '西格玛').replace(
+             'σ', '西格玛')
+         sentence = sentence.replace('τ', '套')
+         sentence = sentence.replace('υ', '宇普西龙')
+         sentence = sentence.replace('φ', '服艾').replace('Φ', '服艾')
+         sentence = sentence.replace('χ', '器')
+         sentence = sentence.replace('ψ', '普赛').replace('Ψ', '普赛')
+         sentence = sentence.replace('ω', '欧米伽').replace('Ω', '欧米伽')
+         # Filter special characters again; same class as in _split, plus "-"
+         sentence = re.sub(r'[-——《》【】<=>{}()()#&@“”^_|…\\]', '', sentence)
+         return sentence
+
+     def normalize_sentence(self, sentence: str) -> str:
+         # basic character conversions
+         sentence = tranditional_to_simplified(sentence)
+         sentence = sentence.translate(F2H_ASCII_LETTERS).translate(
+             F2H_DIGITS).translate(F2H_SPACE)
+
+         # number-related non-standard word (NSW) verbalization
+         sentence = RE_DATE.sub(replace_date, sentence)
+         sentence = RE_DATE2.sub(replace_date2, sentence)
+
+         # range first
+         sentence = RE_TIME_RANGE.sub(replace_time, sentence)
+         sentence = RE_TIME.sub(replace_time, sentence)
+
+         sentence = RE_TEMPERATURE.sub(replace_temperature, sentence)
+         sentence = replace_measure(sentence)
+         sentence = RE_FRAC.sub(replace_frac, sentence)
+         sentence = RE_PERCENTAGE.sub(replace_percentage, sentence)
+         sentence = RE_MOBILE_PHONE.sub(replace_mobile, sentence)
+
+         sentence = RE_TELEPHONE.sub(replace_phone, sentence)
+         sentence = RE_NATIONAL_UNIFORM_NUMBER.sub(replace_phone, sentence)
+
+         sentence = RE_RANGE.sub(replace_range, sentence)
+         sentence = RE_INTEGER.sub(replace_negative_num, sentence)
+         sentence = RE_DECIMAL_NUM.sub(replace_number, sentence)
+         sentence = RE_POSITIVE_QUANTIFIERS.sub(replace_positive_quantifier,
+                                                sentence)
+         sentence = RE_DEFAULT_NUM.sub(replace_default_num, sentence)
+         sentence = RE_NUMBER.sub(replace_number, sentence)
+         sentence = self._post_replace(sentence)
+
+         return sentence
+
+     def normalize(self, text: str) -> List[str]:
+         sentences = self._split(text)
+         sentences = [self.normalize_sentence(sent) for sent in sentences]
+         return sentences
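Note on `_split`: the splitting step can be tried without the rest of the package. The following is a minimal standalone sketch of the `lang == "zh"` path, not the module's own code. The names `SENTENCE_SPLITTER`, `SPECIAL_CHARS`, and `split_sentences` are invented for this note, and the character classes are written with both full-width and ASCII punctuation, which is an assumption about the intended pattern.

```python
import re
from typing import List

# A splitting punctuation mark, optionally followed by a closing quote.
# Assumed to cover both full-width and ASCII forms.
SENTENCE_SPLITTER = re.compile(r'([:、,;。?!,;?!][”’]?)')
# Characters dropped before splitting.
SPECIAL_CHARS = re.compile(r'[——《》【】<=>{}()()#&@“”^_|…\\]')


def split_sentences(text: str) -> List[str]:
    # Remove spaces and special characters (the pure-Chinese path).
    text = text.replace(" ", "")
    text = SPECIAL_CHARS.sub("", text)
    # Keep each punctuation mark and break the line right after it.
    text = SENTENCE_SPLITTER.sub(r"\1\n", text)
    return [s.strip() for s in re.split(r"\n+", text.strip())]


for sentence in split_sentences("今天天气很好,我们去公园。你去吗?"):
    print(sentence)
```

Each returned piece keeps its trailing punctuation mark, which is what lets the later per-sentence normalization run on short, self-contained units.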