Commit 86f7d71 by liumaolin (1 parent: 8f68d0a)

docs: add comprehensive usage manual for MoYoYo.tts

- Create `USAGE.md` covering installation, configuration, usage, API reference, and troubleshooting.
- Include detailed guides for Quick and Advanced Modes, setup instructions, and pipeline visualization.
- Document backend and frontend configurations, running processes, and language-specific setups.
- Add examples for training, inference, and voice library management via both API and UI.

Files changed (3):
1. USAGE.md +1718 -0
2. USAGE_CN.md +1776 -0
3. development.md +0 -0

USAGE.md (added):

# MoYoYo.tts Usage Manual

## Table of Contents

- [Introduction](#introduction)
- [System Requirements](#system-requirements)
- [Installation Guide](#installation-guide)
  - [Install uv Package Manager](#31-install-uv-package-manager)
  - [Python Environment Setup](#32-python-environment-setup)
  - [Download Required Data Files](#33-download-required-data-files)
  - [Frontend Setup](#34-frontend-setup)
- [Configuration](#configuration)
  - [Backend API Configuration](#41-backend-api-configuration)
  - [Frontend Configuration](#42-frontend-configuration)
- [Running the Application](#running-the-application)
  - [Start Backend API Server](#51-start-backend-api-server)
  - [Start Frontend Electron App](#52-start-frontend-electron-app)
- [Usage Guide](#usage-guide)
  - [First-Time Setup](#61-first-time-setup)
  - [Quick Mode - Voice Cloning for Beginners](#62-quick-mode---voice-cloning-for-beginners)
  - [Advanced Mode - Expert Voice Cloning](#63-advanced-mode---expert-voice-cloning)
  - [Text-to-Speech Generation](#64-text-to-speech-generation)
  - [Voice Library Management](#65-voice-library-management)
- [API Reference](#api-reference)
- [Troubleshooting](#troubleshooting)
- [Development](#development)

---

## Introduction

MoYoYo.tts is a comprehensive voice cloning and text-to-speech system that combines:

- **Backend API**: FastAPI-based REST API for voice training and inference
- **Frontend Application**: Electron + Vue desktop app with an intuitive UI

The system is built on GPT-SoVITS, enabling high-quality voice cloning from minimal training data (as little as 5 seconds of audio).

**Target Audience**:
- End users who want to create custom voices for text-to-speech
- Developers integrating voice synthesis into applications
- Researchers experimenting with voice cloning technology

**Key Features**:
- Quick Mode: One-click voice cloning for beginners
- Advanced Mode: Fine-grained control over the training pipeline
- Real-time progress tracking via Server-Sent Events (SSE)
- Multi-language support (Chinese, English, Japanese)
- GPU acceleration with CUDA support

---

## System Requirements

### Software Requirements

| Component | Version | Notes |
|-----------|---------|-------|
| **Python** | 3.10 - 3.12 | Python 3.11 recommended |
| **Node.js** | >= 18.x | For frontend development |
| **uv** | Latest | Python package manager |
| **CUDA** | 12.6 or 12.8 | Optional, for GPU acceleration |

### Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **CPU** | Dual-core | Quad-core or better |
| **RAM** | 16 GB | 32 GB (for training) |
| **GPU** | None (CPU mode) | NVIDIA GPU with 6 GB+ VRAM |
| **Storage** | 20 GB free | 50 GB+ for multiple voices |

**GPU Notes**:
- A GPU is optional but speeds up training significantly (roughly 5-10x faster)
- NVIDIA GPUs with CUDA 12.6 or 12.8 support are recommended
- AMD GPUs and Apple Silicon are currently not supported for training

---

## Installation Guide

### 3.1 Install uv Package Manager

uv is a fast Python package installer and resolver, used here in place of pip.

**macOS / Linux**:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Windows** (PowerShell):
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Verify installation:
```bash
uv --version
```

### 3.2 Python Environment Setup

The project uses `uv` for dependency management, configured through `pyproject.toml`, so the whole setup comes down to a single command.

**Step 1: Navigate to Project Directory**
```bash
cd GPT-SoVITS
```

**Step 2: Sync All Dependencies**
```bash
# This single command will:
# - Create a virtual environment (.venv)
# - Install Python 3.11 (or your specified version) if it is not already available
# - Install all dependencies from pyproject.toml
# - Install the correct PyTorch version for your platform
uv sync
```

**Step 3: Activate Environment**

macOS / Linux:
```bash
source .venv/bin/activate
```

Windows:
```cmd
.venv\Scripts\activate
```

You should see a prefix such as `(.venv)` in your terminal prompt.

**How Platform-Specific PyTorch Installation Works**:

`pyproject.toml` selects the appropriate PyTorch build automatically:
- **macOS**: Installs CPU-only PyTorch (Apple Silicon runs in CPU mode)
- **Linux**: Installs the CUDA 12.6 build by default
- **Windows**: Installs the CUDA 12.6 build by default; select a different build manually if needed (see below)

**Windows Users - Choose CUDA Version**:

On Windows, pick the PyTorch index that matches your hardware:

**CUDA 12.6** (default):
```bash
uv sync
```

**CUDA 12.8**:
```bash
uv sync --index pytorch-cu128
```

**CPU Only** (no GPU):
```bash
uv sync --index pytorch-cpu
```

**Verify Installation**:
```bash
# Check Python version
python --version  # Should show Python 3.11.x

# Check PyTorch installation
python -c "import torch; print(f'PyTorch: {torch.__version__}')"

# Check CUDA availability (if you have a GPU)
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"
```
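
The same checks can be bundled into one script, for example to run before filing a bug report. This is a minimal sketch: the 3.10-3.12 range comes from the System Requirements table, and `python_supported` / `check_environment` are hypothetical helpers, not part of the project.

```python
import importlib.util
import sys

# Inclusive (major, minor) range from the System Requirements table.
SUPPORTED = ((3, 10), (3, 12))


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version lies in the supported range."""
    major_minor = tuple(version_info[:2])
    return SUPPORTED[0] <= major_minor <= SUPPORTED[1]


def check_environment() -> dict:
    """Collect basic facts about the active environment without failing hard."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_supported": python_supported(),
        "torch": None,
        "cuda_available": False,
    }
    # Import torch only if it is installed, so the script also runs in a bare interpreter.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it with the virtual environment activated. `torch: None` means PyTorch is not installed in the active interpreter.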

### 3.3 Download Required Data Files

The following data files are required for text processing and voice training. Run the commands from the project root (`GPT-SoVITS`) with the virtual environment activated.

#### NLTK Data (Required for Text Processing)

NLTK (Natural Language Toolkit) data is used for text tokenization and processing.

```bash
# Download from ModelScope
# (macOS does not ship wget by default; `curl -LO <url>` works as well)
wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip

# Extract into the virtual environment
unzip -q -o nltk_data.zip -d .venv/

# Clean up
rm nltk_data.zip
```

**Size**: ~10 MB
**Time**: < 1 minute

#### Open JTalk Dictionary (Required for Japanese)

Open JTalk is required for Japanese text-to-speech processing.

```bash
# Get the pyopenjtalk installation path
PYOPENJTALK_PATH=$(python -c "import os, pyopenjtalk; print(os.path.dirname(pyopenjtalk.__file__))")

# Download from ModelScope
wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/open_jtalk_dic_utf_8-1.11.tar.gz

# Extract into the pyopenjtalk directory
tar -xzf open_jtalk_dic_utf_8-1.11.tar.gz -C "$PYOPENJTALK_PATH"

# Clean up
rm open_jtalk_dic_utf_8-1.11.tar.gz
```

**Size**: ~50 MB
**Time**: < 2 minutes
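
To confirm that both data files landed where the libraries look for them, a small check can help. This sketch rests on two assumptions worth verifying. First, NLTK includes `<sys.prefix>/nltk_data` in its default search path, which is why extracting into `.venv/` works when the archive contains a top-level `nltk_data/` folder. Second, pyopenjtalk looks for an `open_jtalk_dic_utf_8-1.11` folder inside its package directory. The helper names are hypothetical.

```python
import importlib.util
import os
import sys
from pathlib import Path

# Folder name taken from the archive name above (assumption: the tarball unpacks to it).
JTALK_DIC_NAME = "open_jtalk_dic_utf_8-1.11"


def nltk_data_dir(prefix: str = sys.prefix) -> Path:
    """Where `unzip ... -d .venv/` is expected to put the data when the venv is active."""
    return Path(prefix) / "nltk_data"


def jtalk_dic_dir(package_dir: str) -> Path:
    """Expected dictionary location inside the pyopenjtalk package directory."""
    return Path(package_dir) / JTALK_DIC_NAME


def check_data_files() -> dict:
    results = {"nltk_data": nltk_data_dir().is_dir(), "open_jtalk_dic": None}
    spec = importlib.util.find_spec("pyopenjtalk")
    if spec is not None and spec.origin:
        # spec.origin points at pyopenjtalk/__init__.py; we avoid importing the package itself.
        results["open_jtalk_dic"] = jtalk_dic_dir(os.path.dirname(spec.origin)).is_dir()
    return results


if __name__ == "__main__":
    print(check_data_files())
```

`open_jtalk_dic: None` means pyopenjtalk is not installed in the active environment.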

### 3.4 Frontend Setup

The frontend is an Electron application built with Vue.js.

```bash
# Navigate to the frontend directory
cd tts-voice-app

# Install Node.js dependencies
npm install
```

**Time**: 2-5 minutes
**Note**: This installs all required Node.js packages, including Electron, Vue, and UI components.

---

## Configuration

### 4.1 Backend API Configuration

The backend is configured through environment variables. For custom settings, create a `.env` file in the project root.

**Create `.env` file** (optional; the defaults work for local development):

```bash
# Deployment Mode
# Options: local, server
DEPLOYMENT_MODE=local

# API Server Settings
API_HOST=0.0.0.0
API_PORT=8000

# Data Storage Paths
DATA_DIR=~/.moyoyo-tts/data
SQLITE_PATH=~/.moyoyo-tts/data/tasks.db

# Training Settings
# Number of concurrent training tasks
LOCAL_MAX_WORKERS=1
```

**Configuration Options**:

| Variable | Default | Description |
|----------|---------|-------------|
| `DEPLOYMENT_MODE` | `local` | Deployment environment (`local`/`server`) |
| `API_HOST` | `0.0.0.0` | API server bind address |
| `API_PORT` | `8000` | API server port |
| `DATA_DIR` | `~/.moyoyo-tts/data` | Directory for data storage |
| `SQLITE_PATH` | `~/.moyoyo-tts/data/tasks.db` | SQLite database path |
| `LOCAL_MAX_WORKERS` | `1` | Max concurrent training tasks |

**Notes**:
- `API_HOST=0.0.0.0` accepts connections on every network interface; set it to `127.0.0.1` to accept local connections only
- `LOCAL_MAX_WORKERS=1` prevents memory issues on systems with limited RAM
- Increase `LOCAL_MAX_WORKERS` on high-end systems to train multiple voices simultaneously
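
To show how these variables and their defaults fit together, here is a minimal sketch of a settings loader. It is an illustrative assumption, not the backend's actual settings module. It reads only `os.environ`, so a `.env` file must already have been loaded by whatever tool you use. It also expands `~` explicitly, because values loaded from a `.env` file may not be tilde-expanded.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    deployment_mode: str
    api_host: str
    api_port: int
    data_dir: Path
    sqlite_path: Path
    local_max_workers: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the defaults in the table."""
    mode = env.get("DEPLOYMENT_MODE", "local")
    if mode not in ("local", "server"):
        raise ValueError(f"DEPLOYMENT_MODE must be 'local' or 'server', got {mode!r}")
    workers = int(env.get("LOCAL_MAX_WORKERS", "1"))
    if workers < 1:
        raise ValueError("LOCAL_MAX_WORKERS must be at least 1")
    return Settings(
        deployment_mode=mode,
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
        # expanduser() turns "~/..." into an absolute path under the user's home
        data_dir=Path(env.get("DATA_DIR", "~/.moyoyo-tts/data")).expanduser(),
        sqlite_path=Path(env.get("SQLITE_PATH", "~/.moyoyo-tts/data/tasks.db")).expanduser(),
        local_max_workers=workers,
    )
```

Passing a plain dict (for example `load_settings({"API_PORT": "9000"})`) makes it easy to see which values win over the defaults.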

### 4.2 Frontend Configuration

The frontend requires minimal configuration for local development.

**Default Settings**:
- **API Endpoint**: `http://localhost:8000`
- **Voice Storage**: `~/.moyoyo-tts/voices/`
- **Model Storage**: `GPT_SoVITS/pretrained_models/`

**Auto-Configuration**: The Electron app will:
1. Automatically detect and connect to the local API server
2. Create required directories on first launch
3. Download missing models via the Model Setup page

No manual configuration is needed for standard usage.

---

## Running the Application

### 5.1 Start Backend API Server

**Step 1: Activate Python Environment**

```bash
# Navigate to the project directory
cd GPT-SoVITS

# Activate the virtual environment (macOS/Linux)
source .venv/bin/activate

# On Windows, run this instead:
# .venv\Scripts\activate
```

**Step 2: Start the API Server**

Method 1 - Using the main script:
```bash
cd api_server
python app/main.py
```

Method 2 - Using uvicorn directly (also run from inside `api_server`; `--reload` restarts the server on code changes and is intended for development):
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

**Expected Output**:
```
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

**API Documentation**:
Once the server is running, interactive API documentation is available at:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
- **OpenAPI JSON**: http://localhost:8000/openapi.json

**Health Check**:
```bash
curl http://localhost:8000/health
# Expected: {"status": "healthy"}
```
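
When scripting against the API, for example when starting the server and a client from one script, it helps to wait until `/health` answers before sending requests. This is a minimal standard-library sketch. It assumes the health endpoint returns JSON with `"status": "healthy"` as shown above, and `wait_for_health` is a hypothetical helper.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Poll until the server reports healthy; return False if it never does."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not up yet (connection refused, timeout, or non-JSON reply)
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, call `wait_for_health()` right after launching the server in the background. It returns `True` as soon as `/health` reports healthy, or `False` after 30 failed attempts with the defaults.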

### 5.2 Start Frontend Electron App

**Step 1: Open New Terminal**

Keep the backend server running and open a new terminal window.

**Step 2: Navigate to Frontend Directory**

```bash
cd tts-voice-app
```

**Step 3: Start Development Mode**

```bash
npm run dev
```

**Expected Output**:
```
> tts-voice-app@1.0.0 dev
> electron-vite dev

VITE v4.x.x ready in xxx ms
➜ Local: http://localhost:5173/
➜ Network: use --host to expose

Electron app starting...
```

The Electron application launches automatically, with hot reload enabled for development.

**Features in Development Mode**:
- Hot module replacement (HMR) for instant UI updates
- Vue DevTools integration
- Console logging for debugging
- Automatic restart on main process changes

---

## Usage Guide

### 6.1 First-Time Setup

When you first launch the Electron app, you need to download the required models.

**Setup Process**:

1. **Launch the Electron App**
   ```bash
   cd tts-voice-app
   npm run dev
   ```

2. **Model Setup Page**
   - The app automatically detects missing models
   - You'll be redirected to the Model Setup page

3. **Download Models**
   - Click the "Download All Models" button
   - Models to be downloaded:
     - **Pretrained Models**: 4.56 GB
     - **G2PW Model**: 588.86 MB
     - **FunASR**: 1.09 GB
     - **Faster Whisper**: 2.85 GB
   - Total download size: ~9 GB

4. **Monitor Progress**
   - Real-time progress bars show download status
   - Estimated time: 10-30 minutes (depending on your connection)
   - Downloads can be paused and resumed

5. **Setup Complete**
   - Once all models are downloaded, click "Continue"
   - You'll be redirected to the main TTS page
   - The app is now ready to use

**Troubleshooting**:
- If downloads fail, check your internet connection
- Verify you have ~10 GB of free disk space
- For manual installation, see section 3.3
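
Before starting the ~9 GB download, you can confirm there is enough room. This is a minimal sketch using `shutil.disk_usage`. The 10 GB threshold mirrors the troubleshooting note above. Checking the current directory is an assumption: run it from the project root, since the models land in `GPT_SoVITS/pretrained_models/` by default.

```python
import shutil

REQUIRED_BYTES = 10 * 1024**3  # ~10 GB, per the troubleshooting note


def has_enough_space(path: str = ".", required: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


def free_gib(path: str = ".") -> float:
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    print(f"free: {free_gib():.1f} GiB, enough for models: {has_enough_space()}")
```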

### 6.2 Quick Mode - Voice Cloning for Beginners

Quick Mode provides a simplified workflow for users who want to create a voice clone quickly without technical knowledge.

#### Using the API

**Step 1: Upload Audio File**

```bash
curl -X POST http://localhost:8000/api/v1/files \
  -F "file=@path/to/voice_sample.wav" \
  -F "purpose=training"
```

**Response**:
```json
{
  "file_id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "voice_sample.wav",
  "size": 1234567,
  "purpose": "training"
}
```

**Step 2: Create Training Task**

```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "exp_name": "my_voice",
    "audio_file_id": "550e8400-e29b-41d4-a716-446655440000",
    "options": {
      "version": "v2",
      "language": "zh",
      "quality": "standard"
    }
  }'
```

**Response**:
```json
{
  "id": "task-uuid-here",
  "status": "queued",
  "exp_name": "my_voice",
  "created_at": "2026-01-23T10:30:00Z"
}
```
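
Steps 1 and 2 can also be scripted. The sketch below uses only the standard library. The endpoint paths, form fields and JSON keys are copied from the curl examples. The helper names, the client-side validation (allowed values taken from the Quality Presets table and the API reference) and the hand-built multipart body are illustrative assumptions.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API = "http://localhost:8000/api/v1"
LANGUAGES = {"zh", "en", "ja"}
QUALITIES = {"fast", "standard", "high"}


def upload_training_audio(path: str) -> str:
    """POST the file as multipart/form-data (like `curl -F`) and return its file_id."""
    boundary = uuid.uuid4().hex
    file_path = Path(path)
    ctype = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="purpose"\r\n\r\ntraining\r\n',
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n'.encode(),
        f"Content-Type: {ctype}\r\n\r\n".encode(),
        file_path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    req = urllib.request.Request(
        f"{API}/files", data=body, method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["file_id"]


def build_task_payload(exp_name: str, file_id: str, language: str = "zh",
                       quality: str = "standard") -> dict:
    """Build the JSON body for POST /api/v1/tasks, rejecting unknown options early."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if quality not in QUALITIES:
        raise ValueError(f"unsupported quality preset: {quality}")
    return {
        "exp_name": exp_name,
        "audio_file_id": file_id,
        "options": {"version": "v2", "language": language, "quality": quality},
    }


def create_task(payload: dict) -> dict:
    """POST the payload to /tasks and return the decoded response."""
    req = urllib.request.Request(
        f"{API}/tasks", data=json.dumps(payload).encode("utf-8"), method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Typical use: `task = create_task(build_task_payload("my_voice", upload_training_audio("path/to/voice_sample.wav")))`. Then read `task["id"]` to follow progress.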

**Step 3: Monitor Progress**

Stream progress with Server-Sent Events (SSE), replacing `task-uuid-here` with the `id` returned in Step 2. The `-N` flag disables curl's output buffering so events appear as they arrive:
```bash
curl -N http://localhost:8000/api/v1/tasks/task-uuid-here/progress
```

**Progress Events**:
```
event: progress
data: {"stage": "audio_slice", "progress": 25, "message": "Slicing audio..."}

event: progress
data: {"stage": "sovits_train", "progress": 50, "message": "Training SoVITS model..."}

event: complete
data: {"status": "completed", "voice_id": "voice-uuid-here"}
```
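
Outside the terminal, the same stream can be consumed programmatically. This is a minimal Server-Sent Events reader using only the standard library. It groups `event:`/`data:` lines into an event at each blank line and decodes `data` as JSON, matching the events shown above. It implements a simplified subset of the SSE format that ignores `id:`, `retry:` and comment lines. `iter_sse_events` and `follow_task` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from decoded SSE lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line dispatches the pending event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    # As in the SSE spec, an event not terminated by a blank line is discarded.


def follow_task(task_id: str, base: str = "http://localhost:8000/api/v1") -> dict:
    """Print progress for a task and return the payload of its `complete` event."""
    url = f"{base}/tasks/{task_id}/progress"
    with urllib.request.urlopen(url) as resp:
        lines = (chunk.decode("utf-8") for chunk in resp)  # HTTPResponse iterates line by line
        for name, payload in iter_sse_events(lines):
            if name == "progress":
                print(f"[{payload.get('stage')}] {payload.get('progress')}% {payload.get('message', '')}")
            elif name == "complete":
                return payload
    raise RuntimeError("stream ended without a complete event")
```

For example, `follow_task("task-uuid-here")` prints one line per progress event and returns the `complete` payload, which carries the new `voice_id`.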

#### Quality Presets

| Preset | SoVITS Epochs | GPT Epochs | Est. Time | Quality |
|--------|---------------|------------|-----------|---------|
| **fast** | 4 | 8 | ~10 min | Good for testing |
| **standard** | 8 | 15 | ~20 min | Balanced quality/speed |
| **high** | 16 | 30 | ~40 min | Best quality |

**Recommendations**:
- Use `fast` for quick tests and previews
- Use `standard` for most production use cases
- Use `high` for professional applications requiring maximum quality

#### Using the UI

**Step 1: Navigate to Voice Clone Page**
- Click "Voice Clone" in the sidebar
- Or use the keyboard shortcut `Ctrl/Cmd + N`

**Step 2: Upload Audio Sample**
- Click the "Upload Audio" button
- Select a WAV or MP3 file
- **Requirements**:
  - Duration: 5-30 seconds recommended
  - Quality: Clear voice, minimal background noise
  - Content: Natural speech, not singing or shouting

**Step 3: Configure Training**
- **Voice Name**: Enter a unique name (e.g., "John's Voice")
- **Language**: Select the primary language (Chinese, English, Japanese)
- **Quality Preset**: Choose fast, standard, or high

**Step 4: Start Training**
- Click the "Start Training" button
- The task is queued and processing begins

**Step 5: Monitor Progress**
- A progress bar shows overall completion
- The current stage is displayed (e.g., "Training SoVITS model...")
- The estimated time remaining is shown
- You can navigate away and check back later

**Step 6: Training Complete**
- You'll receive a notification when training completes
- The voice automatically appears in the Voice Library
- You can use it for TTS generation immediately

**Tips for Best Results**:
- Use high-quality audio (preferably 48 kHz WAV)
- Keep tone and speaking style consistent
- Avoid audio with music or sound effects
- 10-15 seconds is the sweet spot for sample length
- Multiple short samples can be combined
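
Before uploading, you can check a WAV sample against the guidance above. This is a minimal sketch using Python's built-in `wave` module. It handles PCM WAV only; MP3 would need a third-party decoder. The 5-30 second range and the 48 kHz preference come from this section, and `inspect_wav` is a hypothetical helper.

```python
import wave
from typing import BinaryIO, Union


def inspect_wav(source: Union[str, BinaryIO]) -> dict:
    """Return duration and format details for a PCM WAV file, plus guideline warnings."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        info = {
            "duration_s": frames / rate,
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
        }
    warnings = []
    if not 5 <= info["duration_s"] <= 30:
        warnings.append("duration outside the recommended 5-30 s range")
    if rate < 48000:
        warnings.append("sample rate below the preferred 48 kHz")
    info["warnings"] = warnings
    return info
```

For example, `inspect_wav("path/to/voice_sample.wav")["warnings"]` is an empty list for a 10-second 48 kHz recording.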

### 6.3 Advanced Mode - Expert Voice Cloning

Advanced Mode provides granular control over each stage of the voice training pipeline. It is recommended for users who want to fine-tune training parameters.

#### Training Pipeline Stages

The complete training pipeline consists of 7 stages:

1. **Audio Slice**: Split audio into segments
2. **ASR** (Automatic Speech Recognition): Transcribe audio to text
3. **Text Feature**: Extract text embeddings
4. **Hubert Feature**: Extract audio features
5. **Semantic Token**: Generate semantic tokens
6. **SoVITS Train**: Train voice synthesis model
7. **GPT Train**: Train text-to-semantic model

#### Stage Dependencies

```
audio_slice → asr → text_feature → sovits_train
                 ↘               ↗
                  hubert_feature → semantic_token → gpt_train
```

**Important**: Each stage must wait for its dependencies to complete.
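
If you drive Advanced Mode from a script, the stages can be run in dependency order automatically. The sketch below uses Python's `graphlib.TopologicalSorter` (Python 3.9+). The dependency map is one reading of the diagram above and should be checked against your backend: `hubert_feature` is taken to follow `asr`, `sovits_train` to need both `text_feature` and `hubert_feature`, and `gpt_train` to need `semantic_token`. `run_stage` is a hypothetical caller-supplied function, for example one that POSTs to `/api/v1/experiments/{id}/stages/{stage}` and blocks until the stage's SSE stream reports completion.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Assumed reading of the Stage Dependencies diagram: stage -> stages it waits for.
STAGE_DEPS: Dict[str, Set[str]] = {
    "audio_slice": set(),
    "asr": {"audio_slice"},
    "text_feature": {"asr"},
    "hubert_feature": {"asr"},
    "semantic_token": {"hubert_feature"},
    "sovits_train": {"text_feature", "hubert_feature"},
    "gpt_train": {"semantic_token"},
}


def stage_order(deps: Dict[str, Set[str]] = STAGE_DEPS) -> List[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


def run_pipeline(run_stage: Callable[[str], None],
                 deps: Dict[str, Set[str]] = STAGE_DEPS) -> List[str]:
    """Call run_stage(name) for each stage only after all of its dependencies ran."""
    done: List[str] = []
    for stage in stage_order(deps):
        run_stage(stage)  # expected to block until the stage completes
        done.append(stage)
    return done
```

Running stages strictly one after another is the simplest choice. Independent stages such as `text_feature` and `hubert_feature` could run in parallel via `TopologicalSorter.get_ready()`, at the cost of more bookkeeping.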

#### Using the API

**Step 1: Create Experiment**

```bash
curl -X POST http://localhost:8000/api/v1/experiments \
  -H "Content-Type: application/json" \
  -d '{
    "exp_name": "my_custom_voice",
    "version": "v2",
    "audio_file_id": "file-uuid-here"
  }'
```

**Response**:
```json
{
  "id": "exp-uuid-here",
  "exp_name": "my_custom_voice",
  "version": "v2",
  "stages": {
    "audio_slice": {"status": "pending"},
    "asr": {"status": "pending"},
    "text_feature": {"status": "pending"},
    "hubert_feature": {"status": "pending"},
    "semantic_token": {"status": "pending"},
    "sovits_train": {"status": "pending"},
    "gpt_train": {"status": "pending"}
  }
}
```

**Step 2: Execute Stages Individually**

In the URLs below, replace `exp-uuid` with the `id` returned in Step 1.

**Stage 1 - Audio Slice**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/audio_slice \
  -H "Content-Type: application/json" \
  -d '{
    "threshold": -34,
    "min_length": 4000,
    "min_interval": 300,
    "hop_size": 10,
    "max_silence_kept": 500
  }'
```

**Parameters**:
- `threshold`: Volume threshold in dB below which audio counts as silence (-60 to 0, default: -34)
- `min_length`: Minimum segment length in ms (1000-10000, default: 4000)
- `min_interval`: Minimum silence interval in ms (0-3000, default: 300)
- `hop_size`: Analysis window hop size in ms (default: 10)
- `max_silence_kept`: Maximum silence to keep in ms (default: 500)
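
To make the units concrete: a slicer of the kind used by GPT-SoVITS compares the RMS volume of each analysis frame against a linear amplitude derived from `threshold`, and converts the millisecond settings into frame counts. The sketch below assumes that behavior and a 32 kHz sample rate. It illustrates the arithmetic only and is not the backend's code.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a dB threshold to a linear amplitude (1.0 = full scale)."""
    return 10 ** (db / 20)


def slice_settings_in_frames(threshold=-34, min_length=4000, min_interval=300,
                             hop_size=10, max_silence_kept=500, sr=32000) -> dict:
    """Express the documented millisecond settings as analysis-frame counts (assumed model)."""
    hop_samples = round(sr * hop_size / 1000)  # samples per analysis hop

    def to_frames(ms):
        return round(sr * ms / 1000 / hop_samples)

    return {
        "rms_threshold": db_to_amplitude(threshold),
        "hop_samples": hop_samples,
        "min_length_frames": to_frames(min_length),
        "min_interval_frames": to_frames(min_interval),
        "max_silence_kept_frames": to_frames(max_silence_kept),
    }
```

With the defaults, -34 dB corresponds to an amplitude of about 0.02, roughly 2% of full scale. A 10 ms hop at 32 kHz is 320 samples, so `min_length=4000` means segments of at least 400 frames.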

**Stage 2 - ASR**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/asr \
  -H "Content-Type: application/json" \
  -d '{
    "model": "达摩 ASR (中文)",
    "language": "zh"
  }'
```

**ASR Models**:
- `达摩 ASR (中文)` ("DAMO ASR (Chinese)"): DAMO ASR for Chinese (best for Chinese)
- `Faster Whisper (多语言)` ("Faster Whisper (multilingual)"): Faster Whisper for multilingual audio

**Stage 3 - Text Feature**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/text_feature \
  -H "Content-Type: application/json" \
  -d '{
    "language": "zh"
  }'
```

**Stage 4 - Hubert Feature**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/hubert_feature \
  -H "Content-Type: application/json" \
  -d '{}'
```

**Stage 5 - Semantic Token**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/semantic_token \
  -H "Content-Type: application/json" \
  -d '{}'
```

**Stage 6 - SoVITS Train**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train \
  -H "Content-Type: application/json" \
  -d '{
    "total_epoch": 8,
    "batch_size": 4,
    "save_every_epoch": 4,
    "text_low_lr_rate": 0.4,
    "if_save_latest": true,
    "if_save_every_weights": true,
    "version": "v2"
  }'
```

**Parameters**:
- `total_epoch`: Total training epochs (4-32, default: 8)
- `batch_size`: Batch size (1-40, default: 4)
- `save_every_epoch`: Save a checkpoint every N epochs (1-50, default: 4)
- `text_low_lr_rate`: Text encoder learning rate multiplier (0.2-1.0, default: 0.4)

**Stage 7 - GPT Train**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/gpt_train \
  -H "Content-Type: application/json" \
  -d '{
    "total_epoch": 15,
    "batch_size": 4,
    "save_every_epoch": 5,
    "if_save_latest": true,
    "if_save_every_weights": true,
    "version": "v2"
  }'
```

**Step 3: Monitor Stage Progress**

Each stage reports real-time progress via SSE:

```bash
curl -N http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train/progress
```

**Progress Events**:
```
event: progress
data: {"epoch": 2, "total_epochs": 8, "progress": 25, "loss": 0.234}

event: progress
data: {"epoch": 4, "total_epochs": 8, "progress": 50, "loss": 0.189}

event: complete
data: {"status": "completed", "final_loss": 0.142}
```

#### Using the UI

**Step 1: Create New Experiment**
- Navigate to the "Advanced Mode" page
- Click "New Experiment"
- Enter an experiment name and upload audio

**Step 2: Configure Each Stage**
- Click a stage card to expand its settings
- Adjust parameters (or keep the preset defaults)
- Click "Run Stage" to execute it

**Step 3: Monitor Pipeline**
- The visual pipeline diagram shows each stage's status
- Green: Completed, Blue: Running, Gray: Pending
- Click any stage to view detailed logs

**Step 4: Iterate and Refine**
- Review results after each stage
- Adjust parameters and re-run if needed
- Export the final model when satisfied

**Advanced Tips**:
- Use a lower `batch_size` (2-4) on GPUs with limited memory
- Increase `total_epoch` for better quality when you have sufficient data
- Save checkpoints frequently (`save_every_epoch`) to recover from interruptions
- Monitor loss values; they should generally decrease over epochs
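
"Loss should decrease" is easier to judge over a window than from one epoch to the next, since individual values fluctuate. This minimal sketch takes the `loss` values from `progress` events like those in Step 3 of the API walkthrough. It compares the average of the first and last few reported values. The window size and the comparison are illustrative choices, not a rule from the project.

```python
from statistics import mean
from typing import Iterable, Mapping


def loss_is_decreasing(events: Iterable[Mapping], window: int = 2) -> bool:
    """Compare the mean loss of the first and last `window` progress events."""
    losses = [e["loss"] for e in events if "loss" in e]
    if len(losses) < 2 * window:
        return False  # not enough points to judge a trend yet
    return mean(losses[-window:]) < mean(losses[:window])
```

A `False` result on a long run suggests trying a smaller learning-rate multiplier or more data rather than simply adding epochs.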

### 6.4 Text-to-Speech Generation

Once you have trained a voice, you can use it to generate speech from text.

#### Using the API

**Basic TTS Request**:
```bash
curl -X POST http://localhost:8000/api/v1/inference/tts \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of text-to-speech synthesis.",
    "voice_id": "voice-uuid-here",
    "speed": 1.0,
    "emotion": "auto"
  }'
```

**Response**:
```json
{
  "audio_url": "http://localhost:8000/api/v1/files/audio-uuid-here",
  "duration": 3.2,
  "format": "wav"
}
```

**Parameters**:
- `text` (required): Text to synthesize (max 5000 characters)
- `voice_id` (required): UUID of a trained voice (the `voice_id` reported when training completes)
- `speed` (optional): Speaking speed multiplier (0.5 - 2.0, default: 1.0)
- `emotion` (optional): Emotion style (`auto`, `neutral`, `happy`, `sad`)
- `seed` (optional): Random seed for reproducibility
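
Here is a Python equivalent of the TTS request and the download step, with the documented limits checked on the client side before anything is sent. This is a minimal sketch. The field names and ranges come from the parameter list. `build_tts_request` and `synthesize` are hypothetical helpers, and `audio_url` is assumed to be directly downloadable, as in the curl download example.

```python
import json
import urllib.request
from typing import Optional

API = "http://localhost:8000/api/v1"
EMOTIONS = {"auto", "neutral", "happy", "sad"}
MAX_CHARS = 5000


def build_tts_request(text: str, voice_id: str, speed: float = 1.0,
                      emotion: str = "auto", seed: Optional[int] = None) -> dict:
    """Validate inputs against the documented limits and build the JSON body."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_CHARS}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if emotion not in EMOTIONS:
        raise ValueError(f"emotion must be one of {sorted(EMOTIONS)}")
    body = {"text": text, "voice_id": voice_id, "speed": speed, "emotion": emotion}
    if seed is not None:
        body["seed"] = seed  # only sent when reproducible output is wanted
    return body


def synthesize(body: dict, out_path: str = "output.wav") -> str:
    """POST the request, then download the returned audio_url to out_path."""
    req = urllib.request.Request(
        f"{API}/inference/tts", data=json.dumps(body).encode("utf-8"), method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    urllib.request.urlretrieve(result["audio_url"], out_path)
    return out_path
```

For example, `synthesize(build_tts_request("Hello!", "voice-uuid-here", speed=1.2))` writes `output.wav`. Validating first means a bad request fails locally with a clear message instead of a server error.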

**Download Generated Audio**:
```bash
curl -o output.wav http://localhost:8000/api/v1/files/audio-uuid-here
```

#### Using the UI

**Step 1: Navigate to TTS Page**
- Click "Text to Speech" in the sidebar
- Or use the keyboard shortcut `Ctrl/Cmd + T`

**Step 2: Select Voice**
- Open the voice dropdown
- Select a trained voice from the list
- Use the preview button to hear a sample

**Step 3: Enter Text**
- Type or paste text into the text area
- The character count is shown (max 5000)
- Multi-line text is supported

**Step 4: Adjust Settings**
- **Speed**: Drag the slider or enter a value (0.5x - 2.0x)
  - 0.5x: Very slow, clear enunciation
  - 1.0x: Natural speaking pace
  - 1.5x: Fast, still intelligible
  - 2.0x: Very fast
- **Emotion**: Select from the dropdown (if supported by the model)
  - Auto: Infer from text
  - Neutral: Flat, factual delivery
  - Happy: Upbeat, positive tone
  - Sad: Somber, melancholic tone

**Step 5: Generate**
- Click the "Generate" button
- Processing typically takes 2-5 seconds
- A progress indicator is shown

**Step 6: Listen and Download**
- The audio player appears automatically
- Click the play button to listen
- Click the download button to save the WAV file
- Click the share button to copy a shareable link

**Text Guidelines**:
- Use proper punctuation for natural pauses
- Break long text into sentences
- Use quotation marks for dialogue
- Use all caps for emphasis (sparingly)

**Tips for Natural Speech**:
- Add commas for breath pauses
- Use an ellipsis (...) for trailing off
- Question marks affect intonation
- Exclamation points add emphasis
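
For text longer than the 5,000-character limit, or simply to follow the advice to break long text into sentences, you can split the text at sentence-ending punctuation and send each piece separately. This is a minimal sketch. The punctuation set (including the CJK `。！？`), the greedy packing and the space used to join sentences are illustrative choices, not the app's own segmentation. A single sentence longer than the limit is hard-split.

```python
import re
from typing import List

# Split after CJK sentence enders (no space needed) or after . ! ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[。！？])|(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Break text into sentences, keeping the ending punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def chunk_text(text: str, limit: int = 5000) -> List[str]:
    """Greedily pack whole sentences (joined by a space) into chunks of at most `limit` chars."""
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Requiring whitespace after `.`, `!` and `?` keeps decimals such as `3.14` and a trailing ellipsis (`...`) inside one sentence.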

### 6.5 Voice Library Management

The Voice Library is where all your trained voices are stored and managed.

#### Using the API

**List All Voices**:
```bash
# Quote the URL so shells such as zsh do not treat "?" as a glob character
curl "http://localhost:8000/api/v1/files?purpose=training"
```

**Response**:
```json
{
  "files": [
    {
      "id": "voice-uuid-1",
      "filename": "john_voice",
      "created_at": "2026-01-20T10:30:00Z",
      "size": 1234567,
      "metadata": {
        "language": "zh",
        "quality": "standard",
        "duration": 12.5
      }
    },
    {
      "id": "voice-uuid-2",
      "filename": "mary_voice",
      "created_at": "2026-01-21T14:20:00Z",
      "size": 2345678,
      "metadata": {
        "language": "en",
        "quality": "high",
        "duration": 18.3
      }
    }
  ]
}
```

**Get Voice Details**:
```bash
curl http://localhost:8000/api/v1/files/voice-uuid-1
```

**Delete Voice**:
```bash
curl -X DELETE http://localhost:8000/api/v1/files/voice-uuid-1
```

**Export Voice Model**:
```bash
curl -o voice_model.zip http://localhost:8000/api/v1/voices/voice-uuid-1/export
```
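
The search, filter and sort options the UI offers can also be reproduced on the list response. This minimal sketch assumes the response shape shown above: `files` entries with `filename`, `created_at`, `metadata.language` and `metadata.quality`. `filter_voices` is a hypothetical helper, and fetching the JSON is left to the caller (for example with `urllib.request.urlopen` on the list URL).

```python
from datetime import datetime
from typing import List, Optional


def _created(voice: dict) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(voice["created_at"].replace("Z", "+00:00"))


def filter_voices(response: dict, name: str = "", language: Optional[str] = None,
                  quality: Optional[str] = None, newest_first: bool = True) -> List[dict]:
    """Search by name substring, filter by metadata, and sort by creation date."""
    voices = [
        v for v in response.get("files", [])
        if name.lower() in v.get("filename", "").lower()
        and (language is None or v.get("metadata", {}).get("language") == language)
        and (quality is None or v.get("metadata", {}).get("quality") == quality)
    ]
    return sorted(voices, key=_created, reverse=newest_first)
```

With the sample response above, `filter_voices(resp, language="zh")` keeps only `voice-uuid-1`. The default newest-first order puts `voice-uuid-2` ahead of `voice-uuid-1`.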
+
+ #### Using the UI
+
+ **Browse Voice Library**:
+ - Navigate to the "Voice Library" page
+ - Voices are displayed as cards showing:
+   - Voice name
+   - Language and quality badges
+   - Creation date
+   - Sample duration
+   - Preview waveform
+
+ **Voice Card Actions**:
+ - **Play**: Listen to voice sample
+ - **Edit**: Rename or update metadata
+ - **Export**: Download voice model files
+ - **Delete**: Remove voice (with confirmation)
+
+ **Search and Filter**:
+ - Search bar: Filter by voice name
+ - Language filter: Show only specific languages
+ - Quality filter: Show only specific quality presets
+ - Sort options:
+   - Name (A-Z)
+   - Date created (newest first)
+   - Date created (oldest first)
+   - File size
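The search, filter, and sort controls above map naturally onto the voice list returned by the API. Below is a minimal sketch of equivalent client-side logic. It assumes the shape of the **Response** example above (`files[].filename`, `created_at`, `size`, `metadata.language`, `metadata.quality`). The type and function names are hypothetical, and the actual UI may implement this differently.

```typescript
// Shape assumed from the "List All Voices" example response above.
interface VoiceFile {
  id: string;
  filename: string;
  created_at: string; // ISO 8601 timestamp
  size: number; // bytes
  metadata: { language: string; quality: string; duration: number };
}

type SortOption = "name" | "newest" | "oldest" | "size";

interface VoiceQuery {
  search?: string; // case-insensitive match on the voice name
  language?: string; // e.g. "zh", "en", "ja"
  quality?: string; // e.g. "fast", "standard", "high"
  sort?: SortOption;
}

// Hypothetical helper mirroring the Voice Library search, filter, and sort controls.
function queryVoices(voices: VoiceFile[], q: VoiceQuery): VoiceFile[] {
  const needle = (q.search ?? "").toLowerCase();
  const filtered = voices.filter(
    (v) =>
      v.filename.toLowerCase().indexOf(needle) !== -1 &&
      (q.language === undefined || v.metadata.language === q.language) &&
      (q.quality === undefined || v.metadata.quality === q.quality)
  );
  const time = (v: VoiceFile): number => Date.parse(v.created_at);
  const comparators: Record<SortOption, (a: VoiceFile, b: VoiceFile) => number> = {
    name: (a, b) => a.filename.localeCompare(b.filename),
    newest: (a, b) => time(b) - time(a),
    oldest: (a, b) => time(a) - time(b),
    size: (a, b) => b.size - a.size, // largest first; the UI's direction is an assumption
  };
  // slice() so sorting never reorders the caller's array.
  return q.sort ? filtered.slice().sort(comparators[q.sort]) : filtered;
}

// Usage with the two voices from the example response.
const voices: VoiceFile[] = [
  { id: "voice-uuid-1", filename: "john_voice", created_at: "2026-01-20T10:30:00Z", size: 1234567,
    metadata: { language: "zh", quality: "standard", duration: 12.5 } },
  { id: "voice-uuid-2", filename: "mary_voice", created_at: "2026-01-21T14:20:00Z", size: 2345678,
    metadata: { language: "en", quality: "high", duration: 18.3 } },
];
console.log(queryVoices(voices, { sort: "newest" }).map((v) => v.filename)); // [ 'mary_voice', 'john_voice' ]
```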
+
+ **Bulk Operations**:
+ - Select multiple voices (Shift+Click)
+ - Export selected voices as ZIP
+ - Delete selected voices
+ - Tag selected voices
+
+ **Voice Details Panel**:
+ Click on any voice card to view:
+ - Full training parameters
+ - Training history and logs
+ - Model file sizes
+ - Sample audio clips
+ - Export and sharing options
+
+ **Organization Tips**:
+ - Use descriptive names (e.g., "John_Professional", "Mary_Casual")
+ - Tag voices by project or use case
+ - Export important voices as backups
+ - Delete test voices to save space
+
+ ---
+
+ ## API Reference
+
+ ### Quick Mode Endpoints
+
+ #### Tasks
+
+ **Create Task** - Start a one-click voice training task
+ ```http
+ POST /api/v1/tasks
+ Content-Type: application/json
+
+ {
+   "exp_name": "string",
+   "audio_file_id": "uuid",
+   "options": {
+     "version": "v2",
+     "language": "zh|en|ja",
+     "quality": "fast|standard|high"
+   }
+ }
+ ```
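A client can assemble and sanity-check the Create Task body before sending it. This is a sketch only. It assumes the enum values shown in the schema above are the complete set and that `version` is always `"v2"`. `buildCreateTaskRequest` is a hypothetical name, and the returned object is shaped so it can be passed to `fetch(url, init)`.

```typescript
type Language = "zh" | "en" | "ja";
type Quality = "fast" | "standard" | "high";

interface CreateTaskBody {
  exp_name: string;
  audio_file_id: string;
  options: { version: "v2"; language: Language; quality: Quality };
}

interface JsonRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const LANGUAGES: string[] = ["zh", "en", "ja"];
const QUALITIES: string[] = ["fast", "standard", "high"];

// Hypothetical helper: validate inputs against the documented enums and build
// a request for POST /api/v1/tasks that can be passed to fetch(url, init).
function buildCreateTaskRequest(
  baseUrl: string,
  expName: string,
  audioFileId: string,
  language: string,
  quality: string
): JsonRequest {
  if (expName.trim() === "") throw new Error("exp_name must not be empty");
  if (LANGUAGES.indexOf(language) === -1) throw new Error(`unsupported language: ${language}`);
  if (QUALITIES.indexOf(quality) === -1) throw new Error(`unsupported quality: ${quality}`);
  const body: CreateTaskBody = {
    exp_name: expName,
    audio_file_id: audioFileId,
    options: { version: "v2", language: language as Language, quality: quality as Quality },
  };
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/tasks`, // strip trailing slashes to avoid "//"
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage: "file-123" is a placeholder for the id returned by POST /api/v1/files.
const req = buildCreateTaskRequest("http://localhost:8000/", "john_voice", "file-123", "en", "standard");
console.log(req.url); // http://localhost:8000/api/v1/tasks
// const res = await fetch(req.url, req.init);
```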
+
+ **List Tasks** - Get all tasks
+ ```http
+ GET /api/v1/tasks?status=queued|running|completed|failed
+ ```
+
+ **Get Task** - Get specific task details
+ ```http
+ GET /api/v1/tasks/{task_id}
+ ```
+
+ **Cancel Task** - Cancel a running task
+ ```http
+ DELETE /api/v1/tasks/{task_id}
+ ```
+
+ **Task Progress** - Real-time progress via SSE
+ ```http
+ GET /api/v1/tasks/{task_id}/progress
+ Accept: text/event-stream
+ ```
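Both progress endpoints stream Server-Sent Events. This guide does not document the payload of each event. The sketch below therefore implements only the generic SSE framing from the HTML Living Standard: `event:`, `data:` and `id:` fields, events terminated by a blank line, and `:` comment lines ignored. It leaves the `data` string for you to `JSON.parse` if it turns out to be JSON. `parseSseEvents` is a hypothetical helper. In an Electron renderer or browser, the built-in `EventSource` class performs this parsing for you.

```typescript
interface SseEvent {
  event: string; // "message" when the server sends no event: field
  data: string; // multiple data: lines are joined with "\n"
  id?: string; // last event id seen so far, if any
}

// Hypothetical helper: split a text/event-stream payload into events using
// the standard SSE framing rules. The retry: field is ignored, and a final
// event without a terminating blank line is discarded, as the spec requires.
function parseSseEvents(stream: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventType = "";
  let dataLines: string[] = [];
  let lastId: string | undefined;

  for (const line of stream.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // A blank line dispatches the pending event if it carried any data.
      if (dataLines.length > 0) {
        events.push({ event: eventType || "message", data: dataLines.join("\n"), id: lastId });
      }
      eventType = "";
      dataLines = [];
      continue;
    }
    if (line.charAt(0) === ":") continue; // comment or keep-alive line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1); // drop one leading space
    if (field === "event") eventType = value;
    else if (field === "data") dataLines.push(value);
    else if (field === "id") lastId = value;
  }
  return events;
}

// Usage with a made-up payload; the real event names and data format of the
// progress endpoints are not documented here.
const payload = 'event: progress\ndata: {"percent": 40}\n\n: keep-alive\n\n';
for (const e of parseSseEvents(payload)) {
  console.log(e.event, JSON.parse(e.data));
}
```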
+
+ ### Advanced Mode Endpoints
+
+ #### Experiments
+
+ **Create Experiment** - Initialize a new training experiment
+ ```http
+ POST /api/v1/experiments
+ Content-Type: application/json
+
+ {
+   "exp_name": "string",
+   "version": "v2",
+   "audio_file_id": "uuid"
+ }
+ ```
+
+ **Get Experiment** - Get experiment details
+ ```http
+ GET /api/v1/experiments/{exp_id}
+ ```
+
+ **List Experiments** - Get all experiments
+ ```http
+ GET /api/v1/experiments?status=pending|running|completed
+ ```
+
+ **Delete Experiment** - Delete experiment and all data
+ ```http
+ DELETE /api/v1/experiments/{exp_id}
+ ```
+
+ #### Stages
+
+ **Execute Stage** - Run a specific pipeline stage
+ ```http
+ POST /api/v1/experiments/{exp_id}/stages/{stage_type}
+ Content-Type: application/json
+
+ {
+   // Stage-specific parameters
+ }
+ ```
+
+ **Stage Types**:
+ - `audio_slice`
+ - `asr`
+ - `text_feature`
+ - `hubert_feature`
+ - `semantic_token`
+ - `sovits_train`
+ - `gpt_train`
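When you drive Advanced Mode from a script, a common pattern is to resume the pipeline from the first stage that has not finished. The sketch below rests on two assumptions:

- The stage types run in the order listed above.
- A finished stage reports the status string `"completed"`. The exact status values returned by **Get All Stage Statuses** are not confirmed here.

`nextStage` and `stageUrl` are hypothetical helpers.

```typescript
// Pipeline order assumed from the Stage Types list above.
const STAGE_ORDER = [
  "audio_slice",
  "asr",
  "text_feature",
  "hubert_feature",
  "semantic_token",
  "sovits_train",
  "gpt_train",
] as const;

type StageType = (typeof STAGE_ORDER)[number];

// Hypothetical helper: return the first stage whose status is not "completed",
// or null once every stage has finished.
function nextStage(statuses: Partial<Record<StageType, string>>): StageType | null {
  for (const stage of STAGE_ORDER) {
    if (statuses[stage] !== "completed") return stage;
  }
  return null;
}

// Build the Execute Stage URL for one stage of an experiment.
function stageUrl(baseUrl: string, expId: string, stage: StageType): string {
  return `${baseUrl}/api/v1/experiments/${encodeURIComponent(expId)}/stages/${stage}`;
}

// Usage: "exp-123" is a placeholder experiment id.
const pending = nextStage({ audio_slice: "completed", asr: "completed", text_feature: "running" });
if (pending !== null) {
  console.log(`POST ${stageUrl("http://localhost:8000", "exp-123", pending)}`);
}
```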
+
+ **Get Stage Status** - Get status of a specific stage
+ ```http
+ GET /api/v1/experiments/{exp_id}/stages/{stage_type}
+ ```
+
+ **Get All Stage Statuses** - Get status of all stages
+ ```http
+ GET /api/v1/experiments/{exp_id}/stages
+ ```
+
+ **Stage Progress** - Real-time stage progress via SSE
+ ```http
+ GET /api/v1/experiments/{exp_id}/stages/{stage_type}/progress
+ Accept: text/event-stream
+ ```
+
+ **Get Stage Schema** - Get the parameter schema for a stage
+ ```http
+ GET /api/v1/stages/{stage_type}/schema
+ ```
+
+ ### Common Endpoints
+
+ #### Files
+
+ **Upload File** - Upload an audio or data file
+ ```http
+ POST /api/v1/files
+ Content-Type: multipart/form-data
+
+ file: binary
+ purpose: training|inference
+ ```
+
+ **List Files** - Get all uploaded files
+ ```http
+ GET /api/v1/files?purpose=training|inference
+ ```
+
+ **Get File** - Download a specific file
+ ```http
+ GET /api/v1/files/{file_id}
+ ```
+
+ **Delete File** - Delete a file
+ ```http
+ DELETE /api/v1/files/{file_id}
+ ```
+
+ #### Inference
+
+ **Text-to-Speech** - Generate speech from text
+ ```http
+ POST /api/v1/inference/tts
+ Content-Type: application/json
+
+ {
+   "text": "string",
+   "voice_id": "uuid",
+   "speed": 1.0,
+   "emotion": "auto|neutral|happy|sad",
+   "seed": 42
+ }
+ ```
+
+ **Get Voice Info** - Get voice model information
+ ```http
+ GET /api/v1/voices/{voice_id}
+ ```
+
+ #### Configuration
+
+ **Get Stage Presets** - Get preset configurations for stages
+ ```http
+ GET /api/v1/stages/presets
+ ```
+
+ **Health Check** - Check API server health
+ ```http
+ GET /health
+ ```
+
+ **The full OpenAPI specification is available at**: http://localhost:8000/openapi.json
+
+ ---
+
+ ## Troubleshooting
+
+ ### Backend Issues
+
+ #### Port Already in Use
+
+ **Symptom**: The error message `Address already in use` appears when starting the server.
+
+ **Solution 1** - Change the port in `.env`:
+ ```bash
+ echo "API_PORT=8001" >> .env
+ python app/main.py
+ ```
+
+ **Solution 2** - Find and kill the process using the port:
+ ```bash
+ # macOS/Linux
+ lsof -ti:8000 | xargs kill -9
+
+ # Windows
+ netstat -ano | findstr :8000
+ taskkill /PID <pid> /F
+ ```
+
+ #### Database Errors
+
+ **Symptom**: `sqlite3.OperationalError` or database corruption messages.
+
+ **Solution** - Reset the database:
+ ```bash
+ # Back up the existing database (optional)
+ cp ~/.moyoyo-tts/data/tasks.db ~/.moyoyo-tts/data/tasks.db.backup
+
+ # Remove the corrupted database
+ rm ~/.moyoyo-tts/data/tasks.db
+
+ # Restart the API server (the database will be recreated)
+ python app/main.py
+ ```
+
+ #### Training Fails Immediately
+
+ **Symptom**: Training starts but fails within seconds.
+
+ **Diagnosis**:
+ ```bash
+ # Check GPU availability
+ python -c "import torch; print(torch.cuda.is_available())"
+
+ # Check the CUDA version PyTorch was built with
+ python -c "import torch; print(torch.version.cuda)"
+
+ # Check disk space
+ df -h
+ ```
+
+ **Solutions**:
+ 1. **No GPU**: The system falls back to the CPU (slower, but it works)
+ 2. **CUDA mismatch**: Reinstall PyTorch for the CUDA version installed on your system
+ 3. **Out of disk space**: Free up at least 10 GB
+ 4. **Out of memory**: Reduce `batch_size` in the training parameters
+
+ #### Python Environment Issues
+
+ **Symptom**: `ModuleNotFoundError` or other import errors.
+
+ **Solution**:
+ ```bash
+ # Verify the environment is activated
+ which python # Should print a path inside .venv
+
+ # Reinstall all dependencies
+ uv sync --reinstall
+
+ # Or rebuild the environment from scratch
+ rm -rf .venv
+ uv sync
+
+ # Check for missing packages
+ uv pip list
+ ```
+
+ ### Frontend Issues
+
+ #### Cannot Connect to API
+
+ **Symptom**: The frontend shows a "Cannot connect to server" error.
+
+ **Diagnosis**:
+ ```bash
+ # Check whether the backend is running
+ curl http://localhost:8000/health
+
+ # Check network connectivity (on Windows, use -n instead of -c)
+ ping -c 3 localhost
+ ```
+
+ **Solutions**:
+ 1. **Backend not running**: Start the backend server (see section 5.1)
+ 2. **Wrong port**: Make sure the backend is listening on port 8000, or on the `API_PORT` set in `.env`
+ 3. **Firewall**: Allow connections to localhost:8000
+ 4. **CORS error**: Check the CORS settings in the backend `.env`
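A frequent cause of the "Wrong port" case is a client that hard-codes `8000` after `API_PORT` was changed, for example by Solution 1 under **Port Already in Use**. The sketch below derives the client URL from the same variables the backend reads. It assumes the client can see `API_HOST` and `API_PORT`; how the Electron app actually loads its settings is not specified here. It maps the bind-all address `0.0.0.0` to `localhost`, because `0.0.0.0` is a listening address rather than a destination to connect to. `resolveApiBaseUrl` is a hypothetical helper.

```typescript
// Hypothetical helper: derive the URL a client should use to reach the API
// from the backend's API_HOST / API_PORT settings (defaults 0.0.0.0 and 8000).
function resolveApiBaseUrl(env: Record<string, string | undefined>): string {
  const bindHost = env.API_HOST ?? "0.0.0.0";
  // "0.0.0.0" (or "::" for IPv6) means "listen on every interface";
  // it is not a destination, so connect through loopback instead.
  const host = bindHost === "0.0.0.0" || bindHost === "::" ? "localhost" : bindHost;
  const port = Number(env.API_PORT ?? "8000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid API_PORT: ${env.API_PORT}`);
  }
  // IPv6 literals must be wrapped in brackets inside a URL.
  const hostPart = host.indexOf(":") !== -1 ? `[${host}]` : host;
  return `http://${hostPart}:${port}`;
}

console.log(resolveApiBaseUrl({ API_PORT: "8001" })); // http://localhost:8001
```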
+
+ #### Models Not Downloading
+
+ **Symptom**: Model download fails or hangs indefinitely.
+
+ **Solutions**:
+ 1. **Check internet connection**:
+ ```bash
+ curl -I https://www.modelscope.cn
+ ```
+
+ 2. **Check disk space**:
+ ```bash
+ df -h # About 10 GB should be free
+ ```
+
+ 3. **Manual download**: See section 3.3 for manual installation
+
+ 4. **Proxy issues**: Configure proxy settings:
+ ```bash
+ export http_proxy=http://proxy.example.com:8080
+ export https_proxy=http://proxy.example.com:8080
+ ```
+
+ #### Electron App Won't Start
+
+ **Symptom**: The app crashes on launch or shows a blank screen.
+
+ **Solution 1** - Clear caches and reinstall:
+ ```bash
+ # Navigate to the frontend directory
+ cd tts-voice-app
+
+ # Remove installed dependencies and build caches
+ rm -rf node_modules package-lock.json dist .vite
+
+ # Reinstall dependencies
+ npm install
+
+ # Start the app again in development mode
+ npm run dev
+ ```
+
+ **Solution 2** - Check the Node.js version:
+ ```bash
+ node --version # Should be >= 18.x
+
+ # Update Node.js if needed
+ nvm install 18
+ nvm use 18
+ ```
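To make a script fail early on an unsupported Node.js runtime, you can sketch the check as follows. It assumes the `>= 18` floor stated above and parses strings in the `v18.19.0` format that `node --version` prints. `meetsNodeRequirement` is a hypothetical helper. Inside Node.js you would pass it `process.version`.

```typescript
// Hypothetical helper: true when a "vMAJOR.MINOR.PATCH" string meets the
// minimum major version (18, per the requirement above).
function meetsNodeRequirement(version: string, minMajor = 18): boolean {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version.trim());
  if (match === null) return false; // unrecognized format: treat as unsupported
  return Number(match[1]) >= minMajor;
}

console.log(meetsNodeRequirement("v18.19.0")); // true
console.log(meetsNodeRequirement("v16.20.2")); // false
```

Declaring `"engines": { "node": ">=18" }` in `package.json` is the conventional, declarative alternative. npm warns on a mismatch, or refuses to install when `engine-strict` is enabled.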
+
+ **Solution 3** - Check the Electron logs:
+ ```bash
+ # macOS
+ ls ~/Library/Logs/tts-voice-app/
+
+ # Linux
+ ls ~/.config/tts-voice-app/logs/
+
+ # Windows (Command Prompt)
+ dir %APPDATA%\tts-voice-app\logs\
+ ```
+
+ ### Common Errors
+
+ #### "PYTHONPATH not set" Error
+
+ **Symptom**: Import errors related to the `GPT_SoVITS` module.
+
+ **Cause**: The API server needs to find the main project directory.
+
+ **Solution**: The API sets `PYTHONPATH` automatically, but you can verify it:
+ ```bash
+ # Check the project structure
+ ls GPT-SoVITS/ # Should contain *.py files
+
+ # Set it manually if needed (replace /path/to with the location of your checkout)
+ export PYTHONPATH=/path/to/GPT-SoVITS:$PYTHONPATH
+ ```
+
+ #### "Model not found" Error
+
+ **Symptom**: Training fails with a "Cannot find pretrained model" message.
+
+ **Diagnosis**:
+ ```bash
+ # Check whether the models exist
+ ls GPT_SoVITS/pretrained_models/
+ # Should show: s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt, s2G488k.pth, s2D488k.pth
+ ```
+
+ **Solution**: Download the pretrained models (see section 3.3):
+ ```bash
+ wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/pretrained_models.zip
+ unzip -q -o pretrained_models.zip -d GPT_SoVITS
+ ```
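You can automate the **Diagnosis** step above by comparing a directory listing with the expected file names. This sketch uses the three names from the diagnosis comment as the expected set; your GPT-SoVITS version may ship different or additional files. Reading the directory is left to the caller, for example with Node's `fs.readdirSync("GPT_SoVITS/pretrained_models")`. `findMissingModels` is a hypothetical helper.

```typescript
// Expected file names, taken from the diagnosis comment above.
const REQUIRED_PRETRAINED = [
  "s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt",
  "s2G488k.pth",
  "s2D488k.pth",
];

// Hypothetical helper: list required model files absent from a directory listing.
function findMissingModels(present: string[], required: string[] = REQUIRED_PRETRAINED): string[] {
  const have = new Set(present);
  return required.filter((name) => !have.has(name));
}

const missing = findMissingModels(["s2G488k.pth", "README.md"]);
if (missing.length > 0) {
  console.log(`Missing pretrained models: ${missing.join(", ")}`);
}
```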
+
+ #### "Out of memory" Error
+
+ **Symptom**: Training crashes with `CUDA out of memory` or `MemoryError`.
+
+ **Solutions**:
+ 1. **Reduce batch size** (for example, from 4 to 2):
+ ```json
+ {
+   "batch_size": 2
+ }
+ ```
+
+ 2. **Close other applications**: Free up GPU memory and RAM
+
+ 3. **Use CPU mode**: Slower, but uses system RAM instead of GPU memory:
+ ```bash
+ # Hide all GPUs from PyTorch
+ export CUDA_VISIBLE_DEVICES=""
+ python app/main.py
+ ```
+
+ 4. **Increase system swap** (Linux):
+ ```bash
+ sudo dd if=/dev/zero of=/swapfile bs=1G count=8
+ sudo chmod 600 /swapfile
+ sudo mkswap /swapfile
+ sudo swapon /swapfile
+ ```
+
+ #### "NLTK Data Not Found" Error
+
+ **Symptom**: Text processing fails with NLTK data errors.
+
+ **Solution**: Download the NLTK data (see section 3.3):
+ ```bash
+ wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip
+ unzip -q -o nltk_data.zip -d .venv/
+ ```
+
+ #### Audio Quality Issues
+
+ **Symptom**: Generated audio sounds robotic, distorted, or unclear.
+
+ **Solutions**:
+ 1. **Use better training data**:
+    - High-quality audio (48 kHz WAV preferred)
+    - Clear voice, minimal background noise
+    - 10-15 seconds of audio
+    - Natural, conversational speech
+
+ 2. **Increase training quality** (use `high` instead of `standard`):
+ ```json
+ {
+   "quality": "high"
+ }
+ ```
+
+ 3. **Train longer** (for example, increase `total_epoch` from 8 to 16):
+ ```json
+ {
+   "total_epoch": 16
+ }
+ ```
+
+ 4. **Check the reference audio**: Make sure the uploaded audio is not corrupted
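You can turn the training-data recommendations in item 1 into a quick pre-upload check. This is a sketch under two assumptions:

- You already know the clip's format, sample rate, and duration. Extracting them needs an audio library, which is not shown.
- The numbers above (48 kHz WAV preferred, 10-15 seconds) are soft warnings, not hard limits.

`checkTrainingClip` is a hypothetical name, and noise and speaking style still need a listen.

```typescript
interface ClipInfo {
  format: string; // container/extension, e.g. "wav" or "mp3"
  sampleRateHz: number;
  durationSec: number;
}

// Hypothetical pre-upload check based on the training-data recommendations
// above. Returns warnings; an empty array means nothing was flagged.
function checkTrainingClip(clip: ClipInfo): string[] {
  const warnings: string[] = [];
  if (clip.format.toLowerCase() !== "wav") {
    warnings.push(`format is ${clip.format}; WAV is preferred`);
  }
  if (clip.sampleRateHz < 48000) {
    warnings.push(`sample rate is ${clip.sampleRateHz} Hz; 48 kHz is preferred`);
  }
  if (clip.durationSec < 10) {
    warnings.push(`clip is ${clip.durationSec} s; 10-15 s is recommended`);
  }
  return warnings;
}

console.log(checkTrainingClip({ format: "mp3", sampleRateHz: 44100, durationSec: 8 }));
```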
+
+ ---
+
+ ## Development
+
+ ### Backend Development
+
+ #### Running with Hot-Reload
+
+ Hot-reload automatically restarts the server when code changes are detected:
+
+ ```bash
+ # Using uvicorn (run from the api_server/ directory)
+ uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
+
+ # Watch only the application package for changes
+ uvicorn app.main:app --reload --reload-dir app
+ ```
+
+ #### Running Tests
+
+ ```bash
+ # Navigate to the project root
+ cd GPT-SoVITS
+
+ # Run all tests
+ pytest api_server/tests/
+
+ # Run a specific test file
+ pytest api_server/tests/test_tasks.py
+
+ # Run with a coverage report
+ pytest --cov=api_server/app --cov-report=html
+
+ # View the coverage report (macOS; use xdg-open on Linux)
+ open htmlcov/index.html
+ ```
+
+ #### Code Formatting
+
+ ```bash
+ # Format Python code with Black
+ black api_server/
+
+ # Sort imports with isort
+ isort api_server/
+
+ # Lint with flake8
+ flake8 api_server/
+
+ # Type checking with mypy
+ mypy api_server/
+ ```
+
+ #### Database Migrations
+
+ ```bash
+ # Generate a migration
+ alembic revision --autogenerate -m "Add new column"
+
+ # Apply migrations
+ alembic upgrade head
+
+ # Roll back the last migration
+ alembic downgrade -1
+ ```
+
+ #### Adding New Endpoints
+
+ 1. Create route in `api_server/app/routes/`
+ 2. Add business logic in `api_server/app/services/`
+ 3. Update models in `api_server/app/models/`
+ 4. Add tests in `api_server/tests/`
+ 5. Update OpenAPI documentation
+
+ ### Frontend Development
+
+ #### Development Mode
+
+ Development mode enables hot module replacement (HMR) for instant feedback:
+
+ ```bash
+ # Start development server
+ npm run dev
+
+ # Start with custom port
+ npm run dev -- --port 5174
+
+ # Start with debug logging (quote the pattern so the shell does not expand it)
+ DEBUG='electron*' npm run dev
+ ```
+
+ #### Type Checking
+
+ ```bash
+ # Run Vue type checking
+ npm run type-check
+
+ # Run TypeScript compiler check
+ npx tsc --noEmit
+
+ # Watch mode for continuous checking
+ npm run type-check -- --watch
+ ```
+
+ #### Building for Production
+
+ **Development Build** (with source maps):
+ ```bash
+ npm run build
+ ```
+
+ **Production Build** (optimized):
+ ```bash
+ npm run build:prod
+ ```
+
+ **Preview Production Build**:
+ ```bash
+ npm run preview
+ ```
+
+ #### Building Distribution Packages
+
+ Build platform-specific installers:
+
+ **macOS**:
+ ```bash
+ npm run build:mac
+ # Output: tts-voice-app/release/MoYoYo-TTS-1.0.0.dmg
+ ```
+
+ **Windows**:
+ ```bash
+ npm run build:win
+ # Output: tts-voice-app/release/MoYoYo-TTS-Setup-1.0.0.exe
+ ```
+
+ **Linux**:
+ ```bash
+ npm run build:linux
+ # Output: tts-voice-app/release/moyoyo-tts-1.0.0.AppImage
+ ```
+
+ **Build All Platforms** (requires platform-specific dependencies):
+ ```bash
+ npm run build:all
+ ```
+
+ **Build Configuration**:
+ Edit `tts-voice-app/electron-builder.yml` to customize:
+ - App name and ID
+ - Icon files
+ - File associations
+ - Auto-update settings
+ - Code signing
+
+ #### Component Development
+
+ **Create New Component**:
+ ```bash
+ # Navigate to components directory
+ cd tts-voice-app/src/components
+
+ # Create component file
+ touch MyComponent.vue
+ ```
+
+ **Component Template**:
+ ```vue
+ <template>
+   <div class="my-component">
+     <!-- Template here -->
+   </div>
+ </template>
+
+ <script setup lang="ts">
+ import { ref } from 'vue'
+
+ // Component logic here
+ const myValue = ref('')
+ </script>
+
+ <style scoped>
+ .my-component {
+   /* Styles here */
+ }
+ </style>
+ ```
+
+ #### State Management
+
+ The app uses Vue Composition API with Pinia stores:
+
+ ```typescript
+ // Create new store in src/stores/myStore.ts
+ import { defineStore } from 'pinia'
+
+ export const useMyStore = defineStore('myStore', {
+   state: () => ({
+     // Annotate the element type; a bare [] would be inferred as never[]
+     items: [] as string[]
+   }),
+   getters: {
+     itemCount: (state) => state.items.length
+   },
+   actions: {
+     addItem(item: string) {
+       this.items.push(item)
+     }
+   }
+ })
+ ```
+
+ #### Debugging
+
+ **Vue DevTools**:
+ - Automatically enabled in development mode
+ - Access via browser DevTools panel
+
+ **Electron DevTools**:
+ ```bash
+ # Open DevTools on startup
+ DEBUG_ELECTRON=true npm run dev
+ ```
+
+ **Console Logging**:
+ ```typescript
+ // Any value you want to inspect
+ const data = { status: 'ok' }
+
+ // Main process logs (appear in the terminal running npm run dev)
+ console.log('Main:', data)
+
+ // Renderer process logs (appear in the DevTools console)
+ console.log('Renderer:', data)
+ ```
+
+ #### Testing
+
+ ```bash
+ # Run unit tests
+ npm run test
+
+ # Run with coverage
+ npm run test:coverage
+
+ # Run E2E tests
+ npm run test:e2e
+
+ # Watch mode
+ npm run test:watch
+ ```
+
+ ### Project Structure
+
+ ```
+ GPT-SoVITS/
+ ├── api_server/ # Backend API
+ │ ├── app/
+ │ │ ├── main.py # FastAPI application
+ │ │ ├── routes/ # API endpoints
+ │ │ ├── services/ # Business logic
+ │ │ ├── models/ # Data models
+ │ │ └── utils/ # Utilities
+ │ └── tests/ # Backend tests
+ ├── tts-voice-app/ # Frontend Electron app
+ │ ├── src/
+ │ │ ├── main/ # Electron main process
+ │ │ ├── renderer/ # Vue UI
+ │ │ ├── components/ # Vue components
+ │ │ └── stores/ # State management
+ │ └── dist/ # Build output
+ ├── GPT_SoVITS/ # Core ML models
+ │ ├── pretrained_models/ # Base models
+ │ └── text/ # Text processing
+ └── .env # Configuration
+ ```
+
+ ### Contribution Guidelines
+
+ 1. **Fork and clone the repository**
+ 2. **Create feature branch**: `git checkout -b feature/my-feature`
+ 3. **Make changes** and add tests
+ 4. **Run tests and linting**: `pytest && black . && isort .`
+ 5. **Commit changes**: `git commit -m "feat: add my feature"`
+ 6. **Push to branch**: `git push origin feature/my-feature`
+ 7. **Create Pull Request** with description
+
+ **Commit Message Format**:
+ - `feat:` New feature
+ - `fix:` Bug fix
+ - `docs:` Documentation changes
+ - `style:` Code style changes
+ - `refactor:` Code refactoring
+ - `test:` Test changes
+ - `chore:` Build/tooling changes
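To check messages locally, for example from a `commit-msg` hook, you can express the format above as a regular expression. This sketch accepts only the seven listed types and requires a space and a non-empty description after the colon. It checks only the first line of the message. `isValidCommitSubject` is a hypothetical helper.

```typescript
const COMMIT_TYPES = ["feat", "fix", "docs", "style", "refactor", "test", "chore"];

// Matches "<type>: <description>" where <type> is one of the listed prefixes.
const COMMIT_SUBJECT = new RegExp(`^(${COMMIT_TYPES.join("|")}): \\S.*$`);

// Hypothetical validator for the first line (subject) of a commit message.
function isValidCommitSubject(message: string): boolean {
  const subject = message.split("\n", 1)[0];
  return COMMIT_SUBJECT.test(subject);
}

console.log(isValidCommitSubject("feat: add my feature")); // true
console.log(isValidCommitSubject("feature: add my feature")); // false
```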
+
+ ---
+
+ ## Additional Resources
+
+ ### Documentation
+
+ - **API Documentation**: http://localhost:8000/docs
+ - **Design Document**: `frontend_design.md`
+ - **Development Guide**: `development.md`
+ - **OpenAPI Specification**: `openapi.json`
+
+ ### External Links
+
+ - **GPT-SoVITS Repository**: https://github.com/RVC-Boss/GPT-SoVITS
+ - **ModelScope Models**: https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained
+ - **FastAPI Documentation**: https://fastapi.tiangolo.com
+ - **Vue 3 Documentation**: https://vuejs.org
+ - **Electron Documentation**: https://www.electronjs.org
+
+ ### Support
+
+ For issues, questions, or feature requests:
+ 1. Check this documentation first
+ 2. Search existing GitHub issues
+ 3. Create a new issue with detailed description
+ 4. Include error messages, logs, and system info
+
+ ### License
+
+ This project is licensed under the MIT License. See `LICENSE` file for details.
+
+ ---
+
+ **Last Updated**: 2026-01-23
+ **Version**: 1.0.0
+ **Maintainers**: MoYoYo.tts Development Team
USAGE_CN.md ADDED
@@ -0,0 +1,1776 @@
+ # MoYoYo.tts User Manual
+
+ ## Table of Contents
+
+ - [Introduction](#introduction)
+ - [Quick Start](#quick-start)
+ - [System Requirements](#system-requirements)
+ - [Installation Guide](#installation-guide)
+   - [Install the uv Package Manager](#31-install-the-uv-package-manager)
+   - [Python Environment Setup](#32-python-environment-setup)
+   - [Download Required Data Files](#33-download-required-data-files)
+   - [Frontend Setup](#34-frontend-setup)
+ - [Configuration](#configuration)
+   - [Backend API Configuration](#41-backend-api-configuration)
+   - [Frontend Configuration](#42-frontend-configuration)
+ - [Running the Application](#running-the-application)
+   - [Start the Backend API Server](#51-start-the-backend-api-server)
+   - [Start the Frontend Electron App](#52-start-the-frontend-electron-app)
+ - [Usage Guide](#usage-guide)
+   - [First-Time Setup](#61-first-time-setup)
+   - [Quick Mode - Voice Cloning for Beginners](#62-quick-mode---voice-cloning-for-beginners)
+   - [Advanced Mode - Voice Cloning for Experts](#63-advanced-mode---voice-cloning-for-experts)
+   - [Text-to-Speech Generation](#64-text-to-speech-generation)
+   - [Voice Library Management](#65-voice-library-management)
+ - [API Reference](#api-reference)
+ - [Troubleshooting](#troubleshooting)
+ - [Development](#development)
+
+ ---
+
+ ## Introduction
+
+ MoYoYo.tts is a comprehensive voice cloning and text-to-speech system that combines:
+
+ - **Backend API**: A FastAPI-based REST API for voice training and inference
+ - **Frontend app**: An Electron + Vue desktop application with an intuitive user interface
+
+ The system is built on GPT-SoVITS and can produce high-quality voice clones from minimal training data (as little as 5 seconds of audio).
+
+ **Target users**:
+ - End users who want to create custom text-to-speech voices
+ - Developers integrating speech synthesis into their applications
+ - Researchers working on voice cloning technology
+
+ **Key features**:
+ - Quick Mode: One-click voice cloning for beginners
+ - Advanced Mode: Fine-grained control over the training pipeline
+ - Real-time progress tracking via Server-Sent Events (SSE)
+ - Multilingual support (Chinese, English, Japanese)
+ - GPU acceleration with CUDA
+
+ ---
+
+ ## Quick Start
+
+ Get up and running in about 5 minutes with these basic steps:
+
+ **1. Install uv** (Python package manager):
+ ```bash
+ # macOS/Linux
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+
+ # Windows
+ powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
+ ```
+
+ **2. Set up the Python environment**:
+ ```bash
+ cd GPT-SoVITS
+ uv sync # Creates .venv and installs all dependencies
+ source .venv/bin/activate # macOS/Linux
+ # Or: .venv\Scripts\activate # Windows
+ ```
+
+ **3. Download the required models** (see section 3.3 for details):
+ ```bash
+ # Download and extract the NLTK data, pretrained models, etc.
+ # or let the frontend app download them automatically
+ ```
+
+ **4. Start the backend API**:
+ ```bash
+ cd api_server
+ python app/main.py
+ # The API runs at http://localhost:8000
+ ```
+
+ **5. Start the frontend app** (in a new terminal):
+ ```bash
+ cd tts-voice-app
+ npm install
+ npm run dev
+ ```
+
+ That's it! The Electron app will guide you through model setup and voice cloning.
+
+ For detailed installation instructions, platform-specific notes, and configuration options, read on.
+
+ ---
+
+ ## System Requirements
+
+ ### Software Requirements
+
+ | Component | Version | Notes |
+ |-----------|---------|-------|
+ | **Python** | 3.10 - 3.12 | Python 3.11 recommended |
+ | **Node.js** | >= 18.x | For frontend development |
+ | **uv** | Latest | Python package manager |
+ | **CUDA** | 12.6 or 12.8 | Optional, for GPU acceleration |
+
+ ### Hardware Requirements
+
+ | Component | Minimum | Recommended |
+ |-----------|---------|-------------|
+ | **CPU** | Dual-core | Quad-core or better |
+ | **RAM** | 16 GB | 32 GB (for training) |
+ | **GPU** | None (CPU mode) | NVIDIA GPU with 6 GB+ VRAM |
+ | **Storage** | 20 GB free | 50 GB+ (for multiple voices) |
+
+ **GPU notes**:
+ - A GPU is optional but speeds up training significantly (5-10x)
+ - An NVIDIA GPU with CUDA 12.6 or 12.8 support is recommended
+ - AMD GPUs and Apple Silicon are currently not supported for training
+
+ ---
+
+ ## Installation Guide
+
+ ### 3.1 Install the uv Package Manager
+
+ uv is a fast Python package installer and resolver that can replace pip.
+
+ **macOS / Linux**:
+ ```bash
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+ ```
+
+ **Windows** (PowerShell):
+ ```powershell
+ powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
+ ```
+
+ Verify the installation:
+ ```bash
+ uv --version
+ ```
+
+ ### 3.2 Python Environment Setup
+
+ The project uses `uv` for dependency management, configured through `pyproject.toml`. Setup comes down to a single command.
+
+ **Step 1: Enter the project directory**
+ ```bash
+ cd GPT-SoVITS
+ ```
+
+ **Step 2: Sync all dependencies**
+ ```bash
+ # This command will:
+ # - Create a virtual environment (.venv)
+ # - Install Python 3.11 (or the version you specify)
+ # - Install all dependencies from pyproject.toml
+ # - Install the correct PyTorch build for your platform
+ uv sync
+ ```
+
+ **Step 3: Activate the environment**
+
+ macOS / Linux:
+ ```bash
+ source .venv/bin/activate
+ ```
+
+ Windows:
+ ```cmd
+ .venv\Scripts\activate
+ ```
+
+ You should see a `(.venv)` prefix in your terminal prompt.
+
+ **How platform-specific PyTorch installation works**:
+
+ `pyproject.toml` automatically selects the appropriate PyTorch build:
+ - **macOS**: Installs CPU-only PyTorch (Apple Silicon runs in CPU mode)
+ - **Linux**: Installs CUDA 12.6 PyTorch by default
+ - **Windows**: Requires choosing the CUDA version manually (see below)
+
+ **Windows users - choosing a CUDA version**:
+
+ On Windows, specify the PyTorch index explicitly:
+
+ **CUDA 12.6** (default):
+ ```bash
+ uv sync
+ ```
+
+ **CUDA 12.8**:
+ ```bash
+ uv sync --index pytorch-cu128
+ ```
+
+ **CPU only** (no GPU):
+ ```bash
+ uv sync --index pytorch-cpu
+ ```
+
+ **Verify the installation**:
+ ```bash
+ # Check the Python version
+ python --version # Should print Python 3.11.x
+
+ # Check the PyTorch installation
+ python -c "import torch; print(f'PyTorch: {torch.__version__}')"
+
+ # Check CUDA availability (if you have a GPU)
+ python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
+ ```
+
+ ### 3.3 Download Required Data Files
+
+ The following data files are required for text processing and voice training.
+
+ #### NLTK Data (Required for Text Processing)
+
+ NLTK (Natural Language Toolkit) data is used for text tokenization and processing.
+
+ ```bash
+ # Download from ModelScope
+ wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip
+
+ # Extract into the Python environment
+ unzip -q -o nltk_data.zip -d .venv/
+
+ # Clean up
+ rm nltk_data.zip
+ ```
+
+ **Size**: ~10 MB
+ **Time**: < 1 minute
+
+ #### Open JTalk Dictionary (Required for Japanese)
+
+ Open JTalk is required for Japanese text-to-speech processing.
+
+ ```bash
+ # Get the pyopenjtalk installation path
+ PYOPENJTALK_PATH=$(python -c "import os, pyopenjtalk; print(os.path.dirname(pyopenjtalk.__file__))")
+
+ # Download from ModelScope
+ wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/open_jtalk_dic_utf_8-1.11.tar.gz
+
+ # Extract into the pyopenjtalk directory
+ tar -xzf open_jtalk_dic_utf_8-1.11.tar.gz -C "$PYOPENJTALK_PATH"
+
+ # Clean up
+ rm open_jtalk_dic_utf_8-1.11.tar.gz
+ ```
+
+ **Size**: ~50 MB
+ **Time**: < 2 minutes
+
+ ### 3.4 Frontend Setup
+
+ The frontend is an Electron application built with Vue.js.
+
+ ```bash
+ # Enter the frontend directory
+ cd tts-voice-app
+
+ # Install Node.js dependencies
+ npm install
+ ```
+
+ **Time**: 2-5 minutes
+ **Note**: This installs all required Node.js packages, including Electron, Vue, and the UI components.
+
+ ---
+
+ ## Configuration
+
+ ### 4.1 Backend API Configuration
+
+ The backend is configured through environment variables. To customize settings, create a `.env` file in the project root.
+
+ **Create a `.env` file** (optional; the defaults work for local development):
+
+ ```bash
+ # Deployment mode
+ # Options: local, server
+ DEPLOYMENT_MODE=local
+
+ # API server settings
+ API_HOST=0.0.0.0
+ API_PORT=8000
+
+ # Data storage paths
+ DATA_DIR=~/.moyoyo-tts/data
+ SQLITE_PATH=~/.moyoyo-tts/data/tasks.db
+
+ # Training settings
+ LOCAL_MAX_WORKERS=1 # Number of concurrent training tasks
+ ```
+
+ **Configuration options**:
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `DEPLOYMENT_MODE` | `local` | Deployment environment (local/server) |
+ | `API_HOST` | `0.0.0.0` | API server bind address |
+ | `API_PORT` | `8000` | API server port |
+ | `DATA_DIR` | `~/.moyoyo-tts/data` | Data storage directory |
+ | `SQLITE_PATH` | `~/.moyoyo-tts/data/tasks.db` | SQLite database path |
+ | `LOCAL_MAX_WORKERS` | `1` | Maximum number of concurrent training tasks |
+
+ **Notes**:
+ - `API_HOST=0.0.0.0` accepts connections on all network interfaces
+ - `LOCAL_MAX_WORKERS=1` prevents out-of-memory problems on systems with limited RAM
+ - On high-end systems, increase `LOCAL_MAX_WORKERS` to train several voices at once
321
+
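Tooling that launches or inspects the backend (for example, the Electron main process) may want the same settings as a typed object. The following is a minimal TypeScript sketch, assuming the variables are read from a plain key/value map such as `process.env`; `resolveBackendConfig` and `BackendConfig` are hypothetical names, not part of the project. It applies the defaults from the table above, rejects non-integer ports and worker counts, and deliberately leaves `~` unexpanded, because how the backend expands it is not documented here. Call it as `resolveBackendConfig(process.env)`.

```typescript
// Hypothetical helper mirroring the configuration table above.
// It applies the documented defaults; it does NOT expand "~" in paths.
type DeploymentMode = "local" | "server";

interface BackendConfig {
  deploymentMode: DeploymentMode;
  apiHost: string;
  apiPort: number;
  dataDir: string;
  sqlitePath: string;
  localMaxWorkers: number;
}

type Env = { [name: string]: string | undefined };

// Parses a positive integer setting, falling back to the default when unset or empty.
function positiveInt(value: string | undefined, fallback: number, name: string): number {
  if (value === undefined || value.trim() === "") return fallback;
  const n = Number(value);
  if (!isFinite(n) || Math.floor(n) !== n || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${value}"`);
  }
  return n;
}

function resolveBackendConfig(env: Env): BackendConfig {
  const mode = env.DEPLOYMENT_MODE ?? "local";
  if (mode !== "local" && mode !== "server") {
    throw new Error(`DEPLOYMENT_MODE must be "local" or "server", got "${mode}"`);
  }
  return {
    deploymentMode: mode as DeploymentMode,
    apiHost: env.API_HOST ?? "0.0.0.0",
    apiPort: positiveInt(env.API_PORT, 8000, "API_PORT"),
    dataDir: env.DATA_DIR ?? "~/.moyoyo-tts/data",
    sqlitePath: env.SQLITE_PATH ?? "~/.moyoyo-tts/data/tasks.db",
    localMaxWorkers: positiveInt(env.LOCAL_MAX_WORKERS, 1, "LOCAL_MAX_WORKERS"),
  };
}
```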
### 4.2 Frontend Configuration

The frontend needs only minimal configuration for local development.

**Default settings**:
- **API endpoint**: `http://localhost:8000`
- **Voice storage**: `~/.moyoyo-tts/voices/`
- **Model storage**: `GPT_SoVITS/pretrained_models/`

**Automatic configuration**:
The Electron app will:
1. Automatically detect and connect to the local API server
2. Create the required directories on first launch
3. Download missing models through the Model Setup page

No manual configuration is needed for standard use.

---

## Running the Application

### 5.1 Start the Backend API Server

**Step 1: Activate the Python environment**

```bash
# Enter the project directory
cd GPT-SoVITS

# Activate the virtual environment
source .venv/bin/activate    # macOS/Linux
.venv\Scripts\activate       # Windows
```

**Step 2: Start the API server**

Method 1 - use the main script:
```bash
cd api_server
python app/main.py
```

Method 2 - run uvicorn directly (from the `api_server` directory, so that the `app.main` module can be imported):
```bash
cd api_server
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

**Expected output**:
```
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

**API documentation**:
Once the server is running, the interactive API documentation is available at:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
- **OpenAPI JSON**: http://localhost:8000/openapi.json

**Health check**:
```bash
curl http://localhost:8000/health
# Expected output: {"status": "healthy"}
```

### 5.2 Start the Frontend Electron App

**Step 1: Open a new terminal**

Keep the backend server running and open a new terminal window.

**Step 2: Enter the frontend directory**

```bash
cd tts-voice-app
```

**Step 3: Start development mode**

```bash
npm run dev
```

**Expected output**:
```
> tts-voice-app@1.0.0 dev
> electron-vite dev

VITE v4.x.x ready in xxx ms
➜ Local: http://localhost:5173/
➜ Network: use --host to expose

Electron app starting...
```

The Electron app launches automatically, with hot reload enabled in development mode.

**Development mode features**:
- Hot Module Replacement (HMR) for instant UI updates
- Vue DevTools integration
- Console logging for debugging
- Automatic restart when the main process changes

---

## Usage Guide

### 6.1 First-Time Setup

The first time you launch the Electron app, you need to download the required models.

**Setup flow**:

1. **Launch the Electron app**
   ```bash
   cd tts-voice-app
   npm run dev
   ```

2. **Model Setup page**
   - The app automatically detects missing models
   - You are redirected to the Model Setup page

3. **Download the models**
   - Click the "Download All Models" button
   - Models to download:
     - **Pretrained models**: 4.56 GB
     - **G2PW model**: 588.86 MB
     - **FunASR**: 1.09 GB
     - **Faster Whisper**: 2.85 GB
   - Total download size: about 9 GB

4. **Monitor progress**
   - A live progress bar shows the download status
   - Estimated time: 10-30 minutes (depending on connection speed)
   - Downloads can be paused and resumed

5. **Setup complete**
   - When all models have finished downloading, click "Continue"
   - You are redirected to the main TTS page
   - The app is now ready to use

**Troubleshooting**:
- If a download fails, check your internet connection
- Make sure you have about 10 GB of free disk space
- For manual installation, see Section 3.3

### 6.2 Quick Mode - Voice Cloning for Beginners

Quick Mode offers a simplified workflow for users who want to create a voice clone quickly, without any technical knowledge.

#### Using the API

**Step 1: Upload an audio file**

```bash
curl -X POST http://localhost:8000/api/v1/files \
  -F "file=@path/to/voice_sample.wav" \
  -F "purpose=training"
```

**Response**:
```json
{
  "file_id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "voice_sample.wav",
  "size": 1234567,
  "purpose": "training"
}
```

**Step 2: Create a training task**

```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "exp_name": "my_voice",
    "audio_file_id": "550e8400-e29b-41d4-a716-446655440000",
    "options": {
      "version": "v2",
      "language": "zh",
      "quality": "standard"
    }
  }'
```

**Response**:
```json
{
  "id": "task-uuid-here",
  "status": "queued",
  "exp_name": "my_voice",
  "created_at": "2026-01-23T10:30:00Z"
}
```

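The same request can be assembled in code, for instance from the Electron renderer. Below is a minimal TypeScript sketch; `buildCreateTaskRequest` is a hypothetical helper, and the allowed `language` and `quality` values are taken from the API reference in this document. It only builds the request, so any environment with `fetch` can send it with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.

```typescript
// Hypothetical helper: builds the Quick Mode "create task" request shown above.
type Language = "zh" | "en" | "ja";
type Quality = "fast" | "standard" | "high";

interface CreateTaskOptions {
  version: string;
  language: Language;
  quality: Quality;
}

interface JsonRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

const LANGUAGES: Language[] = ["zh", "en", "ja"];
const QUALITIES: Quality[] = ["fast", "standard", "high"];

function buildCreateTaskRequest(
  baseUrl: string,
  expName: string,
  audioFileId: string,
  options: CreateTaskOptions,
): JsonRequest {
  if (expName.trim() === "") throw new Error("exp_name must not be empty");
  if (audioFileId === "") throw new Error("audio_file_id must not be empty");
  // Runtime checks guard against values that bypassed the type system (e.g. parsed JSON).
  if (LANGUAGES.indexOf(options.language) < 0) throw new Error(`unsupported language: ${options.language}`);
  if (QUALITIES.indexOf(options.quality) < 0) throw new Error(`unsupported quality: ${options.quality}`);
  return {
    method: "POST",
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/tasks`, // tolerate a trailing slash
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ exp_name: expName, audio_file_id: audioFileId, options }),
  };
}
```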
**Step 3: Monitor progress**

Use Server-Sent Events (SSE); `-N` turns off curl's output buffering so events appear as they arrive:
```bash
curl -N http://localhost:8000/api/v1/tasks/task-uuid-here/progress
```

**Progress events**:
```
event: progress
data: {"stage": "audio_slice", "progress": 25, "message": "Slicing audio..."}

event: progress
data: {"stage": "sovits_train", "progress": 50, "message": "Training SoVITS model..."}

event: complete
data: {"status": "completed", "voice_id": "voice-uuid-here"}
```

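A client that does not use `curl` has to split the stream into events itself. The TypeScript sketch below parses a chunk of `text/event-stream` text following the field rules of the Server-Sent Events specification: `event:` and `data:` fields, a blank line ends an event, and an event with no terminating blank line is discarded. `parseSse` is a hypothetical helper; in a browser or the Electron renderer, the built-in `EventSource` already performs this parsing for GET endpoints like the one above.

```typescript
// Minimal text/event-stream parser for payloads like the progress events above.
// Handles "event:" and "data:" fields; other fields ("id:", "retry:") and
// comment lines starting with ":" are ignored in this sketch.
interface SseEvent {
  event: string;
  data: string;
}

function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventName = "message"; // default event type when no "event:" field is sent
  let dataLines: string[] = [];

  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // A blank line dispatches the event collected so far, if it carries data.
      if (dataLines.length > 0) {
        events.push({ event: eventName, data: dataLines.join("\n") });
      }
      eventName = "message";
      dataLines = [];
      continue;
    }
    if (line.charAt(0) === ":") continue; // comment line
    const colon = line.indexOf(":");
    const field = colon < 0 ? line : line.slice(0, colon);
    let value = colon < 0 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
    if (field === "event") eventName = value;
    else if (field === "data") dataLines.push(value);
  }
  // As in the SSE specification, an event without a terminating blank line is dropped.
  return events;
}
```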
#### Quality Presets

| Preset | SoVITS epochs | GPT epochs | Estimated time | Quality |
|--------|---------------|------------|----------------|---------|
| **fast** | 4 | 8 | about 10 min | Good for testing |
| **standard** | 8 | 15 | about 20 min | Balance of quality and speed |
| **high** | 16 | 30 | about 40 min | Best quality |

**Recommendations**:
- Use `fast` for quick tests and previews
- Use `standard` for most production use cases
- Use `high` for professional applications that need the highest quality

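If you later move to Advanced Mode (Section 6.3) and want to start from the same epoch counts as a preset, the table can be encoded as a lookup. This TypeScript sketch assumes, without confirmation from the API, that presets differ only in the epoch counts listed above; other settings such as `batch_size` are left at the stage defaults. `presetToStageOverrides` is a hypothetical name.

```typescript
// Epoch counts copied from the Quality Presets table above.
type Preset = "fast" | "standard" | "high";

interface PresetEpochs {
  sovitsEpochs: number;
  gptEpochs: number;
}

const PRESET_EPOCHS: Record<Preset, PresetEpochs> = {
  fast: { sovitsEpochs: 4, gptEpochs: 8 },
  standard: { sovitsEpochs: 8, gptEpochs: 15 },
  high: { sovitsEpochs: 16, gptEpochs: 30 },
};

// Hypothetical helper: turns a Quick Mode preset into "total_epoch" overrides
// for the Advanced Mode sovits_train and gpt_train stage requests.
function presetToStageOverrides(preset: Preset): {
  sovits_train: { total_epoch: number };
  gpt_train: { total_epoch: number };
} {
  const epochs = PRESET_EPOCHS[preset];
  return {
    sovits_train: { total_epoch: epochs.sovitsEpochs },
    gpt_train: { total_epoch: epochs.gptEpochs },
  };
}
```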
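#### Using the UI

**Step 1: Open the Voice Cloning page**
- Click "Voice Cloning" in the sidebar
- Or use the keyboard shortcut `Ctrl/Cmd + N`

**Step 2: Upload an audio sample**
- Click the "Upload Audio" button
- Select a WAV or MP3 file
- **Requirements**:
  - Duration: 5-30 seconds recommended
  - Quality: clear voice with minimal background noise
  - Content: natural speech, not singing or shouting

**Step 3: Configure training**
- **Voice name**: enter a unique name (for example, "John's Voice")
- **Language**: choose the primary language (Chinese, English, Japanese)
- **Quality preset**: choose fast, standard, or high

**Step 4: Start training**
- Click the "Start Training" button
- The task is queued and processing begins

**Step 5: Monitor progress**
- A progress bar shows overall completion
- The current stage is displayed (for example, "Training SoVITS model...")
- The estimated remaining time is displayed
- You can navigate away and check back later

**Step 6: Training complete**
- You receive a notification when training finishes
- The voice automatically appears in the Voice Library
- You can use it for TTS generation right away

**Tips for best results**:
- Use high-quality audio (ideally 48 kHz WAV)
- Keep pitch and speaking style consistent
- Avoid audio with music or sound effects
- 10-15 seconds is the ideal sample length
- Several short samples can be combined

### 6.3 Advanced Mode - Voice Cloning for Experts

Advanced Mode gives fine-grained control over every stage of the voice training pipeline. It is recommended for users who want to fine-tune the training parameters.

#### Training Pipeline Stages

The full training pipeline has 7 stages:

1. **Audio Slice**: split the audio into segments
2. **ASR** (automatic speech recognition): transcribe the audio into text
3. **Text Feature**: extract text embeddings
4. **Hubert Feature**: extract audio features
5. **Semantic Token**: generate semantic tokens
6. **SoVITS Train**: train the voice synthesis model
7. **GPT Train**: train the text-to-semantic model

#### Stage Dependencies

```
audio_slice → asr → text_feature → sovits_train
↘ ↗
hubert_feature → semantic_token → gpt_train
```

**Important**: each stage must wait until its dependencies have completed.
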
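Because each stage has to wait for its dependencies, a client driving the stage endpoints needs to know which stages it may start next. The TypeScript sketch below derives that from per-stage statuses such as those returned by `GET /api/v1/experiments/{exp_id}/stages`. The dependency map is an assumed reading of the diagram above, and every status string except `pending` is an assumption, since only `pending` appears in the documented responses; `runnableStages` is a hypothetical helper.

```typescript
type StageType =
  | "audio_slice"
  | "asr"
  | "text_feature"
  | "hubert_feature"
  | "semantic_token"
  | "sovits_train"
  | "gpt_train";

// ASSUMPTION: only "pending" appears in the documented responses; the rest are illustrative.
type StageStatus = "pending" | "running" | "completed" | "failed";

// ASSUMPTION: an illustrative reading of the dependency diagram. Adjust it to
// the actual pipeline before relying on it.
const DEPENDS_ON: Record<StageType, StageType[]> = {
  audio_slice: [],
  asr: ["audio_slice"],
  text_feature: ["asr"],
  hubert_feature: ["asr"],
  semantic_token: ["hubert_feature"],
  sovits_train: ["text_feature", "hubert_feature"],
  gpt_train: ["text_feature", "semantic_token"],
};

// Hypothetical helper: stages that are still pending and whose dependencies have all completed.
function runnableStages(
  statuses: Record<StageType, StageStatus>,
  deps: Record<StageType, StageType[]> = DEPENDS_ON,
): StageType[] {
  const stages = Object.keys(deps) as StageType[];
  return stages.filter(
    (stage) =>
      statuses[stage] === "pending" &&
      deps[stage].every((dep) => statuses[dep] === "completed"),
  );
}
```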
#### Using the API

**Step 1: Create an experiment**

```bash
curl -X POST http://localhost:8000/api/v1/experiments \
  -H "Content-Type: application/json" \
  -d '{
    "exp_name": "my_custom_voice",
    "version": "v2",
    "audio_file_id": "file-uuid-here"
  }'
```

**Response**:
```json
{
  "id": "exp-uuid-here",
  "exp_name": "my_custom_voice",
  "version": "v2",
  "stages": {
    "audio_slice": {"status": "pending"},
    "asr": {"status": "pending"},
    "text_feature": {"status": "pending"},
    "hubert_feature": {"status": "pending"},
    "semantic_token": {"status": "pending"},
    "sovits_train": {"status": "pending"},
    "gpt_train": {"status": "pending"}
  }
}
```

**Step 2: Run the stages one at a time**

**Stage 1 - Audio slicing**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/audio_slice \
  -H "Content-Type: application/json" \
  -d '{
    "threshold": -34,
    "min_length": 4000,
    "min_interval": 300,
    "hop_size": 10,
    "max_silence_kept": 500
  }'
```

**Parameters**:
- `threshold`: silence-detection threshold in dB (-60 to 0, default: -34)
- `min_length`: minimum segment length in ms (1000-10000, default: 4000)
- `min_interval`: minimum silence interval in ms (0-3000, default: 300)
- `hop_size`: hop size of the analysis window in ms (default: 10)
- `max_silence_kept`: maximum silence to keep, in ms (default: 500)

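The documented ranges can be checked on the client before a stage is started, instead of waiting for a server-side error. A minimal TypeScript sketch, assuming the server accepts a body containing every field with the same defaults (not confirmed here); `prepareStageParams` and `AUDIO_SLICE_SPEC` are hypothetical names. Given its own spec, the same helper can validate other stages, such as the `sovits_train` ranges documented in this section.

```typescript
// Documented ranges and defaults for the audio_slice stage (see the list above).
interface RangeSpec {
  min?: number;
  max?: number;
  default: number;
}

const AUDIO_SLICE_SPEC: Record<string, RangeSpec> = {
  threshold: { min: -60, max: 0, default: -34 },
  min_length: { min: 1000, max: 10000, default: 4000 },
  min_interval: { min: 0, max: 3000, default: 300 },
  hop_size: { default: 10 }, // no range documented
  max_silence_kept: { default: 500 }, // no range documented
};

// Hypothetical helper: fills in defaults and collects out-of-range or unknown parameters.
function prepareStageParams(
  params: Record<string, number>,
  spec: Record<string, RangeSpec>,
): { body: Record<string, number>; errors: string[] } {
  const body: Record<string, number> = {};
  const errors: string[] = [];
  for (const name of Object.keys(spec)) {
    const rule = spec[name];
    const value = name in params ? params[name] : rule.default;
    if (rule.min !== undefined && value < rule.min) errors.push(`${name}=${value} is below ${rule.min}`);
    if (rule.max !== undefined && value > rule.max) errors.push(`${name}=${value} is above ${rule.max}`);
    body[name] = value;
  }
  for (const name of Object.keys(params)) {
    if (!(name in spec)) errors.push(`unknown parameter: ${name}`);
  }
  return { body, errors };
}
```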
**Stage 2 - ASR**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/asr \
  -H "Content-Type: application/json" \
  -d '{
    "model": "达摩 ASR (中文)",
    "language": "zh"
  }'
```

**ASR models** (the model name is sent verbatim, including the Chinese characters):
- `达摩 ASR (中文)` (Damo ASR, Chinese): DamoASR for Chinese (the best choice for Chinese)
- `Faster Whisper (多语言)` (Faster Whisper, multilingual): Faster Whisper for multiple languages

**Stage 3 - Text features**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/text_feature \
  -H "Content-Type: application/json" \
  -d '{
    "language": "zh"
  }'
```

**Stage 4 - Hubert features**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/hubert_feature \
  -H "Content-Type: application/json" \
  -d '{}'
```

**Stage 5 - Semantic tokens**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/semantic_token \
  -H "Content-Type: application/json" \
  -d '{}'
```

**Stage 6 - SoVITS training**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train \
  -H "Content-Type: application/json" \
  -d '{
    "total_epoch": 8,
    "batch_size": 4,
    "save_every_epoch": 4,
    "text_low_lr_rate": 0.4,
    "if_save_latest": true,
    "if_save_every_weights": true,
    "version": "v2"
  }'
```

**Parameters**:
- `total_epoch`: total number of training epochs (4-32, default: 8)
- `batch_size`: batch size (1-40, default: 4)
- `save_every_epoch`: save a checkpoint every N epochs (1-50, default: 4)
- `text_low_lr_rate`: learning-rate multiplier for the text encoder (0.2-1.0, default: 0.4)

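Scripts that run the stages one by one mostly need two URLs per stage and a request body with the defaults filled in. The TypeScript sketch below shows this for `sovits_train`; `stageUrls`, `sovitsTrainBody`, and the constant names are hypothetical, the default values are copied from the Stage 6 example, and the ranges come from the parameter list above. Whether the server fills in omitted fields itself is not stated in this document, so the sketch always sends a complete body.

```typescript
type StageType =
  | "audio_slice" | "asr" | "text_feature" | "hubert_feature"
  | "semantic_token" | "sovits_train" | "gpt_train";

// Default body copied from the Stage 6 (sovits_train) example above.
const SOVITS_TRAIN_DEFAULTS = {
  total_epoch: 8,
  batch_size: 4,
  save_every_epoch: 4,
  text_low_lr_rate: 0.4,
  if_save_latest: true,
  if_save_every_weights: true,
  version: "v2",
};

type SovitsTrainParams = typeof SOVITS_TRAIN_DEFAULTS;
type NumericKey = "total_epoch" | "batch_size" | "save_every_epoch" | "text_low_lr_rate";

// Documented ranges from the parameter list above: [name, min, max].
const SOVITS_RANGES: Array<[NumericKey, number, number]> = [
  ["total_epoch", 4, 32],
  ["batch_size", 1, 40],
  ["save_every_epoch", 1, 50],
  ["text_low_lr_rate", 0.2, 1.0],
];

// Hypothetical helper: the endpoint that starts a stage and its SSE progress stream.
function stageUrls(baseUrl: string, expId: string, stage: StageType): { run: string; progress: string } {
  const run = `${baseUrl.replace(/\/+$/, "")}/api/v1/experiments/${encodeURIComponent(expId)}/stages/${stage}`;
  return { run, progress: `${run}/progress` };
}

// Hypothetical helper: applies overrides on top of the defaults and rejects out-of-range values.
function sovitsTrainBody(overrides: Partial<SovitsTrainParams>): SovitsTrainParams {
  const body: SovitsTrainParams = { ...SOVITS_TRAIN_DEFAULTS, ...overrides };
  for (const [key, min, max] of SOVITS_RANGES) {
    const value = body[key];
    if (value < min || value > max) {
      throw new Error(`${key}=${value} is outside the documented range ${min}-${max}`);
    }
  }
  return body;
}
```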
**Stage 7 - GPT training**:
```bash
curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/gpt_train \
  -H "Content-Type: application/json" \
  -d '{
    "total_epoch": 15,
    "batch_size": 4,
    "save_every_epoch": 5,
    "if_save_latest": true,
    "if_save_every_weights": true,
    "version": "v2"
  }'
```

**Step 3: Monitor stage progress**

Each stage reports live progress over SSE:

```bash
curl -N http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train/progress
```

**Progress events**:
```
event: progress
data: {"epoch": 2, "total_epochs": 8, "progress": 25, "loss": 0.234}

event: progress
data: {"epoch": 4, "total_epochs": 8, "progress": 50, "loss": 0.189}

event: complete
data: {"status": "completed", "final_loss": 0.142}
```

#### Using the UI

**Step 1: Create a new experiment**
- Open the "Advanced Mode" page
- Click "New Experiment"
- Enter an experiment name and upload the audio

**Step 2: Configure each stage**
- Click a stage card to expand its settings
- Adjust the parameters (or keep the preset defaults)
- Click "Run Stage" to execute it

**Step 3: Monitor the pipeline**
- A visual pipeline diagram shows the status of each stage
- Green: completed, blue: running, gray: pending
- Click any stage to view its detailed logs

**Step 4: Iterate and refine**
- Check the results after each stage
- Adjust parameters and rerun a stage if needed
- Export the final model when you are satisfied

**Advanced tips**:
- Use a lower `batch_size` (2-4) on GPUs with limited memory
- Increase `total_epoch` for better quality when you have enough data
- Save checkpoints often (`save_every_epoch`) so you can recover from interruptions
- Watch the loss values; they should decrease as the epochs progress

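To follow the last tip programmatically, collect the `progress` events from a stage's SSE stream and compare early and late losses. A small TypeScript sketch; `lossIsDecreasing` is a hypothetical helper, the event fields are those of the `sovits_train` example above, and the first-half versus second-half comparison is a simple heuristic chosen here, not a criterion defined by the project.

```typescript
// Shape of the sovits_train progress payload shown above; other stages may differ.
interface EpochProgress {
  epoch: number;
  total_epochs: number;
  progress: number;
  loss: number;
}

// Hypothetical helper: compares the mean loss of the later half of the observed
// epochs with the earlier half. Averaging is less noisy than comparing two points.
// Returns undefined when there is not enough data to judge.
function lossIsDecreasing(points: EpochProgress[]): boolean | undefined {
  if (points.length < 2) return undefined;
  const sorted = points.slice().sort((a, b) => a.epoch - b.epoch);
  const mid = Math.floor(sorted.length / 2);
  const mean = (xs: EpochProgress[]): number =>
    xs.reduce((sum, p) => sum + p.loss, 0) / xs.length;
  return mean(sorted.slice(mid)) < mean(sorted.slice(0, mid));
}
```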
### 6.4 Text-to-Speech Generation

Once a voice is trained, you can use it to generate speech from text.

#### Using the API

**Basic TTS request**:
```bash
curl -X POST http://localhost:8000/api/v1/inference/tts \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of text-to-speech synthesis.",
    "voice_id": "voice-uuid-here",
    "speed": 1.0,
    "emotion": "auto"
  }'
```

**Response**:
```json
{
  "audio_url": "http://localhost:8000/api/v1/files/audio-uuid-here",
  "duration": 3.2,
  "format": "wav"
}
```

**Parameters**:
- `text` (required): text to synthesize (up to 5000 characters)
- `voice_id` (required): UUID of the trained voice
- `speed` (optional): speaking-rate multiplier (0.5-2.0, default: 1.0)
- `emotion` (optional): emotional style (auto, neutral, happy, sad)
- `seed` (optional): random seed for reproducible output

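The documented limits can be enforced before the request is sent, which gives clearer errors than a server rejection. A minimal TypeScript sketch; `buildTtsBody` is a hypothetical helper, the limits come from the parameter list above, and the requirement that `seed` be an integer is an assumption based on the example value `42`. Note that `.length` counts UTF-16 code units, which may differ from how the server counts characters.

```typescript
type Emotion = "auto" | "neutral" | "happy" | "sad";

interface TtsRequest {
  text: string;
  voice_id: string;
  speed?: number;
  emotion?: Emotion;
  seed?: number;
}

const MAX_TEXT_LENGTH = 5000;

// Hypothetical helper: checks the documented limits and returns the JSON body
// for POST /api/v1/inference/tts. Leading and trailing whitespace is dropped.
function buildTtsBody(req: TtsRequest): string {
  const text = req.text.trim();
  if (text === "") throw new Error("text is required");
  // .length counts UTF-16 code units; the server may count characters differently.
  if (text.length > MAX_TEXT_LENGTH) throw new Error(`text exceeds ${MAX_TEXT_LENGTH} characters`);
  if (req.voice_id === "") throw new Error("voice_id is required");
  const speed = req.speed ?? 1.0;
  if (isNaN(speed) || speed < 0.5 || speed > 2.0) throw new Error("speed must be between 0.5 and 2.0");
  // ASSUMPTION: the seed is an integer, as in the example value 42.
  if (req.seed !== undefined && Math.floor(req.seed) !== req.seed) throw new Error("seed must be an integer");
  // JSON.stringify omits "seed" when it is undefined.
  return JSON.stringify({ ...req, text, speed, emotion: req.emotion ?? "auto" });
}
```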
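**Download the generated audio**:
```bash
curl -o output.wav http://localhost:8000/api/v1/files/audio-uuid-here
```

#### Using the UI

**Step 1: Open the TTS page**
- Click "Text to Speech" in the sidebar
- Or use the keyboard shortcut `Ctrl/Cmd + T`

**Step 2: Select a voice**
- Open the voice dropdown
- Select a trained voice from the list
- The preview button plays a sample

**Step 3: Enter text**
- Type or paste text into the text area
- A character count is shown (maximum 5000)
- Multi-line text is supported

**Step 4: Adjust settings**
- **Speed**: drag the slider or enter a value (0.5x-2.0x)
  - 0.5x: very slow, clear pronunciation
  - 1.0x: natural speaking pace
  - 1.5x: fast but still clear
  - 2.0x: very fast
- **Emotion**: choose from the dropdown (if the model supports it)
  - Auto: inferred from the text
  - Neutral: flat, matter-of-fact delivery
  - Happy: upbeat, positive tone
  - Sad: melancholic, sorrowful tone

**Step 5: Generate**
- Click the "Generate" button
- Processing typically takes 2-5 seconds, depending on text length and hardware
- A progress indicator is shown

**Step 6: Listen and download**
- The audio player appears automatically
- Click the play button to listen
- Click the download button to save the WAV file
- The share button copies a shareable link

**Text guidelines**:
- Use proper punctuation for natural pauses
- Split long text into sentences
- Use quotation marks for dialogue
- Use all caps for emphasis (sparingly)

**Tips for natural speech**:
- Add commas for breathing pauses
- Use an ellipsis (...) for trailing off
- Question marks affect intonation
- Exclamation marks add emphasis
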
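The advice to split long text into sentences can be automated, for example to keep every request well below the 5000-character limit. A TypeScript sketch, assuming a simple punctuation-based notion of a sentence (full-width `。！？`, `!`, `?`, `…`, and a period followed by whitespace); it is not the segmentation the server applies internally, which this document does not describe. `splitSentences`, `splitIntoChunks`, and the default `maxChars` of 200 are hypothetical.

```typescript
// Full-width and Latin sentence-ending marks treated as terminals in this sketch.
const TERMINALS = "。！？!?…";

// A Latin period ends a sentence only before whitespace or the end of the text,
// so decimals such as "3.2" are not split.
function isTerminal(ch: string, next: string): boolean {
  if (TERMINALS.indexOf(ch) >= 0) return true;
  return ch === "." && (next === "" || /\s/.test(next));
}

// Hypothetical helper: splits text into sentences, keeping the original whitespace
// so that concatenating the result reproduces the input exactly.
function splitSentences(text: string): string[] {
  const out: string[] = [];
  let current = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const next = text.charAt(i + 1); // "" past the end of the text
    current += ch;
    // Runs such as "?!" or "……" stay attached to the sentence they close.
    const closesRun = next === "" || TERMINALS.indexOf(next) < 0;
    if (ch === "\n" || (isTerminal(ch, next) && closesRun)) {
      out.push(current);
      current = "";
    }
  }
  if (current !== "") out.push(current);
  return out;
}

// Hypothetical helper: packs whole sentences into chunks of at most maxChars;
// a sentence longer than maxChars is hard-split by UTF-16 code units.
function splitIntoChunks(text: string, maxChars = 200): string[] {
  if (maxChars < 1) throw new Error("maxChars must be at least 1");
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    for (let start = 0; start < sentence.length; start += maxChars) {
      const piece = sentence.slice(start, start + maxChars);
      if (current.length + piece.length > maxChars && current.trim() !== "") {
        chunks.push(current.trim());
        current = "";
      }
      current += piece;
    }
  }
  if (current.trim() !== "") chunks.push(current.trim());
  return chunks;
}
```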
### 6.5 Voice Library Management

The Voice Library is where all trained voices are stored and managed.

#### Using the API

**List all voices** (quote the URL so the shell does not treat `?` as a glob character):
```bash
curl "http://localhost:8000/api/v1/files?purpose=training"
```

**Response**:
```json
{
  "files": [
    {
      "id": "voice-uuid-1",
      "filename": "john_voice",
      "created_at": "2026-01-20T10:30:00Z",
      "size": 1234567,
      "metadata": {
        "language": "zh",
        "quality": "standard",
        "duration": 12.5
      }
    },
    {
      "id": "voice-uuid-2",
      "filename": "mary_voice",
      "created_at": "2026-01-21T14:20:00Z",
      "size": 2345678,
      "metadata": {
        "language": "en",
        "quality": "high",
        "duration": 18.3
      }
    }
  ]
}
```

**Get voice details**:
```bash
curl http://localhost:8000/api/v1/files/voice-uuid-1
```

**Delete a voice**:
```bash
curl -X DELETE http://localhost:8000/api/v1/files/voice-uuid-1
```

**Export a voice model**:
```bash
curl -o voice_model.zip http://localhost:8000/api/v1/voices/voice-uuid-1/export
```

#### Using the UI

**Browse the Voice Library**:
- Open the "Voice Library" page
- Voices are shown as cards containing:
  - Voice name
  - Language and quality badges
  - Creation date
  - Sample duration
  - Preview waveform

**Voice card actions**:
- **Play**: listen to the voice sample
- **Edit**: rename the voice or update its metadata
- **Export**: download the voice model files
- **Delete**: remove the voice (with confirmation)

**Search and filter**:
- Search bar: filter by voice name
- Language filter: show only a specific language
- Quality filter: show only a specific quality preset
- Sort options:
  - Name (A-Z)
  - Creation date (newest first)
  - Creation date (oldest first)
  - File size

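When the same list is needed outside the UI, for example in a script that cleans up test voices, the search, filter, and sort options can be applied to the `files` array returned by `GET /api/v1/files?purpose=training`. A TypeScript sketch; `queryVoices` is a hypothetical helper, it matches the search term against `filename` (the voice name in the example response), and both the newest-first default and the largest-first size order are assumptions.

```typescript
// Shape of one entry in the "List all voices" response above.
interface VoiceFile {
  id: string;
  filename: string;
  created_at: string; // ISO 8601 timestamp
  size: number;
  metadata: { language: string; quality: string; duration: number };
}

type SortKey = "name" | "newest" | "oldest" | "size";

interface VoiceQuery {
  search?: string;
  language?: string;
  quality?: string;
  sort?: SortKey;
}

// Hypothetical helper reproducing the library's search, filter and sort options on the client.
function queryVoices(files: VoiceFile[], q: VoiceQuery): VoiceFile[] {
  const needle = (q.search ?? "").trim().toLowerCase();
  const result = files.filter(
    (f) =>
      (needle === "" || f.filename.toLowerCase().indexOf(needle) >= 0) &&
      (q.language === undefined || f.metadata.language === q.language) &&
      (q.quality === undefined || f.metadata.quality === q.quality),
  );
  const time = (f: VoiceFile): number => Date.parse(f.created_at);
  switch (q.sort ?? "newest") { // ASSUMPTION: newest first by default
    case "name":
      result.sort((a, b) => a.filename.localeCompare(b.filename));
      break;
    case "newest":
      result.sort((a, b) => time(b) - time(a));
      break;
    case "oldest":
      result.sort((a, b) => time(a) - time(b));
      break;
    case "size":
      result.sort((a, b) => b.size - a.size); // ASSUMPTION: largest first
      break;
  }
  return result;
}
```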
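**Batch operations**:
- Select multiple voices (Shift+click)
- Export the selected voices as a ZIP
- Delete the selected voices
- Tag the selected voices

**Voice details panel**:
Click any voice card to see:
- The full training parameters
- Training history and logs
- Model file sizes
- Sample audio clips
- Export and sharing options

**Organization tips**:
- Use descriptive names (for example, "john_professional", "mary_casual")
- Tag voices by project or use case
- Export important voices as backups
- Delete test voices to save space

---
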
## API Reference

### Quick Mode Endpoints

#### Tasks

**Create task** - start a one-click voice training task
```http
POST /api/v1/tasks
Content-Type: application/json

{
  "exp_name": "string",
  "audio_file_id": "uuid",
  "options": {
    "version": "v2",
    "language": "zh|en|ja",
    "quality": "fast|standard|high"
  }
}
```

**List tasks** - get all tasks
```http
GET /api/v1/tasks?status=queued|running|completed|failed
```

**Get task** - get the details of a specific task
```http
GET /api/v1/tasks/{task_id}
```

**Cancel task** - cancel a running task
```http
DELETE /api/v1/tasks/{task_id}
```

**Task progress** - live progress over SSE
```http
GET /api/v1/tasks/{task_id}/progress
Accept: text/event-stream
```

### Advanced Mode Endpoints

#### Experiments

**Create experiment** - initialize a new training experiment
```http
POST /api/v1/experiments
Content-Type: application/json

{
  "exp_name": "string",
  "version": "v2",
  "audio_file_id": "uuid"
}
```

**Get experiment** - get the details of an experiment
```http
GET /api/v1/experiments/{exp_id}
```

**List experiments** - get all experiments
```http
GET /api/v1/experiments?status=pending|running|completed
```

**Delete experiment** - delete an experiment and all of its data
```http
DELETE /api/v1/experiments/{exp_id}
```

#### Stages

**Run stage** - run a specific pipeline stage
```http
POST /api/v1/experiments/{exp_id}/stages/{stage_type}
Content-Type: application/json

{
  // Stage-specific parameters
}
```

**Stage types**:
- `audio_slice`
- `asr`
- `text_feature`
- `hubert_feature`
- `semantic_token`
- `sovits_train`
- `gpt_train`

**Get stage status** - get the status of a specific stage
```http
GET /api/v1/experiments/{exp_id}/stages/{stage_type}
```

**Get all stage statuses** - get the status of every stage
```http
GET /api/v1/experiments/{exp_id}/stages
```

**Stage progress** - live stage progress over SSE
```http
GET /api/v1/experiments/{exp_id}/stages/{stage_type}/progress
Accept: text/event-stream
```

**Get stage schema** - get the parameter schema of a stage
```http
GET /api/v1/stages/{stage_type}/schema
```

### Common Endpoints

#### Files

**Upload file** - upload an audio or data file
```http
POST /api/v1/files
Content-Type: multipart/form-data

file: binary
purpose: training|inference
```

**List files** - get all uploaded files
```http
GET /api/v1/files?purpose=training|inference
```

**Get file** - download a specific file
```http
GET /api/v1/files/{file_id}
```

**Delete file** - delete a file
```http
DELETE /api/v1/files/{file_id}
```

#### Inference

**Text to speech** - generate speech from text
```http
POST /api/v1/inference/tts
Content-Type: application/json

{
  "text": "string",
  "voice_id": "uuid",
  "speed": 1.0,
  "emotion": "auto|neutral|happy|sad",
  "seed": 42
}
```

**Get voice info** - get information about a voice model
```http
GET /api/v1/voices/{voice_id}
```

#### Configuration

**Get stage presets** - get the preset configurations for the stages
```http
GET /api/v1/stages/presets
```

**Health check** - check the health of the API server
```http
GET /health
```

**The full OpenAPI specification is available at**: http://localhost:8000/openapi.json

---

## Troubleshooting

### Backend Issues

#### Port Already in Use

**Symptom**: starting the server fails with an `Address already in use` error.

**Solution 1** - change the port in `.env` (the frontend expects `http://localhost:8000` by default, see Section 4.2):
```bash
# Run from the project root, where .env lives
echo "API_PORT=8001" >> .env
cd api_server
python app/main.py
```

**Solution 2** - find and stop the process that is using the port:
```bash
# macOS/Linux
lsof -ti:8000 | xargs kill -9

# Windows
netstat -ano | findstr :8000
taskkill /PID <pid> /F
```

#### Database Errors

**Symptom**: `sqlite3.OperationalError` or messages about a corrupted database.

**Solution** - reset the database (this discards the task records stored in it):
```bash
# Back up the existing database (optional)
cp ~/.moyoyo-tts/data/tasks.db ~/.moyoyo-tts/data/tasks.db.backup

# Delete the corrupted database
rm ~/.moyoyo-tts/data/tasks.db

# Restart the API server (the database is recreated)
python app/main.py
```

#### Training Fails Immediately

**Symptom**: training starts but fails within a few seconds.

**Diagnosis**:
```bash
# Check GPU availability
python -c "import torch; print(torch.cuda.is_available())"

# Check the CUDA version PyTorch was built with
python -c "import torch; print(torch.version.cuda)"

# Check disk space
df -h
```

**Solutions**:
1. **No GPU**: the system falls back to the CPU (slower, but it works)
2. **CUDA mismatch**: reinstall PyTorch for the correct CUDA version:
   ```bash
   # For CUDA 12.6
   uv sync --reinstall-package torch --reinstall-package torchaudio

   # For CUDA 12.8 (Windows)
   uv sync --reinstall-package torch --reinstall-package torchaudio --index pytorch-cu128

   # CPU only
   uv sync --reinstall-package torch --reinstall-package torchaudio --index pytorch-cpu
   ```
3. **Not enough disk space**: free up at least 10 GB
4. **Not enough memory**: reduce `batch_size` in the training parameters

#### Python Environment Issues

**Symptom**: `ModuleNotFoundError` or other import errors.

**Solutions**:
```bash
# Verify that the environment is activated
which python   # should print a path inside .venv (on Windows: where python)

# Reinstall all dependencies
uv sync --reinstall

# Or force a clean reinstall from scratch.
# Note: this also deletes the NLTK data extracted into .venv/ (re-download it, see Section 3.3)
rm -rf .venv
uv sync

# Check for missing packages
uv pip list
```

### Frontend Issues

#### Cannot Connect to the API

**Symptom**: the frontend shows a "Cannot connect to server" error.

**Diagnosis**:
```bash
# Check whether the backend is running
curl http://localhost:8000/health

# Check network connectivity
ping localhost
```

**Solutions**:
1. **Backend not running**: start the backend server (see Section 5.1)
2. **Wrong port**: check that the backend is listening on port 8000
3. **Firewall**: allow connections to localhost:8000
4. **CORS errors**: check the CORS settings in the backend `.env`

#### Models Not Downloading

**Symptom**: model downloads fail or hang indefinitely.

**Solutions**:
1. **Check your internet connection**:
   ```bash
   curl -I https://www.modelscope.cn
   ```

2. **Check disk space**:
   ```bash
   df -h  # about 10 GB of free space is required
   ```

3. **Download manually**: see Section 3.3 for manual installation

4. **Proxy issues**: configure the proxy settings:
   ```bash
   export http_proxy=http://proxy.example.com:8080
   export https_proxy=http://proxy.example.com:8080
   ```

#### Electron App Does Not Start

**Symptom**: the app crashes on launch or shows a blank screen.

**Solution 1** - clear caches and rebuild:
```bash
# Enter the frontend directory
cd tts-voice-app

# Clear caches (deleting package-lock.json lets npm pick newer dependency versions)
rm -rf node_modules package-lock.json dist .vite

# Reinstall dependencies
npm install

# Rebuild
npm run dev
```

**Solution 2** - check the Node.js version:
```bash
node --version  # should be >= 18.x

# Update Node.js if needed
nvm install 18
nvm use 18
```

**Solution 3** - check the Electron logs:
```bash
# macOS
ls ~/Library/Logs/tts-voice-app/

# Linux
ls ~/.config/tts-voice-app/logs/

# Windows (Command Prompt)
dir "%APPDATA%\tts-voice-app\logs"
```

### Common Errors

#### "PYTHONPATH not set" Error

**Symptom**: import errors related to the `GPT_SoVITS` module.

**Cause**: the API server needs to locate the main project directory.

**Solution**: the API sets `PYTHONPATH` automatically, but verify it:
```bash
# Check the project structure
ls GPT-SoVITS/  # should contain *.py files

# Set it manually if needed (run from the GPT-SoVITS project root)
export PYTHONPATH="$(pwd):$PYTHONPATH"
```

#### "Model not found" Error

**Symptom**: training fails with a "pretrained model not found" message.

**Diagnosis**:
```bash
# Check whether the models exist
ls GPT_SoVITS/pretrained_models/
# Expected: s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt, s2G488k.pth, s2D488k.pth
```

**Solution**: download the pretrained models (see Section 3.3):
```bash
wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/pretrained_models.zip
unzip -q -o pretrained_models.zip -d GPT_SoVITS
```

#### "Out of memory" Error

**Symptom**: training crashes with `CUDA out of memory` or `MemoryError`.

**Solutions**:
1. **Reduce the batch size** (for example, from 4 to 2):
   ```json
   {
     "batch_size": 2
   }
   ```

2. **Close other applications**: free up GPU memory and RAM

3. **Use CPU mode**: slower, but uses system RAM instead of the GPU:
   ```bash
   # Hide all GPUs from PyTorch
   export CUDA_VISIBLE_DEVICES=""
   python app/main.py
   ```

4. **Increase system swap space** (Linux):
   ```bash
   sudo dd if=/dev/zero of=/swapfile bs=1G count=8
   sudo chmod 600 /swapfile  # mkswap warns about swap files readable by other users
   sudo mkswap /swapfile
   sudo swapon /swapfile
   ```

#### "NLTK Data Not Found" Error

**Symptom**: text processing fails with an NLTK data error.

**Solution**: download the NLTK data (see Section 3.3):
```bash
wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip
unzip -q -o nltk_data.zip -d .venv/
```

#### Audio Quality Issues

**Symptom**: the generated audio sounds robotic, distorted, or unclear.

**Solutions**:
1. **Use better training data**:
   - High-quality audio (48 kHz WAV preferred)
   - Clear voice with minimal background noise
   - 10-15 seconds of audio
   - Natural, conversational speech

2. **Use a higher quality preset** (`high` instead of `standard` in the task options):
   ```json
   {
     "options": {
       "quality": "high"
     }
   }
   ```

3. **Train longer** (for example, raise `total_epoch` from 8 to 16):
   ```json
   {
     "total_epoch": 16
   }
   ```

4. **Check the reference audio**: make sure the uploaded audio is not corrupted

---

## Development

### Backend Development

#### Run with Hot Reload

Hot reload restarts the server automatically when it detects code changes:

```bash
# Run from the api_server directory
cd api_server

# Using uvicorn
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Watching only a specific directory for changes
uvicorn app.main:app --reload --reload-dir app
```

#### Run Tests

```bash
# Enter the project root
cd GPT-SoVITS

# Run all tests
pytest api_server/tests/

# Run a specific test file
pytest api_server/tests/test_tasks.py

# Run with a coverage report (requires the pytest-cov plugin)
pytest --cov=api_server/app --cov-report=html

# View the coverage report (macOS; use xdg-open on Linux)
open htmlcov/index.html
```

#### Code Formatting

```bash
# Format Python code with Black
black api_server/

# Sort imports with isort
isort api_server/

# Lint with flake8
flake8 api_server/

# Type-check with mypy
mypy api_server/
```

#### Database Migrations

```bash
# Generate a migration
alembic revision --autogenerate -m "Add new column"

# Apply migrations
alembic upgrade head

# Roll back one migration
alembic downgrade -1
```

#### Adding a New Endpoint

1. Create the route in `api_server/app/routes/`
2. Add the business logic in `api_server/app/services/`
3. Update the models in `api_server/app/models/`
4. Add tests in `api_server/tests/`
5. Update the OpenAPI documentation

### Frontend Development

#### Development Mode

Development mode enables Hot Module Replacement (HMR) for instant feedback:

```bash
# Start the development server
npm run dev

# Start on a custom port
npm run dev -- --port 5174

# Start with debug logging (quoted so the shell does not expand the wildcard)
DEBUG='electron*' npm run dev
```

#### Type Checking

```bash
# Run Vue type checking
npm run type-check

# Run the TypeScript compiler without emitting files
npx tsc --noEmit

# Watch mode for continuous checking
npm run type-check -- --watch
```

#### Building for Production

**Development build** (with source maps):
```bash
npm run build
```

**Production build** (optimized):
```bash
npm run build:prod
```

**Preview the production build**:
```bash
npm run preview
```

#### Building Distribution Packages

Build platform-specific installers:

**macOS**:
```bash
npm run build:mac
# Output: tts-voice-app/release/MoYoYo-TTS-1.0.0.dmg
```

**Windows**:
```bash
npm run build:win
# Output: tts-voice-app/release/MoYoYo-TTS-Setup-1.0.0.exe
```

**Linux**:
```bash
npm run build:linux
# Output: tts-voice-app/release/moyoyo-tts-1.0.0.AppImage
```

**Build for all platforms** (requires platform-specific dependencies):
```bash
npm run build:all
```

**Build configuration**:
Edit `tts-voice-app/electron-builder.yml` to customize:
- App name and ID
- Icon files
- File associations
- Auto-update settings
- Code signing

#### Component Development

**Create a new component**:
```bash
# Enter the components directory
cd tts-voice-app/src/components

# Create the component file
touch MyComponent.vue
```

**Component template**:
```vue
<template>
  <div class="my-component">
    <!-- Template goes here -->
  </div>
</template>

<script setup lang="ts">
import { ref } from 'vue'

// Component logic goes here
const myValue = ref('')
</script>

<style scoped>
.my-component {
  /* Styles go here */
}
</style>
```

#### State Management

The app uses the Vue Composition API together with Pinia stores:

```typescript
// Create a new store in src/stores/myStore.ts
import { defineStore } from 'pinia'

export const useMyStore = defineStore('myStore', {
  state: () => ({
    // Typed explicitly; a bare [] would be inferred as never[]
    items: [] as string[]
  }),
  getters: {
    itemCount: (state) => state.items.length
  },
  actions: {
    addItem(item: string) {
      this.items.push(item)
    }
  }
})
```

#### Debugging

**Vue DevTools**:
- Enabled automatically in development mode
- Accessible from the browser DevTools panel

**Electron DevTools**:
```bash
# Open DevTools on launch
DEBUG_ELECTRON=true npm run dev
```

**Console logging**:
```typescript
// `data` stands for whatever value you want to inspect.

// Main process: output appears in the terminal running `npm run dev`
console.log('Main:', data)

// Renderer process: output appears in the DevTools console
console.log('Renderer:', data)
```

#### Testing

```bash
# Run unit tests
npm run test

# Run with coverage
npm run test:coverage

# Run E2E tests
npm run test:e2e

# Watch mode
npm run test:watch
```

### Project Structure

```
GPT-SoVITS/
├── api_server/ # Backend API
│ ├── app/
│ │ ├── main.py # FastAPI application
│ │ ├── routes/ # API endpoints
│ │ ├── services/ # Business logic
│ │ ├── models/ # Data models
│ │ └── utils/ # Utilities
│ └── tests/ # Backend tests
├── tts-voice-app/ # Frontend Electron app
│ ├── src/
│ │ ├── main/ # Electron main process
│ │ ├── renderer/ # Vue UI
│ │ ├── components/ # Vue components
│ │ └── stores/ # State management
│ └── dist/ # Build output
├── GPT_SoVITS/ # Core ML models
│ ├── pretrained_models/ # Base models
│ └── text/ # Text processing
└── .env # Configuration
```

### Contributing

1. **Fork and clone the repository**
2. **Create a feature branch**: `git checkout -b feature/my-feature`
3. **Make your changes** and add tests
4. **Run tests and linters**: `pytest && black . && isort .`
5. **Commit your changes**: `git commit -m "feat: add my feature"`
6. **Push the branch**: `git push origin feature/my-feature`
7. **Open a Pull Request** with a description

**Commit message format**:
- `feat:`: new feature
- `fix:`: bug fix
- `docs:`: documentation changes
- `style:`: code style changes
- `refactor:`: code refactoring
- `test:`: test changes
- `chore:`: build or tooling changes

---

## Additional Resources

### Documentation

- **API documentation**: http://localhost:8000/docs
- **Design document**: `frontend_design.md`
- **Development guide**: `development.md`
- **OpenAPI specification**: `openapi.json`

### External Links

- **GPT-SoVITS repository**: https://github.com/RVC-Boss/GPT-SoVITS
- **ModelScope models**: https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained
- **FastAPI documentation**: https://fastapi.tiangolo.com
- **Vue 3 documentation**: https://cn.vuejs.org
- **Electron documentation**: https://www.electronjs.org

### Support

For problems, questions, or feature requests:
1. Check this documentation first
2. Search the existing GitHub issues
3. Open a new issue with a detailed description
4. Include error messages, logs, and system information

### License

This project is licensed under the MIT License. See the `LICENSE` file for details.

---

**Last updated**: 2026-01-23
**Version**: 1.0.0
**Maintainer**: MoYoYo.tts development team