docs: add comprehensive usage manual for MoYoYo.tts

- Create `USAGE.md` covering installation, configuration, usage, API reference, and troubleshooting.
- Include detailed guides for Quick and Advanced Modes, setup instructions, and pipeline visualization.
- Document backend and frontend configurations, running processes, and language-specific setups.
- Add examples for training, inference, and voice library management via both API and UI.

Files changed (3)

USAGE.md +1718 -0
USAGE_CN.md +1776 -0
development.md +0 -0

USAGE.md ADDED

	@@ -0,0 +1,1718 @@

+# MoYoYo.tts Usage Manual
+## Table of Contents
+- [Introduction](#introduction)
+- [System Requirements](#system-requirements)
+- [Installation Guide](#installation-guide)
+  - [Install uv Package Manager](#31-install-uv-package-manager)
+  - [Python Environment Setup](#32-python-environment-setup)
+  - [Download Required Data Files](#33-download-required-data-files)
+  - [Frontend Setup](#34-frontend-setup)
+- [Configuration](#configuration)
+  - [Backend API Configuration](#41-backend-api-configuration)
+  - [Frontend Configuration](#42-frontend-configuration)
+- [Running the Application](#running-the-application)
+  - [Start Backend API Server](#51-start-backend-api-server)
+  - [Start Frontend Electron App](#52-start-frontend-electron-app)
+- [Usage Guide](#usage-guide)
+  - [First-Time Setup](#61-first-time-setup)
+  - [Quick Mode - Voice Cloning for Beginners](#62-quick-mode---voice-cloning-for-beginners)
+  - [Advanced Mode - Expert Voice Cloning](#63-advanced-mode---expert-voice-cloning)
+  - [Text-to-Speech Generation](#64-text-to-speech-generation)
+  - [Voice Library Management](#65-voice-library-management)
+- [API Reference](#api-reference)
+- [Troubleshooting](#troubleshooting)
+- [Development](#development)
+---
+## Introduction
+MoYoYo.tts is a comprehensive voice cloning and text-to-speech system that combines:
+- **Backend API**: FastAPI-based REST API for voice training and inference
+- **Frontend Application**: Electron + Vue desktop app with intuitive UI
+The system is built on GPT-SoVITS technology, enabling high-quality voice cloning with minimal training data (as little as 5 seconds of audio).
+**Target Audience**:
+- End users who want to create custom voices for text-to-speech
+- Developers integrating voice synthesis into applications
+- Researchers experimenting with voice cloning technology
+**Key Features**:
+- Quick Mode: One-click voice cloning for beginners
+- Advanced Mode: Fine-grained control over training pipeline
+- Real-time progress tracking via Server-Sent Events (SSE)
+- Multi-language support (Chinese, English, Japanese)
+- GPU acceleration with CUDA support
+---
+## System Requirements
+### Software Requirements
+| Component | Version | Notes |
+|-----------|---------|-------|
+| **Python** | 3.10 - 3.12 | Python 3.11 recommended |
+| **Node.js** | >= 18.x | For frontend development |
+| **uv** | Latest | Python package manager |
+| **CUDA** | 12.6 or 12.8 | Optional, for GPU acceleration |
+### Hardware Requirements
+| Component | Minimum | Recommended |
+|-----------|---------|-------------|
+| **CPU** | Dual-core | Quad-core or better |
+| **RAM** | 16 GB | 32 GB (for training) |
+| **GPU** | None (CPU mode) | NVIDIA GPU with 6GB+ VRAM |
+| **Storage** | 20 GB free | 50 GB+ for multiple voices |
+**GPU Notes**:
+- GPU is optional but significantly speeds up training (5-10x faster)
+- NVIDIA GPUs with CUDA 12.6 or 12.8 support recommended
+- AMD GPUs and Apple Silicon are currently not supported for training
+---
+## Installation Guide
+### 3.1 Install uv Package Manager
+uv is a fast Python package installer and resolver that replaces pip.
+**macOS / Linux**:
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+**Windows** (PowerShell):
+```powershell
+powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
+```
+Verify installation:
+```bash
+uv --version
+```
+### 3.2 Python Environment Setup
+The project uses `uv` for dependency management with a `pyproject.toml` configuration. The setup process is streamlined into a single command.
+**Step 1: Navigate to Project Directory**
+```bash
+cd GPT-SoVITS
+```
+**Step 2: Sync All Dependencies**
+```bash
+# This single command will:
+# - Create a virtual environment (.venv)
+# - Install Python 3.11 (or your specified version)
+# - Install all dependencies from pyproject.toml
+# - Install the correct PyTorch version for your platform
+uv sync
+```
+**Step 3: Activate Environment**
+macOS / Linux:
+```bash
+source .venv/bin/activate
+```
+Windows:
+```cmd
+.venv\Scripts\activate
+```
+You should see a `(.venv)` prefix in your terminal prompt.
+**How Platform-Specific PyTorch Installation Works**:
+The `pyproject.toml` automatically selects the appropriate PyTorch version:
+- **macOS**: Installs CPU-only PyTorch (Apple Silicon uses CPU mode)
+- **Linux**: Installs CUDA 12.6 PyTorch by default
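+- **Windows**: Installs CUDA 12.6 PyTorch by default; other builds are selected manually (see below)
+**Windows Users - Choose CUDA Version**:
+On Windows, a plain `uv sync` installs the CUDA 12.6 build. To use a different build, pass the matching PyTorch index explicitly: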
+**CUDA 12.6** (default):
+```bash
+uv sync
+```
+**CUDA 12.8**:
+```bash
+uv sync --index pytorch-cu128
+```
+**CPU Only** (no GPU):
+```bash
+uv sync --index pytorch-cpu
+```
+**Verify Installation**:
+```bash
+# Check Python version
+python --version  # Should show Python 3.11.x
+# Check PyTorch installation
+python -c "import torch; print(f'PyTorch: {torch.__version__}')"
+# Check CUDA availability (if you have GPU)
+python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"
+```
+### 3.3 Download Required Data Files
+The following data files are required for text processing and voice training.
+#### NLTK Data (Required for Text Processing)
+NLTK (Natural Language Toolkit) data is used for text tokenization and processing.
+```bash
+# Download from ModelScope
+wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip
+# Extract to Python environment
+unzip -q -o nltk_data.zip -d .venv/
+# Clean up
+rm nltk_data.zip
+```
+**Size**: ~10 MB
+**Time**: < 1 minute
+#### Open JTalk Dictionary (Required for Japanese)
+Open JTalk is required for Japanese text-to-speech processing.
+```bash
+# Get pyopenjtalk installation path
+PYOPENJTALK_PATH=$(python -c "import os, pyopenjtalk; print(os.path.dirname(pyopenjtalk.__file__))")
+# Download from ModelScope
+wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/open_jtalk_dic_utf_8-1.11.tar.gz
+# Extract to pyopenjtalk directory
+tar -xzf open_jtalk_dic_utf_8-1.11.tar.gz -C "$PYOPENJTALK_PATH"
+# Clean up
+rm open_jtalk_dic_utf_8-1.11.tar.gz
+```
+**Size**: ~50 MB
+**Time**: < 2 minutes
+### 3.4 Frontend Setup
+The frontend is an Electron application built with Vue.js.
+```bash
+# Navigate to frontend directory
+cd tts-voice-app
+# Install Node.js dependencies
+npm install
+```
+**Time**: 2-5 minutes
+**Note**: This installs all required Node.js packages including Electron, Vue, and UI components.
+---
+## Configuration
+### 4.1 Backend API Configuration
+The backend uses environment variables for configuration. Create a `.env` file in the project root for custom settings.
+**Create `.env` file** (optional, defaults work for local development):
+```bash
+# Deployment Mode
+# Options: local, server
+DEPLOYMENT_MODE=local
+# API Server Settings
+API_HOST=0.0.0.0
+API_PORT=8000
+# Data Storage Paths
+DATA_DIR=~/.moyoyo-tts/data
+SQLITE_PATH=~/.moyoyo-tts/data/tasks.db
+# Training Settings
+LOCAL_MAX_WORKERS=1  # Number of concurrent training tasks
+```
+**Configuration Options**:
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `DEPLOYMENT_MODE` | `local` | Deployment environment (local/server) |
+| `API_HOST` | `0.0.0.0` | API server bind address |
+| `API_PORT` | `8000` | API server port |
+| `DATA_DIR` | `~/.moyoyo-tts/data` | Directory for data storage |
+| `SQLITE_PATH` | `~/.moyoyo-tts/data/tasks.db` | SQLite database path |
+| `LOCAL_MAX_WORKERS` | `1` | Max concurrent training tasks |
+**Notes**:
+- `API_HOST=0.0.0.0` allows connections from any network interface
+- `LOCAL_MAX_WORKERS=1` prevents memory issues on systems with limited RAM
+- Increase `LOCAL_MAX_WORKERS` on high-end systems to train multiple voices simultaneously
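The defaults in the table above can be mirrored in a small settings loader. This is only a sketch of how the documented variables might be read: `DEFAULTS` and `load_settings` are hypothetical names, not part of the MoYoYo.tts codebase, and the sketch assumes every setting arrives as a plain environment variable.

```python
import os
from pathlib import Path

# Defaults copied from the configuration table in this section.
DEFAULTS = {
    "DEPLOYMENT_MODE": "local",
    "API_HOST": "0.0.0.0",
    "API_PORT": "8000",
    "DATA_DIR": "~/.moyoyo-tts/data",
    "SQLITE_PATH": "~/.moyoyo-tts/data/tasks.db",
    "LOCAL_MAX_WORKERS": "1",
}


def load_settings(env=None):
    """Merge environment overrides onto the documented defaults (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    if raw["DEPLOYMENT_MODE"] not in ("local", "server"):
        raise ValueError(f"unsupported DEPLOYMENT_MODE: {raw['DEPLOYMENT_MODE']!r}")
    return {
        "deployment_mode": raw["DEPLOYMENT_MODE"],
        "api_host": raw["API_HOST"],
        "api_port": int(raw["API_PORT"]),
        # Expand "~" so the paths can be used directly on disk.
        "data_dir": Path(os.path.expanduser(raw["DATA_DIR"])),
        "sqlite_path": Path(os.path.expanduser(raw["SQLITE_PATH"])),
        "local_max_workers": int(raw["LOCAL_MAX_WORKERS"]),
    }
```

Calling `load_settings({"API_PORT": "8001"})` overrides only the port and leaves the other values at their documented defaults.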
+### 4.2 Frontend Configuration
+The frontend requires minimal configuration for local development.
+**Default Settings**:
+- **API Endpoint**: `http://localhost:8000`
+- **Voice Storage**: `~/.moyoyo-tts/voices/`
+- **Model Storage**: `GPT_SoVITS/pretrained_models/`
+**Auto-Configuration**:
+The Electron app will:
+1. Automatically detect and connect to the local API server
+2. Create required directories on first launch
+3. Download missing models via the Model Setup page
+No manual configuration is needed for standard usage.
+---
+## Running the Application
+### 5.1 Start Backend API Server
+**Step 1: Activate Python Environment**
+```bash
+# Navigate to project directory
+cd GPT-SoVITS
+# Activate virtual environment
+source .venv/bin/activate      # macOS/Linux
+# .venv\Scripts\activate       # Windows: run this line instead of the one above
+```
+**Step 2: Start the API Server**
+Method 1 - Using the main script:
+```bash
+cd api_server
+python app/main.py
+```
+Method 2 - Using uvicorn directly (run from the `api_server` directory so that `app.main` can be imported):
+```bash
+uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
+```
+**Expected Output**:
+```
+INFO:     Started server process [12345]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
+```
+**API Documentation**:
+Once the server is running, access interactive API documentation:
+- **Swagger UI**: http://localhost:8000/docs
+- **ReDoc**: http://localhost:8000/redoc
+- **OpenAPI JSON**: http://localhost:8000/openapi.json
+**Health Check**:
+```bash
+curl http://localhost:8000/health
+# Expected: {"status": "healthy"}
+```
+### 5.2 Start Frontend Electron App
+**Step 1: Open New Terminal**
+Keep the backend server running and open a new terminal window.
+**Step 2: Navigate to Frontend Directory**
+```bash
+cd tts-voice-app
+```
+**Step 3: Start Development Mode**
+```bash
+npm run dev
+```
+**Expected Output**:
+```
+> tts-voice-app@1.0.0 dev
+> electron-vite dev
+  VITE v4.x.x  ready in xxx ms
+  ➜  Local:   http://localhost:5173/
+  ➜  Network: use --host to expose
+Electron app starting...
+```
+The Electron application will launch automatically with hot-reload enabled for development.
+**Features in Development Mode**:
+- Hot module replacement (HMR) for instant UI updates
+- Vue DevTools integration
+- Console logging for debugging
+- Automatic restart on main process changes
+---
+## Usage Guide
+### 6.1 First-Time Setup
+When you first launch the Electron app, you'll need to download required models.
+**Setup Process**:
+1. **Launch the Electron App**
+   ```bash
+   cd tts-voice-app
+   npm run dev
+   ```
+2. **Model Setup Page**
+   - The app automatically detects missing models
+   - You'll be redirected to the Model Setup page
+3. **Download Models**
+   - Click "Download All Models" button
+   - Models to be downloaded:
+     - **Pretrained Models**: 4.56 GB
+     - **G2PW Model**: 588.86 MB
+     - **FunASR**: 1.09 GB
+     - **Faster Whisper**: 2.85 GB
+   - Total download size: ~9 GB
+4. **Monitor Progress**
+   - Real-time progress bars show download status
+   - Estimated time: 10-30 minutes (depends on connection)
+   - Downloads can be paused and resumed
+5. **Setup Complete**
+   - Once all models are downloaded, click "Continue"
+   - You'll be redirected to the main TTS page
+   - The app is now ready to use
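As a quick check of the download total in step 3, the listed sizes can be summed directly. The sketch assumes decimal units (1 GB = 1000 MB); the names are illustrative only.

```python
# Sizes as listed on the Model Setup page, converted to GB (1 GB = 1000 MB assumed).
MODEL_SIZES_GB = {
    "Pretrained Models": 4.56,
    "G2PW Model": 588.86 / 1000,
    "FunASR": 1.09,
    "Faster Whisper": 2.85,
}

total_gb = sum(MODEL_SIZES_GB.values())
print(f"Total download: {total_gb:.2f} GB")
```

The sum comes to roughly 9.09 GB, which matches the "~9 GB" figure and explains the advice to keep about 10 GB of disk space free.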
+**Troubleshooting**:
+- If downloads fail, check your internet connection
+- Verify you have ~10 GB free disk space
+- For manual installation, see section 3.3
+### 6.2 Quick Mode - Voice Cloning for Beginners
+Quick Mode provides a simplified workflow for users who want to create a voice clone quickly without technical knowledge.
+#### Using the API
+**Step 1: Upload Audio File**
+```bash
+curl -X POST http://localhost:8000/api/v1/files \
+  -F "file=@path/to/voice_sample.wav" \
+  -F "purpose=training"
+```
+**Response**:
+```json
+{
+  "file_id": "550e8400-e29b-41d4-a716-446655440000",
+  "filename": "voice_sample.wav",
+  "size": 1234567,
+  "purpose": "training"
+}
+```
+**Step 2: Create Training Task**
+```bash
+curl -X POST http://localhost:8000/api/v1/tasks \
+  -H "Content-Type: application/json" \
+  -d '{
+    "exp_name": "my_voice",
+    "audio_file_id": "550e8400-e29b-41d4-a716-446655440000",
+    "options": {
+      "version": "v2",
+      "language": "zh",
+      "quality": "standard"
+    }
+  }'
+```
+**Response**:
+```json
+{
+  "id": "task-uuid-here",
+  "status": "queued",
+  "exp_name": "my_voice",
+  "created_at": "2026-01-23T10:30:00Z"
+}
+```
+**Step 3: Monitor Progress**
+Using Server-Sent Events (SSE):
+```bash
+curl -N http://localhost:8000/api/v1/tasks/task-uuid-here/progress
+```
+**Progress Events**:
+```
+event: progress
+data: {"stage": "audio_slice", "progress": 25, "message": "Slicing audio..."}
+event: progress
+data: {"stage": "sovits_train", "progress": 50, "message": "Training SoVITS model..."}
+event: complete
+data: {"status": "completed", "voice_id": "voice-uuid-here"}
+```
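A client has to turn this event stream into structured updates. The sketch below parses the sample events shown above; `parse_sse` is a hypothetical helper, and it assumes each event carries one `event:` line followed by one single-line JSON `data:` line, as in the sample. A complete SSE client would also handle blank-line delimiters, multi-line data, and reconnection.

```python
import json


def parse_sse(stream_text):
    """Parse Server-Sent Events text into a list of (event, data) tuples.

    Minimal sketch: handles only `event:` and single-line JSON `data:` fields.
    """
    events = []
    event_name = "message"  # SSE default when no `event:` field is sent
    for line in stream_text.splitlines():
        line = line.strip()
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            events.append((event_name, payload))
            event_name = "message"  # reset for the next event
    return events


sample = """\
event: progress
data: {"stage": "audio_slice", "progress": 25, "message": "Slicing audio..."}
event: progress
data: {"stage": "sovits_train", "progress": 50, "message": "Training SoVITS model..."}
event: complete
data: {"status": "completed", "voice_id": "voice-uuid-here"}
"""

for name, data in parse_sse(sample):
    print(name, data.get("progress", data.get("status")))
```

In a real client the same function could be fed text read incrementally from the `/progress` endpoint; the `complete` event is the signal to stop listening and pick up the `voice_id`.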
+#### Quality Presets
+| Preset | SoVITS Epochs | GPT Epochs | Est. Time | Quality |
+|--------|---------------|------------|-----------|---------|
+| **fast** | 4 | 8 | ~10 min | Good for testing |
+| **standard** | 8 | 15 | ~20 min | Balanced quality/speed |
+| **high** | 16 | 30 | ~40 min | Best quality |
+**Recommendations**:
+- Use `fast` for quick tests and previews
+- Use `standard` for most production use cases
+- Use `high` for professional applications requiring maximum quality
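The preset table and the request body from Step 2 can be combined in a small helper that validates input before anything is sent to the API. `QUALITY_PRESETS` and `build_task_payload` are hypothetical names used for illustration; the epoch counts are copied from the table, and the language codes assume the `zh`, `en`, and `ja` values used elsewhere in this manual.

```python
# Epoch counts copied from the Quality Presets table.
QUALITY_PRESETS = {
    "fast": {"sovits_epochs": 4, "gpt_epochs": 8},
    "standard": {"sovits_epochs": 8, "gpt_epochs": 15},
    "high": {"sovits_epochs": 16, "gpt_epochs": 30},
}

SUPPORTED_LANGUAGES = ("zh", "en", "ja")


def build_task_payload(exp_name, audio_file_id, language="zh", quality="standard"):
    """Build the JSON body for POST /api/v1/tasks (hypothetical helper)."""
    if quality not in QUALITY_PRESETS:
        raise ValueError(f"quality must be one of {sorted(QUALITY_PRESETS)}")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"language must be one of {SUPPORTED_LANGUAGES}")
    return {
        "exp_name": exp_name,
        "audio_file_id": audio_file_id,
        "options": {"version": "v2", "language": language, "quality": quality},
    }
```

The returned dictionary can be serialized with `json.dumps` and posted with any HTTP client; rejecting an unknown preset locally avoids queuing a task that the server would refuse.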
+#### Using the UI
+**Step 1: Navigate to Voice Clone Page**
+- Click "Voice Clone" in the sidebar
+- Or use keyboard shortcut: `Ctrl/Cmd + N`
+**Step 2: Upload Audio Sample**
+- Click "Upload Audio" button
+- Select a WAV or MP3 file
+- **Requirements**:
+  - Duration: 5-30 seconds recommended
+  - Quality: Clear voice, minimal background noise
+  - Content: Natural speech, not singing or shouting
+**Step 3: Configure Training**
+- **Voice Name**: Enter a unique name (e.g., "John's Voice")
+- **Language**: Select primary language (Chinese, English, Japanese)
+- **Quality Preset**: Choose from fast/standard/high
+**Step 4: Start Training**
+- Click "Start Training" button
+- The task will be queued and processing will begin
+**Step 5: Monitor Progress**
+- Progress bar shows overall completion
+- Current stage displayed (e.g., "Training SoVITS model...")
+- Estimated time remaining shown
+- You can navigate away and check back later
+**Step 6: Training Complete**
+- You'll receive a notification when complete
+- The voice automatically appears in Voice Library
+- You can immediately use it for TTS generation
+**Tips for Best Results**:
+- Use high-quality audio (preferably 48kHz WAV)
+- Ensure consistent tone and speaking style
+- Avoid audio with music or sound effects
+- 10-15 seconds is the sweet spot for sample length
+- Multiple short samples can be combined
+### 6.3 Advanced Mode - Expert Voice Cloning
+Advanced Mode provides granular control over each stage of the voice training pipeline. This is recommended for users who want to fine-tune training parameters.
+#### Training Pipeline Stages
+The complete training pipeline consists of 7 stages:
+1. **Audio Slice**: Split audio into segments
+2. **ASR** (Automatic Speech Recognition): Transcribe audio to text
+3. **Text Feature**: Extract text embeddings
+4. **Hubert Feature**: Extract audio features
+5. **Semantic Token**: Generate semantic tokens
+6. **SoVITS Train**: Train voice synthesis model
+7. **GPT Train**: Train text-to-semantic model
+#### Stage Dependencies
+```
+audio_slice → asr → text_feature → sovits_train
+            ↘                    ↗
+              hubert_feature → semantic_token → gpt_train
+```
+**Important**: Each stage must wait for its dependencies to complete.
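The dependency diagram can be expressed as a graph and ordered with the standard library. The mapping below is one reading of the diagram, not a confirmed specification: in particular, the upward arrow is assumed to mean that `sovits_train` also waits for `hubert_feature`. `STAGE_DEPENDENCIES` and `runnable_stages` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it waits for (one reading of the diagram above).
STAGE_DEPENDENCIES = {
    "audio_slice": set(),
    "asr": {"audio_slice"},
    "text_feature": {"asr"},
    "hubert_feature": {"audio_slice"},
    "semantic_token": {"hubert_feature"},
    "sovits_train": {"text_feature", "hubert_feature"},
    "gpt_train": {"semantic_token"},
}


def runnable_stages(completed):
    """Return pending stages whose dependencies have all completed, sorted by name."""
    done = set(completed)
    return sorted(
        stage
        for stage, deps in STAGE_DEPENDENCIES.items()
        if stage not in done and deps <= done
    )


# One valid order in which all seven stages could be executed.
execution_order = list(TopologicalSorter(STAGE_DEPENDENCIES).static_order())
print(execution_order)
```

`runnable_stages(["audio_slice"])` lists the stages that may start once slicing has finished, which is the check a client would make before calling the stage endpoints one by one.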
+#### Using the API
+**Step 1: Create Experiment**
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments \
+  -H "Content-Type: application/json" \
+  -d '{
+    "exp_name": "my_custom_voice",
+    "version": "v2",
+    "audio_file_id": "file-uuid-here"
+  }'
+```
+**Response**:
+```json
+{
+  "id": "exp-uuid-here",
+  "exp_name": "my_custom_voice",
+  "version": "v2",
+  "stages": {
+    "audio_slice": {"status": "pending"},
+    "asr": {"status": "pending"},
+    "text_feature": {"status": "pending"},
+    "hubert_feature": {"status": "pending"},
+    "semantic_token": {"status": "pending"},
+    "sovits_train": {"status": "pending"},
+    "gpt_train": {"status": "pending"}
+  }
+}
+```
+**Step 2: Execute Stages Individually**
+**Stage 1 - Audio Slice**:
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/audio_slice \
+  -H "Content-Type: application/json" \
+  -d '{
+    "threshold": -34,
+    "min_length": 4000,
+    "min_interval": 300,
+    "hop_size": 10,
+    "max_silence_kept": 500
+  }'
+```
+**Parameters**:
+- `threshold`: dB threshold for silence detection (-60 to 0, default: -34)
+- `min_length`: Minimum segment length in ms (1000-10000, default: 4000)
+- `min_interval`: Minimum silence interval in ms (0-3000, default: 300)
+- `hop_size`: Analysis window hop size in ms (default: 10)
+- `max_silence_kept`: Maximum silence to keep in ms (default: 500)
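The documented ranges can be checked on the client before the request is sent. The sketch below is illustrative: `AUDIO_SLICE_SPEC` and `audio_slice_params` are hypothetical names, the bounds and defaults are copied from the parameter list above, and `None` marks a bound the manual does not state.

```python
# (min, max, default) per parameter; None means the manual states no bound.
AUDIO_SLICE_SPEC = {
    "threshold": (-60, 0, -34),
    "min_length": (1000, 10000, 4000),
    "min_interval": (0, 3000, 300),
    "hop_size": (None, None, 10),
    "max_silence_kept": (None, None, 500),
}


def audio_slice_params(**overrides):
    """Return a full parameter dict, applying defaults and range checks."""
    unknown = set(overrides) - set(AUDIO_SLICE_SPEC)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {}
    for name, (low, high, default) in AUDIO_SLICE_SPEC.items():
        value = overrides.get(name, default)
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        if too_low or too_high:
            raise ValueError(f"{name}={value} is outside the range {low}..{high}")
        params[name] = value
    return params
```

With no arguments the helper reproduces the request body shown for Stage 1; `audio_slice_params(threshold=-40)` changes only the silence threshold.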
+**Stage 2 - ASR**:
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/asr \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "达摩 ASR (中文)",
+    "language": "zh"
+  }'
+```
+**ASR Models**:
+- `达摩 ASR (中文)`: DamoASR for Chinese (best for Chinese)
+- `Faster Whisper (多语言)`: Faster Whisper for multilingual
+**Stage 3 - Text Feature**:
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/text_feature \
+  -H "Content-Type: application/json" \
+  -d '{
+    "language": "zh"
+  }'
+```
+**Stage 4 - Hubert Feature**:
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/hubert_feature \
+  -H "Content-Type: application/json" \
+  -d '{}'
+```
+**Stage 5 - Semantic Token**:
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/semantic_token \
+  -H "Content-Type: application/json" \
+  -d '{}'
+```
+**Stage 6 - SoVITS Train**:
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train \
+  -H "Content-Type: application/json" \
+  -d '{
+    "total_epoch": 8,
+    "batch_size": 4,
+    "save_every_epoch": 4,
+    "text_low_lr_rate": 0.4,
+    "if_save_latest": true,
+    "if_save_every_weights": true,
+    "version": "v2"
+  }'
+```
+**Parameters**:
+- `total_epoch`: Total training epochs (4-32, default: 8)
+- `batch_size`: Batch size (1-40, default: 4)
+- `save_every_epoch`: Save checkpoint every N epochs (1-50, default: 4)
+- `text_low_lr_rate`: Text encoder learning rate multiplier (0.2-1.0, default: 0.4)
+**Stage 7 - GPT Train**:
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/gpt_train \
+  -H "Content-Type: application/json" \
+  -d '{
+    "total_epoch": 15,
+    "batch_size": 4,
+    "save_every_epoch": 5,
+    "if_save_latest": true,
+    "if_save_every_weights": true,
+    "version": "v2"
+  }'
+```
+**Step 3: Monitor Stage Progress**
+Each stage provides real-time progress via SSE:
+```bash
+curl -N http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train/progress
+```
+**Progress Events**:
+```
+event: progress
+data: {"epoch": 2, "total_epochs": 8, "progress": 25, "loss": 0.234}
+event: progress
+data: {"epoch": 4, "total_epochs": 8, "progress": 50, "loss": 0.189}
+event: complete
+data: {"status": "completed", "final_loss": 0.142}
+```
+#### Using the UI
+**Step 1: Create New Experiment**
+- Navigate to "Advanced Mode" page
+- Click "New Experiment"
+- Enter experiment name and upload audio
+**Step 2: Configure Each Stage**
+- Click on a stage card to expand settings
+- Adjust parameters (or use preset defaults)
+- Click "Run Stage" to execute
+**Step 3: Monitor Pipeline**
+- Visual pipeline diagram shows stage status
+- Green: Completed, Blue: Running, Gray: Pending
+- Click any stage to view detailed logs
+**Step 4: Iterate and Refine**
+- Review results after each stage
+- Adjust parameters and re-run if needed
+- Export final model when satisfied
+**Advanced Tips**:
+- Use lower `batch_size` (2-4) on GPUs with limited memory
+- Increase `total_epoch` for better quality with sufficient data
+- Save checkpoints frequently (`save_every_epoch`) to recover from interruptions
+- Monitor loss values - should decrease over epochs
+### 6.4 Text-to-Speech Generation
+Once you have trained a voice, you can use it to generate speech from text.
+#### Using the API
+**Basic TTS Request**:
+```bash
+curl -X POST http://localhost:8000/api/v1/inference/tts \
+  -H "Content-Type: application/json" \
+  -d '{
+    "text": "Hello, this is a test of text-to-speech synthesis.",
+    "voice_id": "voice-uuid-here",
+    "speed": 1.0,
+    "emotion": "auto"
+  }'
+```
+**Response**:
+```json
+{
+  "audio_url": "http://localhost:8000/api/v1/files/audio-uuid-here",
+  "duration": 3.2,
+  "format": "wav"
+}
+```
+**Parameters**:
+- `text` (required): Text to synthesize (max 5000 characters)
+- `voice_id` (required): UUID of trained voice
+- `speed` (optional): Speaking speed multiplier (0.5 - 2.0, default: 1.0)
+- `emotion` (optional): Emotion style (auto, neutral, happy, sad)
+- `seed` (optional): Random seed for reproducibility
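The parameter limits above translate into a few client-side checks. The following sketch builds a request body for the TTS endpoint; `build_tts_request` is a hypothetical helper, and the limits (5000 characters, speed 0.5 to 2.0, four emotion values) are taken from the parameter list.

```python
ALLOWED_EMOTIONS = ("auto", "neutral", "happy", "sad")
MAX_TEXT_LENGTH = 5000


def build_tts_request(text, voice_id, speed=1.0, emotion="auto", seed=None):
    """Validate TTS parameters and return the JSON body as a dict."""
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if emotion not in ALLOWED_EMOTIONS:
        raise ValueError(f"emotion must be one of {ALLOWED_EMOTIONS}")
    body = {"text": text, "voice_id": voice_id, "speed": speed, "emotion": emotion}
    if seed is not None:
        body["seed"] = seed  # only sent when reproducible output is wanted
    return body
```

Passing the same `seed` on repeated calls is what the `seed` parameter is documented for; omitting it leaves the field out of the body entirely.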
+**Download Generated Audio**:
+```bash
+curl -o output.wav http://localhost:8000/api/v1/files/audio-uuid-here
+```
+#### Using the UI
+**Step 1: Navigate to TTS Page**
+- Click "Text to Speech" in sidebar
+- Or use keyboard shortcut: `Ctrl/Cmd + T`
+**Step 2: Select Voice**
+- Open voice dropdown
+- Select a trained voice from the list
+- Preview button lets you hear a sample
+**Step 3: Enter Text**
+- Type or paste text into the text area
+- Character count shown (max 5000)
+- Supports multi-line text
+**Step 4: Adjust Settings**
+- **Speed**: Drag slider or enter value (0.5x - 2.0x)
+  - 0.5x: Very slow, clear enunciation
+  - 1.0x: Natural speaking pace
+  - 1.5x: Fast, still intelligible
+  - 2.0x: Very fast
+- **Emotion**: Select from dropdown (if supported by model)
+  - Auto: Infer from text
+  - Neutral: Flat, factual delivery
+  - Happy: Upbeat, positive tone
+  - Sad: Somber, melancholic tone
+**Step 5: Generate**
+- Click "Generate" button
+- Processing takes 2-5 seconds
+- Progress indicator shown
+**Step 6: Listen and Download**
+- Audio player appears automatically
+- Click play button to listen
+- Click download button to save WAV file
+- Share button to copy shareable link
+**Text Guidelines**:
+- Use proper punctuation for natural pauses
+- Break long text into sentences
+- Use quotation marks for dialogue
+- All-caps for emphasis (use sparingly)
+**Tips for Natural Speech**:
+- Add commas for breath pauses
+- Use ellipsis (...) for trailing off
+- Question marks affect intonation
+- Exclamation points add emphasis
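Text longer than the 5000-character limit has to be split before it is submitted. The sketch below groups whole sentences into chunks that stay under the limit. `split_text` is a hypothetical helper; it assumes sentences end with punctuation followed by whitespace, so text without spaces between sentences (common in Chinese and Japanese) would need a different splitting rule. Chunks are rejoined with single spaces, so original line breaks are not preserved.

```python
import re


def split_text(text, max_chars=5000):
    """Group sentences into chunks no longer than max_chars."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    # An ellipsis ("...") stays intact because there is no whitespace inside it.
    sentences = re.split(r"(?<=[.!?。！？])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) > max_chars:
            raise ValueError("a single sentence exceeds max_chars")
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own TTS request and the resulting audio files played or concatenated in order.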
+### 6.5 Voice Library Management
+The Voice Library is where all your trained voices are stored and managed.
+#### Using the API
+**List All Voices**:
+```bash
+curl "http://localhost:8000/api/v1/files?purpose=training"
+```
+**Response**:
+```json
+{
+  "files": [
+    {
+      "id": "voice-uuid-1",
+      "filename": "john_voice",
+      "created_at": "2026-01-20T10:30:00Z",
+      "size": 1234567,
+      "metadata": {
+        "language": "zh",
+        "quality": "standard",
+        "duration": 12.5
+      }
+    },
+    {
+      "id": "voice-uuid-2",
+      "filename": "mary_voice",
+      "created_at": "2026-01-21T14:20:00Z",
+      "size": 2345678,
+      "metadata": {
+        "language": "en",
+        "quality": "high",
+        "duration": 18.3
+      }
+    }
+  ]
+}
+```
+**Get Voice Details**:
+```bash
+curl http://localhost:8000/api/v1/files/voice-uuid-1
+```
+**Delete Voice**:
+```bash
+curl -X DELETE http://localhost:8000/api/v1/files/voice-uuid-1
+```
+**Export Voice Model**:
+```bash
+curl -o voice_model.zip http://localhost:8000/api/v1/voices/voice-uuid-1/export
+```
+#### Using the UI
+**Browse Voice Library**:
+- Navigate to "Voice Library" page
+- Voices displayed as cards with:
+  - Voice name
+  - Language and quality badges
+  - Creation date
+  - Sample duration
+  - Preview waveform
+**Voice Card Actions**:
+- **Play**: Listen to voice sample
+- **Edit**: Rename or update metadata
+- **Export**: Download voice model files
+- **Delete**: Remove voice (with confirmation)
+**Search and Filter**:
+- Search bar: Filter by voice name
+- Language filter: Show only specific languages
+- Quality filter: Show only specific quality presets
+- Sort options:
+  - Name (A-Z)
+  - Date created (newest first)
+  - Date created (oldest first)
+  - File size
+**Bulk Operations**:
+- Select multiple voices (Shift+Click)
+- Export selected voices as ZIP
+- Delete selected voices
+- Tag selected voices
+**Voice Details Panel**:
+Click on any voice card to view:
+- Full training parameters
+- Training history and logs
+- Model file sizes
+- Sample audio clips
+- Export and sharing options
+**Organization Tips**:
+- Use descriptive names (e.g., "John_Professional", "Mary_Casual")
+- Tag voices by project or use case
+- Export important voices as backups
+- Delete test voices to save space
+---
+## API Reference
+### Quick Mode Endpoints
+#### Tasks
+**Create Task** - Start a one-click voice training task
+```http
+POST /api/v1/tasks
+Content-Type: application/json
+{
+  "exp_name": "string",
+  "audio_file_id": "uuid",
+  "options": {
+    "version": "v2",
+    "language": "zh|en|ja",
+    "quality": "fast|standard|high"
+  }
+}
+```
+**List Tasks** - Get all tasks
+```http
+GET /api/v1/tasks?status=queued|running|completed|failed
+```
+**Get Task** - Get specific task details
+```http
+GET /api/v1/tasks/{task_id}
+```
+**Cancel Task** - Cancel a running task
+```http
+DELETE /api/v1/tasks/{task_id}
+```
+**Task Progress** - Real-time progress via SSE
+```http
+GET /api/v1/tasks/{task_id}/progress
+Accept: text/event-stream
+```
+### Advanced Mode Endpoints
+#### Experiments
+**Create Experiment** - Initialize a new training experiment
+```http
+POST /api/v1/experiments
+Content-Type: application/json
+{
+  "exp_name": "string",
+  "version": "v2",
+  "audio_file_id": "uuid"
+}
+```
+**Get Experiment** - Get experiment details
+```http
+GET /api/v1/experiments/{exp_id}
+```
+**List Experiments** - Get all experiments
+```http
+GET /api/v1/experiments?status=pending|running|completed
+```
+**Delete Experiment** - Delete experiment and all data
+```http
+DELETE /api/v1/experiments/{exp_id}
+```
+#### Stages
+**Execute Stage** - Run a specific pipeline stage
+```http
+POST /api/v1/experiments/{exp_id}/stages/{stage_type}
+Content-Type: application/json
+{
+  // Stage-specific parameters
+}
+```
+**Stage Types**:
+- `audio_slice`
+- `asr`
+- `text_feature`
+- `hubert_feature`
+- `semantic_token`
+- `sovits_train`
+- `gpt_train`
+**Get Stage Status** - Get status of a specific stage
+```http
+GET /api/v1/experiments/{exp_id}/stages/{stage_type}
+```
+**Get All Stage Statuses** - Get status of all stages
+```http
+GET /api/v1/experiments/{exp_id}/stages
+```
+**Stage Progress** - Real-time stage progress via SSE
+```http
+GET /api/v1/experiments/{exp_id}/stages/{stage_type}/progress
+Accept: text/event-stream
+```
+**Get Stage Schema** - Get parameters schema for a stage
+```http
+GET /api/v1/stages/{stage_type}/schema
+```
+### Common Endpoints
+#### Files
+**Upload File** - Upload audio or data file
+```http
+POST /api/v1/files
+Content-Type: multipart/form-data
+file: binary
+purpose: training|inference
+```
+**List Files** - Get all uploaded files
+```http
+GET /api/v1/files?purpose=training|inference
+```
+**Get File** - Download a specific file
+```http
+GET /api/v1/files/{file_id}
+```
+**Delete File** - Delete a file
+```http
+DELETE /api/v1/files/{file_id}
+```
+#### Inference
+**Text-to-Speech** - Generate speech from text
+```http
+POST /api/v1/inference/tts
+Content-Type: application/json
+{
+  "text": "string",
+  "voice_id": "uuid",
+  "speed": 1.0,
+  "emotion": "auto|neutral|happy|sad",
+  "seed": 42
+}
+```
+**Get Voice Info** - Get voice model information
+```http
+GET /api/v1/voices/{voice_id}
+```
+#### Configuration
+**Get Stage Presets** - Get preset configurations for stages
+```http
+GET /api/v1/stages/presets
+```
+**Health Check** - Check API server health
+```http
+GET /health
+```
+**Full OpenAPI specification available at**: http://localhost:8000/openapi.json
+---
+## Troubleshooting
+### Backend Issues
+#### Port Already in Use
+**Symptom**: Error message `Address already in use` when starting server.
+**Solution 1** - Change port in `.env`:
+```bash
+echo "API_PORT=8001" >> .env
+python app/main.py
+```
+**Solution 2** - Find and kill process using port:
+```bash
+# macOS/Linux
+lsof -ti:8000 | xargs kill -9
+# Windows
+netstat -ano | findstr :8000
+taskkill /PID <pid> /F
+```
+#### Database Errors
+**Symptom**: `sqlite3.OperationalError` or database corruption messages.
+**Solution** - Reset database:
+```bash
+# Backup existing database (optional)
+cp ~/.moyoyo-tts/data/tasks.db ~/.moyoyo-tts/data/tasks.db.backup
+# Remove corrupted database
+rm ~/.moyoyo-tts/data/tasks.db
+# Restart API server (database will be recreated)
+python app/main.py
+```
+#### Training Fails Immediately
+**Symptom**: Training starts but fails within seconds.
+**Diagnosis**:
+```bash
+# Check GPU availability
+python -c "import torch; print(torch.cuda.is_available())"
+# Check CUDA version
+python -c "import torch; print(torch.version.cuda)"
+# Check disk space
+df -h
+```
+**Solutions**:
+1. **No GPU**: System will use CPU (slower but works)
+2. **CUDA mismatch**: Reinstall PyTorch with correct CUDA version
+3. **Out of disk space**: Free up at least 10GB
+4. **Out of memory**: Reduce `batch_size` in training parameters
+#### Python Environment Issues
+**Symptom**: `ModuleNotFoundError` or import errors.
+**Solution**:
+```bash
+# Verify environment is activated
+which python  # Should show path in .venv
+# Reinstall all dependencies
+uv sync --reinstall
+# Or force reinstall from scratch
+rm -rf .venv
+uv sync
+# Check for missing packages
+uv pip list
+```
+### Frontend Issues
+#### Cannot Connect to API
+**Symptom**: Frontend shows "Cannot connect to server" error.
+**Diagnosis**:
+```bash
+# Check if backend is running
+curl http://localhost:8000/health
+# Check network connectivity
+ping localhost
+```
+**Solutions**:
+1. **Backend not running**: Start backend server (see section 5.1)
+2. **Wrong port**: Check backend is on port 8000
+3. **Firewall**: Allow connections to localhost:8000
+4. **CORS error**: Check CORS settings in backend `.env`
+#### Models Not Downloading
+**Symptom**: Model download fails or hangs indefinitely.
+**Solutions**:
+1. **Check internet connection**:
+   ```bash
+   curl -I https://www.modelscope.cn
+   ```
+2. **Check disk space**:
+   ```bash
+   df -h  # Need ~10GB free
+   ```
+3. **Manual download**: See section 3.3 for manual installation
+4. **Proxy issues**: Configure proxy settings:
+   ```bash
+   export http_proxy=http://proxy.example.com:8080
+   export https_proxy=http://proxy.example.com:8080
+   ```
+#### Electron App Won't Start
+**Symptom**: App crashes on launch or shows blank screen.
+**Solution 1** - Clear cache and rebuild:
+```bash
+# Navigate to frontend directory
+cd tts-voice-app
+# Clear cache
+rm -rf node_modules package-lock.json dist .vite
+# Reinstall dependencies
+npm install
+# Rebuild
+npm run dev
+```
+**Solution 2** - Check Node.js version:
+```bash
+node --version  # Should be >= 18.x
+# Update Node.js if needed
+nvm install 18
+nvm use 18
+```
+**Solution 3** - Check Electron logs:
+```bash
+# macOS
+ls ~/Library/Logs/tts-voice-app/
+# Linux
+ls ~/.config/tts-voice-app/logs/
+# Windows (Command Prompt)
+dir %APPDATA%\tts-voice-app\logs\
+```
+### Common Errors
+#### "PYTHONPATH not set" Error
+**Symptom**: Import errors related to `GPT_SoVITS` module.
+**Cause**: The API server needs to find the main project directory.
+**Solution**: The API automatically sets `PYTHONPATH`, but verify:
+```bash
+# Check project structure
+ls GPT-SoVITS/  # Should contain *.py files
+# Set manually if needed (replace with the absolute path to your checkout)
+export PYTHONPATH=/path/to/GPT-SoVITS:$PYTHONPATH
+```
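To confirm that the import path is the actual problem, Python can be asked whether it can locate the module without importing it. This is a sketch under the assumption that the package is importable under the top-level name `GPT_SoVITS`; the helper `module_is_importable` is illustrative.

```python
import importlib.util
import sys


def module_is_importable(name: str) -> bool:
    """Return True if the top-level module `name` can be found on the import path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("GPT_SoVITS importable:", module_is_importable("GPT_SoVITS"))
    # Print the search path so a missing project directory is easy to spot.
    for entry in sys.path:
        print("  ", entry)
```

If the result is `False` and the project directory is absent from the printed list, setting `PYTHONPATH` as shown above should resolve it.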
+#### "Model not found" Error
+**Symptom**: Training fails with "Cannot find pretrained model" message.
+**Diagnosis**:
+```bash
+# Check if models exist
+ls GPT_SoVITS/pretrained_models/
+# Should show: s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt, s2G488k.pth, s2D488k.pth
+```
+**Solution**: Download pretrained models (see section 3.3):
+```bash
+wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/pretrained_models.zip
+unzip -q -o pretrained_models.zip -d GPT_SoVITS
+```
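After unpacking the archive, the directory listing can be verified programmatically. The sketch below checks only the three file names quoted in the diagnosis step; the real archive may contain more files, and the helper name `missing_models` is hypothetical.

```python
from pathlib import Path

# File names taken from the diagnosis step above.
EXPECTED_MODELS = (
    "s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt",
    "s2G488k.pth",
    "s2D488k.pth",
)


def missing_models(model_dir: str = "GPT_SoVITS/pretrained_models") -> list[str]:
    """Return the expected model files that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_MODELS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    print("All expected models present" if not missing else f"Missing: {missing}")
```

Run it from the project root so the default relative path resolves correctly.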
+#### "Out of memory" Error
+**Symptom**: Training crashes with `CUDA out of memory` or `MemoryError`.
+**Solutions**:
+1. **Reduce batch size** (for example, from 4 to 2):
+   ```json
+   {
+     "batch_size": 2
+   }
+   ```
+2. **Close other applications**: Free up GPU/RAM
+3. **Use CPU mode**: Slower but uses system RAM instead of GPU:
+   ```bash
+   # Set environment variable
+   export CUDA_VISIBLE_DEVICES=""
+   python app/main.py
+   ```
+4. **Increase system swap** (Linux):
+   ```bash
+   sudo dd if=/dev/zero of=/swapfile bs=1G count=8
+   sudo chmod 600 /swapfile  # swap files must not be readable by other users
+   sudo mkswap /swapfile
+   sudo swapon /swapfile
+   ```
+#### "NLTK Data Not Found" Error
+**Symptom**: Text processing fails with NLTK data errors.
+**Solution**: Download NLTK data (see section 3.3):
+```bash
+wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip
+unzip -q -o nltk_data.zip -d .venv/
+```
+#### Audio Quality Issues
+**Symptom**: Generated audio sounds robotic, distorted, or unclear.
+**Solutions**:
+1. **Use better training data**:
+   - High-quality audio (48kHz WAV preferred)
+   - Clear voice, minimal background noise
+   - 10-15 seconds of audio
+   - Natural, conversational speech
+2. **Increase training quality** (use `high` instead of `standard`):
+   ```json
+   {
+     "quality": "high"
+   }
+   ```
+3. **Train longer** (for example, increase `total_epoch` from 8 to 16):
+   ```json
+   {
+     "total_epoch": 16
+   }
+   ```
+4. **Check reference audio**: Ensure uploaded audio is not corrupted
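A training sample can be checked against the guidance above (48 kHz WAV preferred, 10-15 seconds) before it is uploaded. This is a rough sketch using the standard-library `wave` module, which reads uncompressed PCM WAV files only; the function names and the exact thresholds are illustrative, not project requirements.

```python
import wave


def describe_wav(path: str) -> tuple[int, float]:
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate, duration


def sample_warnings(path: str) -> list[str]:
    """List mismatches against the guidance above (48 kHz, 10-15 seconds)."""
    rate, duration = describe_wav(path)
    warnings = []
    if rate < 48_000:
        warnings.append(f"sample rate is {rate} Hz; 48000 Hz is preferred")
    if not 10.0 <= duration <= 15.0:
        warnings.append(f"duration is {duration:.1f} s; 10-15 s is suggested")
    return warnings
```

A file that cannot be opened at all raises `wave.Error`, which is itself a useful hint that the reference audio is corrupted or not a PCM WAV. Background noise and speaking style still have to be judged by ear.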
+---
+## Development
+### Backend Development
+#### Running with Hot-Reload
+Hot-reload automatically restarts the server when code changes are detected:
+```bash
+# Using uvicorn (run from the api_server/ directory)
+uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
+# Restrict the watched directory (path is relative to the current directory)
+uvicorn app.main:app --reload --reload-dir app
+```
+#### Running Tests
+```bash
+# Navigate to project root
+cd GPT-SoVITS
+# Run all tests
+pytest api_server/tests/
+# Run specific test file
+pytest api_server/tests/test_tasks.py
+# Run with coverage report (requires the pytest-cov plugin)
+pytest --cov=api_server/app --cov-report=html
+# View coverage report (macOS; use xdg-open on Linux)
+open htmlcov/index.html
+```
+#### Code Formatting
+```bash
+# Format Python code with Black
+black api_server/
+# Sort imports with isort
+isort api_server/
+# Lint with flake8
+flake8 api_server/
+# Type checking with mypy
+mypy api_server/
+```
+#### Database Migrations
+```bash
+# Generate migration
+alembic revision --autogenerate -m "Add new column"
+# Apply migrations
+alembic upgrade head
+# Rollback migration
+alembic downgrade -1
+```
+#### Adding New Endpoints
+1. Create route in `api_server/app/routes/`
+2. Add business logic in `api_server/app/services/`
+3. Update models in `api_server/app/models/`
+4. Add tests in `api_server/tests/`
+5. Update OpenAPI documentation
+### Frontend Development
+#### Development Mode
+Development mode enables hot module replacement (HMR) for instant feedback:
+```bash
+# Start development server
+npm run dev
+# Start with custom port
+npm run dev -- --port 5174
+# Start with debug logging
+DEBUG=electron* npm run dev
+```
+#### Type Checking
+```bash
+# Run Vue type checking
+npm run type-check
+# Run TypeScript compiler check
+npx tsc --noEmit
+# Watch mode for continuous checking
+npm run type-check -- --watch
+```
+#### Building for Production
+**Development Build** (with source maps):
+```bash
+npm run build
+```
+**Production Build** (optimized):
+```bash
+npm run build:prod
+```
+**Preview Production Build**:
+```bash
+npm run preview
+```
+#### Building Distribution Packages
+Build platform-specific installers:
+**macOS**:
+```bash
+npm run build:mac
+# Output: tts-voice-app/release/MoYoYo-TTS-1.0.0.dmg
+```
+**Windows**:
+```bash
+npm run build:win
+# Output: tts-voice-app/release/MoYoYo-TTS-Setup-1.0.0.exe
+```
+**Linux**:
+```bash
+npm run build:linux
+# Output: tts-voice-app/release/moyoyo-tts-1.0.0.AppImage
+```
+**Build All Platforms** (requires platform-specific dependencies):
+```bash
+npm run build:all
+```
+**Build Configuration**:
+Edit `tts-voice-app/electron-builder.yml` to customize:
+- App name and ID
+- Icon files
+- File associations
+- Auto-update settings
+- Code signing
+#### Component Development
+**Create New Component**:
+```bash
+# Navigate to components directory
+cd tts-voice-app/src/components
+# Create component file
+touch MyComponent.vue
+```
+**Component Template**:
+```vue
+<template>
+  <div class="my-component">
+    <!-- Template here -->
+  </div>
+</template>
+<script setup lang="ts">
+import { ref } from 'vue'
+// Component logic here
+const myValue = ref('')
+</script>
+<style scoped>
+.my-component {
+  /* Styles here */
+}
+</style>
+```
+#### State Management
+The app uses the Vue Composition API with Pinia stores:
+```typescript
+// Create new store in src/stores/myStore.ts
+import { defineStore } from 'pinia'
+export const useMyStore = defineStore('myStore', {
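+  state: () => ({
+    // Annotate the element type; a bare [] is inferred as never[].
+    // string is a placeholder element type for this example.
+    items: [] as string[]
+  }),
+  getters: {
+    itemCount: (state) => state.items.length
+  },
+  actions: {
+    addItem(item: string) {
+      this.items.push(item)
+    }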
+  }
+})
+```
+#### Debugging
+**Vue DevTools**:
+- Automatically enabled in development mode
+- Access via browser DevTools panel
+**Electron DevTools**:
+```bash
+# Open DevTools on startup
+DEBUG_ELECTRON=true npm run dev
+```
+**Console Logging**:
+```typescript
+// Main process logs
+console.log('Main:', data)
+// Renderer process logs
+console.log('Renderer:', data)
+// Check logs in terminal and DevTools console
+```
+#### Testing
+```bash
+# Run unit tests
+npm run test
+# Run with coverage
+npm run test:coverage
+# Run E2E tests
+npm run test:e2e
+# Watch mode
+npm run test:watch
+```
+### Project Structure
+```
+GPT-SoVITS/
+├── api_server/              # Backend API
+│   ├── app/
+│   │   ├── main.py         # FastAPI application
+│   │   ├── routes/         # API endpoints
+│   │   ├── services/       # Business logic
+│   │   ├── models/         # Data models
+│   │   └── utils/          # Utilities
+│   └── tests/              # Backend tests
+├── tts-voice-app/          # Frontend Electron app
+│   ├── src/
+│   │   ├── main/           # Electron main process
+│   │   ├── renderer/       # Vue UI
+│   │   ├── components/     # Vue components
+│   │   └── stores/         # State management
+│   └── dist/               # Build output
+├── GPT_SoVITS/             # Core ML models
+│   ├── pretrained_models/  # Base models
+│   └── text/               # Text processing
+└── .env                    # Configuration
+```
+### Contribution Guidelines
+1. **Fork and clone the repository**
+2. **Create feature branch**: `git checkout -b feature/my-feature`
+3. **Make changes** and add tests
+4. **Run tests and linting**: `pytest && black . && isort .`
+5. **Commit changes**: `git commit -m "feat: add my feature"`
+6. **Push to branch**: `git push origin feature/my-feature`
+7. **Create Pull Request** with description
+**Commit Message Format**:
+- `feat:` New feature
+- `fix:` Bug fix
+- `docs:` Documentation changes
+- `style:` Code style changes
+- `refactor:` Code refactoring
+- `test:` Test changes
+- `chore:` Build/tooling changes
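The commit message format can be checked mechanically, for example from a local `commit-msg` hook. The sketch below is a strict interpretation and makes assumptions the list above does not state: the prefix is lowercase, followed by a colon, one space, and a non-empty description. The function name `is_valid_commit_subject` is hypothetical.

```python
import re

# Prefixes listed in the commit message format above.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
_PATTERN = re.compile(rf"^({'|'.join(COMMIT_TYPES)}): \S.*")


def is_valid_commit_subject(subject: str) -> bool:
    """True if the subject starts with a listed prefix, a colon, a space, and text."""
    return _PATTERN.match(subject) is not None


if __name__ == "__main__":
    for subject in ("feat: add my feature", "update stuff"):
        print(f"{subject!r}: {is_valid_commit_subject(subject)}")
```

Loosen the pattern if the project also accepts scopes such as `feat(api):`, which this sketch rejects.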
+---
+## Additional Resources
+### Documentation
+- **API Documentation**: http://localhost:8000/docs
+- **Design Document**: `frontend_design.md`
+- **Development Guide**: `development.md`
+- **OpenAPI Specification**: `openapi.json`
+### External Links
+- **GPT-SoVITS Repository**: https://github.com/RVC-Boss/GPT-SoVITS
+- **ModelScope Models**: https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained
+- **FastAPI Documentation**: https://fastapi.tiangolo.com
+- **Vue 3 Documentation**: https://vuejs.org
+- **Electron Documentation**: https://www.electronjs.org
+### Support
+For issues, questions, or feature requests:
+1. Check this documentation first
+2. Search existing GitHub issues
+3. Create a new issue with detailed description
+4. Include error messages, logs, and system info
+### License
+This project is licensed under the MIT License. See `LICENSE` file for details.
+---
+**Last Updated**: 2026-01-23
+**Version**: 1.0.0
+**Maintainers**: MoYoYo.tts Development Team

USAGE_CN.md ADDED

	@@ -0,0 +1,1776 @@

+# MoYoYo.tts 使用手册
+## 目录
+- [简介](#简介)
+- [快速开始](#快速开始)
+- [系统要求](#系统要求)
+- [安装指南](#安装指南)
+  - [安装 uv 包管理器](#31-安装-uv-包管理器)
+  - [Python 环境设置](#32-python-环境设置)
+  - [下载必需的数据文件](#33-下载必需的数据文件)
+  - [前端设置](#34-前端设置)
+- [配置](#配置)
+  - [后端 API 配置](#41-后端-api-配置)
+  - [前端配置](#42-前端配置)
+- [运行应用](#运行应用)
+  - [启动后端 API 服务器](#51-启动后端-api-服务器)
+  - [启动前端 Electron 应用](#52-启动前端-electron-应用)
+- [使用指南](#使用指南)
+  - [首次设置](#61-首次设置)
+  - [快速模式 - 初学者声音克隆](#62-快速模式---初学者声音克隆)
+  - [高级模式 - 专家声音克隆](#63-高级模式---专家声音克隆)
+  - [文本转语音生成](#64-文本转语音生成)
+  - [声音库管理](#65-声音库管理)
+- [API 参考](#api-参考)
+- [故障排除](#故障排除)
+- [开发](#开发)
+---
+## 简介
+MoYoYo.tts 是一个综合性的声音克隆和文本转语音系统，结合了：
+- **后端 API**：基于 FastAPI 的 REST API，用于声音训练和推理
+- **前端应用**：Electron + Vue 桌面应用，具有直观的用户界面
+该系统基于 GPT-SoVITS 技术构建，能够使用最少的训练数据（最短 5 秒音频）实现高质量的声音克隆。
+**目标用户**：
+- 想要创建自定义文本转语音声音的最终用户
+- 将语音合成集成到应用程序中的开发人员
+- 从事声音克隆技术研究的研究人员
+**主要功能**：
+- 快速模式：为初学者提供一键式声音克隆
+- 高级模式：对训练管道进行精细控制
+- 通过服务器发送事件（SSE）进行实时进度跟踪
+- 多语言支持（中文、英文、日文）
+- 支持 GPU 加速的 CUDA
+---
+## 快速开始
+通过以下基本步骤，在 5 分钟内启动并运行：
+**1. 安装 uv**（Python 包管理器）：
+```bash
+# macOS/Linux
+curl -LsSf https://astral.sh/uv/install.sh | sh
+# Windows
+powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
+```
+**2. 设置 Python 环境**：
+```bash
+cd GPT-SoVITS
+uv sync  # 创建 .venv 并安装所有依赖项
+source .venv/bin/activate  # macOS/Linux
+# 或: .venv\Scripts\activate  # Windows
+```
+**3. 下载必需的模型**（详见 3.3 节）：
+```bash
+# 下载并解压 NLTK 数据、预训练模型等
+# 或使用前端应用自动下载
+```
+**4. 启动后端 API**：
+```bash
+cd api_server
+python app/main.py
+# API 运行在 http://localhost:8000
+```
+**5. 启动前端应用**（在新终端中）：
+```bash
+cd tts-voice-app
+npm install
+npm run dev
+```
+大功告成！Electron 应用将引导您完成模型设置和声音克隆。
+有关详细的安装说明、平台特定注意事项和配置选项，请继续阅读以下内容。
+---
+## 系统要求
+### 软件要求
+| 组件 | 版本 | 说明 |
+|-----------|---------|-------|
+| **Python** | 3.10 - 3.12 | 推荐 Python 3.11 |
+| **Node.js** | >= 18.x | 用于前端开发 |
+| **uv** | 最新版 | Python 包管理器 |
+| **CUDA** | 12.6 或 12.8 | 可选，用于 GPU 加速 |
+### 硬件要求
+| 组件 | 最低配置 | 推荐配置 |
+|-----------|---------|-------------|
+| **CPU** | 双核 | 四核或更好 |
+| **RAM** | 16 GB | 32 GB（用于训练） |
+| **GPU** | 无（CPU 模式） | 配备 6GB+ 显存的 NVIDIA GPU |
+| **存储** | 20 GB 可用空间 | 50 GB+（用于多个声音） |
+**GPU 说明**：
+- GPU 是可选的，但可显著加快训练速度（5-10 倍）
+- 推荐使用支持 CUDA 12.6 或 12.8 的 NVIDIA GPU
+- 目前不支持 AMD GPU 和 Apple Silicon 进行训练
+---
+## 安装指南
+### 3.1 安装 uv 包管理器
+uv 是一个快速的 Python 包安装器和解析器，可以替代 pip。
+**macOS / Linux**：
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+**Windows**（PowerShell）：
+```powershell
+powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
+```
+验证安装：
+```bash
+uv --version
+```
+### 3.2 Python 环境设置
+该项目使用 `uv` 进行依赖管理，配置文件为 `pyproject.toml`。设置过程简化为一个命令。
+**步骤 1：进入项目目录**
+```bash
+cd GPT-SoVITS
+```
+**步骤 2：同步所有依赖项**
+```bash
+# 这个命令将：
+# - 创建虚拟环境（.venv）
+# - 安装 Python 3.11（或您指定的版本）
+# - 从 pyproject.toml 安装所有依赖项
+# - 为您的平台安装正确的 PyTorch 版本
+uv sync
+```
+**步骤 3：激活环境**
+macOS / Linux：
+```bash
+source .venv/bin/activate
+```
+Windows：
+```cmd
+.venv\Scripts\activate
+```
+您应该在终端提示符中看到 `(.venv)` 前缀。
+**平台特定的 PyTorch 安装工作原理**：
+`pyproject.toml` 会自动选择适当的 PyTorch 版本：
+- **macOS**：安装仅 CPU 的 PyTorch（Apple Silicon 使用 CPU 模式）
+- **Linux**：默认安装 CUDA 12.6 PyTorch
+- **Windows**：需要手动选择 CUDA 版本（见下文）
+**Windows 用户 - 选择 CUDA 版本**：
+对于 Windows，您需要明确指定 PyTorch 索引：
+**CUDA 12.6**（默认）：
+```bash
+uv sync
+```
+**CUDA 12.8**：
+```bash
+uv sync --index pytorch-cu128
+```
+**仅 CPU**（无 GPU）：
+```bash
+uv sync --index pytorch-cpu
+```
+**验证安装**：
+```bash
+# 检查 Python 版本
+python --version  # 应显示 Python 3.11.x
+# 检查 PyTorch 安装
+python -c "import torch; print(f'PyTorch: {torch.__version__}')"
+# 检查 CUDA 可用性（如果您有 GPU）
+python -c "import torch; print(f'CUDA 可用: {torch.cuda.is_available()}')"
+```
+### 3.3 下载必需的数据文件
+以下数据文件是文本处理和声音训练所必需的。
+#### NLTK 数据（文本处理必需）
+NLTK（自然语言工具包）数据用于文本分词和处理。
+```bash
+# 从 ModelScope 下载
+wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip
+# 解压到 Python 环境
+unzip -q -o nltk_data.zip -d .venv/
+# 清理
+rm nltk_data.zip
+```
+**大小**：约 10 MB
+**时间**：< 1 分钟
+#### Open JTalk 词典（日语必需）
+Open JTalk 是日语文本转语音处理所必需的。
+```bash
+# 获取 pyopenjtalk 安装路径
+PYOPENJTALK_PATH=$(python -c "import os, pyopenjtalk; print(os.path.dirname(pyopenjtalk.__file__))")
+# 从 ModelScope 下载
+wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/open_jtalk_dic_utf_8-1.11.tar.gz
+# 解压到 pyopenjtalk 目录
+tar -xzf open_jtalk_dic_utf_8-1.11.tar.gz -C "$PYOPENJTALK_PATH"
+# 清理
+rm open_jtalk_dic_utf_8-1.11.tar.gz
+```
+**大小**：约 50 MB
+**时间**：< 2 分钟
+### 3.4 前端设置
+前端是使用 Vue.js 构建的 Electron 应用程序。
+```bash
+# 进入前端目录
+cd tts-voice-app
+# 安装 Node.js 依赖项
+npm install
+```
+**时间**：2-5 分钟
+**说明**：这会安装所有必需的 Node.js 包，包括 Electron、Vue 和 UI 组件。
+---
+## 配置
+### 4.1 后端 API 配置
+后端使用环境变量进行配置。在项目根目录创建 `.env` 文件以进行自定义设置。
+**创建 `.env` 文件**（可选，默认值适用于本地开发）：
+```bash
+# 部署模式
+# 选项：local, server
+DEPLOYMENT_MODE=local
+# API 服务器设置
+API_HOST=0.0.0.0
+API_PORT=8000
+# 数据存储路径
+DATA_DIR=~/.moyoyo-tts/data
+SQLITE_PATH=~/.moyoyo-tts/data/tasks.db
+# 训练设置
+LOCAL_MAX_WORKERS=1  # 并发训练任务数
+```
+**配置选项**：
+| 变量 | 默认值 | 说明 |
+|----------|---------|-------------|
+| `DEPLOYMENT_MODE` | `local` | 部署环境（local/server） |
+| `API_HOST` | `0.0.0.0` | API 服务器绑定地址 |
+| `API_PORT` | `8000` | API 服务器端口 |
+| `DATA_DIR` | `~/.moyoyo-tts/data` | 数据存储目录 |
+| `SQLITE_PATH` | `~/.moyoyo-tts/data/tasks.db` | SQLite 数据库路径 |
+| `LOCAL_MAX_WORKERS` | `1` | 最大并发训练任务数 |
+**说明**：
+- `API_HOST=0.0.0.0` 允许来自任何网络接口的连接
+- `LOCAL_MAX_WORKERS=1` 防止内存有限的系统出现内存问题
+- 在高端系统上增加 `LOCAL_MAX_WORKERS` 以同时训练多个声音
+### 4.2 前端配置
+前端在本地开发时只需要最少的配置。
+**默认设置**：
+- **API 端点**：`http://localhost:8000`
+- **声音存储**：`~/.moyoyo-tts/voices/`
+- **模型存储**：`GPT_SoVITS/pretrained_models/`
+**自动配置**：
+Electron 应用将：
+1. 自动检测并连接到本地 API 服务器
+2. 首次启动时创建所需的目录
+3. 通过模型设置页面下载缺失的模型
+标准使用无需手动配置。
+---
+## 运行应用
+### 5.1 启动后端 API 服务器
+**步骤 1：激活 Python 环境**
+```bash
+# 进入项目目录
+cd GPT-SoVITS
+# 激活虚拟环境
+source .venv/bin/activate  # macOS/Linux
+# 或: .venv\Scripts\activate  # Windows
+```
+**步骤 2：启动 API 服务器**
+方法 1 - 使用主脚本：
+```bash
+cd api_server
+python app/main.py
+```
+方法 2 - 直接使用 uvicorn：
+```bash
+uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
+```
+**预期输出**：
+```
+INFO:     Started server process [12345]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
+```
+**API 文档**：
+服务器运行后，访问交互式 API 文档：
+- **Swagger UI**：http://localhost:8000/docs
+- **ReDoc**：http://localhost:8000/redoc
+- **OpenAPI JSON**：http://localhost:8000/openapi.json
+**健康检查**：
+```bash
+curl http://localhost:8000/health
+# 预期输出：{"status": "healthy"}
+```
+### 5.2 启动前端 Electron 应用
+**步骤 1：打开新终端**
+保持后端服务器运行，打开一个新的终端窗口。
+**步骤 2：进入前端目录**
+```bash
+cd tts-voice-app
+```
+**步骤 3：启动开发模式**
+```bash
+npm run dev
+```
+**预期输出**：
+```
+> tts-voice-app@1.0.0 dev
+> electron-vite dev
+  VITE v4.x.x  ready in xxx ms
+  ➜  Local:   http://localhost:5173/
+  ➜  Network: use --host to expose
+Electron app starting...
+```
+Electron 应用将自动启动，开发模式下启用热重载。
+**开发模式功能**：
+- 热模块替换（HMR）实现即时 UI 更新
+- Vue DevTools 集成
+- 用于调试的控制台日志记录
+- 主进程更改时自动重启
+---
+## 使用指南
+### 6.1 首次设置
+首次启动 Electron 应用时，您需要下载必需的模型。
+**设置流程**：
+1. **启动 Electron 应用**
+   ```bash
+   cd tts-voice-app
+   npm run dev
+   ```
+2. **模型设置页面**
+   - 应用自动检测缺失的模型
+   - 您将被重定向到模型设置页面
+3. **下载模型**
+   - 点击"下载所有模型"按钮
+   - 要下载的模型：
+     - **预训练模型**：4.56 GB
+     - **G2PW 模型**：588.86 MB
+     - **FunASR**：1.09 GB
+     - **Faster Whisper**：2.85 GB
+   - 总下载大小：约 9 GB
+4. **监控进度**
+   - 实时进度条显示下载状态
+   - 预计时间：10-30 分钟（取决于连接速度）
+   - 下载可以暂停和恢复
+5. **设置完成**
+   - 所有模型下载完成后，点击"继续"
+   - 您将被重定向到主 TTS 页面
+   - 应用现在可以使用了
+**故障排除**：
+- 如果下载失败，请检查您的互联网连接
+- 确认您有约 10 GB 的可用磁盘空间
+- 如需手动安装，请参见 3.3 节
+### 6.2 快速模式 - 初学者声音克隆
+快速模式为想要快速创建声音克隆的用户提供了简化的工作流程，无需技术知识。
+#### 使用 API
+**步骤 1：上传音频文件**
+```bash
+curl -X POST http://localhost:8000/api/v1/files \
+  -F "file=@path/to/voice_sample.wav" \
+  -F "purpose=training"
+```
+**响应**：
+```json
+{
+  "file_id": "550e8400-e29b-41d4-a716-446655440000",
+  "filename": "voice_sample.wav",
+  "size": 1234567,
+  "purpose": "training"
+}
+```
+**步骤 2：创建训练任务**
+```bash
+curl -X POST http://localhost:8000/api/v1/tasks \
+  -H "Content-Type: application/json" \
+  -d '{
+    "exp_name": "my_voice",
+    "audio_file_id": "550e8400-e29b-41d4-a716-446655440000",
+    "options": {
+      "version": "v2",
+      "language": "zh",
+      "quality": "standard"
+    }
+  }'
+```
+**响应**：
+```json
+{
+  "id": "task-uuid-here",
+  "status": "queued",
+  "exp_name": "my_voice",
+  "created_at": "2026-01-23T10:30:00Z"
+}
+```
+**步骤 3：监控进度**
+使用服务器发送事件（SSE）：
+```bash
+curl -N http://localhost:8000/api/v1/tasks/task-uuid-here/progress
+```
+**进度事件**：
+```
+event: progress
+data: {"stage": "audio_slice", "progress": 25, "message": "切片音频中..."}
+event: progress
+data: {"stage": "sovits_train", "progress": 50, "message": "训练 SoVITS 模型中..."}
+event: complete
+data: {"status": "completed", "voice_id": "voice-uuid-here"}
+```
+#### 质量预设
+| 预设 | SoVITS 轮数 | GPT 轮数 | 预计时间 | 质量 |
+|--------|---------------|------------|-----------|---------|
+| **fast** | 4 | 8 | 约 10 分钟 | 适合测试 |
+| **standard** | 8 | 15 | 约 20 分钟 | 平衡质量/速度 |
+| **high** | 16 | 30 | 约 40 分钟 | 最佳质量 |
+**建议**：
+- 使用 `fast` 进行快速测试和预览
+- 使用 `standard` 用于大多数生产用例
+- 使用 `high` 用于需要最高质量的专业应用
+#### 使用 UI
+**步骤 1：进入声音克隆页面**
+- 点击侧边栏中的"声音克隆"
+- 或使用键盘快捷键：`Ctrl/Cmd + N`
+**步骤 2：上传音频样本**
+- 点击"上传音频"按钮
+- 选择 WAV 或 MP3 文件
+- **要求**：
+  - 时长：推荐 5-30 秒
+  - 质量：清晰的声音，最少的背景噪音
+  - 内容：自然的讲话，不是唱歌或喊叫
+**步骤 3：配置训练**
+- **声音名称**：输入唯一名称（例如，"张三的声音"）
+- **语言**：选择主要语言（中文、英文、日文）
+- **质量预设**：从 fast/standard/high 中选择
+**步骤 4：开始训练**
+- 点击"开始训练"按钮
+- 任务将被排队，处理将开始
+**步骤 5：监控进度**
+- 进度条显示整体完成情况
+- 显示当前阶段（例如，"训练 SoVITS 模型中..."）
+- 显示预计剩余时间
+- 您可以导航离开并稍后查看
+**步骤 6：训练完成**
+- 完成后您将收到通知
+- 声音自动出现在声音库中
+- 您可以立即使用它进行 TTS 生成
+**获得最佳效果的提示**：
+- 使用高质量音频（最好是 48kHz WAV）
+- 确保音调和说话风格一致
+- 避免带有音乐或声音效果的音频
+- 10-15 秒是样本长度的最佳选择
+- 可以组合多个短样本
+### 6.3 高级模式 - 专家声音克隆
+高级模式提供对声音训练管道每个阶段的精细控制。建议想要微调训练参数的用户使用。
+#### 训练管道阶段
+完整的训练管道包含 7 个阶段：
+1. **Audio Slice**（音频切片）：将音频分割成片段
+2. **ASR**（自动语音识别）：将音频转录为文本
+3. **Text Feature**（文本特征）：提取文本嵌入
+4. **Hubert Feature**（Hubert 特征）：提取音频特征
+5. **Semantic Token**（语义标记）：生成语义标记
+6. **SoVITS Train**（SoVITS 训练）：训练声音合成模型
+7. **GPT Train**（GPT 训练）：训练文本到语义模型
+#### 阶段依赖关系
+```
+audio_slice → asr → text_feature → sovits_train
+            ↘                    ↗
+              hubert_feature → semantic_token → gpt_train
+```
+**重要**：每个阶段必须等待其依赖项完成。
+#### 使用 API
+**步骤 1：创建实验**
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments \
+  -H "Content-Type: application/json" \
+  -d '{
+    "exp_name": "my_custom_voice",
+    "version": "v2",
+    "audio_file_id": "file-uuid-here"
+  }'
+```
+**响应**：
+```json
+{
+  "id": "exp-uuid-here",
+  "exp_name": "my_custom_voice",
+  "version": "v2",
+  "stages": {
+    "audio_slice": {"status": "pending"},
+    "asr": {"status": "pending"},
+    "text_feature": {"status": "pending"},
+    "hubert_feature": {"status": "pending"},
+    "semantic_token": {"status": "pending"},
+    "sovits_train": {"status": "pending"},
+    "gpt_train": {"status": "pending"}
+  }
+}
+```
+**步骤 2：单独执行阶段**
+**阶段 1 - 音频切片**：
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/audio_slice \
+  -H "Content-Type: application/json" \
+  -d '{
+    "threshold": -34,
+    "min_length": 4000,
+    "min_interval": 300,
+    "hop_size": 10,
+    "max_silence_kept": 500
+  }'
+```
+**参数**：
+- `threshold`：静音检测的 dB 阈值（-60 到 0，默认：-34）
+- `min_length`：最小片段长度（毫秒）（1000-10000，默认：4000）
+- `min_interval`：最小静音间隔（毫秒）（0-3000，默认：300）
+- `hop_size`：分析窗口跳跃大小（毫秒）（默认：10）
+- `max_silence_kept`：要保留的最大静音（毫秒）（默认：500）
+**阶段 2 - ASR**：
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/asr \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "达摩 ASR (中文)",
+    "language": "zh"
+  }'
+```
+**ASR 模型**：
+- `达摩 ASR (中文)`：用于中文的 DamoASR（最适合中文）
+- `Faster Whisper (多语言)`：用于多语言的 Faster Whisper
+**阶段 3 - 文本特征**：
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/text_feature \
+  -H "Content-Type: application/json" \
+  -d '{
+    "language": "zh"
+  }'
+```
+**阶段 4 - Hubert 特征**：
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/hubert_feature \
+  -H "Content-Type: application/json" \
+  -d '{}'
+```
+**阶段 5 - 语义标记**：
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/semantic_token \
+  -H "Content-Type: application/json" \
+  -d '{}'
+```
+**阶段 6 - SoVITS 训练**：
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train \
+  -H "Content-Type: application/json" \
+  -d '{
+    "total_epoch": 8,
+    "batch_size": 4,
+    "save_every_epoch": 4,
+    "text_low_lr_rate": 0.4,
+    "if_save_latest": true,
+    "if_save_every_weights": true,
+    "version": "v2"
+  }'
+```
+**参数**：
+- `total_epoch`：总训练轮数（4-32，默认：8）
+- `batch_size`：批次大小（1-40，默认：4）
+- `save_every_epoch`：每 N 轮保存检查点（1-50，默认：4）
+- `text_low_lr_rate`：文本编码器学习率乘数（0.2-1.0，默认：0.4）
+**阶段 7 - GPT 训练**：
+```bash
+curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/gpt_train \
+  -H "Content-Type: application/json" \
+  -d '{
+    "total_epoch": 15,
+    "batch_size": 4,
+    "save_every_epoch": 5,
+    "if_save_latest": true,
+    "if_save_every_weights": true,
+    "version": "v2"
+  }'
+```
+**步骤 3：监控阶段进度**
+每个阶段通过 SSE 提供实时进度：
+```bash
+curl -N http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train/progress
+```
+**进度事件**：
+```
+event: progress
+data: {"epoch": 2, "total_epochs": 8, "progress": 25, "loss": 0.234}
+event: progress
+data: {"epoch": 4, "total_epochs": 8, "progress": 50, "loss": 0.189}
+event: complete
+data: {"status": "completed", "final_loss": 0.142}
+```
+#### 使用 UI
+**步骤 1：创建新实验**
+- 进入"高级模式"页面
+- 点击"新建实验"
+- 输入实验名称并上传音频
+**步骤 2：配置每个阶段**
+- 点击阶段卡以展开设置
+- 调整参数（或使用预设默认值）
+- 点击"运行阶段"执行
+**步骤 3：监控管道**
+- 可视化管道图显示阶段状态
+- 绿色：已完成，蓝色：运行中，灰色：待处理
+- 点击任何阶段查看详细日志
+**步骤 4：迭代和优化**
+- 每个阶段后检查结果
+- 如需要可调整参数并重新运行
+- 满意时导出最终模型
+**高级提示**：
+- 在内存有限的 GPU 上使用较低的 `batch_size`（2-4）
+- 对于有足够数据的更好质量，增加 `total_epoch`
+- 频繁保存检查点（`save_every_epoch`）以从中断中恢复
+- 监控损失值 - 应该随着轮数递减
+### 6.4 文本转语音生成
+训练好声音后，您可以使用它从文本生成语音。
+#### 使用 API
+**基本 TTS 请求**：
+```bash
+curl -X POST http://localhost:8000/api/v1/inference/tts \
+  -H "Content-Type: application/json" \
+  -d '{
+    "text": "你好，这是文本转语音合成的测试。",
+    "voice_id": "voice-uuid-here",
+    "speed": 1.0,
+    "emotion": "auto"
+  }'
+```
+**响应**：
+```json
+{
+  "audio_url": "http://localhost:8000/api/v1/files/audio-uuid-here",
+  "duration": 3.2,
+  "format": "wav"
+}
+```
+**参数**：
+- `text`（必需）：要合成的文本（最多 5000 个字符）
+- `voice_id`（必需）：训练好的声音的 UUID
+- `speed`（可选）：说话速度乘数（0.5 - 2.0，默认：1.0）
+- `emotion`（可选）：情感风格（auto、neutral、happy、sad）
+- `seed`（可选）：用于可重复性的随机种子
+**下载生成的音频**：
+```bash
+curl -o output.wav http://localhost:8000/api/v1/files/audio-uuid-here
+```
+#### 使用 UI
+**步骤 1：进入 TTS 页面**
+- 点击侧边栏中的"文本转语音"
+- 或使用键盘快捷键：`Ctrl/Cmd + T`
+**步骤 2：选择声音**
+- 打开声音下拉菜单
+- 从列表中选择训练好的声音
+- 预览按钮可让您听到样本
+**步骤 3：输入文本**
+- 在文本区域中输入或粘贴文本
+- 显示字符计数（最多 5000）
+- 支持多行文本
+**步骤 4：调整设置**
+- **速度**：拖动滑块或输入值（0.5x - 2.0x）
+  - 0.5x：非常慢，清晰的发音
+  - 1.0x：自然的说话节奏
+  - 1.5x：快速，仍然清晰
+  - 2.0x：非常快
+- **情感**：从下拉菜单中选择（如果模型支持）
+  - Auto：从文本推断
+  - Neutral：平坦、事实性的表达
+  - Happy：积极向上的语气
+  - Sad：忧郁、哀伤的语气
+**步骤 5：生成**
+- 点击"生成"按钮
+- 处理需要 2-5 秒
+- 显示进度指示器
+**步骤 6：收听和下载**
+- 音频播放器自动出现
+- 点击播放按钮收听
+- 点击下载按钮保存 WAV 文件
+- 分享按钮复制可分享链接
+**文本指南**：
+- 使用适当的标点符号进行自然停顿
+- 将长文本分成句子
+- 对话使用引号
+- 全大写用于强调（谨慎使用）
+**自然语音提示**：
+- 添加逗号进行呼吸停顿
+- 使用省略号（...）进行尾音
+- 问号影响语调
+- 感叹号增加强调
+### 6.5 声音库管理
+声音库是存储和管理所有训练声音的地方。
+#### 使用 API
+**列出所有声音**：
+```bash
+curl "http://localhost:8000/api/v1/files?purpose=training"
+```
+**响应**：
+```json
+{
+  "files": [
+    {
+      "id": "voice-uuid-1",
+      "filename": "john_voice",
+      "created_at": "2026-01-20T10:30:00Z",
+      "size": 1234567,
+      "metadata": {
+        "language": "zh",
+        "quality": "standard",
+        "duration": 12.5
+      }
+    },
+    {
+      "id": "voice-uuid-2",
+      "filename": "mary_voice",
+      "created_at": "2026-01-21T14:20:00Z",
+      "size": 2345678,
+      "metadata": {
+        "language": "en",
+        "quality": "high",
+        "duration": 18.3
+      }
+    }
+  ]
+}
+```
+**获取声音详情**：
+```bash
+curl http://localhost:8000/api/v1/files/voice-uuid-1
+```
+**删除声音**：
+```bash
+curl -X DELETE http://localhost:8000/api/v1/files/voice-uuid-1
+```
+**导出声音模型**：
+```bash
+curl -o voice_model.zip http://localhost:8000/api/v1/voices/voice-uuid-1/export
+```
+#### 使用 UI
+**浏览声音库**：
+- 进入"声音库"页面
+- 声音显示为带有以下内容的卡片：
+  - 声音名称
+  - 语言和质量徽章
+  - 创建日期
+  - 样本持续时间
+  - 预览波形
+**声音卡操作**：
+- **播放**：收听声音样本
+- **编辑**：重命名或更新元数据
+- **导出**：下载声音模型文件
+- **删除**：删除声音（带确认）
+**搜索和筛选**：
+- 搜索栏：按声音名称筛选
+- 语言筛选：仅显示特定语言
+- 质量筛选：仅显示特定质量预设
+- 排序选项：
+  - 名称（A-Z）
+  - 创建日期（最新在前）
+  - 创建日期（最旧在前）
+  - 文件大小
+**批量操作**：
+- 选择多个声音（Shift+点击）
+- 将选定的声音导出为 ZIP
+- 删除选定的声音
+- 标记选定的声音
+**声音详情面板**：
+点击任何声音卡查看：
+- 完整的训练参数
+- 训练历史和日志
+- 模型文件大小
+- 样本音频片段
+- 导出和分享选项
+**组织提示**：
+- 使用描述性名称（例如，"张三_专业"、"李四_休闲"）
+- 按项目或用例标记声音
+- 导出重要的声音作为备份
+- 删除测试声音以节省空间
+---
+## API 参考
+### 快速模式端点
+#### 任务
+**创建任务** - 启动一键式声音训练任务
+```http
+POST /api/v1/tasks
+Content-Type: application/json
+{
+  "exp_name": "string",
+  "audio_file_id": "uuid",
+  "options": {
+    "version": "v2",
+    "language": "zh|en|ja",
+    "quality": "fast|standard|high"
+  }
+}
+```
+**列出任务** - 获取所有任务
+```http
+GET /api/v1/tasks?status=queued|running|completed|failed
+```
+**获取任务** - 获取特定任务详情
+```http
+GET /api/v1/tasks/{task_id}
+```
+**取消任务** - 取消正在运行的任务
+```http
+DELETE /api/v1/tasks/{task_id}
+```
+**任务进度** - 通过 SSE 实时进度
+```http
+GET /api/v1/tasks/{task_id}/progress
+Accept: text/event-stream
+```
+### 高级模式端点
+#### 实验
+**创建实验** - 初始化新的训练实验
+```http
+POST /api/v1/experiments
+Content-Type: application/json
+{
+  "exp_name": "string",
+  "version": "v2",
+  "audio_file_id": "uuid"
+}
+```
+**获取实验** - 获取实验详情
+```http
+GET /api/v1/experiments/{exp_id}
+```
+**列出实验** - 获取所有实验
+```http
+GET /api/v1/experiments?status=pending|running|completed
+```
+**删除实验** - 删除实验和所有数据
+```http
+DELETE /api/v1/experiments/{exp_id}
+```
+#### 阶段
+**执行阶段** - 运行特定的管道阶段
+```http
+POST /api/v1/experiments/{exp_id}/stages/{stage_type}
+Content-Type: application/json
+{
+  // 阶段特定参数
+}
+```
+**阶段类型**：
+- `audio_slice`
+- `asr`
+- `text_feature`
+- `hubert_feature`
+- `semantic_token`
+- `sovits_train`
+- `gpt_train`
+**获取阶段状态** - 获取特定阶段的状态
+```http
+GET /api/v1/experiments/{exp_id}/stages/{stage_type}
+```
+**获取所有阶段状态** - 获取所有阶段的状态
+```http
+GET /api/v1/experiments/{exp_id}/stages
+```
+**阶段进度** - 通过 SSE 实时阶段进度
+```http
+GET /api/v1/experiments/{exp_id}/stages/{stage_type}/progress
+Accept: text/event-stream
+```
+**获取阶段架构** - 获取阶段的参数架构
+```http
+GET /api/v1/stages/{stage_type}/schema
+```
+### 通用端点
+#### 文件
+**上传文件** - 上传音频或数据文件
+```http
+POST /api/v1/files
+Content-Type: multipart/form-data
+file: binary
+purpose: training|inference
+```
+**列出文件** - 获取所有上传的文件
+```http
+GET /api/v1/files?purpose=training|inference
+```
+**获取文件** - 下载特定文件
+```http
+GET /api/v1/files/{file_id}
+```
+**删除文件** - 删除文件
+```http
+DELETE /api/v1/files/{file_id}
+```
+#### 推理
+**文本转语音** - 从文本生成语音
+```http
+POST /api/v1/inference/tts
+Content-Type: application/json
+{
+  "text": "string",
+  "voice_id": "uuid",
+  "speed": 1.0,
+  "emotion": "auto|neutral|happy|sad",
+  "seed": 42
+}
+```
+**获取声音信息** - 获取声音模型信息
+```http
+GET /api/v1/voices/{voice_id}
+```
+#### 配置
+**获取阶段预设** - 获取阶段的预设配置
+```http
+GET /api/v1/stages/presets
+```
+**健康检查** - 检查 API 服务器健康状况
+```http
+GET /health
+```
+**完整的 OpenAPI 规范可在以下位置获得**：http://localhost:8000/openapi.json
+---
+## 故障排除
+### 后端问题
+#### 端口已被占用
+**症状**：启动服务器时出现 `Address already in use` 错误消息。
+**解决方案 1** - 在 `.env` 中更改端口：
+```bash
+echo "API_PORT=8001" >> .env
+python app/main.py
+```
+**解决方案 2** - 查找并终止使用端口的进程：
+```bash
+# macOS/Linux
+lsof -ti:8000 | xargs kill -9
+# Windows
+netstat -ano | findstr :8000
+taskkill /PID <pid> /F
+```
+#### 数据库错误
+**症状**：`sqlite3.OperationalError` 或数据库损坏消息。
+**解决方案** - 重置数据库：
+```bash
+# 备份现有数据库（可选）
+cp ~/.moyoyo-tts/data/tasks.db ~/.moyoyo-tts/data/tasks.db.backup
+# 删除损坏的数据库
+rm ~/.moyoyo-tts/data/tasks.db
+# 重启 API 服务器（数据库将被重新创建）
+python app/main.py
+```
+#### 训练立即失败
+**症状**：训练开始但在几秒钟内失败。
+**诊断**：
+```bash
+# 检查 GPU 可用性
+python -c "import torch; print(torch.cuda.is_available())"
+# 检查 CUDA 版本
+python -c "import torch; print(torch.version.cuda)"
+# 检查磁盘空间
+df -h
+```
+**解决方案**：
+1. **无 GPU**：系统将使用 CPU（较慢但有效）
+2. **CUDA 不匹配**：使用正确的 CUDA 版本重新安装 PyTorch：
+   ```bash
+   # 对于 CUDA 12.6
+   uv sync --reinstall-package torch --reinstall-package torchaudio
+   # 对于 CUDA 12.8（Windows）
+   uv sync --reinstall-package torch --reinstall-package torchaudio --index pytorch-cu128
+   # 仅 CPU
+   uv sync --reinstall-package torch --reinstall-package torchaudio --index pytorch-cpu
+   ```
+3. **磁盘空间不足**：至少释放 10GB
+4. **内存不足**：在训练参数中减少 `batch_size`
+#### Python 环境问题
+**症状**：`ModuleNotFoundError` 或导入错误。
+**解决方案**：
+```bash
+# 验证环境已激活
+which python  # 应显示 .venv 中的路径
+# 重新安装所有依赖项
+uv sync --reinstall
+# 或从头强制重新安装
+rm -rf .venv
+uv sync
+# 检查缺失的包
+uv pip list
+```
+### 前端问题
+#### 无法连接到 API
+**症状**：前端显示"无法连接到服务器"错误。
+**诊断**：
+```bash
+# 检查后端是否正在运行
+curl http://localhost:8000/health
+# 检查网络连接
+ping localhost
+```
+**解决方案**：
+1. **后端未运行**：启动后端服务器（参见 5.1 节）
+2. **错误的端口**：检查后端是否在端口 8000 上
+3. **防火墙**：允许连接到 localhost:8000
+4. **CORS 错误**：检查后端 `.env` 中的 CORS 设置
+#### 模型未下载
+**症状**：模型下载失败或无限期挂起。
+**解决方案**：
+1. **检查互联网连接**：
+   ```bash
+   curl -I https://www.modelscope.cn
+   ```
+2. **检查磁盘空间**：
+   ```bash
+   df -h  # 需要约 10GB 可用空间
+   ```
+3. **手动下载**：参见 3.3 节进行手动安装
+4. **代理问题**：配置代理设置：
+   ```bash
+   export http_proxy=http://proxy.example.com:8080
+   export https_proxy=http://proxy.example.com:8080
+   ```
+#### Electron 应用无法启动
+**症状**：应用启动时崩溃或显示空白屏幕。
+**解决方案 1** - 清除缓存并重建：
+```bash
+# 进入前端目录
+cd tts-voice-app
+# 清除缓存
+rm -rf node_modules package-lock.json dist .vite
+# 重新安装依赖项
+npm install
+# 重建
+npm run dev
+```
+**解决方案 2** - 检查 Node.js 版本：
+```bash
+node --version  # 应该是 >= 18.x
+# 如需更新 Node.js
+nvm install 18
+nvm use 18
+```
+**解决方案 3** - 检查 Electron 日志：
+```bash
+# macOS
+ls ~/Library/Logs/tts-voice-app/
+# Linux
+ls ~/.config/tts-voice-app/logs/
+# Windows（命令提示符）
+dir %APPDATA%\tts-voice-app\logs\
+```
+### 常见错误
+#### "PYTHONPATH not set" 错误
+**症状**：与 `GPT_SoVITS` 模块相关的导入错误。
+**原因**：API 服务器需要找到主项目目录。
+**解决方案**：API 自动设置 `PYTHONPATH`，但请验证：
+```bash
+# 检查项目结构
+ls GPT-SoVITS/  # 应包含 *.py 文件
+# 如需手动设置（替换为您本地检出目录的绝对路径）
+export PYTHONPATH=/path/to/GPT-SoVITS:$PYTHONPATH
+```
+#### "Model not found" Error
+**Symptom**: Training fails with a "pretrained model not found" message.
+**Diagnosis**:
+```bash
+# Check that the models exist
+ls GPT_SoVITS/pretrained_models/
+# Should list: s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt, s2G488k.pth, s2D488k.pth
+```
+**Solution**: Download the pretrained models (see Section 3.3):
+```bash
+wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/pretrained_models.zip
+unzip -q -o pretrained_models.zip -d GPT_SoVITS
+```
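To confirm that the archive unpacked correctly, the expected files can be checked by name rather than by eye. This is an illustrative sketch; `missing_models` is a hypothetical helper, and the file list is copied from the `ls` comment above rather than from the project code.

```python
from pathlib import Path

# File names taken from the expected `ls` output in this section.
EXPECTED_MODELS = (
    "s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt",
    "s2G488k.pth",
    "s2D488k.pth",
)


def missing_models(model_dir: str = "GPT_SoVITS/pretrained_models") -> list[str]:
    """Return the expected model files that are not present in `model_dir`."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_MODELS if not (directory / name).is_file()]
```

An empty list means all three files were found. Any returned names point to an incomplete download or extraction.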
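+#### "Out of memory" Error
+**Symptom**: Training crashes with `CUDA out of memory` or `MemoryError`.
+**Solutions**:
+1. **Reduce the batch size** (for example, from 4 to 2):
+   ```json
+   {
+     "batch_size": 2
+   }
+   ```
+2. **Close other applications**: Free up GPU memory and RAM
+3. **Use CPU mode**: Slower, but uses system RAM instead of GPU memory:
+   ```bash
+   # Hide all GPUs from the process
+   export CUDA_VISIBLE_DEVICES=""
+   python app/main.py
+   ```
+4. **Add system swap space** (Linux):
+   ```bash
+   sudo dd if=/dev/zero of=/swapfile bs=1G count=8
+   sudo chmod 600 /swapfile  # Restrict access; mkswap warns about insecure permissions otherwise
+   sudo mkswap /swapfile
+   sudo swapon /swapfile
+   ```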
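The batch-size advice can be applied systematically by halving until a run fits. The sketch below is a generic pattern, not the project's training code: `first_fitting_batch_size` and the `train_once` callable are hypothetical. It catches `MemoryError` only; CUDA out-of-memory errors from PyTorch are raised as `RuntimeError` subclasses, so the `except` clause would need adjusting for GPU training.

```python
from typing import Callable, Optional


def first_fitting_batch_size(train_once: Callable[[int], None],
                             start: int = 4) -> Optional[int]:
    """Try `start`, then keep halving on MemoryError.

    Returns the first batch size that completes, or None if even 1 fails.
    """
    batch_size = start
    while batch_size >= 1:
        try:
            train_once(batch_size)  # hypothetical: runs one training attempt
            return batch_size
        except MemoryError:
            batch_size //= 2
    return None
```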
+#### "NLTK Data Not Found" Error
+**Symptom**: Text processing fails with an NLTK data error.
+**Solution**: Download the NLTK data (see Section 3.3):
+```bash
+wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip
+unzip -q -o nltk_data.zip -d .venv/
+```
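+#### Audio Quality Issues
+**Symptom**: The generated audio sounds robotic, distorted, or unclear.
+**Solutions**:
+1. **Use better training data**:
+   - High-quality audio (48kHz WAV preferred)
+   - A clear voice with minimal background noise
+   - 10-15 seconds of audio
+   - Natural, conversational speech
+2. **Raise the training quality** (use `high` instead of `standard`):
+   ```json
+   {
+     "quality": "high"
+   }
+   ```
+3. **Train for longer** (for example, increase `total_epoch` from 8 to 16):
+   ```json
+   {
+     "total_epoch": 16
+   }
+   ```
+4. **Check the reference audio**: Make sure the uploaded audio is not corrupted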
+---
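+## Development
+### Backend Development
+#### Running with Hot Reload
+Hot reload restarts the server automatically when code changes are detected:
+```bash
+# Using uvicorn
+uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
+# Using a custom reload directory
+uvicorn app.main:app --reload --reload-dir api_server/app
+```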
+#### Running Tests
+```bash
+# Go to the project root
+cd GPT-SoVITS
+# Run all tests
+pytest api_server/tests/
+# Run a specific test file
+pytest api_server/tests/test_tasks.py
+# Run with a coverage report
+pytest --cov=api_server/app --cov-report=html
+# View the coverage report (macOS; use xdg-open on Linux)
+open htmlcov/index.html
+```
+#### Code Formatting
+```bash
+# Format Python code with Black
+black api_server/
+# Sort imports with isort
+isort api_server/
+# Lint with flake8
+flake8 api_server/
+# Type-check with mypy
+mypy api_server/
+```
+#### Database Migrations
+```bash
+# Generate a migration
+alembic revision --autogenerate -m "Add new column"
+# Apply migrations
+alembic upgrade head
+# Roll back the last migration
+alembic downgrade -1
+```
+#### Adding a New Endpoint
+1. Create the route in `api_server/app/routes/`
+2. Add the business logic in `api_server/app/services/`
+3. Update the models in `api_server/app/models/`
+4. Add tests in `api_server/tests/`
+5. Update the OpenAPI documentation
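+### Frontend Development
+#### Development Mode
+Development mode enables Hot Module Replacement (HMR) for instant feedback:
+```bash
+# Start the development server
+npm run dev
+# Start on a custom port
+npm run dev -- --port 5174
+# Start with debug logging
+DEBUG=electron* npm run dev
+```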
+#### Type Checking
+```bash
+# Run the Vue type check
+npm run type-check
+# Run the TypeScript compiler check
+npx tsc --noEmit
+# Watch mode for continuous checking
+npm run type-check -- --watch
+```
+#### Building for Production
+**Development build** (with source maps):
+```bash
+npm run build
+```
+**Production build** (optimized):
+```bash
+npm run build:prod
+```
+**Preview the production build**:
+```bash
+npm run preview
+```
+#### Building Distribution Packages
+Build platform-specific installers:
+**macOS**:
+```bash
+npm run build:mac
+# Output: tts-voice-app/release/MoYoYo-TTS-1.0.0.dmg
+```
+**Windows**:
+```bash
+npm run build:win
+# Output: tts-voice-app/release/MoYoYo-TTS-Setup-1.0.0.exe
+```
+**Linux**:
+```bash
+npm run build:linux
+# Output: tts-voice-app/release/moyoyo-tts-1.0.0.AppImage
+```
+**Build for all platforms** (requires platform-specific dependencies):
+```bash
+npm run build:all
+```
+**Build configuration**:
+Edit `tts-voice-app/electron-builder.yml` to customize:
+- App name and ID
+- Icon files
+- File associations
+- Auto-update settings
+- Code signing
+#### Component Development
+**Create a new component**:
+```bash
+# Go to the components directory
+cd tts-voice-app/src/components
+# Create the component file
+touch MyComponent.vue
+```
+**Component template**:
+```vue
+<template>
+  <div class="my-component">
+    <!-- Template goes here -->
+  </div>
+</template>
+<script setup lang="ts">
+import { ref } from 'vue'
+// Component logic goes here
+const myValue = ref('')
+</script>
+<style scoped>
+.my-component {
+  /* Styles go here */
+}
+</style>
+```
+#### State Management
+The app uses the Vue Composition API with Pinia stores:
+```typescript
+// Create a new store in src/stores/myStore.ts
+import { defineStore } from 'pinia'
+export const useMyStore = defineStore('myStore', {
+  state: () => ({
+    // Typed explicitly (string is a placeholder element type);
+    // a bare [] would be inferred as never[]
+    items: [] as string[]
+  }),
+  getters: {
+    itemCount: (state) => state.items.length
+  },
+  actions: {
+    addItem(item: string) {
+      this.items.push(item)
+    }
+  }
+})
+```
+#### Debugging
+**Vue DevTools**:
+- Enabled automatically in development mode
+- Accessible through the browser DevTools panel
+**Electron DevTools**:
+```bash
+# Open DevTools on startup
+DEBUG_ELECTRON=true npm run dev
+```
+**Console logging**:
+```typescript
+// Main process log
+console.log('Main:', data)
+// Renderer process log
+console.log('Renderer:', data)
+// Check the logs in the terminal and in the DevTools console
+```
+#### Testing
+```bash
+# Run unit tests
+npm run test
+# Run with coverage
+npm run test:coverage
+# Run E2E tests
+npm run test:e2e
+# Watch mode
+npm run test:watch
+```
+### Project Structure
+```
+GPT-SoVITS/
+├── api_server/              # Backend API
+│   ├── app/
+│   │   ├── main.py         # FastAPI application
+│   │   ├── routes/         # API endpoints
+│   │   ├── services/       # Business logic
+│   │   ├── models/         # Data models
+│   │   └── utils/          # Utilities
+│   └── tests/              # Backend tests
+├── tts-voice-app/          # Frontend Electron app
+│   ├── src/
+│   │   ├── main/           # Electron main process
+│   │   ├── renderer/       # Vue UI
+│   │   ├── components/     # Vue components
+│   │   └── stores/         # State management
+│   └── dist/               # Build output
+├── GPT_SoVITS/             # Core ML models
+│   ├── pretrained_models/  # Base models
+│   └── text/               # Text processing
+└── .env                    # Configuration
+```
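+### Contribution Guidelines
+1. **Fork and clone the repository**
+2. **Create a feature branch**: `git checkout -b feature/my-feature`
+3. **Make your changes** and add tests
+4. **Run tests and linters**: `pytest && black . && isort .`
+5. **Commit your changes**: `git commit -m "feat: add my feature"`
+6. **Push to the branch**: `git push origin feature/my-feature`
+7. **Open a Pull Request** with a description
+**Commit message format**:
+- `feat:`: New feature
+- `fix:`: Bug fix
+- `docs:`: Documentation changes
+- `style:`: Code style changes
+- `refactor:`: Code refactoring
+- `test:`: Test changes
+- `chore:`: Build/tooling changes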
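The prefix convention above is easy to check mechanically, for instance from a local commit hook. The following is a minimal sketch and not a tool shipped with the project; `is_valid_commit_subject` is a hypothetical name. It assumes a subject line has the form `type: description`, with exactly one of the listed types and a non-empty description.

```python
import re

# Prefixes listed in the commit message format above.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
_SUBJECT_RE = re.compile(r"(?:%s): \S.*" % "|".join(COMMIT_TYPES))


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if `subject` is a single line of the form 'type: description'."""
    return _SUBJECT_RE.fullmatch(subject) is not None
```

For example, `is_valid_commit_subject("feat: add my feature")` accepts the sample commit shown in step 5, while a subject without a recognized prefix is rejected.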
+---
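+## Additional Resources
+### Documentation
+- **API documentation**: http://localhost:8000/docs
+- **Design document**: `frontend_design.md`
+- **Development guide**: `development.md`
+- **OpenAPI specification**: `openapi.json`
+### External Links
+- **GPT-SoVITS repository**: https://github.com/RVC-Boss/GPT-SoVITS
+- **ModelScope models**: https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained
+- **FastAPI documentation**: https://fastapi.tiangolo.com
+- **Vue 3 documentation**: https://cn.vuejs.org
+- **Electron documentation**: https://www.electronjs.org
+### Support
+For problems, questions, or feature requests:
+1. Check this document first
+2. Search the existing GitHub issues
+3. Open a new issue with a detailed description
+4. Include error messages, logs, and system information
+### License
+This project is licensed under the MIT License. See the `LICENSE` file for details.
+---
+**Last updated**: 2026-01-23
+**Version**: 1.0.0
+**Maintainers**: MoYoYo.tts development team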

development.md ADDED
