---
title: Expressive TTS Arena
emoji: 🎤
colorFrom: indigo
colorTo: pink
sdk: docker
app_file: src/main.py
python_version: "3.11"
pinned: true
license: mit
---

<div align="center">
  <img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">
  <h1>Expressive TTS Arena</h1>
  <p>
    <strong>
      A web application for comparing and evaluating the expressiveness of different text-to-speech models
    </strong>
  </p>
</div>

## Overview

Expressive TTS Arena is an open-source web application for evaluating the expressiveness of voice generation and speech synthesis from different text-to-speech providers.

For support or to join the conversation, visit our [Discord](https://discord.com/invite/humeai).

## Prerequisites

- [Python >=3.11.11](https://www.python.org/downloads/)
- [pip >=25.0](https://pypi.org/project/pip/)
- [uv >=0.5.29](https://github.com/astral-sh/uv)
- [Postgres](https://www.postgresql.org/download/)
- API keys for Hume AI, Anthropic, OpenAI, and ElevenLabs
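
As a quick sanity check, the version requirements above can be verified with a short script. This is a minimal sketch and not part of the repository: the helper names `parse_version`, `meets_minimum`, and `tool_version` are hypothetical, and it assumes that `pip --version` and `uv --version` print the version as the first dotted number in their output. It does not check Postgres or the API keys.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Minimum versions copied from the Prerequisites list.
MINIMUMS = {"python": (3, 11, 11), "pip": (25, 0), "uv": (0, 5, 29)}


def parse_version(text: str) -> Version:
    """Extract the first dotted version number found in `text`."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found: Version, minimum: Version) -> bool:
    """Compare versions component-wise, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    found = found + (0,) * (width - len(found))
    minimum = minimum + (0,) * (width - len(minimum))
    return found >= minimum


def tool_version(command: str) -> Optional[Version]:
    """Return the version reported by `<command> --version`, or None if not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


if __name__ == "__main__":
    found = {
        "python": tuple(sys.version_info[:3]),
        "pip": tool_version("pip"),
        "uv": tool_version("uv"),
    }
    for name, minimum in MINIMUMS.items():
        version = found[name]
        ok = version is not None and meets_minimum(version, minimum)
        shown = ".".join(map(str, version)) if version else "not found"
        required = ".".join(map(str, minimum))
        print(f"{name}: {shown} (requires >={required}) {'OK' if ok else 'MISSING/TOO OLD'}")
```

Run it with the interpreter you intend to use for the project, since the Python check reports the version of whichever interpreter executes the script.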

## Project Structure

```
Expressive TTS Arena/
├── public/
├── src/
│   ├── common/
│   │   ├── __init__.py
│   │   ├── common_types.py        # Application-wide custom type aliases and definitions.
│   │   ├── config.py              # Manages application config (Singleton) loaded from env vars.
│   │   ├── constants.py           # Application-wide constant values.
│   │   └── utils.py               # General-purpose utility functions used across modules.
│   ├── core/
│   │   ├── __init__.py
│   │   ├── tts_service.py         # Service handling Text-to-Speech provider selection and API calls.
│   │   └── voting_service.py      # Service managing database operations for votes and leaderboards.
│   ├── database/                  # Database access layer using SQLAlchemy.
│   │   ├── __init__.py
│   │   ├── crud.py                # Data Access Objects (DAO) / CRUD operations for database models.
│   │   ├── database.py            # Database connection setup (engine, session management).
│   │   └── models.py              # SQLAlchemy ORM models defining database tables.
│   ├── frontend/
│   │   ├── components/
│   │   │   ├── __init__.py
│   │   │   ├── arena.py           # UI definition and logic for the 'Arena' tab.
│   │   │   └── leaderboard.py     # UI definition and logic for the 'Leaderboard' tab.
│   │   ├── __init__.py
│   │   └── frontend.py            # Main Gradio application class; orchestrates UI components and layout.
│   ├── integrations/              # Modules for interacting with external third-party APIs.
│   │   ├── __init__.py
│   │   ├── anthropic_api.py       # Integration logic for the Anthropic API.
│   │   ├── elevenlabs_api.py      # Integration logic for the ElevenLabs API.
│   │   └── hume_api.py            # Integration logic for the Hume API.
│   ├── middleware/
│   │   ├── __init__.py
│   │   └── meta_tag_injection.py  # Middleware for injecting custom HTML meta tags into the Gradio page.
│   ├── scripts/
│   │   ├── __init__.py
│   │   ├── init_db.py             # Script to create database tables based on models.
│   │   └── test_db.py             # Script for testing the database connection configuration.
│   ├── __init__.py
│   └── main.py                    # Main script to configure and run the Gradio application.
├── static/
│   ├── audio/                     # Temporary storage for generated audio files served to the UI.
│   └── css/
│       └── styles.css             # Custom CSS overrides and styling for the Gradio UI.
├── .dockerignore
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
└── uv.lock
```

## Installation

1. This project uses the [uv](https://docs.astral.sh/uv/) package manager. Follow the [installation instructions](https://docs.astral.sh/uv/getting-started/installation/) for your platform.

2. Configure environment variables:
   - Create a `.env` file based on `.env.example`.
   - Add your API keys:

     ```txt
     HUME_API_KEY=YOUR_HUME_API_KEY
     ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
     ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
     OPENAI_API_KEY=YOUR_OPENAI_API_KEY
     ```

3. Run the application:

   Standard

   ```sh
   uv run python -m src.main
   ```

   With hot-reloading

   ```sh
   uv run watchfiles "python -m src.main" src
   ```

4. Test the application by navigating to the localhost URL in your browser (e.g. `localhost:7860` or `http://127.0.0.1:7860`).

5. (Optional) If contributing, install the pre-commit hooks for automatic linting, formatting, and type-checking:

   ```sh
   uv run pre-commit install
   ```
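
Before running the application in step 3, it can help to confirm that the keys from step 2 are actually filled in. The following is a minimal sketch, not code from this repository: the `parse_env_file` and `missing_keys` helpers are hypothetical, the parser assumes simple `KEY=VALUE` lines, and values that still start with `YOUR_` are treated as unfilled placeholders.

```python
from pathlib import Path

# Key names taken from the .env snippet in step 2.
REQUIRED_KEYS = (
    "HUME_API_KEY",
    "ANTHROPIC_API_KEY",
    "ELEVENLABS_API_KEY",
    "OPENAI_API_KEY",
)


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("YOUR_")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env_file(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(values)
    if missing:
        print("Missing or placeholder keys: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Run the script from the project root so that it finds the `.env` file. It only inspects the file's contents and does not verify that the keys are valid with each provider.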

## User Flow

1. Select a sample character or enter a custom character description, then click **"Generate Text"** to generate your text input.
2. Click the **"Synthesize Speech"** button to synthesize two TTS outputs based on your text and character description.
3. Listen to both audio samples to compare their expressiveness.
4. Vote for the more expressive result by clicking either **"Select Option A"** or **"Select Option B"**.

## License

This project is licensed under the MIT License. See [LICENSE.txt](LICENSE.txt) for details.