VocRT

This repository contains the complete codebase for building your personal Realtime Voice-to-Voice (V2V) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system effectively.


Repository Structure

├── backend/         # Express server for handling API requests
├── frontend/        # React client for user interaction
├── .env             # Environment variables (OpenAI API key, etc.)
├── voices/          # All available voices
├── demo/            # Contains sample audio and demo files
└── other...
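
To see which voices ship with the repository, you can simply list the voices directory. The short sketch below assumes each voice is stored as a separate file; the exact file format depends on how the repository packages its voices.

# list_voices.py -- list the voice files bundled in voices/ (run from the repo root)
from pathlib import Path

for voice in sorted(Path("voices").iterdir()):
    print(voice.name)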

Docker

🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt

Setup Guide

Step 1: Clone the Repository

Clone this repository to your local machine:

git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT

Step 2: Python Virtual Environment Setup

Create a virtual environment to manage dependencies:

macOS/Linux:

python3 -m venv venv
source venv/bin/activate

Windows:

python -m venv venv
venv\Scripts\activate

Step 3: Install Python Dependencies

With the virtual environment activated, install the required dependencies:

pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
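
To confirm the environment is ready before moving on, you can run a quick sanity check like the sketch below; it only verifies that each package imports (note that python-dotenv imports as dotenv and grpcio as grpc).

# check_deps.py -- verify that the Python dependencies import cleanly
import importlib

packages = [
    "torch", "torchaudio", "phonemizer", "transformers",
    "scipy", "munch", "dotenv", "openai", "grpc",
]

for name in packages:
    try:
        importlib.import_module(name)
        print(f"OK    {name}")
    except ImportError as exc:
        print(f"FAIL  {name}: {exc}")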

Installing eSpeak

eSpeak is a necessary dependency for the VocRT system. Follow the instructions below to install it on your platform:

Ubuntu/Linux

Use the apt-get package manager to install eSpeak:

sudo apt-get update
sudo apt-get install espeak

macOS

Install eSpeak using Homebrew:

  1. Ensure Homebrew is installed on your system:
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    
  2. Install espeak:
    brew install espeak
    

Windows

For Windows, follow these steps to install eSpeak:

  1. Download the eSpeak installer from the official website: eSpeak Downloads.
  2. Run the installer and follow the on-screen instructions to complete the installation.
  3. Add the eSpeak installation path to your system's PATH environment variable:
    • Open System Properties → Advanced → Environment Variables.
    • In the "System Variables" section, find the Path variable and edit it.
    • Add the path to the espeak.exe file (e.g., C:\Program Files (x86)\eSpeak).
  4. Verify the installation: Open Command Prompt and run:
    espeak --version
    

Verification

After installing eSpeak, verify it is correctly set up by running:

espeak "Hello, world!"

You should hear "Hello, world!" spoken aloud through your system's audio output.
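
Since VocRT reaches eSpeak through the phonemizer package installed in Step 3, it is also worth confirming that phonemizer can drive the eSpeak backend. A minimal sketch:

# check_espeak.py -- confirm that phonemizer finds the eSpeak backend
from phonemizer import phonemize

# If eSpeak is installed and on PATH, this prints the phonemes for the sentence.
print(phonemize("Hello, world!", language="en-us", backend="espeak"))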


Step 4: Backend Setup (Express Server)

  1. Navigate to the backend directory:

    cd backend
    
  2. Install Node.js dependencies:

    npm install
    
  3. Update the config.env file with your Deepgram API key:

    • Open config.env in a text editor.
    • Replace <deepgram_api_key> with your actual Deepgram API key (a quick check is sketched after this list).
  4. Start the Express server:

    node app.js
    
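If the backend fails to authenticate with Deepgram, a quick way to confirm that config.env carries a real key is the sketch below. It assumes the variable is named DEEPGRAM_API_KEY and is run from the repository root; adjust both to match your setup.

# check_backend_env.py -- confirm config.env contains a Deepgram key (names are assumptions)
from dotenv import dotenv_values

values = dotenv_values("backend/config.env")
key = values.get("DEEPGRAM_API_KEY")
if key and "<" not in key:
    print("Deepgram key found")
else:
    print("Deepgram key missing or still a placeholder")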

Step 5: Frontend Setup (React Client)

  1. Open a new terminal and navigate to the frontend directory:
    cd frontend
    
  2. Install client dependencies:
    npm install
    
  3. Start the client:
    npm start
    

Step 6: Start the VocRT Server

  1. Add your OpenAI API key to the .env file:

    • Open .env in a text editor.
    • Replace <openai_api_key> with your actual OpenAI API key (a quick verification sketch follows this step).
  2. Start the VocRT server:

    python3 app.py
    
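If the server complains about a missing key, the sketch below confirms that .env is being picked up. It assumes the key is stored as OPENAI_API_KEY, the variable name the openai Python client (v1+) reads by default.

# check_openai_env.py -- confirm the OpenAI key in .env is visible to the client
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # loads variables from .env in the current directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set in .env"

client = OpenAI()  # reads OPENAI_API_KEY from the environment
print("OpenAI client initialised")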

Step 7: Test the Full System

Once all servers are running:

  1. Access the React client at http://localhost:3000.
  2. Interact with the VocRT system via the web interface.

Model Used

VocRT uses Kokoro-82M for text-to-speech synthesis, processing user inputs into high-quality voice responses.
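
The synthesis code itself lives in this repository's Python sources. Purely as an illustration of handling its output, the sketch below writes a generated waveform to a WAV file with scipy, assuming a float array at Kokoro's 24 kHz output rate; the array name and conversion here are illustrative, not VocRT's actual code.

# save_tts_output.py -- illustrative sketch: write a synthesized waveform to disk
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 24_000  # Kokoro-82M generates audio at 24 kHz

def save_wav(waveform: np.ndarray, path: str = "output.wav") -> None:
    # Convert the float waveform in [-1, 1] to 16-bit PCM so the file plays everywhere.
    pcm = np.clip(waveform, -1.0, 1.0)
    wavfile.write(path, SAMPLE_RATE, (pcm * 32767).astype(np.int16))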


Key Features

  1. Realtime voice response generation: Converts spoken input into spoken responses with minimal latency.
  2. React Client: A user-friendly frontend for interaction.
  3. Express Backend: Handles API requests and integrates the VocRT system with external services.
  4. gRPC Communication: Seamless communication between the VocRT server and the other components (a minimal client sketch follows this list).
  5. Configurable APIs: Integrates with Deepgram for speech recognition and OpenAI for text generation.
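
As an illustration of the gRPC point above, the sketch below opens a channel with grpcio. The port, service, and stub names are hypothetical placeholders; the real definitions come from the .proto files compiled with grpcio-tools in this repository.

# grpc_client_sketch.py -- minimal sketch of a client connecting to the VocRT gRPC server
import grpc

# Hypothetical address -- replace with the port the VocRT server actually binds.
channel = grpc.insecure_channel("localhost:50051")

# With the generated modules (python -m grpc_tools.protoc ...), a stub would be built like:
#   stub = vocrt_pb2_grpc.VocRTStub(channel)                      # hypothetical names
#   reply = stub.Synthesize(vocrt_pb2.TextRequest(text="Hello"))  # hypothetical RPC
print("gRPC channel created:", channel)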

Dependencies

Python:

  • torch, torchvision, torchaudio
  • phonemizer
  • transformers
  • scipy
  • munch
  • python-dotenv
  • openai
  • grpcio, grpcio-tools
  • eSpeak (system dependency, installed separately; not a pip package)

Node.js:

  • Express server dependencies (npm install in backend).
  • React client dependencies (npm install in frontend).

Contributing

Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.


Acknowledgments

  • Hugging Face for hosting the Kokoro-82M model.
  • The amazing communities behind PyTorch, OpenAI, and Deepgram APIs.