Danh Tran

metadata
license: mit
title: Human voice to Robot voice
python_version: 3.11
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
emoji: 👀
colorFrom: blue
colorTo: red
short_description: This project builds upon work of Neil Lakin/robot_voice.

Robot Voice with Enhanced Features

Get Started:

This project aims to offer a more dynamic and customizable approach to robot voice synthesis. Explore the code, experiment with the features, and contribute to the ongoing development!

Enhanced Capabilities:

  • Real-time Recording: Capture your voice instantly and manipulate it with the same robotic effects applied to text input.
  • Virtual Audio Driver Connectivity: Seamlessly integrate with virtual audio drivers, allowing you to route the synthesized voice output to various applications and devices.
  • Expanded Presets: Utilize a wider variety of pre-configured voice settings to customize the robotic sound to your liking, with sample audio included for demonstration.

Project Foundation:

The core code of this project is based on Neil Lakin's original "robot_voice" project. For a deeper understanding of the underlying concepts and techniques, I encourage you to explore his repository:

Neil Lakin's "robot_voice" repository

How to Use:

1. Audio Files

  • Place your audio files (e.g., .wav, .mp3) in the audios/audios/input_audios folder.
  • The processed robotic voice versions of your files will be saved in the audios/audios/output_audios folder with the same filenames.
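The folder workflow above can be sketched with the Python standard library alone. This is a minimal sketch, not the project's code: `process_folder` and the `effect` callable are hypothetical names, only 16-bit PCM .wav files are handled (.mp3 input would need a separate decoder), and the effect receives interleaved samples for multi-channel files.

```python
import array
import sys
import wave
from pathlib import Path
from typing import Callable

# Hypothetical effect signature: 16-bit samples in, 16-bit samples out.
Effect = Callable[[array.array], array.array]


def process_folder(input_dir: Path, output_dir: Path, effect: Effect) -> list[Path]:
    """Apply `effect` to every 16-bit PCM .wav file in input_dir and save the
    result under the same filename in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(input_dir.glob("*.wav")):
        with wave.open(str(src), "rb") as reader:
            params = reader.getparams()
            if params.sampwidth != 2:
                continue  # this sketch only handles 16-bit PCM
            samples = array.array("h", reader.readframes(params.nframes))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV PCM data is little-endian
        processed = effect(samples)
        if sys.byteorder == "big":
            processed.byteswap()  # back to little-endian for writing
        dst = output_dir / src.name  # same filename as the input
        with wave.open(str(dst), "wb") as writer:
            writer.setparams(params)
            writer.writeframes(processed.tobytes())
        written.append(dst)
    return written
```

Called as `process_folder(Path("audios/audios/input_audios"), Path("audios/audios/output_audios"), effect=...)`, it mirrors the input/output folder convention described above.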

2. Configuration

  • Configuration file: All settings are stored in config_folder/config.yaml. Set ASK_NEW_CONFIG to true to have the program prompt for configuration, or to false to skip the prompt and use the values already in the file.

  • Keyboard shortcuts: Use the following keys to control the program:

    • s: Start recording
    • q: Stop recording
    • e: Stream Robot Voice to microphone
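The ASK_NEW_CONFIG switch described above could work roughly as follows. This sketch assumes config.yaml has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that the parameter keys are named as in section 4.1; `resolve_settings` and `DEFAULTS` are hypothetical names, and the real prompt flow may differ.

```python
from typing import Callable

# Assumed parameter names and defaults, taken from the table in section 4.1.
DEFAULTS = {"VB": 0.2, "VL": 0.4, "H": 4, "LOOKUP_SAMPLES": 1024, "MOD_F": 50}


def resolve_settings(config: dict, ask: Callable[[str], str] = input) -> dict:
    """Return the effective parameters for a parsed config.yaml.

    When ASK_NEW_CONFIG is true, prompt for each parameter; an empty answer
    keeps the current value. Otherwise return the file's values, falling
    back to the defaults for any missing key.
    """
    # Start from the defaults, then apply any known keys found in the file.
    settings = {**DEFAULTS, **{k: v for k, v in config.items() if k in DEFAULTS}}
    if not config.get("ASK_NEW_CONFIG", False):
        return settings
    for name, current in list(settings.items()):
        answer = ask(f"{name} [{current}]: ").strip()
        if answer:
            # Convert to the default's type (int or float).
            settings[name] = type(DEFAULTS[name])(answer)
    return settings
```

Injecting `ask` instead of calling `input()` directly keeps the prompt logic testable without a terminal.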

3. Real-time Recording

  • Record your voice using the provided interface.
    • With write_to_file=True, your recording is saved to audios/records/input_records/mic.wav and its robotic version to audios/records/output_records/mic.wav.
    • With write_to_file=False, only the robotic version of your recording is saved, to audios/records/output_records/mic.wav.
  • Config
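The two write_to_file cases can be summarized as a small function. `record_outputs` is a hypothetical helper, not part of the project; it simply returns the files written for one recording according to the paths listed above.

```python
from pathlib import Path

RECORDS = Path("audios/records")


def record_outputs(write_to_file: bool) -> list[Path]:
    """Files written for one recording, following the behaviour listed above."""
    paths = []
    if write_to_file:
        # The raw microphone take is kept only when write_to_file is True.
        paths.append(RECORDS / "input_records" / "mic.wav")
    # The robotic version is saved in both cases.
    paths.append(RECORDS / "output_records" / "mic.wav")
    return paths
```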

4. Parameters for Adjusting

This section explains the parameters you can use to fine-tune the sound of the generated speech.

4.1. Parameter Short Explanation and Default Values

| Parameter | Default | Description |
|---|---|---|
| VB | 0.2 | Diode forward bias voltage: the level below which the signal is suppressed. Higher values result in a shorter voice that might be difficult to hear clearly. |
| VL | 0.4 | Voltage at which the diode response changes from curved to linear. Higher values result in a shorter voice that might be difficult to hear clearly. |
| H | 4 | Slope of the linear section of the diode model's response, which applies once the voltage exceeds VL. |
| LOOKUP_SAMPLES | 1024 | Size of the lookup table used for sound synthesis. Lower values can introduce noise. |
| MOD_F | 50 | Modulation frequency in Hz, which shapes the "robotic" effect. Higher values produce a more robotic sound. |

Important Notes:

  • Each parameter has a default value. You can adjust them to fine-tune the sound to your liking.
  • VB and VL are set independently, but they must not be equal: VL must be greater than VB, because the diode model's curved section divides by 2·(VL − VB).
  • VB and VL must be below 1.0.
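The constraints listed above can be enforced when the parameters are created. This is a minimal sketch with a hypothetical `RingModParams` dataclass: the VB/VL checks come from the notes above, while the LOOKUP_SAMPLES and MOD_F checks are extra assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingModParams:
    """Hypothetical container for the parameters in the table above."""
    VB: float = 0.2
    VL: float = 0.4
    H: float = 4.0
    LOOKUP_SAMPLES: int = 1024
    MOD_F: float = 50.0

    def __post_init__(self) -> None:
        if not (self.VB < 1.0 and self.VL < 1.0):
            raise ValueError("VB and VL must be below 1.0")
        if self.VB >= self.VL:
            # The curved section divides by 2 * (VL - VB).
            raise ValueError("VL must be greater than VB")
        if self.LOOKUP_SAMPLES < 2:
            # Assumption: interpolation needs at least two table entries.
            raise ValueError("LOOKUP_SAMPLES must be at least 2")
        if self.MOD_F <= 0:
            # Assumption: the modulating frequency must be positive.
            raise ValueError("MOD_F must be a positive frequency in Hz")
```

Because the checks run in `__post_init__`, an invalid combination such as `RingModParams(VB=0.4, VL=0.4)` fails immediately instead of producing a division by zero later.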

4.2. Parameter Detailed Explanation

4.2.1. Explaining the Parameters in the Diode-Based Ring Modulator Model

These parameters come from a digital model of a diode-based ring modulator, described in Julian Parker's article "A Simple Digital Model of the Diode-Based Ring-Modulator" (DAFx-11). The model aims to recreate the distinctive sound of an analog diode-based ring modulator, which is characterized by its non-linear behavior and added harmonics.

Here's a breakdown of each parameter and its role in the model:

  • VB (Default: 0.2): Diode Forward Bias Voltage

    • This parameter emulates the forward bias voltage of a diode, controlling the point at which the diode starts conducting.
    • It essentially sets the threshold for the signal to pass through the diode model.
    • The paper mentions using values of 0.2 and 0.4, and it's important to keep this value below 1.
  • VL (Default: 0.4): Transition to Linear Behavior

    • This parameter determines the voltage at which the diode model transitions from a curved response to a linear response.
    • Beyond this voltage, the output of the diode model rises linearly with the input, with slope H.
    • Like VB, VL should be kept below 1, and the paper uses 0.4 as a value.
  • H (Default: 4): Slope of Linear Section

    • This parameter controls the slope of the linear section of the diode model's response, which kicks in after the voltage exceeds VL.
    • A higher H value means a steeper slope, leading to a more pronounced effect of the diode's non-linearity.
    • By adjusting H, you can influence the overall distortion characteristics of the ring modulator.
  • LOOKUP_SAMPLES (Default: 1024): Size of Lookup Table

    • This parameter determines the number of samples used in a lookup table that represents the diode's non-linearity.
    • A larger table can potentially provide a more accurate representation of the non-linearity, but it comes at the cost of increased memory usage.
    • The article recommends leaving this value as is unless you have specific reasons to change it.
  • MOD_F (Default: 50): Modulating Frequency

    • This parameter sets the frequency of the modulating signal in Hertz (Hz).
    • In ring modulation, the modulating signal multiplies the carrier signal, creating sum and difference frequencies.
    • MOD_F determines the frequency of one of the input signals to the ring modulator.
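The roles of VB, VL, and H are easiest to see in the diode characteristic itself. The piecewise function below follows the diode model in Parker's paper: zero up to VB, quadratic between VB and VL, and linear with slope H above VL. Note that H multiplies every term, so it also scales the curved section. The lookup-table part is a sketch under stated assumptions: the table range of [-1, 1], linear interpolation, and the names `build_table` and `diode_lookup` are illustrative choices, and robot_voice may build its table differently.

```python
VB, VL, H = 0.2, 0.4, 4.0  # defaults from the parameter table


def diode(v: float) -> float:
    """Piecewise diode characteristic from Parker's ring-modulator model."""
    if v <= VB:
        return 0.0  # at or below VB the diode does not conduct
    if v <= VL:
        # Curved (quadratic) region between VB and VL.
        return H * (v - VB) ** 2 / (2 * VL - 2 * VB)
    # Linear region above VL: slope H, offset so the two pieces join smoothly.
    return H * v - H * VL + H * (VL - VB) ** 2 / (2 * VL - 2 * VB)


def build_table(n: int, v_max: float = 1.0) -> list[float]:
    """Sample diode() at n evenly spaced points on [-v_max, v_max]."""
    step = 2 * v_max / (n - 1)
    return [diode(-v_max + i * step) for i in range(n)]


def diode_lookup(v: float, table: list[float], v_max: float = 1.0) -> float:
    """Approximate diode(v) by linear interpolation between table entries.

    Inputs outside [-v_max, v_max] are clamped to the table's range.
    """
    n = len(table)
    clamped = min(max(v, -v_max), v_max)
    pos = (clamped + v_max) / (2 * v_max) * (n - 1)  # fractional index
    i = min(int(pos), n - 2)
    frac = pos - i
    return table[i] * (1 - frac) + table[i + 1] * frac
```

Comparing `diode_lookup` against `diode` for different table sizes shows the LOOKUP_SAMPLES trade-off directly: a 16-entry table has visibly larger interpolation error than the default 1024 entries.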

Understanding these parameters helps you to control and manipulate the sound of the digital diode-based ring modulator. You can experiment with different values to achieve a wide range of timbres, from subtle saturation to harsher, more distorted sounds.

4.2.2. Exploring the Impact of High and Low Parameter Values

Adjusting the parameters in the digital model of a diode-based ring modulator influences the sound by altering the behavior of the simulated diodes. Here's a breakdown of what happens when these parameters are set to high or low values:

  • VB (Diode Forward Bias Voltage)

    • High VB: With a higher forward bias voltage, the input must reach a higher level before the diode starts conducting. A larger portion of the input signal therefore falls below the threshold and is suppressed, resulting in a more pronounced distortion effect.
    • Low VB: A lower VB allows the diode to conduct at lower signal levels. This leads to less clipping and a more subtle, less distorted sound. The signal will pass through with less alteration.
  • VL (Transition to Linear Behavior)

    • High VL: Increasing VL expands the curved, non-linear region of the diode's response. This enhances the initial distortion and coloration before the signal reaches the linear section.
    • Low VL: A lower VL means a shorter curved region. The diode's response becomes linear more quickly, resulting in less initial distortion and a faster transition to a cleaner output.
  • H (Slope of Linear Section)

    • High H: A steeper slope in the linear section amplifies the differences between the input and output signals. This leads to a more significant change in the signal's overall shape, emphasizing the harmonic content introduced by the diode's non-linearity and creating a more distorted sound.
    • Low H: A gentler slope reduces the difference between the input and output signals in the linear region. The effect of the non-linearity becomes less pronounced, resulting in less distortion.
  • LOOKUP_SAMPLES (Size of Lookup Table)

    • High LOOKUP_SAMPLES: While increasing the lookup table size can potentially improve the accuracy of the diode non-linearity representation, it might not significantly impact the perceived sound. It also increases memory usage.
    • Low LOOKUP_SAMPLES: A smaller lookup table might introduce some inaccuracies in the representation of the diode non-linearity, potentially leading to a slightly less accurate emulation of the analog behavior.
  • MOD_F (Modulating Frequency)

    • High MOD_F: Increasing the modulating frequency shifts the sum and difference frequencies generated by the ring modulation process higher in the frequency spectrum. This can create a brighter, more metallic or clangorous sound.
    • Low MOD_F: A lower modulating frequency results in lower sum and difference frequencies, leading to a darker, muddier, or more subtle ring modulation effect.

Remember that the perception of "high" and "low" for these parameters depends on the context of the specific sounds you are working with. Experimentation is key to finding the sweet spots that create the desired sonic results.
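To try high and low settings by ear, the diode function can be wrapped in a four-diode ring. This sketch assumes a sine carrier at MOD_F and that half the input is added to and subtracted from the carrier before the diode pairs; that scaling, the function name `ring_modulate`, and the pure-Python loop are illustrative choices, not necessarily what this project does. A useful sanity check of the balanced structure is that the output is zero whenever either the input or the carrier is zero.

```python
import math

VB, VL, H = 0.2, 0.4, 4.0
MOD_F = 50.0  # modulating frequency in Hz


def diode(v: float) -> float:
    # Same piecewise diode model as in section 4.2.1.
    if v <= VB:
        return 0.0
    if v <= VL:
        return H * (v - VB) ** 2 / (2 * VL - 2 * VB)
    return H * v - H * VL + H * (VL - VB) ** 2 / (2 * VL - 2 * VB)


def ring_modulate(signal: list[float], sample_rate: int, mod_f: float = MOD_F) -> list[float]:
    """Diode-ring modulation of `signal` by a sine carrier at mod_f Hz."""
    out = []
    for n, x in enumerate(signal):
        carrier = math.sin(2 * math.pi * mod_f * n / sample_rate)
        # Assumed scaling: half the input is added to / subtracted from the carrier.
        v1 = carrier + 0.5 * x
        v2 = carrier - 0.5 * x
        # Each diode pair conducts on opposite polarities; the pairs are subtracted.
        out.append(diode(v1) + diode(-v1) - diode(v2) - diode(-v2))
    return out
```

Running the same recording through `ring_modulate` with, for example, `mod_f=30` and `mod_f=100` gives a quick comparison of the low and high MOD_F descriptions above.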

4.3. Preset Styles

5. Connect to Virtual Audio Cable

  • Download and install the VB-Audio CABLE driver from https://vb-audio.com/Cable/.

  • This flowchart illustrates the audio routing for real-time distortion processing using VB-Audio CABLE.

graph LR
A[Recorded Audio] --> B(Robot Voice Processing);
B --> C(CABLE Input);
C --> D{VB-Audio Virtual Cable};
D --> E(CABLE Output);
E --> F[Distorted Audio Output];

The process flows as follows:

  • Recorded Audio: Audio from a recording source (e.g., a microphone) is captured by the program.

  • Robot Voice Processing: The program applies the robot voice (ring-modulator) effect.

  • CABLE Input: The processed audio is played to CABLE Input, which VB-Audio CABLE registers as a playback (output) device.

  • VB-Audio Virtual Cable: The driver forwards everything sent to CABLE Input to CABLE Output unchanged; it does not process the audio itself.

  • CABLE Output: This appears as a recording (microphone) device, so other applications can select it as their microphone.

  • Distorted Audio Output: The target application (e.g., voice chat or recording software) receives the robot voice.

  • Note: CABLE Output, VB-Audio Virtual Cable, and CABLE Input can be replaced by Input, Other Microphone, and Output.

Note

This readme assumes the project structure includes folders like audios, records, input_audios, output_audios, etc. If your project uses a different folder structure, adjust the paths accordingly.