license: mit
title: Human voice to Robot voice
python_version: 3.11
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
emoji: 👀
colorFrom: blue
colorTo: red
short_description: This project builds upon work of Neil Lakin/robot_voice.
Robot Voice with Enhanced Features
This project is based on the paper "A Simple Digital Model of the Diode-Based Ring-Modulator," which can be found at http://recherche.ircam.fr/pub/dafx11/Papers/66_e.pdf.
This project builds upon Neil Lakin's robot_voice project, adding several new features:
Enhanced Capabilities:
- Real-time Recording: Capture your voice instantly and manipulate it with the same robotic effects applied to text input.
- Virtual Audio Driver Connectivity: Seamlessly integrate with virtual audio drivers, allowing you to route the synthesized voice output to various applications and devices.
- Expanded Presets: Utilize a wider variety of pre-configured voice settings to customize the robotic sound to your liking, with sample audio included for demonstration.
Get Started:
This project aims to offer a more dynamic and customizable approach to robot voice synthesis. Explore the code, experiment with the features, and contribute to the ongoing development!
Project Foundation:
The core code of this project is based on Neil Lakin's original "robot_voice" project. For a deeper understanding of the underlying concepts and techniques, I encourage you to explore his repository:
Neil Lakin's "robot_voice" repository
How to Use:
1. Audio Files
- Place your audio files (e.g., .wav, .mp3) in the `audios/audios/input_audios` folder.
- The processed robotic voice versions of your files will be saved in the `audios/audios/output_audios` folder with the same filenames.
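The folder convention above is a one-to-one mapping from each input file to an output file with the same name. The loop below is a minimal sketch of that mapping, assuming a hypothetical `robotize(src, dst)` callable that stands in for the project's actual processing step; `process_folder` and `AUDIO_SUFFIXES` are illustrative names, not part of the project.

```python
from pathlib import Path
from typing import Callable

# Assumption: only these extensions are treated as audio input.
AUDIO_SUFFIXES = {".wav", ".mp3"}


def process_folder(
    input_dir: Path,
    output_dir: Path,
    robotize: Callable[[Path, Path], None],  # hypothetical processing function
) -> list[Path]:
    """Apply `robotize` to every audio file, keeping the original filename."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(input_dir.iterdir()):
        if src.is_file() and src.suffix.lower() in AUDIO_SUFFIXES:
            dst = output_dir / src.name  # same filename in the output folder
            robotize(src, dst)
            written.append(dst)
    return written
```

For example, `process_folder(Path("audios/audios/input_audios"), Path("audios/audios/output_audios"), robotize)` would write each result under its original filename in the output folder.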
2. Configuration
All configuration settings are stored in `config_folder/config.yaml`. You can control whether the program prompts for configuration by setting `ASK_NEW_CONFIG` to `true` or `false`. If it is set to `false`, the program will not ask for configuration.
Keyboard Shortcuts
Use the following keyboard shortcuts to control the program:
- `s`: Start recording
- `q`: Stop recording
- `e`: Stream Robot Voice to microphone
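The `ASK_NEW_CONFIG` switch can be read as a simple gate in front of the stored settings. A minimal sketch, assuming `config.yaml` has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`); `resolve_config` and the `ask` callback are hypothetical names, not the project's code:

```python
from typing import Any, Callable


def resolve_config(config: dict[str, Any], ask: Callable[[str, Any], Any]) -> dict[str, Any]:
    """Return the settings, prompting for each one only when ASK_NEW_CONFIG is true."""
    if not config.get("ASK_NEW_CONFIG", False):
        return dict(config)  # keep the stored values without prompting
    updated = dict(config)
    for key, current in config.items():
        if key == "ASK_NEW_CONFIG":
            continue  # the switch itself is not prompted for
        # `ask` could wrap input() and show the current value as the default.
        updated[key] = ask(key, current)
    return updated
```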
3. Real-time Recording
- Record your voice using the provided interface.
- If `write_to_file=True`, your recordings will be saved in `audios/records/input_records/mic.wav`, and the robotic versions of your recordings will be saved in `audios/records/output_records/mic.wav`.
- If `write_to_file=False`, the robotic versions of your recordings will be saved in `audios/records/output_records/mic.wav`.
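The two `write_to_file` cases can be sketched with the standard-library `wave` module. This is a minimal illustration, not the project's code: it assumes the recording and its robotic version are lists of float samples in [-1, 1], assumes a 16 kHz, mono, 16-bit format, and assumes the raw recording is kept only when `write_to_file=True`. The function names and the `base` parameter are hypothetical.

```python
import array
import sys
import wave
from pathlib import Path


def save_mono_wav(path: Path, samples: list[float], rate: int = 16000) -> None:
    """Write float samples in [-1, 1] to a 16-bit mono WAV file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Clamp each sample and scale it to the signed 16-bit range.
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())


def save_recording(
    raw: list[float],
    robotic: list[float],
    write_to_file: bool,
    base: Path = Path("."),
    rate: int = 16000,
) -> None:
    """Save the robotic version, plus the raw recording when write_to_file is true."""
    if write_to_file:
        save_mono_wav(base / "audios/records/input_records/mic.wav", raw, rate)
    save_mono_wav(base / "audios/records/output_records/mic.wav", robotic, rate)
```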
4. Parameters for Adjusting
This section explains the parameters you can use to fine-tune the sound of the generated speech.
4.1. Parameter Short Explanation and Default Values
| Parameter | Default Value | Description |
|---|---|---|
| VB | 0.2 | Controls the volume of the bright part of the speech. Higher values result in a shorter voice that might be difficult to hear clearly. |
| VL | 0.4 | Controls the volume of the low part of the speech. Higher values result in a shorter voice that might be difficult to hear clearly. |
| H | 4 | Controls the slope of the linear section of the diode model's response, which kicks in after the voltage exceeds VL. |
| LOOKUP_SAMPLES | 1024 | Determines the size of the lookup table used for sound synthesis. Lower values can introduce noise. |
| MOD_F | 50 (Hz) | Controls the modulation frequency, which influences the "robotic" effect. Higher values produce a more robotic sound. |
Important Notes:
- Each parameter has a default value. You can adjust them to fine-tune the sound to your liking.
- `VB` and `VL` are set independently, but they must not have the same value, and both must be below 1.0.
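These constraints can be checked before processing starts. A minimal sketch with hypothetical names (`DEFAULTS`, `validate_params`), assuming the parameters arrive as a dictionary keyed by the names in the table above; the ordering check `VB < VL` is an added assumption based on the paper's model, in which `VB` is the lower of the two thresholds.

```python
# Default values from the parameter table.
DEFAULTS = {"VB": 0.2, "VL": 0.4, "H": 4, "LOOKUP_SAMPLES": 1024, "MOD_F": 50}


def validate_params(params: dict) -> dict:
    """Merge params over the defaults and check the VB/VL constraints."""
    p = {**DEFAULTS, **params}
    if p["VB"] >= 1.0 or p["VL"] >= 1.0:
        raise ValueError("VB and VL must be below 1.0")
    if p["VB"] == p["VL"]:
        raise ValueError("VB and VL must not have the same value")
    # Assumption: the diode model treats VB as the lower threshold, so VB < VL.
    if p["VB"] > p["VL"]:
        raise ValueError("VB should be smaller than VL")
    return p
```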
4.2. Detailed Parameter Explanation
4.2.1. Explaining the Parameters in the Diode-Based Ring Modulator Model
These parameters are used in a digital model of a diode-based ring modulator, as described in the paper "A Simple Digital Model of the Diode-Based Ring-Modulator". This model aims to recreate the distinctive sound of an analog diode-based ring modulator, which is characterized by its non-linear behavior and added harmonics.
Here's a breakdown of each parameter and its role in the model:
- `VB` (Default: 0.2): Diode Forward Bias Voltage
  - This parameter emulates the forward bias voltage of a diode, controlling the point at which the diode starts conducting.
  - It essentially sets the threshold for the signal to pass through the diode model.
  - The paper mentions using values of 0.2 and 0.4, and it's important to keep this value below 1.
- `VL` (Default: 0.4): Transition to Linear Behavior
  - This parameter determines the voltage at which the diode model transitions from a curved response to a linear response.
  - Beyond this voltage, the output of the diode model increases linearly with the input.
  - Like `VB`, `VL` should be kept below 1, and the paper uses 0.4 as a value.
- `H` (Default: 4): Slope of Linear Section
  - This parameter controls the slope of the linear section of the diode model's response, which kicks in after the voltage exceeds `VL`.
  - A higher `H` value means a steeper slope, leading to a more pronounced effect of the diode's non-linearity.
  - By adjusting `H`, you can influence the overall distortion characteristics of the ring modulator.
- `LOOKUP_SAMPLES` (Default: 1024): Size of Lookup Table
  - This parameter determines the number of samples used in a lookup table that represents the diode's non-linearity.
  - A larger table can potentially provide a more accurate representation of the non-linearity, but it comes at the cost of increased memory usage.
  - The article recommends leaving this value as is unless you have specific reasons to change it.
- `MOD_F` (Default: 50): Modulating Frequency
  - This parameter sets the frequency of the modulating signal in Hertz (Hz).
  - In ring modulation, the modulating signal multiplies the carrier signal, creating sum and difference frequencies.
  - `MOD_F` determines the frequency of one of the input signals to the ring modulator.
Understanding these parameters helps you to control and manipulate the sound of the digital diode-based ring modulator. You can experiment with different values to achieve a wide range of timbres, from subtle saturation to harsher, more distorted sounds.
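To make the roles of `VB`, `VL`, `H`, and `LOOKUP_SAMPLES` concrete, the sketch below transcribes the piecewise diode curve from the paper: zero output up to `VB`, a quadratic curve between `VB` and `VL`, and a straight line of slope `H` above `VL`, offset so the pieces join smoothly. It then samples the curve into a table over [-1, 1]. The lowercase names `vb`, `vl`, `h` stand for `VB`, `VL`, `H`; the function names and the nearest-entry lookup are illustrative assumptions, and the project's own implementation may differ.

```python
def diode(v: float, vb: float = 0.2, vl: float = 0.4, h: float = 4.0) -> float:
    """Piecewise diode curve: off below vb, quadratic up to vl, linear above vl."""
    if v <= vb:
        return 0.0  # diode not conducting
    if v <= vl:
        return h * (v - vb) ** 2 / (2 * vl - 2 * vb)  # curved section
    # Linear section with slope h, offset so it meets the curve smoothly at vl.
    return h * v - h * vl + h * (vl - vb) ** 2 / (2 * vl - 2 * vb)


def diode_lookup(n: int = 1024, **params: float) -> list[float]:
    """Sample the diode curve at n (>= 2) evenly spaced points on [-1, 1]."""
    return [diode(-1.0 + 2.0 * i / (n - 1), **params) for i in range(n)]


def lookup(table: list[float], v: float) -> float:
    """Nearest-entry lookup for an input clamped to [-1, 1]."""
    n = len(table)
    v = max(-1.0, min(1.0, v))
    return table[round((v + 1.0) / 2.0 * (n - 1))]
```

With the defaults, `diode(0.3)` is about 0.1 and `diode(0.5)` is about 0.8. In this form `h` multiplies every branch of the curve; a smaller table makes the nearest-entry lookup coarser, which linear interpolation between entries would smooth out.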
4.2.2. Exploring the Impact of High and Low Parameter Values
Adjusting the parameters in the digital model of a diode-based ring modulator influences the sound by altering the behavior of the simulated diodes. Here's a breakdown of what happens when these parameters are set to high or low values:
- `VB` (Diode Forward Bias Voltage)
  - High `VB`: With a higher forward bias voltage, the signal must reach a higher level before the diode starts conducting. This means a larger portion of the input signal will fall below the threshold and be effectively clipped, resulting in a more pronounced distortion effect.
  - Low `VB`: A lower `VB` allows the diode to conduct at lower signal levels. This leads to less clipping and a more subtle, less distorted sound. The signal will pass through with less alteration.
- `VL` (Transition to Linear Behavior)
  - High `VL`: Increasing `VL` expands the curved, non-linear region of the diode's response. This enhances the initial distortion and coloration before the signal reaches the linear section.
  - Low `VL`: A lower `VL` means a shorter curved region. The diode's response becomes linear more quickly, resulting in less initial distortion and a faster transition to a cleaner output.
- `H` (Slope of Linear Section)
  - High `H`: A steeper slope in the linear section amplifies the differences between the input and output signals. This leads to a more significant change in the signal's overall shape, emphasizing the harmonic content introduced by the diode's non-linearity and creating a more distorted sound.
  - Low `H`: A gentler slope reduces the difference between the input and output signals in the linear region. The effect of the non-linearity becomes less pronounced, resulting in less distortion.
- `LOOKUP_SAMPLES` (Size of Lookup Table)
  - High `LOOKUP_SAMPLES`: While increasing the lookup table size can potentially improve the accuracy of the diode non-linearity representation, it might not significantly impact the perceived sound. It also increases memory usage.
  - Low `LOOKUP_SAMPLES`: A smaller lookup table might introduce some inaccuracies in the representation of the diode non-linearity, potentially leading to a slightly less accurate emulation of the analog behavior.
- `MOD_F` (Modulating Frequency)
  - High `MOD_F`: Increasing the modulating frequency shifts the sum and difference frequencies generated by the ring modulation process higher in the frequency spectrum. This can create a brighter, more metallic or clangorous sound.
  - Low `MOD_F`: A lower modulating frequency results in lower sum and difference frequencies, leading to a darker, muddier, or more subtle ring modulation effect.
Remember that the perception of "high" and "low" for these parameters depends on the context of the specific sounds you are working with. Experimentation is key to finding the sweet spots that create the desired sonic results.
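The following sketch shows how the diode curve and a carrier at `MOD_F` Hz could be combined into a ring modulator, in a simplified form of the structure described in the paper: each diode pair sees the carrier plus or minus half the voice signal, and the two branch outputs are subtracted. The carrier amplitude of 0.5, the function names, and the exact arrangement are assumptions for illustration, not the project's code.

```python
import math


def diode(v: float, vb: float = 0.2, vl: float = 0.4, h: float = 4.0) -> float:
    """Piecewise diode curve: off below vb, quadratic up to vl, linear above vl."""
    if v <= vb:
        return 0.0
    if v <= vl:
        return h * (v - vb) ** 2 / (2 * vl - 2 * vb)
    return h * v - h * vl + h * (vl - vb) ** 2 / (2 * vl - 2 * vb)


def diode_pair(x: float, **p: float) -> float:
    """Two diodes in opposite directions, so both polarities conduct."""
    return diode(x, **p) + diode(-x, **p)


def ring_modulate(
    voice: list[float], rate: int, mod_f: float = 50.0, carrier_amp: float = 0.5, **p: float
) -> list[float]:
    """Ring-modulate voice samples with a sine carrier at mod_f Hz."""
    out = []
    for n, vin in enumerate(voice):
        vc = carrier_amp * math.sin(2 * math.pi * mod_f * n / rate)  # carrier sample
        # Difference of the two diode branches, each fed carrier +/- half the voice.
        out.append(diode_pair(vc + vin / 2, **p) - diode_pair(vc - vin / 2, **p))
    return out
```

Because `(vc + vin/2)**2 - (vc - vin/2)**2` equals `2 * vc * vin`, the quadratic part of the curve produces the product of carrier and voice. A 200 Hz voice component with `MOD_F = 50` therefore lands at 150 Hz and 250 Hz, while the dead zone below `VB` and the linear section add further harmonics.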
4.3. Preset Styles
5. Connect to Virtual Audio Cable
Download and install the driver from https://vb-audio.com/Cable/.
This flowchart illustrates the audio routing for real-time distortion processing using VB-Audio CABLE.
```mermaid
graph LR
    A[Recorded Audio] --> B(CABLE Output);
    B --> C{VB-Audio Virtual Cable};
    C --> D(CABLE Input);
    D --> E[Distorted Audio Output];
```
The process flows as follows:
Recorded Audio: Audio data from a recording source (e.g., microphone, instrument) is sent to...
CABLE Output: The virtual output of VB-Audio CABLE. This acts as a virtual audio interface output.
VB-Audio Virtual Cable: The audio data is passed through the VB-Audio Virtual Cable driver. This is where the distortion processing would occur (using a plugin or other software that intercepts the virtual cable's audio stream).
CABLE Input: The processed audio is then received by the virtual input of VB-Audio CABLE.
Distorted Audio Output: Finally, the distorted audio is sent to the desired output destination (e.g., speakers, headphones, recording software).
Note: CABLE Output, VB-Audio Virtual Cable, and CABLE Input can be replaced by Input, Other Microphone, and Output, respectively.
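When streaming the robotic voice into the virtual cable from Python, the cable has to be selected as a playback device. A minimal sketch, assuming the python-sounddevice package, whose `sounddevice.query_devices()` returns one dictionary per device with `name` and `max_output_channels` keys; `find_output_device` is a hypothetical helper, and the project may route audio differently.

```python
from typing import Optional


def find_output_device(devices, name_fragment: str) -> Optional[int]:
    """Return the index of the first playback device whose name contains name_fragment."""
    wanted = name_fragment.lower()
    for index, dev in enumerate(devices):
        # Only devices with output channels can receive played audio.
        if wanted in dev["name"].lower() and dev["max_output_channels"] > 0:
            return index
    return None  # no matching playback device found
```

With sounddevice installed, `idx = find_output_device(sounddevice.query_devices(), "CABLE")` followed by `sounddevice.play(samples, samplerate, device=idx)` would send the audio to the cable. Exact device names depend on the driver version and operating system, so check the printed device list first.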
Note
This readme assumes the project structure includes folders like audios, records, input_audios, output_audios, etc. If your project uses a different folder structure, adjust the paths accordingly.