VoiceBlock

Privacy through Real-Time Adversarial Attacks with Audio-to-Audio Models

License: MIT

Installation

  1. Clone the repository:

     git clone https://github.com/voiceboxneurips/voicebox.git
    
  2. We recommend working from a clean environment, e.g. using conda:

     conda create --name voicebox python=3.9
     conda activate voicebox
    
  3. Install dependencies:

     cd voicebox
     pip install -r requirements.txt
     pip install -e .
    
  4. Grant permissions:

     chmod -R u+x scripts/
    

Reproducing Results

To reproduce our results, first download the corresponding data. Note that to download the VoxCeleb1 dataset, you must register and obtain a username and password.

| Task | Dataset (Size) | Command |
| --- | --- | --- |
| Objective evaluation | VoxCeleb1 (39G) | `python scripts/downloads/download_voxceleb.py --subset=1 --username=<VGG_USERNAME> --password=<VGG_PASSWORD>` |
| WER / supplemental evaluations | LibriSpeech train-clean-360 (23G) | `./scripts/downloads/download_librispeech_eval.sh` |
| Train attacks | LibriSpeech train-clean-100 (11G) | `./scripts/downloads/download_librispeech_train.sh` |
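
Since the three downloads total roughly 73G, it can help to check free disk space before starting. The following is a minimal sketch, not part of the repository: the helper names `required_gb` and `has_room` are hypothetical, the sizes are copied from the table above, and "G" is assumed to mean GiB. Extraction may need additional temporary space beyond these figures.

```python
import shutil

# Dataset sizes in GiB, copied from the download table (assumed units).
DATASET_SIZES_GB = {
    "VoxCeleb1": 39,
    "LibriSpeech train-clean-360": 23,
    "LibriSpeech train-clean-100": 11,
}


def required_gb(datasets=DATASET_SIZES_GB):
    """Total space needed for the listed datasets, in GiB."""
    return sum(datasets.values())


def has_room(path=".", datasets=DATASET_SIZES_GB, free_bytes=None):
    """Return True if `path` has enough free space for all datasets.

    `free_bytes` can be passed explicitly (useful for testing);
    otherwise it is read from the filesystem that contains `path`.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb(datasets) * 1024**3
```

For example, calling `has_room(".")` from the repository root reports whether the current filesystem can hold all three datasets. Pass a smaller dictionary to check a single download.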

We provide scripts to reproduce our experiments and save results, including generated audio, to named and time-stamped subdirectories within runs/. To reproduce our objective evaluation experiments using pre-trained attacks, run:

python scripts/experiments/evaluate.py

To reproduce our training, run:

python scripts/experiments/train.py
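
Because each experiment writes to its own subdirectory of runs/, a small helper can locate the most recent results. This is a sketch under assumptions: the function name `latest_run` is hypothetical, and it picks the newest subdirectory by modification time rather than parsing the directory names, whose exact format is not specified here.

```python
from pathlib import Path


def latest_run(runs_dir="runs"):
    """Return the most recently modified subdirectory of `runs_dir`.

    Returns None if the directory does not exist or has no subdirectories.
    """
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    # Compare by modification time; no assumption about the naming scheme.
    return max(subdirs, key=lambda p: p.stat().st_mtime, default=None)
```

After running either script from the repository root, `latest_run()` should point at the directory holding that run's outputs.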

Streaming Implementation

As a proof of concept, we provide a streaming implementation of VoiceBox capable of modifying user audio in real time. Here, we provide installation instructions for macOS and Ubuntu 20.04.

macOS

See video below:

Ubuntu 20.04

  1. Open a terminal and follow the installation instructions above. Change directory to the root of this repository.

  2. Run the following command:

     pacmd load-module module-null-sink sink_name=voicebox sink_properties=device.description=voicebox
    

    If you are using PipeWire instead of PulseAudio:

     pactl load-module module-null-sink media.class=Audio/Sink sink_name=voicebox sink_properties=device.description=voicebox
     
    

    PulseAudio is the default on Ubuntu 20.04, so unless you have changed your system defaults, you are probably using it. Either command adds "voicebox" as an output device. Select it as the input to your chosen audio software.

  3. Find which audio devices to read from and write to. In your conda environment, run:

     python -m sounddevice
    

    You will get output similar to this:

       0 HDA Intel HDMI: 0 (hw:0,3), ALSA (0 in, 8 out)
       1 HDA Intel HDMI: 1 (hw:0,7), ALSA (0 in, 8 out)
       2 HDA Intel HDMI: 2 (hw:0,8), ALSA (0 in, 8 out)
       3 HDA Intel HDMI: 3 (hw:0,9), ALSA (0 in, 8 out)
       4 HDA Intel HDMI: 4 (hw:0,10), ALSA (0 in, 8 out)
       5 hdmi, ALSA (0 in, 8 out)
       6 jack, ALSA (2 in, 2 out)
       7 pipewire, ALSA (64 in, 64 out)
       8 pulse, ALSA (32 in, 32 out)
     * 9 default, ALSA (32 in, 32 out)
    

    In this example, we route the audio through PipeWire (device 7), so 7 is used for both INPUT_NUM and OUTPUT_NUM.

  4. First, we need to create a conditioning embedding. To do this, run the enrollment script and follow its on-screen instructions:

     python scripts/streamer/enroll.py --input INPUT_NUM
    
  5. We can now use the streamer. Run:

     python scripts/stream.py --input INPUT_NUM --output OUTPUT_NUM
    
  6. Once the streamer is running, open pavucontrol.

    a. In pavucontrol, go to the "Playback" tab and find "ALSA plug-in [python3.9]: ALSA Playback on". Set the output to "voicebox".

    b. Then, go to the "Recording" tab and find "ALSA plug-in [python3.9]: ALSA Capture from", and set the input to your desired microphone device.
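
To pick INPUT_NUM and OUTPUT_NUM in step 3 without reading the listing by eye, the device list can be parsed programmatically. The sketch below is not part of the repository: `parse_devices` and `find_duplex` are hypothetical helpers, and the regular expression assumes the line layout shown in the example output above (an optional `*`, `>`, or `<` marker, the index, the device name, the host API, and the channel counts).

```python
import re

# Example listing copied from step 3.
SAMPLE = """\
   0 HDA Intel HDMI: 0 (hw:0,3), ALSA (0 in, 8 out)
   1 HDA Intel HDMI: 1 (hw:0,7), ALSA (0 in, 8 out)
   2 HDA Intel HDMI: 2 (hw:0,8), ALSA (0 in, 8 out)
   3 HDA Intel HDMI: 3 (hw:0,9), ALSA (0 in, 8 out)
   4 HDA Intel HDMI: 4 (hw:0,10), ALSA (0 in, 8 out)
   5 hdmi, ALSA (0 in, 8 out)
   6 jack, ALSA (2 in, 2 out)
   7 pipewire, ALSA (64 in, 64 out)
   8 pulse, ALSA (32 in, 32 out)
 * 9 default, ALSA (32 in, 32 out)
"""

# Assumed layout: [marker] index name, hostapi (N in, M out)
_LINE = re.compile(
    r"^\s*[*<>]?\s*(?P<index>\d+)\s+(?P<name>.+?),\s*(?P<hostapi>[^,()]+?)\s*"
    r"\((?P<ins>\d+) in, (?P<outs>\d+) out\)\s*$"
)


def parse_devices(listing):
    """Parse a device listing into a list of dicts, skipping unmatched lines."""
    devices = []
    for line in listing.splitlines():
        m = _LINE.match(line)
        if m:
            devices.append({
                "index": int(m["index"]),
                "name": m["name"],
                "hostapi": m["hostapi"],
                "ins": int(m["ins"]),
                "outs": int(m["outs"]),
            })
    return devices


def find_duplex(devices, name):
    """Indices of devices matching `name` that have both inputs and outputs."""
    return [
        d["index"] for d in devices
        if name.lower() in d["name"].lower() and d["ins"] > 0 and d["outs"] > 0
    ]
```

With the example listing, `find_duplex(parse_devices(SAMPLE), "pipewire")` selects device 7. On a real system, the listing text could come from capturing the output of `python -m sounddevice` instead of the `SAMPLE` constant.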

Citation

If you use this in your academic research, please cite the following:

@inproceedings{authors2022voicelock,
title={VoiceBlock: Privacy through Real-Time Adversarial Attacks with Audio-to-Audio Models},
author={Patrick O'Reilly and Andreas Bugler and Keshav Bhandari and Max Morrison and Bryan Pardo},
booktitle={Neural Information Processing Systems},
month={November},
year={2022}
}