## Audio Data Ownership

## Installation

```bash
conda env create -n audio_ethics --file gen_audio_ethics_3.10.yml
```

To set up wandb, please check out the following link: [https://docs.wandb.ai/quickstart](https://docs.wandb.ai/quickstart)

## Run Encoder Attack

```bash
cd src
python test_encoder_attack.py
```

## Overview

## Task 1: Audio Completion with Diffusion Models

For this task, we use the [Free Music Archive (FMA)](https://github.com/mdeff/fma), a collection of royalty-free music. You can use any partition of the dataset you wish, but we'll use the `fma_large` partition for training an initial system.

Note: if your librosa version is too new (recent releases make the `size` argument of `pad_center` keyword-only), you will have to edit the corresponding line in audioldm to be `fft_window = pad_center(fft_window, size=filter_length)`.

To preprocess FMA, set the paths in the preprocessing script to match your setup and run it to convert the `.mp3` files to numpy (loading `.mp3` files during training is prohibitively slow; a conceptual sketch of this conversion is given at the end of this README).

- Preprocessing for ArchiSound encoders: `nohup python -u scripts/data_processing/process_music_numpy.py > logs/process_48k_music.out &`

## Task 2: TTS with Diffusion Models

TTS with diffusion (or flow) models is one of several approaches currently used for state-of-the-art TTS. This repo contains a model similar to [Grad-TTS](https://grad-tts.github.io/); Grad-TTS inference is illustrated below:

![Inference Figure for Grad-TTS](./assets/gradtts_system.png)

To run, you first need to build the `monotonic_align` code:

```bash
cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..
```

You may have to move the generated `.so` file to the `monotonic_align/` directory if it is generated in `monotonic_align/build/`. A quick way to verify the build is sketched below.
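The following is a minimal sanity check, assuming the Grad-TTS-style layout (`model/monotonic_align/__init__.py` wrapping the compiled extension) and its `maximum_path(value, mask)` interface; the tensor shapes are illustrative:

```python
# Minimal sanity check for the compiled monotonic_align extension.
# Assumes the Grad-TTS-style maximum_path(value, mask) interface (an assumption
# based on the upstream Grad-TTS code); the shapes below are illustrative.
import torch

from model.monotonic_align import maximum_path  # fails here if the .so is misplaced

batch, text_len, mel_len = 2, 11, 31
log_probs = torch.randn(batch, text_len, mel_len)  # per-(phoneme, frame) log-likelihoods
mask = torch.ones(batch, text_len, mel_len)        # all positions valid

path = maximum_path(log_probs, mask)  # hard monotonic alignment path
print(path.shape)  # expected: torch.Size([2, 11, 31])
```

If the import fails, the `.so` is most likely still sitting in `monotonic_align/build/` and needs to be moved as described above.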
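Finally, regarding the Task 1 preprocessing referenced above: the conversion boils down to decoding each `.mp3` once and caching the waveform as a raw array. The following is a sketch of that idea, not the repo's `process_music_numpy.py`; the directory paths and the 48 kHz sample rate are illustrative assumptions:

```python
# Illustrative .mp3 -> .npy conversion (a sketch, not the repo's script).
# Assumes librosa with an mp3-capable backend (e.g., ffmpeg via audioread).
from pathlib import Path

import librosa
import numpy as np

SRC_DIR = Path("data/fma_large")      # hypothetical input directory
DST_DIR = Path("data/fma_large_npy")  # hypothetical output directory
SAMPLE_RATE = 48_000                  # mirrors the 48k ArchiSound preprocessing

DST_DIR.mkdir(parents=True, exist_ok=True)

for mp3_path in SRC_DIR.rglob("*.mp3"):
    # Decode once offline so training loads raw float32 arrays instead of mp3s.
    audio, _ = librosa.load(mp3_path, sr=SAMPLE_RATE, mono=True)
    np.save(DST_DIR / f"{mp3_path.stem}.npy", audio.astype(np.float32))
```

Arrays saved this way can be opened with `np.load(path, mmap_mode='r')`, so the training loader only touches the slices it actually needs.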