schnik committed
Commit f5ef93a
Parent: 722c832

Create README.md

Files changed (1): README.md (+46, -0)
README.md ADDED
---
license: mit
language:
- en
library_name: peft
---

# Master Thesis: High-Fidelity Video Background Music Generation using Transformers
This is the corresponding GitLab repository of my Master's thesis. The goal of this thesis is to generate video background
music by adapting MusicGen (https://arxiv.org/pdf/2306.05284.pdf) to video input as an additional input modality.
This is accomplished by mapping video information into the T5 text embedding space on which MusicGen usually
operates. To this end, a Transformer Encoder network, called the Video Encoder, is trained to perform this mapping. Two options are
foreseen within the training loop for the Video Encoder (a minimal sketch of both follows below):

- freezing the weights of the MusicGen Audio Decoder
- adjusting the weights of the MusicGen Audio Decoder with Parameter-Efficient Fine-Tuning (PEFT) using LoRA (https://arxiv.org/abs/2106.09685)

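The sketch below is illustrative only, not the thesis code: it uses the Hugging Face `transformers` port of MusicGen (`MusicgenForConditionalGeneration`), hypothetical Video Encoder dimensions (`video_dim=1024`, `t5_dim=768` as in `t5-base`), and assumed LoRA target module names (`q_proj`, `v_proj`); the thesis itself works on the local audiocraft fork described below.

```python
# Minimal sketch, not the thesis code: dimensions, the transformers MusicGen port,
# and the LoRA target module names are assumptions.
import torch.nn as nn
from peft import LoraConfig, get_peft_model
from transformers import MusicgenForConditionalGeneration

class VideoEncoder(nn.Module):
    """Maps per-frame video features into the T5 text embedding space."""
    def __init__(self, video_dim=1024, t5_dim=768, n_layers=4, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(video_dim, t5_dim)
        layer = nn.TransformerEncoderLayer(d_model=t5_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, video_features):                   # (batch, frames, video_dim)
        return self.encoder(self.proj(video_features))   # (batch, frames, t5_dim)

musicgen = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
use_peft = True  # switch between the two training options

if not use_peft:
    # Option 1: freeze the MusicGen Audio Decoder; only the Video Encoder is trained.
    for p in musicgen.decoder.parameters():
        p.requires_grad = False
else:
    # Option 2: adapt the Audio Decoder with LoRA adapters via PEFT.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])
    musicgen.decoder = get_peft_model(musicgen.decoder, lora)
    musicgen.decoder.print_trainable_parameters()
```
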
# Installation
- create a Python virtual environment with `Python 3.11`
- check https://pytorch.org/get-started/previous-versions/ to install `PyTorch 2.1.0` with `CUDA` on your machine
- install the local fork of audiocraft: `cd audiocraft; pip install -e .`
- install the other requirements: `pip install -r requirements.txt`

# Folder Structure
- `audiocraft` contains a local fork of the audiocraft library (https://github.com/facebookresearch/audiocraft) with
small changes to the generation method; further information is given in `code/code_adaptations_audiocraft`.
- `code` contains the code for model `training` and `inference` of video background music
- `datasets` contains the code to create the datasets used for training within `data_preparation` and the video examples
used for the evaluation in `example_videos`
- `evaluation` contains the code used to evaluate the datasets and the created video embeddings
- `gradio_app` contains the code for the interface to generate video background music

# Training
To train the models, set the training parameters in `training/training_conf.yml` and start training with
`python training/training.py`. The model weights will be stored under `training/models_audiocraft` or
`training/models_peft`, respectively.

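The exact contents of `training/training_conf.yml` are defined by the training code; as a rough illustration, a configuration-driven run could branch between the two output folders as sketched below (the keys `use_peft`, `epochs`, and `learning_rate` are assumed example names, not guaranteed to match the real file).

```python
# Illustrative only: key names are assumptions, see training/training_conf.yml for the real ones.
import yaml

with open("training/training_conf.yml") as f:
    conf = yaml.safe_load(f)

# PEFT runs end up in training/models_peft, frozen-decoder runs in training/models_audiocraft.
out_dir = "training/models_peft" if conf.get("use_peft") else "training/models_audiocraft"
print(f"training for {conf.get('epochs')} epochs (lr={conf.get('learning_rate')}), saving to {out_dir}")
```
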
# Inference
- start the user interface by running `python gradio_app/app.py` (a minimal sketch of such an interface is shown below)
- inside the interface, select a video and the generation parameters
- click on "submit" to start the generation

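For orientation, a stripped-down Gradio interface of this shape is sketched below. It is not the actual `gradio_app/app.py`: the input components, the parameters, and the `generate` placeholder are assumptions.

```python
# Minimal sketch of a video-in / video-out Gradio interface; the real app wires in the
# Video Encoder and MusicGen generation instead of the placeholder below.
import gradio as gr

def generate(video_path, duration_s):
    # placeholder: the thesis code would embed the video and generate background music here
    return video_path

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Video(label="Input video"),
            gr.Slider(5, 30, value=10, label="Music duration (s)")],
    outputs=gr.Video(label="Video with generated background music"),
)

if __name__ == "__main__":
    demo.launch()
```
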
# Contact
For any questions contact me at [niklas.schulte@rwth-aachen.de](mailto:niklas.schulte@rwth-aachen.de)