Create README.md (#1)
README.md (added)
---
license: mit
language:
- en
library_name: peft
---

# Master Thesis: High-Fidelity Video Background Music Generation using Transformers
This is the corresponding GitLab repository of my Master Thesis. The goal of this thesis is to generate video background music by adapting MusicGen (https://arxiv.org/pdf/2306.05284.pdf) to video as an additional input modality. This is accomplished by mapping video information into the T5 text embedding space that MusicGen usually operates on. To this end, a Transformer Encoder network, called the Video Encoder, is trained to perform this mapping (a minimal sketch follows the list below). Two options are supported within the training loop for the Video Encoder:

- freezing the weights of the MusicGen Audio Decoder
- adjusting the weights of the MusicGen Audio Decoder with Parameter-Efficient Fine-Tuning (PEFT) using LoRA (https://arxiv.org/abs/2106.09685); see the second sketch below
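
To make the mapping concrete, here is a minimal sketch of such a Video Encoder; the feature dimensions, layer counts, and the use of per-frame input features are illustrative assumptions, not the thesis implementation:

```python
import torch
import torch.nn as nn

class VideoEncoder(nn.Module):
    """Sketch: map a sequence of per-frame video features into the
    T5 text embedding space that MusicGen is conditioned on."""

    def __init__(self, video_dim: int = 512, t5_dim: int = 768,
                 num_layers: int = 4, num_heads: int = 8):
        super().__init__()
        # lift the frame features to the T5 embedding width (768 for t5-base)
        self.input_proj = nn.Linear(video_dim, t5_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=t5_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
        # frame_features: (batch, num_frames, video_dim)
        x = self.input_proj(frame_features)
        # (batch, num_frames, t5_dim), consumed by MusicGen like a T5 text embedding
        return self.encoder(x)
```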
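
For the second option, the `library_name: peft` front matter points to the Hugging Face peft library; attaching LoRA adapters could look roughly like this (the stand-in decoder and the target module names are assumptions, since the real MusicGen Audio Decoder comes from the audiocraft fork):

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

# Stand-in for the MusicGen Audio Decoder; in the real setup this is the
# transformer decoder loaded through the local audiocraft fork.
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=768, nhead=8, batch_first=True),
    num_layers=2,
)

lora_config = LoraConfig(
    r=16,                                   # rank of the low-rank update matrices
    lora_alpha=32,                          # scaling factor for the adapter output
    target_modules=["linear1", "linear2"],  # assumed names; depends on the real decoder
    lora_dropout=0.05,
)
decoder = get_peft_model(decoder, lora_config)
decoder.print_trainable_parameters()  # only the small adapter matrices stay trainable
```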

# Installation
- create a Python virtual environment with `Python 3.11`
- check https://pytorch.org/get-started/previous-versions/ to install `PyTorch 2.1.0` with `CUDA` support on your machine
- install the local fork of audiocraft: `cd audiocraft; pip install -e .`
- install the other requirements: `pip install -r requirements.txt`
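
A quick sanity check that the intended PyTorch build sees the GPU (plain PyTorch, nothing project-specific):

```python
import torch

print(torch.__version__)          # should report 2.1.0 (plus a CUDA suffix such as +cu118)
print(torch.cuda.is_available())  # should print True on a correctly set-up CUDA machine
```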

# Folder Structure
- `audiocraft` contains a local fork of the audiocraft library (https://github.com/facebookresearch/audiocraft) with small changes to the generation method; further information can be found in `code/code_adaptations_audiocraft`.
- `code` contains the code for model `training` and `inference` of video background music
- `datasets` contains the code to create the training datasets (in `data_preparation`) and the video examples used for evaluation (in `example_videos`)
- `evaluation` contains the code used to evaluate the datasets and the created video embeddings
- `gradio_app` contains the code for the interface to generate video background music

# Training
To train the models, set the training parameters in `training/training_conf.yml` and start training with `python training/training.py`. The model weights will be stored under `training/models_audiocraft` or `training/models_peft`, depending on the mode.
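
The concrete keys of `training/training_conf.yml` are defined by the repository; as a purely illustrative sketch, the training script presumably reads the file along these lines (every key shown here is hypothetical):

```python
import yaml

# Hypothetical keys -- consult training/training_conf.yml for the real ones.
with open("training/training_conf.yml") as f:
    conf = yaml.safe_load(f)

use_peft = conf.get("use_peft", False)  # assumed switch between the two fine-tuning modes
print("mode:", "LoRA/PEFT" if use_peft else "frozen MusicGen decoder")
```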

# Inference
- start the user interface by running `python gradio_app/app.py`
- inside the interface, select a video and set the generation parameters
- click on "Submit" to start the generation
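
The actual app lives in `gradio_app/app.py`; conceptually, the interface boils down to something like the following sketch (function and parameter names are assumed, not the real app code):

```python
import gradio as gr

def generate_background_music(video_path, duration):
    """Placeholder for the real pipeline: extract video features, run the
    Video Encoder, condition MusicGen, and return the generated audio."""
    raise NotImplementedError("illustrative sketch only")

demo = gr.Interface(
    fn=generate_background_music,
    inputs=[gr.Video(label="Input video"),
            gr.Slider(1, 30, value=10, label="Duration (s)")],
    outputs=gr.Audio(label="Generated background music"),
)
demo.launch()
```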

# Contact
For any questions, contact me at [niklas.schulte@rwth-aachen.de](mailto:niklas.schulte@rwth-aachen.de)