schnik committed
Commit f5ef93a
Parent: 722c832

Create README.md

Files changed (1): README.md (+46, -0)
README.md ADDED
---
license: mit
language:
- en
library_name: peft
---

# Master Thesis: High-Fidelity Video Background Music Generation using Transformers
This is the corresponding GitLab repository of my Master's thesis. The goal of this thesis is to generate video background
music by adapting MusicGen (https://arxiv.org/pdf/2306.05284.pdf) to video input as an additional input modality.
This is accomplished by mapping video information into the T5 text embedding space on which MusicGen usually
operates. To this end, a Transformer Encoder network, called the Video Encoder, is trained to perform this mapping. Two options are
foreseen within the training loop for the Video Encoder (a minimal sketch of both follows below):

- freezing the weights of the MusicGen Audio Decoder
- adjusting the weights of the MusicGen Audio Decoder with Parameter-Efficient Fine-Tuning (PEFT) using LoRA (https://arxiv.org/abs/2106.09685)

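The sketch below is illustrative only, not the thesis code: it uses the Hugging Face `transformers` port of MusicGen (`MusicgenForConditionalGeneration`), hypothetical Video Encoder dimensions (`video_dim=1024`, `t5_dim=768` as in `t5-base`), and assumed LoRA target module names (`q_proj`, `v_proj`); the thesis itself works on the local audiocraft fork described below.

```python
# Minimal sketch, not the thesis code: dimensions, the transformers MusicGen port,
# and the LoRA target module names are assumptions.
import torch.nn as nn
from peft import LoraConfig, get_peft_model
from transformers import MusicgenForConditionalGeneration

class VideoEncoder(nn.Module):
    """Maps per-frame video features into the T5 text embedding space."""
    def __init__(self, video_dim=1024, t5_dim=768, n_layers=4, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(video_dim, t5_dim)
        layer = nn.TransformerEncoderLayer(d_model=t5_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, video_features):                   # (batch, frames, video_dim)
        return self.encoder(self.proj(video_features))   # (batch, frames, t5_dim)

musicgen = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
use_peft = True  # switch between the two training options

if not use_peft:
    # Option 1: freeze the MusicGen Audio Decoder; only the Video Encoder is trained.
    for p in musicgen.decoder.parameters():
        p.requires_grad = False
else:
    # Option 2: adapt the Audio Decoder with LoRA adapters via PEFT.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])
    musicgen.decoder = get_peft_model(musicgen.decoder, lora)
    musicgen.decoder.print_trainable_parameters()
```
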
# Installation
- create a Python virtual environment with `Python 3.11`
- check https://pytorch.org/get-started/previous-versions/ to install `PyTorch 2.1.0` with `CUDA` on your machine
- install the local fork of audiocraft: `cd audiocraft; pip install -e .`
- install the other requirements: `pip install -r requirements.txt`

# Folder Structure
- `audiocraft` contains a local fork of the audiocraft library (https://github.com/facebookresearch/audiocraft) with
small changes to the generation method; further information is given in `code/code_adaptations_audiocraft`.
- `code` contains the code for model `training` and `inference` of video background music
- `datasets` contains the code to create the datasets used for training within `data_preparation` and the video examples
used for the evaluation in `example_videos`
- `evaluation` contains the code used to evaluate the datasets and the created video embeddings
- `gradio_app` contains the code for the interface to generate video background music

# Training
To train the models, set the training parameters in `training/training_conf.yml` and start training with
`python training/training.py`. The model weights will be stored under `training/models_audiocraft` or
`training/models_peft`, respectively.

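The exact contents of `training/training_conf.yml` are defined by the training code; as a rough illustration, a configuration-driven run could branch between the two output folders as sketched below (the keys `use_peft`, `epochs`, and `learning_rate` are assumed example names, not guaranteed to match the real file).

```python
# Illustrative only: key names are assumptions, see training/training_conf.yml for the real ones.
import yaml

with open("training/training_conf.yml") as f:
    conf = yaml.safe_load(f)

# PEFT runs end up in training/models_peft, frozen-decoder runs in training/models_audiocraft.
out_dir = "training/models_peft" if conf.get("use_peft") else "training/models_audiocraft"
print(f"training for {conf.get('epochs')} epochs (lr={conf.get('learning_rate')}), saving to {out_dir}")
```
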
# Inference
- start the user interface by running `python gradio_app/app.py` (a minimal sketch of such an interface is shown below)
- inside the interface, select a video and the generation parameters
- click on "submit" to start the generation

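For orientation, a stripped-down Gradio interface of this shape is sketched below. It is not the actual `gradio_app/app.py`: the input components, the parameters, and the `generate` placeholder are assumptions.

```python
# Minimal sketch of a video-in / video-out Gradio interface; the real app wires in the
# Video Encoder and MusicGen generation instead of the placeholder below.
import gradio as gr

def generate(video_path, duration_s):
    # placeholder: the thesis code would embed the video and generate background music here
    return video_path

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Video(label="Input video"),
            gr.Slider(5, 30, value=10, label="Music duration (s)")],
    outputs=gr.Video(label="Video with generated background music"),
)

if __name__ == "__main__":
    demo.launch()
```
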
# Contact
For any questions contact me at [niklas.schulte@rwth-aachen.de](mailto:niklas.schulte@rwth-aachen.de)