update README

Files changed:
- .gitattributes (+1, -0)
- README.md (+12, -7)
- gradio_interface.png (+3, -0)
.gitattributes
CHANGED

@@ -450,3 +450,4 @@ inference_large/peft_symmv_large_s_8_video(2).mp4 filter=lfs diff=lfs merge=lfs
 inference_large/peft_symmv_large_s_8_video(3).mp4 filter=lfs diff=lfs merge=lfs -text
 inference_large/peft_symmv_large_s_8_video(4).mp4 filter=lfs diff=lfs merge=lfs -text
 *.mp4 filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED

@@ -8,18 +8,23 @@ language:
 ### Abstract
 Current AI music generation models are mainly controlled with a single input modality: text. Adapting these models to accept alternative input modalities extends their field of use. Video input is one such modality, with remarkably different requirements for the generation of background music accompanying it. Even though alternative methods for generating video background music exist, none achieve the music quality and diversity of the text-based models. Hence, this thesis aims to efficiently reuse text-based models' high-fidelity music generation capabilities by adapting them for video background music generation. This is accomplished by training a model to represent video information inside a format that the text-based model can naturally process. To test the capabilities of our approach, we apply two datasets for model training with various levels of variation in the visual and audio parts. We evaluate our approach by analyzing the audio quality and diversity of the results. A case study is also performed to determine the video encoder's ability to capture the video-audio relationship successfully.
 
-This repository contains the code and pretrained models for the adaptation of MusicGen ([https://arxiv.org/abs/2306.05284](https://arxiv.org/abs/2306.05284)) to video background music generation. A Gradio interface is provided for convenient usage of the models.
+This repository contains the code and pretrained models for the adaptation of MusicGen ([https://arxiv.org/abs/2306.05284](https://arxiv.org/abs/2306.05284)) to video background music generation. A Gradio interface is provided for convenient usage of the pretrained models.
 
 ### Installation
-- install PyTorch `2.1.0` with CUDA enabled by following the instructions from [https://pytorch.org/get-started/previous-versions/]
+- install PyTorch `2.1.0` with CUDA enabled by following the instructions from [https://pytorch.org/get-started/previous-versions/](https://pytorch.org/get-started/previous-versions/)
 - install the local fork of audiocraft with `pip install git+https://github.com/IntelliNik/audiocraft.git@main`
 - install the remaining dependencies with `pip install peft moviepy omegaconf`
 
-###
-
-- select an example input video
-
+### Starting the Interface for Inference
+- run `python app.py`
+- select an example input video and start the generation by clicking "Submit"
+
+### Screenshot of the Gradio Interface
+
+
+### Limitations and Usage Advice
+- not all models generate audible results, especially the smaller ones
+- the best results in terms of audio quality are generated with the parameters `nature, peft=true, large`
 
 ### Contact
 For any questions contact me at [niklas.schulte@rwth-aachen.de](mailto:niklas.schulte@rwth-aachen.de)
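To make the "Limitations and Usage Advice" parameters concrete: `peft=true` combined with `large` suggests a LoRA adapter applied on top of the large MusicGen checkpoint. The following is a minimal sketch of what loading such a configuration could look like; the adapter path is hypothetical, the `model.lm` wrapping is an assumption about where the adapter attaches, and the 8-second duration is inferred from the `_s_8_` demo file names. The repository's actual loading code in `app.py` may differ.

```python
# A minimal sketch, NOT the repository's actual code: loading MusicGen via
# audiocraft and attaching a trained PEFT (LoRA) adapter to its language model.
from audiocraft.models import MusicGen
from peft import PeftModel

# Base text-to-music model; "large" mirrors the `large` parameter above.
model = MusicGen.get_pretrained("facebook/musicgen-large")

# Hypothetical adapter checkpoint; the repo's real checkpoint layout is unknown.
model.lm = PeftModel.from_pretrained(model.lm, "checkpoints/peft_nature_large")

# 8-second clips: an assumption based on the `_s_8_` suffix of the demo videos.
model.set_generation_params(duration=8)

# Ordinary text-conditioned generation; the thesis replaces this text prompt
# with embeddings produced by its trained video encoder.
wav = model.generate(["calm nature scenery with flowing water"])
```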
gradio_interface.png
ADDED

(binary image, stored with Git LFS)
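For orientation, the interface shown in the screenshot maps onto a few lines of Gradio wiring. The sketch below is illustrative only: the function name and its pass-through body are placeholders for the actual logic in `app.py`; only the video-in/video-out shape follows what the README describes.

```python
import gradio as gr

def generate_background_music(video_path: str) -> str:
    # Placeholder body: the real app.py would encode the video, condition the
    # adapted MusicGen model on it, and mux the generated audio back onto the
    # video. Here the input is simply passed through for illustration.
    return video_path

demo = gr.Interface(
    fn=generate_background_music,
    inputs=gr.Video(label="Input video"),
    outputs=gr.Video(label="Video with generated background music"),
)

# `python app.py` presumably launches something along these lines.
demo.launch()
```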
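Likewise, the `moviepy` dependency from the installation step is presumably what attaches the generated music to the input video. A hedged sketch of that muxing step, with hypothetical file names:

```python
# Sketch of muxing generated audio onto a video with moviepy (1.x API).
from moviepy.editor import AudioFileClip, VideoFileClip

video = VideoFileClip("input.mp4")
music = AudioFileClip("generated_music.wav")

# Trim the music to the video's length and attach it as the soundtrack.
video = video.set_audio(music.subclip(0, video.duration))
video.write_videofile("output_with_music.mp4")
```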