Master Thesis: High-Fidelity Video Background Music Generation using Transformers

This repository contains the pretrained models for the adaptation of MusicGen (https://arxiv.org/abs/2306.05284) to video-based music generation. They were created as part of my Master's thesis at RWTH Aachen University.

Abstract

Current AI music generation models are mainly controlled with a single input modality: text. Adapting these models to accept alternative input modalities extends their field of use. Video input is one such modality, and generating background music to accompany it poses markedly different requirements. Although alternative methods for generating video background music exist, none achieves the music quality and diversity of the text-based models. Hence, this thesis aims to efficiently reuse the high-fidelity music generation capabilities of text-based models by adapting them for video background music generation. This is accomplished by training a model to represent video information in a format that the text-based model can naturally process. To test the capabilities of our approach, we train on two datasets that differ in how much their visual and audio content varies. We evaluate our approach by analyzing the audio quality and diversity of the results. A case study is also performed to determine whether the video encoder successfully captures the video-audio relationship.

Installation

Model Usage

  • to start the Gradio interface, run: python app.py
  • select an example input video and start the generation by clicking "Submit"
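
The launch step above can also be scripted. The following is a minimal sketch, assuming that app.py sits in the root of the cloned repository and that its dependencies are already installed. The helper names build_launch_command and launch_app are hypothetical and not part of this repository.

```python
import subprocess
import sys
from pathlib import Path


def build_launch_command():
    """Return the command equivalent to running `python app.py`."""
    # sys.executable keeps the app in the currently active environment.
    return [sys.executable, "app.py"]


def launch_app(repo_dir="."):
    """Start app.py inside repo_dir and block until the process exits.

    repo_dir is an assumption: the directory that contains app.py.
    """
    app_path = Path(repo_dir) / "app.py"
    if not app_path.is_file():
        raise FileNotFoundError(f"app.py not found in {Path(repo_dir).resolve()}")
    # check=True raises CalledProcessError if the app exits with an error.
    subprocess.run(build_launch_command(), cwd=repo_dir, check=True)


# Example (run from the repository root):
# launch_app(".")
```

Once the process is running, open the local URL that Gradio prints to the console, then select an example video and click "Submit" as described above.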

Screenshot of the Gradio Interface

Limitations and Usage Advice

  • not all models generate audible results, especially the smaller ones
  • the best results in terms of audio quality are generated with the parameters nature, peft=true, large

Contact

For any questions, feel free to contact me at niklas.schulte@rwth-aachen.de
