arxiv:2410.06084

Diversity-Rewarded CFG Distillation

Published on Oct 8

· Submitted by

alexrame on Oct 10

Upvote

Authors:

Andrea Agostinelli ,

Johan Ferret ,

Olivier Bachem ,

Sarah Perrin ,

Alexandre Ramé

Abstract

Generative models are transforming creative domains such as music generation, with inference-time strategies like Classifier-Free Guidance (CFG) playing a crucial role. However, CFG doubles inference cost while limiting originality and diversity across generated contents. In this paper, we introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations. Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt. By finetuning, we learn model weights with the ability to generate high-quality and diverse outputs, without any inference overhead. This also unlocks the potential of weight-based model merging strategies: by interpolating between the weights of two models (the first focusing on quality, the second on diversity), we can control the quality-diversity trade-off at deployment time, and even further boost performance. We conduct extensive experiments on the MusicLM (Agostinelli et al., 2023) text-to-music generative model, where our approach surpasses CFG in terms of quality-diversity Pareto optimality. According to human evaluators, our finetuned-then-merged model generates samples with higher quality-diversity than the base model augmented with CFG. Explore our generations at https://google-research.github.io/seanet/musiclm/diverse_music/.

View arXiv page View PDF Add to collection

Community

alexrame

Paper author Paper submitter about 7 hours ago

•

edited about 7 hours ago

An AI will win a Nobel price someday✨. Yet currently, alignment reduces creativity. Our new GoogleDeepMind paper "diversity-rewarded CFG distillation" improves quality AND diversity for music, via distillation of test-time compute, RL with a diversity reward, and model merging. See more here: https://x.com/ramealexandre/status/1844296670059602081 and https://x.com/CdrGeo/status/1844306954992415142

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2410.06084 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2410.06084 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2410.06084 in a Space README.md to link it from this page.