arxiv:2402.13144

Neural Network Diffusion

Published on Feb 20 · Featured in Daily Papers on Feb 21

Abstract

Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a standard latent diffusion model. The autoencoder extracts latent representations of a subset of the trained network parameters. A diffusion model is then trained to synthesize these latent parameter representations from random noise. It then generates new representations that are passed through the autoencoder's decoder, whose outputs are ready to use as new subsets of network parameters. Across various architectures and datasets, our diffusion process consistently generates models of comparable or improved performance over trained networks, with minimal additional cost. Notably, we empirically find that the generated models perform differently from the trained networks. Our results encourage further exploration of the versatile uses of diffusion models.
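
The abstract's data flow (take a subset of trained parameters, turn it into a vector, and learn to denoise it) can be illustrated with a small standard-library sketch. Everything here is an assumption for illustration and not the authors' code: the helper names `flatten_params`, `unflatten_params`, `alpha_bar_schedule`, and `add_noise` are hypothetical, the linear beta schedule is the common DDPM default rather than anything stated on this page, and the autoencoder is omitted, so noise is applied to the flattened parameters directly instead of to a learned latent.

```python
import math
import random

def flatten_params(params):
    """Concatenate the selected parameter tensors (flat lists here) into one vector."""
    layout = [(name, len(values)) for name, values in params.items()]
    vector = [v for values in params.values() for v in values]
    return vector, layout

def unflatten_params(vector, layout):
    """Inverse of flatten_params: split the vector back into named chunks."""
    params, offset = {}, 0
    for name, size in layout:
        params[name] = vector[offset:offset + size]
        offset += size
    return params

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar_t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale, sigma = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    xt = [scale * x + sigma * e for x, e in zip(x0, eps)]
    return xt, eps

# Hypothetical parameter subset (names and values are made up for the example).
subset = {"bn.weight": [1.0, 0.9, 1.1], "bn.bias": [0.0, -0.1, 0.05]}
vector, layout = flatten_params(subset)

alpha_bars = alpha_bar_schedule(num_steps=1000)
rng = random.Random(0)

# A denoising network would be trained to predict `eps` from `noisy` and the step index.
noisy, eps = add_noise(vector, alpha_bars[500], rng)

# After sampling, a generated vector is split back into named parameters.
restored = unflatten_params(vector, layout)
```

In a fuller version, the vector would first pass through an encoder, the noise would be added to the resulting latent, and a decoder would map generated latents back to parameter space before `unflatten_params` is applied.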

Community

Wow... 🤯

predicting loras on the fly, routing in moe, etc. could all use this

I read part of the paper and wrote a very brief summary of it: https://twitter.com/JavArButt/status/1760273030540869868

I also pose a question that might be interesting for further research.

Paper author:

Maybe in the next version we will explore cross-architecture parameter generation. Thanks for your question!

interesting

Thinking about it: generating a near-optimal initial parameter state ahead of training could reduce pre-training cost.

Shouldn't they compare with https://openreview.net/forum?id=JXkz3zm8gJ more thoroughly?

Paper author:

Hope this reply (https://x.com/liuzhuang1234/status/1760363128309600607?s=20) addresses your question.
