Ema vs non-ema version, differences?

#22
by aimer-yui - opened

What are the pros and cons of using ema.ckpt vs nonema.ckpt? I saw some Reddit/Github discussions on it, but would still like to understand better.

Used for: generating images (webui), no training.

Thanks 🙂

BUMP
I would like an answer as well.
ANYONE?

Since there's almost a hostile lack of information on the subject, I can at least share what I've just found:
(Statements are not verified)

"EMA (exponential moving average) is meant as a checkpoint for resuming training while the normal, smaller one is for inference."

"And what are EMA weights, and why are they supposed to be better? Same as when you are training as a student, maybe you will fail your last test or decide to cheat and memorize the answers. So generally you get a better approximation of the student's performance by using an average of the test scores, and since you don't care about kindergarten, you get an MA (moving average) if you just consider last year, and an EMA if you keep the whole history but give way more weight to recent scores."

found here:
https://www.reddit.com/r/StableDiffusion/comments/x5am4v/ema_model_vs_non_ema_differences/
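
To put numbers on the student analogy quoted above, here is a minimal sketch. The scores, the window of 3, and the decay of 0.7 are all made up for illustration; the only part taken from the quote is the idea of "last score" vs "average of recent scores" vs "whole history with more weight on recent scores".

```python
def moving_average(scores, window):
    """Plain average of the last `window` scores; older ones are ignored."""
    recent = scores[-window:]
    return sum(recent) / len(recent)


def exponential_moving_average(scores, decay):
    """Keep the whole history, but let old scores fade out gradually.

    Update rule: ema = decay * ema + (1 - decay) * new_score
    """
    ema = scores[0]
    for score in scores[1:]:
        ema = decay * ema + (1 - decay) * score
    return ema


# Hypothetical test scores: steady progress, then one failed final test.
scores = [50, 60, 70, 80, 85, 90, 40]

last_score = scores[-1]
ma = moving_average(scores, window=3)
ema = exponential_moving_average(scores, decay=0.7)

print(f"last score only: {last_score}")
print(f"moving average:  {ma:.1f}")
print(f"EMA:             {ema:.1f}")
```

Judging the student by the last test alone gives 40, while both averages land well above that. For model weights the same update rule is applied to every weight after each training step, with the latest weights playing the role of the latest score.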

================================================================================================================================================

"EMA should create more creative images"
"The ema version is intended for users that wish to train the model further"

found here:
https://www.reddit.com/r/StableDiffusion/comments/zfut05/ema_or_not_ema/

================================================================================================================================================

I find it SO annoying that there are so many arrogant non-answers about these things when people ask simple questions!
Programmers are not teachers, but they would do well to learn to explain things to the rest of us.

All that aside, I am extremely impressed with, and very thankful for stable diffusion.
I hope the lawsuits don't kill it, but I'm practically sure they won't. Now that Pandora's box is open, we can't just cram everything back in; it doesn't work like that.
This is a technology that will revolutionize the world, and has surely already started a cultural revolution.
I'm a musician, and I know it's coming for us next. Although... probably not as effectively. Music diffusion, or "riffusion", is kind of shit as it is right now.

Thank you for the explanation. Even though I asked, I still downloaded the non-EMA version, because the file is three times smaller 😂

No problem :)

"I find it SO annoying that there are so many arrogant non-answers about these things when people ask simple questions!"

People are trying their best to answer questions, so who is your subconscious calling arrogant? Who is it talking to, the other guy or yourself? Maybe it is begging you not to be arrogant this way. I am a mechanical engineer and programmer. If these questions are simple, and if explaining them to someone who knows nothing technical is simple, why don't you give it a try: get an engineering and programming degree and explain things to us. Again, "maybe" we will learn something.

I'm a musician, and I know it's coming for us next. Although... probably not as effectively. Music diffusion, or "riffusion", is kind of shit as it is right now.

So, how good did you think Stable Diffusion was in August 2022 when it first came out? Or the first steps of Midjourney or DALL-E? Now, all of these tools are getting way too good... Not that I'm complaining, these tools allow me to do things I never hoped I would be able to...

Given that interest in the visual art is much greater and that many more people are interested in producing images than producing music, it might take LONGER for Generative Music AI to get good. However, I in no way believe it will be LESS EFFECTIVE than Generative Image AI once it gets there. The rules and principles for music are a MUCH simpler and coherent curriculum than the rules and principles of visual arts. Once better models come out, get ready to be swept away real fast.

Even after reading the comments at the Reddit link and the whole conversation here, the difference between EMA and non-EMA is still not clear.

EMA is meant as a checkpoint for resuming training. This is mentioned by @Hardts.
You are supposed to use the EMA model for inference. This is mentioned in one of the longest comments on Reddit.

The only clear points are:

  1. EMA checkpoint file size is big.
  2. EMA means Exponential Moving Average.

"I find it SO annoying that there are so many arrogant non-answers about these things when people ask simple questions!"

People are trying their best to answer questions, so who is your subconscious calling arrogant? Who is it talking to, the other guy or yourself? Maybe it is begging you not to be arrogant this way. I am a mechanical engineer and programmer. If these questions are simple, and if explaining them to someone who knows nothing technical is simple, why don't you give it a try: get an engineering and programming degree and explain things to us. Again, "maybe" we will learn something.

Well, you made his point perfectly!!! It is EXACTLY this kind of arrogant answer he was talking about. If you don't have anything to say to HELP (or don't want to), just STFU, ok? Don't answer just to waste others' time.

"I find it SO annoying that there are so many arrogant non-answers about these things when people ask simple questions!"

People are trying their best to answer questions, so who is your subconscious calling arrogant? Who is it talking to, the other guy or yourself? Maybe it is begging you not to be arrogant this way. I am a mechanical engineer and programmer. If these questions are simple, and if explaining them to someone who knows nothing technical is simple, why don't you give it a try: get an engineering and programming degree and explain things to us. Again, "maybe" we will learn something.

I didn't mention this, but I do IT support by trade. I am by no means a programmer. I can do a bit of basic scripting, like Windows batch and a tiny bit of VB, maybe some hex editing if Google holds my hand, etc.
My irritation probably stems from the fact that I believe my crowd should be the link between average users and the technical challenges they face.
I see a bit of bad culture as well, with other supporters speaking down to people.

I am SUPER impressed with what stable diffusion has become, and how fast it's growing. There are some seriously brilliant people at work there.
But it should be seen as a sign of strength in a competent programmer to be able to explain things in simple terms, instead of going "You don't know what you're talking about, get a degree".
Remember, we can all do different things here.

I'm a musician, and I know it's coming for us next. Although... probably not as effectively. Music diffusion, or "riffusion", is kind of shit as it is right now.

So, how good did you think Stable Diffusion was in August 2022 when it first came out? Or the first steps of Midjourney or DALL-E? Now, all of these tools are getting way too good... Not that I'm complaining, these tools allow me to do things I never hoped I would be able to...

Given that interest in the visual art is much greater and that many more people are interested in producing images than producing music, it might take LONGER for Generative Music AI to get good. However, I in no way believe it will be LESS EFFECTIVE than Generative Image AI once it gets there. The rules and principles for music are a MUCH simpler and coherent curriculum than the rules and principles of visual arts. Once better models come out, get ready to be swept away real fast.

Yeah, it's crazy how fast things are progressing! Let's see how these lawsuits play out. I can kind of see both sides.
I hear Bing's GPT chatbot wants to take over the world..

Not sure I agree that music is simpler, but I'm sure music AI is coming, fast.

GPT4 has this to say about the pros and cons of using ema.ckpt vs nonema.ckpt for inference or training:

Using ema.ckpt and nonema.ckpt for inference and training have different implications. Let's discuss the pros and cons of using each type in both scenarios:

ema.ckpt for Inference:

Pros:

  • Better generalization: EMA checkpoints tend to have better generalization performance, as they smooth out the model weights, leading to better predictions on unseen data.
  • Improved stability: EMA checkpoints can provide more stable predictions, as they are less sensitive to noisy or sudden changes in the training data.

Cons:

  • Slightly increased complexity: Using EMA checkpoints during inference might introduce some additional complexity in terms of model deployment and management.

nonema.ckpt for Inference:

Pros:

  • Simpler deployment: Non-EMA checkpoints are easier to deploy and manage, as they do not require additional computation or storage overhead during inference.

Cons:

  • Potential overfitting: Non-EMA checkpoints may be more prone to overfitting, which can lead to worse performance on unseen data during inference.
  • Less stable predictions: Non-EMA checkpoints might produce less stable predictions, as they are more sensitive to noisy or sudden changes in the training data.

ema.ckpt for Training:

Pros:

  • Improved generalization: EMA checkpoints can result in better generalization by reducing the noise in the training process.
  • Less prone to overfitting: EMA checkpoints can help reduce overfitting by smoothing out the weights of the model during training.
  • Stable convergence: EMA checkpoints often lead to more stable convergence, as they average the recent model weights.

Cons:

  • Slower training: EMA checkpoints can result in slower training, as the exponential moving average needs to be calculated and maintained for each weight during training.
  • Increased complexity: Implementing EMA checkpoints can add complexity to the training process, as they require additional computation and memory overhead.

nonema.ckpt for Training:

Pros:

  • Faster training: Non-EMA checkpoints have a faster training process, as they do not require additional computations for maintaining the moving average of weights.
  • Simpler implementation: Non-EMA checkpoints are easier to implement and require less overhead in terms of computation and memory.

Cons:

  • Prone to overfitting: Non-EMA checkpoints might be more prone to overfitting, as they do not benefit from the weight smoothing provided by the EMA method.
  • Less stable convergence: Non-EMA checkpoints can exhibit less stable convergence, as they rely on a single snapshot of the model weights during training.

In general, using EMA checkpoints (ema.ckpt) for both training and inference can lead to better generalization and stability, at the cost of increased complexity and slower training times. On the other hand, non-EMA checkpoints (nonema.ckpt) provide faster training and simpler implementation, but may be more prone to overfitting and less stable convergence. The choice between the two depends on the specific requirements and priorities of your machine learning project.
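
The point in the answer above about the moving average being "calculated and maintained for each weight during training" can be made concrete with a toy example. This is only a sketch under my own assumptions (a single weight, made-up gradient noise, a decay of 0.99, a learning rate of 0.1), not how Stable Diffusion is actually trained. It keeps a raw weight that takes noisy gradient steps and a second, EMA copy that just trails it.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TARGET = 3.0          # value the toy "model" should learn
LEARNING_RATE = 0.1   # assumed value
EMA_DECAY = 0.99      # assumed value; real training configs choose their own
STEPS = 2000
WARMUP = 1000         # skip the early steps when measuring error

weight = 0.0          # raw (non-EMA) weight, updated by gradient steps
ema_weight = weight   # EMA copy, updated only by averaging
raw_sq_err = 0.0
ema_sq_err = 0.0

for step in range(STEPS):
    # Noisy gradient of (weight - TARGET)^2 / 2, mimicking minibatch noise.
    grad = (weight - TARGET) + random.gauss(0.0, 1.0)
    weight -= LEARNING_RATE * grad
    # The EMA copy never feeds back into training; it only follows the raw weight.
    ema_weight = EMA_DECAY * ema_weight + (1.0 - EMA_DECAY) * weight
    if step >= WARMUP:
        raw_sq_err += (weight - TARGET) ** 2
        ema_sq_err += (ema_weight - TARGET) ** 2

raw_mse = raw_sq_err / (STEPS - WARMUP)
ema_mse = ema_sq_err / (STEPS - WARMUP)
print(f"raw weight, mean squared error: {raw_mse:.4f}")
print(f"EMA weight, mean squared error: {ema_mse:.4f}")
```

In this toy setup the EMA copy should jitter around the target much less than the raw weight, which is the "smoothing" people keep mentioning. It also shows why a checkpoint saved for further training carries two copies of each averaged weight: the raw one that training keeps updating and the EMA one.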

GPT4 has this to say about the pros and cons of using ema.ckpt vs nonema.ckpt for inference or training:

Using ema.ckpt and nonema.ckpt for inference and training have different implications. Let's discuss the pros and cons of using each type in both scenarios

Impressive. With an extension, I allowed ChatGPT 3 to access the internet to search for an answer to the same question. But it doesn't provide this detailed explanation, only mentions the definition of EMA and nothing else. And it left me a link to this discussion.

I'm also new to this.

Here's a Tweet from Tanishq who works part-time with stabilityAI:

https://twitter.com/iScienceLuvr/status/1601011140934664193

And this is from Google Bard:

You can use either the EMA or the non-EMA Stable Diffusion model for personal and commercial use. However, there are some things to keep in mind.

  • EMA is more stable and produces more realistic results, but it is also slower to train and requires more memory.
  • Non-EMA is faster to train and requires less memory, but it is less stable and may produce less realistic results.

Ultimately, the best choice for you will depend on your specific needs and preferences. If you are not sure which option to choose, I recommend starting with EMA and then experimenting with Non-EMA if you are not satisfied with the results.

Here are some additional details about EMA and Non-EMA:

  • EMA is created by averaging the weights of the model over a certain number of steps. This helps to smooth out the noise in the training data and produce more stable results.

  • Non-EMA is the raw model, which means that it is not averaged over any steps. This makes it faster to train, but it also makes it more prone to overfitting.

Here are some examples of when you might want to use EMA:

  • When you want to generate high-quality images.
  • When you are short on time.
  • When you are using a less powerful computer.

Here are some examples of when you might want to use Non-EMA:

  • When you need to generate images quickly.
  • When you are experimenting with different parameters.
  • When you are using a powerful computer with a lot of memory.

TLDR: An employee from stabilityAI recommends using EMA. Start with EMA, then experiment with non-EMA if you are not satisfied with the results.

I am still confused about the file sizes here:
If EMA is the checkpoint for training, shouldn't the EMA version be larger?
And how can a model be EMA and pruned at the same time?
I suppose pruning takes place after training?

The content below is from
https://huggingface.co/runwayml/stable-diffusion-v1-5

...
Original GitHub Repository
Download the weights

v1-5-pruned-emaonly.ckpt - 4.27GB, ema-only weight. uses less VRAM - suitable for inference
v1-5-pruned.ckpt - 7.7GB, ema+non-ema weights. uses more VRAM - suitable for fine-tuning
Follow instructions here.
....

EMA-only uses less VRAM? Doesn't that mean the training version uses less VRAM?
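
One way to read the two file descriptions quoted above is that a checkpoint can hold two copies of some weights (EMA and non-EMA), and that "pruned" and "emaonly" are two separate clean-up steps. Below is a minimal sketch of that reading. Everything in it is an assumption for illustration: the key names and prefixes, the numbers, and the idea that pruning means dropping training-only extras such as optimizer state. It does not read a real .ckpt file.

```python
# Toy stand-in for a checkpoint. In a real file the values would be tensors;
# here each value is just a parameter count.
# ASSUMPTION: all key names and prefixes below are hypothetical.
full_checkpoint = {
    "model.layer1.weight": 1000,      # raw (non-EMA) weights
    "model.layer2.weight": 2000,
    "model_ema.layer1.weight": 1000,  # EMA copy of the same weights
    "model_ema.layer2.weight": 2000,
    "shared.encoder.weight": 500,     # hypothetical part stored only once
    "optimizer.state": 6000,          # hypothetical training-only bookkeeping
}


def prune(checkpoint):
    """Drop training-only extras, but keep both copies of the weights."""
    return {k: v for k, v in checkpoint.items()
            if not k.startswith("optimizer.")}


def ema_only(checkpoint):
    """Keep the EMA copy in place of the raw weights, plus the shared parts."""
    result = {}
    for key, value in checkpoint.items():
        if key.startswith("model_ema."):
            # Rename so the EMA weights load like ordinary model weights.
            result["model." + key[len("model_ema."):]] = value
        elif not key.startswith("model."):
            result[key] = value
    return result


def size(checkpoint):
    return sum(checkpoint.values())


pruned = prune(full_checkpoint)
pruned_ema_only = ema_only(pruned)

print(f"full:            {size(full_checkpoint)}")
print(f"pruned:          {size(pruned)}")
print(f"pruned-emaonly:  {size(pruned_ema_only)}")
```

Under these assumptions a file can be both pruned and EMA-only, and the EMA-only file is the smaller one because it holds one copy of the weights instead of two. That would fit the 4.27GB vs 7.7GB listing, where the gap is less than 2x, presumably because some parts are stored only once. Fewer tensors to load would also explain the "uses less VRAM" note for the EMA-only file.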

What are the pros and cons of using ema.ckpt vs nonema.ckpt? I saw some Reddit/Github discussions on it, but would still like to understand better.

Used for : Generating images (webui), no training.

Thanks 🙂

https://www.youtube.com/watch?app=desktop&v=PG5KrzjLJ8M
