Any plans to use RMSNorm (or FlashNorm) instead of LayerNorm?
#12, opened by graefics
Llama and many other LLMs use RMSNorm. Any reason why you still use LayerNorm? Thanks
FlashNorm: https://arxiv.org/abs/2407.09577
RMSNorm: https://arxiv.org/abs/1910.07467
Yes.
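I think the difference is that LayerNorm computes both the mean and the variance of the input, while RMSNorm skips the mean and only rescales by the root mean square.
That makes RMSNorm cheaper to compute, which is why many LLMs use it.
Of course, you can still use LayerNorm.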
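To make the comparison concrete, here is a minimal pure-Python sketch of the two normalizations. It is not taken from this repo's code: the function names `layer_norm` and `rms_norm` and the `eps` default are just illustrative, and the learned scale/bias parameters that real layers apply afterwards are left out.

```python
import math

def layer_norm(x, eps=1e-6):
    # LayerNorm: subtract the mean, then divide by the standard deviation.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-6):
    # RMSNorm: no mean subtraction, only divide by the root mean square.
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

x = [1.0, 2.0, 3.0, 4.0]
ln = layer_norm(x)  # centered: values sum to roughly zero
rn = rms_norm(x)    # not centered: signs and ratios of x are preserved

# For an input that already has zero mean, the two give the same result.
centered = [-1.5, -0.5, 0.5, 1.5]
same = all(
    abs(a - b) < 1e-9
    for a, b in zip(layer_norm(centered), rms_norm(centered))
)
```

RMSNorm needs one reduction over the input (the mean of squares) where LayerNorm needs two (mean and variance), which is where the savings come from.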