*********
Callbacks
*********
Exponential Moving Average (EMA)
================================

During training, EMA maintains a moving average of the trained parameters.
Evaluating with the EMA parameters can produce significantly better results and faster convergence across a variety of domains and models.

The calculation itself is simple: the EMA weights are initialized from the model weights at the start of training, and on every training update they are blended with the new model weights:

.. math::

    ema_w = ema_w \cdot decay + model_w \cdot (1 - decay)
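
The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration of the arithmetic only, not NeMo's implementation; the function name and values are hypothetical.

.. code-block:: python

    def ema_update(ema_weights, model_weights, decay=0.999):
        """Blend the current model weights into the EMA weights."""
        return [decay * e + (1.0 - decay) * m
                for e, m in zip(ema_weights, model_weights)]

    # EMA weights are initialized as a copy of the model weights ...
    ema = [0.0, 1.0]
    # ... and drift toward the current model weights on each update.
    ema = ema_update(ema, [1.0, 3.0], decay=0.9)
    # ema is now approximately [0.1, 1.2]

Note that with a decay close to 1, the EMA weights change slowly and smooth out noise in individual updates.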

Enabling EMA is straightforward: pass the additional argument to the experiment manager at runtime.

.. code-block:: bash

    python examples/asr/asr_ctc/speech_to_text_ctc.py \
        model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \
        model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \
        trainer.devices=2 \
        trainer.accelerator='gpu' \
        trainer.max_epochs=50 \
        exp_manager.ema.enable=True  # pass this additional argument to enable EMA

To change the decay rate, pass the additional argument:

.. code-block:: bash

    python examples/asr/asr_ctc/speech_to_text_ctc.py \
        ...
        exp_manager.ema.enable=True \
        exp_manager.ema.decay=0.999

We also offer other helpful arguments.

.. list-table::
   :header-rows: 1

   * - Argument
     - Description
   * - ``exp_manager.ema.validate_original_weights=True``
     - Validate the original weights instead of the EMA weights.
   * - ``exp_manager.ema.every_n_steps=2``
     - Apply EMA every N steps instead of every step.
   * - ``exp_manager.ema.cpu_offload=True``
     - Offload EMA weights to the CPU. May introduce significant slowdowns.
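
These arguments can be combined in a single run. For example (manifest paths elided as in the earlier examples):

.. code-block:: bash

    python examples/asr/asr_ctc/speech_to_text_ctc.py \
        ...
        exp_manager.ema.enable=True \
        exp_manager.ema.decay=0.999 \
        exp_manager.ema.every_n_steps=2 \
        exp_manager.ema.cpu_offload=True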