Checkpoints
===========

In this section, we present four key functionalities of NVIDIA NeMo related to checkpoint management:

1. **Checkpoint Loading**: Use the :code:`restore_from()` method to load local ``.nemo`` checkpoint files.
2. **Partial Checkpoint Conversion**: Convert partially-trained ``.ckpt`` checkpoints to the ``.nemo`` format (a sketch follows this list).
3. **Community Checkpoint Conversion**: Convert checkpoints from community sources, such as HuggingFace, into the ``.nemo`` format.
4. **Model Parallelism Adjustment**: Adjust model parallelism to train models that exceed the memory of a single GPU. NeMo employs both tensor (intra-layer) and pipeline (inter-layer) model parallelism; see `"Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM" <https://arxiv.org/pdf/2104.04473.pdf>`_ for details. This tool helps users who need to redistribute a model across a larger GPU array due to memory constraints.

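The following is a minimal sketch of the partial checkpoint conversion mentioned in item 2, assuming the ``.ckpt`` was produced by a NeMo model (NeMo models inherit PyTorch Lightning's :code:`load_from_checkpoint()`); the class name and paths are placeholders:

.. code-block:: python

    import nemo.collections.multimodal as nemo_multimodal

    # Load a partially-trained PyTorch Lightning checkpoint (.ckpt).
    # <MODEL_BASE_CLASS> and both paths are placeholders.
    model = nemo_multimodal.models.<MODEL_BASE_CLASS>.load_from_checkpoint("<path/to/checkpoint.ckpt>")

    # Re-save the configuration and weights as a consolidated .nemo archive.
    model.save_to("<converted_checkpoint>.nemo")
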
Understanding Checkpoint Formats
--------------------------------

A ``.nemo`` checkpoint is fundamentally a tar file that bundles the model configuration (given as a YAML file), model weights, and other pertinent artifacts such as tokenizer models or vocabulary files. This consolidated design streamlines sharing, loading, tuning, evaluating, and inference.

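Because a ``.nemo`` file is an ordinary tar archive, its contents can be inspected with standard tooling. A minimal sketch using Python's ``tarfile`` module (the checkpoint path is a placeholder):

.. code-block:: python

    import tarfile

    # Any .nemo checkpoint works here; the path is a placeholder.
    nemo_path = "<path/to/checkpoint/file.nemo>"

    # List the bundled artifacts: the model config YAML, the model weights,
    # and any tokenizer or vocabulary files packaged with the model.
    with tarfile.open(nemo_path) as archive:
        for member in archive.getnames():
            print(member)
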
In contrast, a ``.ckpt`` file, created during PyTorch Lightning training, contains both the model weights and the optimizer states, and is typically used to resume training from a pause.

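To see concretely what a ``.ckpt`` file holds, a minimal sketch that opens one with :code:`torch.load()` (the path is a placeholder; the key names follow PyTorch Lightning's checkpoint layout):

.. code-block:: python

    import torch

    # A PyTorch Lightning .ckpt is a torch-serialized dictionary.
    ckpt = torch.load("<path/to/checkpoint.ckpt>", map_location="cpu", weights_only=False)

    # Model weights live under "state_dict"; optimizer states under "optimizer_states".
    print(list(ckpt.keys()))
    print(len(ckpt["state_dict"]), "weight tensors")
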
The subsequent sections provide instructions for the functionalities above, specifically tailored to deploying fully trained checkpoints for evaluation or additional fine-tuning.

Loading Local Checkpoints
-------------------------

By default, NeMo saves checkpoints of trained models in the ``.nemo`` format. To save a model manually during training, use:

.. code-block:: python

    model.save_to("<checkpoint_path>.nemo")

To load a local ``.nemo`` checkpoint:

.. code-block:: python

    import nemo.collections.multimodal as nemo_multimodal

    model = nemo_multimodal.models.<MODEL_BASE_CLASS>.restore_from(restore_path="<path/to/checkpoint/file.nemo>")

Replace ``<MODEL_BASE_CLASS>`` with the appropriate multimodal model class.

Converting Community Checkpoints
--------------------------------

CLIP Checkpoints
^^^^^^^^^^^^^^^^

To migrate community checkpoints, use the following command:

.. code-block:: bash

    torchrun --nproc-per-node=1 /opt/NeMo/scripts/checkpoint_converters/convert_clip_hf_to_nemo.py \
      --input_name_or_path=openai/clip-vit-large-patch14 \
      --output_path=openai_clip.nemo \
      --hparams_file=/opt/NeMo/examples/multimodal/vision_language_foundation/clip/conf/megatron_clip_VIT-L-14.yaml

Ensure that the NeMo hparams file contains the correct model architecture parameters and is placed at `path/to/saved.yaml`. An example can be found in `examples/multimodal/foundation/clip/conf/megatron_clip_config.yaml`.

After conversion, you can verify the model with the following command:

.. code-block:: bash

    wget https://upload.wikimedia.org/wikipedia/commons/0/0f/1665_Girl_with_a_Pearl_Earring.jpg
    torchrun --nproc-per-node=1 /opt/NeMo/examples/multimodal/vision_language_foundation/clip/megatron_clip_infer.py \
      model.restore_from_path=./openai_clip.nemo \
      image_path=./1665_Girl_with_a_Pearl_Earring.jpg \
      texts='["a dog", "a boy", "a girl"]'

It should generate a high probability for the "a girl" tag. For example:

.. code-block:: text

    Given image's CLIP text probability: [('a dog', 0.0049710185), ('a boy', 0.002258187), ('a girl', 0.99277073)]