|
NeMo Models |
|
=========== |
|
|
|
Basics |
|
------ |
|
|
|
NeMo models contain everything needed to train and reproduce conversational AI models: |
|
|
|
- neural network architectures |
|
- datasets/data loaders |
|
- data preprocessing/postprocessing |
|
- data augmentors |
|
- optimizers and schedulers |
|
- tokenizers |
|
- language models |
|
|
|
NeMo uses `Hydra <https://hydra.cc/>`_ for configuring both NeMo models and the PyTorch Lightning Trainer. |
|
|
|
.. note:: |
|
Every NeMo model has an example configuration file and training script that can be found `here <https://github.com/NVIDIA/NeMo/tree/stable/examples>`__. |
|
|
|
The end result of using NeMo, `PyTorch Lightning <https://github.com/PyTorchLightning/pytorch-lightning>`__, and Hydra is that NeMo models all have the same look and feel and are fully compatible with the PyTorch ecosystem.
|
|
|
Pretrained |
|
---------- |
|
|
|
NeMo comes with many pretrained models for each of our collections: ASR, NLP, and TTS. Every pretrained NeMo model can be downloaded |
|
and used with the ``from_pretrained()`` method. |
|
|
|
As an example, we can instantiate QuartzNet with the following: |
|
|
|
.. code-block:: Python |
|
|
|
import nemo.collections.asr as nemo_asr |
|
|
|
model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En") |
|
|
|
To see all available pretrained models for a specific NeMo model, use the ``list_available_models()`` method: |
|
|
|
.. code-block:: Python |
|
|
|
nemo_asr.models.EncDecCTCModel.list_available_models() |
|
|
|
For detailed information on the available pretrained models, refer to the collections documentation: |
|
|
|
- :doc:`Automatic Speech Recognition (ASR) <../asr/intro>` |
|
- :doc:`Natural Language Processing (NLP) <../nlp/models>` |
|
- :doc:`Text-to-Speech Synthesis (TTS) <../tts/intro>` |
|
|
|
Training |
|
-------- |
|
|
|
NeMo leverages `PyTorch Lightning <https://www.pytorchlightning.ai/>`__ for model training. PyTorch Lightning lets NeMo decouple the |
|
conversational AI code from the PyTorch training code. This means that NeMo users can focus on their domain (ASR, NLP, TTS) and |
|
build complex AI applications without having to rewrite boilerplate code for PyTorch training. |
|
|
|
When using PyTorch Lightning, NeMo users can automatically train with: |
|
|
|
- multi-GPU/multi-node |
|
- mixed precision |
|
- model checkpointing |
|
- logging |
|
- early stopping |
|
- and more |
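
For example, the sketch below shows a ``Trainer`` configured by hand with several of these features enabled. In NeMo, these flags normally come from the ``trainer`` section of the model configuration, and the exact flag names depend on your PyTorch Lightning version:

.. code-block:: python

    import pytorch_lightning as pl

    # illustrative only: multi-GPU training with mixed precision and early stopping
    trainer = pl.Trainer(
        devices=2,                 # number of GPUs per node
        num_nodes=1,               # number of nodes
        accelerator="gpu",
        precision=16,              # mixed precision training
        max_epochs=50,
        callbacks=[pl.callbacks.EarlyStopping(monitor="val_loss")],
    )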
|
|
|
The two main aspects of the Lightning API are the `LightningModule <https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#>`_ |
|
and the `Trainer <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html>`_. |
|
|
|
PyTorch Lightning ``LightningModule`` |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Every NeMo model is a ``LightningModule``, which is itself a ``torch.nn.Module``. This means that NeMo models are compatible with the PyTorch
|
ecosystem and can be plugged into existing PyTorch workflows. |
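
As a small sketch (reusing the pretrained QuartzNet model from above), a NeMo model can be treated like any other ``torch.nn.Module``:

.. code-block:: python

    import torch
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

    # a NeMo model is a regular torch module: it has parameters and supports .eval()/.train()
    print(isinstance(model, torch.nn.Module))  # True
    print(sum(p.numel() for p in model.parameters()))
    model.eval()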
|
|
|
Creating a NeMo model is similar to any other PyTorch workflow. We start by initializing our model architecture, then define the forward pass: |
|
|
|
.. code-block:: python |
|
|
|
class TextClassificationModel(NLPModel, Exportable): |
|
... |
|
def __init__(self, cfg: DictConfig, trainer: Trainer = None): |
|
"""Initializes the BERTTextClassifier model.""" |
|
... |
|
super().__init__(cfg=cfg, trainer=trainer) |
|
|
|
# instantiate a BERT based encoder |
|
self.bert_model = get_lm_model( |
|
config_file=cfg.language_model.config_file, |
|
config_dict=cfg.language_model.config, |
|
vocab_file=cfg.tokenizer.vocab_file, |
|
trainer=trainer, |
|
cfg=cfg, |
|
) |
|
|
|
# instantiate the FFN for classification |
|
self.classifier = SequenceClassifier( |
|
hidden_size=self.bert_model.config.hidden_size, |
|
num_classes=cfg.dataset.num_classes, |
|
num_layers=cfg.classifier_head.num_output_layers, |
|
activation='relu', |
|
log_softmax=False, |
|
dropout=cfg.classifier_head.fc_dropout, |
|
use_transformer_init=True, |
|
idx_conditioned_on=0, |
|
) |
|
|
|
.. code-block:: python |
|
|
|
def forward(self, input_ids, token_type_ids, attention_mask): |
|
""" |
|
No special modification required for Lightning, define it as you normally would |
|
in the `nn.Module` in vanilla PyTorch. |
|
""" |
|
hidden_states = self.bert_model( |
|
input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask |
|
) |
|
logits = self.classifier(hidden_states=hidden_states) |
|
return logits |
|
|
|
The ``LightningModule`` organizes PyTorch code so that across all NeMo models we have a similar look and feel. |
|
For example, the training logic can be found in ``training_step``: |
|
|
|
.. code-block:: python |
|
|
|
def training_step(self, batch, batch_idx): |
|
""" |
|
Lightning calls this inside the training loop with the data from the training dataloader |
|
passed in as `batch`. |
|
""" |
|
# forward pass |
|
input_ids, input_type_ids, input_mask, labels = batch |
|
logits = self.forward(input_ids=input_ids, token_type_ids=input_type_ids, attention_mask=input_mask) |
|
|
|
train_loss = self.loss(logits=logits, labels=labels) |
|
|
|
lr = self._optimizer.param_groups[0]['lr'] |
|
|
|
self.log('train_loss', train_loss) |
|
self.log('lr', lr, prog_bar=True) |
|
|
|
return { |
|
'loss': train_loss, |
|
'lr': lr, |
|
} |
|
|
|
Likewise, the validation logic can be found in ``validation_step``:
|
|
|
.. code-block:: python |
|
|
|
def validation_step(self, batch, batch_idx): |
|
""" |
|
Lightning calls this inside the validation loop with the data from the validation dataloader |
|
passed in as `batch`. |
|
""" |
|
if self.testing: |
|
prefix = 'test' |
|
else: |
|
prefix = 'val' |
|
|
|
input_ids, input_type_ids, input_mask, labels = batch |
|
logits = self.forward(input_ids=input_ids, token_type_ids=input_type_ids, attention_mask=input_mask) |
|
|
|
val_loss = self.loss(logits=logits, labels=labels) |
|
|
|
preds = torch.argmax(logits, axis=-1) |
|
|
|
tp, fn, fp, _ = self.classification_report(preds, labels) |
|
|
|
return {'val_loss': val_loss, 'tp': tp, 'fn': fn, 'fp': fp} |
|
|
|
PyTorch Lightning then handles all of the boilerplate code needed for training. Virtually any aspect of training can be customized |
|
via PyTorch Lightning `hooks <https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#hooks>`_, |
|
`Plugins <https://pytorch-lightning.readthedocs.io/en/stable/extensions/plugins.html>`_, |
|
`callbacks <https://pytorch-lightning.readthedocs.io/en/stable/extensions/callbacks.html>`_, or by overriding `methods <https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#methods>`_. |
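
For instance, a minimal custom callback (a sketch, not part of NeMo itself) can be attached to the ``Trainer`` like this:

.. code-block:: python

    import pytorch_lightning as pl

    class PrintEpochCallback(pl.Callback):
        """A toy callback that reports when each training epoch finishes."""

        def on_train_epoch_end(self, trainer, pl_module):
            print(f"Finished epoch {trainer.current_epoch}")

    trainer = pl.Trainer(callbacks=[PrintEpochCallback()], max_epochs=10)
    # trainer.fit(model) would now invoke the callback at the end of every epoch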
|
|
|
For more domain-specific information, see: |
|
|
|
- :doc:`Automatic Speech Recognition (ASR) <../asr/intro>` |
|
- :doc:`Natural Language Processing (NLP) <../nlp/models>` |
|
- :doc:`Text-to-Speech Synthesis (TTS) <../tts/intro>` |
|
|
|
PyTorch Lightning Trainer |
|
~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Since every NeMo model is a ``LightningModule``, we can automatically take advantage of the PyTorch Lightning ``Trainer``. Every NeMo |
|
`example <https://github.com/NVIDIA/NeMo/tree/v1.0.2/examples>`_ training script uses the ``Trainer`` object to fit the model. |
|
|
|
First, instantiate the model and trainer, then call ``.fit``: |
|
|
|
.. code-block:: python |
|
|
|
# We first instantiate the trainer based on the model configuration. |
|
# See the model configuration documentation for details. |
|
trainer = pl.Trainer(**cfg.trainer) |
|
|
|
# Then pass the model configuration and trainer object into the NeMo model |
|
model = TextClassificationModel(cfg.model, trainer=trainer) |
|
|
|
    # Now we can train by calling .fit
|
trainer.fit(model) |
|
|
|
# Or we can run the test loop on test data by calling |
|
trainer.test(model=model) |
|
|
|
All `trainer flags <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#trainer-flags>`_ can be set from the NeMo configuration.
|
|
|
|
|
Configuration |
|
------------- |
|
|
|
Hydra is an open-source Python framework that simplifies configuration for complex applications that must bring together many different |
|
software libraries. Conversational AI model training is a great example of such an application. To train a conversational AI model, we |
|
must be able to configure: |
|
|
|
- neural network architectures |
|
- training and optimization algorithms |
|
- data pre/post processing |
|
- data augmentation |
|
- experiment logging/visualization |
|
- model checkpointing |
|
|
|
For an introduction to using Hydra, refer to the `Hydra Tutorials <https://hydra.cc/docs/tutorials/intro>`_. |
|
|
|
With Hydra, we can configure everything needed for NeMo with three interfaces: |
|
|
|
- Command Line (CLI) |
|
- Configuration Files (YAML) |
|
- Dataclasses (Python) |
|
|
|
YAML |
|
~~~~ |
|
|
|
NeMo provides YAML configuration files for all of our `example <https://github.com/NVIDIA/NeMo/tree/v1.0.2/examples>`_ training scripts. |
|
YAML files make it easy to experiment with different model and training configurations. |
|
|
|
Every NeMo example YAML has the same underlying configuration structure: |
|
|
|
- trainer |
|
- exp_manager |
|
- model |
|
|
|
The model configuration always contains ``train_ds``, ``validation_ds``, ``test_ds``, and ``optim``. Model architectures, however, can vary across domains. |
|
Refer to the documentation of specific collections (LLM, ASR etc.) for detailed information on model architecture configuration. |
|
|
|
A NeMo configuration file should look similar to the following: |
|
|
|
.. code-block:: yaml |
|
|
|
# PyTorch Lightning Trainer configuration |
|
# any argument of the Trainer object can be set here |
|
trainer: |
|
devices: 1 # number of gpus per node |
|
accelerator: gpu |
|
num_nodes: 1 # number of nodes |
|
max_epochs: 10 # how many training epochs to run |
|
val_check_interval: 1.0 # run validation after every epoch |
|
|
|
# Experiment logging configuration |
|
exp_manager: |
|
exp_dir: /path/to/my/nemo/experiments |
|
name: name_of_my_experiment |
|
create_tensorboard_logger: True |
|
create_wandb_logger: True |
|
|
|
# Model configuration |
|
# model network architecture, train/val/test datasets, data augmentation, and optimization |
|
model: |
|
train_ds: |
|
manifest_filepath: /path/to/my/train/manifest.json |
|
batch_size: 256 |
|
shuffle: True |
|
validation_ds: |
|
manifest_filepath: /path/to/my/validation/manifest.json |
|
batch_size: 32 |
|
shuffle: False |
|
test_ds: |
|
manifest_filepath: /path/to/my/test/manifest.json |
|
batch_size: 32 |
|
shuffle: False |
|
optim: |
|
name: novograd |
|
lr: .01 |
|
betas: [0.8, 0.5] |
|
weight_decay: 0.001 |
|
# network architecture can vary greatly depending on the domain |
|
encoder: |
|
... |
|
decoder: |
|
... |
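
Because NeMo loads these YAML files through Hydra/OmegaConf, the resulting configuration behaves like a nested object. A small sketch (the config path here is hypothetical):

.. code-block:: python

    from omegaconf import OmegaConf

    # load an example YAML config and inspect part of it
    cfg = OmegaConf.load("examples/asr/conf/config.yaml")

    print(cfg.trainer.max_epochs)
    print(OmegaConf.to_yaml(cfg.model.optim))

    # values can also be overridden programmatically before building the model
    cfg.model.train_ds.batch_size = 128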
|
|
|
CLI |
|
~~~ |
|
|
|
With NeMo and Hydra, every aspect of model training can be modified from the command-line. This is extremely helpful for running lots |
|
of experiments on compute clusters or for quickly testing parameters during development. |
|
|
|
All NeMo `examples <https://github.com/NVIDIA/NeMo/tree/stable/examples>`_ come with instructions on how to |
|
run the training/inference script from the command-line (e.g. see `here <https://github.com/NVIDIA/NeMo/blob/stable/examples/asr/asr_ctc/speech_to_text_ctc.py>`__ |
|
for an example). |
|
|
|
With Hydra, arguments are set using the ``=`` operator: |
|
|
|
.. code-block:: bash |
|
|
|
python examples/asr/asr_ctc/speech_to_text_ctc.py \ |
|
model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \ |
|
model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \ |
|
trainer.devices=2 \ |
|
trainer.accelerator='gpu' \ |
|
trainer.max_epochs=50 |
|
|
|
We can use the ``+`` operator to add arguments from the CLI: |
|
|
|
.. code-block:: bash |
|
|
|
python examples/asr/asr_ctc/speech_to_text_ctc.py \ |
|
model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \ |
|
model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \ |
|
trainer.devices=2 \ |
|
trainer.accelerator='gpu' \ |
|
trainer.max_epochs=50 \ |
|
+trainer.fast_dev_run=true |
|
|
|
We can use the ``~`` operator to remove configurations: |
|
|
|
.. code-block:: bash |
|
|
|
python examples/asr/asr_ctc/speech_to_text_ctc.py \ |
|
model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \ |
|
model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \ |
|
~model.test_ds \ |
|
trainer.devices=2 \ |
|
trainer.accelerator='gpu' \ |
|
trainer.max_epochs=50 \ |
|
+trainer.fast_dev_run=true |
|
|
|
We can specify configuration files using the ``--config-path`` and ``--config-name`` flags: |
|
|
|
.. code-block:: bash |
|
|
|
python examples/asr/asr_ctc/speech_to_text_ctc.py \ |
|
--config-path=conf/quartznet \ |
|
--config-name=quartznet_15x5 \ |
|
model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \ |
|
model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \ |
|
~model.test_ds \ |
|
trainer.devices=2 \ |
|
trainer.accelerator='gpu' \ |
|
trainer.max_epochs=50 \ |
|
+trainer.fast_dev_run=true |
|
|
|
Dataclasses |
|
~~~~~~~~~~~ |
|
|
|
Dataclasses allow NeMo to ship model configurations as part of the NeMo library and also enable pure Python configuration of NeMo models.
|
With Hydra, dataclasses can be used to create `structured configs <https://hydra.cc/docs/tutorials/structured_config/intro>`_ for the conversational AI application. |
|
|
|
As an example, refer to the code block below for an *Attention Is All You Need* machine translation model. The model configuration can
|
be instantiated and modified like any Python `Dataclass <https://docs.python.org/3/library/dataclasses.html>`_. |
|
|
|
.. code-block:: Python |
|
|
|
from nemo.collections.nlp.models.machine_translation.mt_enc_dec_config import AAYNBaseConfig |
|
|
|
cfg = AAYNBaseConfig() |
|
|
|
# modify the number of layers in the encoder |
|
cfg.encoder.num_layers = 8 |
|
|
|
# modify the training batch size |
|
cfg.train_ds.tokens_in_batch = 8192 |
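
Since these configurations are standard dataclasses, they can also be converted to OmegaConf objects whenever a ``DictConfig`` is needed. A small sketch:

.. code-block:: python

    from omegaconf import OmegaConf

    from nemo.collections.nlp.models.machine_translation.mt_enc_dec_config import AAYNBaseConfig

    # convert the structured dataclass config into a DictConfig
    cfg = OmegaConf.structured(AAYNBaseConfig())

    # it can then be inspected, merged, or overridden like any other OmegaConf config
    print(OmegaConf.to_yaml(cfg.train_ds))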
|
|
|
.. note:: Configuration with Hydra always has the following precedence: CLI > YAML > Dataclass.
|
|
|
.. _optimization-label: |
|
|
|
Optimization |
|
------------ |
|
|
|
Optimizers and learning rate schedules are configurable across all NeMo models and have their own namespace. Here is a sample YAML |
|
configuration for a Novograd optimizer with a Cosine Annealing learning rate schedule. |
|
|
|
.. code-block:: yaml |
|
|
|
    optim:
      name: novograd
      lr: 0.01

      # optimizer arguments
      betas: [0.8, 0.25]
      weight_decay: 0.001

      # scheduler setup
      sched:
        name: CosineAnnealing

        # Optional arguments
        max_steps: -1 # computed at runtime or explicitly set here
        monitor: val_loss
        reduce_on_plateau: false

        # scheduler config override
        warmup_steps: 1000
        warmup_ratio: null
        min_lr: 1e-9
|
|
|
.. note:: `NeMo Examples <https://github.com/NVIDIA/NeMo/tree/stable/examples>`_ has optimizer and scheduler configurations for every NeMo model. |
|
|
|
Optimizers can be configured from the CLI as well: |
|
|
|
.. code-block:: bash |
|
|
|
python examples/asr/asr_ctc/speech_to_text_ctc.py \ |
|
--config-path=conf/quartznet \ |
|
--config-name=quartznet_15x5 \ |
|
... |
|
# train with the adam optimizer |
|
        model.optim.name=adam \
|
# change the learning rate |
|
model.optim.lr=.0004 \ |
|
# modify betas |
|
model.optim.betas=[.8, .5] |
|
|
|
.. _optimizers-label: |
|
|
|
Optimizers |
|
~~~~~~~~~~ |
|
|
|
``name`` corresponds to the lowercase name of the optimizer. To view a list of available optimizers, run: |
|
|
|
.. code-block:: Python |
|
|
|
from nemo.core.optim.optimizers import AVAILABLE_OPTIMIZERS |
|
|
|
for name, opt in AVAILABLE_OPTIMIZERS.items(): |
|
print(f'name: {name}, opt: {opt}') |
|
|
|
.. code-block:: bash |
|
|
|
    name: sgd, opt: <class 'torch.optim.sgd.SGD'>
    name: adam, opt: <class 'torch.optim.adam.Adam'>
    name: adamw, opt: <class 'torch.optim.adamw.AdamW'>
    name: adadelta, opt: <class 'torch.optim.adadelta.Adadelta'>
    name: adamax, opt: <class 'torch.optim.adamax.Adamax'>
    name: adagrad, opt: <class 'torch.optim.adagrad.Adagrad'>
    name: rmsprop, opt: <class 'torch.optim.rmsprop.RMSprop'>
    name: rprop, opt: <class 'torch.optim.rprop.Rprop'>
    name: novograd, opt: <class 'nemo.core.optim.novograd.Novograd'>
|
|
|
Optimizer Params |
|
~~~~~~~~~~~~~~~~ |
|
|
|
Optimizer params can vary between optimizers but the ``lr`` param is required for all optimizers. To see the available params for an |
|
optimizer, we can look at its corresponding dataclass. |
|
|
|
.. code-block:: python |
|
|
|
from nemo.core.config.optimizers import NovogradParams |
|
|
|
print(NovogradParams()) |
|
|
|
.. code-block:: bash |
|
|
|
NovogradParams(lr='???', betas=(0.95, 0.98), eps=1e-08, weight_decay=0, grad_averaging=False, amsgrad=False, luc=False, luc_trust=0.001, luc_eps=1e-08) |
|
|
|
``'???'`` indicates that the ``lr`` argument is required.
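
The params dataclass can also be filled in directly to sanity-check optimizer settings in Python, for example:

.. code-block:: python

    from nemo.core.config.optimizers import NovogradParams

    # supply the required learning rate; the remaining fields keep their defaults
    params = NovogradParams(lr=0.01, betas=(0.8, 0.25), weight_decay=0.001)
    print(params)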
|
|
|
Register Optimizer |
|
~~~~~~~~~~~~~~~~~~ |
|
|
|
To register a new optimizer to be used with NeMo, run: |
|
|
|
.. autofunction:: nemo.core.optim.optimizers.register_optimizer |
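
As a sketch (using the signature documented above), a custom or third-party optimizer can be registered and then referenced by name in the ``optim`` config:

.. code-block:: python

    import torch

    from nemo.core.config.optimizers import OptimizerParams
    from nemo.core.optim.optimizers import register_optimizer

    # illustrative only: expose torch's LBFGS optimizer under the name 'lbfgs'
    register_optimizer(name="lbfgs", optimizer=torch.optim.LBFGS, optimizer_params=OptimizerParams)

    # afterwards, a model config could reference it with `optim.name: lbfgs`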
|
|
|
.. _learning-rate-schedulers-label: |
|
|
|
Learning Rate Schedulers |
|
~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Learning rate schedulers can be optionally configured under the ``optim.sched`` namespace. |
|
|
|
``name`` corresponds to the name of the learning rate schedule. To view a list of available schedulers, run: |
|
|
|
.. code-block:: Python |
|
|
|
from nemo.core.optim.lr_scheduler import AVAILABLE_SCHEDULERS |
|
|
|
for name, opt in AVAILABLE_SCHEDULERS.items(): |
|
print(f'name: {name}, schedule: {opt}') |
|
|
|
.. code-block:: bash |
|
|
|
name: WarmupPolicy, schedule: <class 'nemo.core.optim.lr_scheduler.WarmupPolicy'> |
|
name: WarmupHoldPolicy, schedule: <class 'nemo.core.optim.lr_scheduler.WarmupHoldPolicy'> |
|
name: SquareAnnealing, schedule: <class 'nemo.core.optim.lr_scheduler.SquareAnnealing'> |
|
name: CosineAnnealing, schedule: <class 'nemo.core.optim.lr_scheduler.CosineAnnealing'> |
|
name: NoamAnnealing, schedule: <class 'nemo.core.optim.lr_scheduler.NoamAnnealing'> |
|
name: WarmupAnnealing, schedule: <class 'nemo.core.optim.lr_scheduler.WarmupAnnealing'> |
|
name: InverseSquareRootAnnealing, schedule: <class 'nemo.core.optim.lr_scheduler.InverseSquareRootAnnealing'> |
|
name: SquareRootAnnealing, schedule: <class 'nemo.core.optim.lr_scheduler.SquareRootAnnealing'> |
|
name: PolynomialDecayAnnealing, schedule: <class 'nemo.core.optim.lr_scheduler.PolynomialDecayAnnealing'> |
|
name: PolynomialHoldDecayAnnealing, schedule: <class 'nemo.core.optim.lr_scheduler.PolynomialHoldDecayAnnealing'> |
|
name: StepLR, schedule: <class 'torch.optim.lr_scheduler.StepLR'> |
|
name: ExponentialLR, schedule: <class 'torch.optim.lr_scheduler.ExponentialLR'> |
|
name: ReduceLROnPlateau, schedule: <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'> |
|
name: CyclicLR, schedule: <class 'torch.optim.lr_scheduler.CyclicLR'> |
|
|
|
Scheduler Params |
|
~~~~~~~~~~~~~~~~ |
|
|
|
To see the available params for a scheduler, we can look at its corresponding dataclass: |
|
|
|
.. code-block:: Python |
|
|
|
from nemo.core.config.schedulers import CosineAnnealingParams |
|
|
|
print(CosineAnnealingParams()) |
|
|
|
.. code-block:: bash |
|
|
|
CosineAnnealingParams(last_epoch=-1, warmup_steps=None, warmup_ratio=None, min_lr=0.0) |
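
These fields mirror the ``sched`` section of the YAML shown earlier, so the same values can be constructed in Python, for example:

.. code-block:: python

    from nemo.core.config.schedulers import CosineAnnealingParams

    # matches the warmup_steps / min_lr overrides from the YAML example above
    sched_params = CosineAnnealingParams(warmup_steps=1000, min_lr=1e-9)
    print(sched_params)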
|
|
|
Register scheduler |
|
~~~~~~~~~~~~~~~~~~ |
|
|
|
To register a new scheduler to be used with NeMo, run: |
|
|
|
.. autofunction:: nemo.core.optim.lr_scheduler.register_scheduler |
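
As with optimizers, here is a sketch (using the signature documented above) of registering an additional scheduler so that it can be referenced by name under ``optim.sched``:

.. code-block:: python

    import torch

    from nemo.core.config.schedulers import SchedulerParams
    from nemo.core.optim.lr_scheduler import register_scheduler

    # illustrative only: expose torch's CosineAnnealingWarmRestarts under a custom name
    register_scheduler(
        name="CosineAnnealingWarmRestarts",
        scheduler=torch.optim.lr_scheduler.CosineAnnealingWarmRestarts,
        scheduler_params=SchedulerParams,
    )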
|
|
|
Save and Restore |
|
---------------- |
|
|
|
NeMo models all come with ``.save_to`` and ``.restore_from`` methods. |
|
|
|
Save |
|
~~~~ |
|
|
|
To save a NeMo model, run: |
|
|
|
.. code-block:: Python |
|
|
|
model.save_to('/path/to/model.nemo') |
|
|
|
Everything needed to use the trained model is packaged and saved in the ``.nemo`` file. For example, in the NLP domain, ``.nemo`` files |
|
include artifacts such as tokenizer models and vocabulary files.
|
|
|
.. note:: A ``.nemo`` file is simply an archive like any other ``.tar`` file. |
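
Since the ``.nemo`` file is a tar archive, its contents can be inspected directly, for example:

.. code-block:: python

    import tarfile

    # list the files packaged inside a .nemo checkpoint (weights, config, registered artifacts)
    with tarfile.open('/path/to/model.nemo', 'r') as archive:
        print(archive.getnames())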
|
|
|
Restore |
|
~~~~~~~ |
|
|
|
To restore a NeMo model, run: |
|
|
|
.. code-block:: Python |
|
|
|
    # Here, you should usually use the specific class of the model, or use ModelPT.restore_from() for simplicity.
|
model.restore_from('/path/to/model.nemo') |
|
|
|
When using the PyTorch Lightning Trainer, a PyTorch Lightning checkpoint is created. These are mainly used within NeMo to auto-resume |
|
training. Since NeMo models are ``LightningModules``, the PyTorch Lightning method ``load_from_checkpoint`` is available. Note that |
|
``load_from_checkpoint`` won't necessarily work out-of-the-box for all models as some models require more artifacts than just the |
|
checkpoint to be restored. For these models, the user will have to override ``load_from_checkpoint`` if they want to use it. |
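
For models that do not need extra artifacts, a minimal sketch of loading such a checkpoint (the path here is hypothetical):

.. code-block:: python

    import nemo.collections.asr as nemo_asr

    # .ckpt files are regular PyTorch Lightning checkpoints
    model = nemo_asr.models.EncDecCTCModel.load_from_checkpoint('/path/to/checkpoint.ckpt')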
|
|
|
It's highly recommended to use ``restore_from`` to load NeMo models. |
|
|
|
Restore with Modified Config |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Sometimes, you may need to modify a model (or its sub-components) prior to restoring it. A common case is when
the model's internal config must be updated for some reason (such as a deprecation, a newer version, or support for a new feature).
As long as the model has the same parameters as in the original config, the weights can still be restored safely.
|
|
|
In NeMo, the model's internal config is preserved as part of the .nemo file. This config is used during restoration, and,
as shown below, we can update it prior to restoring the model.
|
|
|
.. code-block:: python
|
|
|
# When restoring a model, you should generally use the class of the model |
|
# Obtain the config (as an OmegaConf object) |
|
config = model_class.restore_from('/path/to/model.nemo', return_config=True) |
|
# OR |
|
config = model_class.from_pretrained('name_of_the_model', return_config=True) |
|
|
|
# Modify the config as needed |
|
config.x.y = z |
|
|
|
# Restore the model from the updated config |
|
model = model_class.restore_from('/path/to/model.nemo', override_config_path=config) |
|
# OR |
|
model = model_class.from_pretrained('name_of_the_model', override_config_path=config) |
|
|
|
Register Artifacts |
|
------------------ |
|
|
|
Restoring conversational AI models can be complicated because it requires more than just the checkpoint weights; additional information is also needed to use the model. |
|
NeMo models can save additional artifacts in the .nemo file by calling ``.register_artifact``. |
|
When restoring NeMo models using ``.restore_from`` or ``.from_pretrained``, any artifacts that were registered will be available automatically. |
|
|
|
As an example, consider an NLP model that requires a trained tokenizer model. |
|
The tokenizer model file can be automatically added to the .nemo file with the following: |
|
|
|
.. code-block:: python |
|
|
|
self.encoder_tokenizer = get_nmt_tokenizer( |
|
... |
|
tokenizer_model=self.register_artifact(config_path='encoder_tokenizer.tokenizer_model', |
|
src='/path/to/tokenizer.model', |
|
verify_src_exists=True), |
|
) |
|
|
|
By default, ``.register_artifact`` will always return a path. If the model is being restored from a .nemo file, |
|
then that path will be to the artifact in the .nemo file. Otherwise, ``.register_artifact`` will return the local path specified by the user. |
|
|
|
``config_path`` is the artifact key. It usually corresponds to a model configuration but does not have to. |
|
The model config that is packaged with the .nemo file will be updated according to the ``config_path`` key. |
|
In the above example, the model config will have |
|
|
|
.. code-block:: YAML |
|
|
|
encoder_tokenizer: |
|
... |
|
tokenizer_model: nemo:4978b28103264263a03439aaa6560e5e_tokenizer.model |
|
|
|
``src`` is the path to the artifact and the base-name of the path will be used when packaging the artifact in the .nemo file. |
|
Each artifact will have a hash prepended to the base-name of ``src`` in the .nemo file. This is to prevent collisions between
artifacts whose base-names are identical (say, when there are two or more tokenizers, both called ``tokenizer.model``).
|
The resulting .nemo file will then have the following file: |
|
|
|
.. code-block:: bash |
|
|
|
4978b28103264263a03439aaa6560e5e_tokenizer.model |
|
|
|
If ``verify_src_exists`` is set to ``False``, then the artifact is optional. This means that ``.register_artifact`` will return ``None`` |
|
if the ``src`` cannot be found. |
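
A hypothetical sketch of registering an optional artifact inside a model's ``__init__`` (the config key and fallback logic are illustrative, not taken from a specific NeMo model):

.. code-block:: python

    # inside a NeMo model's __init__
    vocab_path = self.register_artifact(
        config_path='tokenizer.vocab_file',
        src=cfg.tokenizer.get('vocab_file'),  # may be missing from the config
        verify_src_exists=False,              # the artifact is optional
    )

    if vocab_path is None:
        # the artifact was not provided; fall back to a default behaviour
        ...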
|
|
|
Push to Hugging Face Hub |
|
------------------------ |
|
|
|
NeMo models can be pushed to the `Hugging Face Hub <https://huggingface.co/>`_ with the :meth:`~nemo.core.classes.mixins.hf_io_mixin.HuggingFaceFileIO.push_to_hf_hub` method. This method performs the same actions as ``save_to()`` and then uploads the model to the Hugging Face Hub. It offers an additional ``pack_nemo_file`` argument that lets the user upload either a single packed ``.nemo`` file or its unpacked contents as multiple files. The latter is useful for large language models with a massive number of parameters, where a single ``.nemo`` file could exceed the maximum upload size of the Hugging Face Hub.
|
|
|
|
|
Upload a model to the Hub |
|
~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
.. code-block:: python |
|
|
|
    repo_id = "<username>/<model_name>"  # the Hugging Face Hub repository to create or update
    token = "<HF TOKEN>"  # or None to use the token saved by `huggingface-cli login`
    pack_nemo_file = True  # False uploads the unpacked contents of the NeMo file as multiple files; useful for LLMs
|
|
|
model.push_to_hf_hub( |
|
repo_id=repo_id, pack_nemo_file=pack_nemo_file, token=token, |
|
) |
|
|
|
Use a Custom Model Card Template for the Hub |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
.. code-block:: python |
|
|
|
# Override the default model card |
|
template = """ <Your own custom template> |
|
# {model_name} |
|
""" |
|
kwargs = {"model_name": "ABC", "repo_id": "nvidia/ABC_XYZ"} |
|
model_card = model.generate_model_card(template=template, template_kwargs=kwargs, type="hf") |
|
|
|
model.push_to_hf_hub( |
|
repo_id=repo_id, token=token, model_card=model_card |
|
) |
|
|
|
# Write your own model card class |
|
class MyModelCard: |
|
def __init__(self, model_name): |
|
self.model_name = model_name |
|
|
|
def __repr__(self): |
|
template = """This is the {model_name} model""".format(model_name=self.model_name) |
|
return template |
|
|
|
model.push_to_hf_hub( |
|
repo_id=repo_id, token=token, model_card=MyModelCard("ABC") |
|
) |
|
|
|
|
|
Nested NeMo Models |
|
------------------ |
|
|
|
In some cases, it may be helpful to use NeMo models inside other NeMo models. For example, we can incorporate a language model into an ASR model's decoding process to improve accuracy, or use a hybrid ASR-TTS model to generate audio from text on the fly to train or fine-tune the ASR model.
|
|
|
There are three ways to instantiate child models inside parent models: |
|
|
|
- use subconfig directly |
|
- use the ``.nemo`` checkpoint path to load the child model |
|
- use a pretrained NeMo model |
|
|
|
To register a child model, use the ``register_nemo_submodule`` method of the parent model. This method adds the child model to a specified model attribute. During serialization, it correctly handles child artifacts and stores the child model's configuration in the parent model's config under the field specified by ``config_field``.
|
|
|
.. code-block:: python |
|
|
|
    from typing import Optional

    from nemo.core.classes import ModelPT
|
|
|
class ChildModel(ModelPT): |
|
... # implement necessary methods |
|
|
|
class ParentModel(ModelPT): |
|
def __init__(self, cfg, trainer=None): |
|
super().__init__(cfg=cfg, trainer=trainer) |
|
|
|
# optionally annotate type for IDE autocompletion and type checking |
|
self.child_model: Optional[ChildModel] |
|
if cfg.get("child_model") is not None: |
|
# load directly from config |
|
# either if config provided initially, or automatically |
|
# after model restoration |
|
self.register_nemo_submodule( |
|
name="child_model", |
|
config_field="child_model", |
|
model=ChildModel(self.cfg.child_model, trainer=trainer), |
|
) |
|
elif cfg.get('child_model_path') is not None: |
|
# load from .nemo model checkpoint |
|
# while saving, config will be automatically assigned/updated |
|
# in cfg.child_model |
|
self.register_nemo_submodule( |
|
name="child_model", |
|
config_field="child_model", |
|
model=ChildModel.restore_from(self.cfg.child_model_path, trainer=trainer), |
|
) |
|
elif cfg.get('child_model_name') is not None: |
|
# load from pretrained model |
|
# while saving, config will be automatically assigned/updated |
|
# in cfg.child_model |
|
self.register_nemo_submodule( |
|
name="child_model", |
|
config_field="child_model", |
|
model=ChildModel.from_pretrained(self.cfg.child_model_name, trainer=trainer), |
|
) |
|
else: |
|
self.child_model = None |
|
|
|
|
|
|
|
Profiling |
|
--------- |
|
|
|
NeMo offers users two options for profiling: Nsys and CUDA memory profiling. These two options allow users |
|
to debug performance issues as well as memory issues such as memory leaks. |
|
|
|
To enable Nsys profiling, add the following options to the model config: |
|
|
|
.. code-block:: yaml |
|
|
|
    nsys_profile:
      enabled: False # disabled by default; set to True to turn on Nsys profiling
      start_step: 10 # Global batch to start profiling
      end_step: 10 # Global batch to end profiling
      ranks: [0] # Global rank IDs to profile
      gen_shape: False # Generate model and kernel details including input shapes
|
|
|
Finally, run the model training script with: |
|
|
|
.. code-block:: bash |
|
|
|
nsys profile -s none -o <profile filepath> -t cuda,nvtx --force-overwrite true --capture-range=cudaProfilerApi --capture-range-end=stop python ./examples/... |
|
|
|
See more options in the `Nsight Systems user guide <https://docs.nvidia.com/nsight-systems/UserGuide/index.html#cli-profiling>`_.
|
|
|
|
|
|
|
To enable CUDA memory profiling, add the following options to the model config: |
|
|
|
.. code-block:: yaml |
|
|
|
memory_profile: |
|
enabled: True |
|
start_step: 10 # Global batch to start profiling |
|
end_step: 10 # Global batch to end profiling |
|
rank: 0 # Global rank ID to profile |
|
output_path: None # Path to store the profile output file |
|
|
|
Then invoke your NeMo script without any changes in the invocation command. |
|
|