Transfer Learning of Multimodal Models
In the preceding sections, we delved into the fundamental concepts of multimodal models such as CLIP and its counterparts. In this chapter, we will look at how you can apply different types of multimodal models to your own tasks.
There are several approaches for adapting multimodal models to your use case:
Zero-/few-shot learning. Zero- and few-shot learning leverage large pretrained models that can solve problems not present in their training data. These approaches are useful when there is little labeled data for a task (e.g., 5-10 examples) or none at all; a minimal zero-shot sketch follows this list. Unit 11 delves deeper into this topic.
Training a model from scratch. This becomes necessary when pretrained weights are unavailable or when the pretraining data differs substantially from your own. Here, we initialize the model weights randomly (or with more sophisticated schemes such as He initialization) and proceed with the usual training. However, this approach demands substantial amounts of training data.
Transfer learning. Transfer learning, unlike training from scratch, uses the weights of the pretrained model as initial weights.
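To make the zero-/few-shot approach above concrete, here is a minimal sketch of zero-shot image classification with a pretrained CLIP checkpoint. The placeholder image and candidate labels are illustrative assumptions rather than part of the notebooks in this chapter.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a pretrained CLIP checkpoint; no task-specific training is performed.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))  # stand-in for your own image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]  # hypothetical classes

# CLIP scores the image against each text label; the softmax over the
# image-text similarities acts as a zero-shot classifier.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```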
This chapter primarily focuses on the transfer learning aspect within multimodal models. It will recap what transfer learning entails, elucidate its advantages, and provide practical examples illustrating how you can apply transfer learning to your tasks!
Transfer learning
More formally, transfer learning is a set of machine learning techniques in which knowledge, representations, or patterns obtained from solving one problem are reused to solve a different but related problem.
In the context of deep learning, transfer learning means that when training a model for a particular task, we use the weights of another model as the initial weights. The pretrained model has typically been trained on a huge amount of data and has learned useful knowledge about the nature of, and relationships within, that data. This knowledge is embedded in the weights of the model, so by using them as initial weights we transfer the knowledge of the pretrained model into the model we are training.
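As a simplified illustration of this idea, the sketch below reuses the pretrained CLIP vision encoder as a backbone and attaches a new, randomly initialized classification head for a downstream task. The number of classes and the dummy input are assumptions made for the example.

```python
import torch
import torch.nn as nn
from transformers import CLIPVisionModel

# The pretrained encoder's weights become the initial weights of our new
# model: this is where the knowledge transfer happens.
backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")

# A new, randomly initialized head for the target task
# (10 classes is a hypothetical choice for this sketch).
head = nn.Linear(backbone.config.hidden_size, 10)

# Forward pass: pooled image features from the pretrained encoder feed the new head.
pixel_values = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image batch
features = backbone(pixel_values=pixel_values).pooler_output
logits = head(features)
print(logits.shape)  # torch.Size([1, 10])
```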
This approach has several advantages:
Resource Efficiency: Since the pretrained model was trained on a huge amount of data over a long period, transfer learning requires far less compute for the model to converge.
Reducing the need for labeled data: For the same reason, less labeled data is required to achieve good quality on the test set.
Knowledge Transfer: When fine-tuning on the new task, the model capitalizes on the knowledge already encoded in the pretrained model's weights. This integration of prior knowledge often leads to better performance on the new task.
However, despite its advantages, transfer learning has some challenges that should also be taken into account:
Domain Shift: Adapting knowledge from the source domain to the target domain can be challenging if the data distributions differ substantially.
Catastrophic forgetting: During the fine-tuning process, the model adjusts its parameters to the new task, often losing the previously learned knowledge or representations related to the initial task. A common mitigation, freezing the pretrained weights and training only the newly added parameters, is sketched below.
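The minimal sketch below shows this freezing strategy; the checkpoint, head size, and learning rate are illustrative assumptions, not a prescribed recipe.

```python
import torch
import torch.nn as nn
from transformers import CLIPVisionModel

backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
head = nn.Linear(backbone.config.hidden_size, 10)  # hypothetical 10-class head

# Freeze the pretrained backbone so its learned representations cannot be
# overwritten during fine-tuning; only the new head receives gradient updates.
for param in backbone.parameters():
    param.requires_grad = False

# Optimize only the parameters that still require gradients (the new head).
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
```

In practice, you might instead unfreeze the backbone and give it a learning rate one or two orders of magnitude smaller than the head's, which is a softer compromise between adapting to the new task and preserving the pretrained knowledge.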
Transfer Learning Applications
We'll explore practical applications of transfer learning across various tasks. Navigate to the Jupyter notebook for your task of interest using the table below.
| Task | Description | Model | Notebook |
| --- | --- | --- | --- |
| Fine-tune CLIP | Fine-tuning CLIP on a custom dataset (see the sketch after this table) | openai/clip-vit-base-patch32 | CLIP notebook |
| VQA | Answering a question in natural language based on an image | dandelin/vilt-b32-finetuned-vqa | VQA notebook |
| Image-to-Text | Describing an image in natural language | Salesforce/blip-image-captioning-large | Image-to-Text notebook |
| Open-set object detection | Detecting objects from a natural language prompt | Grounding DINO | Grounding DINO notebook |
| Assistant (GPT-4V-like) | Instruction tuning in the multimodal domain | LLaVA | LLaVA notebook |
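As a taste of what the CLIP notebook covers, here is a minimal sketch of a single contrastive fine-tuning step on an image-caption pair. The toy batch, placeholder image, caption, and learning rate are assumptions for this sketch; in practice you would iterate over batches from your own dataset.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

# Toy batch of image-caption pairs standing in for your custom dataset.
images = [Image.new("RGB", (224, 224))]
captions = ["a photo of something from my custom dataset"]

# One training step: CLIP's built-in contrastive loss pulls matching
# image-text pairs together and pushes mismatched pairs apart.
inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
loss = model(**inputs, return_loss=True).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```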