Transfer Learning

Before we dig into the details of what transfer learning and fine-tuning mean for neural networks, let’s take musical instruments as an example. The theremin is an electronic musical instrument that makes an eerie sound, commonly associated with thrillers and horror movies. It is very hard to play because it requires you to move both hands in the air between two antennae to control the pitch and volume. So hard, in fact, that someone invented an instrument called the tannerin (also known as the slide theremin or Electro-Theremin) that makes a similar sound but is easier to play. The player moves the slide on the side of the box to produce the desired pitch. There is still a learning curve to play it… well, except if you play the trombone. If you play the trombone, you already know how to use the tannerin slide because it works just like the telescoping slide on the trombone. Below, you see from left to right: the theremin, the tannerin, and the trombone.

Theremin, Tannerin, and the Trombone

In this case, the trombone player has effectively used what he learned from playing the trombone to play the tannerin. He transfers what he learned from one instrument to another. We can use this concept in neural networks as well. What a neural network learns while classifying dogs or cats can be used to recognize other animals. This works because of the way the network learns features: the features learned to classify a dog can also help classify a horse. We exploit what the model already knows to solve different tasks.

Transfer learning requires that the previous knowledge is “useful” for the new task. Thus, the features we are trying to exploit need to be general enough for the new application. If we go back to our musical instrument example, playing the saxophone instead of the trombone is not as helpful for learning how to play the tannerin. The main skill that gives the trombone player his head start is an intuitive understanding of where the slide should be.

Yet, the saxophone player is not starting from zero. He is familiar with things like music theory, rhythm, and timing. These general skills give him an edge over someone who has never played any instrument at all. Playing an instrument gives all players a general set of skills that are useful across instruments. This generalization across domains (in our example, musical instruments) is what makes the model learn much faster compared to training from scratch.

Transfer Learning and Fine-tuning

Let’s make a distinction between the concepts we are talking about. The trombone player needs no training to play the tannerin; without realizing it, he already knows how to do it. The saxophone player needs some training to fine-tune his skills for the tannerin. In deep learning terms, the trombone player uses a model off the shelf: this is called transfer learning. Further training a model that needs more time to learn, like our saxophone player, is called fine-tuning.
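
To make the off-the-shelf case concrete, here is a minimal sketch using the Hugging Face transformers pipeline with the publicly available google/vit-base-patch16-224 checkpoint; the image path below is just a placeholder.

```python
from transformers import pipeline

# A pretrained classifier used off the shelf: no training at all,
# we simply reuse what the model already knows.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# The image path is a placeholder; a URL or a PIL image also works.
predictions = classifier("path/to/dog.jpg")
print(predictions)  # a list of {"label": ..., "score": ...} dictionaries
```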

When fine-tuning a model, we do not need to train all of its parts; we can train just the underperforming ones. Let’s take the example of a computer vision model that has three parts: feature extraction, feature enhancement, and a final task. In this case, we can reuse the feature extraction and feature enhancement parts without any retraining and focus on retraining only the final task.
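
As an illustration, a minimal sketch of this idea in PyTorch could look like the following, assuming a torchvision ResNet-50 backbone and a hypothetical 10-class target task; the actual training loop is omitted.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet: its feature extraction and
# feature enhancement layers already know generic visual features.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze everything so the pretrained features stay untouched.
for param in model.parameters():
    param.requires_grad = False

# Replace the final task (the classification head) with a fresh layer
# for the new problem. Only this layer will be trained.
num_classes = 10  # assumption: adjust to your dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)

# The optimizer only sees the parameters of the new head.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```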

If the results after fine-tuning the final task are not satisfactory, we still do not need to retrain the entire feature extraction part. A good compromise is to retrain only the weights of the top layers. In convolutional networks, the higher up a layer is, the more specific its features are to the task and dataset. In other words, the features in the first convolutional layers are more generic, while those in the last layers are more specific. With our player example, this is the equivalent of not wasting time explaining music theory to a seasoned saxophonist and instead just teaching him how to change the pitch on the tannerin.
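
Continuing the sketch above, unfreezing only the top block of the feature extractor could look like this; in a ResNet-50, layer4 holds the last and most task-specific convolutional blocks, and the lower learning rate for it is an assumption you would tune.

```python
# If training only the head is not enough, unfreeze the top of the
# feature extractor as well.
for param in model.layer4.parameters():
    param.requires_grad = True

# Use a smaller learning rate for the pretrained layers so we refine
# them gently instead of overwriting what they already know.
optimizer = torch.optim.Adam(
    [
        {"params": model.layer4.parameters(), "lr": 1e-5},
        {"params": model.fc.parameters(), "lr": 1e-3},
    ]
)
```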

Considerations on Transfer Learning

Our example also gives us an interesting nuance. The theremin was too hard to play, so an easier instrument was invented that produces nearly the same sound with far less training time. In computer vision, instead of trying to build a breed classifier right away, we might first run object detection to find where the dog is within an image and then build a classifier that tells us the breed from the cropped region.
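
A sketch of this two-stage idea with transformers pipelines is shown below; the detector checkpoint facebook/detr-resnet-50 is a real public model, while the breed-classifier name and the image path are placeholders for whatever you train or choose.

```python
from transformers import pipeline
from PIL import Image

# Stage 1: a general-purpose detector tells us where the dog is.
detector = pipeline("object-detection", model="facebook/detr-resnet-50")

# Stage 2: a (hypothetical) breed classifier only has to deal with the crop.
breed_classifier = pipeline("image-classification", model="your-org/dog-breed-classifier")

image = Image.open("path/to/photo.jpg")  # placeholder path
for detection in detector(image):
    if detection["label"] == "dog":
        box = detection["box"]
        crop = image.crop((box["xmin"], box["ymin"], box["xmax"], box["ymax"]))
        print(breed_classifier(crop))
```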

Finally, transfer learning is not a universal performance enhancer. In our example, playing one instrument might help us learn another, but it might also hinder progress. There are patterns and vices from one instrument that can slow down progress on another. If these vices are deeply entrenched, a complete novice might surpass the experienced player given the same amount of training. If your players are stuck in their vices, it might be time to hire new ones.

Transfer Learning and Self-training

Transfer learning shines especially when there is not enough labeled data to retrain a model from scratch. Using our example, given enough time, a player who attends just a few lessons can keep learning on their own by playing the instrument without the constant supervision of a professor. In deep learning, learning partially or entirely on your own is called self-training. It allows us to train the model using both labeled data (the lessons) and unlabeled data (the player practicing on their own) to learn the task.

Although we will not discuss self-training in this section, we mention it here as a resource because, when transfer learning does not work and labeled data is scarce, self-training can be incredibly helpful. These concepts are also not mutually exclusive: a seasoned player might need just a couple of lessons before being able to keep training on a new instrument without supervision, and, as it turns out, the same is true of our deep learning models.
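
For illustration only, a minimal sketch of pseudo-labeling, one common form of self-training, could look like this; model, optimizer, criterion, and unlabeled_loader are assumed to exist already (for example from a fine-tuning setup like the one sketched earlier), and the confidence threshold is an arbitrary choice.

```python
import torch

# The model labels unlabeled images itself, keeps only its confident
# predictions, and then trains on them as if they were ground truth.
confidence_threshold = 0.9  # assumption: tune for your task

model.eval()
pseudo_batches = []
with torch.no_grad():
    for images in unlabeled_loader:  # assumed to yield batches of image tensors
        probs = torch.softmax(model(images), dim=1)
        conf, labels = probs.max(dim=1)
        keep = conf > confidence_threshold
        if keep.any():
            pseudo_batches.append((images[keep], labels[keep]))

# Train on the pseudo-labeled batches just like on labeled data.
model.train()
for images, labels in pseudo_batches:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```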

Resources
