Reusing Knowledge: Transfer Learning of Deep Neural Networks

Intuition

How hard would it be for you to learn to identify a puma, a leopard, or a cheetah based on your prior understanding of domestic cats? Not too hard, right?

How hard would it be for a child to learn to identify the same animals using only knowledge of bats, strawberries, and cars?

Human cognitive abilities rely heavily on previously acquired knowledge to extrapolate to new knowledge. Let’s consider a scenario where you are learning gardening.

When you first saw a shovel, your brain immediately connected it with a spoon, because it was evident that both had a handle and a small scoop at the far end. Subconsciously, you realized it was just a bigger spoon.

[Photo: created by Freepik, www.freepik.com]

So you immediately realized that it would be used much like a spoon is used, but there was something peculiar: it was big and wouldn’t fit in your mouth.

So your brain went a step further and realized that soil would be scooped with it, much like food is scooped from a plate with a spoon.

Transfer Learning

To understand how transfer learning has helped us, let’s spend some time on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This is an annual competition for building deep neural networks that identify images across object categories such as balloons and strawberries. The full ImageNet dataset spans more than 20,000 categories; the challenge itself uses a 1,000-category subset containing about 1.28 million training images.

[Figure: error rate history of the ImageNet Challenge [2]]

Deep neural networks are able to extract features (read: patterns) themselves, without supervision or hand-crafted feature engineering from human scientists. This is the power that a large number of layers provides over traditional machine learning techniques.

This characteristic of deep nets (which now have more than 100 layers) has made them a black-box technique. However, the pattern learning is hierarchical in nature: small features such as edges and textures are learnt first and are built up into bigger features in successive layers.

We can harness the model’s ability to learn on its own to make it learn a little more on a different type of data.

Models that do well in the ImageNet challenge carry a host of general visual knowledge useful for identifying a diverse range of objects. They know how to use a spoon; can they be taught to use a shovel?

The answer to the above question, both rhetorically and literally, is yes. Transfer learning can be used to teach a model proficient on the ImageNet dataset to identify bees and ants, as the data-loading sketch below illustrates.
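For concreteness, here is a minimal PyTorch sketch of loading such a two-class dataset. The hymenoptera_data path and its ants/bees folder layout are assumptions (borrowed from the well-known ants-and-bees example), and the normalization constants are the standard ImageNet statistics:

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet preprocessing, so inputs match what a
# pre-trained model expects.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed directory layout: one sub-folder per class (ants/, bees/).
train_data = datasets.ImageFolder("hymenoptera_data/train", transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
```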

Why bother at all?

The depth of these networks makes it hard (read: nearly impossible) for them to converge without massive amounts of data, which is why it helps to take a pretrained net and optimize it for our purpose.

In other cases, someone else’s architecture can be exactly what our problem needs, so why reinvent the wheel when we can reuse their model to solve it?

Types

Fine-Tuning a Pre-Trained Model

An already trained model is trained a little more on the new classification task.
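A minimal PyTorch sketch of this idea, assuming torchvision’s ResNet-18 as the pre-trained model and a two-class target task (the weights enum requires torchvision 0.13 or newer):

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a ResNet-18 pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Replace the 1000-class ImageNet head with one for the new task
# (2 classes, e.g. ants vs. bees).
model.fc = nn.Linear(model.fc.in_features, 2)

# Fine-tuning: every layer stays trainable.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```

Fine-tuning usually uses a smaller learning rate than training from scratch, so the useful ImageNet features are adjusted rather than destroyed.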

Training an Untrained Model

Only the model’s architecture, and not the weights of its connections, is reused.
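In code, this amounts to instantiating the same architecture with randomly initialised weights; again a sketch assuming torchvision’s ResNet-18:

```python
from torchvision import models

# weights=None reuses only the ResNet-18 design; all connection
# weights are randomly initialised and learned from scratch on the new data.
model = models.resnet18(weights=None, num_classes=2)
```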

Feature Extraction

The final layer of an already trained model is removed and a new layer is inserted. The data for the new problem is passed through the network, and only the new final layer is trained on the features the network produces.
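A hedged PyTorch sketch, again assuming ResNet-18: the backbone is frozen so it acts as a fixed feature extractor, and only the replacement head is optimised.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained backbone: it now acts purely as a fixed
# feature extractor.
for param in model.parameters():
    param.requires_grad = False

# The replacement head is the only part with trainable weights.
model.fc = nn.Linear(model.fc.in_features, 2)
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
```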

Notable Work

Scientists at Stanford fine-tuned a GoogLeNet pre-trained on the ILSVRC 2014 data using images of skin cancer. The transfer-learned model’s performance was competitive with that of human dermatologists.[3]

Scientists at LMU Munich used Google’s NASNet to perform the same task on different data. They also tried VGG, Inception v4, and ResNet-152 models.[4]

References

1. http://www.keywordhouse.com/c3Bvb25zIGFyZSBzaG92ZWxzIHRoYXQ/

2. https://en.wikipedia.org/wiki/File:ImageNet_error_rate_history_(just_systems).svg

3. https://cs.stanford.edu/people/esteva/nature/

4. https://s3.amazonaws.com/covalic-prod-assetstore/af/be/afbe2431f1b14e878f41157c3b320bb8



Autoencoders are a great fit for the feature extraction part. If we are talking about images, the last layer of a CNN model can be used as a feature vector for each image in the dataset. Similarly, you can use unsupervised feature learning through autoencoders to extract a latent-space representation of the image features, which can then be used in image classification or image retrieval tasks.
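To make that concrete, here is a minimal PyTorch sketch of an autoencoder whose encoder output serves as the feature vector; the 28x28 input size and 32-dimensional latent space are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder compresses a flattened image down to the latent space.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),   # the latent space
        )
        # Decoder reconstructs the image from the latent vector.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Train on a reconstruction loss (e.g. MSE between input and output);
# afterwards, model.encoder(images) yields feature vectors for
# classification or image retrieval.
```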