Sunil Surendra Singh committed on
Commit
67455cf
·
1 Parent(s): 3c18d7a

added deployed app link

Files changed (1)
  1. README.md +18 -18
README.md CHANGED
@@ -10,13 +10,13 @@ pinned: false
10
  license: mit
11
  ---
12
 
13
- <a href="https://tv-script-generation-rnn-sssingh.streamlit.app/" target="_blank"><img src="https://img.shields.io/badge/click_here_to_open_demo_app-orange?style=for-the-badge&logo=dependabot"/></a>
14
 
15
 
16
  # Landmarks Classification and Tagging using CNN
17
  In this project we solve a `multi-label-classification` problem by classifying/tagging a given image of a famous landmark using CNN (Convolutional Neural Network).
18
 
19
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/title_image_sydney_opera_house.jpg?raw=true" width="800" height="300" />
20
 
21
  ## Features
22
  ⚡Multi Label Image Classification
@@ -38,7 +38,7 @@ In this project we solve a `multi-label-classification` problem by classifying/t
38
 
39
  ## Introduction
40
 
41
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/app-screenshot.png?raw=true">
42
 
43
  Photo sharing and photo storage services like to have location data for each uploaded photo. In addition, these services can build advanced features with the location data, such as the automatic suggestion of relevant tags or automatic photo organization, which help provide a compelling user experience. However, although a photo's location can often be obtained by looking at the photo's metadata, many images uploaded to these services will not have location metadata available. This can happen when, for example, the camera capturing the picture does not have GPS or if a photo's metadata is scrubbed due to privacy concerns.
44
 
@@ -54,7 +54,7 @@ To build NN based model that'd accept any user-supplied image as input and sugge
54
  - Here, we aim to attain a test accuracy of at least 60%, which is pretty good given the complex nature of this task.
55
  4. Implement an inference function that will accept a file path to an image and an integer k and then predict the top k most likely landmarks this image belongs to. The print below displays the expected sample output from the predict function, indicating the top 3 (k = 3) possibilities for the image in question.
56
 
57
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/sample_output.png?raw=true">
58
 
59
  ## Dataset
60
  - The dataset can be downloaded from [here](https://udacity-dlnfd.s3-us-west-1.amazonaws.com/datasets/landmark_images.zip). Note that this is a mini dataset containing around 6,000 images; it is a small subset of the [Original Landmark Dataset](https://github.com/cvdfoundation/google-landmark), which has over 700,000 images.
@@ -65,18 +65,18 @@ To build NN based model that'd accept any user-supplied image as input and sugge
65
  - Images in the dataset are of different sizes and resolution
66
  - Here are a few samples from the training dataset with their respective label descriptions...
67
 
68
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/landmark_samples.png?raw=true">
69
 
70
  ## Evaluation Criteria
71
 
72
  ### Loss Function
73
  We will use `LogSoftmax` in the output layer of the network...
74
 
75
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/LogSoftmax.png?raw=true">
76
 
77
  We need a suitable loss function that consumes these `log-probabilities` outputs and produces a total loss. The function that we are looking for is `NLLLoss` (Negative Log-Likelihood Loss). In practice, `NLLLoss` is a generalization of `BCELoss` (Binary Cross-Entropy Loss, or Log Loss) extended from binary to multi-class problems.
78
 
79
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/NLLLoss.png?raw=true">
80
 
81
  <br>Note the `negative` sign in front of the `NLLLoss` formula, hence the `negative` in the name. The negative sign makes the average loss positive. Without it, since the `log` of a number less than 1 is negative, the overall average loss would be negative, and improving the model would mean `maximizing` that quantity. With the sign flipped, training becomes a matter of `minimizing` a positive loss, which is the convention gradient-based optimizers follow.
82
 
@@ -85,7 +85,7 @@ We need a suitable loss function that consumes these `log-probabilities` outputs
85
 
86
  `accuracy` is used as the model's performance metric on the test-set
87
 
88
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/accuracy.png?raw=true">
89
 
90
 
91
  ## Solution Approach
@@ -94,11 +94,11 @@ We need a suitable loss function that consumes these `log-probabilities` outputs
94
  `mean` and `standard deviation` are computed for the train dataset, and then the dataset is `normalized` using the calculated statistics.
95
  - The RGB channel histogram of the train set is shown below...
96
 
97
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/train_hist1.png?raw=true">
98
 
99
  - The RGB channel histogram of the train set after normalization is shown below...
100
 
101
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/train_hist2.png?raw=true">
102
 
103
  - Now, `test` and `val` Dataset objects are prepared in the same fashion where images are resized to 128x128 and then normalized.
104
  - The training, validation, and testing datasets are then wrapped in PyTorch `DataLoader` objects so that we can iterate through them with ease. A typical `batch_size` of 32 is used.
@@ -113,22 +113,22 @@ We need a suitable loss function that consumes these `log-probabilities` outputs
113
  - `ReLU` is used as an activation function, and `BatchNorm` is used after every layer except the last.
114
  - Final model architecture (from scratch) is shown below...
115
 
116
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/scratch_network.png?raw=true">
117
 
118
  - Network initial weights are initialized by numbers drawn from a `normal-distribution` in the range...
119
 
120
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/sqrt_n_inputs.png?raw=true">
121
 
122
  - Network is then trained and validated for 15 epochs using the `NLLLoss` function and `Adam` optimizer with a learning rate of 0.001. We save the trained model here as `ignore.pt` (ignore because we are not using it for evaluation)
123
  - We keep track of training and validation losses. When plotted, we observe that the model starts to `overfit` very quickly.
124
 
125
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/loss1.png?raw=true">
126
 
127
  - Now, we reset the network's initial weights to the PyTorch default weights to check if there are any improvements
128
  - Network is then again trained and validated for 15 epochs using the `NLLLoss` function and `Adam` optimizer with a learning rate of 0.001. We save the trained model here as `model_scratch.pt` (we will use this saved model for evaluation)
129
  - We keep track of training and validation losses. When plotted, we observe that the result is almost the same as that of custom weight initialization
130
 
131
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/loss2.png?raw=true">
132
 
133
  - The trained network (`model_scratch.pt`) is then loaded and evaluated on unseen 1,250 testing images.
134
  The network achieves around `38%` accuracy, which is more than we aimed for (i.e., 30%). In other words, the network correctly classifies `475` of the `1250` test images.
@@ -142,12 +142,12 @@ The network can achieve around `38%` accuracy, which is more than we aimed for (
142
  The original classifier layer in VGG19 is replaced by a `custom-classifier` with learnable weights.
143
  - Final model architecture (transfer learning) is shown below...
144
 
145
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/transfer_network.png?raw=true">
146
 
147
  - Network is then trained and validated for ten epochs using the `NLLLoss` function and `Adam` optimizer with a learning rate of 0.001. Note that the optimizer has been supplied with the learnable parameters of `custom-classifier` only and not the whole model. This is because we want to optimize our custom-classifier weights only and use ImageNet learned weights for the rest of the layers.
148
  - We keep track of training and validation losses and plot them.
149
 
150
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/loss3.png?raw=true">
151
 
152
  - The trained network is saved as `model_transfer.pt`
153
 
@@ -180,7 +180,7 @@ As we can see, the model built using transfer learning has outperformed the mode
180
  >>> suggest_locations('assets/Eiffel-tower_night.jpg')
181
  ```
182
 
183
- <img src="https://github.com/sssingh/landmark-classification-tagging/blob/master/assets/eiffel_tower_prediction.png?raw=true">
184
 
185
 
186
  ## How To Use
@@ -188,7 +188,7 @@ As we can see, the model built using transfer learning has outperformed the mode
188
  ### Open the LIVE app
189
 
190
  App has been deployed on `Hugging Face Spaces`. <br>
191
- <a href="https://gradio.app/" target="_blank"><img src="https://img.shields.io/badge/click_here_to_open_demo_app-orange?style=for-the-badge&logo=dependabot"/></a>
192
 
193
  ### Training and Testing using jupyter notebook
194
  1. Ensure the below-listed packages are installed
 
10
  license: mit
11
  ---
12
 
13
+ <a href="https://huggingface.co/spaces/sssingh/famous-landmarks-classifier-cnn" target="_blank"><img src="https://img.shields.io/badge/click_here_to_open_demo_app-orange?style=for-the-badge&logo=dependabot"/></a>
14
 
15
 
16
  # Landmarks Classification and Tagging using CNN
17
  In this project we solve a `multi-label-classification` problem by classifying/tagging a given image of a famous landmark using CNN (Convolutional Neural Network).
18
 
19
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/title_image_sydney_opera_house.jpg?raw=true" width="800" height="300" />
20
 
21
  ## Features
22
  ⚡Multi Label Image Classification
 
38
 
39
  ## Introduction
40
 
41
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/app-screenshot.png?raw=true">
42
 
43
  Photo sharing and photo storage services like to have location data for each uploaded photo. In addition, these services can build advanced features with the location data, such as the automatic suggestion of relevant tags or automatic photo organization, which help provide a compelling user experience. However, although a photo's location can often be obtained by looking at the photo's metadata, many images uploaded to these services will not have location metadata available. This can happen when, for example, the camera capturing the picture does not have GPS or if a photo's metadata is scrubbed due to privacy concerns.
44
 
 
54
  - Here, we aim to attain a test accuracy of at least 60%, which is pretty good given the complex nature of this task.
55
  4. Implement an inference function that will accept a file path to an image and an integer k and then predict the top k most likely landmarks this image belongs to. The print below displays the expected sample output from the predict function, indicating the top 3 (k = 3) possibilities for the image in question.
56
 
57
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/sample_output.png?raw=true">
58
 
59
  ## Dataset
60
  - The dataset can be downloaded from [here](https://udacity-dlnfd.s3-us-west-1.amazonaws.com/datasets/landmark_images.zip). Note that this is a mini dataset containing around 6,000 images; it is a small subset of the [Original Landmark Dataset](https://github.com/cvdfoundation/google-landmark), which has over 700,000 images.
 
65
  - Images in the dataset are of different sizes and resolution
66
  - Here are a few samples from the training dataset with their respective label descriptions...
67
 
68
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/landmark_samples.png?raw=true">
69
 
70
  ## Evaluation Criteria
71
 
72
  ### Loss Function
73
  We will use `LogSoftmax` in the output layer of the network...
74
 
75
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/LogSoftmax.png?raw=true">
76
 
77
  We need a suitable loss function that consumes these `log-probabilities` outputs and produces a total loss. The function that we are looking for is `NLLLoss` (Negative Log-Likelihood Loss). In practice, `NLLLoss` is a generalization of `BCELoss` (Binary Cross-Entropy Loss, or Log Loss) extended from binary to multi-class problems.
78
 
79
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/NLLLoss.png?raw=true">
80
 
81
  <br>Note the `negative` sign in front of the `NLLLoss` formula, hence the `negative` in the name. The negative sign makes the average loss positive. Without it, since the `log` of a number less than 1 is negative, the overall average loss would be negative, and improving the model would mean `maximizing` that quantity. With the sign flipped, training becomes a matter of `minimizing` a positive loss, which is the convention gradient-based optimizers follow.
82
 
 
85
 
86
  `accuracy` is used as the model's performance metric on the test-set
87
 
88
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/accuracy.png?raw=true">
89
 
90
 
91
  ## Solution Approach
 
94
  `mean` and `standard deviation` are computed for the train dataset, and then the dataset is `normalized` using the calculated statistics.
95
  - The RGB channel histogram of the train set is shown below...
96
 
97
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/train_hist1.png?raw=true">
98
 
99
  - The RGB channel histogram of the train set after normalization is shown below...
100
 
101
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/train_hist2.png?raw=true">
102
 
103
  - Now, `test` and `val` Dataset objects are prepared in the same fashion where images are resized to 128x128 and then normalized.
104
  - The training, validation, and testing datasets are then wrapped in PyTorch `DataLoader` objects so that we can iterate through them with ease. A typical `batch_size` of 32 is used.
 
113
  - `ReLU` is used as an activation function, and `BatchNorm` is used after every layer except the last.
114
  - Final model architecture (from scratch) is shown below...
115
 
116
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/scratch_network.png?raw=true">
117
 
118
  - Network initial weights are initialized by numbers drawn from a `normal-distribution` in the range...
119
 
120
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/sqrt_n_inputs.png?raw=true">
121
 
122
  - Network is then trained and validated for 15 epochs using the `NLLLoss` function and `Adam` optimizer with a learning rate of 0.001. We save the trained model here as `ignore.pt` (ignore because we are not using it for evaluation)
123
  - We keep track of training and validation losses. When plotted, we observe that the model starts to `overfit` very quickly.
124
 
125
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/loss1.png?raw=true">
126
 
127
  - Now, we reset the network's initial weights to the PyTorch default weights to check if there are any improvements
128
  - Network is then again trained and validated for 15 epochs using the `NLLLoss` function and `Adam` optimizer with a learning rate of 0.001. We save the trained model here as `model_scratch.pt` (we will use this saved model for evaluation)
129
  - We keep track of training and validation losses. When plotted, we observe that the result is almost the same as that of custom weight initialization
130
 
131
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/loss2.png?raw=true">
132
 
133
  - The trained network (`model_scratch.pt`) is then loaded and evaluated on unseen 1,250 testing images.
134
  The network achieves around `38%` accuracy, which is more than we aimed for (i.e., 30%). In other words, the network correctly classifies `475` of the `1250` test images.
 
142
  The original classifier layer in VGG19 is replaced by a `custom-classifier` with learnable weights.
143
  - Final model architecture (transfer learning) is shown below...
144
 
145
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/transfer_network.png?raw=true">
146
 
147
  - Network is then trained and validated for ten epochs using the `NLLLoss` function and `Adam` optimizer with a learning rate of 0.001. Note that the optimizer has been supplied with the learnable parameters of `custom-classifier` only and not the whole model. This is because we want to optimize our custom-classifier weights only and use ImageNet learned weights for the rest of the layers.
148
  - We keep track of training and validation losses and plot them.
149
 
150
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/loss3.png?raw=true">
151
 
152
  - The trained network is saved as `model_transfer.pt`
153
 
 
180
  >>> suggest_locations('assets/Eiffel-tower_night.jpg')
181
  ```
182
 
183
+ <img src="https://github.com/sssingh/landmark-classification-tagging/blob/main/assets/eiffel_tower_prediction.png?raw=true">
184
 
185
 
186
  ## How To Use
 
188
  ### Open the LIVE app
189
 
190
  App has been deployed on `Hugging Face Spaces`. <br>
191
+ <a href="https://huggingface.co/spaces/sssingh/famous-landmarks-classifier-cnn" target="_blank"><img src="https://img.shields.io/badge/click_here_to_open_demo_app-orange?style=for-the-badge&logo=dependabot"/></a>
192
 
193
  ### Training and Testing using jupyter notebook
194
  1. Ensure the below-listed packages are installed