{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "title: 07 Neural Network from Scratch\n",
    "description: An implementation of a simple neural network from scratch\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/drive/1YBcEZMUHhJUiwOIwQbqwmAAGrRznpP_E?usp=sharing\" target=\"_blank\"><img align=\"left\" alt=\"Colab\" title=\"Open in Colab\" src=\"https://colab.research.google.com/assets/colab-badge.svg\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Ee4B4v5tAp1C"
   },
   "source": [
    "# A Simple Neural Network from Scratch with PyTorch and Google Colab\n",
    "\n",
    "In this tutorial we implement a simple neural network from scratch using PyTorch.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "w4cEhtf_Ap1E"
   },
   "source": [
    "## About\n",
    "\n",
    "In this tutorial we will implement a simple neural network from scratch using PyTorch. The idea of the tutorial is to teach you the basics of PyTorch and how it can be used to implement a neural network from scratch. I will go over some of the basic functionalities and concepts available in PyTorch that will allow you to build your own neural networks. \n",
    "\n",
    "This tutorial assumes you have prior knowledge of how a neural network works. Don’t worry! Even if you are not so sure, you will be okay. For advanced PyTorch users, this tutorial may still serve as a refresher. This tutorial is heavily inspired by this [Neural Network implementation](https://repl.it/talk/announcements/Build-a-Neural-Network-in-Python/5457) coded purely using Numpy. In fact, I tried re-implementing the code using PyTorch instead and added my own intuitions and explanations. Thanks to [Samay](https://repl.it/@shamdasani) for his phenomenal work; I hope it inspires many others as it did me."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MP9ewMSlC7JU"
   },
   "source": [
    "\n",
    "The `torch` module provides all the necessary **tensor** operators you will need to implement your first neural network from scratch in PyTorch. That's right! In PyTorch everything is a Tensor, so this is the first thing you will need to get used to. Let's import the libraries we will need for this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "bKmXKSQnAp1G"
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1EWBBl1nAp1M"
   },
   "source": [
    "## Data\n",
    "Let's start by creating some sample data using the `torch.tensor` command. In Numpy, this could be done with `np.array`. Both functions serve the same purpose, but in PyTorch everything is a Tensor as opposed to a vector or matrix. We define types in PyTorch using the `dtype=torch.xxx` command. \n",
    "\n",
    "In the data below, `X` represents the number of hours studied and the hours slept, whereas `y` represents grades. The variable `xPredicted` is a single input for which we want to predict a grade using the parameters learned by the neural network. Remember, the neural network wants to learn a mapping between `X` and `y`, so it will try to take a guess from what it has learned from the training data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fsAVbHnjAp1P"
   },
   "outputs": [],
   "source": [
    "X = torch.tensor(([2, 9], [1, 5], [3, 6]), dtype=torch.float) # 3 X 2 tensor\n",
    "y = torch.tensor(([92], [100], [89]), dtype=torch.float) # 3 X 1 tensor\n",
    "xPredicted = torch.tensor(([4, 8]), dtype=torch.float) # 1-D tensor of size 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RC0ru9kCAp1U"
   },
   "source": [
    "You can check the size of the tensors we have just created with the `size` command. This is equivalent to the `shape` command used in tools such as Numpy and TensorFlow (PyTorch tensors also expose a `.shape` attribute). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "sfC-B1BEAp1W",
    "outputId": "fbe6380d-f76b-4dee-c744-155022ce83d6"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([3, 2])\n",
      "torch.Size([3, 1])\n"
     ]
    }
   ],
   "source": [
    "print(X.size())\n",
    "print(y.size())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zrND9MS9Ap1f"
   },
   "source": [
    "## Scaling\n",
    "\n",
    "Below we are performing some scaling on the sample data. Notice that `torch.max` returns both the maximum values and their corresponding indices, so we use `_` to discard the indices; we only need the max values to conduct the scaling. Perfect! Our data is now in a very nice format our neural network will appreciate later on. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "hlBvtfAmAp1i",
    "outputId": "db5ba4e4-cf29-4761-a058-677b16e5dcef"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([0.5000, 1.0000])\n"
     ]
    }
   ],
   "source": [
    "# scale units\n",
    "X_max, _ = torch.max(X, 0)\n",
    "xPredicted_max, _ = torch.max(xPredicted, 0)\n",
    "\n",
    "X = torch.div(X, X_max)\n",
    "xPredicted = torch.div(xPredicted, xPredicted_max)\n",
    "y = y / 100  # max test score is 100\n",
    "print(xPredicted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "R1kTs5S5Ap1m"
   },
   "source": [
    "Notice that there are two functions `max` and `div` that I didn't discuss above. They do exactly what they imply: `max` finds the maximum value in a vector... I mean tensor; and `div` is basically a nice little function to divide two tensors. "
   ]
  },
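  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick illustration (a minimal sketch added to this write-up, not part of the original tutorial), here is what both return values of `torch.max` look like on a small tensor:\n",
    "\n",
    "```python\n",
    "t = torch.tensor([[2., 9.], [1., 5.], [3., 6.]])\n",
    "values, indices = torch.max(t, 0)    # column-wise maximum\n",
    "print(values)                        # tensor([3., 9.]) -- max of each column\n",
    "print(indices)                       # tensor([2, 0]) -- row index of each max\n",
    "print(torch.div(t, values))          # element-wise division, broadcast across rows\n",
    "```"
   ]
  },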
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xRvMSpEFAp1n"
   },
   "source": [
    "## Model (Computation Graph)\n",
    "Once the data has been processed and is in the proper format, all you need to do now is define your model. Here is where things begin to change a little compared to how you would build your neural networks using, say, something like Keras or TensorFlow. However, you will quickly realize as you go along that PyTorch doesn't differ much from other deep learning tools. At the end of the day we are constructing a computation graph, which dictates how data should flow and what operations are performed on it. \n",
    "\n",
    "For illustration purposes, we are building the following neural network or computation graph:\n",
    "\n",
    "\n",
    "![alt text](https://drive.google.com/uc?export=view&id=1l-sKpcCJCEUJV1BlAqcVAvLXLpYCInV6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "C7pDC5SfAp1p"
   },
   "outputs": [],
   "source": [
    "class Neural_Network(nn.Module):\n",
    "    def __init__(self, ):\n",
    "        super(Neural_Network, self).__init__()\n",
    "        # parameters\n",
    "        # TODO: parameters can be parameterized instead of declaring them here\n",
    "        self.inputSize = 2\n",
    "        self.outputSize = 1\n",
    "        self.hiddenSize = 3\n",
    "        \n",
    "        # weights\n",
    "        self.W1 = torch.randn(self.inputSize, self.hiddenSize) # 2 X 3 tensor\n",
    "        self.W2 = torch.randn(self.hiddenSize, self.outputSize) # 3 X 1 tensor\n",
    "        \n",
    "    def forward(self, X):\n",
    "        self.z = torch.matmul(X, self.W1) # 3 X 3 tensor (\".dot\" does not broadcast in PyTorch)\n",
    "        self.z2 = self.sigmoid(self.z) # activation function\n",
    "        self.z3 = torch.matmul(self.z2, self.W2)\n",
    "        o = self.sigmoid(self.z3) # final activation function\n",
    "        return o\n",
    "        \n",
    "    def sigmoid(self, s):\n",
    "        return 1 / (1 + torch.exp(-s))\n",
    "    \n",
    "    def sigmoidPrime(self, s):\n",
    "        # derivative of sigmoid\n",
    "        return s * (1 - s)\n",
    "    \n",
    "    def backward(self, X, y, o):\n",
    "        self.o_error = y - o # error in output\n",
    "        self.o_delta = self.o_error * self.sigmoidPrime(o) # derivative of sig to error\n",
    "        self.z2_error = torch.matmul(self.o_delta, torch.t(self.W2))\n",
    "        self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)\n",
    "        self.W1 += torch.matmul(torch.t(X), self.z2_delta)\n",
    "        self.W2 += torch.matmul(torch.t(self.z2), self.o_delta)\n",
    "        \n",
    "    def train(self, X, y):\n",
    "        # forward + backward pass for training\n",
    "        o = self.forward(X)\n",
    "        self.backward(X, y, o)\n",
    "        \n",
    "    def saveWeights(self, model):\n",
    "        # we will use the PyTorch internal storage functions\n",
    "        torch.save(model, \"NN\")\n",
    "        # you can reload model with all the weights and so forth with:\n",
    "        # torch.load(\"NN\")\n",
    "        \n",
    "    def predict(self):\n",
    "        print (\"Predicted data based on trained weights: \")\n",
    "        print (\"Input (scaled): \\n\" + str(xPredicted))\n",
    "        print (\"Output: \\n\" + str(self.forward(xPredicted)))\n",
    "        "
   ]
  },
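  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before breaking the class down, a quick smoke test (an addition to this write-up, assuming the cells above have been run) confirms that shapes flow through the network. Note also that the custom `train` method shadows `nn.Module`'s built-in `train()` mode toggle, which is harmless in this self-contained example:\n",
    "\n",
    "```python\n",
    "net = Neural_Network()\n",
    "out = net(X)          # nn.Module makes net(X) call net.forward(X)\n",
    "print(out.size())     # torch.Size([3, 1]) -- one (random, untrained) prediction per row of X\n",
    "```"
   ]
  },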
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qm5gimnyAp1s"
   },
   "source": [
    "For the purpose of this tutorial, we are not going to be talking math stuff; that's for another day. I just want you to get a gist of what it takes to build a neural network from scratch using PyTorch. Let's break down the model which was declared via the class above. \n",
    "\n",
    "## Class Header\n",
    "First, we defined our model via a class because that is the recommended way to build the computation graph. The class header contains the name of the class `Neural_Network` and the base class `nn.Module`, which indicates that we are defining our own neural network module. \n",
    "\n",
    "```python\n",
    "class Neural_Network(nn.Module):\n",
    "```\n",
    "\n",
    "## Initialization\n",
    "The next step is to define the initializations (`def __init__(self,)`) that will be performed upon creating an instance of the customized neural network. You can declare the parameters of your model here, but typically you would declare the structure of your network in this section -- the size of the hidden layers and so forth. Since we are building the neural network from scratch, we explicitly declared the sizes of the weight matrices: one that stores the parameters from the input to the hidden layer, and one that stores the parameters from the hidden to the output layer. Both weight matrices are initialized with values randomly drawn from a normal distribution via `torch.randn(...)`. Note that we are not using biases, just to keep things as simple as possible.  \n",
    "\n",
    "```python\n",
    "def __init__(self, ):\n",
    "    super(Neural_Network, self).__init__()\n",
    "    # parameters\n",
    "    # TODO: parameters can be parameterized instead of declaring them here\n",
    "    self.inputSize = 2\n",
    "    self.outputSize = 1\n",
    "    self.hiddenSize = 3\n",
    "\n",
    "    # weights\n",
    "    self.W1 = torch.randn(self.inputSize, self.hiddenSize) # 2 X 3 tensor\n",
    "    self.W2 = torch.randn(self.hiddenSize, self.outputSize) # 3 X 1 tensor\n",
    "```\n",
    "\n",
    "## The Forward Function\n",
    "The `forward` function is where all the magic happens (see below). This is where the data enters and is fed into the computation graph (i.e., the neural network structure we have built). Since we are building a simple neural network with one hidden layer, our forward function looks very simple:\n",
    "\n",
    "```python\n",
    "def forward(self, X):\n",
    "    self.z = torch.matmul(X, self.W1) \n",
    "    self.z2 = self.sigmoid(self.z) # activation function\n",
    "    self.z3 = torch.matmul(self.z2, self.W2)\n",
    "    o = self.sigmoid(self.z3) # final activation function\n",
    "    return o\n",
    "```\n",
    "\n",
    "The `forward` function above takes the input `X` and performs a matrix multiplication (`torch.matmul(...)`) with the first weight matrix `self.W1`. The `sigmoid` activation function is then applied to the result. The resulting matrix of the activation is then multiplied with the second weight matrix `self.W2`, and another activation is performed, which renders the output of the neural network or computation graph. The process described above is simply what's known as a `feedforward pass`. In order to optimize the weights during training, we need a backpropagation algorithm. \n",
    "\n",
    "## The Backward Function\n",
    "The `backward` function contains the backpropagation algorithm, where the goal is to essentially minimize the loss with respect to our weights. In other words, the weights need to be updated in such a way that the loss decreases while the neural network is training (well, that is what we hope for). All this magic is possible with the gradient descent algorithm which is declared in the `backward` function. Take a minute or two to inspect what is happening in the code below (a short note on `sigmoidPrime` follows after this breakdown):\n",
    "\n",
    "```python\n",
    "def backward(self, X, y, o):\n",
    "    self.o_error = y - o # error in output\n",
    "    self.o_delta = self.o_error * self.sigmoidPrime(o) \n",
    "    self.z2_error = torch.matmul(self.o_delta, torch.t(self.W2))\n",
    "    self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)\n",
    "    self.W1 += torch.matmul(torch.t(X), self.z2_delta)\n",
    "    self.W2 += torch.matmul(torch.t(self.z2), self.o_delta)\n",
    "```\n",
    "\n",
    "Notice that we are performing a lot of matrix multiplications along with the transpose operations via the `torch.matmul(...)` and `torch.t(...)` operations, respectively. The rest is simply gradient descent -- there is nothing to it."
   ]
  },
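  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One detail worth spelling out (a note added to this write-up, not from the original): `sigmoidPrime` is written as `s * (1 - s)` because it expects the *activated* value, not the pre-activation. Since $\\sigma'(z) = \\sigma(z)\\,(1 - \\sigma(z))$, passing `o` (which is already $\\sigma(z_3)$) yields the correct derivative without recomputing the sigmoid. A quick finite-difference check makes this concrete:\n",
    "\n",
    "```python\n",
    "z = torch.tensor([0.5])\n",
    "sig = lambda t: 1 / (1 + torch.exp(-t))   # same sigmoid as in the class\n",
    "analytic = sig(z) * (1 - sig(z))          # sigmoidPrime applied to the activation\n",
    "eps = 1e-4\n",
    "numeric = (sig(z + eps) - sig(z - eps)) / (2 * eps)  # finite-difference derivative\n",
    "print(analytic, numeric)                  # both approximately 0.2350\n",
    "```"
   ]
  },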
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9t26Dr5zAp1u"
   },
   "source": [
    "## Training\n",
    "All that is left now is to train the neural network. First we create an instance of the computation graph we have just built:\n",
    "\n",
    "```python\n",
    "NN = Neural_Network()\n",
    "```\n",
    "\n",
    "Then we train the model for `1000` rounds. Notice that in PyTorch `NN(X)` automatically calls the `forward` function so there is no need to explicitly call `NN.forward(X)`. \n",
    "\n",
    "After we have obtained the predicted output for every round of training, we compute the loss (the mean squared error) with the following code:\n",
    "\n",
    "```python\n",
    "torch.mean((y - NN(X))**2).detach().item()\n",
    "```\n",
    "\n",
    "The next step is to start the training (forward + backward) via `NN.train(X, y)`. After we have trained the neural network, we can store the model and output the predicted value of the single instance we declared in the beginning, `xPredicted`.  \n",
    "\n",
    "Let's train!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "9sTddOpLAp1w",
    "outputId": "c2ca53f6-8710-440f-af85-b8cf850340c7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#0 Loss: 0.26455259323120117\n",
      "#100 Loss: 0.0024994986597448587\n",
      "#200 Loss: 0.002286414382979274\n",
      "#300 Loss: 0.0021202608477324247\n",
      "#400 Loss: 0.0019605369307100773\n",
      "#500 Loss: 0.0018112537218257785\n",
      "#600 Loss: 0.0016757562989369035\n",
      "#700 Loss: 0.0015555238351225853\n",
      "#800 Loss: 0.0014504743739962578\n",
      "#900 Loss: 0.001359498011879623\n",
      "Predicted data based on trained weights: \n",
      "Input (scaled): \n",
      "tensor([0.5000, 1.0000])\n",
      "Output: \n",
      "tensor([0.9522])\n",
      "Finished training!\n"
     ]
    }
   ],
   "source": [
    "NN = Neural_Network()\n",
    "for i in range(1000):  # trains the NN 1,000 times\n",
    "    if (i % 100) == 0:\n",
    "        print (\"#\" + str(i) + \" Loss: \" + str(torch.mean((y - NN(X))**2).detach().item()))  # mean squared error loss\n",
    "    NN.train(X, y)\n",
    "NN.saveWeights(NN)\n",
    "NN.predict()\n",
    "\n",
    "print(\"Finished training!\")"
   ]
  },
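  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A note on `saveWeights` (an addition to this write-up): because `W1` and `W2` are plain tensors rather than `nn.Parameter`s, they do not appear in the module's `state_dict()`, which is why the whole object is pickled with `torch.save(model, \"NN\")`. If the weights were registered as parameters, the more conventional state-dict workflow would apply, roughly like this sketch (`ParamNet` and the file name are illustrative):\n",
    "\n",
    "```python\n",
    "class ParamNet(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.W1 = nn.Parameter(torch.randn(2, 3))\n",
    "        self.W2 = nn.Parameter(torch.randn(3, 1))\n",
    "\n",
    "net = ParamNet()\n",
    "torch.save(net.state_dict(), \"NN_state.pt\")   # saves only the weights, no pickled class\n",
    "net.load_state_dict(torch.load(\"NN_state.pt\"))\n",
    "```"
   ]
  },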
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L9nBzkgdbjcA"
   },
   "source": [
    "The loss keeps decreasing, which means that the neural network is learning something. That's it. Congratulations! You have just learned how to create and train a neural network from scratch using PyTorch. There are so many things you can do with the shallow network we have just implemented. You can add more hidden layers or try to incorporate bias terms for practice (a starting sketch follows below). I would love to see what you will build from here. Reach out to me on [Twitter](https://twitter.com/omarsar0) if you have any further questions or leave your comments here. Until next time!"
   ]
  },
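  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a starting point for that exercise, here is one possible way to add bias terms (a sketch added to this write-up, not the original author's code; `BiasedNet` is illustrative and leaves out the backward pass):\n",
    "\n",
    "```python\n",
    "class BiasedNet:\n",
    "    def __init__(self, n_in=2, n_hidden=3, n_out=1):\n",
    "        self.W1 = torch.randn(n_in, n_hidden)\n",
    "        self.b1 = torch.zeros(n_hidden)\n",
    "        self.W2 = torch.randn(n_hidden, n_out)\n",
    "        self.b2 = torch.zeros(n_out)\n",
    "\n",
    "    def forward(self, X):\n",
    "        z2 = torch.sigmoid(torch.matmul(X, self.W1) + self.b1)  # bias broadcast across rows\n",
    "        return torch.sigmoid(torch.matmul(z2, self.W2) + self.b2)\n",
    "\n",
    "print(BiasedNet().forward(torch.tensor([[0.5, 1.0]])))\n",
    "```"
   ]
  },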
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zcms4BCySKXj"
   },
   "source": [
    "## References:\n",
    "- [PyTorch Custom nn Modules](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-custom-nn-modules)\n",
    "- [Build a Neural Network with Numpy](https://enlight.nyc/neural-network)\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "Neural Network from Scratch.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}