Hello! How are you? Where are the rest of your mates? Let me share this screen. Okay, well, let's start. We move forward here to understanding these deep learning layers. Deep neural nets are the ones where we put more than one hidden layer. So, how did you find the forum? How was your first approach to running this in TensorFlow? Did you find it easy, or was it pretty hard? I see. Well, you need to think of it as a kind of building-block architecture where we put different pieces together, and what is important is to understand which are the ingredients and where to find the different things we can use. For instance, some of you started to use different optimizers. You have the inputs, which are something we are not going to change — for example, when working with images, how we represent the colour. The output is obviously set by what you want, classification or regression, so that is fixed and very easy to decide. Then we have the topology of the network, and this is what we are going to change to see how to work with images, how to work with videos or anything with a time component. And then the optimizers: you will start to use Adagrad, Adam or RMSprop, which is one that I like. The way to understand those concepts — and we will say a few words about it in the next lecture — is to go to the documentation. All you need to do is say: well, I'm going to use an optimizer; I know the typical one is stochastic gradient descent, but let's see which other optimizers I can choose. If you type RMSprop or Adam into Google, there are nice videos that in two minutes explain the difference between stochastic gradient descent and the other optimizers. Also, in Keras you can finally create the optimizer you want by writing a custom function. So it's step by step, little by little, discovering the kinds of things that people are using. But don't be scared by the amount of information, because in the end everybody is using the same building blocks, and it's easy to find examples of the different techniques and topologies. Years ago it was much more difficult to find these things, but right now people are sharing all of this in notebooks, on Hugging Face, on GitHub. In a few minutes you are going to be able to understand, maybe not the whole mathematical process behind it, but at least the idea, the intuition of all these things. Remember that this is a basic course in the sense that we need to see the building blocks. We will see the multi-layer perceptron, which is the first deep neural net, the one you use for MNIST. Then we will move to other interesting blocks, which are convolutional neural networks, and then to recurrent neural networks. Those are the main three blocks for supervised learning with neural nets. We can find other kinds of topologies, and we will see them if we have time, but all of them are built with those basic building blocks.
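To make that concrete, here is a minimal sketch — not from the lecture itself — of how you swap optimizers in Keras. The small model and the learning rates are just illustrative assumptions; only the optimizer line changes.

```python
import tensorflow as tf

# A small illustrative model; only the optimizer passed to compile() changes below.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Any of these can be dropped into compile(); learning rates here are illustrative.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)       # plain stochastic gradient descent
# optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.01)
# optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)
# optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```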
We will also see autoencoders, which I personally like, to see how we can do unsupervised learning with neural nets. And the rest of the things we are seeing nowadays — the foundation models, the transformers — all of them are just topologies built with those same main building blocks, so it's going to be easy to pass from one to another. It is a good time to be working with neural nets. We have the libraries, the tools and the frameworks to run them. It is much easier than it used to be, as you no longer need to allocate memory for each training sample or fight through messy code. It is easy to put your hands on it and create something quickly, as you can see in TensorFlow. You can use any cloud, such as Google Colab or Azure Databricks, and you can use their frameworks to make it even easier. Transfer learning is also important, as it allows you to use pre-trained neural nets with a small amount of data. Today I want you to do a good review of the videos I put in the forum for this course, as they are an essential part of understanding how neural nets work. That's why dividing by 255 is something we also do in this task. It seems very easy for a human, but it's not easy to write rules for recognizing a digit. For a neural net the problem looks like this: we have this 28 by 28 input, with numbers between 0 and 255, something we can normalize to the range 0 to 1 if we want. That is the input, and the output is an array of 10 elements with the probability of belonging to each of the classes. So in this case we have a true class, and we want the probability of that class given x to be near 1 and the rest near 0. And obviously we have to compare those probabilities, so we will use the standard log loss. It measures the distance between two probability distributions, and it expects values between 0 and 1, so it is a perfect match when your outputs are between 0 and 1. Cross entropy measures the amount of information by which one distribution differs from another. That is the theoretical part; the practical part is that if the output of your network lies between 0 and 1, use cross entropy: binary cross entropy when we put a sigmoid at the output, and categorical cross entropy when we have more than two classes. In TensorFlow you will also see a sparse categorical cross entropy. This is the same as categorical cross entropy; what it computes behind the scenes is exactly the same. The only difference is how you represent the labels. If you represent the labels as integers — you directly say "this is a 3" — then use sparse categorical cross entropy and the library will handle the conversion, because the canonical label for cross entropy is not the integer class, it's the one-hot vector that represents the class: a vector with one position per digit from 0 to 9, with a single 1 in the position of the true class. So the way to represent the label when you are using log loss or cross entropy is a one-hot vector. Why? Because remember how log loss works: you multiply the label by the log of the predicted probability, so all the terms where the label is 0 contribute nothing to the cost. The only term that matters is the one where the label is 1.
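As a quick illustration of that last point, here is a toy sketch with numbers of my own, not from the lecture: with a one-hot label only the log of the probability assigned to the true class contributes, and the sparse variant with an integer label gives exactly the same value.

```python
import numpy as np

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])           # one-hot label for the digit 3
y_pred = np.array([0.01, 0.01, 0.02, 0.90, 0.01, 0.01,
                   0.01, 0.01, 0.01, 0.01])                  # softmax-like output, sums to 1

loss = -np.sum(y_true * np.log(y_pred))                      # categorical cross entropy
print(loss)            # ~0.105, i.e. -log(0.90); the zero entries contribute nothing

# The sparse variant takes the integer label directly and computes the same thing:
sparse_label = 3
sparse_loss = -np.log(y_pred[sparse_label])
print(sparse_loss)     # identical value
```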
In TensorFlow, you use sparse categorical cross entropy when you don't want to create those one-hot vectors yourself: you pass the integer label and the library converts it into the one-hot vector for you. That's basically it. In terms of what it is doing behind the scenes, there is no difference between categorical cross entropy and the sparse version; the only thing that changes is the label representation. So for our experiments you don't have to worry about this: you can represent the classes as one-hot vectors or as integers and simply move between categorical and sparse categorical cross entropy. So we have a bunch of numbers as input, we don't yet know what to put in the middle, and we have those 10 outputs. By the way, the first time someone used a neural net on this problem they got good results, and it was a milestone, but at that moment my colleagues at Google, Corinna Cortes and others, were working with kernel methods, and those were better on this problem at that time. Things have changed a little bit since then. So let me go through the video in three parts, which are all important. First the problem statement, something we more or less already know. Then the divide-and-conquer part, which I really like to highlight, because divide and conquer is the essential heart of any neural net, and we will see it again with convolutional nets. And what is important is to understand, for the very first time, how a neural net is able to detect a pattern. We are going to work in the next videos with material from 3Blue1Brown, which is an amazing source for this kind of thing. We are going to work with a pre-trained neural net, a neural net that is already solving the problem. Then we will move on to how to train it, and to train it we do what we always want to do, stochastic gradient descent; but in this case we need something more, a helper for stochastic gradient descent, which is backpropagation. Backpropagation is the algorithm you are going to hear about everywhere, in every book. Finally, why do we need something besides gradient descent? Well, the problem was that in the seventies and eighties, if you had an architecture like this — these are the inputs, this is a hidden layer, this is another hidden layer, and this is the output — everything is connected with everything, because we are working with dense layers. Dense means that every unit in the layer is connected to every unit in the preceding layer, and on every connection we have a weight. The problem was that when they started to work with the XOR problem, they didn't have an algorithm to train all those weights. Why? Because gradient descent as they used it only worked for a structure with a single level, like a linear or logistic regression: you take the cost function, the loss function, and you take its derivative with respect to the parameters. But here it's not so easy to apply that derivative to all the weights we have. It's easy to do it for the weights in the last layer.
Okay, because for those you can immediately take the derivative of the cost function with respect to them, since the cost is computed right after that layer. But the errors in the earlier layers depend not only on their own weights but also on the output of the previous layer, and the output of that layer depends in turn on the weights and outputs of all the layers before it. So how can I change the weights at every level in order to minimize the cost function? Well, this is what Rumelhart, Hinton and Williams solved with their algorithm, backpropagation, developed in the eighties. Until about 2000 or so it was not applied to very large problems, because we didn't have much data or computational power; the algorithm was worked out some 20 or 30 years before we really started to apply it at scale. For the time being, let's try to understand how a neuron is able to detect a pattern. How is it even possible for these neurons to detect something we believe is necessary? How does the math work once the weights are set? Let's start by stating the problem. I'm going to put the video on; let me know if you can see it, and if not I'll put something else up. This is a 3. It's sloppily written and rendered at an extremely low resolution of 28 by 28 pixels, but your brain has no trouble recognizing it as a 3. Something in that crazy, smart visual cortex of yours resolves these as representing the same idea, while at the same time recognizing other images as their own distinct ideas. But if I told you to sit down and write a program that takes in a grid of 28 by 28 pixels like this and outputs a single number between 0 and 10, the task goes from comically trivial to dauntingly difficult. Unless you've been living under a rock, I hardly need to motivate the relevance and importance of machine learning and neural networks to the present and future. But what I want to do here is show you what a neural network actually is, assuming no background, and to help visualize what it's doing, not as a buzzword but as a piece of math, so that you come away feeling like the structure itself is motivated, and feeling like you know what it means when you read or hear about a neural network "learning". This video is just going to be devoted to the structure component of that; the following one is going to tackle learning. What we're going to do is put together a neural network that can learn to recognize handwritten digits. This is a somewhat classic example for introducing the topic, and I'm happy to stick with the status quo here, because at the end of the two videos I want to point you to a couple of good resources where you can learn more, and where you can download the code that does this and play with it on your own computer. The number here is 784, because we are working with images of 28 by 28. So we are putting all the pixels in order, from the first row to the last row, and for each pixel we have a number between 0 and 1. Those are normalized; the original values are between 0 and 255. And the output is basically the digit. We have many, many variants of neural networks, and in recent years there's been sort of a boom in research towards these variants.
But in these two introductory videos, you and I are just going to look at the simplest, plain vanilla form with no added frills. This is a necessary prerequisite for understanding any of the more powerful modern variants, and it still has plenty of complexity for us to wrap our minds around. But even in its simplest form it can learn to recognize handwritten digits, which is a pretty cool thing for a computer to be able to do. At the same time, you'll see how it falls short of a couple of hopes that we might have for it. As the name suggests, neural networks are inspired by the brain. Let's break that down: what are the neurons, and in what sense are they linked together? When I say neuron, all I want you to think about is a thing that holds a number, specifically a number between 0 and 1. It's really not more than that — although of course it is more than that, because what we are doing is combining the inputs and passing them through a nonlinear function. But for the sake of simplicity it is a good way to think about the neuron: in the end, what passes through is a number. This helps you understand that the neuron can be activated or not activated. The combination of the inputs and the weights, the dot product we are computing, is a number between minus infinity and plus infinity, but if we apply the sigmoid to it, what we get is a number between 0 and 1. If you want to do the analogy with our brain and think about neurons being activated or not: a very negative input to the sigmoid gives an output near 0, which is basically a non-activated neuron, and a very positive input gives an output near 1, which is an activated neuron. So we are going to play with that: the neurons are going to be activated when they detect what they are looking for, and inactivated when they don't. That is the idea behind thinking of the neuron as a number, and I think it helps to understand how the whole thing works together. For example, the network starts with a bunch of neurons corresponding to each of the 28 times 28 pixels of the input image, which is 784 neurons in total. Each one of these holds a number that represents the grayscale value of the corresponding pixel, ranging from 0 for black pixels up to 1 for white pixels. This number inside the neuron is called its activation, and the image you might have in mind here is that each neuron is lit up when its activation is a high number. So all of these 784 neurons make up the first layer of our network. Now, jumping over to the last layer, this has 10 neurons, each representing one of the digits. The activation in these neurons, again some number between 0 and 1, represents how much the system thinks that a given image corresponds with a given digit. There's also a couple of layers in between, called the hidden layers, which for the time being are just a giant question mark for how on earth this process of recognizing digits is going to be handled. I chose two hidden layers, each one with 16 neurons, and admittedly that's kind of an arbitrary choice. To be honest, I chose two layers based on how I wanted to motivate the structure in just a moment, and 16 was just a nice number to fit on the screen. In practice, there is a lot of room for experimentation with the specific structure.
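For reference, here is a sketch of that exact topology (784 inputs, two hidden layers of 16 units, 10 outputs) written in Keras. The activation choices are my own assumption for illustration: the video describes sigmoid units, while ReLU is the more common default today.

```python
import tensorflow as tf

# The topology from the video: 784 -> 16 -> 16 -> 10.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(16, activation="sigmoid"),   # first hidden layer
    tf.keras.layers.Dense(16, activation="sigmoid"),   # second hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),   # one output per digit
])
model.summary()   # reports the roughly 13,000 trainable parameters discussed later
```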
Here, the way the network operates is that activations in one layer determine the activations of the next layer. And of course, the heart of the network as an information processing mechanism comes down to exactly how those activations from one layer bring about activations in the next layer. It's meant to be loosely analogous to how, in biological networks of neurons, some groups of neurons firing cause certain others to fire. Now, the network I'm showing here has already been trained to recognize digits, and let me show you what I mean by that. It means that if you feed in an image, lighting up all 784 neurons of the input layer according to the brightness of each pixel, that pattern of activation causes some very specific pattern in the next layer, which causes some pattern in the one after it, which finally gives some pattern in the output layer. And the brightest neuron of that output layer is the network's choice, so to speak, for what digit this image represents. Well, this is the idea of working with a network, under the idea of divide and conquer. We have a group of neurons in the first layer that, depending on the input, are going to be activated or not. The neurons that have been activated form a kind of pattern for the next layer, and in the next one again some neurons are going to be activated depending on the activations in the previous layer. So they are combining patterns to finally decide which class this input belongs to. Now, this network is already trained, it is working right now: we put in a 6 and the 6 lights up at the output. Obviously we need to see how to train it to do that, but what I want you to see first is how it is even possible for the weights to be configured to do this; then we will worry about how to train it to do exactly this. So let's move to why this is even possible. And before jumping into the math for how one layer influences the next, or how training works, let's just talk about why it's even reasonable to expect a layered structure like this to behave intelligently. What are we expecting here? What is the best hope for what those middle layers might be doing? Well, when you or I recognize digits, we piece together various components: a 9 has a loop up top and a line on the right; an 8 also has a loop up top, but it's paired with another loop down below; a 4 basically breaks down into three specific lines, and things like that. In a perfect world, we might hope that each neuron in the second-to-last layer corresponds with one of these subcomponents, so that any time you feed in an image with, say, a loop up top, like a 9 or an 8, there's some specific neuron whose activation is going to be close to one. And I don't mean this specific loop of pixels; the hope would be that any generally loopy pattern towards the top sets off this neuron. That way, going from the third layer to the last one just requires learning which combination of subcomponents corresponds to which digits. Of course, that just kicks the problem down the road, because how would you recognize these subcomponents, or even learn what the right subcomponents should be? And I still haven't even talked about how one layer influences the next.
But run with me on this one for a moment. Recognizing a loop can also break down into subproblems. One reasonable way to do this would be to first recognize the various little edges that make it up. Similarly, a long line like the kind you might see in the digits 1, 4 or 7 is really just a long edge, or maybe you think of it as a certain pattern of several smaller edges. So maybe our hope is that each neuron in the second layer of the network corresponds with the various relevant little edges. Maybe when an image like this one comes in, it lights up all of the neurons associated with around 8 to 10 specific little edges, which in turn light up the neurons associated with the upper loop and the long vertical line, and those light up the neurons associated with the digit. So again, this is just divide and conquer. It's not so easy to see in practice, because this is a kind of black box where the information is distributed, and it is difficult to plot, but the intuition remains and it is what is driving the neural net. The neural net is basically dividing the problem into smaller problems. So the problem of detecting the 9 becomes the problem of detecting this upper loop, in yellow, and this vertical line, in red. If you want to split the hard problem into easier problems, you need to be able to detect the loop, and the line, and so on for the rest of the highlighted parts. So what we are doing is converting the hard, complex problem into a lot of simpler problems, like we did in the XOR problem. How a single neuron is able to detect one of those patterns is something we are going to see in a few minutes. But the idea of divide and conquer is the one that is going to dominate neural nets, not only here but also in convolutional nets, where we classify images, in recurrent neural nets, and in transformers for natural language processing. We are always dividing our problem into something simpler that we can finally combine, layer by layer. And the good thing, the most interesting part here, is that we don't need to use our intuition to decide what each neuron should detect or which bigger patterns to build. We are not going to decide which things the neural net needs to look at or pay attention to. We just define the cost function and we tell the neural net: you need to minimize this cost. And just through training it will figure out what is important and what is not — for solving your problem and nothing else, because the only goal we set is to minimize the cost. If what we put in the cost function is that the neural net has to discriminate between the digits 0 to 9, it will learn whatever it needs in order to do that. And the things the neural network comes up with are going to be much better than our intuition, because our brains cannot work in those high-dimensional spaces. We would not be able to write by hand the weight we need on every edge; for humans it is basically impossible. And we are not even saying, "Well, I think you need to recognize this loop, because I know that the 9 and the 8 have different loops, and you also need to decide this other thing." No, no: we are going to let the neural net decide that for us. How?
Just by running gradient descent, minimizing the cost function, because the only thing we know is that by minimizing the cost function we make fewer errors, however the error is measured. If we are able to minimize the cost function, it is working. We don't know exactly how it does everything inside, but in the end it is one very, very large mapping function, something that maps this 784-dimensional world of continuous numbers between 0 and 1 into my classes. There are a lot of weights in there, but in the end it is just a mathematical function that takes this input and gives you some output. We are not going to decide how it has to be; the weights are going to be trained using stochastic gradient descent. If you go to your notebook and try to figure out exactly which neurons are detecting the upper loop, or the side line, it is going to be very difficult, because that is probably distributed across a lot of neurons all over your neural net. Probably you have some units in the first hidden layer that combine with some neurons in the second layer, and all of them together are doing something similar to what you are looking for, but it's not easy to see. Still, divide and conquer is what it is indeed doing. Now what we have to do is try to understand how a single neuron is able to detect one of those small, simpler patterns: for instance, this part of the vertical line, or this part of the upper loop, the blue one. How is a neuron able to be activated when that pattern is in the input and inactivated when it's not? As for how to choose the topology: we will see that in practice it is not going to be a problem, because we are going to look at problems where other people have already done a lot of configuration and tried different topologies, and it is easy to set up with a few experiments. We start with two layers or so, two or four layers, then we look at the results, and from them we can figure out whether we have a high-bias or a high-variance problem. The typical approach, as you say, is this: the normal thing is that the first hidden layer has a bit fewer neurons than your input, because probably what you want is to do a kind of PCA and eliminate redundant information. In other cases we will see that you may want more neurons. Something you can see at the end of training is that some of the weights end up with essentially the value they started with, meaning those neurons are not really using that input, or that they are inactivated 90% of the time. But there is no rule to decide the number of layers, or even of hidden units, in advance. So you can start, for this problem, with two hidden layers of 128 hidden units. If you see that it is not working well on the training data and you are not reaching the accuracy you need, you have high bias: the model is not fitting the training data set well. So you put 256 and see how it works. If you still have high bias, you put 512. At some moment you are going to see that there is no difference between 256, 512 or 1,024, because you have probably hit a plateau.
At that moment, what you have to do is worry about other kinds of things: the data, more hidden layers, and so on. But we will see that with a few experiments it is easy to get the topology more or less right. It also depends on where you are going to deploy the neural net: for instance, if you want it to run on a mobile device, probably what you want is to reduce the number of hidden units, because you want something lighter to put on the phone. There are different applications. But this is one of the first questions when you start to work with neural nets: is choosing the topology going to be a real problem? And the answer is no, because at some point we are going to work with pre-trained neural nets, or with topologies that you know are working for problems similar to yours, and the only thing we have to do is fine-tuning. Now, the neural net is able to do fancier things, non-linear boundaries, that can end up overfitting your training data; at that moment you will probably see a problem in the evaluation part. But let me tell you something about high bias and high variance in neural nets when you are working with millions of training samples, which is something we are going to do by using pre-trained neural nets from Google or from Meta: there are scenarios where we are going to overfit the training data set. In classical machine learning we said that overfitting is always a very bad thing, but here it is not going to be a bad thing; it is going to be awesome, and I'll tell you why. If you overfit five billion images, which is the kind of database that Meta or Google works with, you are overfitting a humongous amount of training material. We will see an amazing training accuracy, more than 95%, and we will see the same on the validation set, and this is because the model is capturing the real problem with all its variability. This was unthinkable 10 years ago in classical machine learning: getting 95% on a training set for something this complex meant you were overfitting. Well, I want to overfit, if the training material I have really does cover the whole problem. This is something that may confuse you a little at the beginning, but it is what it is. Imagine face verification or face identification. If you are working with Meta's database, where every single user uploads, I don't know, maybe 100 pictures of themselves; if you have a billion users and 20 pictures of every one of them, in some cases probably more than 10,000, and you train a neural net for face verification there, once you have those weights it is going to work on your test data set, no matter whether you "overfitted", because what you are overfitting is the whole problem. Whether or not this is what our final network actually does is another question, one that I'll come back to once we see how to train the network. But this is a hope that we might have, a sort of goal with the layered structure like this. Moreover, you can imagine how being able to detect edges and patterns like this would be really useful for other image recognition tasks, and even beyond image recognition.
There are all sorts of intelligent things you might want to do that break down into layers of abstraction. Parsing speech, for example, involves taking raw audio and picking out distinct sounds, which combine to make certain syllables, which combine to form words, which combine to make up phrases and more abstract thoughts, and so on. But getting back to how any of this actually works: picture yourself right now designing how exactly the activations in one layer might determine the activations in the next. The goal is to have some mechanism that could conceivably combine pixels into edges, or edges into patterns, or patterns into digits. And to zoom in on one very specific example — well, let's go to the last part, which is probably the most interesting one: how a neuron is able to detect a pattern. We are going to use a toy example, but this part is really important for understanding how a neural net works, so tell me if you have any questions, and then we will see it with a very simple toy problem where I think it is easier to follow. Let's say the hope is for one particular neuron in the second layer to pick up on whether or not the image has an edge in this region here. So this specific neuron, whichever it is, is in charge of detecting that pattern: it has to be activated if the input contains this pattern. The question is: what parameters should the network have? What dials and knobs should you be able to adjust so that it can capture this pattern? Well, we have the weights, which are our trainable parameters. We have a weight on every connection, and the neuron is connected to all the preceding neurons, which hold numbers between 0 and 1 representing the intensity of the pixel, between black and white. We have to tweak the weights so that the neuron is expressive enough to potentially capture this pattern, or any other pixel pattern. To pick up on an edge, we might have some negative weights associated with the surrounding pixels; then the sum is largest when those middle pixels are bright but the surrounding pixels are darker. We take this weighted sum and pump it into a function that squishes the real number line into the range between 0 and 1, such as the sigmoid function. We also add in some other number, like negative 10, to this weighted sum before plugging it through the sigmoid squishification function. This is just a logistic regressor: we want positive weights in certain positions and negative weights in others, and the added number changes the sensitivity of the neuron — whether the weighted sum needs to be very positive, or whether just a little bit positive is enough to trigger it. That is the role of the bias. That additional number is called the bias, and while the weights tell you what pixel pattern this neuron in the second layer is picking up on, the bias tells you how high the weighted sum needs to be before the neuron starts getting meaningfully active. Every other neuron in this layer is connected to all 784 pixel neurons from the first layer, each one of those 784 connections has its own weight associated with it, and each neuron has its own bias. All said and done, this network has almost exactly 13,000 total weights and biases: 13,000 knobs and dials that can be tweaked to make this network behave in different ways.
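That "almost exactly 13,000" is just arithmetic over the 784 → 16 → 16 → 10 topology; a quick sketch of the count:

```python
# Count weights and biases layer by layer for the 784 -> 16 -> 16 -> 10 network.
layer_sizes = [784, 16, 16, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = n_in * n_out   # one weight per connection
    biases = n_out           # one bias per neuron
    total += weights + biases

print(total)  # 13002
```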
So when we talk about learning, what we are referring to is getting the computer to find a valid setting for all of these many numbers so that it will actually solve the problem. The important part for today is how the network is able to recognize a pattern once it is trained, because those weights are already set; in the next class we will see how the network learns to recognize different patterns and finally combine them. So, do you more or less understand how this works, how the network is able to recognize a given pattern just by changing the weights? Once you see that it is just building the dot product and putting the sigmoid on top, then, with the weights set to whatever numbers — you can even initialize them randomly — the whole thing is just multiplying matrices. There is an infinite number of combinations of weights and biases, and we are not going to search them with a heuristic or some brute-force combination of weights. What we are going to use is our learning procedure, and our learning procedure minimizes the cost function, following just one path: the path that reduces the cost. This is a 13,000-dimensional space, and at any moment we are at a point in that space. What we want is to find the global minimum, or a local one if the function is non-convex. We do this during training by going down this surface of 13,000 dimensions, and on the way we are changing the weights. This is what we call learning, and that is how the neurons end up activated by the right patterns. So we can play with those numbers: we can change the weights, we can change the bias, we can change the activation function. But during training we follow the gradient — we go in the opposite direction of the gradient, which is the steepest way down towards a minimum. Now let's look at the toy example. We have four inputs and one neuron, and we want the output to be more or less 1 when this pattern appears. If we look at this input, one of the first things we see is that this first term is going to be 0 no matter what W1 is, because x1 is 0. It is exactly the same here: because x2 is 0, no matter what W2 is, that term is going to be 0. Now, x3 and x4 are 1, because those pixels are white. So how should W3 and W4 be? Well, if I put them negative or zero, I am going to create a number which is less than or equal to 0. So what I want is to have W3 and W4 positive. Are you following me? Let me put it in colours. In red, here is a configuration of weights with which this neuron is not going to be activated by this pattern. Why? Because the first two terms are 0 and these two are less than 0, so the whole sum is less than 0 — well, depending on the bias, but forget the bias for now. If the input to the sigmoid is less than 0, y is going to be more or less 0, so this neuron is not activated. Now, if instead I put something like this, where the weights are positive, what do you think is going to happen?
These are the weights connecting the four inputs to my neuron. What do you think the output y is going to be? Well, the first two terms are going to be 0 again, because x1 and x2 are 0, no matter what W1 and W2 are. W3 is now positive and x3 is 1, so that term is positive, and the same for W4 and x4. So what goes into the sigmoid is positive — forget the bias for now, it is just setting the threshold — and the output is going to be close to 1. So at that moment this neuron is able to detect this pattern, in the sense that it is going to be activated when it finds this pattern in the input. Are you following me? But probably it is not the best idea to have exactly this configuration, because if all the weights are positive, the neuron is not only detecting this pattern; it is also detecting any pattern where x3 and x4 are 1. It would also be activated by an input that is all white, or by one where this part is black and this part is white, and so on. This neuron would not be very specific; it would be detecting whatever pattern contains this one. So if we want to detect just this pattern, where this part is white and this part is black, probably what we want is something like this: W1 and W2 negative and W3 and W4 positive. With this configuration, the neuron is going to be activated only when it finds this pattern. Because if, for instance, the input is all white, the negative weights W1 and W2 penalize that part, while we are still in favour of x3 and x4 being on, because we have positive weights, in green, on W3 and W4. So with this configuration this neuron is detecting exactly this pattern; that is what it is prepared for. You can play with the other 15 possible patterns of these four pixels, and you are going to find lower values of y for all of them — something between 0 and a value that is never going to be as high as for this pattern. You see? So just by changing the weights, the neuron detects one pattern or another. And we have a lot of neurons working together, every neuron detecting different things, and then what we do in the next layer is combine the information from those neurons. So when we have something bigger, a more complex pattern, as we saw in the video, it is interesting to have neurons that each solve a simpler problem. Is this more or less clear? This is the idea, and it is not very difficult: just by changing the weights — the weights on the connections — we are changing the mapping function.
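Here is the same toy example as a sketch in code, with numbers I am assuming for illustration (the lecture only fixes the signs of the weights): four pixel inputs, and weights chosen by hand so the neuron fires only when x3 and x4 are on and x1 and x2 are off.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([-5.0, -5.0, 5.0, 5.0])   # negative on x1, x2; positive on x3, x4
b = -5.0                                # bias sets the threshold for firing

def neuron(x):
    return sigmoid(np.dot(w, x) + b)

print(neuron(np.array([0, 0, 1, 1])))   # ~0.99, the target pattern -> activated
print(neuron(np.array([1, 1, 1, 1])))   # ~0.007, all-white input is penalized
print(neuron(np.array([0, 0, 0, 0])))   # ~0.007, all-black input -> not activated
```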
Remember, in linear regression or logistic regression we changed the straight line: we had the slope, which was the weight, and the bias. Well, here it is exactly the same. We are stating our hypothesis, saying, "I'm going to work with this mathematical function, but those weights are something I can change." They are where we put the flexibility of our mapping. So just by changing those weights, this number is going to be different, and following the analogy of triggering the neuron or not: if we pass it through the sigmoid, it is going to be a number between 0 and 1, where 0 is non-activated and 1 is activated. What changes things is not only the input, which is important, but also the weights. Imagine I keep exactly the same input pattern: if I put zeros in all my weights, no matter what the input is, y is going to be exactly zero. On the contrary, if I put very large numbers in the weights, I can get something close to 1 as soon as at least one of the inputs is on. So by playing with the weights I am able to detect different patterns in my input. And this is how it is going to work: we are going to have neurons that detect patterns that can be very simple, and that can be combined in the next layer. Questions? About the activation function: we have used the sigmoid here for the sake of understanding, for the analogy of activated or not, and we haven't used ReLU yet. But if you think about ReLU, it behaves similarly: in the negative part it is zero, non-activated, and in the positive part, the larger the positive number, the larger the activation, so you have degrees of activation and the neuron can be more or less activated. When we try to train large, deep neural nets with a lot of layers, the gradient that we propagate through the sigmoid can become close to zero and we stop learning; with ReLU you keep a larger gradient, because the positive part follows the input. But in terms of whether the neuron is activated or not, it is more or less the same: it is not linear in that sense — in the negative part it is not activated, and in the positive part it is activated to a certain degree. Can the activation be negative? With the hyperbolic tangent you can get an output between minus one and one. In some cases a negative output helps, because it normalizes the outputs between minus one and one, which is what is interesting about the tanh; indeed, if you are not using ReLU in the hidden layers, often the best choice is the tanh, and those negative values are part of what does the magic. So allowing some negative numbers can make detection easier. But with respect to the pattern we saw, you set W1 and W2 negative because they are going to penalize inputs where x1 and x2 are on. Okay, so now, colour images: we are going to represent them as 3 matrices. The normal thing is an RGB representation, so you have a matrix for the red, a matrix for the green, and a matrix for the blue. We will see that in this case, for a multi-layer perceptron, things are going to explode, as the quick count below shows.
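A rough sketch of that explosion, with an assumed image size of 224 by 224 (MNIST itself is only 28 by 28; the numbers are just for illustration):

```python
# Flattening a colour image into a dense layer's input blows up the parameter count.
height, width, channels = 224, 224, 3
inputs = height * width * channels            # 150,528 input values
hidden_units = 128

params_first_layer = inputs * hidden_units + hidden_units
print(inputs)                # 150528
print(params_first_layer)    # 19,267,712 parameters in the first dense layer alone
```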
We are going to have so many neurons that we will need to move to more complex topologies like convolutional neural networks. But we are going to represent the image that way and feed in all of that information, so the neural net decides by itself which information is useful for solving the problem, which basically means minimizing the cost function. If using the blue matrix of the image helps, it is going to use the blue; if not, it will use the red or the green. This happens with satellite images, where some of the colours or frequency bands are much more important than others, and you see that the neural net basically ends up relying on some of the frequencies. When we want to do different things with an image, for instance detecting different objects, we will use other kinds of techniques. But right now what we are doing is classifying these matrices as a member of one of the classes. And we are not detecting the colour itself; we just pass the colour in, and the task is to discriminate between 0 and 9 no matter the colour. We will see some examples later where you do want to use colour differences. The good thing is that the net is going to decide which information is useful for solving the problem. As for the bias: the bias sets the threshold at which the neuron is activated or not. The dot product between the weights and the inputs can be something really big, say 1,000. If I want this neuron to be activated only around that level, I can put a very large bias; the bias is also trainable, and it can be learned to be, say, minus 1,000. What the bias adds is, again, more flexibility: the network can have a different threshold for every neuron. One neuron can be more sensitive and say, "I'm going to be activated whenever the weighted sum is positive, a 10 or something like that." For another you want to penalize that and say, "No, you have to be activated only if the dot product is greater than 1,000," and then the bias is going to be learned to be around minus 1,000. So it is setting the threshold. Think about it: the sigmoid of 10 is already almost 1, and even something like 0.7 gives you an output above one half. But if you want to say, "I only want this neuron to activate when the dot product is near 1,000," then the neuron is going to use the bias: you need something bigger than 1,000 to finally push the sigmoid up. And again, this is a trainable parameter: you are not setting all the neurons to the same threshold; you are allowing each neuron to have its own. Anything else? Good — we learned how a neuron is able to detect a pattern, which is the essential part of this session. Keep an eye on the forum, we will continue working there, and then we will move on to other topics. Have a good day. Thank you.