Deep Learning!!
I have always wondered about perception: how does our brain understand the objects we see? How does it tell the difference between a cat and a dog? How are we able to tell people apart? Most amazing of all, how does the brain recognize the same person over the years, even as they age? I don't have answers to most of these questions, but what if I told you that there are technologies today that can perceive objects much like a human being does? Yes, it has become a reality!! Take, for example, the face recognition based unlocking system of the iPhone X. Isn't it amazing? There are many more applications of the same technology. When we dig a bit deeper into the technique most commonly used in these applications, the answer turns out to be neural networks. To be more specific, a neural network based technique called Deep Learning has come into extensive use for this purpose.
This post is intended to provide a basic conceptual overview of Deep Learning: what a neural network is, what an activation function is, how a simple deep neural network model is built, and which error measures are used to optimize its predictions. It is more of an overview of the terms one is likely to hear while working in the Machine Learning or Deep Learning space. Let's get started!
WHAT is Deep Learning?
Simply put, Deep Learning is a Machine Learning technique for pattern recognition, and in recent times it has become one of the most common methods for building AI systems. Technically, a Deep Neural Network (DNN) is simply a neural network that is deep, i.e. it has many layers stacked one after another, each trying to identify patterns in the data. One beauty of this method is that as we move from the input layer through the hidden layers to the output layer, the patterns being identified become larger and more complex, which helps the network recognize whole objects at the final layer.
For example, consider identifying a human face in a picture. The first layer of the DNN would identify simple segments, mostly edges such as dark horizontal strips, long vertical strips and dark round shapes. The next layer would find associations between these edges and come up with bigger patterns like eyebrows, eyes, a nose and so on. The final layer would then identify the face in the image by combining all the features found in the previous layers. This is shown in the image below.
The major applications of deep learning are:
- Image classification (Convolutional Neural Networks (CNNs) are generally used)
- Text classification (Recurrent Neural Networks (RNNs) are generally used)
- Speech processing (Long Short-Term Memory networks (LSTMs), a type of RNN, are generally used)
Explaining each of these types of neural network is out of scope for this article; they will be covered in separate articles.
WHY Deep Learning??
If we observe the method closely, we may realize that this might be how the human brain identifies the objects around us. Maybe this is the basic mechanism of perception. That is the motivation behind deep neural networks: the general idea is to mimic the way the neurons in our brain identify patterns. In a neural network, the artificial neurons are called nodes, and they pass information from one to another through weighted connections.
But what differentiates a simple neural network from a deep neural network? The answer is not that straightforward, but there are differences in architecture, flexibility and prediction accuracy. The defining difference is depth: a conventional (shallow) neural network has only one or a few hidden layers, while a deep neural network stacks many of them, which lets it learn the layered patterns described above. Conventional networks are usually fully connected, i.e. each node is connected to every node in the next layer. Many deep architectures instead restrict the connections, for example by linking each node only to a small local region of the previous layer (as in CNNs), which is arguably closer to what our brain does. Both kinds of network make predictions with a feed forward pass, and both are trained with back propagation: the prediction error is propagated backwards through the network and the weights are updated step by step to reduce it.
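To make back propagation less abstract, here is a minimal sketch in plain Python (no framework) for a hypothetical network with two inputs, one hidden layer of three tanh nodes and a single linear output node, with squared error as the loss. The function names, weights and data point are illustrative only. It computes every gradient with the chain rule in one backward pass, then checks one of them against a finite-difference estimate, a common way to test a back propagation implementation.

```python
import math

def forward(x, w1, b1, w2, b2):
    """Forward pass: one tanh hidden layer followed by one linear output node."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y_hat = sum(w * hi for w, hi in zip(w2, h)) + b2
    return h, y_hat

def loss(x, y, w1, b1, w2, b2):
    """Squared error for a single example: 0.5 * (y_hat - y)**2."""
    _, y_hat = forward(x, w1, b1, w2, b2)
    return 0.5 * (y_hat - y) ** 2

def backprop(x, y, w1, b1, w2, b2):
    """Gradients of the squared error with respect to every weight and bias."""
    h, y_hat = forward(x, w1, b1, w2, b2)     # forward pass, keeping hidden values
    d_out = y_hat - y                          # dL/dy_hat
    grad_w2 = [d_out * hi for hi in h]         # output-layer weights
    grad_b2 = d_out
    # Propagate the error back through w2 and the tanh derivative, 1 - tanh(z)**2.
    d_hidden = [d_out * w * (1.0 - hi ** 2) for w, hi in zip(w2, h)]
    grad_w1 = [[d * xi for xi in x] for d in d_hidden]
    grad_b1 = d_hidden
    return grad_w1, grad_b1, grad_w2, grad_b2

# Hypothetical example: one data point and hand-picked starting weights.
x, y = [0.5, -1.0], 0.25
w1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
b1 = [0.0, 0.1, -0.1]
w2 = [0.3, -0.6, 0.9]
b2 = 0.05

grad_w1, grad_b1, grad_w2, grad_b2 = backprop(x, y, w1, b1, w2, b2)

# Central finite-difference estimate for one weight, w1[1][0].
eps = 1e-6
w1_plus = [row[:] for row in w1]
w1_minus = [row[:] for row in w1]
w1_plus[1][0] += eps
w1_minus[1][0] -= eps
numeric = (loss(x, y, w1_plus, b1, w2, b2) - loss(x, y, w1_minus, b1, w2, b2)) / (2 * eps)
print(grad_w1[1][0], numeric)  # the two numbers should agree to many decimal places
```

A gradient descent step would then subtract a small multiple of each gradient from the corresponding weight. Deep learning frameworks compute these gradients automatically, but the chain-rule bookkeeping they perform is the same.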
HOW is Deep Learning implemented?
Architecture:
The simplest architecture of a Deep Neural Network consists of an input layer, one or more hidden layers and an output layer, and each layer can contain any number of nodes. Each node can be connected to nodes in the next layer, with a weight associated with every connection. The input layer is simply the list of nodes holding the values of the features used for the prediction. The output layer is sized by the problem: in classification there is typically one output node per class (level), and in regression one node per value being predicted. For example, a digit recognizer predicts a value from 0 to 9, so its output layer has 10 nodes.
A hidden layer can also have any number of nodes. The value of a hidden node is computed as the sum of the products of its input values and the corresponding weights, usually plus a bias term, and this weighted sum is then passed through an activation function. The math behind the computation is fairly simple.
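As a rough illustration of how layer sizes translate into the number of parameters a network has to learn, here is a small Python sketch. It assumes a fully connected network (every node linked to every node in the next layer) and one bias term per hidden and output node. The function name and the layer sizes in the example (784 pixel inputs for a 28x28 image, 128 hidden nodes, 10 outputs) are hypothetical.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases in a fully connected network.

    layer_sizes lists the number of nodes in each layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out   # one weight per connection between the two layers
        total += n_out          # one bias per node in the receiving layer
    return total

# Hypothetical digit recognizer: 28x28 = 784 pixel inputs,
# one hidden layer of 128 nodes, and 10 output nodes (digits 0-9).
digit_net = [28 * 28, 128, 10]
print(count_parameters(digit_net))  # 101770
```

Almost all of these parameters sit between the input and hidden layers, which is one reason fully connected networks get expensive on images and why architectures with restricted connectivity, such as CNNs, are preferred there.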
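In symbols, hidden node $j$ computes $z_j = \sum_i w_{ji} x_i + b_j$, where $x_i$ are its inputs, $w_{ji}$ the connection weights and $b_j$ a bias term (the bias is a common addition and is assumed here). The sketch below computes this for a whole layer in plain Python; the helper name `dense_layer` and the numbers are hypothetical, and an activation function would then be applied to each $z_j$.

```python
def dense_layer(inputs, weights, biases):
    """Compute the weighted sum for every node in a layer.

    weights[j][i] is the weight on the connection from input node i
    to node j of this layer; biases[j] is node j's bias.
    """
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

inputs = [1.0, 2.0, 3.0]          # three input features
weights = [[0.5, 0.25, -0.5],     # weights into hidden node 0
           [-1.0, 0.5, 0.25]]     # weights into hidden node 1
biases = [1.0, -0.5]
print(dense_layer(inputs, weights, biases))  # [0.5, 0.25]
```

Stacking layers simply means feeding one layer's output list into the next call as its inputs.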
Activation Function:
Our brain works in such a way that when we perceive an image, certain neurons are activated and others are not. This mechanism is part of what helps humans identify cats as cats and dogs as dogs even though they share many features. Deep learning mimics it through the activation function, which is applied to the weighted sum at each node and decides how strongly that node fires. Just as important, it makes the network non-linear; without it, a stack of layers would collapse into a single linear transformation and could not learn complex patterns. In earlier days, the commonly used activation functions were the S-shaped sigmoid (logistic) and tanh functions, but in recent times the most commonly used one is ReLU (rectified linear unit), which outputs zero for negative inputs, so the node is effectively switched off. Several other activation functions can also be used.
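The three activation functions mentioned above can be written in a few lines of Python. This is an illustrative sketch using only the standard library; deep learning frameworks ship their own optimized versions. The sigmoid is written in a numerically stable form (an implementation choice, not part of its definition) so that very large negative inputs do not overflow `math.exp`.

```python
import math

def sigmoid(z):
    """S-shaped logistic function; squashes any input into the range (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)               # avoids computing exp of a large positive number
    return e / (1.0 + e)

def tanh(z):
    """S-shaped hyperbolic tangent; squashes any input into the range (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: passes positive inputs through, outputs 0 otherwise."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):.3f}  relu={relu(z):.1f}")
```

Note how ReLU returns exactly zero for the negative input: that node contributes nothing downstream, which is the "not activated" behaviour described above.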
Measure of Error:
As with any machine learning algorithm, once the architecture of the model has been built, the model is trained on the input data with the objective of minimizing a cost function, i.e. the error. The most commonly used error measures in Deep Learning are the MSE (mean squared error) when predicting a continuous variable and cross-entropy when predicting a categorical variable.
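The two error measures can be sketched as follows, assuming the model outputs a single number per prediction for regression and a probability per class for classification. Softmax, which turns raw output-layer scores into such probabilities, is not discussed in the article; it is included here as the usual companion of cross-entropy. All function names are hypothetical.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions (continuous targets)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(true_class, probabilities, eps=1e-12):
    """Negative log of the probability the model gives the correct class.

    eps guards against log(0) when the model is confidently wrong.
    """
    return -math.log(max(probabilities[true_class], eps))

# Regression: two continuous predictions against their targets.
print(mean_squared_error([3.0, 5.0], [2.5, 5.5]))  # 0.25

# Classification: hypothetical raw scores for 3 classes; the true class is 0.
probs = softmax([2.0, 1.0, 0.1])
print(cross_entropy(0, probs))
```

Cross-entropy is small when the correct class gets a high probability and grows quickly as that probability approaches zero. During training it is averaged over all examples in a batch, just as MSE averages over predictions.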
Optimization techniques:
The crux of a Deep Learning algorithm lies in finding the right values for the parameters (the weights between layers) so that it makes accurate predictions. To find them, the Deep Neural Network is trained on the training data so that the overall error measure is driven as low as possible. Several optimization techniques can be used; the most common are stochastic gradient descent (SGD) and the Adam optimizer, and the choice again depends on the type of problem being solved. Once training is complete, the network has a final set of weights, which are then used to make predictions on new, unseen input data.
This article presented an overview of the basic concepts one needs to be familiar with when building deep learning models. In any technological field there is a strong notion that one should understand the why, what and how of a technology, and in that spirit, this article gave a high-level overview of Deep Learning. I hope it was helpful as a first step towards understanding deeper concepts. Kindly leave your comments and suggestions below.
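To show the core idea of stochastic gradient descent, the sketch below fits a single linear node, $\hat{y} = wx + b$, to hypothetical noise-free data using squared error, updating the two parameters after every individual example. A real deep network applies the same kind of update to every weight, with the gradients supplied by back propagation. Adam is not shown; it additionally adapts the step size separately for each parameter. The function name, learning rate and number of epochs are arbitrary choices for this toy problem.

```python
import random

def train_sgd(data, learning_rate=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on squared error.

    data is a list of (x, y) pairs; each update uses a single example.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)                      # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)                  # visit examples in a random order each epoch
        for x, y in data:
            error = (w * x + b) - y        # prediction minus target
            # Gradients of 0.5 * error**2 are error * x (for w) and error (for b).
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1 for x in [-1, 1].
points = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
w, b = train_sgd(points)
print(round(w, 3), round(b, 3))  # should be very close to 2 and 1
```

In practice, frameworks provide these optimizers ready-made, for example `torch.optim.SGD` and `torch.optim.Adam` in PyTorch.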
Happy learning!
Cheers!
Renga
References and image courtesy:
https://www.datarobot.com/blog/a-primer-on-deep-learning/
https://ethervision.net/neural-network-applications-business/