
An Introduction to Deep Learning

Deep Learning!!

I have always wondered about the concept of perception, i.e. how does our brain understand the objects we see? How does it know the difference between a cat and a dog? How are we able to tell people apart? The most amazing thing is how the brain identifies the same person over the years, even as they age. I haven't got answers to most of these questions, but what if I told you there are technologies out there today that can perceive objects just as a human being does? Yes, it has become a reality!! Take, for example, the iPhone X's face recognition based locking system. Isn't it amazing? There are many more applications of the same technology. When we dig a bit deeper into the most commonly used technique behind these applications, we find that the answer is neural networks. To be more specific, people have started using a neural network based technique called Deep Learning extensively for this purpose.
This post is intended to provide a basic conceptual overview of Deep Learning: what a neural network is, what an activation function is, how we build a simple deep neural network model, and what error measures a deep neural network uses to optimize its predictions. The article is more of an overview of all the terms one might hear while working in the Machine Learning or Deep Learning space. Let's get started!

WHAT is Deep Learning?
Simply put, Deep Learning is one of the techniques used in Machine Learning for pattern recognition, and it has become one of the most common methods for building AI systems in recent times. Technically, a Deep Neural Network (DNN) is nothing but a neural network that is deep, i.e. there are layers upon layers trying to identify patterns in the data. One beauty of this method is that as we move from the input layer through the hidden layers to the output layer, the patterns being identified get bigger and bigger, which helps the final layer predict whole objects.
For example, consider the process of identifying a human face in a picture. The DNN would start by identifying small segments, mostly edges like dark horizontal strips, long vertical strips and dark round shapes. In the next layer, the network would find associations between these edges and come up with bigger patterns like the eyebrows, eyes, nose and so on. In the final layer, the network is able to identify the face in the image by combining all the features identified in the previous layers. This is shown in the image below.

[Image: a deep neural network identifying edges in the early layers, facial features like eyes and a nose in the middle layers, and a complete face in the final layer]

The major applications for deep learning are:
  • Image classification (Convolutional Neural Networks (CNNs) are generally used)
  • Text classification (Recurrent Neural Networks (RNNs) are generally used)
  • Speech processing (Long Short-Term Memory networks (LSTMs) are generally used)
The explanation for each of these types of neural networks is out of scope for this article and will be covered in separate articles going forward.

WHY Deep Learning??
If we closely observe the method, we may realize that this might be the way in which our human brain identifies objects around us. Maybe this is the basic concept of perception, and that is the motivation behind deep neural networks. The general idea is to mimic the way the neurons in our brain identify patterns. In a neural network, the neurons are called nodes, and they transmit information from one to another.
But what is the differentiating factor between a simple neural network and a deep neural network? The answer is not that straightforward, but there are differences if we consider the architecture, flexibility and prediction accuracy. The clearest distinction is depth: a deep neural network stacks multiple hidden layers between the input and the output. In a conventional shallow network, each node is typically connected to every node in the next layer, whereas in a deep neural network we can restrict the number of nodes participating in the decision making process (for example, through techniques like dropout), which is more representative of what our brain does in most cases. In either case, the network makes predictions with a feed forward pass, and during training the concept of back propagation is applied on top of the feed forward pass, so that the error can be reduced at every step by updating the weights in the network accordingly, as the sketch below illustrates.
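To make the idea of back propagation concrete, here is a minimal sketch of a single weight update for one linear node trained with a squared error loss. The inputs, weights, target and learning rate are made-up values purely for illustration; a real network repeats this update across many nodes, layers and training examples.

import numpy as np

# Toy example: one linear node, squared error loss, one update step
x = np.array([0.5, -1.2, 0.3])   # inputs to the node
w = np.array([0.1, 0.4, -0.2])   # current weights
y_true = 1.0                     # target value
lr = 0.1                         # learning rate

y_pred = w @ x                   # feed forward: weighted sum of inputs
grad = (y_pred - y_true) * x     # gradient of 0.5 * (y_pred - y_true)**2 w.r.t. w
w = w - lr * grad                # back propagation step: update weights to reduce error
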
HOW is Deep Learning implemented?

Architecture:
The simplest architecture of a Deep Neural Network consists of an input layer, one or more hidden layers and an output layer. Each layer can contain any number of nodes, and every node can be connected to any node in the subsequent layer, with a weight associated with each connection. The input layer is nothing but the list of nodes representing the values of the features used for prediction. The number of classes in a classification problem determines the number of nodes in the output layer (a regression problem typically has a single output node). For example, in the case of a digit recognizer, the output can take any value from 0 to 9, so there will be 10 nodes in the output layer.
The hidden layers can have any number of nodes, and the value of a node in a hidden layer is computed as the sum of products between the input values to the node and the corresponding weights, usually plus a bias term. The math behind the computation is fairly simple.

[Image: the weighted-sum computation at a hidden node, roughly z = Σ (wᵢ · xᵢ) + b]

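As a rough sketch of this computation in code, here is a forward pass through one hidden layer and an output layer using NumPy. The layer sizes and random weights are made-up values for illustration, and the activation applied at the hidden layer (ReLU) is discussed in the next section.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 3 hidden nodes, 2 output nodes
W1 = rng.normal(size=(4, 3))   # weights between input layer and hidden layer
b1 = np.zeros(3)               # hidden layer bias
W2 = rng.normal(size=(3, 2))   # weights between hidden layer and output layer
b2 = np.zeros(2)               # output layer bias

x = np.array([0.5, -1.2, 0.3, 0.8])    # one input example
hidden = np.maximum(0.0, x @ W1 + b1)  # sum of products plus bias, then ReLU
output = hidden @ W2 + b2              # raw scores at the output layer
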
Activation Function:
Our brain works in such a way that when we perceive an image, certain neurons are activated and certain neurons are not. This mechanism helps humans identify cats as cats and dogs as dogs even though they share a lot of common features, and it is implemented in deep learning through the concept of an activation function. An activation function transforms a node's weighted sum into its output and thereby determines whether, and how strongly, the node contributes to the prediction. In earlier days, the commonly used activation function was the s-shaped tanh function, but in recent times the most commonly used activation function is ReLU (rectified linear unit). There are several other activation functions that can be used.
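Both of these functions are simple to write down; here is a small sketch of tanh and ReLU using NumPy, with a few sample values to show how each one behaves:

import numpy as np

def tanh(z):
    # S-shaped activation: squashes any value into the range (-1, 1)
    return np.tanh(z)

def relu(z):
    # Rectified linear unit: keeps positive values, zeroes out the rest
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(tanh(z))   # approximately [-0.96 -0.46  0.    0.91]
print(relu(z))   # [0.  0.  0.  1.5]
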
Measure of Error:
As with any machine learning algorithm, once the architecture or framework for the model has been built, the model is trained on the input data with the objective of reducing the cost function, i.e. the error. The most commonly used measures of error in Deep Learning models are MSE (mean squared error) when predicting a continuous variable and cross-entropy when predicting a categorical variable.
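Both measures can be sketched in a few lines of NumPy. The small epsilon below is an assumption added to avoid taking the logarithm of zero, and the sample values are made up for illustration.

import numpy as np

def mse(y_true, y_pred):
    # Mean squared error for continuous targets
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Categorical cross-entropy: y_true is one-hot encoded,
    # y_pred holds the predicted class probabilities per example
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1, 0], [0, 1]])
y_pred = np.array([[0.8, 0.2], [0.3, 0.7]])
print(cross_entropy(y_true, y_pred))   # roughly 0.29
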
Optimization Techniques:
The crux of a Deep Learning algorithm is identifying the right values for the parameters (the weights between layers) to make accurate predictions. To obtain these values, the Deep Neural Network is trained on the training data in such a way that the overall error measure converges to its minimum. There are multiple optimization techniques that can be used, of which the most common are stochastic gradient descent (SGD) and the Adam optimizer; the choice of optimizer again depends on the type of problem being solved. Once training is complete, the neural network has a final set of values for the weights, and these weights are then used for making predictions on future unseen input data.
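Putting these pieces together, here is a minimal sketch of how such a network could be defined and trained using the Keras API (assuming TensorFlow is installed). The layer sizes, optimizer and commented-out training call are illustrative choices for the digit recognizer example, not prescriptions, and x_train and y_train are assumed to be preprocessed training data.

from tensorflow import keras

# A simple deep network for the digit recognizer example:
# 784 input features (28 x 28 pixels), one hidden layer, 10 output classes
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# Adam optimizer with a cross-entropy error measure, as discussed above
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# model.fit(x_train, y_train, epochs=5)   # x_train / y_train assumed to exist
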
This article presented an overview of the basic concepts one needs to be familiar with when building deep learning models. In any technological field there is a strong notion that one should understand the why, what and how of the technology, and in that spirit, this article gave a high level overview of Deep Learning. I hope this was helpful as a starting step towards understanding deeper concepts. Kindly leave your comments and suggestions below.

Happy learning!

Cheers!
Renga 
References and image courtesy:
https://www.datarobot.com/blog/a-primer-on-deep-learning/
https://ethervision.net/neural-network-applications-business/
