Different Types of Supervised Machine Learning Models
Machine learning has gained rapid traction in the last decade. We see different machine learning models being created and deployed across a wide range of applications.
With so many applications being built on machine learning, it is important that more people learn to use this technology, so that these applications can reach a wide audience and have a meaningful impact on society.
When dealing with data, it is good practice to divide it into three parts: the training set, the cross-validation set, and the test set. A reasonable split is to take about 70 percent of the data for training, 15 percent for cross-validation, and the remaining 15 percent as the test set.
During training, a machine learning model learns the parameters that are most important for prediction. We then tune the hyperparameters to get the best accuracy on the cross-validation set, keeping the model that performs best there. Finally, we use the test set to see how well the model performs on data it has never seen before, which is the closest stand-in for real-world data. Once a model reaches acceptable values on the metrics we care about, and after hyperparameter tuning on the cross-validation data, it can be deployed.
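As a minimal sketch of such a split, here is one way to do it with scikit-learn's train_test_split applied twice; the library choice and the synthetic dataset are illustrative assumptions, not something the article prescribes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 examples with 5 features each.
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# First carve off 30 percent, then split that half-and-half,
# giving roughly a 70/15/15 train / cross-validation / test split.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=42)

print(len(X_train), len(X_cv), len(X_test))  # 700 150 150
```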
It is worth going over some of the important machine learning models currently used across industries. Below is a list of the models and techniques covered in this article. Most of them are supervised; strictly speaking, K-means and dimensionality reduction algorithms are unsupervised techniques, but they are included here because they are so often used alongside the supervised models.
- Linear Regression
- Logistic Regression
- kNN
- Naive Bayes
- K-Means
- Random Forest
- Dimensionality Reduction Algorithms
- SVM
- Decision Tree
- Gradient Boosting Algorithms
Linear Regression
The linear regression model is applicable to regression problems, where the output we are predicting is continuous. Linear regression can be used for applications such as predicting house prices, sales returns, and weather measurements, all of which have continuous output values.
In the linear regression model, we supply the features that are important for the prediction. The model assigns a weight to each of these features, so that the influential features contribute the most when predicting the output label.
In the figure above, we considered only two variables so that the data can be plotted: the independent variable on the X-axis and the dependent variable on the Y-axis. In linear regression, the dependent variable is predicted with the help of all the independent variables, and the distance between the prediction line and the actual output values is minimized, leading to better predictions.
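As a minimal sketch of fitting such a line, here is what it might look like with scikit-learn's LinearRegression; the single feature and the noisy target below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: one independent variable, continuous dependent variable.
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 7.0 + np.random.randn(50)

# Fitting minimizes the squared distance between the line and the targets.
model = LinearRegression().fit(X, y)

print(model.coef_, model.intercept_)  # learned weight and bias (near 3 and 7)
print(model.predict([[100.0]]))       # continuous prediction for a new input
```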
Logistic Regression
Logistic regression is similar to linear regression, except that the output is passed through an additional sigmoid function, which maps it to a value between 0 and 1. Logistic regression can also handle multi-class classification problems, where it follows a one-vs-rest strategy, optimizing a separate set of weights and biases for each class. As a result, the model produces a discrete set of output values.
The image above shows the sigmoid function used in logistic regression, whose values lie between 0 and 1. We take these values as probabilities and threshold them to make the final class predictions.
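A short sketch of the sigmoid function and of a scikit-learn LogisticRegression fit follows; the toy dataset is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx. [0.007, 0.5, 0.993]

# Synthetic binary-classification data.
X = np.random.randn(200, 3)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:3]))  # sigmoid outputs read as probabilities
print(clf.predict(X[:3]))        # thresholded into discrete class labels
```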
kNN
In this model, we predict the output of a new example based on the output values of its nearest neighbors, and we make the decision based on the count of the majority class among them. The value of 'k', the number of nearest neighbors to consider, is the hyperparameter we have to select.
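A minimal sketch with scikit-learn's KNeighborsClassifier is shown below, where n_neighbors is the 'k' discussed above; the 2-feature dataset is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 2-feature dataset.
X = np.random.randn(100, 2)
y = (X[:, 0] > 0).astype(int)

# k = 5: the hyperparameter controlling how many neighbours vote.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# The predicted class is the majority class among the 5 closest points.
print(knn.predict([[0.5, -0.2]]))
```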
Naive Bayes
The naive Bayes model uses Bayes' theorem, which gives us probability values based on conditional probability. Below are the equations for conditional probability and Bayes' theorem.
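Conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$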
In naive Bayes, we assume that there is no dependency between features: the occurrence of one feature does not decrease or increase the chances of another. That is why we call it naive Bayes, where 'naive' stands for the assumption that the features are independent of each other. Of course, one cannot expect the features to be independent all the time in real machine learning applications, so there can be some errors and we might not get the most accurate values. Still, it is a good algorithm to use for predictions.
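As a minimal sketch, scikit-learn's GaussianNB applies Bayes' theorem under exactly this independence assumption; the continuous features below are synthetic and assumed only for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic continuous features and binary labels.
X = np.random.randn(300, 4)
y = (X[:, 0] + X[:, 2] > 0).astype(int)

# GaussianNB models each feature independently per class,
# which is the 'naive' independence assumption in action.
nb = GaussianNB().fit(X, y)

print(nb.predict_proba(X[:2]))  # class probabilities via Bayes' theorem
print(nb.predict(X[:2]))        # predicted class labels
```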
K-Means
Datasets often contain natural clusters, each of which deserves a separate label. In such cases, it is a good idea to divide the data, based on its features, into clusters, so that the K-means algorithm can identify those groups and generate output labels for them.
Below is a 2D representation of the K-means clustering algorithm; that is, we consider only two features and try to separate the points into clusters.
In the K-means algorithm, one of the hyperparameters to be specified is the number of clusters into which we want the algorithm to divide the data. Once that parameter is set, the algorithm works iteratively, forming clusters at every iteration by assigning each point to the nearest center and recomputing each center as the average of the features of its points.
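A minimal sketch with scikit-learn's KMeans follows; n_clusters is the hyperparameter just discussed, and the blob data is generated synthetically for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-feature data with three natural groups.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=42)

# n_clusters is the hyperparameter discussed above.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)  # iteratively re-averages features per cluster

print(km.cluster_centers_)  # the learned cluster means
print(labels[:10])          # generated cluster labels for the first points
```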
Random Forest
Before looking at how random forests work, it is a good idea to understand how a decision tree works. Decision trees take into account information gain: the tree takes the features and divides the data based on how much information is gained when a particular feature is used as the basis for the split. Because a decision tree ultimately reduces to a series of if-else conditions, it does not take very long to train.
There are some problems associated with decision trees. They can overfit, which means learning too much from the training data while failing to generalize to new examples. That is why we use random forests, a topic best discussed after learning about decision trees.
In a random forest, we take many decision trees into account, and we do not let these trees grow very deep. When making predictions, for classification problems we take the majority class across the trees as the final output. For regression, we simply take the average of the outputs of all the decision trees.
As can be seen above, nine decision trees are used when making predictions with the random forest model. We take the prediction to be "1", as it is the majority class. For regression problems, we would instead take the average of all the outputs generated by the different decision trees in the forest.
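Here is a minimal sketch with scikit-learn's RandomForestClassifier, using nine shallow trees to mirror the figure; the dataset and the depth limit are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic dataset.
X = np.random.randn(500, 6)
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Nine shallow trees, mirroring the figure; depth is deliberately limited.
rf = RandomForestClassifier(n_estimators=9, max_depth=3, random_state=42).fit(X, y)

# Classification: the forest outputs the majority vote of its trees.
print(rf.predict(X[:3]))
# For regression, RandomForestRegressor averages the trees' outputs instead.
```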
Dimensionality Reduction Algorithms
Machine learning models can sometimes overfit when the dataset has a lot of features but only a few training examples. In such cases, we can use dimensionality reduction techniques, which reduce the number of features while retaining most of the information, helping us get the best results on the test set.
In the image above, 3D data is taken and converted into 2D using a dimensionality reduction technique. Since the data is only 3-dimensional, we are able to plot it; if we considered an N-dimensional feature space, we could not plot it like we did here. The 3D points are projected into 2D while preserving as much of the structure as possible, which reduces the "curse of dimensionality".
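As a minimal sketch, PCA (one common dimensionality reduction technique; the article does not name a specific one) can perform this 3D-to-2D projection; the points below are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 points in 3 dimensions.
X_3d = np.random.randn(200, 3)

# Keep the two directions along which the data varies the most.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_3d)  # project the 3D points down to 2D

print(X_2d.shape)                     # (200, 2)
print(pca.explained_variance_ratio_)  # how much structure each axis keeps
```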
SVM
In support vector machines, the decision boundary, also called a hyperplane, is chosen to maximize its distance to the support vectors, the training points closest to it. Below is a figure showing how support vector machines work.
In the image above, we see a line that separates class 1 and class 2. The dashed lines are parallel to the decision boundary and pass through the support vectors, as can be seen above. The decision boundary is the one that maximizes the margin between "class-1" and "class-2". Finding it is an optimization problem, with an appropriate loss function that is minimized during training.
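A minimal sketch with scikit-learn's SVC follows; the two synthetic classes stand in for class-1 and class-2 from the figure, and the linear kernel and C value are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

# Two synthetic classes standing in for class-1 and class-2.
X = np.random.randn(200, 2)
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# A linear kernel finds the maximum-margin hyperplane;
# C trades margin width against misclassified points.
svm = SVC(kernel="linear", C=1.0).fit(X, y)

print(svm.support_vectors_[:3])    # the points the margin rests on
print(svm.predict([[1.0, -1.0]]))  # which side of the hyperplane a point falls
```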
Decision Tree
Decision trees are machine learning models that also give a good idea of the importance of the different features we supply in the data. Below, we see how a decision tree classifies data and the mechanism it uses to get the best outputs. It first takes the most important feature, the one that gives the most information about the data, and splits on it. It then keeps dividing the data until we are left with a list of if-else conditions that produce the best output results.
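As a minimal sketch, here is a scikit-learn DecisionTreeClassifier that also surfaces the feature importances mentioned above; the dataset, depth limit, and entropy criterion are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data where only the first feature actually matters.
X = np.random.randn(300, 3)
y = (X[:, 0] > 0.2).astype(int)

# criterion="entropy" makes the splits information-gain based.
tree = DecisionTreeClassifier(max_depth=3, criterion="entropy").fit(X, y)

print(tree.feature_importances_)  # the first feature should dominate
print(export_text(tree))          # the learned if-else conditions
```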