Do you have logs of wood at home? Arrange them in order of weight. BUT you can’t weigh them. You must guess each log’s weight by looking at its height and diameter (visual analysis) and combining these “visible” parameters. That’s linear regression.
What do we do? We establish a relationship between the independent and dependent variables by fitting a line to them: our regression line, represented as Y = a * X + b.
How do we calculate the coefficients a and b? By minimizing the sum of the squared vertical distances between the data points and the regression line.
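To make the least-squares idea concrete, here is a minimal sketch in plain Python. The function name `fit_line` and the sample numbers are made up for illustration; it uses the standard closed-form solution for a single independent variable.

```python
def fit_line(xs, ys):
    """Return (a, b) that minimize sum((y - (a * x + b)) ** 2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line always passes through the point of means.
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: one "visible" measurement per log and its true weight.
sizes = [1.0, 2.0, 3.0, 4.0]
weights = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(sizes, weights)
print(f"Y = {a:.2f} * X + {b:.2f}")
```

With more than one independent variable (height and diameter, say) the same idea applies, but the solution is usually computed with matrix algebra or a library.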
Here we talk about 0s and 1s: discrete values that we have to estimate from a set of independent variables.
The model fits a curve to those points. It cannot be a straight line, so it uses an S-shaped (sigmoid) curve whose output always stays between 0 and 1. It is also called logit regression.
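As a rough sketch of that S-shaped curve, the snippet below fits a one-variable logistic model with plain gradient descent. The names `sigmoid` and `fit_logistic`, the learning rate, the number of epochs and the data are all illustrative assumptions, not a reference implementation.

```python
import math

def sigmoid(z):
    """S-shaped curve that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(a * x + b) by gradient descent on the log-loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between predicted probability and true label, per point.
        errors = [sigmoid(a * x + b) - y for x, y in zip(xs, ys)]
        a -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return a, b

# Hypothetical data: small x values belong to class 0, large ones to class 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
a, b = fit_logistic(xs, ys)
print(sigmoid(a * 0.0 + b), sigmoid(a * 5.0 + b))
```

The two printed numbers are probabilities: the first should be well below 0.5 and the second well above it.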
What if we want to improve our logistic regression models? Include interaction terms, eliminate features, apply regularization techniques or use a non-linear model.
One of data scientists’ favorite algorithms nowadays.
This is a supervised learning algorithm used for classification problems. We go down from the observations about a certain item (represented in the branches) to conclusions about the item’s target value (represented in the leaves). It works well for both categorical and continuous dependent variables.
How does it work? We split the population into two or more homogeneous sets based on the most significant attributes / independent variables and… We go down!
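One common way to decide which split is the “most significant” is Gini impurity. The sketch below finds the best threshold on a single numeric feature; the function names and the data are made up, and a real tree would repeat this search over every feature and then recurse into each branch.

```python
def gini(labels):
    """Gini impurity: 0.0 means the set is perfectly homogeneous."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try each observed value as a threshold and keep the purest split."""
    best_threshold, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must actually separate the population
        # Weighted average impurity of the two resulting sets.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

# Hypothetical feature values and their classes.
values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_split(values, labels))
```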
SVM is a classification method in which the algorithm plots the raw data as points in an n-dimensional space, where n is the number of features you have. Each feature is then tied to a particular coordinate, making it easy to classify the data.
What makes the algorithm special? We will take apples and oranges as an example. Instead of looking within each group for the most typical apples and oranges, it looks for the most extreme cases: the apples that look most like oranges, and the oranges that look most like apples. Those are our support vectors! Once they are identified, we use lines called “classifiers” (hyperplanes, when there are more than two features) to split the data and plot them on a graph.
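A real SVM solves an optimization problem, but the support vector idea can be seen in a one-dimensional toy. The sketch below assumes a single feature and perfectly separable classes, with every apple below every orange; the function name and the numbers are hypothetical.

```python
def max_margin_threshold(apples, oranges):
    """1D toy: place the boundary halfway between the two most extreme cases.

    Assumes all apple values lie below all orange values.
    """
    support_apple = max(apples)     # the apple that is most like an orange
    support_orange = min(oranges)   # the orange that is most like an apple
    if support_apple >= support_orange:
        raise ValueError("classes overlap; this toy needs separable data")
    threshold = (support_apple + support_orange) / 2
    margin = (support_orange - support_apple) / 2
    return threshold, margin, (support_apple, support_orange)

# Hypothetical measurements of one feature for each fruit.
apples = [1.0, 1.5, 2.0]
oranges = [4.0, 5.0, 6.0]
threshold, margin, support_vectors = max_margin_threshold(apples, oranges)
print(threshold, margin, support_vectors)
```

Only the two extreme points decide where the boundary goes; moving any other apple or orange (without crossing them) changes nothing.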
The Naïve Bayes classifier is a classification technique based on Bayes’ theorem. It is “naive” because it assumes that the features of a measurement are independent of each other.
A quick and easy way to predict classes, for binary and multiclass classification problems.
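Here is a minimal sketch for categorical features. Thanks to the independence assumption, the score of a class is just its prior multiplied by one probability per feature. The function names, the add-one smoothing (`alpha`) and the assumption that every feature takes `n_values` distinct values are illustrative choices.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    class_counts = Counter(labels)
    # (label, feature index) -> how often each value was seen for that class
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts

def predict_naive_bayes(model, row, n_values=2, alpha=1.0):
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_p = math.log(count / total)  # prior P(class)
        for i, value in enumerate(row):
            # "Naive" step: each feature contributes independently.
            seen = value_counts[(label, i)][value]
            log_p += math.log((seen + alpha) / (count + alpha * n_values))
        scores[label] = log_p
    return max(scores, key=scores.get)

# Hypothetical fruit data: (color, skin texture).
rows = [("red", "smooth"), ("red", "smooth"), ("red", "rough"),
        ("orange", "rough"), ("orange", "rough"), ("orange", "smooth")]
labels = ["apple", "apple", "apple", "orange", "orange", "orange"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("red", "smooth")))
```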
Imagine that you have to visit someone: who better than your friends or family, right?
Well, this works the same way. I WANT TO BE WITH MY FRIENDS.
We can apply this algorithm to both classification and regression problems. How does it work? Simple: we store all the available cases and classify each new case by the majority vote of its K nearest neighbors. The case is assigned to the class it has the most in common with. How is closeness measured? With a distance function, generally Euclidean distance.
Note: KNN is computationally expensive at prediction time, and variables should be normalized so that features with larger ranges do not dominate the distance.
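The majority vote fits in a few lines. This is a minimal sketch with made-up points, assuming the features are already on comparable scales; `knn_predict` is a hypothetical name and `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by the majority vote of its k nearest neighbors."""
    # Distance from the query to every stored case (this is the expensive part).
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2D data.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["apple", "apple", "apple", "orange", "orange", "orange"]
print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.5)))
```

For regression, the same neighbors would be averaged instead of counted.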
It is an unsupervised algorithm that solves clustering problems. Oranges with oranges, apples with apples. Data sets are grouped into a particular number of clusters, our k number, so that all the data points within a cluster are homogeneous and at the same time heterogeneous from the data in other clusters. It’s simple:
And step 4 again…5, and again, and again…
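The “again and again” loop can be sketched as follows. This is the standard k-means iteration (assign each point to its nearest centroid, recompute the centroids, repeat until nothing changes); the function name, the caller-supplied starting centroids and the data are assumptions made for the example.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Standard k-means loop, starting from caller-supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iters):
        # Assignment: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update: each centroid moves to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: another pass would change nothing
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2D data with two obvious groups, k = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

In practice the starting centroids are usually picked at random, and the result can depend on that choice.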
The fundamental idea behind a random forest is to combine many decision trees into a single model.
Why? A large number of relatively uncorrelated models (trees) operating as a committee tends to outperform any of the individual constituent models.
The model relies on bagging (each individual tree randomly samples from the dataset with replacement, resulting in different trees) and feature randomness (each split in a tree can pick only from a random subset of features, forcing even more variation).
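To show both sources of randomness in one place, here is a deliberately tiny sketch. It is a simplification, not a full random forest: each “tree” is a one-split stump, and each stump is restricted to a single randomly chosen feature. All names, the seed and the data are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """One-split 'tree' on a single feature, minimizing misclassifications."""
    majority = Counter(labels).most_common(1)[0][0]
    values = sorted(set(r[feature] for r in rows))
    best = None  # (errors, threshold, left_label, right_label)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [l for r, l in zip(rows, labels) if r[feature] <= t]
        right = [l for r, l in zip(rows, labels) if r[feature] > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:  # only one distinct value: always predict the majority
        return lambda row: majority
    _, t, left_label, right_label = best
    return lambda row: left_label if row[feature] <= t else right_label

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw as many rows as the dataset has, with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Feature randomness (simplified): this stump sees one random feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest

def forest_predict(forest, row):
    """The committee decides by majority vote."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]

rows = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["apple", "apple", "apple", "orange", "orange", "orange"]
forest = train_forest(rows, labels)
print(forest_predict(forest, (1.5, 1.5)), forest_predict(forest, (8.5, 8.5)))
```

Some individual stumps may be trained on an unlucky sample and get things wrong; the vote is what makes the committee reliable.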
Although it seems illogical, “the more the better” is not always right. Thanks to this type of algorithm, we reduce the number of random variables under consideration by obtaining a set of principal variables. PCA (Principal Component Analysis), Missing Value Ratio and Factor Analysis are some examples.
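As a small illustration of PCA, the sketch below reduces 2D points to a single coordinate along the direction of greatest variance. It assumes exactly two features, which allows a closed-form angle for the main axis of the 2x2 covariance matrix; larger problems normally use a linear algebra library. The function name and the data are made up.

```python
import math

def first_principal_component(points):
    """Return the main direction of 2D points and their 1D projections."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Orientation of the axis with the largest variance (2D closed form).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # Two variables become one: each point's coordinate along that axis.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projected

# Hypothetical, strongly correlated measurements.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
direction, projected = first_principal_component(points)
print(direction, projected)
```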
OUR WINNER! Its main goal? Make predictions with high accuracy.
We will combine multiple “weak or average” predictors to build a strong predictor. It combines the predictive power of several basic estimators to improve the robustness of the model.
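One way to combine weak predictors is gradient boosting. The sketch below is a minimal regression version under squared error: each round fits a one-split stump to what the current model still gets wrong, then adds a small fraction of it. The function names, the learning rate, the number of rounds and the data are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Weak predictor: one threshold, one constant on each side.

    Assumes xs contains at least two distinct values.
    """
    best = None  # (squared error, threshold, left mean, right mean)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    return best[1], best[2], best[3]

def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the simplest guess: the mean
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Each new stump is trained on the errors of the model so far.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_mean if x <= t else right_mean for t, left_mean, right_mean in stumps)

# Hypothetical step-shaped data.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 10, 10, 10]
model = fit_boosting(xs, ys)
print(predict_boosting(model, 1), predict_boosting(model, 6))
```

No single stump is a good model on its own, but the sum of many small corrections gets close to the targets.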
It’s the algorithm par excellence and one of the most popular today.