Linear Regression
In this blog I will be writing about linear regression: what linear regression is, how to find the best fit regression line, how to check goodness of fit, and so on.
Introduction
1. Supervised learning methods: The past data comes with labels, which are then used for building the model.
- Regression: The output variable to be predicted is continuous in nature, e.g. scores of a student, diamond prices, etc.
- Classification: The output variable to be predicted is categorical in nature, e.g. classifying incoming emails as spam or ham, Yes or No, True or False, 0 or 1.
2. Unsupervised learning methods: It contains no predefined labels assigned to the past data.
- Clustering: No predefined labels are assigned to the groups/clusters formed, e.g. customer segmentation.
What is Regression?
Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).
Uses of Regression
Well, there are plenty of use cases for regression, but I will mention three major applications.
Three major uses for regression analysis are:
- Determining the strength of predictors
- Forecasting an effect
- Trend forecasting
Simple Linear Regression
Linear regression is one of the simplest statistical regression methods used for predictive analysis in machine learning. It models the linear relationship between the independent (predictor) variable on the X-axis and the dependent (output) variable on the Y-axis. If there is a single input variable X (independent variable), such linear regression is called simple linear regression.
A scatter plot of the output (y) variable against the predictor (X) variable presents this linear relationship, and the straight line drawn through the points is referred to as the best fit line. Based on the given data points, we attempt to plot the line that fits the points best.
To calculate the best fit line, linear regression uses the traditional slope-intercept form, which is given below:

y = B0 + B1*x

where y is the predicted output, x is the input variable, B0 is the intercept, and B1 is the slope of the line.
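As a minimal sketch of what prediction with this line looks like (the B0 and B1 values here are made up for illustration, not fitted ones):

```python
import numpy as np

# Hypothetical parameters: intercept B0 and slope B1 (made up for illustration).
B0, B1 = 2.0, 0.5

x = np.array([1.0, 2.0, 3.0, 4.0])  # predictor values
y_pred = B0 + B1 * x                # y = B0 + B1*x applied to every x
print(y_pred)                       # [2.5 3.  3.5 4. ]
```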
But how does linear regression find out which line is the best fit line?
Random Error (Residuals)
In regression, the difference between the observed value of the dependent variable (yi) and the predicted value (ŷi) is called the residual:

εi = yi (actual) – ŷi (predicted)
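A quick sketch of residuals on toy numbers (the arrays here are made up):

```python
import numpy as np

y_actual = np.array([3.0, 4.5, 5.0, 7.0])  # observed y_i (toy data)
y_pred   = np.array([3.2, 4.0, 5.5, 6.8])  # model predictions ŷ_i

residuals = y_actual - y_pred              # ε_i = y_i − ŷ_i
print(residuals)                           # ≈ [-0.2  0.5 -0.5  0.2]
```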
What is the best fit line?
In simple terms, the best fit line is the line that fits the given scatter plot best. Mathematically, the best fit line is obtained by minimizing the Residual Sum of Squares (RSS):

RSS = Σ εi² = Σ (yi – ŷi)²
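For simple linear regression, the RSS-minimizing parameters also have a well-known closed-form (least-squares) solution, which is worth seeing alongside the gradient descent approach used later in this post. The data below is made up:

```python
import numpy as np

# Toy data; any equal-length (x, y) arrays work here.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Closed-form least-squares estimates that minimize RSS:
#   B1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²,   B0 = ȳ − B1·x̄
B1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
B0 = y.mean() - B1 * x.mean()

rss = np.sum((y - (B0 + B1 * x)) ** 2)  # Residual Sum of Squares at the optimum
print(B0, B1, rss)                      # ≈ 1.05, 0.99, 0.107
```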
Cost Function for Linear Regression
The cost function helps to work out the optimal values for B0 and B1, which provide the best fit line for the data points.
In linear regression, the Mean Squared Error (MSE) cost function is generally used; it is the average of the squared errors between the predicted values (ŷi) and the actual values (yi).
We calculate MSE using:

MSE = (1/n) Σ (yi – ŷi)²

where n is the number of data points.
Using the MSE function, we’ll update the values of B0 and B1 such that the MSE value settles at the minimum. These parameters can be determined using the gradient descent method, which finds the values for which the cost function is minimal.
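As a small sketch, MSE can be computed directly from the formula above (the function name is my own):

```python
import numpy as np

def mse(y_actual, y_pred):
    # Mean Squared Error: the average of the squared residuals.
    return np.mean((y_actual - y_pred) ** 2)

print(mse(np.array([3.0, 4.5, 5.0]), np.array([3.2, 4.0, 5.5])))  # 0.18
```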
Some common cost functions are MSE, MAE, RMSE, and Huber loss.
MSE:
- The MSE cost function is convex, so it has one global minimum.
- A drawback of MSE is that the unit of the metric is also squared, so if the model tries to predict a price in US dollars, the MSE is reported in squared dollars, which is hard to interpret.
MAE:
MAE evaluates the absolute distance of the observations (the entries of the dataset) to the predictions on a regression, taking the average over all observations. We use the absolute value of the distances so that negative errors are accounted for properly.
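A sketch of the remaining metrics from the list above (the function names and the delta default are my own; the Huber formula is the standard one: quadratic for small errors, linear for large ones):

```python
import numpy as np

def mae(y, y_hat):
    # Mean Absolute Error: average absolute distance to the predictions.
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    # Root Mean Squared Error: puts the error back in the units of y.
    return np.sqrt(np.mean((y - y_hat) ** 2))

def huber(y, y_hat, delta=1.0):
    # Huber loss: quadratic for |error| <= delta, linear beyond it.
    err = y - y_hat
    small = np.abs(err) <= delta
    return np.mean(np.where(small, 0.5 * err ** 2,
                            delta * (np.abs(err) - 0.5 * delta)))
```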
Gradient Descent for Linear Regression
Gradient descent is used to optimize the B0 and B1 values: it updates them iteratively so as to minimize the cost function.
Let’s take an example to understand this. Imagine a U-shaped pit, and you are standing at the uppermost point of the pit; your motive is to reach the bottom. Suppose there is a treasure at the bottom, and you can only take a discrete number of steps to get there. If you opt to take one small step at a time, you will eventually get to the bottom, but this will take a long time. If you decide to take larger steps, you may reach the bottom sooner, but there is a chance that you overshoot it and end up nowhere near the bottom. In the gradient descent algorithm, the size of the steps you take is the learning rate, and this decides how fast the algorithm converges to the minimum.
To update B0 and B1, we take gradients from the cost function. To find these gradients, we take partial derivatives of the MSE with respect to B0 and B1:

∂J/∂B0 = (–2/n) Σ (yi – ŷi)
∂J/∂B1 = (–2/n) Σ xi (yi – ŷi)

Each update step moves B0 and B1 against the gradient, scaled by the learning rate.
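Putting it together, a minimal gradient descent sketch for simple linear regression (the learning rate and epoch count are arbitrary choices that happen to work for this toy data):

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, epochs=2000):
    # Fit y ≈ B0 + B1*x by minimizing MSE with gradient descent.
    B0, B1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        y_pred = B0 + B1 * x
        # Partial derivatives of MSE with respect to B0 and B1:
        dB0 = (-2.0 / n) * np.sum(y - y_pred)
        dB1 = (-2.0 / n) * np.sum(x * (y - y_pred))
        B0 -= lr * dB0  # step against the gradient
        B1 -= lr * dB1
    return B0, B1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
print(gradient_descent(x, y))  # ≈ (1.05, 0.99), matching the closed form
```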
Performance Metrics:
R square (R²) measures the proportion of variance in the dependent variable that the model explains. The catch is that R² increases whenever a feature is added, even when that feature (for example, gender) has no linear relationship to the dependent feature.
Adjusted R² fixes this: its value increases only if the added feature is important, i.e. actually linearly related to the dependent feature, and decreases otherwise.
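A sketch of both metrics from their standard formulas (the function names are my own; n is the number of observations and p the number of predictors):

```python
import numpy as np

def r_squared(y, y_hat):
    rss = np.sum((y - y_hat) ** 2)     # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return 1.0 - rss / tss

def adjusted_r_squared(r2, n, p):
    # Penalizes R² for each extra predictor p given n observations.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```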
Overfitting And Underfitting:
Overfitting:
A statistical model is said to be overfitted when it makes accurate predictions on the training data but not on the testing data. When a model trains too much on the data, it starts learning from the noise and inaccurate entries in the data set. As a result, testing with the test data shows high variance, while testing with the training data shows low bias.
Underfitting:
Your model is underfitting the training data when it performs poorly even on the training data. This is because the model is unable to capture the relationship between the input examples (often called X) and the target values (often called Y).
Testing with the training data shows high bias, and the model typically also has low variance, since it is too simple to react to fluctuations in the data.
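The contrast is easy to see empirically. A sketch using scikit-learn (the data, seed, and polynomial degrees are arbitrary choices): fitting a degree-15 polynomial to noisy linear data typically scores much better on the training split than on the test split (overfitting), while the degree-1 fit scores similarly on both:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(60, 1))
y = 1.0 + 2.0 * X.ravel() + rng.normal(0, 1.0, 60)  # noisy linear data

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 15):  # degree 15 tends to overfit this data
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    # .score() returns R² — compare train vs test to spot overfitting.
    print(degree, model.score(X_tr, y_tr), model.score(X_te, y_te))
```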