by Moshe Binieli

Machine learning: an introduction to mean squared error and regression lines


Introduction

This article deals with the statistical method called mean squared error (MSE), and I’ll describe how this method relates to the regression line.

The example consists of points on a Cartesian plane. We will define a mathematical function that gives us the straight line that best fits all of the points.

In this way, we will learn how these two ideas are connected and what the result looks like when we put them together.

General explanation

This is the definition from Wikipedia:

In statistics, the mean squared error (MSE) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors — that is, the average squared difference between the estimated values and what is estimated. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive (and not zero) is because of randomness or because the estimator does not account for information that could produce a more accurate estimate.

The structure of the article

Get a feel for the idea, graph visualization, mean squared error equation.
The mathematical part, which contains algebraic manipulations and the partial derivatives of a two-variable function for finding its minimum. This section is for those who want to understand how we arrive at the formulas used later; you can skip it if that doesn’t interest you.
An explanation of the formulas we obtained and the role of each variable in them.
Examples

Get a feel for the idea

Let’s say we have seven points, and our goal is to find a line that minimizes the sum of the squared distances to these points.

Let’s try to understand that.

Let’s take an example and draw a line through the points. Of course, my drawing isn’t the best possible line, but it’s just for demonstration purposes.

[Image: Points on a simple graph.]

You might be asking yourself, what is this graph?

The purple dots are the points on the graph. Each point has an x-coordinate and a y-coordinate.
The blue line is our prediction line. It is the line we use to fit the points, ideally in the best possible way; it does not have to pass through any of them. This line contains the predicted points.
The red lines between each purple point and the prediction line are the errors. Each error is the vertical distance from a point to its predicted point.
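To make the three parts of the graph concrete, here is a minimal Python sketch that computes the predicted point and the error for each point under one candidate line. The points are borrowed from Example #1 later in the article; the function name `errors` and the line y = 0.5x + 1 are arbitrary choices for illustration, and that line is not the best fit.

```python
def errors(points, m, b):
    """Return the signed vertical error y - y' for each point, where y' = m*x + b."""
    # For each purple dot (x, y), y' is the predicted point on the blue line,
    # and y - y' is the (signed) length of the red segment between them.
    return [y - (m * x + b) for x, y in points]


points = [(1, 2), (2, 1), (4, 3)]  # points borrowed from Example #1
m, b = 0.5, 1.0                    # an arbitrary candidate line, not the best one
print(errors(points, m, b))        # [0.5, -1.0, 0.0]
```

A negative error means the point lies below the line; squaring the errors later removes the sign.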

You should remember this equation from your school days: y=Mx+B, where M is the slope of the line and B is the y-intercept of the line.

We want to find the M (slope) and B (y-intercept) that minimize the squared error!

Let’s define a mathematical equation that will give us the mean squared error for all our points.

[Image: General formula for mean squared error.]
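In symbols, with n points, y_i the y-coordinate of the i-th point and y'_i the matching point on the line (the same quantities explained below), the standard MSE formula reads:

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - y'_i \right)^2
```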

Let’s analyze what this equation actually means.

In mathematics, the character that looks like a strange E is the summation sign (the Greek letter sigma). It denotes the sum of a sequence of numbers, from i=1 to n. Let’s imagine this like an array of points, where we go through all the points, from the first (i=1) to the last (i=n).
For each point, we take its y-coordinate and its y’-coordinate. The y-coordinate is our purple dot. The y’ point sits on the line we created, directly above or below the purple dot. We subtract the y’-coordinate value from the y-coordinate value, and calculate the square of the result.
The third part is to take the sum of all the (y-y’)² values, and divide it by n, which will give the mean.
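The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `mean_squared_error` is hypothetical, it assumes `points` is a non-empty list of (x, y) pairs, and it reuses the Example #1 points with the arbitrary line y = 0.5x + 1.

```python
def mean_squared_error(points, m, b):
    """Mean of the squared errors (y - y')**2 over all points, with y' = m*x + b."""
    n = len(points)
    # Steps 1-2: for each point, compute y' on the line and square the error y - y'.
    squared_errors = [(y - (m * x + b)) ** 2 for x, y in points]
    # Step 3: sum the squared errors and divide by n to get the mean.
    return sum(squared_errors) / n


points = [(1, 2), (2, 1), (4, 3)]
print(mean_squared_error(points, 0.5, 1.0))  # 1.25 / 3, roughly 0.4167
```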

Our goal is to minimize this mean, which will give us the line that best fits all the points.

From concept to mathematical equations

This part is for people who want to understand how we got to the mathematical equations. You can skip to the next part if you want.

As you know, the line equation is y=mx+b, where m is the slope and b is the y-intercept.

For each point on the graph, we do our calculation (y-y’)².
But what is y’, and how do we calculate it? We do not have it as part of the data.

But we do know that, to calculate y’, we can use our line equation, y=mx+b, and plug in the point’s x-coordinate.

From here we get the following equation:
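A sketch of that equation, assuming we write E(m, b) (a label introduced here) for the total squared error before dividing by n. Minimizing E is the same as minimizing the MSE, because MSE = E / n and n is a fixed positive number.

```latex
E(m, b) = \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2
```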

Let’s rewrite this expression to simplify it.

Let’s begin by opening all the brackets in the equation. I colored the difference between the equations to make it easier to understand.
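For a single point, opening the brackets of (y_i - (m x_i + b))² gives these six terms:

```latex
\left( y_i - (m x_i + b) \right)^2 = y_i^2 - 2 m x_i y_i - 2 b y_i + m^2 x_i^2 + 2 m b x_i + b^2
```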

Now, let’s apply another manipulation: we group like terms. We take all the y² terms, all the (-2ymx) terms, and so on, and put each group side by side as a single sum.
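Grouping like terms over all n points, and writing E(m, b) (a label assumed here) for the total squared error, gives the following. The last term is n b² because b² is added once for every point.

```latex
E(m, b) = \sum_{i=1}^{n} y_i^2 - 2m \sum_{i=1}^{n} x_i y_i - 2b \sum_{i=1}^{n} y_i + m^2 \sum_{i=1}^{n} x_i^2 + 2mb \sum_{i=1}^{n} x_i + n b^2
```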

At this point things are starting to get messy, so let’s use the mean of each of the quantities y², xy, y, x², and x.

For each one, let’s define a new symbol that represents its mean over all n points.

For example, take all the y values, add them up, and divide by n, since that is the mean; we call it ȳ (y with a bar on top).
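The five means used in the rest of the derivation, written in bar notation (a bar over an expression denotes its mean over the n points):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad
\overline{xy} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i, \quad
\overline{x^2} = \frac{1}{n}\sum_{i=1}^{n} x_i^2, \quad
\overline{y^2} = \frac{1}{n}\sum_{i=1}^{n} y_i^2
```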

If we multiply both sides of the equation by n we get:
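Multiplying each definition of a mean by n turns every sum into n times a mean:

```latex
\sum_{i=1}^{n} x_i = n\bar{x}, \quad
\sum_{i=1}^{n} y_i = n\bar{y}, \quad
\sum_{i=1}^{n} x_i y_i = n\overline{xy}, \quad
\sum_{i=1}^{n} x_i^2 = n\overline{x^2}, \quad
\sum_{i=1}^{n} y_i^2 = n\overline{y^2}
```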

Which will lead us to the following equation:
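Substituting those sums into the grouped expression, and writing E(m, b) (a label assumed here) for the total squared error, gives an expression in which only m and b vary:

```latex
E(m, b) = n\left( \overline{y^2} - 2m\,\overline{xy} - 2b\,\bar{y} + m^2\,\overline{x^2} + 2mb\,\bar{x} + b^2 \right)
```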

If we look at what we got, we can see that, as a function of m and b, it is a 3D surface. It looks like a glass (a bowl) that rises sharply upwards, so it has a lowest point.

We want to find the M and B that minimize the function. To do so, we take the partial derivative with respect to M and the partial derivative with respect to B.

Since we are looking for a minimum point, we set both partial derivatives equal to 0.

[Image: Partial derivatives formula.]

[Image: Partial derivatives.]
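Written out, and using E(m, b) (a label assumed here) for the total squared error expressed with the means, the two partial derivatives set to 0 are shown below. The forms on the right come from dividing each equation by 2n.

```latex
\begin{aligned}
\frac{\partial E}{\partial m} &= n\left( -2\,\overline{xy} + 2m\,\overline{x^2} + 2b\,\bar{x} \right) = 0
  &&\Longrightarrow\quad m\,\overline{x^2} + b\,\bar{x} = \overline{xy} \\
\frac{\partial E}{\partial b} &= n\left( -2\,\bar{y} + 2m\,\bar{x} + 2b \right) = 0
  &&\Longrightarrow\quad m\,\bar{x} + b = \bar{y}
\end{aligned}
```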

Let’s take the two equations we received, isolate the variable b in both, and then subtract the upper equation from the bottom one.

[Image: The two equations rewritten after taking the partial derivatives.]
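Isolating b in each of the two equations gives the following. The first form divides by x̄, so this step assumes x̄ ≠ 0; the final formula for m does not depend on that assumption.

```latex
\begin{aligned}
b &= \frac{\overline{xy}}{\bar{x}} - m\,\frac{\overline{x^2}}{\bar{x}} \\
b &= \bar{y} - m\,\bar{x}
\end{aligned}
```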

Let’s subtract the first equation from the second equation.

[Image: Merging the two equations together.]
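Subtracting the first isolated form of b from the second cancels b and leaves an equation in m alone:

```latex
0 = \bar{y} - \frac{\overline{xy}}{\bar{x}} - m\left( \bar{x} - \frac{\overline{x^2}}{\bar{x}} \right)
\quad\Longrightarrow\quad
m\left( \bar{x} - \frac{\overline{x^2}}{\bar{x}} \right) = \bar{y} - \frac{\overline{xy}}{\bar{x}}
```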

Let’s get rid of the denominators from the equation.

[Image: Final equation to find M.]
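Multiplying both sides by x̄ clears the denominators and gives the slope. The denominator, x̄² minus the mean of x², is zero only when all the x-values are equal (the points lie on a vertical line), in which case no unique slope exists.

```latex
m\left( \bar{x}^2 - \overline{x^2} \right) = \bar{x}\,\bar{y} - \overline{xy}
\quad\Longrightarrow\quad
m = \frac{\bar{x}\,\bar{y} - \overline{xy}}{\bar{x}^2 - \overline{x^2}}
```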

And there we go: this is the equation for M. Now let’s use it to write down the equation for B.

[Image: Final equation to find B.]
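The intercept follows from the equation obtained from the partial derivative with respect to b, m x̄ + b = ȳ:

```latex
b = \bar{y} - m\,\bar{x}
```

In words, the best line always passes through the point (x̄, ȳ).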

Equations for slope and y-intercept

Let’s provide the mathematical equations that will help us find the required slope and y-intercept.

[Image: Slope and y-intercept equations.]

So you’re probably thinking to yourself, what the heck are those weird equations?

They are actually simple to understand, so let’s talk about them a little bit.

[Image: Sum of x divided by n.]

[Image: Sum of x² divided by n.]

[Image: Sum of xy divided by n.]

[Image: Sum of y divided by n.]
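These four means are all you need to compute the line. Below is a minimal Python sketch of the slope and y-intercept formulas; the function name `fit_line` is hypothetical, and `Fraction` from the standard library is used only so the results are exact and match hand calculations.

```python
from fractions import Fraction


def fit_line(points):
    """Return (m, b) for the least-squares line y = m*x + b, using
    m = (mean(x)*mean(y) - mean(xy)) / (mean(x)**2 - mean(x**2)) and b = mean(y) - m*mean(x)."""
    n = len(points)
    # Each mean is a sum divided by n; Fraction keeps the arithmetic exact.
    x_bar = Fraction(sum(x for x, _ in points)) / n        # sum of x divided by n
    y_bar = Fraction(sum(y for _, y in points)) / n        # sum of y divided by n
    xy_bar = Fraction(sum(x * y for x, y in points)) / n   # sum of xy divided by n
    x2_bar = Fraction(sum(x * x for x, _ in points)) / n   # sum of x**2 divided by n

    denominator = x_bar ** 2 - x2_bar
    if denominator == 0:
        # All x-values are equal: the points lie on a vertical line and no slope exists.
        raise ValueError("all x-values are equal; the slope is undefined")
    m = (x_bar * y_bar - xy_bar) / denominator
    b = y_bar - m * x_bar
    return m, b


print(fit_line([(1, 2), (2, 1), (4, 3)]))  # (Fraction(3, 7), Fraction(1, 1))
```

Plain float division would work too, at the cost of small rounding errors in the printed results.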

Now that we understand our equations, it’s time to put everything together and look at some examples.

Examples

A big thank you to Khan Academy for the examples.

Example #1

Let’s take 3 points, (1,2), (2,1), (4,3).

[Image: Points on the graph.]

Let’s find M and B for the equation y=mx+b.

[Image: Sum the x values and divide by n.]

[Image: Sum the y values and divide by n.]

[Image: Sum the xy values and divide by n.]

[Image: Sum the x² values and divide by n.]
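With n = 3, the four means for the points (1,2), (2,1), (4,3), computed by hand from the listed points, are:

```latex
\bar{x} = \frac{1 + 2 + 4}{3} = \frac{7}{3}, \quad
\bar{y} = \frac{2 + 1 + 3}{3} = 2, \quad
\overline{xy} = \frac{1\cdot 2 + 2\cdot 1 + 4\cdot 3}{3} = \frac{16}{3}, \quad
\overline{x^2} = \frac{1^2 + 2^2 + 4^2}{3} = 7
```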

After we’ve calculated the relevant parts for our M equation and B equation, let’s plug those values into the equations and get the slope and y-intercept.

[Image: Slope calculation.]

[Image: y-intercept calculation.]
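Plugging the Example #1 means into m = (x̄ȳ − mean of xy) / (x̄² − mean of x²) and b = ȳ − m x̄:

```latex
m = \frac{\frac{7}{3}\cdot 2 - \frac{16}{3}}{\left(\frac{7}{3}\right)^2 - 7}
  = \frac{-\frac{2}{3}}{-\frac{14}{9}} = \frac{3}{7},
\qquad
b = 2 - \frac{3}{7}\cdot\frac{7}{3} = 1
```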

Let’s take those results and plug them into the line equation y=mx+b.
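For Example #1, with m = 3/7 and b = 1, the regression line is:

```latex
y = \frac{3}{7}x + 1
```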

Now let’s draw the line and see how it passes through the points in such a way that it minimizes the squared distances.

[Image: Regression line that minimizes the MSE.]
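As a quick numerical sanity check that y = (3/7)x + 1 minimizes the squared distances for Example #1, the sketch below compares the MSE of that line with lines whose slope and intercept are nudged slightly. The helper name `mse` and the step size of 1/100 are arbitrary choices; checking a few nearby lines illustrates the minimum rather than proving it.

```python
from fractions import Fraction


def mse(points, m, b):
    """Mean squared error of the line y' = m*x + b over the points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)


points = [(1, 2), (2, 1), (4, 3)]
best_m, best_b = Fraction(3, 7), Fraction(1)  # slope and intercept from Example #1
best = mse(points, best_m, best_b)

# Nudge the slope and intercept in every direction; each nudged line should do worse.
step = Fraction(1, 100)
for dm in (-step, 0, step):
    for db in (-step, 0, step):
        if (dm, db) != (0, 0):
            assert mse(points, best_m + dm, best_b + db) > best

print(best)  # 8/21
```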

Example #2

Let’s take 4 points, (-2,-3), (-1,-1), (1,2), (4,3).

[Image: Points on the graph.]

Let’s find M and B for the equation y=mx+b.

[Image: Sum the x values and divide by n.]

[Image: Sum the y values and divide by n.]

[Image: Sum the xy values and divide by n.]

[Image: Sum the x² values and divide by n.]
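With n = 4, the four means for the points (-2,-3), (-1,-1), (1,2), (4,3), computed by hand from the listed points, are:

```latex
\bar{x} = \frac{-2 - 1 + 1 + 4}{4} = \frac{1}{2}, \quad
\bar{y} = \frac{-3 - 1 + 2 + 3}{4} = \frac{1}{4}, \quad
\overline{xy} = \frac{6 + 1 + 2 + 12}{4} = \frac{21}{4}, \quad
\overline{x^2} = \frac{4 + 1 + 1 + 16}{4} = \frac{11}{2}
```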

Same as before, let’s plug those values into our equations to find M and B.

[Image: Slope calculation.]

[Image: y-intercept calculation.]
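Plugging the Example #2 means into m = (x̄ȳ − mean of xy) / (x̄² − mean of x²) and b = ȳ − m x̄:

```latex
m = \frac{\frac{1}{2}\cdot\frac{1}{4} - \frac{21}{4}}{\left(\frac{1}{2}\right)^2 - \frac{11}{2}}
  = \frac{-\frac{41}{8}}{-\frac{21}{4}} = \frac{41}{42},
\qquad
b = \frac{1}{4} - \frac{41}{42}\cdot\frac{1}{2} = -\frac{5}{21}
```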

Let’s take those results and plug them into the line equation y=mx+b.
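For Example #2, with m = 41/42 and b = −5/21, the regression line is:

```latex
y = \frac{41}{42}x - \frac{5}{21}
```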

Now let’s draw the line and see how it passes through the points in such a way that it minimizes the squared distances.

[Image: Regression line that minimizes the MSE.]

In conclusion

As you can see, the whole idea is simple. We just need to understand the main parts and how we work with them.

You can apply the same formulas to the points on any other graph: perform a simple calculation and you get the slope and y-intercept.

That’s all. Simple, eh?

Every comment and all feedback is welcome — if it’s necessary, I will fix the article.

Feel free to contact me directly on LinkedIn.