<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ #Regression - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ #Regression - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 25 Jun 2026 10:02:33 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/regression/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ Linear vs Logistic Regression: How to Choose the Right Regression Model for Your Data ]]>
                </title>
                <description>
                    <![CDATA[ Regression models identify trends in a dataset and predict outcomes based on the trends they have analyzed and identified. Linear and Logistic Regression are two types of regression models that are similar but carry out their functions in distinct wa... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/linear-regression-vs-logistic-regression/</link>
                <guid isPermaLink="false">66d46091a326133d12440a57</guid>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MathJax ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #Regression ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Oluwadamisi Samuel ]]>
                </dc:creator>
                <pubDate>Tue, 28 May 2024 13:02:08 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/05/Linear-vs-Logistic-Regession.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Regression models identify trends in a dataset and predict outcomes based on the trends they have analyzed and identified.</p>
<p>Linear and Logistic Regression are two types of regression models that are similar but carry out their functions in distinct ways. They're also two fundamental techniques in machine learning that predict outcomes by analyzing previously provided data.</p>
<p>Both Linear and Logistic Regression are supervised learning models that appear intertwined – so distinguishing between them can be confusing, as they share the same notion of predicting outcomes based on the input variables.</p>
<p>But here's the main difference: Linear Regression focuses on predicting continuous values, while Logistic Regression is designed specifically for binary classification (Yes or No). So although they have similar-sounding names, there are key differences in their applications, equations, and objectives.</p>
<p>In this article, you'll learn about the similarities and differences between Linear and Logistic Regression, explore key characteristics of each, and learn how to choose between them.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-how-linear-and-logistic-regression-make-predictions">How Linear and Logistic Regression Make Predictions</a><br> – <a class="post-section-overview" href="#heading-linear-regression">Linear Regression</a><br> – <a class="post-section-overview" href="#heading-logistic-regression">Logistic Regression</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-are-the-similarities-between-linear-and-logistic-regression">What are the Similarities between Linear and Logistic Regression?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-are-the-differences-between-linear-and-logistic-regression">What are the Differences between Linear and Logistic Regression?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-to-use-linear-vs-logistic-regression-for-your-data-projects">When to Use Linear vs Logistic Regression for Your Data Projects</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-are-other-types-of-regression-models">What Are Other Types of Regression Models?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-how-linear-and-logistic-regression-make-predictions">How Linear and Logistic Regression Make Predictions</h2>
<h3 id="heading-linear-regression">Linear Regression</h3>
<p>Linear regression is the simplest form of regression. It assumes a linear (straight-line) relationship between the input variables and the output variable. In simple terms, it fits the straight line that best passes through the data points.</p>
<p>The equation for simple linear regression can be expressed as y = mx + b, where:</p>
<ul>
<li><p>y is the dependent variable</p>
</li>
<li><p>x is the independent variable</p>
</li>
<li><p>m is the slope</p>
</li>
<li><p>and b is the intercept.</p>
</li>
</ul>
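<p>To make the equation concrete, here's a minimal sketch (not part of the original article) that fits y = mx + b with NumPy's <code>np.polyfit</code>, which performs a least-squares fit. The data points are hypothetical and were generated from y = 2x + 1, so the fit should recover m = 2 and b = 1.</p>
<pre><code class="lang-python">import numpy as np

# Hypothetical data generated from y = 2x + 1 (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# A degree-1 polynomial fit returns [slope, intercept] via least squares
m, b = np.polyfit(x, y, deg=1)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")

# Predict y for a new x value using y = mx + b
x_new = 5.0
print(f"predicted y at x = {x_new}: {m * x_new + b:.2f}")
</code></pre>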
<p><img src="https://www.freecodecamp.org/news/content/images/2024/05/New-Linear-regression-image-1.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Linear regression graph (</em><a target="_blank" href="https://images.app.goo.gl/hnuLSSqSZewaDsN18"><em>Source</em></a><em>)</em></p>
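<p>In a house price dataset, the independent variables are the columns used to predict the price of the house, such as “Area”, “Bedrooms”, “Age”, and “Location”. The dependent variable is the “Price” column, which is the target to be predicted. With several independent variables like these, the equation extends to multiple linear regression: y = b + m1x1 + m2x2 + … + mnxn, with one slope per variable.</p>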
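<p>Here's a minimal scikit-learn sketch of a linear regression with several independent variables. It assumes a small hypothetical dataset with only the numeric “Area”, “Bedrooms”, and “Age” columns. A categorical column like “Location” would need to be encoded first. The prices were generated from a known linear rule, so you can check the fitted coefficients from <code>LinearRegression</code> against it.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical houses: [Area (sq ft), Bedrooms, Age (years)]
X = np.array([
    [1000, 2, 30],
    [1500, 3, 20],
    [1200, 2, 5],
    [2000, 4, 10],
    [1800, 3, 40],
    [2500, 5, 15],
])
# Hypothetical prices built from: 50000 + 100*Area + 10000*Bedrooms - 500*Age
y = 50000 + 100 * X[:, 0] + 10000 * X[:, 1] - 500 * X[:, 2]

model = LinearRegression()
model.fit(X, y)  # learns the intercept and one coefficient per column

print("intercept:", round(model.intercept_, 2))
print("coefficients:", np.round(model.coef_, 2))

# Predict the price of a new (hypothetical) 1600 sq ft, 3-bedroom, 12-year-old house
print("predicted price:", round(model.predict([[1600, 3, 12]])[0], 2))
</code></pre>
<p>Because the hypothetical prices contain no noise, the fitted coefficients match the rule used to generate them. With real data you would only get estimates.</p>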
<p>You can <a target="_blank" href="https://www.freecodecamp.org/news/how-to-build-a-house-price-prediction-model/">read more on Linear Regression here</a>.</p>
<h3 id="heading-logistic-regression">Logistic Regression</h3>
<p>Logistic Regression is a powerful supervised machine learning technique. It categorizes outcomes into two groups. To do this, it assumes a linear relationship between the features and the log-odds of the outcome, and then calculates the probability that the outcome will fall into one group or the other.</p>
<p>The model first computes a linear combination of the input features. It then transforms that output with the sigmoid function to constrain it between 0 and 1. Here it is:</p>
<p>$$y = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n}}$$</p><p>Where:</p>
<ul>
<li><p>y is the predicted probability that the outcome belongs to the positive class (for example, "Yes")</p>
</li>
<li><p>e is Euler’s number (approximately 2.718), the base of the natural logarithm ln(x). The full expression is the sigmoid (logistic) function applied to the linear combination of the inputs</p>
</li>
<li><p>β0 is the intercept: the log-odds of the outcome when all independent variables equal 0</p>
</li>
<li><p>β1 is the regression coefficient of the first independent variable X1. It measures the impact of X1 on the log-odds of the outcome</p>
</li>
<li><p>βn is the regression coefficient of the last independent variable Xn, when there are multiple input variables</p>
</li>
</ul>
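<p>The equation above is the sigmoid (logistic) function applied to the linear combination z = β0 + β1X1 + … + βnXn. Here's a minimal sketch that computes the probability both as e^z / (1 + e^z) and in the equivalent form 1 / (1 + e^-z). The coefficient and input values are hypothetical, and the helper function names are made up for illustration.</p>
<pre><code class="lang-python">import math

def linear_combination(betas, xs):
    """Compute z = b0 + b1*x1 + ... + bn*xn (betas[0] is the intercept)."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

def sigmoid(z):
    """Logistic function: squeezes any real z into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients [b0, b1, b2] and one observation [x1, x2]
betas = [-1.0, 0.8, 0.5]
xs = [2.0, 1.0]

z = linear_combination(betas, xs)  # -1 + 0.8*2 + 0.5*1
p_original_form = math.exp(z) / (1 + math.exp(z))
p_sigmoid_form = sigmoid(z)

print(f"z = {z:.2f}")
print(f"probability = {p_sigmoid_form:.4f}")  # both forms give the same value
</code></pre>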
<p><img src="https://www.freecodecamp.org/news/content/images/2024/05/New-Logistic-Regression-image.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Logistic Regression graph (</em><a target="_blank" href="https://images.app.goo.gl/vfYBcVSrdvR2Mkki9"><em>Source</em></a><em>)</em></p>
<p>Logistic regression is commonly used in areas like spam detection and medical diagnosis. In these areas, it estimates the likelihood that an observation falls into a specific class.</p>
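<p>As a rough illustration, the sketch below trains scikit-learn's <code>LogisticRegression</code> on a tiny hypothetical dataset with one feature. It then uses <code>predict_proba</code> to get class probabilities and <code>predict</code> to get the final Yes/No (1/0) label. The feature and labels are made up for the example.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature (for example, number of suspicious words in an email)
X = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
# Hypothetical labels: 0 = not spam, 1 = spam
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

new_emails = np.array([[1], [8]])
# Each row holds [P(class 0), P(class 1)] and sums to 1
print(clf.predict_proba(new_emails).round(3))
# predict() picks the class with the higher probability
print(clf.predict(new_emails))
</code></pre>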
<p>You can <a target="_blank" href="https://dev.to/oluwadamisisamuel1/how-to-build-a-logistic-regression-model-a-spam-filter-tutorial-261b">read more on Logistic Regression here</a>.</p>
<h2 id="heading-what-are-the-similarities-between-linear-and-logistic-regression">What are the Similarities between Linear and Logistic Regression?</h2>
<ol>
<li><p><strong>Linear Relationship:</strong> Both linear and logistic regression assume a linear relationship between the input features and the output (for logistic regression, the log-odds of the output).</p>
</li>
<li><p><strong>Supervised Learning:</strong> Both are supervised machine learning algorithms, meaning they require labeled training data.</p>
</li>
<li><p><strong>Limitations:</strong> Both algorithms share similar limitations, including:</p>
<ul>
<li><p>Non-linear relationships between input and output variables will lead to inaccurate results.</p>
</li>
<li><p>Unclean data and missing values will lead to poor model performance. You can <a target="_blank" href="https://www.freecodecamp.org/news/data-cleaning-and-preprocessing-with-pandasbdvhj/">read more on data cleaning here</a>.</p>
</li>
<li><p>Both models are prone to <a target="_blank" href="https://www.freecodecamp.org/news/what-is-overfitting-machine-learning/">overfitting</a>, especially with many input features. Feature selection and regularization can help reduce it.</p>
</li>
</ul>
</li>
</ol>
<h2 id="heading-what-are-the-differences-between-linear-and-logistic-regression">What are the Differences between Linear and Logistic Regression?</h2>
<ol>
<li><p><strong>Output Type</strong>: Linear regression predicts continuous output (for example, the price of a house) on a straight line graph, while logistic regression predicts probabilities for binary classification (like if a patient has cancer or not) on an S-shaped curve.</p>
</li>
<li><p><strong>Equation and Activation Function:</strong> Linear regression uses a simple linear equation while logistic regression uses the logistic function (sigmoid) to transform the output into probabilities.</p>
</li>
<li><p><strong>Loss Function</strong>: Linear regression minimizes the sum of squared differences between actual and predicted values. Logistic regression minimizes the logistic loss (log loss, also called binary cross-entropy).</p>
</li>
<li><p><strong>Type of Supervised Learning:</strong> Linear regression is a regression model. Despite its name, logistic regression is a classification model.</p>
</li>
</ol>
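<p>To see the two loss functions side by side, here's a minimal sketch with hypothetical values. It computes the sum of squared differences for a regression and the log loss for a binary classifier's predicted probabilities. The hand-written log loss is compared with scikit-learn's <code>log_loss</code> as a sanity check.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.metrics import log_loss

# Linear regression: sum of squared differences (hypothetical values)
y_actual = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 8.0])
sse = np.sum((y_actual - y_pred) ** 2)
print("sum of squared errors:", sse)

# Logistic regression: log loss (binary cross-entropy), hypothetical values
labels = np.array([1, 0, 1, 0])
probs = np.array([0.9, 0.2, 0.6, 0.4])  # predicted P(class 1)
manual_log_loss = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
print("log loss (manual):", round(manual_log_loss, 4))
print("log loss (sklearn):", round(log_loss(labels, probs), 4))
</code></pre>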
<h2 id="heading-when-to-use-linear-vs-logistic-regression-for-your-data-projects">When to Use Linear vs Logistic Regression for Your Data Projects</h2>
<p>You can use Linear Regression to solve problems where the relationship between variables can be reasonably approximated by a straight line. This means it's well-suited for understanding gradual changes or trends, rather than abrupt jumps or complex relationships. Some examples of these use-cases are:</p>
<ul>
<li><p>House Price prediction</p>
</li>
<li><p>Identifying Relationships</p>
</li>
<li><p>Market Trends and Analysis</p>
</li>
<li><p>Business risk assessment</p>
</li>
<li><p>Scientific Research</p>
</li>
<li><p>Price Estimation</p>
</li>
<li><p>Understanding Impact</p>
</li>
</ul>
<p>On the other hand, Logistic Regression is a powerful tool for understanding binary events and making predictions based on the features given. It excels in calculating the probability of an outcome being "Yes" or "No". This applies to a wide range of scenarios like:</p>
<ul>
<li><p>Fraud Detection</p>
</li>
<li><p>Spam filter</p>
</li>
<li><p>Applications in Medicine</p>
</li>
<li><p>Customer Churn</p>
</li>
<li><p>Probability Estimation</p>
</li>
</ul>
<h2 id="heading-what-are-other-types-of-regression-models">What Are Other Types of Regression Models?</h2>
<p>Linear and logistic regression are not the only regression models available. Here are some other models you can use where linear and logistic regression fall short:</p>
<ul>
<li><p><strong>Ridge regression</strong> is a regularization technique that reduces the complexity of a model by adding a penalty on the squared size of the coefficients (an L2 penalty). This introduces a small amount of bias, but makes the model less susceptible to overfitting.</p>
</li>
<li><p><strong>Lasso regression</strong> is a regularization technique that also reduces the complexity of a model. It adds a penalty on the absolute size of the coefficients (an L1 penalty). This penalty shrinks the coefficients toward zero and can set some of them exactly to zero, which makes Lasso particularly useful when feature selection is crucial.</p>
</li>
<li><p><strong>Polynomial regression</strong> captures non-linear relationships using a curved line. It directly addresses a limitation of linear and logistic regression by modeling a non-linear relationship between variables.</p>
</li>
</ul>
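<p>Here's a minimal scikit-learn sketch of two of these models, using hypothetical data. First, a straight line fails on data generated from y = x², while a degree-2 polynomial regression fits it. The polynomial model is built as <code>PolynomialFeatures</code> followed by <code>LinearRegression</code>. Second, <code>Ridge</code> shrinks a coefficient toward zero compared with plain linear regression. The <code>alpha</code> value is an arbitrary choice for illustration.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear data: y = x squared
x = np.linspace(-3, 3, 13).reshape(-1, 1)
y = x.ravel() ** 2

# A straight line cannot follow the curve
linear = LinearRegression().fit(x, y)
print("linear R2:", round(linear.score(x, y), 3))

# Polynomial regression: expand x into [1, x, x^2], then fit a linear model
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)
print("polynomial R2:", round(poly.score(x, y), 3))

# Ridge adds an L2 penalty that shrinks coefficients toward zero
x_lin = np.arange(10, dtype=float).reshape(-1, 1)
y_lin = 2 * x_lin.ravel() + 1
ols_coef = LinearRegression().fit(x_lin, y_lin).coef_[0]
ridge_coef = Ridge(alpha=10.0).fit(x_lin, y_lin).coef_[0]
print("OLS slope:", round(ols_coef, 3), "| Ridge slope:", round(ridge_coef, 3))
</code></pre>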
<h2 id="heading-conclusion">Conclusion</h2>
<p>Linear and logistic regression share the fundamental concept of a linear relationship between input variables and output variables. But their applications, mathematical equations, and use cases differ significantly.</p>
<p>Understanding these differences is crucial when choosing the appropriate model for a given problem.</p>
<p>This article has shed light on their inner workings and use cases, thereby equipping you to make the right and informed choice. Make sure you explore further to increase your knowledge and skills, and take the time to learn more complex machine learning models that will best fit your data problems.</p>
<p>If you found this helpful, you can connect with me on <a target="_blank" href="http://www.linkedin.com/in/samuel-oluwadamisi-01b3a4236">LinkedIn</a>, <a target="_blank" href="https://dev.to/oluwadamisisamuel1">my personal blog</a> and on <a target="_blank" href="https://x.com/Data_Steve_">X (formerly Twitter)</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Top Evaluation Metrics for Regression Problems in Machine Learning ]]>
                </title>
                <description>
                    <![CDATA[ A regression problem is a common type of supervised learning problem in Machine Learning. The end goal is to predict quantitative values – for example, continuous values such as the price of a car, the weight of a dog, and so on. But to be sure that ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/evaluation-metrics-for-regression-problems-machine-learning/</link>
                <guid isPermaLink="false">66d45f359208fb118cc6cfc3</guid>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ metrics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #Regression ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ibrahim Ogunbiyi ]]>
                </dc:creator>
                <pubDate>Mon, 01 Aug 2022 14:37:27 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/07/regression-metrics-image.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>A regression problem is a common type of supervised learning problem in Machine Learning. The end goal is to predict quantitative values – for example, continuous values such as the price of a car, the weight of a dog, and so on.</p>
<p>But to be sure that your model is doing well in its predictions, you need to evaluate the model.</p>
<p>There are some evaluation metrics that can help you determine whether the model’s predictions are accurate to a certain level of performance.</p>
<p>In this tutorial, you will learn the top evaluation metrics for regression problems, as well as when to use each of them. Without further ado let’s get started.</p>
<h2 id="heading-what-are-residuals">What are Residuals?</h2>
<p>Before we get into the top evaluation metrics, you need to understand what "residual" means when you're evaluating a regression model.</p>
<p>In a regression problem, a model will rarely predict the exact value of a continuous variable. Most predictions land somewhat above or below the actual value. Because of this, we measure the model’s accuracy through its residuals.</p>
<p>Residuals are the difference between the actual and predicted values. You can think of residuals as being a distance. So, the closer the residual is to zero, the better our model performs in making its predictions.</p>
<p>Here's the formula for calculating residuals:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/residuals.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>In the above formula:</p>
<ul>
<li><p>e<sub>i</sub> stands for the residual value.</p>
</li>
<li><p>y<sub>i</sub> stands for the actual value.</p>
</li>
<li><p>ŷ<sub>i</sub> stands for the predicted value.</p>
</li>
</ul>
<p>So say, for instance, that the actual value in the dataset is 5 and the predicted value is 8. The residual value will be 5 - 8 = -3.</p>
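<p>In code, residuals are a single vectorized subtraction. This minimal NumPy sketch uses the worked example above (actual 5, predicted 8) plus two hypothetical extra points.</p>
<pre><code class="lang-python">import numpy as np

# First pair is the example from the text; the other two are hypothetical
actual = np.array([5.0, 10.0, 7.0])
predicted = np.array([8.0, 9.0, 7.5])

# Residual e_i = y_i - y_hat_i (actual minus predicted)
residuals = actual - predicted
print(residuals)
</code></pre>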
<h2 id="heading-top-evaluation-metrics-for-regression-problems">Top Evaluation Metrics for Regression Problems</h2>
<p>The top evaluation metrics you need to know for regression problems include:</p>
<h3 id="heading-r2-score">R2 Score</h3>
<p>The R2 score (pronounced R-squared score) is a statistical measure that tells us how well our model's predictions fit the actual values. It usually falls between zero and one, where one is a perfect fit. It can even be negative when a model performs worse than always predicting the mean.</p>
<p>As mentioned above, a regression model rarely predicts the exact actual values. This is unlike a classification problem, where each prediction is a discrete label that is either right or wrong.</p>
<p>But we can use the R2 score to measure the accuracy of our model in terms of distance or residuals. You can calculate the R2 score using the formula below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/08/image.png" alt="Image" width="600" height="400" loading="lazy"></p>
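<p>The standard R2 formula compares the model's squared residuals with the squared distances of the actual values from their mean: R2 = 1 - (sum of squared residuals / total sum of squares). Here's a minimal sketch that computes it by hand and checks it against scikit-learn's <code>r2_score</code>. It uses hypothetical values, not the dataset used below.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.8, 5.4, 6.5, 9.3])

ss_res = np.sum((actual - predicted) ** 2)      # sum of squared residuals
ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print("R2 (manual):", round(r2_manual, 4))
print("R2 (sklearn):", round(r2_score(actual, predicted), 4))
</code></pre>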
<h4 id="heading-when-to-use-the-r2-score">When to Use the R2 Score</h4>
<p>You can use the R2 score to see what proportion of the variance in the actual values your model explains. It's often reported as a percentage (0–100), which makes it feel similar to accuracy in a classification model. Keep in mind, though, that it isn't the same measure.</p>
<p>Let’s go over how to implement the R2 score in Python. Say we have a small dataset that contains the actual values and the predictions:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/1_mzvi2wZRSVv5W0pPmod3ag.png" alt="Image" width="600" height="400" loading="lazy"></p>
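<p>To implement the R2 score in Python, we'll use the <code>r2_score</code> function from Scikit-Learn's <code>sklearn.metrics</code> module.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> r2_score
<span class="hljs-comment"># data is assumed to be the pandas DataFrame shown above, with "Actual Value" and "Preds" columns</span>
score = r2_score(data[<span class="hljs-string">"Actual Value"</span>], data[<span class="hljs-string">"Preds"</span>])
<span class="hljs-comment"># Multiply before rounding to avoid floating-point artifacts such as 56.99999999999999</span>
print(<span class="hljs-string">"The R2 score of our model is {}%"</span>.format(round(score * <span class="hljs-number">100</span>, <span class="hljs-number">2</span>)))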
</code></pre>
<p>The <code>r2_score</code> function takes two arguments: the actual values and the predicted values, in that order. We passed both to it above. Here's the result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/1_0xW0Hg0DXj5vhFJoAGC_nw-1.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>So we can say that our model explains about 82% of the variance in the actual values.</p>
<h3 id="heading-mean-absolute-error-mae">Mean Absolute Error (MAE)</h3>
<p>The MAE is defined as the sum of the absolute values of all the distances/residuals (the differences between the actual and predicted values), divided by the total number of points in the dataset.</p>
<p>In other words, it is the average absolute distance between our model's predictions and the actual values.</p>
<p>You can calculate the MAE using the following formula:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/1_tu6FSDz_FhQbR3UHQIaZNg.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We can see that the above formula wraps each residual in a pair of vertical bars, which is the absolute value symbol. The absolute value converts a negative residual (which occurs when the predicted value is greater than the actual value) to a positive one, so that it doesn’t cancel out other positive residuals.</p>
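<p>The effect of the absolute value is easy to see in code. This minimal sketch, with hypothetical values, computes MAE by hand and compares it with scikit-learn's <code>mean_absolute_error</code>. It also shows why the absolute value matters: without it, the positive and negative residuals cancel out.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical actual and predicted values
actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 37.0])

residuals = actual - predicted  # [-2, 2, -3, 3]
print("mean of raw residuals:", residuals.mean())  # signs cancel out
print("MAE (manual):", np.mean(np.abs(residuals)))
print("MAE (sklearn):", mean_absolute_error(actual, predicted))
</code></pre>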
<h4 id="heading-when-to-use-mae">When to Use MAE</h4>
<p>If you want to know the model’s average absolute distance when making a prediction, you can use MAE. In other words, MAE tells you how close the predictions are to the actual values on average.</p>
<p>Just keep in mind that lower MAE values indicate that the predictions are closer to the actual values, while larger MAE values indicate that the model is poorer at prediction. MAE is expressed in the same units as the target variable, so judge it relative to the scale of your data.</p>
<p>Let’s now see how to implement MAE in Python. We will be working with the previous dataset we used to find the r2_score.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/1_mzvi2wZRSVv5W0pPmod3ag.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>To implement the MAE in Python, we'll use the <code>mean_absolute_error</code> function from Scikit-Learn's <code>sklearn.metrics</code> module.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> mean_absolute_error
score = mean_absolute_error(data[<span class="hljs-string">"Actual Value"</span>], data[<span class="hljs-string">"Preds"</span>])
print(<span class="hljs-string">"The Mean Absolute Error of our Model is {}"</span>.format(round(score, <span class="hljs-number">2</span>)))
</code></pre>
<p>MAE also requires two parameters, the actual value and the predicted value.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/1_muu_mmrUYI6YFn2_LnD8Rw.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h3 id="heading-root-mean-squared-error-rmse">Root Mean Squared Error (RMSE)</h3>
<p>Another commonly used metric is the root mean squared error, which is the square root of the average squared distance (difference between actual and predicted value).</p>
<p>In formula form, RMSE is the square root of the sum of the squared distances divided by the total number of points:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/0_2IuTz3Tr_dYNc6Df.png" alt="Image" width="600" height="400" loading="lazy"></p>
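<p>Following that formula, this minimal sketch with hypothetical values squares each residual, averages the squares, and takes the square root. It checks the result against <code>np.sqrt</code> of scikit-learn's <code>mean_squared_error</code>.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values
actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 37.0])

squared_residuals = (actual - predicted) ** 2  # [4, 4, 9, 9]
rmse_manual = np.sqrt(squared_residuals.mean())
rmse_sklearn = np.sqrt(mean_squared_error(actual, predicted))

print("RMSE (manual):", round(rmse_manual, 4))
print("RMSE (sklearn):", round(rmse_sklearn, 4))
</code></pre>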
<p>RMSE functions similarly to MAE (that is, you use it to determine how close the prediction is to the actual value on average), but with a minor difference.</p>
<p>Because each residual is squared before it's averaged, RMSE gives more weight to large errors. You can use it to check for large errors caused by the model overestimating the prediction (that is, predicting values significantly higher than the actual value) or underestimating it (that is, predicting values significantly lower than the actual value).</p>
<h4 id="heading-when-to-use-rmse">When to Use RMSE</h4>
<p>If you are concerned about large errors, RMSE is a good metric to use. Because the residuals are squared, points where the model badly overestimated or underestimated the prediction produce a large squared error and noticeably raise the RMSE.</p>
<p>RMSE is a popular evaluation metric for regression problems. It not only calculates how close the predictions are to the actual values on average, but also penalizes large errors more heavily than small ones.</p>
<p>Let’s take a look at how you can implement RMSE in Python.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/1_mzvi2wZRSVv5W0pPmod3ag-2.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Older versions of Scikit-learn don't include a dedicated RMSE function, but they do include <code>mean_squared_error</code>. The square root of the mean squared error is the RMSE. Scikit-learn 1.4 and later also provide a <code>root_mean_squared_error</code> function.</p>
<p>To get the RMSE, we can use NumPy's <code>np.sqrt</code> function to take the square root of the mean squared error. The result is our RMSE.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> mean_squared_error
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-comment"># RMSE is the square root of the mean squared error (not the mean absolute error)</span>
score = np.sqrt(mean_squared_error(data[<span class="hljs-string">"Actual Value"</span>], data[<span class="hljs-string">"Preds"</span>]))
print(<span class="hljs-string">"The Root Mean Squared Error of our Model is {}"</span>.format(round(score, <span class="hljs-number">2</span>)))
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/1_URsnCspxUYxXV5vlacxcew.png" alt="Image" width="600" height="400" loading="lazy"></p>
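<p>We can see that the RMSE value is larger than the MAE. RMSE is always greater than or equal to MAE. The bigger the gap between them, the more the error sizes vary, which here points to some large errors in the dataset.</p>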
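<p>To see why a gap between RMSE and MAE hints at large errors, compare two hypothetical sets of residuals with the same MAE. In the first set, every error is 2. In the second, three predictions are perfect and one is off by 8. The small <code>mae</code> and <code>rmse</code> helpers are written here for illustration.</p>
<pre><code class="lang-python">import numpy as np

def mae(residuals):
    return np.mean(np.abs(residuals))

def rmse(residuals):
    return np.sqrt(np.mean(np.square(residuals)))

# Hypothetical residuals: same total absolute error, spread differently
even_errors = np.array([2.0, -2.0, 2.0, -2.0])
one_big_error = np.array([0.0, 0.0, 0.0, 8.0])

for name, res in [("even errors", even_errors), ("one big error", one_big_error)]:
    print(f"{name}: MAE = {mae(res):.1f}, RMSE = {rmse(res):.1f}")
</code></pre>
<p>Both sets have an MAE of 2. The RMSE, however, doubles when the same total error is concentrated in a single point.</p>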
<h2 id="heading-conclusion-and-learning-more">Conclusion and Learning More</h2>
<p>In this tutorial you’ve learned some of the top evaluation metrics for regression problems that you will use on a daily basis.</p>
<p>Thank you for reading. Here are some helpful resources:</p>
<p><a target="_blank" href="https://scikit-learn.org/stable/modules/model_evaluation.html">https://scikit-learn.org/stable/modules/model_evaluation.html</a></p>
<p><a target="_blank" href="https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d">MAE and RMSE — Which Metric is Better? | by JJ | Human in a Machine World | Medium</a></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Least Squares Regression Method – How to Find the Line of Best Fit ]]>
                </title>
                <description>
                    <![CDATA[ By Diogo Spínola Would you like to know how to predict the future with a simple formula and some data? There are multiple ways to tackle the problem of attempting to predict the future. But we're going to look into the theory of how we could do it wi... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-least-squares-regression-method-explained/</link>
                <guid isPermaLink="false">66d45e057df3a1f32ee7f7ff</guid>
                
                    <category>
                        <![CDATA[ Advanced Mathematics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Math ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Mathematics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #Regression ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Tue, 08 Sep 2020 18:59:02 +0000</pubDate>
                <media:content url="https://cdn-media-2.freecodecamp.org/w1280/5f9c98d1740569d1a4ca1c34.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Diogo Spínola</p>
<p>Would you like to know how to predict the future with a simple formula and some data?</p>
<p>There are multiple ways to tackle the problem of attempting to predict the future. But we're going to look into the theory of how we could do it with the formula <strong>Y = a + b * X</strong>.</p>
<p>After we cover the theory we're going to be creating a JavaScript project. This will help us more easily visualize the formula in action using <a target="_blank" href="https://www.chartjs.org/">Chart.js</a> to represent the data.</p>
<h2 id="heading-what-is-the-least-squares-regression-method-and-why-use-it">What is the Least Squares Regression method and why use it?</h2>
<p>Least squares is a method for fitting a linear regression. It helps us predict results based on an existing set of data, and it can help us spot anomalies in our data (points that lie far from the fitted line). Anomalies are values that are too good, or bad, to be true or that represent rare cases.</p>
<p>For example, say we have a list of how many topics future engineers here at freeCodeCamp can solve if they invest 1, 2, or 3 hours continuously. Then we can predict how many topics will be covered after 4 hours of continuous study even without that data being available to us.</p>
<p>This method is used by a multitude of professionals, for example statisticians, accountants, managers, and engineers (like in machine learning problems).</p>
<h2 id="heading-setting-up-an-example">Setting up an example</h2>
<p>Before we jump into the formula and code, let's define the data we're going to use.</p>
<p>To do that let's expand on the example mentioned earlier. </p>
<p>Let's assume that our objective is to figure out how many topics are covered by a student per hour of learning. </p>
<p>Each pair (X, Y) will represent a student. Since we all have different rates of learning, the number of topics solved can be higher or lower for the same time invested.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Hours (X)</td><td>Topics Solved (Y)</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>1.5</td></tr>
<tr>
<td>1.2</td><td>2</td></tr>
<tr>
<td>1.5</td><td>3</td></tr>
<tr>
<td>2</td><td>1.8</td></tr>
<tr>
<td>2.3</td><td>2.7</td></tr>
<tr>
<td>2.5</td><td>4.7</td></tr>
<tr>
<td>2.7</td><td>7.1</td></tr>
<tr>
<td>3</td><td>10</td></tr>
<tr>
<td>3.1</td><td>6</td></tr>
<tr>
<td>3.2</td><td>5</td></tr>
<tr>
<td>3.6</td><td>8.9</td></tr>
</tbody>
</table>
</div><p>You can read it like this: "Someone spent 1 hour and solved 1.5 topics" or "One student after 3 hours solved 10 topics".</p>
<p>In a graph these points look like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/image-8.png" alt="Image" width="600" height="400" loading="lazy">
<em>Each point is a student (X, Y) and how long it took that specific student to complete a certain number of topics</em></p>
<p><strong>Disclaimer:</strong> This data is fictional and was made by hitting random keys. I have no idea of the actual values.</p>
<h2 id="heading-the-formula">The formula</h2>
<blockquote>
<p><strong>Y = a + bX</strong></p>
</blockquote>
<p>The formula, for those unfamiliar with it, probably looks underwhelming – even more so given the fact that we already have the values for <strong>Y</strong> and <strong>X</strong> in our example.</p>
<p>Having said that, and now that we're not scared by the formula, we just need to figure out the <strong>a</strong> and <strong>b</strong> values.</p>
<p>To give some context as to what they mean:</p>
<ul>
<li><strong>a</strong> is the intercept, in other words the value of <strong>Y</strong> that the line predicts when <strong>X</strong> is 0. One hour is the least amount of time we're going to accept into our example data set, so here <strong>a</strong> mainly anchors the line rather than describing a real student.</li>
<li><strong>b</strong> is the slope or coefficient, in other words the number of additional topics solved for each additional hour (<strong>X</strong>) spent studying. As hours (<strong>X</strong>) increase, the predicted number of topics grows by <strong>b</strong> for every extra hour.</li>
</ul>
<h2 id="heading-calculating-b">Calculating "b"</h2>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/08/image-50.png" alt="Image" width="600" height="400" loading="lazy">
<em>Looks scarier than it is</em></p>
<p><strong>X</strong> and <strong>Y</strong> are our positions from our earlier table. When they have a <strong>-</strong> (macron) above them, it means we should use the average which we obtain by summing them all up and dividing by the total amount:</p>
<p><strong>x̄</strong> -&gt; (1+1.2+1.5+2+2.3+2.5+2.7+3+3.1+3.2+3.6) / 11 = 26.1 / 11 = <strong>2.37</strong></p>
<p><strong>ȳ</strong> -&gt; (1.5+2+3+1.8+2.7+4.7+7.1+10+6+5+8.9) / 11 = 52.7 / 11 = <strong>4.79</strong></p>
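<p>The same averages can be computed in JavaScript. This is a minimal sketch: the <em>hours</em> and <em>topics</em> arrays simply hold the X and Y values from our table, and <em>mean</em> is a helper name chosen for this example:</p>

```javascript
// Hours studied (X) and topics solved (Y) from the example table
const hours = [1, 1.2, 1.5, 2, 2.3, 2.5, 2.7, 3, 3.1, 3.2, 3.6];
const topics = [1.5, 2, 3, 1.8, 2.7, 4.7, 7.1, 10, 6, 5, 8.9];

// The average is the sum of all values divided by how many values there are
function mean(values) {
  const sum = values.reduce((acc, value) => acc + value, 0);
  return sum / values.length;
}

const xBar = mean(hours);
const yBar = mean(topics);

console.log(xBar.toFixed(2)); // "2.37"
console.log(yBar.toFixed(2)); // "4.79"
```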
<p>Now that we have the averages, we can expand our table to include the new results:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Hours (X)</td><td>Topics Solved (Y)</td><td>(X - x̄)</td><td>(Y - ȳ)</td><td>(X - x̄)*(Y - ȳ)</td><td>(X - x̄)²</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>1.5</td><td>-1.37</td><td>-3.29</td><td>4.51</td><td>1.88</td></tr>
<tr>
<td>1.2</td><td>2</td><td>-1.17</td><td>-2.79</td><td>3.26</td><td>1.37</td></tr>
<tr>
<td>1.5</td><td>3</td><td>-0.87</td><td>-1.79</td><td>1.56</td><td>0.76</td></tr>
<tr>
<td>2</td><td>1.8</td><td>-0.37</td><td>-2.99</td><td>1.11</td><td>0.14</td></tr>
<tr>
<td>2.3</td><td>2.7</td><td>-0.07</td><td>-2.09</td><td>0.15</td><td>0.00</td></tr>
<tr>
<td>2.5</td><td>4.7</td><td>0.13</td><td>-0.09</td><td>-0.01</td><td>0.02</td></tr>
<tr>
<td>2.7</td><td>7.1</td><td>0.33</td><td>2.31</td><td>0.76</td><td>0.11</td></tr>
<tr>
<td>3</td><td>10</td><td>0.63</td><td>5.21</td><td>3.28</td><td>0.40</td></tr>
<tr>
<td>3.1</td><td>6</td><td>0.73</td><td>1.21</td><td>0.88</td><td>0.53</td></tr>
<tr>
<td>3.2</td><td>5</td><td>0.83</td><td>0.21</td><td>0.17</td><td>0.69</td></tr>
<tr>
<td>3.6</td><td>8.9</td><td>1.23</td><td>4.11</td><td>5.06</td><td>1.51</td></tr>
</tbody>
</table>
</div><p>The sigma symbol (<strong>∑</strong>) tells us to add up every value in a column:</p>
<p><strong>∑(X - x̄)*(Y - ȳ)</strong> -&gt; 4.51+3.26+1.56+1.11+0.15-0.01+0.76+3.28+0.88+0.17+5.06 = <strong>20.73</strong></p>
<p><strong>∑(X - x̄)²</strong> -&gt; 1.88+1.37+0.76+0.14+0.00+0.02+0.11+0.40+0.53+0.69+1.51 = <strong>7.41</strong></p>
<p>Finally, we divide the first sum by the second: <strong>20.73 / 7.41</strong> ≈ 2.80, so <strong>b = 2.8</strong></p>
<p><strong>Note:</strong> Some expression calculators, like the one available in Ubuntu, return -4 for -2² instead of 4, because the exponent is applied before the minus sign. To square a negative number, enter (-2)².</p>
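<p>Here is a sketch of the same calculation for <strong>b</strong> in JavaScript, again assuming our example data sits in two arrays and using helper names of our own choosing. Because it keeps full precision instead of rounding every intermediate value to two decimals, it gives about 2.8003 rather than 2.7976, which still rounds to 2.80:</p>

```javascript
const hours = [1, 1.2, 1.5, 2, 2.3, 2.5, 2.7, 3, 3.1, 3.2, 3.6];
const topics = [1.5, 2, 3, 1.8, 2.7, 4.7, 7.1, 10, 6, 5, 8.9];

function mean(values) {
  return values.reduce((acc, value) => acc + value, 0) / values.length;
}

// b = Σ(X - x̄)(Y - ȳ) / Σ(X - x̄)²
function slope(xs, ys) {
  const xBar = mean(xs);
  const yBar = mean(ys);

  // Dividend: sum of the products of each pair's distances from the averages
  const dividend = xs.reduce((acc, x, i) => acc + (x - xBar) * (ys[i] - yBar), 0);
  // Divisor: sum of the squared distances of X from its average.
  // (x - xBar) ** 2 squares the whole difference, so it is never negative.
  const divisor = xs.reduce((acc, x) => acc + (x - xBar) ** 2, 0);

  return dividend / divisor;
}

const b = slope(hours, topics);
console.log(b.toFixed(4)); // "2.8003"
console.log(b.toFixed(2)); // "2.80"
```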
<h2 id="heading-calculating-a">Calculating "a"</h2>
<p>All that is left is <strong>a</strong>. The regression line always passes through the point of averages (<strong>x̄</strong>, <strong>ȳ</strong>), so <strong>ȳ = a + b x̄</strong>. We've already obtained all the other values, so we can substitute them and solve for <strong>a</strong>:</p>
<ul>
<li>4.79 = <strong>a</strong> + 2.8*2.37</li>
<li>4.79 = <strong>a</strong> + 6.64</li>
<li><strong>a</strong> = -6.64+4.79</li>
<li><strong>a = -1.85</strong></li>
</ul>
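<p>In code, solving <strong>ȳ = a + b x̄</strong> for <strong>a</strong> is a single subtraction. This sketch plugs in the rounded values from our manual calculation; the <em>intercept</em> function name is our own:</p>

```javascript
// Values from the manual calculation above, rounded to two decimals
const xBar = 2.37;
const yBar = 4.79;
const b = 2.8;

// The line passes through (x̄, ȳ): ȳ = a + b * x̄, so a = ȳ - b * x̄
function intercept(meanX, meanY, slope) {
  return meanY - slope * meanX;
}

const a = intercept(xBar, yBar, b);
console.log(a.toFixed(2)); // "-1.85"
```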
<h2 id="heading-the-result">The result</h2>
<p>Our final formula becomes:</p>
<blockquote>
<p><strong>Y = -1.85 + 2.8*X</strong></p>
</blockquote>
<p>Now we replace the <strong>X</strong> in our formula with each value that we have:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Hours (X)</td><td>-1.85 + 2.8 * X</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>0.95</td></tr>
<tr>
<td>1.2</td><td>1.51</td></tr>
<tr>
<td>1.5</td><td>2.35</td></tr>
<tr>
<td>2</td><td>3.75</td></tr>
<tr>
<td>2.3</td><td>4.59</td></tr>
<tr>
<td>2.5</td><td>5.15</td></tr>
<tr>
<td>2.7</td><td>5.71</td></tr>
<tr>
<td>3</td><td>6.55</td></tr>
<tr>
<td>3.1</td><td>6.83</td></tr>
<tr>
<td>3.2</td><td>7.11</td></tr>
<tr>
<td>3.6</td><td>8.23</td></tr>
</tbody>
</table>
</div><p>Which is a graph that looks something like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/image-9.png" alt="Image" width="600" height="400" loading="lazy">
<em>We now have a line that represents how many topics we expect to be solved for each hour of study</em></p>
<p>If we want to predict how many topics a student will solve with 8 hours of study, we substitute <strong>X</strong> = 8 into our formula:</p>
<ul>
<li><strong>Y = -1.85 + 2.8*8</strong></li>
<li><strong>Y = 20.55</strong></li>
</ul>
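<p>Applying the final formula is easy to express in code too. This small sketch uses the rounded coefficients <strong>a = -1.85</strong> and <strong>b = 2.8</strong>; <em>predict</em> is a name chosen for this example. It reproduces the table of fitted values and the 8-hour prediction:</p>

```javascript
// Coefficients from our final formula: Y = -1.85 + 2.8 * X
const a = -1.85;
const b = 2.8;

// Predicted number of topics solved for a given number of study hours
function predict(hours) {
  return a + b * hours;
}

const hours = [1, 1.2, 1.5, 2, 2.3, 2.5, 2.7, 3, 3.1, 3.2, 3.6];

// Fitted values for the hours in our data set, rounded to two decimals
const fitted = hours.map((x) => Number(predict(x).toFixed(2)));
console.log(fitted);

// Prediction for 8 hours of study
console.log(predict(8).toFixed(2)); // "20.55"
```

<p>Keep in mind that 8 hours lies well outside the 1 to 3.6 hours present in our data, so this prediction is an extrapolation.</p>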
<p>And on a graph we can see:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/image-10.png" alt="Image" width="600" height="400" loading="lazy">
<em>The further we move beyond our data, the less accurate we should expect the predictions to be</em></p>
<h2 id="heading-limitations">Limitations</h2>
<p>Always bear in mind the limitations of a method. This will hopefully help you avoid incorrect results.</p>
<p>And this method, like any other, has its limitations. Here are a couple:</p>
<ul>
<li>It doesn't take into account the complexity of the topics solved. A topic covered at the start of the "<a target="_blank" href="https://www.freecodecamp.org/learn/responsive-web-design/basic-html-and-html5/">Responsive Web Design Certification</a>" will most likely take less time to learn and solve than one of the final projects. So if our data comes from students at different points of a course, the predictions won't be accurate.</li>
<li>It's impossible for someone to study 240 hours continuously, or to solve more topics than are available. The formula will still predict values for those cases, but at that point its results are meaningless, because they describe something that can't happen.</li>
</ul>
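<p>One way to keep the second limitation in view is to flag predictions whose <strong>X</strong> falls outside the range of hours we actually observed. The sketch below uses a hypothetical helper, <em>predictWithRangeCheck</em>, which is not part of the project we're about to build. It only flags the case; deciding what to do about it is still up to you:</p>

```javascript
// Hypothetical helper: predicts Y = a + b * X and flags extrapolation,
// meaning X lies outside the range of hours present in our data set
function predictWithRangeCheck(a, b, observedHours, x) {
  const min = Math.min(...observedHours);
  const max = Math.max(...observedHours);

  return {
    prediction: a + b * x,
    extrapolated: x < min || x > max,
  };
}

const hours = [1, 1.2, 1.5, 2, 2.3, 2.5, 2.7, 3, 3.1, 3.2, 3.6];

console.log(predictWithRangeCheck(-1.85, 2.8, hours, 2.5).extrapolated); // false
console.log(predictWithRangeCheck(-1.85, 2.8, hours, 240).extrapolated); // true
```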
<h2 id="heading-example-javascript-project">Example JavaScript Project</h2>
<p>We don't have to do all of this by hand. Let's create a small project where we input X and Y values; it draws a graph with those points and applies the linear regression formula.</p>
<p>The project folder will have the following contents:</p>
<pre><code>src/
  |-public <span class="hljs-comment">// folder with the content that we will feed to the browser</span>
    |-index.html
    |-style.css
    |-least-squares.js
  package.json
  server.js <span class="hljs-comment">// our Node.js server</span>
</code></pre><p>And <strong>package.json</strong>:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"least-squares-regression"</span>,
  <span class="hljs-attr">"version"</span>: <span class="hljs-string">"1.0.0"</span>,
  <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Visualize linear least squares"</span>,
  <span class="hljs-attr">"main"</span>: <span class="hljs-string">"server.js"</span>,
  <span class="hljs-attr">"scripts"</span>: {
    <span class="hljs-attr">"start"</span>: <span class="hljs-string">"node server.js"</span>,
    <span class="hljs-attr">"server-debug"</span>: <span class="hljs-string">"nodemon --inspect server.js"</span>
  },
  <span class="hljs-attr">"author"</span>: <span class="hljs-string">"daspinola"</span>,
  <span class="hljs-attr">"license"</span>: <span class="hljs-string">"MIT"</span>,
  <span class="hljs-attr">"devDependencies"</span>: {
    <span class="hljs-attr">"nodemon"</span>: <span class="hljs-string">"2.0.4"</span>
  },
  <span class="hljs-attr">"dependencies"</span>: {
    <span class="hljs-attr">"express"</span>: <span class="hljs-string">"4.17.1"</span>
  }
}
</code></pre>
<p>Once we have package.json in place and run <em>npm install</em>, Express and nodemon will be available. You can swap them for other tools if you prefer; I use these for convenience.</p>
<p>In <strong>server.js</strong>:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> express = <span class="hljs-built_in">require</span>(<span class="hljs-string">'express'</span>)
<span class="hljs-keyword">const</span> path = <span class="hljs-built_in">require</span>(<span class="hljs-string">'path'</span>)

<span class="hljs-keyword">const</span> app = express()

app.use(express.static(path.join(__dirname, <span class="hljs-string">'public'</span>)))

app.get(<span class="hljs-string">'/'</span>, <span class="hljs-function"><span class="hljs-keyword">function</span>(<span class="hljs-params">req, res</span>) </span>{
  res.sendFile(path.join(__dirname, <span class="hljs-string">'public/index.html'</span>))
})

app.listen(<span class="hljs-number">5000</span>, <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params"></span>) </span>{
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Listening on port 5000!'</span>)
})
</code></pre>
<p>This tiny server lets us access our page by visiting <em>localhost:5000</em> in the browser. Before we run it, let's create the remaining files:</p>
<p><strong>public/index.html</strong></p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">html</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>Least Squares Regression<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"https://cdn.jsdelivr.net/npm/chart.js@2.9.3/dist/Chart.min.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"stylesheet"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"style.css"</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"container"</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"left-half"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"number"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"input-x"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"X"</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"number"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"input-y"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Y"</span>&gt;</span>

          <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn-update-graph"</span>&gt;</span>Add<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span> 
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">span</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"span-formula"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">span</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">table</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"table-pairs"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">thead</span>&gt;</span>
              <span class="hljs-tag">&lt;<span class="hljs-name">th</span>&gt;</span>
                X
              <span class="hljs-tag">&lt;/<span class="hljs-name">th</span>&gt;</span>
              <span class="hljs-tag">&lt;<span class="hljs-name">th</span>&gt;</span>
                Y
              <span class="hljs-tag">&lt;/<span class="hljs-name">th</span>&gt;</span>
            <span class="hljs-tag">&lt;/<span class="hljs-name">thead</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">tbody</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">tbody</span>&gt;</span>
          <span class="hljs-tag">&lt;/<span class="hljs-name">table</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"right-half"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"myChart"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"/js/least-squares.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<p>We create our elements:</p>
<ul>
<li>Two inputs for our pairs, one for X and one for Y</li>
<li>A button to add those values to a table</li>
<li>A span to show the current formula as values are added</li>
<li>A table to show the pairs we've been adding</li>
<li>And a canvas for our chart</li>
</ul>
<p>We also load the <a target="_blank" href="https://www.chartjs.org/">Chart.js</a> library from a CDN and link our CSS and JavaScript files.</p>
<p><strong>public/style.css</strong></p>
<pre><code class="lang-css"><span class="hljs-selector-class">.container</span> {
  <span class="hljs-attribute">display</span>: grid; 
}

<span class="hljs-selector-class">.left-half</span> {
  <span class="hljs-attribute">grid-column</span>: <span class="hljs-number">1</span>;
}

<span class="hljs-selector-class">.right-half</span> {
  <span class="hljs-attribute">grid-column</span>: <span class="hljs-number">2</span>;
}
</code></pre>
<p>We add a few CSS grid rules so our inputs and table sit on the left and our graph on the right.</p>
<p><strong>public/least-squares.js</strong></p>
<pre><code class="lang-js"><span class="hljs-built_in">document</span>.addEventListener(<span class="hljs-string">'DOMContentLoaded'</span>, init, <span class="hljs-literal">false</span>);

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">init</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> currentData = {
    <span class="hljs-attr">pairs</span>: [],
    <span class="hljs-attr">slope</span>: <span class="hljs-number">0</span>,
    <span class="hljs-attr">coeficient</span>: <span class="hljs-number">0</span>,
    <span class="hljs-attr">line</span>: [],
  };

  <span class="hljs-keyword">const</span> chart = initChart();
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">initChart</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> ctx = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'myChart'</span>).getContext(<span class="hljs-string">'2d'</span>);

  <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Chart(ctx, {
    <span class="hljs-attr">type</span>: <span class="hljs-string">'scatter'</span>,
    <span class="hljs-attr">data</span>: {
      <span class="hljs-attr">datasets</span>: [{
        <span class="hljs-attr">label</span>: <span class="hljs-string">'Scatter Dataset'</span>,
        <span class="hljs-attr">backgroundColor</span>: <span class="hljs-string">'rgb(125,67,120)'</span>,
        <span class="hljs-attr">data</span>: [],
      }, {
        <span class="hljs-attr">label</span>: <span class="hljs-string">'Line Dataset'</span>,
        <span class="hljs-attr">fill</span>: <span class="hljs-literal">false</span>,
        <span class="hljs-attr">data</span>: [],
        <span class="hljs-attr">type</span>: <span class="hljs-string">'line'</span>,
      }],
    },
    <span class="hljs-attr">options</span>: {
      <span class="hljs-attr">scales</span>: {
        <span class="hljs-attr">xAxes</span>: [{
          <span class="hljs-attr">type</span>: <span class="hljs-string">'linear'</span>,
          <span class="hljs-attr">position</span>: <span class="hljs-string">'bottom'</span>,
          <span class="hljs-attr">display</span>: <span class="hljs-literal">true</span>,
          <span class="hljs-attr">scaleLabel</span>: {
            <span class="hljs-attr">display</span>: <span class="hljs-literal">true</span>,
            <span class="hljs-attr">labelString</span>: <span class="hljs-string">'(X)'</span>,
          },
        }],
        <span class="hljs-attr">yAxes</span>: [{
          <span class="hljs-attr">type</span>: <span class="hljs-string">'linear'</span>,
          <span class="hljs-attr">position</span>: <span class="hljs-string">'left'</span>,
          <span class="hljs-attr">display</span>: <span class="hljs-literal">true</span>,
          <span class="hljs-attr">scaleLabel</span>: {
            <span class="hljs-attr">display</span>: <span class="hljs-literal">true</span>,
            <span class="hljs-attr">labelString</span>: <span class="hljs-string">'(Y)'</span>,
          },
        }],
      },
    },
  });
}
</code></pre>
<p>And finally, we initialize our graph. At the start, it should be empty since we haven't added any data to it just yet.</p>
<p>Now if we run <em>npm run server-debug</em> and open <em>localhost:5000</em> in our browser, we should see something like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/image-12.png" alt="Image" width="600" height="400" loading="lazy">
<em>Our inputs on the left with an Add button and our table with just the X and Y headers; on the right, an empty graph</em></p>
<h2 id="heading-adding-functionality">Adding functionality</h2>
<p>The next step is to make the "Add" button do something. In our case we want to achieve:</p>
<ul>
<li>Add the X and Y values to the table</li>
<li>Update the formula when we add more than one pair (we need at least 2 pairs to create a line)</li>
<li>Update the graph with the points and the line</li>
<li>Clear the inputs, so it's easier to keep entering data</li>
</ul>
<h3 id="heading-add-the-values-to-the-table">Add the values to the table</h3>
<p><strong>public/least-squares.js</strong></p>
<pre><code class="lang-js"><span class="hljs-built_in">document</span>.addEventListener(<span class="hljs-string">'DOMContentLoaded'</span>, init, <span class="hljs-literal">false</span>);

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">init</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> currentData = {
    <span class="hljs-attr">pairs</span>: [],
    <span class="hljs-attr">slope</span>: <span class="hljs-number">0</span>,
    <span class="hljs-attr">coeficient</span>: <span class="hljs-number">0</span>,
    <span class="hljs-attr">line</span>: [],
  };
  <span class="hljs-keyword">const</span> btnUpdateGraph = <span class="hljs-built_in">document</span>.querySelector(<span class="hljs-string">'.btn-update-graph'</span>);
  <span class="hljs-keyword">const</span> tablePairs = <span class="hljs-built_in">document</span>.querySelector(<span class="hljs-string">'.table-pairs'</span>);
  <span class="hljs-keyword">const</span> spanFormula = <span class="hljs-built_in">document</span>.querySelector(<span class="hljs-string">'.span-formula'</span>);

  <span class="hljs-keyword">const</span> inputX = <span class="hljs-built_in">document</span>.querySelector(<span class="hljs-string">'.input-x'</span>);
  <span class="hljs-keyword">const</span> inputY = <span class="hljs-built_in">document</span>.querySelector(<span class="hljs-string">'.input-y'</span>);

  <span class="hljs-keyword">const</span> chart = initChart();

  btnUpdateGraph.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> x = <span class="hljs-built_in">parseFloat</span>(inputX.value);
    <span class="hljs-keyword">const</span> y = <span class="hljs-built_in">parseFloat</span>(inputY.value);

    updateTable(x, y);
  });

  <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">updateTable</span>(<span class="hljs-params">x, y</span>) </span>{
    <span class="hljs-keyword">const</span> tr = <span class="hljs-built_in">document</span>.createElement(<span class="hljs-string">'tr'</span>);
    <span class="hljs-keyword">const</span> tdX = <span class="hljs-built_in">document</span>.createElement(<span class="hljs-string">'td'</span>);
    <span class="hljs-keyword">const</span> tdY = <span class="hljs-built_in">document</span>.createElement(<span class="hljs-string">'td'</span>);

    tdX.innerHTML = x;
    tdY.innerHTML = y;

    tr.appendChild(tdX);
    tr.appendChild(tdY);

    tablePairs.querySelector(<span class="hljs-string">'tbody'</span>).appendChild(tr);
  }
}

<span class="hljs-comment">// ... rest of the code as it was</span>
</code></pre>
<p>We get all of the elements we will use shortly and add a click event listener to the "Add" button. That listener grabs the current values and adds them to our table.</p>
<p>We need to parse the values because input fields give us strings, even with <em>type="number"</em>. This will be important in the next step, when we apply the formula: with strings, <strong>+</strong> would join the values instead of adding them.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/image-13.png" alt="Image" width="600" height="400" loading="lazy">
<em>When we press add we should see the pairs on the table</em></p>
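<p>To see why parsing matters, and how invalid input could be handled, here is a small sketch. The <em>parsePair</em> helper is hypothetical and not part of the project code. As written, our click handler would add a pair containing <em>NaN</em> if an input were left empty, and a check like this one would let it skip that pair instead:</p>

```javascript
// Input values are strings, so + joins them instead of adding them
const joined = '1.5' + '2';                        // the string "1.52"
const added = parseFloat('1.5') + parseFloat('2'); // the number 3.5

// Hypothetical helper: parses both values and returns null when either
// one is not a number (for example, an empty input parses to NaN)
function parsePair(rawX, rawY) {
  const x = parseFloat(rawX);
  const y = parseFloat(rawY);

  if (Number.isNaN(x) || Number.isNaN(y)) {
    return null;
  }
  return { x, y };
}

const valid = parsePair('2.5', '4.7'); // { x: 2.5, y: 4.7 }
const invalid = parsePair('', '4.7');  // null

console.log(joined, added, valid, invalid);
```

<p>Inside the click handler, you could call <em>parsePair(inputX.value, inputY.value)</em> and return early when the result is <em>null</em>, before updating the table.</p>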
<h3 id="heading-make-the-calculations">Make the calculations</h3>
<p>All the math we were talking about earlier (getting the average of <strong>X</strong> and <strong>Y</strong>, calculating <strong>b</strong>, and calculating <strong>a</strong>) should now be turned into code. We will also display the <strong>a</strong> and <strong>b</strong> values so we see them changing as we add values.</p>
<p><strong>public/least-squares.js</strong></p>
<pre><code class="lang-js"><span class="hljs-comment">// ... rest of the code as it was</span>

btnUpdateGraph.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">const</span> x = <span class="hljs-built_in">parseFloat</span>(inputX.value);
  <span class="hljs-keyword">const</span> y = <span class="hljs-built_in">parseFloat</span>(inputY.value);

  updateTable(x, y);
  updateFormula(x, y);
});

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">updateFormula</span>(<span class="hljs-params">x, y</span>) </span>{
  currentData.pairs.push({ x, y });
  <span class="hljs-keyword">const</span> pairsAmount = currentData.pairs.length;

  <span class="hljs-keyword">const</span> sum = currentData.pairs.reduce(<span class="hljs-function">(<span class="hljs-params">acc, pair</span>) =&gt;</span> ({
    <span class="hljs-attr">x</span>: acc.x + pair.x,
    <span class="hljs-attr">y</span>: acc.y + pair.y,
  }), { <span class="hljs-attr">x</span>: <span class="hljs-number">0</span>, <span class="hljs-attr">y</span>: <span class="hljs-number">0</span> });

  <span class="hljs-keyword">const</span> average = {
    <span class="hljs-attr">x</span>: sum.x / pairsAmount,
    <span class="hljs-attr">y</span>: sum.y / pairsAmount,
  };

  <span class="hljs-keyword">const</span> slopeDividend = currentData.pairs
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc, pair</span>) =&gt;</span> acc + ((pair.x - average.x) * (pair.y - average.y)), <span class="hljs-number">0</span>);
  <span class="hljs-keyword">const</span> slopeDivisor = currentData.pairs
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc, pair</span>) =&gt;</span> acc + (pair.x - average.x) ** <span class="hljs-number">2</span>, <span class="hljs-number">0</span>);

  <span class="hljs-keyword">const</span> slope = slopeDivisor !== <span class="hljs-number">0</span>
    ? <span class="hljs-built_in">parseFloat</span>((slopeDividend / slopeDivisor).toFixed(<span class="hljs-number">2</span>))
    : <span class="hljs-number">0</span>;

  <span class="hljs-keyword">const</span> coeficient = <span class="hljs-built_in">parseFloat</span>(
    (-(slope * average.x) + average.y).toFixed(<span class="hljs-number">2</span>),
  );

  currentData.line = currentData.pairs
    .map(<span class="hljs-function">(<span class="hljs-params">pair</span>) =&gt;</span> ({
      <span class="hljs-attr">x</span>: pair.x,
      <span class="hljs-attr">y</span>: <span class="hljs-built_in">parseFloat</span>((coeficient + (slope * pair.x)).toFixed(<span class="hljs-number">2</span>)),
    }));

  spanFormula.innerHTML = <span class="hljs-string">`Formula: Y = <span class="hljs-subst">${coeficient}</span> + <span class="hljs-subst">${slope}</span> * X`</span>;
}

<span class="hljs-comment">// ... rest of the code as it was</span>
</code></pre>
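<p>There isn't much to say about this code, since it's the theory we went through earlier. We loop through the pairs to get the sums and averages, then compute the slope (<strong>b</strong>, stored as <em>slope</em>) and the intercept (<strong>a</strong>, stored as <em>coeficient</em>). Both are rounded to two decimals, as in our manual calculation. If the slope divisor is 0 (for example, with a single pair), we fall back to a slope of 0 to avoid dividing by zero.</p>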
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/image-15.png" alt="Image" width="600" height="400" loading="lazy">
<em>The span so we can display the formula and see it change as we add values</em></p>
<p>We now have the <em>pairs</em> and <em>line</em> in the <em>currentData</em> variable, so we can use them in the next step to update our chart.</p>
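<p>Because <em>updateFormula</em> mixes the math with DOM updates, it can only run in the browser. Below is a sketch that pulls the same math into a pure function so it can be checked on its own, for example in Node.js. The name <em>fitLine</em> and the shape of the returned object are our own choices, and it assumes at least one pair:</p>

```javascript
// DOM-free version of the math inside updateFormula (expects at least one pair)
function fitLine(pairs) {
  const n = pairs.length;
  const average = {
    x: pairs.reduce((acc, p) => acc + p.x, 0) / n,
    y: pairs.reduce((acc, p) => acc + p.y, 0) / n,
  };

  const slopeDividend = pairs.reduce(
    (acc, p) => acc + (p.x - average.x) * (p.y - average.y), 0);
  const slopeDivisor = pairs.reduce((acc, p) => acc + (p.x - average.x) ** 2, 0);

  // Same guard as updateFormula: with one pair (or all X values equal)
  // the divisor is 0, so we fall back to a slope of 0
  const slope = slopeDivisor !== 0
    ? Number((slopeDividend / slopeDivisor).toFixed(2))
    : 0;
  // a = ȳ - b * x̄, rounded like the project code
  const intercept = Number((average.y - slope * average.x).toFixed(2));

  return { slope, intercept };
}

const hours = [1, 1.2, 1.5, 2, 2.3, 2.5, 2.7, 3, 3.1, 3.2, 3.6];
const topics = [1.5, 2, 3, 1.8, 2.7, 4.7, 7.1, 10, 6, 5, 8.9];
const pairs = hours.map((x, i) => ({ x, y: topics[i] }));

console.log(fitLine(pairs)); // slope 2.8, intercept -1.85
```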
<h3 id="heading-update-the-graph-and-clean-inputs">Update the graph and clean inputs</h3>
<p><strong>public/least-squares.js</strong></p>
<pre><code class="lang-js"><span class="hljs-comment">// ... rest of the code as it was</span>

btnUpdateGraph.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">const</span> x = <span class="hljs-built_in">parseFloat</span>(inputX.value);
  <span class="hljs-keyword">const</span> y = <span class="hljs-built_in">parseFloat</span>(inputY.value);

  updateTable(x, y);
  updateFormula(x, y);

  updateChart();

  clearInputs();
});

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">updateChart</span>(<span class="hljs-params"></span>) </span>{
  chart.data.datasets[<span class="hljs-number">0</span>].data = currentData.pairs;
  chart.data.datasets[<span class="hljs-number">1</span>].data = currentData.line;

  chart.update();
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">clearInputs</span>(<span class="hljs-params"></span>) </span>{
  inputX.value = <span class="hljs-string">''</span>;
  inputY.value = <span class="hljs-string">''</span>;
}

<span class="hljs-comment">// ... rest of the code as it was</span>
</code></pre>
<p>Updating the chart and clearing the <strong>X</strong> and <strong>Y</strong> inputs is straightforward. We have two datasets: the first one (index 0) holds our pairs, which are drawn as dots on the graph, and the second one (index 1) holds the points of our regression line.</p>
<p>After changing the data, we call <em>update</em> on our chart instance so Chart.js redraws it with the new values.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/image-16.png" alt="Image" width="600" height="400" loading="lazy">
<em>We need at least three points before we can take any kind of information out of the graph</em></p>
<h2 id="heading-adding-some-style">Adding some style</h2>
<p>We can change our layout a bit so it's easier to use. Nothing major; it just serves as a reminder that we can update the UI at any point.</p>
<p><strong>public/style.css</strong></p>
<pre><code class="lang-css"><span class="hljs-selector-class">.container</span> {
  <span class="hljs-attribute">display</span>: grid; 
}

<span class="hljs-selector-class">.left-half</span> {
  <span class="hljs-attribute">grid-column</span>: <span class="hljs-number">1</span>;
}

<span class="hljs-selector-class">.right-half</span> {
  <span class="hljs-attribute">grid-column</span>: <span class="hljs-number">2</span>;
}

<span class="hljs-selector-class">.pairs-style</span> <span class="hljs-selector-tag">input</span><span class="hljs-selector-attr">[type=<span class="hljs-string">"number"</span>]</span>,
<span class="hljs-selector-class">.pairs-style</span> <span class="hljs-selector-tag">button</span> {
  <span class="hljs-attribute">margin</span>: <span class="hljs-number">5px</span> <span class="hljs-number">0px</span>;
}

<span class="hljs-selector-class">.table-pairs</span> {
  <span class="hljs-attribute">border-collapse</span>: collapse;
  <span class="hljs-attribute">width</span>: <span class="hljs-number">100%</span>;
}

<span class="hljs-selector-class">.table-pairs</span> <span class="hljs-selector-tag">td</span> {
  <span class="hljs-attribute">text-align</span>: center;
}

<span class="hljs-selector-class">.table-pairs</span>,
<span class="hljs-selector-class">.table-pairs</span> <span class="hljs-selector-tag">th</span>,
<span class="hljs-selector-class">.table-pairs</span> <span class="hljs-selector-tag">td</span> {
  <span class="hljs-attribute">margin</span>: <span class="hljs-number">10px</span> <span class="hljs-number">0px</span>;
  <span class="hljs-attribute">border</span>: <span class="hljs-number">1px</span> solid black;
}
</code></pre>
<p><strong>public/index.html</strong></p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">html</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>Least Squares Regression<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"https://cdn.jsdelivr.net/npm/chart.js@2.9.3/dist/Chart.min.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"stylesheet"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"style.css"</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"container"</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"left-half"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"pairs-style"</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"number"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"input-x"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"X"</span>&gt;</span>
          <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"number"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"input-y"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Y"</span>&gt;</span>
          <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn-update-graph"</span>&gt;</span>Add<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span> 
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">span</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"span-formula"</span>&gt;</span>Formula: Y = a + b * X<span class="hljs-tag">&lt;/<span class="hljs-name">span</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">table</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"table-pairs"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">thead</span>&gt;</span>
              <span class="hljs-tag">&lt;<span class="hljs-name">th</span>&gt;</span>
                X
              <span class="hljs-tag">&lt;/<span class="hljs-name">th</span>&gt;</span>
              <span class="hljs-tag">&lt;<span class="hljs-name">th</span>&gt;</span>
                Y
              <span class="hljs-tag">&lt;/<span class="hljs-name">th</span>&gt;</span>
            <span class="hljs-tag">&lt;/<span class="hljs-name">thead</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">tbody</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">tbody</span>&gt;</span>
          <span class="hljs-tag">&lt;/<span class="hljs-name">table</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"right-half"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"myChart"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"/js/least-squares.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/image-17.png" alt="Image" width="600" height="400" loading="lazy">
<em>Not a big change, but at least the elements are a bit better aligned</em></p>
<h2 id="heading-proof-of-concept">Proof of Concept</h2>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/image-18.png" alt="Image" width="600" height="400" loading="lazy">
<em>We add the same values as earlier in the theory and obtain the same graph and formula! :D</em></p>
<h2 id="heading-final-remarks">Final remarks</h2>
<p>For brevity's sake, I cut out a lot that can be taken as an exercise to vastly improve the project. For example:</p>
<ul>
<li>Add checks for empty values and the like</li>
<li>Allow removing data points that were inserted by mistake</li>
<li>Add an input for X or Y and apply the current data formula to "predict the future", similar to the last example of the theory</li>
</ul>
<p>Regardless, predicting the future is a fun concept even if, in reality, the most we can hope to predict is an approximation based on past data points. </p>
<p>It's a powerful formula, and if you build a project using it, I would love to see it.</p>
<p>I hope this article served as a helpful introduction to this concept. The code used in the article can be found on my GitHub <strong><a target="_blank" href="https://github.com/daspinola/least-squares-regression">here</a></strong>.</p>
<p>See you in the next one. In the meantime, go code something!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How I Used Regression Analysis to Analyze Life Expectancy with Scikit-Learn and Statsmodels ]]>
                </title>
                <description>
                    <![CDATA[ By Black Raven In this article, I will use some data related to life expectancy to evaluate the following models: Linear, Ridge, LASSO, and Polynomial Regression. So let's jump right in. I was exploring the dengue trend in Singapore where there has b... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/regression-analysis-on-life-expectancy/</link>
                <guid isPermaLink="false">66d45ddc55db48792eed3f3d</guid>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #Regression ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 19 Mar 2020 17:25:29 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/03/us-life-expectancy-drop.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Black Raven</p>
<p>In this article, I will use some data related to life expectancy to evaluate the following models: Linear, Ridge, LASSO, and Polynomial Regression. So let's jump right in.</p>
<p>I was exploring the dengue trend in Singapore where there has been a recent spike in dengue cases – especially in the <a target="_blank" href="https://www.nea.gov.sg/dengue-zika/dengue/dengue-clusters">Dengue Red Zone</a> where I am living. However, the raw data was not available on the NEA website.</p>
<p>I was wondering, has dengue affected the life expectancy of people in any particular country? Do people in rich nations live longer? What factors affect a country's life expectancy?</p>
<p>So I explored life expectancy and looked for data on the following aspects (features):</p>
<ul>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_by_birth_rate">Birth Rate</a></li>
<li><a target="_blank" href="https://www.worldlifeexpectancy.com/cause-of-death/all-cancers/by-country/">Cancer Rate</a></li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/Dengue_fever_outbreaks">Dengue Cases</a></li>
<li>Environmental Performance Index (<a target="_blank" href="https://epi.envirocenter.yale.edu/epi-topline">EPI</a>)</li>
<li>Gross Domestic Product (<a target="_blank" href="https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)">GDP</a>)</li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/List_of_countries_by_total_health_expenditure_per_capita">Health Expenditure</a></li>
<li><a target="_blank" href="https://www.worldlifeexpectancy.com/cause-of-death/coronary-heart-disease/by-country/">Heart Disease Rate</a></li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/List_of_countries_by_population_in_2010">Population</a></li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/List_of_countries_by_population_in_2010">Area</a></li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/List_of_countries_by_population_in_2010">Population Density</a></li>
<li><a target="_blank" href="https://www.worldlifeexpectancy.com/cause-of-death/stroke/by-country/">Stroke Rate</a></li>
</ul>
<p>The target is <a target="_blank" href="https://en.wikipedia.org/wiki/List_of_countries_by_life_expectancy">Life Expectancy</a>, measured in number of years.</p>
<p>The assumptions are:</p>
<ol>
<li>These are country level averages</li>
<li>There is no distinction between male and female</li>
</ol>
<p>The Python code is available on my <a target="_blank" href="https://github.com/JNYH/Project-Luther">GitHub</a>.</p>
<h2 id="heading-data-science-process">Data Science Process</h2>
<p>I have used the following data science process in my analysis:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-96.png" alt="Image" width="600" height="400" loading="lazy"></p>
<ul>
<li>data collection, data cleaning, Exploratory Data Analysis</li>
<li>feature selection, feature engineering</li>
<li>model selection, model tuning and hyperparameter tuning</li>
<li>model optimization based on selected performance metric</li>
</ul>
<p>Tools used for this analysis include:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-102.png" alt="Image" width="600" height="400" loading="lazy"></p>
<ul>
<li>Python libraries, particularly <a target="_blank" href="https://numpy.org/">Numpy</a> and <a target="_blank" href="https://pandas.pydata.org/docs/">Pandas</a> for manipulating data structures</li>
<li><a target="_blank" href="https://matplotlib.org/">Matplotlib</a> and <a target="_blank" href="https://seaborn.pydata.org/">Seaborn</a> for visualization</li>
<li><a target="_blank" href="https://scikit-learn.org/stable/index.html">Scikit-Learn</a> and <a target="_blank" href="https://www.statsmodels.org/stable/index.html">Statsmodels</a> for regression analysis</li>
</ul>
<h2 id="heading-exploratory-data-analysis">Exploratory Data Analysis</h2>
<p>First, I check for multicollinearity between features.</p>
<pre><code class="lang-py">sns.set(rc={<span class="hljs-string">'figure.figsize'</span>:(<span class="hljs-number">10</span>,<span class="hljs-number">7</span>)})
sns.heatmap(df.corr(numeric_only=<span class="hljs-literal">True</span>), cmap=<span class="hljs-string">"seismic"</span>, annot=<span class="hljs-literal">True</span>, vmin=<span class="hljs-number">-1</span>, vmax=<span class="hljs-number">1</span>)
</code></pre>
<p>There seems to be some strong collinearity, denoted by boxes in dark red and dark blue as you can see in the image below. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-58.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>For example, countries that spend more on healthcare have a higher EPI score. Countries with higher health expenditure also tend to have a lower stroke rate. And a larger area goes along with a larger population.</p>
<p>How about the correlation between features and target?<br>To live a long life, you should have a low stroke rate, high health expenditure, take good care of the environment, and have fewer babies (according to the correlation chart).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/04/1_vmjdEhjU0ScLQOxLvtFwMg.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Let’s look at the initial pair plot.</p>
<pre><code class="lang-py">sns.pairplot(df, height=<span class="hljs-number">1.5</span>, aspect=<span class="hljs-number">1.5</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-64.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Several features seem to need outlier treatment, for example, Dengue Cases, GDP, Population, Area, and Population Density.</p>
<p>Each outlier is replaced by the next highest value in its column. After removing the outliers, the distributions are still skewed to the right (the points are heavily concentrated on the left side), which suggests that some transformation might be needed.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/04/1_eAXh2VpB3mLWV-V3ci3ggw.png" alt="Image" width="600" height="400" loading="lazy"></p>
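<p>The "replace with the next highest value" step can be sketched in pandas as below. This is a minimal sketch, not the article's code: it assumes a value counts as an outlier when it lies above Q3 + 1.5 × IQR, a common rule of thumb that the article does not confirm, and <code>cap_high_outliers</code> is a hypothetical helper name.</p>
<pre><code class="lang-py">import pandas as pd


def cap_high_outliers(series, k=1.5):
    """Replace high outliers with the largest remaining (non-outlier) value.

    Assumption: an outlier is any value above Q3 + k * IQR.
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    upper_fence = q3 + k * (q3 - q1)
    is_outlier = series.gt(upper_fence)
    # "Next highest value" = the maximum among the values that are not outliers
    next_highest = series[~is_outlier].max()
    return series.mask(is_outlier, next_highest)


# Hypothetical dengue-case counts with one extreme country
dengue = pd.Series([120, 300, 450, 800, 95000])
print(cap_high_outliers(dengue).tolist())
</code></pre>
<p>Here the extreme 95000 is replaced by 800, the largest value that is not flagged as an outlier, while the other values are left untouched.</p>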
<p>Another way to deal with outliers is to apply the LOG function, which compresses the long right tail and spreads out the values that are concentrated on the left.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/04/1_9IfppnhjoGbLVuNd5arrNg.png" alt="Image" width="600" height="400" loading="lazy"></p>
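<p>To illustrate what the log does, this sketch applies <code>np.log</code> to a made-up, strongly right-skewed feature and compares pandas' <code>Series.skew</code> before and after. The data is hypothetical. The log is only defined for positive values, so for features that can be 0 (dengue cases, for example) <code>np.log1p</code>, which computes log(1 + x), is a common substitute; whether the article needed it is not stated.</p>
<pre><code class="lang-py">import numpy as np
import pandas as pd

# Hypothetical right-skewed feature: most values small, a few very large
gdp = pd.Series([1.0, 10.0, 100.0, 1000.0, 10000.0])

# The natural log compresses the long right tail
log_gdp = np.log(gdp)

print("skew before:", round(gdp.skew(), 2))
print("skew after: ", round(log_gdp.skew(), 2))

# For features that contain zeros, log1p avoids log(0) = -inf
dengue_cases = pd.Series([0.0, 3.0, 40.0, 500.0])
print(np.log1p(dengue_cases).round(2).tolist())
</code></pre>
<p>After the transform these values are evenly spaced, so the skewness drops to essentially zero.</p>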
<h2 id="heading-feature-selection">Feature Selection</h2>
<p>To look for significant features, I dropped one feature at a time to see its impact on the simple regression model. Based on the R² score, I chose these 3 features (Birth Rate, EPI, Stroke Rate), because the model performs noticeably worse without them.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-68.png" alt="Image" width="600" height="400" loading="lazy"></p>
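<p>The drop-one-feature check can be sketched with Scikit-Learn's <code>LinearRegression</code>, whose <code>score</code> method returns R². This is a minimal sketch on synthetic data: the helper name <code>drop_one_r2</code> and the columns <code>a</code>, <code>b</code>, and <code>noise</code> are made up for illustration, and R² is computed on the same data the model was fit on.</p>
<pre><code class="lang-py">import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def drop_one_r2(X, y):
    """Return the full-model R² and the R² after dropping each feature in turn."""
    full_r2 = LinearRegression().fit(X, y).score(X, y)
    dropped = {}
    for col in X.columns:
        X_without = X.drop(columns=col)
        dropped[col] = LinearRegression().fit(X_without, y).score(X_without, y)
    return full_r2, dropped


# Synthetic data: y depends on a and b, but not on noise
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "a": rng.normal(size=200),
    "b": rng.normal(size=200),
    "noise": rng.normal(size=200),
})
y = 3 * X["a"] - 2 * X["b"] + rng.normal(scale=0.5, size=200)

full_r2, dropped = drop_one_r2(X, y)
print(f"all features: R² = {full_r2:.3f}")
for col, r2 in dropped.items():
    print(f"without {col}: R² = {r2:.3f}")
</code></pre>
<p>Dropping <code>a</code> or <code>b</code> makes R² fall sharply, while dropping <code>noise</code> barely changes it. That is the pattern that marks a feature as worth keeping.</p>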
<p>Next, I removed <strong>outliers</strong> and reviewed the p-values in <strong>Statsmodels</strong>. This gave me one more significant feature (Population Density). I treat a feature as significant when its p-value is less than 0.05, since I have chosen 5% as the significance level.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-69.png" alt="Image" width="600" height="400" loading="lazy"></p>
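<p>The p-value filter can be sketched with Statsmodels: <code>sm.OLS(...).fit()</code> returns a results object whose <code>pvalues</code> attribute holds one p-value per coefficient. The sketch below keeps the features whose p-value is below the 0.05 significance level. The data is synthetic, and <code>significant_features</code> is a hypothetical helper, not code from the article.</p>
<pre><code class="lang-py">import numpy as np
import pandas as pd
import statsmodels.api as sm


def significant_features(X, y, alpha=0.05):
    """Fit OLS with an intercept and return features with p-value below alpha."""
    results = sm.OLS(y, sm.add_constant(X)).fit()
    # Drop the intercept: only the features are candidates for selection
    pvalues = results.pvalues.drop("const")
    return pvalues[pvalues.lt(alpha)].index.tolist()


# Synthetic data: y depends on a and b, but not on noise
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "a": rng.normal(size=200),
    "b": rng.normal(size=200),
    "noise": rng.normal(size=200),
})
y = 3 * X["a"] - 2 * X["b"] + rng.normal(scale=0.5, size=200)

print(significant_features(X, y))
</code></pre>
<p>At a 5% significance level, a feature with no real effect still passes this test about 1 time in 20 by chance, so the result is best read alongside the other checks.</p>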
<p>After that, I applied a <strong>LOG</strong> transform to all features and gained 4 more significant features (GDP, Heart Disease Rate, Population, and Area).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-70.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>I also tried other transformations (reciprocal, square, and square root), but they brought no further improvement.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-71.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Features can also be selected using <strong>LassoCV</strong> in <strong>Scikit-Learn</strong>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-72.png" alt="Image" width="600" height="400" loading="lazy"></p>
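<p>Feature selection with <code>LassoCV</code> works because the L1 penalty can shrink the coefficients of unhelpful features to exactly zero; the features with non-zero coefficients are the ones kept. Below is a minimal sketch on synthetic data. Standardizing first with <code>StandardScaler</code> (the penalty is scale-sensitive), the <code>cv=5</code> setting, and the column names are assumptions of this sketch, not details from the article.</p>
<pre><code class="lang-py">import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic data: only a and b carry signal
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=["a", "b", "c", "d", "e"])
y = 3 * X["a"] - 2 * X["b"] + rng.normal(scale=0.5, size=200)

# Put features on the same scale so the L1 penalty treats them equally
X_scaled = StandardScaler().fit_transform(X)

# LassoCV picks the penalty strength (alpha) by 5-fold cross-validation
lasso = LassoCV(cv=5, random_state=0).fit(X_scaled, y)

coefs = pd.Series(lasso.coef_, index=X.columns)
print("chosen alpha:", lasso.alpha_)
print(coefs.round(3))
print("kept features:", coefs[coefs.ne(0)].index.tolist())
</code></pre>
<p>Cross-validation often favours a small penalty, so LassoCV may keep a few noise features with tiny coefficients; the size of each coefficient is as informative as whether it is exactly zero.</p>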
<p>Finally I looked at the pair plot again with all significant features. The scatter plots are now nicely spread out with some clear trends.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-73.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-model-selection">Model Selection</h2>
<p>I am now ready to fit the following models on the training data set:</p>
<ul>
<li><a target="_blank" href="https://www.thebalancesmb.com/what-is-simple-linear-regression-2296697"><strong>Linear</strong></a> Regression (a straight line that approximates the relationship between the independent variables and the dependent (target) variable)</li>
<li><a target="_blank" href="https://www.datacamp.com/community/tutorials/tutorial-ridge-lasso-elastic-net"><strong>Ridge</strong></a> Regression (shrinks the coefficients with an L2 penalty, which reduces model complexity while keeping all features in the model)</li>
<li><a target="_blank" href="https://www.datacamp.com/community/tutorials/tutorial-ridge-lasso-elastic-net"><strong>LASSO</strong></a> Regression (Least Absolute Shrinkage and Selection Operator, which reduces model complexity with an L1 penalty that can shrink some coefficients exactly to zero)</li>
<li><a target="_blank" href="https://towardsdatascience.com/polynomial-regression-bbe8b9d97491"><strong>Degree 2 Polynomial</strong></a> Regression (a curved line that approximates the relationship between the independent variables and the dependent (target) variable)</li>
</ul>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-74.png" alt="Image" width="600" height="400" loading="lazy"></p>
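<p>The four models can be fit side by side with Scikit-Learn. This is a sketch on synthetic data with a hypothetical 80/20 train/validation split; the penalty strengths (<code>alpha=1.0</code> for Ridge, <code>alpha=0.1</code> for LASSO) are placeholder values, not the ones tuned in the article. Polynomial regression is built as a pipeline that expands the features with <code>PolynomialFeatures(degree=2)</code> and then fits an ordinary linear regression.</p>
<pre><code class="lang-py">import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, roughly linear data standing in for the life expectancy features
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 4))
y = 70 + X @ np.array([3.0, -2.0, 1.0, 0.0]) + rng.normal(scale=1.0, size=250)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linear": LinearRegression(),
    # Scale first so the L2 / L1 penalties treat every feature equally
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "LASSO": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    # Degree 2 adds squared terms and pairwise interactions
    "Degree 2 Polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: validation R² = {model.score(X_val, y_val):.3f}")
</code></pre>
<p>On data that really is linear, as here, the extra flexibility of the polynomial model buys little; with real data the ranking can go either way, which is why the validation set and cross-validation are used to decide.</p>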
<p>I then checked their performance on the validation data set. The simple linear regression model looks like it could be the best-performing model.</p>
<p>This is confirmed by <strong>Cross Validation</strong> using <strong>KFold</strong> (with 5 splits).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-75.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Finally, I checked the residual errors against the regression assumptions. The residuals should be normally distributed around a mean of zero, with equal (constant) variance. The normal Quantile-Quantile (Q-Q) plot also looks acceptably normal.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-76.png" alt="Image" width="600" height="400" loading="lazy"></p>
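<p>These residual checks can be sketched with Statsmodels and SciPy: a fitted OLS model exposes its residuals as <code>results.resid</code>, <code>sm.qqplot</code> draws the Q-Q plot, and a residuals-vs-fitted scatter shows whether the variance looks constant. The data below is synthetic, and the Shapiro-Wilk test (<code>scipy.stats.shapiro</code>) is an extra numeric check added in this sketch, not something the article reports.</p>
<pre><code class="lang-py">import matplotlib

matplotlib.use("Agg")  # off-screen backend so the script runs without a display

import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Synthetic data whose errors really are normal with constant variance
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 62 + 5 * x + rng.normal(scale=2.0, size=200)

results = sm.OLS(y, sm.add_constant(x)).fit()
residuals = results.resid

# With an intercept in the model, OLS residuals average to zero
print("mean residual:", round(float(residuals.mean()), 6))

# Shapiro-Wilk test: a large p-value means no evidence against normality
stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p_value:.3f}")

# Q-Q plot of the residuals against a normal distribution, with a reference line
qq_fig = sm.qqplot(residuals, line="s")

# Residuals vs fitted values: an even band around 0 suggests equal variance
fig, ax = plt.subplots()
ax.scatter(results.fittedvalues, residuals, s=10)
ax.axhline(0, color="red")
ax.set_xlabel("fitted value")
ax.set_ylabel("residual")
</code></pre>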
<p>Since I only have 250 rows (the data is limited by the number of countries in the world), I used the entire data set to simulate the test data set. Note that this is done for academic purposes only; it is not good practice, as it leads to <a target="_blank" href="https://towardsdatascience.com/data-leakage-in-machine-learning-10bdd3eec742">data leakage</a>. I used <strong>KFold Cross Validation</strong> with 10 splits to evaluate the model performance.</p>
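<pre><code class="lang-py"><span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> LinearRegression
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> cross_val_score
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> KFold

<span class="hljs-comment"># Shuffle before splitting; random_state makes the folds reproducible</span>
kf = KFold(n_splits=<span class="hljs-number">5</span>, shuffle=<span class="hljs-literal">True</span>, random_state=<span class="hljs-number">1</span>)
lm = LinearRegression()

<span class="hljs-comment"># cross_val_score clones lm and fits a fresh copy on each training fold,</span>
<span class="hljs-comment"># so no separate lm.fit() call is needed beforehand</span>
cvs_lm = cross_val_score(lm, X, y, cv=kf, scoring=<span class="hljs-string">'r2'</span>)
print(cvs_lm)
print(cvs_lm.mean())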
</code></pre>
<p>There is quite a bit of variation in the R² values from 0.49 to 0.82, but the average result is around 0.69, which is quite satisfactory.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-77.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-do-we-interpret-the-model">How do we interpret the model?</h2>
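<pre><code class="lang-py"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> statsmodels.api <span class="hljs-keyword">as</span> sm

df = pd.read_csv(<span class="hljs-string">'df3.csv'</span>)
X = df[ [<span class="hljs-string">'Birth Rate'</span>, <span class="hljs-string">'EPI'</span>, <span class="hljs-string">'GDP'</span>, <span class="hljs-string">'Heart Disease Rate'</span>, <span class="hljs-string">'Population'</span>, <span class="hljs-string">'Area'</span>, <span class="hljs-string">'Pop Density'</span>, <span class="hljs-string">'Stroke Rate'</span>] ].astype(float)
<span class="hljs-comment"># Log-transform every feature (all values must be positive)</span>
X = np.log(X)
y = df[ <span class="hljs-string">"Life Expectancy"</span> ].astype(float)
<span class="hljs-comment"># Statsmodels does not add an intercept by default</span>
X = sm.add_constant(X)

model = sm.OLS(y, X)
results = model.fit()
print(results.summary())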
</code></pre>
<p>The intercept, which is the prediction when all the log-transformed features are zero, is 62 years. If your country has a low birth rate, add 5 more years to your life. <strong>If the EPI (Environmental Performance Index) is high, add 8 more years to your life.</strong> If you live in a rich country, add half a year to your life. Finally, for every unit (or rather, LOG unit) decrease in stroke rate, 5 more years could be added to your life.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/03/image-78.png" alt="Image" width="600" height="400" loading="lazy"></p>
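<p>Because the features were log-transformed while the target (years) was not, a coefficient β means "β more years per one-unit increase in the natural log of the feature". Multiplying a feature by a factor k raises its log by ln(k), so the predicted change is β × ln(k). The helper below is a hypothetical illustration of that arithmetic, and the coefficient of 2.0 is made up for the example, not read from the table above.</p>
<pre><code class="lang-py">import math


def years_change(beta, factor):
    """Predicted change in life expectancy (years) when a log-transformed
    feature is multiplied by `factor`, in a model y = c + beta * ln(x)."""
    return beta * math.log(factor)


# Hypothetical coefficient: 2.0 years per log unit
beta = 2.0
print(round(years_change(beta, 2), 2))       # feature doubles
print(round(years_change(beta, 0.5), 2))     # feature halves
print(round(years_change(beta, math.e), 2))  # feature grows by e, i.e. +1 log unit
</code></pre>
<p>With this made-up coefficient, doubling the feature adds about 1.39 years and halving it subtracts the same amount, because ln(2) ≈ 0.693.</p>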
<h2 id="heading-next-steps">Next Steps</h2>
<p>I could collect more data by expanding the scope to cities instead of countries, and explore other features (factors) affecting life expectancy. I could also split the data into male and female categories and run the regression analysis for each.</p>
<p>To conclude, here are some interesting insights:</p>
<ol>
<li>Japan has the highest life expectancy (83.7 years). The Central African Republic (49.5 years) and many other countries on the African continent are at the bottom of the scale. Singapore is ranked #5 (82.7 years).</li>
<li><strong>Take good care of the environment</strong>. This has the largest coefficient (impact) on a country’s life expectancy.</li>
</ol>
<p>The Python code for the above analysis is available on my <a target="_blank" href="https://github.com/JNYH">GitHub</a> – do feel free to refer to it.</p>
<p><a target="_blank" href="https://github.com/JNYH/Project-Luther">https://github.com/JNYH/Project-Luther</a></p>
<p>Video presentation: <a target="_blank" href="https://youtu.be/gC2m_lvouu8">https://youtu.be/gC2m_lvouu8</a></p>
<p>Thank you for reading.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to read a Regression Table ]]>
                </title>
                <description>
                    <![CDATA[ By Sharad Vijalapuram What is regression? Regression is one of the most important and commonly used data analysis processes. Simply put, it is a statistical method that explains the strength of the relationship between a dependent variable and one or... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/https-medium-com-sharadvm-how-to-read-a-regression-table-661d391e9bd7-708e75efc560/</link>
                <guid isPermaLink="false">66c35715e9895571912a0caf</guid>
                
                    <category>
                        <![CDATA[ data ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ predictive analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #Regression ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Sun, 31 Mar 2019 20:25:40 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*kiLhwgfqplmsa9QgUfXjKQ.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Sharad Vijalapuram</p>
<h3 id="heading-what-is-regression"><strong>What is regression?</strong></h3>
<p>Regression is one of the most important and commonly used data analysis processes. Simply put, it is a statistical method that explains the strength of the relationship between a dependent variable and one or more independent variable(s).</p>
<p>A dependent variable could be a variable or a field you are trying to predict or understand. An independent variable could be the fields or data points that you think might have an impact on the dependent variable.</p>
<p>In doing so, it answers a couple of important questions —</p>
<ul>
<li>What variables matter?</li>
<li>To what extent do these variables matter?</li>
<li>How confident are we about these variables?</li>
</ul>
<h3 id="heading-lets-take-an-example">Let’s take an example…</h3>
<p>To better explain the numbers in the regression table, I thought it would be useful to use a sample dataset and walk through the numbers and their importance.</p>
<p>I’m using a small dataset that contains the GRE scores of 500 students (the GRE is a test that students take when applying to graduate schools in the US) and their chance of admittance into a university.</p>
<p>Because <code>chance of admittance</code> depends on <code>GRE score</code>, <code>chance of admittance</code> is the dependent variable and <code>GRE score</code> is the independent variable.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/gEmYfLngh9iyyI1iWPkIHT2H4VGekxpIxUHY" alt="Image" width="800" height="438" loading="lazy">
<em>Scatterplot of GRE scores and chance of admittance</em></p>
<h4 id="heading-regression-line">Regression line</h4>
<p>Drawing a straight line that best describes the relationship between the GRE scores of students and their chances of admittance gives us the <strong>linear regression line</strong>. This is known as the <strong>trend line</strong> in various BI tools. The basic idea is to choose the line that minimizes the vertical distances between the data points and the line at each point's x-coordinate. More precisely, ordinary least squares minimizes the sum of the squares of these vertical distances.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/ZKNDJUJRHA0Es0khr8RpLkbot3QmMPxsMc8Z" alt="Image" width="800" height="438" loading="lazy">
<em>Scatterplot with a regression line.</em></p>
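<p>In code, the least squares line can be found with NumPy's <code>np.polyfit</code> using degree 1, which returns the slope first and then the intercept. The sketch below uses a handful of made-up (GRE score, chance of admittance) pairs rather than the article's 500-student dataset, so the fitted numbers are only illustrative. It also computes the slope with the textbook closed-form formula to show that both agree.</p>
<pre><code class="lang-py">import numpy as np

# Hypothetical GRE scores and chances of admittance
gre = np.array([300, 305, 310, 315, 320, 325, 330, 335])
chance = np.array([0.52, 0.57, 0.63, 0.66, 0.72, 0.78, 0.82, 0.88])

# Degree-1 polyfit minimizes the sum of squared vertical distances
slope, intercept = np.polyfit(gre, chance, 1)
print(f"slope = {slope:.4f}, intercept = {intercept:.2f}")

# Closed form: slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
x_dev = gre - gre.mean()
closed_form_slope = np.sum(x_dev * (chance - chance.mean())) / np.sum(x_dev ** 2)
print(f"closed-form slope = {closed_form_slope:.4f}")

# The residuals are the vertical distances being minimized
residuals = chance - (slope * gre + intercept)
print("sum of squared residuals:", round(float(np.sum(residuals ** 2)), 5))
</code></pre>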
<p>The regression line makes it easier for us to represent the relationship. It is described by an equation with two parameters: the x-coefficient (the slope) and the y-intercept.</p>
<p><strong>Y-intercept</strong> is the point at which the line intersects the y-axis at x = 0. It is also the value the model would take or predict when x is 0.</p>
<p><strong>Coefficients</strong> give the impact or weight of a variable in the model. In other words, a coefficient is the amount of change in the dependent variable for a one-unit change in the independent variable.</p>
<h4 id="heading-calculating-the-regression-line-equation">Calculating the regression line equation</h4>
<p>To find the model’s y-intercept, we extend the regression line until it intersects the y-axis at x = 0. This is our y-intercept, and it is around −2.5. The number doesn't make much practical sense for this dataset (GRE scores are never 0, and a chance of admittance can't be negative), but the intention is only to show how the y-intercept is found.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/Qr8R9PGFVxf8VnwyQrmaVCpU0PqnAeW3FH9i" alt="Image" width="800" height="417" loading="lazy">
<em>Calculating the y-intercept</em></p>
<p>The coefficient for this model will just be the slope of the regression line and can be calculated by getting the change in the admittance over the change in GRE scores.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/r8Xzo0fzjJ4HeM-cHz66kST-aW-gTdcqde05" alt="Image" width="800" height="430" loading="lazy">
<em>Calculating the slope</em></p>
<p>In the example above, the coefficient would just be</p>
<blockquote>
<p>m = (y2-y1) / (x2-x1)</p>
</blockquote>
<p>And in this case, it would be close to 0.01.</p>
<p>The formula y = m*x + b gives the equation of our regression line. Substituting the values for the y-intercept and slope we got from extending the regression line, we can write the equation:</p>
<blockquote>
<p>y = 0.01x − 2.48</p>
</blockquote>
<p>The value −2.48 is a more precise y-intercept, which I took from the regression table shown later in this post.</p>
<p>This equation lets us predict the chance of admittance of a student when their GRE score is known.</p>
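<p>Here is the same arithmetic as a short sketch: the slope between two points on the line, m = (y2 − y1) / (x2 − x1), and a prediction from the equation y = 0.01x − 2.48. The two points are hypothetical readings off a line, and 320 is just an example GRE score.</p>
<pre><code class="lang-py">def slope(p1, p2):
    """Slope m = (y2 - y1) / (x2 - x1) between two points on a line."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def predict_chance(gre_score, m=0.01, b=-2.48):
    """Chance of admittance from the regression line y = m*x + b."""
    return m * gre_score + b


# Two hypothetical points read off the regression line
print(round(slope((300, 0.52), (330, 0.82)), 4))

# A student with a GRE score of 320
print(round(predict_chance(320), 2))
</code></pre>
<p>The first call gives a slope of 0.01, and the prediction for a GRE score of 320 is about 0.72, that is, roughly a 72% chance of admittance.</p>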
<p>Now that we have the basics, let’s jump onto reading and interpreting a regression table.</p>
<h3 id="heading-reading-a-regression-table">Reading a regression table</h3>
<p>The regression table can be roughly divided into <strong>three components</strong>:</p>
<ul>
<li><strong>Analysis of Variance (ANOVA):</strong> provides the analysis of the variance in the model, as the name suggests.</li>
<li><strong>Regression statistics:</strong> provide numerical information on the variation and how well the model explains the variation for the given data/observations.</li>
<li><strong>Residual output:</strong> provides, for each data point, the value predicted by the model and the residual, that is, the difference between the observed value of the dependent variable and its predicted value.</li>
</ul>
</ul>
<h3 id="heading-analysis-of-variance-anova"><strong>Analysis of Variance (ANOVA)</strong></h3>
<p><img src="https://cdn-media-1.freecodecamp.org/images/qcL1FHAqajHQ3fk2Qnp2wMjSNDzWAf5vMNYP" alt="Image" width="800" height="122" loading="lazy">
<em>ANOVA table</em></p>
<h4 id="heading-degrees-of-freedom-df">Degrees of freedom (df)</h4>
<p><strong>Regression df</strong> is the number of independent variables in our regression model. Since we only consider GRE scores in this example, it is 1.</p>
<p><strong>Residual df</strong> is the total number of observations (rows) in the dataset minus the number of parameters being estimated. In this example, two parameters are estimated: the GRE score coefficient and the constant (intercept).</p>
<p>Residual df = 500 − 2 = 498</p>
<p><strong>Total df</strong> is the sum of the regression and residual degrees of freedom, which equals the size of the dataset minus 1 (here, 1 + 498 = 499).</p>
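<p>The three degrees of freedom follow directly from the number of observations and the number of independent variables. A minimal sketch, assuming the model includes an intercept (as it does here); the function name is hypothetical:</p>
<pre><code class="lang-py">def anova_degrees_of_freedom(n_obs, n_predictors):
    """Degrees of freedom for a linear regression that includes an intercept."""
    regression_df = n_predictors
    # One parameter per predictor plus one for the intercept is estimated
    residual_df = n_obs - (n_predictors + 1)
    total_df = n_obs - 1
    return regression_df, residual_df, total_df


print(anova_degrees_of_freedom(500, 1))
</code></pre>
<p>For the GRE example (500 students, 1 independent variable) this gives 1, 498, and 499, and the first two add up to the third.</p>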
<h4 id="heading-sum-of-squares-ss"><strong>Sum of Squares (SS)</strong></h4>
<p><img src="https://cdn-media-1.freecodecamp.org/images/9E4FVD77xkpB9bZ2Npwa-Y7jwhlEH6-qzDlh" alt="Image" width="800" height="370" loading="lazy">
<em>Regression line with the mean of the dataset in red.</em></p>
<p><strong>Regression SS</strong> is the total variation in the dependent variable that is explained by the regression model. It is the sum, over all the data points, of the squared differences between each predicted value and the mean of the dependent variable.</p>
<blockquote>
<p>∑ (ŷ − ȳ)²</p>
</blockquote>
<p>From the ANOVA table, the regression SS is 6.5 and the total SS is 9.9, which means the regression model explains about 6.5/9.9 (around 65%) of all the variability in the dataset.</p>
<p><strong>Residual SS</strong> is the total variation in the dependent variable that is left unexplained by the regression model. It is also called the <strong>Error Sum of Squares</strong> and is the sum, over all the data points, of the squared differences between the actual and predicted values.</p>
<blockquote>
<p>∑ (y − ŷ)²</p>
</blockquote>
<p>From the ANOVA table, the residual SS is about 3.4. In general, the smaller the error, the better the regression model explains the variation in the data set and so we would usually want to minimize this error.</p>
<p><strong>Total SS</strong> is the sum of the regression and residual SS. It is the total variation of the chance of admittance around its mean, ∑ (y − ȳ)², or in other words, how much the chance of admittance would vary if the GRE scores are <strong>NOT</strong> taken into account.</p>
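<p>The three sums of squares can be computed directly from the fitted values. This sketch uses a small hypothetical dataset (not the article's) and checks that Regression SS + Residual SS = Total SS, and that Regression SS / Total SS is the model's R². The decomposition holds exactly, up to floating-point rounding, for least squares fits that include an intercept.</p>
<pre><code class="lang-py">import numpy as np

# Hypothetical GRE scores and chances of admittance
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.57, 0.63, 0.66, 0.72, 0.78, 0.82, 0.88])

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
y_bar = y.mean()

regression_ss = np.sum((y_hat - y_bar) ** 2)  # ∑ (ŷ − ȳ)², explained variation
residual_ss = np.sum((y - y_hat) ** 2)        # ∑ (y − ŷ)², unexplained variation
total_ss = np.sum((y - y_bar) ** 2)           # ∑ (y − ȳ)², total variation

print(f"Regression SS: {regression_ss:.5f}")
print(f"Residual SS:   {residual_ss:.5f}")
print(f"Total SS:      {total_ss:.5f}")
print(f"R² = Regression SS / Total SS = {regression_ss / total_ss:.3f}")
</code></pre>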
<p><strong>Mean Squares (MS)</strong> are the sums of squares divided by their degrees of freedom, for both the regression and the residuals.</p>
<blockquote>
<p>Regression MS = ∑ (ŷ − ȳ)² / Regression df</p>
<p>Residual MS = ∑ (y − ŷ)² / Residual df</p>
</blockquote>
<p><strong>F</strong> is the statistic used to test the null hypothesis that the slope of the independent variable is zero. It is calculated as</p>
<blockquote>
<p>F = Regression MS / Residual MS</p>
</blockquote>
<p>The p-value of this test is then obtained by comparing the F-statistic to an F distribution with the regression df as the numerator degrees of freedom and the residual df as the denominator degrees of freedom.</p>
<p><strong>Significance F</strong> is that p-value: the p-value for the null hypothesis that the coefficient of the independent variable is zero. As with any p-value, a low value indicates that a significant relationship exists between the dependent and independent variables.</p>
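<p>Putting the ANOVA pieces together, the sketch below computes the mean squares, the F-statistic, and Significance F, which is the upper-tail probability of the F distribution (SciPy's <code>stats.f.sf</code>). The data is hypothetical; Statsmodels reports the same two numbers as <code>fvalue</code> and <code>f_pvalue</code> on a fitted OLS result, which is a handy cross-check.</p>
<pre><code class="lang-py">import numpy as np
from scipy import stats

# Hypothetical GRE scores and chances of admittance
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.57, 0.63, 0.66, 0.72, 0.78, 0.82, 0.88])
n, k = len(y), 1  # observations, independent variables

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

regression_ss = np.sum((y_hat - y.mean()) ** 2)
residual_ss = np.sum((y - y_hat) ** 2)

# Mean squares: each sum of squares divided by its degrees of freedom
regression_ms = regression_ss / k
residual_ms = residual_ss / (n - k - 1)

f_stat = regression_ms / residual_ms
# Significance F: probability that F(k, n - k - 1) is at least f_stat
significance_f = stats.f.sf(f_stat, k, n - k - 1)

print(f"F = {f_stat:.1f}, Significance F = {significance_f:.2e}")
</code></pre>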
<p><img src="https://cdn-media-1.freecodecamp.org/images/QeNjs4oji3peiiodEnLof0wx4NSJlu5imm9C" alt="Image" width="800" height="86" loading="lazy"></p>
<p><strong>Standard Error</strong> — provides the estimated standard deviation of the distribution of coefficients. It is the amount by which the coefficient varies across different cases. A coefficient much greater than its standard error implies a probability that the coefficient is not 0.</p>
<p><strong>t-Stat</strong> is the t-statistic (t-value) of the test. It equals the coefficient divided by its standard error.</p>
<blockquote>
<p>t-Stat = Coefficient / Standard Error</p>
</blockquote>
<p>Again, the larger the coefficient relative to its standard error, the larger the t-Stat. A larger t-Stat means stronger evidence that the true coefficient is not 0.</p>
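<p>For one independent variable, the coefficient standard errors have closed forms based on the residual MS and S<sub>xx</sub> = ∑ (x − x̄)². The sketch below computes them with NumPy on made-up data. It uses the textbook simple-regression formulas, not a general matrix solution for multiple regression.</p>

```python
import numpy as np

# Hypothetical sample data (not the article's data set)
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.55, 0.61, 0.63, 0.70, 0.72, 0.79, 0.83])
n = len(y)

b1, b0 = np.polyfit(x, y, deg=1)
residual_ms = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)  # residual df = n - 2

sxx = np.sum((x - x.mean()) ** 2)
se_b1 = np.sqrt(residual_ms / sxx)                             # slope standard error
se_b0 = np.sqrt(residual_ms * (1 / n + x.mean() ** 2 / sxx))   # intercept standard error

t_b1 = b1 / se_b1  # t-Stat for the slope
t_b0 = b0 / se_b0  # t-Stat for the intercept
print(f"slope: coef={b1:.5f}, SE={se_b1:.5f}, t={t_b1:.2f}")
print(f"intercept: coef={b0:.4f}, SE={se_b0:.4f}, t={t_b0:.2f}")
```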
<p><strong>p-value</strong> is obtained by comparing the t-statistic with a t distribution that has the residual degrees of freedom. We usually focus on the p-value of the independent variable. Assuming the true slope is zero, it gives the probability of obtaining a coefficient at least as far from zero as the one estimated from this sample.</p>
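<p>A p-value below 0.05 means that the null hypothesis of a zero slope is rejected at the 5% significance level. Equivalently, the 95% confidence interval for the slope excludes zero. We therefore conclude that there is a statistically significant linear relationship between the dependent and independent variables.</p>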
<p>A p-value greater than 0.05 means that the slope of the regression line may be zero. There is not sufficient evidence at the 5% significance level that a linear relationship exists between the dependent and independent variables. This does not prove that the slope is zero.</p>
<p>Since the p-value of the independent variable, GRE score, is very close to 0, we have very strong evidence of a significant linear relationship between GRE scores and the chance of admittance.</p>
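<p>This minimal sketch computes the slope's two-sided p-value from its t-statistic, assuming SciPy's <code>scipy.stats.t.sf</code> (the survival function, 1 − CDF). It uses made-up data and n − 2 residual degrees of freedom.</p>

```python
import numpy as np
from scipy import stats

# Hypothetical sample data (not the article's data set)
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.55, 0.61, 0.63, 0.70, 0.72, 0.79, 0.83])
n = len(y)

b1, b0 = np.polyfit(x, y, deg=1)
residual_ms = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
se_b1 = np.sqrt(residual_ms / np.sum((x - x.mean()) ** 2))
t_b1 = b1 / se_b1

# Two-sided p-value: probability of |t| at least this large if the true slope were 0
p_b1 = 2 * stats.t.sf(abs(t_b1), n - 2)
print(f"p-value = {p_b1:.3g}; significant at 5%: {p_b1 < 0.05}")
```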
<p><strong>Lower and Upper 95%</strong> are the lower and upper bounds of the 95% confidence interval for each coefficient. Because the regression line and its coefficients are estimated from a sample of data, they only approximate the true coefficients and, in turn, the true regression line. The confidence interval gives a range of plausible values for each true coefficient.</p>
<p>The 95% confidence interval for the GRE score coefficient runs from about 0.009 to 0.01. Since it does not contain zero, we can conclude at the 95% confidence level that there is a significant linear relationship between GRE scores and the chance of admittance.</p>
<p>Note that although 95% is the most widely used confidence level, other levels are possible and can be set when running the regression analysis.</p>
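<p>Each bound is computed as coefficient ± t<sub>crit</sub> × standard error. Here, t<sub>crit</sub> comes from a t distribution with the residual df. The sketch below wraps this in a hypothetical helper, <code>coefficient_intervals</code>, with a <code>confidence</code> parameter so levels other than 95% can be tried. It assumes SciPy and made-up data.</p>

```python
import numpy as np
from scipy import stats

def coefficient_intervals(x, y, confidence=0.95):
    """Hypothetical helper: confidence intervals for a simple linear regression."""
    n = len(y)
    b1, b0 = np.polyfit(x, y, deg=1)
    residual_ms = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
    sxx = np.sum((x - x.mean()) ** 2)
    se_b1 = np.sqrt(residual_ms / sxx)
    se_b0 = np.sqrt(residual_ms * (1 / n + x.mean() ** 2 / sxx))
    # Two-sided critical value from the t distribution with n - 2 df
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, n - 2)
    return {
        "intercept": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "slope": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }

# Hypothetical sample data (not the article's data set)
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.55, 0.61, 0.63, 0.70, 0.72, 0.79, 0.83])

ci95 = coefficient_intervals(x, y)
ci99 = coefficient_intervals(x, y, confidence=0.99)
lower, upper = ci95["slope"]
print(f"95% CI for slope: {lower:.5f} to {upper:.5f}")
print("Interval excludes zero:", lower > 0 or upper < 0)
```

<p>A higher confidence level widens the interval. The 95% interval excludes zero exactly when the slope's p-value is below 0.05.</p>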
<h3 id="heading-regression-statistics"><strong>Regression Statistics</strong></h3>
<p><img src="https://cdn-media-1.freecodecamp.org/images/7zaL2AUSPsdw2T8imw5bAqr6kCOy3nKOeHGk" alt="Image" width="453" height="180" loading="lazy">
<em>Regression Statistics table</em></p>
<p><strong>R² (R Square)</strong> measures the proportion of the variation in the dependent variable that is explained by the independent variable. For a least-squares model with an intercept, it always lies between 0 and 1. The higher the R², the more of the variation in the data the model explains. A low R² indicates that the independent variable explains relatively little of the variation in the dependent variable.</p>
<blockquote>
<p>R² = Regression Sum of Squares/Total Sum of Squares</p>
</blockquote>
<p>However, R² <em>cannot</em> determine whether the coefficient estimates and predictions are biased. This is why you should also assess the residual plot, discussed later in this article.</p>
<p>R² also does not indicate whether a regression model is adequate. You can have a low R² for a good model, or a high R² for a model that does not fit the data.</p>
<p>In this case, R² is 65%, which means that the GRE scores explain 65% of the variation in the chance of admittance.</p>
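<p>This is a minimal sketch of R² as Regression SS divided by Total SS, using NumPy and made-up data. For a single independent variable, it also equals the squared Pearson correlation between x and y.</p>

```python
import numpy as np

# Hypothetical sample data (not the article's data set)
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.55, 0.61, 0.63, 0.70, 0.72, 0.79, 0.83])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

# R² = Regression Sum of Squares / Total Sum of Squares
r_squared = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R² = {r_squared:.4f} ({r_squared:.1%} of the variation explained)")
```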
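<p><strong>Adjusted R²</strong> is R² adjusted for the number of independent variables in the model. It is calculated as 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k is the number of independent variables. Because it penalizes extra variables, it increases only when a new variable improves the fit enough to offset the lost degree of freedom. It is used to compare regression models with different numbers of independent variables, which makes it handy when choosing the right independent variables for a multiple regression model.</p>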
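<p><strong>Multiple R</strong> is the positive square root of R². In simple linear regression, it equals the absolute value of the correlation coefficient between the two variables.</p>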
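<p>The sketch below computes Adjusted R² and Multiple R from R². It assumes NumPy, made-up data, and a hypothetical helper named <code>adjusted_r_squared</code> that takes n observations and k independent variables.</p>

```python
import numpy as np

def adjusted_r_squared(r_squared, n, k):
    """Hypothetical helper: adjusted R² for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical sample data (not the article's data set)
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.55, 0.61, 0.63, 0.70, 0.72, 0.79, 0.83])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x
r_squared = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

adj_r2 = adjusted_r_squared(r_squared, n=len(y), k=1)
multiple_r = np.sqrt(r_squared)  # positive square root of R²
print(f"R² = {r_squared:.4f}, Adjusted R² = {adj_r2:.4f}, Multiple R = {multiple_r:.4f}")
```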
<p><strong>Standard Error</strong> here is different from the standard errors of the coefficients. It is the estimated standard deviation of the regression's error term, expressed in the units of the dependent variable. It measures how far the observed values typically fall from the regression line. It is the square root of the residual mean square.</p>
<blockquote>
<p>Std. Error = √(Residual MS)</p>
</blockquote>
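<p>As a short sketch with NumPy and made-up data, the regression's standard error is the square root of the residual MS. It uses n − 2 residual degrees of freedom because there is one independent variable.</p>

```python
import numpy as np

# Hypothetical sample data (not the article's data set)
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.55, 0.61, 0.63, 0.70, 0.72, 0.79, 0.83])
n = len(y)

b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

residual_ms = np.sum(residuals ** 2) / (n - 2)
std_error = np.sqrt(residual_ms)  # same units as the dependent variable
print(f"Standard Error = {std_error:.4f}")
```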
<h3 id="heading-residual-output"><strong>Residual Output</strong></h3>
<p>Residuals are the differences between the actual values and the values predicted by the regression model. For each data point, the residual output lists the predicted value of the dependent variable and its residual.</p>
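<p>This minimal sketch builds a residual-output style listing with NumPy on made-up data. For each observation, it prints the predicted value and the residual (actual minus predicted).</p>

```python
import numpy as np

# Hypothetical sample data (not the article's data set)
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.55, 0.61, 0.63, 0.70, 0.72, 0.79, 0.83])

b1, b0 = np.polyfit(x, y, deg=1)
predicted = b0 + b1 * x
residuals = y - predicted  # actual minus predicted

print("Obs  Predicted  Residual")
for i, (p, r) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>3}  {p:9.4f}  {r:9.4f}")
```

<p>With an intercept in the model, least-squares residuals always sum to (numerically) zero. A residual column that does not do so usually points to a calculation error.</p>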
<p>As the name suggests, a residual plot is a scatter plot of the residuals against the independent variable. In this case, that is each student's GRE score.</p>
<p>A residual plot helps detect problems such as <strong>heteroscedasticity</strong>, <strong>non-linearity</strong>, and <strong>outliers</strong>. How to detect them is beyond the scope of this article. In our example, the residuals are scattered randomly with no obvious pattern, which supports the assumption that the relationship between the variables in this model is linear.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/svhwHtrkIYoNwy323YB8jPS-OxeWmvmpPAyH" alt="Image" width="800" height="401" loading="lazy">
<em>Residual Plot</em></p>
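<p>The sketch below draws a residual plot like the one shown. It assumes NumPy, matplotlib with the non-interactive Agg backend, and made-up data. The output file name <code>residual_plot.png</code> is an arbitrary choice.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data (not the article's data set)
x = np.array([300, 305, 310, 315, 320, 325, 330, 335], dtype=float)
y = np.array([0.52, 0.55, 0.61, 0.63, 0.70, 0.72, 0.79, 0.83])

b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

fig, ax = plt.subplots(figsize=(8, 4))
ax.scatter(x, residuals)                                  # one point per observation
ax.axhline(0, color="gray", linestyle="--", linewidth=1)  # reference line at zero residual
ax.set_xlabel("GRE score")
ax.set_ylabel("Residual")
ax.set_title("Residual plot")
fig.savefig("residual_plot.png", dpi=100)
```

<p>Points scattered evenly around the dashed zero line with no curve or funnel shape are consistent with a linear relationship and roughly constant error variance.</p>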
<h3 id="heading-intent"><strong>Intent</strong></h3>
<p>The intent of this article is not to build a working regression model. Instead, it walks through the variables in a regression table and explains their importance, using a sample data set.</p>
<p>Although this article uses a single-variable linear regression as its example, be aware that some of these variables, such as Adjusted R², become more important in multiple regression and other settings.</p>
<h3 id="heading-references"><strong>References</strong></h3>
<ul>
<li><a target="_blank" href="https://www.kaggle.com/mohansacharya/graduate-admissions">Graduate Admissions Dataset</a></li>
<li><a target="_blank" href="https://egap.org/methods-guides/10-things-know-about-reading-regression-table">10 Things to Know About Reading a Regression Table</a></li>
<li><a target="_blank" href="https://hbr.org/2015/11/a-refresher-on-regression-analysis">A Refresher on Regression Analysis</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
