<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Gradient-Descent  - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Gradient-Descent  - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Mon, 25 May 2026 05:06:40 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/gradient-descent/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ Gradient Descent – Machine Learning Algorithm Example ]]>
                </title>
                <description>
                    <![CDATA[ What is the Gradient Descent Algorithm? Gradient descent is probably the most popular machine learning algorithm. At its core, the algorithm exists to minimize errors as much as possible.  The aim of gradient descent as an algorithm is to minimize th... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/gradient-descent-machine-learning-algorithm-example/</link>
                <guid isPermaLink="false">66ba2dc6de9370f66eeb0a97</guid>
                
                    <category>
                        <![CDATA[ algorithms ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Gradient-Descent  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Salim Oyinlola ]]>
                </dc:creator>
                <pubDate>Mon, 24 Oct 2022 13:53:51 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/10/pexels-pixabay-159751.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <h2 id="heading-what-is-the-gradient-descent-algorithm">What is the Gradient Descent Algorithm?</h2>
<p>Gradient descent is probably the most popular machine learning algorithm. At its core, the algorithm exists to minimize errors as much as possible. </p>
<p>The aim of gradient descent as an algorithm is to minimize the cost function of a model. We can tell this from the meanings of the words ‘<em>Gradient</em>’ and ‘<em>Descent</em>’. </p>
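<p>The gradient is the slope of a function: it measures how quickly the function's output changes as its inputs change (in this context, how the cost function changes as the model's parameters change). Descent refers to downward motion in general (in this context, moving the parameters toward lower values of the cost function). </p>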
<p>So in the context of machine learning, Gradient Descent refers to the iterative attempt to minimize the prediction error of a machine learning model by adjusting its parameters to yield the smallest possible error.</p>
<p>The function that measures this error is known as the cost function. It answers the question “<em>by how much does the predicted value differ from the actual value?</em>”. While the way to evaluate cost functions often differs for different machine learning models, in a simple linear regression model, it is usually the mean squared error of the model.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-166.png" alt="Image" width="600" height="400" loading="lazy">
<em>A 3D plot of the cost function of a simple linear regression model with M representing the minimum point</em></p>
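<p>To make the idea of a cost function concrete, here is a minimal sketch of the mean squared error for a straight line. The function name <code>compute_cost</code>, the toy data, and the division by <code>2 * m</code> (a common convention that simplifies the derivative) are assumptions made for illustration.</p>

```python
import numpy as np

def compute_cost(w, b, x, y):
    """Half mean squared error of the line y_hat = w * x + b."""
    m = len(y)
    errors = (w * x + b) - y      # predicted minus actual, per sample
    return np.sum(errors ** 2) / (2 * m)

# Toy data generated by the line y = 2x (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x

cost_perfect = compute_cost(2.0, 0.0, x, y)   # the line that generated the data
cost_flat = compute_cost(0.0, 0.0, x, y)      # a line that always predicts 0
print(cost_perfect, cost_flat)
```

<p>The line that generated the data has a cost of 0, while the flat line has a cost of 7. Gradient descent searches for the values of <em>w</em> and <em>b</em> that make this number as small as possible.</p>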
<p>It is important to note that for simpler models like linear regression, a plot of the cost function is usually bowl-shaped, which makes it easier to ascertain the minimum point. However, this is not always the case. For more complex models (for instance neural networks), the plot might not be bowl-shaped. It is possible for the cost function to have multiple minimum points as shown in the image below.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-39.png" alt="Image" width="600" height="400" loading="lazy">
<em>A 3D plot of the cost function of a neural network. Source: <a target="_blank" href="https://www.coursera.org/lecture/machine-learning/gradient-descent-2f2PA">Coursera</a></em></p>
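<p>You can see the effect of multiple minimum points with a small experiment. The sketch below runs gradient descent on a one-dimensional function with two valleys. The function <code>f(x) = x**4 - 3*x**2 + x</code>, the starting points, and the learning rate are all assumptions picked for illustration, not something taken from a real neural network.</p>

```python
def grad(x):
    """Derivative of f(x) = x**4 - 3*x**2 + x."""
    return 4 * x**3 - 6 * x + 1

def descend(start, learning_rate=0.01, n_steps=1000):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x

left_min = descend(-2.0)    # starts on the left slope
right_min = descend(2.0)    # starts on the right slope
print(left_min, right_min)
```

<p>The two runs settle in different valleys (one at a negative <em>x</em>, one at a positive <em>x</em>), even though the algorithm and the learning rate are the same. This is why the starting point matters when the cost function is not bowl-shaped.</p>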
<h2 id="heading-how-does-gradient-descent-work">How Does Gradient Descent Work?</h2>
<p>Firstly, it is important to note that like most machine learning processes, the gradient descent algorithm is an iterative process. </p>
<p>Assuming you have the cost function for a simple linear regression model as <em>j(w,b)</em> where <em>j</em> is a function of <em>w</em> and <em>b</em>, the gradient descent algorithm starts off with some initial guess for <em>w</em> and <em>b</em>. The algorithm then keeps tweaking the parameters <em>w</em> and <em>b</em> in an attempt to minimize the cost function, <em>j</em>. </p>
<p>In linear regression, the choice for the initial values does not matter much. A common choice is zero. </p>
<p>A useful analogy for the gradient descent algorithm that minimizes the cost function <em>j</em>(<em>w</em>, <em>b</em>) and reaches its local minimum by adjusting the parameters <em>w</em> and <em>b</em> is a hiker walking down to the bottom of a mountain or hill (as in the 3D plot of the cost function of a simple linear regression model shown earlier), or a person trying to get to the lowest point of a golf course. In either case, the person takes repeated short steps downhill until they reach the bottom.</p>
<h2 id="heading-the-gradient-descent-formula">The Gradient Descent Formula</h2>
<p><strong>Here's the formula for gradient descent: b = a - γ ∇f(a)</strong></p>
<p>The equation above describes what the gradient descent algorithm does. </p>
<p>That is, <em>b</em> is the next position of the hiker while <em>a</em> represents the current position. The minus sign is for the minimization part of the gradient descent algorithm since the goal is to minimize the error as much as possible. γ in the middle is a factor known as the learning rate, and the term ∇f(a) is the gradient of the function at <em>a</em>, which points in the direction of steepest ascent. </p>
<p>As such, this formula gives the next position for the hiker/the person on the golf course: one step in the direction of the steepest descent. It is important to note that the term <em>γ∇f(a)</em> is subtracted from <em>a</em> because the goal is to move against the gradient, toward the local minimum.</p>
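<p>The update rule can be sketched in a couple of lines of NumPy. Here the function being minimized, <code>f(a) = sum((a - target)**2)</code>, the <code>target</code> values, and the learning rate of <code>0.1</code> are assumptions chosen so the arithmetic is easy to follow by hand.</p>

```python
import numpy as np

target = np.array([3.0, -1.0])   # where f reaches its minimum (illustrative)

def gradient(a):
    """Gradient of f(a) = sum((a - target)**2)."""
    return 2 * (a - target)

learning_rate = 0.1
a = np.array([0.0, 0.0])                 # current position
b = a - learning_rate * gradient(a)      # next position: step against the gradient
print(b)

# Repeating the same update moves the position closer and closer to the minimum
position = a
for _ in range(100):
    position = position - learning_rate * gradient(position)
print(position)
```

<p>The first step moves from <code>(0, 0)</code> to <code>(0.6, -0.2)</code>, which is a fraction of the way toward <code>(3, -1)</code>. After many repetitions the position ends up very close to the minimum.</p>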
<h2 id="heading-what-is-the-learning-rate">What is the Learning Rate?</h2>
<p>The learning rate determines how big a step gradient descent takes in the direction of the local minimum. It controls the speed with which the algorithm moves towards the optimum values of the cost function. </p>
<p>Because of this, the choice of the learning rate, γ, is important and has a significant impact on the effectiveness of the algorithm. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-42.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>If the learning rate is too big as shown above, the algorithm overshoots the optimal point: it jumps from the point on the left all the way to the point on the right. In that case, you see that the cost function has gotten worse.  </p>
<p>On the other hand, if the learning rate is too small, then gradient descent will work, albeit very slowly. </p>
<p>It is important to pick the learning rate carefully.</p>
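<p>A small experiment makes the trade-off visible. The sketch below minimizes <code>f(x) = x**2</code> (whose minimum is at 0) with three different learning rates. The function, the three rates, and the 20-step budget are assumptions chosen for illustration.</p>

```python
def run_gradient_descent(learning_rate, start=1.0, n_steps=20):
    """Minimize f(x) = x**2, whose gradient is 2 * x."""
    x = start
    for _ in range(n_steps):
        x = x - learning_rate * 2 * x
    return x

too_big = run_gradient_descent(1.1)      # every step overshoots the minimum at 0
too_small = run_gradient_descent(0.001)  # barely moves in 20 steps
reasonable = run_gradient_descent(0.1)   # ends up close to 0

print(too_big, too_small, reasonable)
```

<p>With the large rate, each step lands farther from 0 than the one before, so the result grows instead of shrinking. With the tiny rate, the result is still close to the starting value of 1. The middle rate gets close to the minimum within the same 20 steps.</p>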
<h2 id="heading-how-to-implement-gradient-descent-in-linear-regression">How to Implement Gradient Descent in Linear Regression</h2>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Linear_Regression</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, X, Y</span>):</span>
        self.X = X
        self.Y = Y
        self.b = [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>]

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">update_coeffs</span>(<span class="hljs-params">self, learning_rate</span>):</span>
        Y_pred = self.predict()
        Y = self.Y
        m = len(Y)
        self.b[<span class="hljs-number">0</span>] = self.b[<span class="hljs-number">0</span>] - (learning_rate * ((<span class="hljs-number">1</span>/m) * np.sum(Y_pred - Y)))
        self.b[<span class="hljs-number">1</span>] = self.b[<span class="hljs-number">1</span>] - (learning_rate * ((<span class="hljs-number">1</span>/m) * np.sum((Y_pred - Y) * self.X)))

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict</span>(<span class="hljs-params">self, X=None</span>):</span>
        Y_pred = np.array([])
        <span class="hljs-comment"># fall back to the training inputs when no X is given</span>
        <span class="hljs-keyword">if</span> X <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>: X = self.X
        b = self.b
        <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> X:
            Y_pred = np.append(Y_pred, b[<span class="hljs-number">0</span>] + (b[<span class="hljs-number">1</span>] * x))

        <span class="hljs-keyword">return</span> Y_pred

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_current_accuracy</span>(<span class="hljs-params">self, Y_pred</span>):</span>
        p, e = Y_pred, self.Y
        n = len(Y_pred)
        <span class="hljs-keyword">return</span> <span class="hljs-number">1</span>-sum(
            [
                abs(p[i]-e[i])/e[i]
                <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n)
                <span class="hljs-keyword">if</span> e[i] != <span class="hljs-number">0</span>]
        )/n

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">compute_cost</span>(<span class="hljs-params">self, Y_pred</span>):</span>
        m = len(self.Y)
        J = (<span class="hljs-number">1</span> / (<span class="hljs-number">2</span>*m)) * np.sum((Y_pred - self.Y)**<span class="hljs-number">2</span>)
        <span class="hljs-keyword">return</span> J

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">plot_best_fit</span>(<span class="hljs-params">self, Y_pred, fig</span>):</span>
        f = plt.figure(fig)
        plt.scatter(self.X, self.Y, color=<span class="hljs-string">'b'</span>)
        plt.plot(self.X, Y_pred, color=<span class="hljs-string">'g'</span>)
        f.show()


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    X = np.array([i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>)])
    Y = np.array([<span class="hljs-number">2</span>*i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>)])

    regressor = Linear_Regression(X, Y)

    iterations = <span class="hljs-number">0</span>
    steps = <span class="hljs-number">100</span>
    learning_rate = <span class="hljs-number">0.01</span>
    costs = []

    <span class="hljs-comment">#original best-fit line</span>
    Y_pred = regressor.predict()
    regressor.plot_best_fit(Y_pred, <span class="hljs-string">'Initial Best Fit Line'</span>)


    <span class="hljs-keyword">while</span> <span class="hljs-number">1</span>:
        Y_pred = regressor.predict()
        cost = regressor.compute_cost(Y_pred)
        costs.append(cost)
        regressor.update_coeffs(learning_rate)

        iterations += <span class="hljs-number">1</span>
        <span class="hljs-keyword">if</span> iterations % steps == <span class="hljs-number">0</span>:
            print(iterations, <span class="hljs-string">"epochs elapsed"</span>)
            print(<span class="hljs-string">"Current accuracy is :"</span>,
                regressor.get_current_accuracy(Y_pred))

            stop = input(<span class="hljs-string">"Do you want to stop (y/*)??"</span>)
            <span class="hljs-keyword">if</span> stop == <span class="hljs-string">"y"</span>:
                <span class="hljs-keyword">break</span>

    <span class="hljs-comment">#final best-fit line</span>
    regressor.plot_best_fit(Y_pred, <span class="hljs-string">'Final Best Fit Line'</span>)

    <span class="hljs-comment">#plot to verify cost function decreases</span>
    h = plt.figure(<span class="hljs-string">'Verification'</span>)
    plt.plot(range(iterations), costs, color=<span class="hljs-string">'b'</span>)
    h.show()

    <span class="hljs-comment"># if user wants to predict using the regressor:</span>
    regressor.predict([i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">10</span>)])

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    main()
</code></pre>
<p>At its core, you can see that the block of code uses gradient descent to fit a linear regression model with <code>0.01</code> as its learning rate. Every <code>100</code> iterations, it prints the current accuracy and asks whether you want to stop. </p>
<p>Upon running the code above, the output shown is given below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-43.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-44.png" alt="Image" width="600" height="400" loading="lazy"></p>
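<p>If you would rather run the training without plots or prompts, the same idea can be written as a short function. This is a minimal sketch and not a replacement for the class above: the name <code>fit_line</code>, the fixed budget of <code>2000</code> iterations, and the vectorized NumPy arithmetic are assumptions made for illustration. The data and the learning rate match the example above.</p>

```python
import numpy as np

def fit_line(x, y, learning_rate=0.01, n_iterations=2000):
    """Fit y_hat = b0 + b1 * x with gradient descent, starting from zeros."""
    b0, b1 = 0.0, 0.0
    m = len(y)
    costs = []
    for _ in range(n_iterations):
        errors = (b0 + b1 * x) - y                    # predicted minus actual
        costs.append(np.sum(errors ** 2) / (2 * m))   # cost before this update
        grad_b0 = np.sum(errors) / m                  # slope of the cost along b0
        grad_b1 = np.sum(errors * x) / m              # slope of the cost along b1
        b0 = b0 - learning_rate * grad_b0
        b1 = b1 - learning_rate * grad_b1
    return b0, b1, costs

x = np.arange(11, dtype=float)
y = 2.0 * x                                           # data on the line y = 2x

b0, b1, costs = fit_line(x, y)
print(b0, b1, costs[0], costs[-1])
```

<p>Since the data lies exactly on the line <em>y = 2x</em>, the intercept should end up close to 0 and the slope close to 2, and the last recorded cost should be far smaller than the first.</p>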
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, the gradient descent algorithm is especially important in the artificial intelligence and machine learning domains, because training a model means adjusting its parameters until the cost function is as small as possible.  </p>
<p>In this article, you learnt what the Gradient Descent algorithm is, how it works, its formula, what learning rate is, and the importance of picking the right learning rate. You also saw a code illustration of how Gradient Descent works. </p>
<p>Finally, if you enjoyed this article and want to see more, I share my writings on Artificial Intelligence, Machine Learning and Microsoft Azure on <a target="_blank" href="https://twitter.com/SalimOpines">Twitter</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ An overview of the Gradient Descent algorithm ]]>
                </title>
                <description>
                    <![CDATA[ By Nishit Jain The subtle yet powerful algorithm that optimizes parameters Optimizing parameters is the ultimate goal of every machine learning algorithm. You want to get the optimum value of the slope and the intercept to get the line of best fit in... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/an-overview-of-the-gradient-descent-algorithm-8645c9e4de1e/</link>
                <guid isPermaLink="false">66c34487790a62b5fbf7b87b</guid>
                
                    <category>
                        <![CDATA[ algorithms ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Gradient-Descent  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ technology ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 25 Apr 2019 19:02:19 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*fp0t2D3aV_oHb6a94fp5Zg.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Nishit Jain</p>
<h4 id="heading-the-subtle-yet-powerful-algorithm-that-optimizes-parameters">The subtle yet powerful algorithm that optimizes parameters</h4>
<p>Optimizing parameters is the ultimate goal of every machine learning algorithm. You want to get the optimum value of the slope and the intercept to get the line of best fit in linear regression problems. You also want to get the optimum value for the parameters of a sigmoidal curve in logistic regression problems. So what if I told you that Gradient Descent does it all?</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
