<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ neural networks - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ neural networks - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Mon, 25 May 2026 05:06:10 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/neural-networks/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Enhance Images with Neural Networks ]]>
                </title>
                <description>
                    <![CDATA[ Artificial intelligence is changing how we work with images. What once took hours in Photoshop can now happen in seconds with AI-powered tools. You can take a blurry picture, enlarge it without losing sharpness, fix the lighting, remove unwanted nois... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-enhance-images-with-neural-networks/</link>
                <guid isPermaLink="false">68b8e1073c8fb81fc2265eef</guid>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Thu, 04 Sep 2025 00:44:55 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1756858495684/2742e9b0-87f8-47bf-a01d-2e979e4dfb35.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Artificial intelligence is changing how we work with images. What once took hours in Photoshop can now happen in seconds with AI-powered tools. You can take a blurry picture, enlarge it without losing sharpness, fix the lighting, remove unwanted noise, or even bring color to a black-and-white photo, all with a single click.</p>
<p>The magic behind these tools comes from trained AI models that understand how images should look and reconstruct them accordingly. These models have studied millions of examples to learn patterns, textures, and details, so they can “predict” what’s missing and fill it in naturally.</p>
<p>For developers, photographers, and content creators, knowing the basics of these algorithms can help you pick the right tools for your workflow. Even if you never plan to code an AI model yourself, this knowledge will help you make better choices for image processing, web apps, or creative projects.</p>
<p>Let’s look at five of the most important algorithms used in AI image enhancement today. Along the way, you’ll see real-world tools that use these algorithms and how you can try them yourself.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-image-colorization">Image Colorization</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-gan-based-image-enhancement">GAN-Based Image Enhancement</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-noise-reduction-denoising-autoencoders">Noise Reduction (Denoising Autoencoders)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-image-upscaling-using-super-resolution">Image Upscaling using Super-Resolution</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-artifact-removal">Artifact Removal</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-these-algorithms-matter-to-developers">Why These Algorithms Matter to Developers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-image-colorization"><strong>Image Colorization</strong></h2>
<p>Automatic image colorization might be the most visually dramatic AI enhancement of all. It takes a black-and-white image and predicts the colors that should be there, often producing results that look like the photo was taken in full color.</p>
<p>The AI behind this uses <a target="_blank" href="https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns">convolutional neural networks</a> (CNNs) trained on huge datasets of color images. The model sees both the grayscale and the color versions during training, so it learns how certain objects typically appear. For example, it might learn that grass is usually green, the sky is often blue, and human skin falls within a certain range of tones.</p>
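<p>To make the training setup concrete, here’s a minimal NumPy sketch of how (grayscale, color) training pairs can be built from ordinary color photos. It assumes RGB images stored as floats between 0 and 1 and uses the ITU-R BT.601 luma weights for the grayscale conversion. The helper names are hypothetical and random arrays stand in for real photos. Many colorization models work in the Lab color space rather than RGB, so treat this as an illustration of the idea, not DeOldify’s actual preprocessing.</p>
<pre><code class="lang-python">import numpy as np

# ITU-R BT.601 luma weights for the R, G, B channels
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb):
    """Hypothetical helper: (H, W, 3) RGB floats in [0, 1] to an (H, W) grayscale image."""
    return rgb @ LUMA_WEIGHTS  # weighted sum over the last (channel) axis

def make_training_pairs(color_images):
    """Hypothetical helper: grayscale copies become the model inputs,
    the original color images become the targets it learns to predict."""
    inputs = np.stack([to_grayscale(img) for img in color_images])
    targets = np.stack(color_images)
    return inputs, targets

# Random 4 x 4 arrays stand in for a real photo dataset
rng = np.random.default_rng(0)
photos = [rng.random((4, 4, 3)) for _ in range(2)]
X, Y = make_training_pairs(photos)
print(X.shape, Y.shape)
</code></pre>
<p>A colorization network is then trained to map each grayscale input in <code>X</code> back to its color target in <code>Y</code>.</p>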
<p><img src="https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com/f/3f1ef7e7-b08b-4251-ae26-9c4a8646a85a/de2k3n6-e04b7996-7c6d-437d-bca7-16aee0c061f6.png?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1cm46YXBwOjdlMGQxODg5ODIyNjQzNzNhNWYwZDQxNWVhMGQyNmUwIiwiaXNzIjoidXJuOmFwcDo3ZTBkMTg4OTgyMjY0MzczYTVmMGQ0MTVlYTBkMjZlMCIsIm9iaiI6W1t7InBhdGgiOiJcL2ZcLzNmMWVmN2U3LWIwOGItNDI1MS1hZTI2LTljNGE4NjQ2YTg1YVwvZGUyazNuNi1lMDRiNzk5Ni03YzZkLTQzN2QtYmNhNy0xNmFlZTBjMDYxZjYucG5nIn1dXSwiYXVkIjpbInVybjpzZXJ2aWNlOmZpbGUuZG93bmxvYWQiXX0.UJn-AuEJzCsQtiSanUT9M7j6rac6d_8T-goaCiMY2KA" alt="Image Colorization" width="600" height="400" loading="lazy"></p>
<p>One of the most famous models is DeOldify, which combines CNNs with GANs. The GAN setup helps refine the results, making colors more natural and avoiding strange or overly bright tones.</p>
<p>Colorization has practical uses beyond restoring old family photos. It’s used in film restoration, historical projects, digital storytelling, and even concept art.</p>
<p>See <a target="_blank" href="https://www.canva.com/features/colorize-black-and-white/">Image Colorization</a> in action.</p>
<h2 id="heading-gan-based-image-enhancement"><strong>GAN-Based Image Enhancement</strong></h2>
<p>GANs, or <a target="_blank" href="https://developers.google.com/machine-learning/gan/gan_structure">Generative Adversarial Networks</a>, are one of the most powerful AI techniques in image enhancement. They consist of two neural networks: the generator, which tries to create realistic-looking images, and the discriminator, which evaluates them. Over many iterations, the generator becomes extremely good at producing images that pass as real.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756865306217/cc30de30-3124-4a5c-bcc5-75827ec92c6d.png" alt="Image Enhancement" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>In image retouching, GANs can handle many tasks at once, like fixing lighting, improving sharpness, enhancing textures, and even subtly changing elements to make the picture more appealing. Because GANs learn from real-world images, the results often feel more natural than traditional editing filters.</p>
<p>GAN-based retouching is used in professional portrait editing, e-commerce product photos, real estate listings, and even game asset creation. It’s also behind many “one-click enhance” buttons you see in modern apps.</p>
<p>See a GAN-powered <a target="_blank" href="https://www.artguru.ai/photo-enhancer/">photo enhancer</a> here.</p>
<h2 id="heading-noise-reduction-denoising-autoencoders"><strong>Noise Reduction (Denoising Autoencoders)</strong></h2>
<p>Noise in images looks like random specks of color or brightness that shouldn’t be there. It often happens in low-light photos or in images taken with high ISO settings. Noise makes photos look grainy and less professional.</p>
<p>Traditional noise removal methods simply blur the image to hide the noise, but this also destroys fine details. AI noise reduction works differently.</p>
<p><a target="_blank" href="https://www.geeksforgeeks.org/machine-learning/denoising-autoencoders-in-machine-learning/">Denoising Autoencoders</a>, one of the most common approaches, learn from pairs of images—one clean and one noisy. The AI studies how noise distorts details, then learns to reverse the process.</p>
<p><img src="https://uk.mathworks.com/discovery/denoising/_jcr_content/mainParsys/columns/e4e497e4-fa5c-49a0-afff-3e840fe0a8ca/image.adapt.full.medium.jpg/1743063756357.jpg" alt="Image denoising" width="600" height="400" loading="lazy"></p>
<p>When you pass a noisy photo through a denoising autoencoder, it removes the noise while preserving edges, textures, and important small details.</p>
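<p>The core idea, learning a mapping from noisy inputs back to clean targets using paired examples, can be shown without a deep network. The sketch below is a deliberately simplified stand-in for a denoising autoencoder: instead of a convolutional encoder and decoder, it fits a single linear map by least squares on synthetic 1-D “signals” (sine waves with random phases) corrupted by Gaussian noise. The signal length, noise level, and sample counts are arbitrary choices for illustration.</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, length, noise_std = 2000, 200, 16, 0.3
t = np.linspace(0, 2 * np.pi, length, endpoint=False)

def make_clean(n):
    """Clean signals: sine waves with random phases (stand-ins for clean images)."""
    phases = rng.uniform(0, 2 * np.pi, size=(n, 1))
    return np.sin(t + phases)

# Paired training data: noisy inputs and their clean targets
clean_train = make_clean(n_train)
noisy_train = clean_train + rng.normal(0, noise_std, clean_train.shape)

# Simplified stand-in for a denoising autoencoder: one linear map W
# fitted by least squares so that noisy_train @ W is close to clean_train
W, *_ = np.linalg.lstsq(noisy_train, clean_train, rcond=None)

# Evaluate on unseen noisy signals
clean_test = make_clean(n_test)
noisy_test = clean_test + rng.normal(0, noise_std, clean_test.shape)
denoised_test = noisy_test @ W

mse_noisy = np.mean((noisy_test - clean_test) ** 2)
mse_denoised = np.mean((denoised_test - clean_test) ** 2)
print(f"MSE before: {mse_noisy:.4f}, after: {mse_denoised:.4f}")
</code></pre>
<p>The clean signals here occupy only a two-dimensional slice of the 16-dimensional input space, so the fitted map can discard most of the noise while keeping the signal. Deep denoising autoencoders rely on similar structure in natural images but learn it with far more expressive nonlinear models.</p>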
<p>Noise reduction isn’t just for photography. It’s also used in document scanning to make text easier to read, in medical imaging to clarify scans, and in cleaning up screenshots or UI mockups for presentations.</p>
<p>See <a target="_blank" href="https://www.pica-ai.com/resource/denoise-image/">Noise Reduction</a> in action here.</p>
<h2 id="heading-image-upscaling-using-super-resolution"><strong>Image Upscaling using Super-Resolution</strong></h2>
<p>Super-resolution is the process of increasing the resolution of an image to make it sharper and larger without simply stretching the pixels.</p>
<p>In the past, enlarging a small image just made it blurry. AI super-resolution works differently: it studies the image, detects patterns, and then generates new pixels that plausibly match what a higher-quality original would have contained.</p>
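<p>For contrast, here’s what “simply stretching the pixels” looks like: nearest-neighbor upscaling, sketched in NumPy, copies each pixel into a k × k block, so no new detail is created. That gap is what super-resolution models try to fill. The sketch also builds the low-resolution input by averaging k × k blocks of a high-resolution image, one simple way to make training pairs (many papers use bicubic downsampling instead). The random 8 × 8 image and the factor of 2 are arbitrary.</p>
<pre><code class="lang-python">import numpy as np

def downscale(img, k):
    """Average non-overlapping k x k blocks to make a low-resolution version."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upscale_nearest(img, k):
    """Nearest-neighbor upscaling: each pixel is repeated into a k x k block."""
    return np.repeat(np.repeat(img, k, axis=0), k, axis=1)

rng = np.random.default_rng(0)
high_res = rng.random((8, 8))
low_res = downscale(high_res, 2)         # shape (4, 4)
stretched = upscale_nearest(low_res, 2)  # shape (8, 8), but blocky

# Bigger, yet it contains no more distinct values than the low-res image
print(low_res.shape, stretched.shape)
print(len(np.unique(low_res)) == len(np.unique(stretched)))
</code></pre>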
<p>One of the first big breakthroughs was <a target="_blank" href="https://medium.com/coinmonks/review-srcnn-super-resolution-3cb3a4f67a7c">SRCNN</a> (Super-Resolution Convolutional Neural Network). SRCNN first enlarges the image with bicubic interpolation, then passes it through three convolutional layers that extract features from image patches, map them to high-resolution features, and reconstruct the sharper image. This early approach was effective but sometimes produced overly smooth images.</p>
<p>Then came <a target="_blank" href="https://esrgan.readthedocs.io/en/latest/">ESRGAN</a> (Enhanced Super-Resolution Generative Adversarial Network), which took things further. ESRGAN uses a GAN architecture, a generator creates enhanced images, while a discriminator judges how real they look. Through this back-and-forth training, the generator learns to produce fine textures like hair strands, fabric weaves, or building details that look realistic to the human eye.</p>
<p><img src="https://www.any-video-converter.com/images2020/article/convert-low-resolution-image-to-high-resolution-online.jpg" alt="Image Upscaling" width="600" height="400" loading="lazy"></p>
<p>Super-resolution is widely used in e-commerce (for clearer product photos), printing (turning web images into high-resolution posters), and web apps (making user-uploaded images look professional).</p>
<p>See a super-resolution-powered <a target="_blank" href="https://www.artguru.ai/image-upscaler/">image upscaler</a> in action.</p>
<h2 id="heading-artifact-removal"><strong>Artifact Removal</strong></h2>
<p>When a JPEG image is heavily compressed, it develops blocky patches, fuzzy edges, and strange halos around lines. These are called compression artifacts, and they appear because JPEG reduces file size by removing fine detail. Traditional fixes blur the image to hide these defects, but that also softens important edges and textures.</p>
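<p>To see where those artifacts come from, the NumPy sketch below imitates the lossy core of JPEG on a single 8 × 8 block: transform it with a 2-D discrete cosine transform (DCT), divide the coefficients by a quantization step and round them, then invert. This is a simplification: real JPEG encoders use per-frequency quantization tables, color conversion, chroma subsampling, and entropy coding, none of which appear here, and the block contents and step sizes are arbitrary. Because every block is quantized independently, the errors don’t line up across block boundaries, which is what shows up as blockiness in a full image.</p>
<pre><code class="lang-python">import numpy as np

N = 8  # JPEG processes images in 8 x 8 blocks

# Orthonormal DCT-II matrix: row k holds the k-th cosine basis function
idx = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * idx[None, :] + 1) * idx[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

def compress_block(block, step):
    """Quantize the block's 2-D DCT coefficients with a uniform step, then invert.

    Returns the reconstructed block and the number of nonzero quantized coefficients.
    """
    coeffs = C @ block @ C.T                  # forward 2-D DCT
    quantized = np.round(coeffs / step)       # lossy step: small coefficients become 0
    restored = C.T @ (quantized * step) @ C   # inverse 2-D DCT
    return restored, int(np.count_nonzero(quantized))

# A smooth gradient plus fine random texture stands in for an image patch
rng = np.random.default_rng(1)
block = 16.0 * np.add.outer(idx, idx) + rng.normal(0, 8, size=(N, N))

for step in (5, 50):
    restored, kept = compress_block(block, step)
    rms_error = np.sqrt(np.mean((restored - block) ** 2))
    print(f"step={step}: {kept} of 64 coefficients kept, RMS error {rms_error:.2f}")
</code></pre>
<p>A larger step zeroes out more coefficients, which would shrink the file after entropy coding, and raises the reconstruction error. The discarded coefficients are mostly the small, high-frequency ones that carry fine texture.</p>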
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756465727105/b74f2d5f-c489-4238-a073-72ce86a5a4a7.png" alt="JPEG Aartifact Removal" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><a target="_blank" href="https://github.com/jiaxi-jiang/FBCNN">FBCNN</a>, or Flexible Blind Convolutional Neural Network, takes a smarter approach. Instead of needing to know the exact compression level beforehand, FBCNN is trained to handle a wide range of artifact severities without extra input. This is what makes it “blind”, it doesn’t require metadata about how the JPEG was compressed. It can adapt its restoration process on the fly.</p>
<p>FBCNN works in two main steps. First, it extracts features from the image, analyzing patterns in edges, textures, and flat areas to identify where artifacts are most likely. Then, it applies a learned mapping to reconstruct what those regions should look like without the damage.</p>
<p>Because it can estimate the compression quality itself, FBCNN avoids the common problem of over-smoothing lightly compressed images or under-restoring heavily compressed ones.</p>
<p>This flexibility makes FBCNN useful in many scenarios: cleaning up low-quality images from social media, restoring graphics and text in screenshots, or preparing old compressed web images for printing. AI tools can also run FBCNN-style artifact removal as a first step before applying super-resolution or general enhancement.</p>
<p>FBCNN’s ability to adapt without manual tuning makes it one of the most practical and developer-friendly models for real-world JPEG restoration today.</p>
<p>See <a target="_blank" href="https://huggingface.co/spaces/KenjieDec/FBCNN">artifact removal</a> in action.</p>
<h2 id="heading-why-these-algorithms-matter-to-developers"><strong>Why These Algorithms Matter to Developers</strong></h2>
<p>Even if you have never trained your own AI model, understanding these algorithms gives you a better sense of what’s possible and how to apply it. Many of the tools mentioned here offer APIs, which means developers can build them into their own apps and websites.</p>
<p>If you run a social platform, you can automatically enhance user-uploaded images before they appear in feeds. If you build e-commerce platforms, you can clean and upscale product images for better sales conversions. If you work in media archiving, you can restore and preserve images without spending hours on manual edits.</p>
<p>The real value comes from knowing which algorithm is right for the problem you’re solving: super-resolution for enlarging, denoising for cleaning, colorization for restoration, artifact removal for fixing compression, and GAN retouching for overall beautification.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>AI image enhancement has moved from research labs to everyday tools, making it possible for anyone to transform low-quality images into something sharp, vibrant, and professional. The algorithms behind these tools, like super-resolution, denoising, colorization, artifact removal, and GAN retouching, are the building blocks of modern visual AI.</p>
<p>Whether you’re a developer looking to integrate image processing into your app or a creator who wants to improve your visuals, knowing how these algorithms work will help you get the most out of AI. This is only the beginning, and future models will likely be even more precise, faster, and capable of things we haven’t yet imagined. Developers who understand these foundations will be ready to make the most of the next wave of AI-powered creativity.</p>
<p><em>Hope you enjoyed this article. Sign up for my free AI newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><em>visit my website</em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn to Build a Multilayer Perceptron with Real-Life Examples and Python Code ]]>
                </title>
                <description>
                    <![CDATA[ The perceptron is a fundamental concept in deep learning, with many algorithms stemming from its original design. In this tutorial, I’ll show you how to build both single layer and multi-layer perceptrons (MLPs) across three frameworks: Custom class... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-a-multilayer-perceptron-with-examples-and-python-code/</link>
                <guid isPermaLink="false">6839f729798ea464918cffe8</guid>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ binary classification ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MLP (Multi-Layer Perceptrons) ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MathJax ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Kuriko ]]>
                </dc:creator>
                <pubDate>Fri, 30 May 2025 18:21:29 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748616370600/01903917-4be7-476b-90d1-18295d19edef.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>The <strong>perceptron</strong> is a fundamental concept in deep learning, with many algorithms stemming from its original design.</p>
<p>In this tutorial, I’ll show you how to build both single-layer and multi-layer perceptrons (MLPs) across three frameworks:</p>
<ul>
<li><p>Custom classifier</p>
</li>
<li><p>Scikit-learn’s MLPClassifier</p>
</li>
<li><p>Keras Sequential classifier using SGD and Adam optimizers</p>
</li>
</ul>
<p>This will help you learn about their various use cases and how they work.</p>
<h3 id="heading-table-of-contents">Table of Contents</h3>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-a-perceptron">What is a Perceptron?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-a-single-layered-classifier">How to Build a Single-Layered Classifier</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-multi-layer-perceptron">What is a Multi-Layer Perceptron?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-multi-layered-perceptrons">How to Build Multi-Layered Perceptrons</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-optimizers">Understanding Optimizers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-an-mlp-classifier-with-sgd-optimizer">How to Build an MLP Classifier with SGD Optimizer</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-an-mlp-classifier-with-adam-optimizer">How to Build an MLP Classifier with Adam Optimizer</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-results-generalization">Final Results: Generalization</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>Mathematics (Calculus, Linear Algebra, Statistics)</p>
</li>
<li><p>Coding in Python</p>
</li>
<li><p>Basic understanding of Machine Learning concepts</p>
</li>
</ul>
<h2 id="heading-what-is-a-perceptron">What is a Perceptron?</h2>
<p>A perceptron is one of the simplest types of artificial neurons used in Machine Learning. It’s a building block of artificial neural networks that learns from labeled data to perform classification and pattern recognition tasks, typically on linearly separable data.</p>
<p>A single-layer perceptron consists of a single layer of artificial neurons, called perceptrons.</p>
<p>But when you connect many perceptrons together in layers, you have a multi-layer perceptron (MLP). This lets the network learn more complex patterns by combining simple decisions from each perceptron. And this makes MLPs powerful tools for tasks like image recognition and natural language processing.</p>
<p>The perceptron consists of four main parts:</p>
<ul>
<li><p><strong>Input layer</strong>: Takes the initial numerical values into the system for further processing.</p>
</li>
<li><p><strong>Weights</strong>: Scale each input value; the weighted inputs are then summed together with a bias term.</p>
</li>
<li><p><strong>Activation function</strong>: Determines whether the neuron should fire based on the threshold value.</p>
</li>
<li><p><strong>Output layer</strong>: Produces the classification result.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748438698612/5b2920db-4ec1-455b-840e-7b5e9d6c2e75.png" alt="Image: Organization of a perceptron. Source: Rosenblatt 1958" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>It performs a weighted sum of inputs, adds a bias, and passes the result through an activation function, the same structure as logistic regression. It’s like a little decision-maker that says “yes” or “no” based on the information it gets.</p>
<p>For instance, when we use a sigmoid activation, its output is a probability between 0 and 1, mimicking the behavior of logistic regression.</p>
<h3 id="heading-applications-of-perceptrons">Applications of Perceptrons</h3>
<p>Perceptrons are applied to tasks such as:</p>
<ul>
<li><p><strong>Image classification:</strong> Perceptrons can decide whether an image contains a specific object, which is a binary classification task.</p>
</li>
<li><p><strong>Linear regression:</strong> With a linear (identity) activation instead of a step function, a perceptron outputs continuous values based on the input features, so it can be used to solve linear regression problems.</p>
</li>
</ul>
<h3 id="heading-how-the-activation-function-works">How the Activation Function Works</h3>
<p>For a single perceptron used for binary classification, the most common activation function is the <strong>step function</strong> (also known as the threshold function):</p>
<p>$$\phi(z) = \begin{cases} 1 &amp;\text{if } z \geq \theta \\ \\ 0 &amp;\text{if } z &lt; \theta \end{cases}$$</p><p>where:</p>
<ul>
<li><p><code>ϕ(z)</code>: the output of the activation function.</p>
</li>
<li><p><code>z</code>: the weighted sum of the inputs plus the bias:</p>
</li>
</ul>
<p>$$z = \sum_{i=1}^m w_i x_i + b$$</p><p>(<code>x_i</code>: i-th input value, <code>w_i</code>: weight associated with <code>x_i</code>, <code>b</code>: bias term, <code>m</code>: number of inputs)</p>
<p><code>θ</code> is the threshold. Often, the threshold θ is set to zero, and the bias (b) effectively controls the activation threshold.</p>
<p>In that case, the formula becomes:</p>
<p>$$\phi(z) = \begin{cases} 1 &amp;\text{if } z \geq 0 \\ \\ 0 &amp;\text{if } z &lt; 0 \end{cases}$$</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748439460839/e74f1c1c-4e89-419b-aa9e-24a297d81ff5.png" alt="Image: Step Function (Author)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>When the step function ϕ(z) outputs one, it signifies that the input belongs to the class labeled one.</p>
<p>This occurs <strong>when the weighted sum is zero or greater (z ≥ 0)</strong>, leading the perceptron to predict that the input belongs to this class.</p>
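<p>As a quick check of the formulas above, the NumPy sketch below computes <code>z</code> for one input and applies the step function with θ = 0. The input, weights, and bias are made-up values. It uses <code>np.heaviside(z, 1.0)</code>, which returns 1 for positive <code>z</code>, 0 for negative <code>z</code>, and its second argument (here 1.0) when <code>z</code> is exactly zero, matching ϕ(z).</p>
<pre><code class="lang-python">import numpy as np

def step(z):
    """Step activation with threshold 0: 1 for z = 0 or above, 0 below."""
    return np.heaviside(z, 1.0)

# Made-up example: three input features, their weights, and a bias
x = np.array([1.0, 0.5, -2.0])
w = np.array([0.4, -0.2, 0.1])
b = 0.05

z = np.dot(w, x) + b  # weighted sum plus bias: 0.4 - 0.1 - 0.2 + 0.05
print(z, step(z))

# The step function applied to several weighted sums at once
print(step(np.array([-0.3, 0.0, 0.7])))
</code></pre>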
<p>While the step function is conceptually the original activation for a perceptron, it isn’t differentiable at zero and its gradient is zero everywhere else, so it can’t drive gradient-based training.</p>
<p>In modern implementations, we can use other activation functions like the <strong>sigmoid</strong> function:</p>
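<p>$$\sigma (z) = \frac {1} {1 + e^{-z}}$$</p><p>Unlike the step function, the sigmoid outputs a continuous value between zero and one, which can be read as the probability of class one. Thresholding that output at 0.5 (equivalently, z ≥ 0) gives a zero-or-one prediction.</p>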
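<p>A short NumPy sketch shows the practical difference. The sigmoid gives a graded output that crosses 0.5 at z = 0, and its derivative σ(z)(1 − σ(z)) is positive everywhere, whereas the step function’s derivative is zero wherever it’s defined. The sample <code>z</code> values are arbitrary.</p>
<pre><code class="lang-python">import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid, the quantity gradient-based training relies on."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)
labels = np.heaviside(probs - 0.5, 1.0)  # threshold the probability at 0.5

print(np.round(probs, 3))
print(labels)
print(np.round(sigmoid_grad(z), 3))
</code></pre>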
<h3 id="heading-how-the-loss-function-works">How the Loss Function Works</h3>
<p>The <strong>loss function</strong> is a crucial concept in machine learning that quantifies the error or discrepancy between the model's predictions and the actual target values.</p>
<p>Its purpose is to penalize the model for making incorrect or inaccurate predictions, which guides the learning algorithm (for example, gradient descent) to adjust the model's parameters in a way that minimizes this error and improves performance.</p>
<p>In a binary classification task, the model may adopt the <strong>hinge loss function</strong>, which penalizes misclassifications as well as correct predictions that fall within a margin of the decision boundary:</p>
<p>$$L(y, h(x)) = \max(0, 1 - y \cdot h(x))$$</p><p>(h(x): the raw score the model outputs for x, such as the weighted sum z; y: the true label, encoded as −1 or +1)</p>
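<p>Here’s a minimal NumPy version of the hinge loss. As the formula requires, it assumes labels encoded as −1 and +1 and raw scores h(x) rather than thresholded 0/1 predictions; the example scores are made up.</p>
<pre><code class="lang-python">import numpy as np

def hinge_loss(y_true, scores):
    """Per-sample hinge loss max(0, 1 - y * h(x)) for labels in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y_true * scores)

y_true = np.array([1, -1, 1, -1])
scores = np.array([2.0, -3.0, 0.3, 0.5])  # raw outputs h(x), not 0/1 labels

losses = hinge_loss(y_true, scores)
print(losses, losses.mean())
</code></pre>
<p>The first two samples are classified correctly with a margin of at least 1, so they cost nothing. The third is correct but inside the margin, so it still costs 0.7. The fourth is misclassified and costs 1.5.</p>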
<h2 id="heading-how-to-build-a-single-layered-classifier">How to Build a Single-Layered Classifier</h2>
<p>Now, let’s build a simple single-layer perceptron for binary classification.</p>
<h3 id="heading-1-custom-classifier">1. Custom Classifier</h3>
<h4 id="heading-initialize-the-classifier">Initialize the classifier</h4>
<p>We’ll first initialize the classifier with <code>weights</code>, <code>bias</code>, the number of epochs (<code>n_iterations</code>), and the <code>learning_rate</code>.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, learning_rate=<span class="hljs-number">0.01</span>, n_iterations=<span class="hljs-number">1000</span></span>):</span>
    self.learning_rate = learning_rate
    self.n_iterations = n_iterations
    self.weights = <span class="hljs-literal">None</span>
    self.bias = <span class="hljs-literal">None</span>
</code></pre>
<h4 id="heading-define-the-activation-function">Define the activation function</h4>
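<p>Use a step function that returns one if the input (x) is greater than the <code>threshold</code>, and zero otherwise. By default, the <code>threshold</code> is set to zero. Note that this strict inequality maps an input of exactly zero to zero, whereas the ϕ(z) formula above maps it to one; both tie-breaking conventions are common.</p>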
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_step_function</span>(<span class="hljs-params">self, x, threshold: int = <span class="hljs-number">0</span></span>):</span>
     <span class="hljs-keyword">return</span> np.where(x &gt; threshold, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>)
</code></pre>
<h4 id="heading-train-the-model">Train the model</h4>
<p>Now it’s time to start training. The learning process involves iteratively updating the perceptron’s internal parameters: <code>weights</code> and <code>bias</code>.</p>
<p>This process is controlled by a specified number of training epochs defined by <code>n_iterations</code>.</p>
<p>In each epoch, the model loops over every sample in the input dataset (X) and, after each sample, adjusts its weights and bias based on the difference between its prediction and the true label (y), scaled by a predefined <code>learning_rate</code>.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fit</span>(<span class="hljs-params">self, X, y</span>):</span>
    n_samples, n_features = X.shape

    self.weights = np.zeros(n_features)
    self.bias = <span class="hljs-number">0</span>

    <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(self.n_iterations):
        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_samples):
            <span class="hljs-comment"># compute weighted sum (z)</span>
            z = np.dot(X[i], self.weights) + self.bias

            <span class="hljs-comment"># apply the activation function</span>
            y_pred = self._step_function(z)

            <span class="hljs-comment"># update weights and bias</span>
            self.weights += self.learning_rate * (y[i] - y_pred) * X[i]
            self.bias += self.learning_rate * (y[i] - y_pred)
</code></pre>
<h4 id="heading-how-the-weights-work-in-the-iteration-loop">How the weights work in the iteration loop</h4>
<p>The weights in a perceptron define the orientation (slope) of the decision boundary that separates the classes.</p>
<p>Each update in the <code>for</code> loop aims to reduce classification errors:</p>
<p>$$\begin {align*} w_j &amp;:= w_j + \Delta w_j \\ &amp; := w_j + \eta (y_i - \hat y_i)x_{ij} \\ &amp;= \begin{cases} w_j &amp;\text{(a) } y_i - \hat y_i = 0\\ w_j + \eta x_{ij} &amp;\text{(b) } y_i - \hat y_i = 1 \\ w_j - \eta x_{ij} &amp;\text{(c) } y_i - \hat y_i = -1 \\ \end{cases} \end{align*}$$</p><p>(<code>w_j</code>: j-th weight, <code>η</code>: learning rate, <code>x_ij</code>: j-th feature of the i-th sample, (<code>y_i − ŷ_i</code>): error)</p>
<p>This means that:</p>
<ol>
<li><p>When the prediction is <strong>correct</strong>, the error is zero, so the weight is unchanged.</p>
</li>
<li><p>When the prediction is <strong>too low</strong> (y_i = 1 and ŷ_i = 0), η·x_ij is added to each weight, which raises the weighted sum for that input.</p>
</li>
<li><p>When the prediction is <strong>too high</strong> (y_i = 0 and ŷ_i = 1), η·x_ij is subtracted from each weight, which lowers the weighted sum for that input.</p>
</li>
</ol>
<h4 id="heading-how-the-bias-terms-work-in-the-iteration-loop">How the bias terms work in the iteration loop</h4>
<p>The bias determines the decision boundary’s intercept (position from the origin).</p>
<p>As with the weights, the bias is adjusted after each sample to shift the position of the decision boundary:</p>
<p>$$\begin {align*} b &amp;:= b + \Delta b \\ &amp; := b + \eta (y_i - \hat y_i) \\ &amp;= \begin{cases} b &amp;\text{(a) } y_i - \hat y_i = 0\\ b + \eta &amp;\text{(b) } y_i - \hat y_i = 1 \\ b - \eta &amp;\text{(c) } y_i - \hat y_i = -1 \\ \end{cases} \end{align*}$$</p><p>This repeated adjustment aims to optimize the model’s ability to correctly classify the training data.</p>
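<p>To see both update rules in action, here’s one hand-checkable training step on a single made-up sample. It follows the same rule as the <code>fit</code> loop above, with a learning rate of 0.01 and the weights and bias starting at zero; the sample values are hypothetical. <code>np.heaviside(z, 0.0)</code> stands in for <code>_step_function</code> because it also outputs 0 when <code>z</code> is exactly zero.</p>
<pre><code class="lang-python">import numpy as np

learning_rate = 0.01
w = np.zeros(2)  # weights start at zero, as in fit()
b = 0.0          # bias starts at zero

x_i = np.array([2.0, -1.0])  # hypothetical sample
y_i = 1                      # its true label

def predict_one(x, w, b):
    # heaviside with 0.0 at z == 0 behaves like the classifier's _step_function
    return np.heaviside(np.dot(x, w) + b, 0.0)

y_hat = predict_one(x_i, w, b)  # z = 0, so the prediction is 0: case (b)
error = y_i - y_hat             # error = 1

w = w + learning_rate * error * x_i  # each weight moves by eta * x_ij
b = b + learning_rate * error        # the bias moves by eta

print(w, b)
print(predict_one(x_i, w, b))  # z is now 0.06, so the sample is classified as 1
</code></pre>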
<h4 id="heading-make-a-prediction">Make a prediction</h4>
<p>Lastly, we add a function that outputs a predicted class (zero or one) for new, unseen data (X):</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict</span>(<span class="hljs-params">self, X</span>):</span>
    linear_output = np.dot(X, self.weights) + self.bias
    predictions = self._step_function(linear_output)
    <span class="hljs-keyword">return</span> predictions
</code></pre>
<p><strong>The entire classifier looks like this:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Perceptron</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, learning_rate=<span class="hljs-number">0.01</span>, n_iterations=<span class="hljs-number">1000</span></span>):</span>
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = <span class="hljs-literal">None</span>
        self.bias = <span class="hljs-literal">None</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_step_function</span>(<span class="hljs-params">self, x, threshold: int = <span class="hljs-number">0</span></span>):</span>
        <span class="hljs-keyword">return</span> np.where(x &gt; threshold, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fit</span>(<span class="hljs-params">self, X, y</span>):</span>
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = <span class="hljs-number">0</span>

        <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(self.n_iterations):
            <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_samples):
                linear_output = np.dot(X[i], self.weights) + self.bias
                y_pred = self._step_function(linear_output)
                self.weights += self.learning_rate * (y[i] - y_pred) * X[i]
                self.bias += self.learning_rate * (y[i] - y_pred)
        <span class="hljs-keyword">return</span> self

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict</span>(<span class="hljs-params">self, X</span>):</span>
        linear_output = np.dot(X, self.weights) + self.bias
        y_pred = self._step_function(linear_output)
        <span class="hljs-keyword">return</span> y_pred
</code></pre>
<h4 id="heading-simulate-with-synthetic-datasets">Simulate with synthetic datasets</h4>
<p>First, we generate a synthetic, linearly separable dataset using <code>make_blobs</code>, split it into training and test sets, and train the classifier we created, which learns a linear decision boundary.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> make_blobs
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-comment"># create a mock dataset</span>
X, y = make_blobs(n_features=<span class="hljs-number">2</span>, centers=<span class="hljs-number">2</span>, n_samples=<span class="hljs-number">1000</span>, random_state=<span class="hljs-number">12</span>)

<span class="hljs-comment"># split</span>
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.2</span>, random_state=<span class="hljs-number">42</span>)

<span class="hljs-comment"># train the model</span>
perceptron = Perceptron(learning_rate=<span class="hljs-number">0.1</span>, n_iterations=<span class="hljs-number">1000</span>).fit(X_train, y_train)

<span class="hljs-comment"># make a prediction</span>
y_pred_train = perceptron.predict(X_train)
y_pred_test = perceptron.predict(X_test)

<span class="hljs-comment"># evaluate the results</span>
acc_train = np.mean(y_pred_train == y_train)
acc_test = np.mean(y_pred_test == y_test)
print(<span class="hljs-string">f"Accuracy (Train): <span class="hljs-subst">{acc_train:<span class="hljs-number">.3</span>}</span> \nAccuracy (Test): <span class="hljs-subst">{acc_test:<span class="hljs-number">.3</span>}</span>"</span>)
</code></pre>
<h4 id="heading-results">Results</h4>
<p>The classifier generated a clear, highly accurate linear decision boundary.</p>
<ul>
<li><p><em>Accuracy (Train): 0.981</em></p>
</li>
<li><p><em>Accuracy (Test): 0.975</em></p>
</li>
</ul>
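<p>With two input features, the learned decision boundary is the line where w₁x₁ + w₂x₂ + b = 0, so it can be drawn directly from the fitted parameters. The sketch below solves for x₂; the weight and bias values are placeholders standing in for <code>perceptron.weights</code> and <code>perceptron.bias</code>, not the values learned in this run.</p>
<pre><code class="lang-python">import numpy as np

def boundary_x2(x1, weights, bias):
    """Solve w1*x1 + w2*x2 + b = 0 for x2 (assumes w2 != 0)."""
    w1, w2 = weights
    return -(w1 * x1 + bias) / w2

# placeholder values standing in for perceptron.weights and perceptron.bias
weights = np.array([0.8, -0.5])
bias = 0.3

x1 = np.linspace(-10, 10, 5)
x2 = boundary_x2(x1, weights, bias)

# every point on the line produces a linear output of (numerically) zero
points = np.column_stack([x1, x2])
print(np.dot(points, weights) + bias)
</code></pre>
<p>With the trained model, pass <code>perceptron.weights</code> and <code>perceptron.bias</code> instead and plot <code>x2</code> against <code>x1</code> over the range of the data.</p>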
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748440470195/0a01c5ad-124e-4f59-b4d5-9ee5dd5b23ce.png" alt="Decision boundary of single-layer perceptron (Custom classifier)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-2-leverage-sckitlearns-mcp-classifier">2. Leverage Scikit-Learn’s MLP Classifier</h3>
<p>For convenience, we’ll use scikit-learn’s built-in <code>MLPClassifier</code> to build a similar, yet more robust, classifier:</p>
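<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.neural_network <span class="hljs-keyword">import</span> MLPClassifier

model = MLPClassifier(
    hidden_layer_sizes=(), <span class="hljs-comment"># intentionally empty: no hidden layers, so this is a single-layer network</span>
    activation=<span class="hljs-string">'logistic'</span>, <span class="hljs-comment"># applies to hidden layers only; with none, the output unit still uses a sigmoid for binary targets</span>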
    solver=<span class="hljs-string">'sgd'</span>, <span class="hljs-comment"># choosing SGD optimizer</span>
    max_iter=<span class="hljs-number">1000</span>,
    random_state=<span class="hljs-number">42</span>, 
    learning_rate=<span class="hljs-string">'constant'</span>, 
    learning_rate_init=<span class="hljs-number">0.1</span>
).fit(X_train, y_train)

y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

acc_train = np.mean(y_pred_train == y_train)
acc_test = np.mean(y_pred_test == y_test)
print(<span class="hljs-string">f"MLPClassifier\nAccuracy (Train): <span class="hljs-subst">{acc_train:<span class="hljs-number">.3</span>}</span> \nAccuracy (Test): <span class="hljs-subst">{acc_test:<span class="hljs-number">.3</span>}</span>"</span>)
</code></pre>
<h4 id="heading-results-1">Results</h4>
<p>The <code>MLPClassifier</code> generated a clear linear decision boundary with slightly better accuracy scores.</p>
<ul>
<li><p><em>Accuracy (Train): 0.985</em></p>
</li>
<li><p><em>Accuracy (Test): 0.995</em></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748440118956/f5391f47-711a-4948-b956-1a76dbd7ca92.png" alt="Decision boundary of single-layer perceptron (MCP Classifier)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-limitations-of-single-layer-perceptrons">Limitations of Single-Layer Perceptrons</h3>
<p>Now, let’s look at the key differences between the <code>MLPClassifier</code> and our custom single-layer perceptron.</p>
<p>Unlike most other neural networks, a classic single-layer perceptron such as our custom classifier uses a <strong>step function</strong> as its activation.</p>
<p>The step function is discontinuous at x = 0, so it is not differentiable there, and its derivative is zero everywhere else.</p>
<p>This precludes the use of <strong>gradient-based optimization algorithms</strong> such as SGD or Adam: these methods rely on gradients (the partial derivatives of the cost function with respect to each parameter), and a step activation provides either no gradient or a zero gradient. The perceptron instead relies on the error-driven update rule shown earlier.</p>
<p>In contrast, most neural networks employ differentiable activation functions (for example, <strong>sigmoid</strong>, <strong>ReLU</strong>) and loss functions (for example, <strong>MSE</strong>, <strong>Cross-Entropy</strong>) for effective optimization.</p>
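<p>A quick numerical check shows why the step function leaves gradient-based optimizers with nothing to work with: its finite-difference slope is zero at every point away from the jump, while the sigmoid’s slope is positive everywhere. The probe points and the step size <code>h</code> are arbitrary choices for illustration.</p>
<pre><code class="lang-python">import numpy as np

def step(x):
    return np.where(np.greater(x, 0), 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def numerical_slope(f, x, h=1e-4):
    # central finite difference: (f(x + h) - f(x - h)) / 2h
    return (f(x + h) - f(x - h)) / (2 * h)

xs = np.array([-3.0, -0.5, 0.5, 3.0])   # probe points away from x = 0
print("step slope:   ", numerical_slope(step, xs))     # all zeros
print("sigmoid slope:", numerical_slope(sigmoid, xs))  # all positive
</code></pre>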
<p>Other challenges of a single-layer perceptron include:</p>
<ul>
<li><p><strong>Limited to linear separability:</strong> Because they can only learn linear decision boundaries, they are unable to handle complex, non-linearly separable data.</p>
</li>
<li><p><strong>Lack of depth:</strong> Being single-layered, they cannot learn complex hierarchical representations.</p>
</li>
<li><p><strong>Limited optimizer options:</strong> As mentioned, their non-differentiable activation function precludes the use of major gradient-based optimizers.</p>
</li>
</ul>
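<p>The linear-separability limit is easy to see on XOR, where no single line separates the two classes. The sketch below applies the same perceptron update rule in a standalone loop (it does not reuse the <code>Perceptron</code> class); however many epochs it runs, training accuracy cannot exceed 75%, because any line places at most three of the four points on the correct side.</p>
<pre><code class="lang-python">import numpy as np

# XOR: the label is 1 only when exactly one input is 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

w = np.zeros(2)
b = 0.0
eta = 0.1

for _ in range(100):
    for x_i, y_i in zip(X, y):
        y_hat = int(np.greater(np.dot(x_i, w) + b, 0))  # step function
        w += eta * (y_i - y_hat) * x_i
        b += eta * (y_i - y_hat)

predictions = np.greater(X @ w + b, 0).astype(int)
accuracy = np.mean(predictions == y)
print(f"XOR training accuracy: {accuracy:.2f}")  # never 1.00
</code></pre>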
<p>In the next section, you’ll learn how multi-layer perceptrons overcome these limitations.</p>
<h2 id="heading-what-is-a-multi-layer-perceptron">What is a Multi-Layer Perceptron?</h2>
<p>An MLP is a class of feedforward artificial neural network that consists of at least <strong>three layers</strong> of nodes:</p>
<ul>
<li><p>an input layer,</p>
</li>
<li><p>one or more hidden layers, and</p>
</li>
<li><p>an output layer.</p>
</li>
</ul>
<p>Except for the input nodes, each node is a neuron that uses a <strong>nonlinear</strong> activation function.</p>
<p>MLPs are used for both classification and regression:</p>
<ul>
<li><p><strong>Classification tasks:</strong> for example, handwriting recognition and speech recognition.</p>
</li>
<li><p><strong>Regression analysis:</strong> problems where the relationship between inputs and output is complex.</p>
</li>
</ul>
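<p>To see why a hidden layer with a nonlinear activation matters, the sketch below hand-picks the weights of a tiny 2-2-1 network with ReLU hidden units so that it computes XOR exactly, which no single-layer perceptron can do. The weights are chosen by hand for illustration, not learned, and the output unit is left linear for simplicity.</p>
<pre><code class="lang-python">import numpy as np

def relu(z):
    return np.maximum(z, 0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])

# output layer: y = h1 - 2 * h2 (linear output, no activation)
W2 = np.array([[1.0], [-2.0]])
b2 = np.array([0.0])

hidden = relu(X @ W1 + b1)
output = hidden @ W2 + b2
print(output.ravel())  # [0. 1. 1. 0.]
</code></pre>
<p>The second hidden unit only switches on when both inputs are 1, which lets the network bend the decision surface in a way a single line cannot.</p>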
<h2 id="heading-how-to-build-multi-layered-perceptrons">How to Build Multi-Layered Perceptrons</h2>
<p>Let’s handle a binary classification task using a standard MLP architecture.</p>
<h3 id="heading-outline-of-the-project">Outline of the Project</h3>
<h4 id="heading-objective">Objective</h4>
<ul>
<li>Detect fraudulent transactions</li>
</ul>
<h4 id="heading-evaluation-metrics">Evaluation Metrics</h4>
<ul>
<li><p>Considering the cost of misclassification, we’ll prioritize improving <strong>Recall</strong> and <strong>Precision scores</strong></p>
</li>
<li><p>Then check overall classification performance with the <strong>Accuracy</strong> score: (TP + TN) / (TP + TN + FP + FN)</p>
</li>
</ul>
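<p>For reference, here is how the three scores follow from the confusion-matrix counts. The counts are made up for illustration; in practice, <code>sklearn.metrics</code> provides <code>recall_score</code>, <code>precision_score</code>, and <code>accuracy_score</code>.</p>
<pre><code class="lang-python">def classification_scores(tp, tn, fp, fn):
    """Compute recall, precision, and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of actual frauds that were caught
    precision = tp / (tp + fp)     # share of fraud alerts that were real fraud
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return recall, precision, accuracy

# illustrative counts for a balanced 400-sample test set
recall, precision, accuracy = classification_scores(tp=180, tn=170, fp=30, fn=20)
print(f"Recall: {recall:.3f}, Precision: {precision:.3f}, Accuracy: {accuracy:.3f}")
</code></pre>
<p>False negatives lower recall and false positives lower precision, which is why the two costliest outcomes below map directly onto these scores.</p>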
<p><strong>Cost of each prediction outcome (from high to low):</strong></p>
<ul>
<li><p><strong>False Negative (FN):</strong> The model incorrectly identifies a fraudulent transaction as legitimate (missing actual fraud).</p>
</li>
<li><p><strong>False Positive (FP):</strong> The model incorrectly identifies a legitimate transaction as fraudulent (blocking legitimate customers).</p>
</li>
<li><p><strong>True Positive (TP):</strong> The model correctly identifies a fraudulent transaction as fraud.</p>
</li>
<li><p><strong>True Negative (TN):</strong> The model correctly identifies a non-fraudulent transaction as non-fraud.</p>
</li>
</ul>
<h3 id="heading-planning-an-mlp-architecture">Planning an MLP Architecture</h3>
<p>In the network, 19 input features feed into the first hidden layer’s 30 neurons, which use a ReLU activation function.</p>
<p>Their outputs then pass to the next layer, and the network ends in a sigmoid output: a probability between 0 and 1 that the transaction is fraudulent.</p>
<p>During training, each iteration runs a forward pass and a backward pass (backpropagation) to compute gradients, and the optimizer (SGD or Adam) uses those gradients to update the parameters.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748440761512/37753a4c-f7f8-44bc-bea9-c50360830456.png" alt="Standard MLP Architecture for Binary Classification Tasks" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Image: Standard MLP Architecture for Binary Classification Tasks (Created by Kuriko Iwai using <a target="_blank" href="https://www.researchgate.net/publication/355148120_SS-MLP_A_Novel_Spectral-Spatial_MLP_Architecture_for_Hyperspectral_Image_Classification">image source</a>)</p>
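<p>The data flow through this architecture can be sketched in NumPy. This assumes one hidden layer of 30 ReLU units followed by a single sigmoid output unit, and it uses random weights and random inputs purely to show the shapes involved; it is not the trained model.</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_hidden = 5, 19, 30

# a batch of 5 transactions with 19 features each (random stand-ins)
X = rng.normal(size=(n_samples, n_features))

# parameters: input to hidden layer, then hidden layer to a single output unit
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

hidden = np.maximum(X @ W1 + b1, 0)                # ReLU activations, shape (5, 30)
prob = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))   # sigmoid output, shape (5, 1)
pred = np.greater(prob, 0.5).astype(int)           # 1 = fraud, 0 = non-fraud

print(hidden.shape, prob.shape)
print(prob.ravel())
</code></pre>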
<p>Especially in deeper networks, <strong>ReLU</strong> helps mitigate the <a target="_blank" href="https://en.wikipedia.org/wiki/Vanishing_gradient_problem#:~:text=In%20machine%20learning%2C%20the%20vanishing,derivative%20of%20the%20loss%20function">vanishing gradient problem</a>, in which gradients become extremely small as they are backpropagated from the output layer toward the earlier layers.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748440797954/ba19bf66-cdb9-4bfb-9b92-e1e3f72e9fc7.png" alt="Comparison of major activation functions: From left to right: Sigmoid, Tanh, ReLU" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
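<p>The intuition can be checked with the derivatives themselves. The sigmoid’s derivative never exceeds 0.25, so ten sigmoid layers scale a backpropagated gradient by at most 0.25¹⁰, while ReLU’s derivative is exactly 1 for positive inputs. This sketch ignores the weight matrices and assumes every ReLU unit is active, so it only isolates the activation’s contribution.</p>
<pre><code class="lang-python">import numpy as np

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)              # largest at z = 0, where it equals 0.25

def relu_derivative(z):
    return np.where(np.greater(z, 0), 1.0, 0.0)   # 1 for positive inputs, else 0

z = np.linspace(-6, 6, 1201)
print("max sigmoid derivative:", sigmoid_derivative(z).max())

# best case: the activation alone scales the gradient by these factors
depth = 10
print(f"sigmoid, {depth} layers:", 0.25 ** depth)     # about 9.5e-07
print(f"ReLU (active units), {depth} layers:", 1.0 ** depth)
</code></pre>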
<p><a target="_blank" href="https://medium.com/data-science-collective/a-comprehensive-guide-on-neural-network-in-deep-learning-442ba9f1f0e5">Learn More: A Comprehensive Guide on Neural Network in Deep Learning</a></p>
<h3 id="heading-preprocessing-the-datasets">Preprocessing the Datasets</h3>
<p>First, we consolidate <a target="_blank" href="https://www.kaggle.com/datasets/computingvictor/transactions-fraud-datasets">three datasets (transaction, customer, and credit card)</a> into a single DataFrame, sanitizing numerical and categorical data separately:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> StandardScaler, OneHotEncoder
<span class="hljs-keyword">from</span> sklearn.impute <span class="hljs-keyword">import</span> SimpleImputer
<span class="hljs-keyword">from</span> sklearn.compose <span class="hljs-keyword">import</span> ColumnTransformer
<span class="hljs-keyword">from</span> sklearn.pipeline <span class="hljs-keyword">import</span> Pipeline

<span class="hljs-comment"># download the raw data to local</span>
<span class="hljs-keyword">import</span> kagglehub
path = kagglehub.dataset_download(<span class="hljs-string">"computingvictor/transactions-fraud-datasets"</span>)
dir = <span class="hljs-string">f'<span class="hljs-subst">{path}</span>/gd_card_flaud_demo'</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">sanitize_df</span>(<span class="hljs-params">amount_str</span>):</span>
    <span class="hljs-string">"""Removes '$' and converts the string to a float."""</span>
    <span class="hljs-keyword">if</span> isinstance(amount_str, str):
        <span class="hljs-keyword">return</span> float(amount_str.replace(<span class="hljs-string">'$'</span>, <span class="hljs-string">''</span>))
    <span class="hljs-keyword">return</span> amount_str

<span class="hljs-comment"># load transaction data</span>
trx_df = pd.read_csv(<span class="hljs-string">f'<span class="hljs-subst">{dir}</span>/transactions_data.csv'</span>)

<span class="hljs-comment"># sanitize the dataset (drop unnecessary columns and error transactions, convert string to int/float dtype)</span>
trx_df = trx_df[trx_df[<span class="hljs-string">'errors'</span>].isna()]
trx_df = trx_df.drop(columns=[<span class="hljs-string">'merchant_city'</span>,<span class="hljs-string">'merchant_state'</span>, <span class="hljs-string">'date'</span>, <span class="hljs-string">'mcc'</span>, <span class="hljs-string">'errors'</span>], axis=<span class="hljs-string">'columns'</span>)
trx_df[<span class="hljs-string">'amount'</span>] = trx_df[<span class="hljs-string">'amount'</span>].apply(sanitize_df)

<span class="hljs-comment"># merge the dataframe with fraud transaction flag.</span>
<span class="hljs-keyword">with</span> open(<span class="hljs-string">f'<span class="hljs-subst">{dir}</span>/train_fraud_labels.json'</span>, <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> fp:
    fraud_labels_json = json.load(fp=fp)

fraud_labels_dict = fraud_labels_json.get(<span class="hljs-string">'target'</span>, {})
fraud_labels_series = pd.Series(fraud_labels_dict, name=<span class="hljs-string">'is_fraud'</span>)
fraud_labels_series.index = fraud_labels_series.index.astype(int) <span class="hljs-comment"># convert the datatype from string to integer</span>
merged_df = pd.merge(trx_df, fraud_labels_series, left_on=<span class="hljs-string">'id'</span>, right_index=<span class="hljs-literal">True</span>, how=<span class="hljs-string">'left'</span>)
merged_df.fillna({<span class="hljs-string">'is_fraud'</span>: <span class="hljs-string">'No'</span>}, inplace=<span class="hljs-literal">True</span>)
merged_df[<span class="hljs-string">'is_fraud'</span>] = merged_df[<span class="hljs-string">'is_fraud'</span>].map({<span class="hljs-string">'Yes'</span>: <span class="hljs-number">1</span>, <span class="hljs-string">'No'</span>: <span class="hljs-number">0</span>})

<span class="hljs-comment"># load card data</span>
card_df = pd.read_csv(<span class="hljs-string">f'<span class="hljs-subst">{dir}</span>/cards_data.csv'</span>)
card_df = card_df.drop(columns=[<span class="hljs-string">'client_id'</span>, <span class="hljs-string">'acct_open_date'</span>, <span class="hljs-string">'card_number'</span>, <span class="hljs-string">'expires'</span>, <span class="hljs-string">'cvv'</span>], axis=<span class="hljs-string">'columns'</span>)
card_df[<span class="hljs-string">'credit_limit'</span>] = card_df[<span class="hljs-string">'credit_limit'</span>].apply(sanitize_df)

<span class="hljs-comment"># merge transaction and card data</span>
merged_df = pd.merge(left=merged_df, right=card_df, left_on=<span class="hljs-string">'card_id'</span>, right_on=<span class="hljs-string">'id'</span>, how=<span class="hljs-string">'inner'</span>)
merged_df = merged_df.drop(columns=[<span class="hljs-string">'id_y'</span>, <span class="hljs-string">'card_id'</span>], axis=<span class="hljs-string">'columns'</span>)

<span class="hljs-comment"># converts categorical variables into a new binary column (0 or 1)</span>
categorical_cols = merged_df.select_dtypes(include=[<span class="hljs-string">'object'</span>]).columns
df = merged_df.copy()
df = pd.get_dummies(df, columns=categorical_cols, dummy_na=<span class="hljs-literal">False</span>, dtype=float) 
df = df.dropna().drop([<span class="hljs-string">'client_id'</span>, <span class="hljs-string">'id_x'</span>], axis=<span class="hljs-number">1</span>)
print(<span class="hljs-string">'\nDataFrame: \n'</span>, df.head(n=<span class="hljs-number">3</span>))
</code></pre>
<p>DataFrame:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748440856826/ba79bdaf-e0a1-457f-ab19-fda3e0f08141.png" alt="Base DataFrame" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Our DataFrame shows an extremely <strong>imbalanced class distribution</strong>:</p>
<ul>
<li><p>Fraud samples: 1,191</p>
</li>
<li><p>Non-fraud samples: 11,477,397</p>
</li>
</ul>
<p>For classification tasks, <strong>it's crucial to be aware of sample size imbalances and employ appropriate strategies to mitigate their negative impact</strong> on classification model performance, especially regarding the minority class.</p>
<p>For our data, we’ll:</p>
<ol>
<li><p>split the 1,191 fraud samples into training, validation, and test sets,</p>
</li>
<li><p>add an equal number of randomly chosen non-fraud samples from the DataFrame, and</p>
</li>
<li><p>adjust split balances later if generalization challenges arise.</p>
</li>
</ol>
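<pre><code class="lang-python"><span class="hljs-comment"># separate fraud and non-fraud rows from the preprocessed DataFrame</span>
df_fraud = df[df[<span class="hljs-string">'is_fraud'</span>] == <span class="hljs-number">1</span>]
df_non_fraud = df[df[<span class="hljs-string">'is_fraud'</span>] == <span class="hljs-number">0</span>]

<span class="hljs-comment"># define the number of samples per class for the validation and test sets</span>
val_size_per_class = <span class="hljs-number">200</span>
test_size_per_class = <span class="hljs-number">200</span>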

<span class="hljs-comment"># create test sets</span>
X_test_fraud = df_fraud.sample(n=test_size_per_class, random_state=<span class="hljs-number">42</span>)
X_test_non_fraud = df_non_fraud.sample(n=test_size_per_class, random_state=<span class="hljs-number">42</span>)

<span class="hljs-comment"># combine to form the balanced test set</span>
X_test = pd.concat([X_test_fraud, X_test_non_fraud]).sample(frac=<span class="hljs-number">1</span>, random_state=<span class="hljs-number">42</span>).reset_index(drop=<span class="hljs-literal">True</span>)
y_test = X_test[<span class="hljs-string">'is_fraud'</span>]
X_test = X_test.drop(<span class="hljs-string">'is_fraud'</span>, axis=<span class="hljs-number">1</span>)

<span class="hljs-comment"># remove sampled rows from the original dataframes to avoid data leakage</span>
df_fraud_remaining = df_fraud.drop(X_test_fraud.index)
df_non_fraud_remaining = df_non_fraud.drop(X_test_non_fraud.index)


<span class="hljs-comment"># create validation sets</span>
X_val_fraud = df_fraud_remaining.sample(n=val_size_per_class, random_state=<span class="hljs-number">42</span>)
X_val_non_fraud = df_non_fraud_remaining.sample(n=val_size_per_class, random_state=<span class="hljs-number">42</span>)

<span class="hljs-comment"># combine to form the balanced validation set</span>
X_val = pd.concat([X_val_fraud, X_val_non_fraud]).sample(frac=<span class="hljs-number">1</span>, random_state=<span class="hljs-number">42</span>).reset_index(drop=<span class="hljs-literal">True</span>)
y_val = X_val[<span class="hljs-string">'is_fraud'</span>]
X_val = X_val.drop(<span class="hljs-string">'is_fraud'</span>, axis=<span class="hljs-number">1</span>)

<span class="hljs-comment"># remove sampled rows from the remaining dataframes</span>
df_fraud_train = df_fraud_remaining.drop(X_val_fraud.index)
df_non_fraud_train = df_non_fraud_remaining.drop(X_val_non_fraud.index)


<span class="hljs-comment"># create training sets</span>
min_train_samples_per_class = min(len(df_fraud_train), len(df_non_fraud_train))

X_train_fraud = df_fraud_train.sample(n=min_train_samples_per_class, random_state=<span class="hljs-number">42</span>)
X_train_non_fraud = df_non_fraud_train.sample(n=min_train_samples_per_class, random_state=<span class="hljs-number">42</span>)

X_train = pd.concat([X_train_fraud, X_train_non_fraud]).sample(frac=<span class="hljs-number">1</span>, random_state=<span class="hljs-number">42</span>).reset_index(drop=<span class="hljs-literal">True</span>)
y_train = X_train[<span class="hljs-string">'is_fraud'</span>]
X_train = X_train.drop(<span class="hljs-string">'is_fraud'</span>, axis=<span class="hljs-number">1</span>)


print(<span class="hljs-string">"\n--- Final Dataset Shapes and Distributions ---"</span>)
print(<span class="hljs-string">f"X_train shape: <span class="hljs-subst">{X_train.shape}</span>, y_train distribution: <span class="hljs-subst">{np.unique(y_train, return_counts=<span class="hljs-literal">True</span>)}</span>"</span>)
print(<span class="hljs-string">f"X_val shape: <span class="hljs-subst">{X_val.shape}</span>, y_val distribution: <span class="hljs-subst">{np.unique(y_val, return_counts=<span class="hljs-literal">True</span>)}</span>"</span>)
print(<span class="hljs-string">f"X_test shape: <span class="hljs-subst">{X_test.shape}</span>, y_test distribution: <span class="hljs-subst">{np.unique(y_test, return_counts=<span class="hljs-literal">True</span>)}</span>"</span>)
</code></pre>
<p>After this step, we have 1,582 training, 400 validation, and 400 test samples, each set with a <strong>50:50 split between fraud and non-fraud transactions</strong>:</p>
<p><img src="https://cdn-images-1.medium.com/max/1440/1*IZtK3l0hSqmkOrm9h_d9Jw.png" alt="X, y datasets shape" width="600" height="400" loading="lazy"></p>
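<p>Because 1,582 training samples is a small set for a feature space with 19 input features, we’ll apply <strong>SMOTE</strong> (Synthetic Minority Over-sampling Technique) to generate additional synthetic training samples. SMOTE should be applied to the training data only; resampling the validation or test sets would leak synthetic data into evaluation.</p>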
<pre><code class="lang-python"><span class="hljs-keyword">from</span> imblearn.over_sampling <span class="hljs-keyword">import</span> SMOTE
<span class="hljs-keyword">from</span> collections <span class="hljs-keyword">import</span> Counter

train_target = <span class="hljs-number">2000</span>

smote_train = SMOTE(
    sampling_strategy={<span class="hljs-number">0</span>: train_target, <span class="hljs-number">1</span>: train_target},  <span class="hljs-comment"># oversample each class to 2,000 samples</span>
    random_state=<span class="hljs-number">12</span>
)
X_train, y_train = smote_train.fit_resample(X_train, y_train)

print(<span class="hljs-string">f"\nAfter SMOTE with custom sampling_strategy (target train: <span class="hljs-subst">{train_target}</span>):"</span>)
print(<span class="hljs-string">f"X_train_oversampled shape: <span class="hljs-subst">{X_train.shape}</span>"</span>)
print(<span class="hljs-string">f"y_train_oversampled distribution: <span class="hljs-subst">{Counter(y_train)}</span>"</span>)
</code></pre>
<p>We’ve secured 4,000 training samples, maintaining a 50:50 split between fraud and non-fraud transactions:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748440986995/ed079321-3972-4226-b1a8-244010445162.png" alt="Training sample shape after SMOTE" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
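<p>SMOTE creates each synthetic sample by picking a sample, choosing one of its k nearest neighbors from the same class, and interpolating between the two. The NumPy sketch below shows that single interpolation step on made-up 2-D points; it is a simplified illustration, not imblearn’s implementation.</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(12)

# made-up samples from one class (2 features for readability)
X_class = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [0.8, 2.2], [1.2, 3.0]])
k = 3

def smote_sample(X, k, rng):
    i = rng.integers(len(X))                       # pick a random base sample
    dists = np.linalg.norm(X - X[i], axis=1)
    neighbors = np.argsort(dists)[1 : k + 1]       # k nearest, excluding itself
    j = rng.choice(neighbors)                      # pick one neighbor at random
    lam = rng.random()                             # interpolation factor in [0, 1)
    return X[i] + lam * (X[j] - X[i]), X[i], X[j]

synthetic, base, neighbor = smote_sample(X_class, k, rng)
print("base:", base, "neighbor:", neighbor, "synthetic:", synthetic)
</code></pre>
<p>Because every synthetic point lies on a segment between two real samples of the same class, SMOTE fills in the class region rather than duplicating rows.</p>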
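<p>Lastly, we’ll apply a <strong>column transformer</strong> that processes numerical and categorical features separately.</p>
<p>Column transformers are useful for datasets with multiple data types because they apply different transformations to different subsets of columns. Fitting the transformer on the training set only, then reusing it to transform the validation and test sets, prevents data leakage. (Since <code>pd.get_dummies</code> already encoded the object columns earlier, the categorical branch may receive no columns here; it simply keeps the pipeline ready for raw categorical input.)</p>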
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.impute <span class="hljs-keyword">import</span> SimpleImputer
<span class="hljs-keyword">from</span> sklearn.compose <span class="hljs-keyword">import</span> ColumnTransformer
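<span class="hljs-keyword">from</span> sklearn.pipeline <span class="hljs-keyword">import</span> Pipeline
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> StandardScaler, OneHotEncoder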

categorical_features = X_train.select_dtypes(include=[<span class="hljs-string">'object'</span>]).columns.tolist()
categorical_transformer = Pipeline(steps=[(<span class="hljs-string">'imputer'</span>, SimpleImputer(strategy=<span class="hljs-string">'most_frequent'</span>)),(<span class="hljs-string">'onehot'</span>, OneHotEncoder(handle_unknown=<span class="hljs-string">'ignore'</span>))])

numerical_features = X_train.select_dtypes(include=[<span class="hljs-string">'int64'</span>, <span class="hljs-string">'float64'</span>]).columns.tolist()
numerical_transformer = Pipeline(steps=[(<span class="hljs-string">'imputer'</span>, SimpleImputer(strategy=<span class="hljs-string">'mean'</span>)), (<span class="hljs-string">'scaler'</span>, StandardScaler())])

preprocessor = ColumnTransformer(
    transformers=[
        (<span class="hljs-string">'num'</span>, numerical_transformer, numerical_features),
        (<span class="hljs-string">'cat'</span>, categorical_transformer, categorical_features)
    ]
)

X_train_processed = preprocessor.fit_transform(X_train)
X_val_processed = preprocessor.transform(X_val)
X_test_processed = preprocessor.transform(X_test)
</code></pre>
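<p>The leakage point comes down to where the statistics are computed. This NumPy sketch mimics what <code>StandardScaler</code> does inside the pipeline: the mean and standard deviation come from the training split only, and the same values are reused for other splits. The arrays are made up for illustration.</p>
<pre><code class="lang-python">import numpy as np

X_train_demo = np.array([[10.0, 200.0], [12.0, 180.0], [14.0, 220.0]])
X_test_demo = np.array([[11.0, 400.0]])

# "fit": learn statistics from the training split only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)         # population std (ddof=0), as StandardScaler uses

# "transform": apply the training statistics to every split
X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std

print(X_train_scaled.mean(axis=0))     # about 0 for each column
print(X_test_scaled)                   # the unusual test value stays visibly large
</code></pre>
<p>Fitting on all splits at once would let the test set’s outlier shift the mean and standard deviation, quietly leaking test information into training.</p>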
<h2 id="heading-understanding-optimizers">Understanding Optimizers</h2>
<p>In deep learning, an optimizer is a crucial element that fine-tunes a neural network’s parameters during training. Its primary role is to minimize the model’s loss function, enhancing performance.</p>
<p>Different optimizers use different update rules to converge efficiently toward parameters that minimize the loss.</p>
<p>In this article, we’ll use the SGD Optimizer and Adam Optimizer.</p>
<h3 id="heading-1-how-a-sgd-stochastic-gradient-descent-optimizer-works">1. How an SGD (Stochastic Gradient Descent) Optimizer Works</h3>
<p>SGD is a widely used optimization algorithm that estimates the gradient (the partial derivatives of the cost function with respect to the parameters) from a single example or a small mini-batch, and updates the parameters after each mini-batch rather than once per epoch:</p>
<p>$$\begin{align*} w_j &amp;:= w_j - \eta \frac {\partial J} {\partial w_j} \\ \\ b &amp;:= b - \eta \frac {\partial J} {\partial b} \end{align*}$$</p><p>(w: weight, b: bias, J: cost function, <em>η</em>: learning rate)</p>
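<p>As a concrete instance of these update rules, here is one SGD step on a single sigmoid unit with the binary cross-entropy cost defined below. For that pairing, the mini-batch gradients simplify to the average of (ŷ − y)x<sub>j</sub> for w<sub>j</sub> and the average of (ŷ − y) for b. The data and learning rate are illustrative.</p>
<pre><code class="lang-python">import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

rng = np.random.default_rng(0)
X_batch = rng.normal(size=(8, 3))          # a mini-batch of 8 samples, 3 features
y_batch = np.array([0, 1, 1, 0, 1, 0, 1, 1], dtype=float)

w = np.zeros(3)
b = 0.0
eta = 0.1

y_hat = sigmoid(X_batch @ w + b)
loss_before = bce(y_batch, y_hat)

# gradients of the mean BCE with respect to w and b
grad_w = X_batch.T @ (y_hat - y_batch) / len(y_batch)
grad_b = np.mean(y_hat - y_batch)

# SGD update: step against the gradient
w = w - eta * grad_w
b = b - eta * grad_b

loss_after = bce(y_batch, sigmoid(X_batch @ w + b))
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
</code></pre>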
<p>In binary classification, the cost function J is typically the binary cross-entropy, computed on the sigmoid output σ(z), where z is the weighted sum of the inputs plus the bias:</p>
<p>$$\begin{align*} J(y, \hat y) &amp;= -\left[ y \log(\hat y) + (1-y) \log(1-\hat y) \right] \\ \\ \hat y &amp;= \sigma (z) = \frac {1} {1+e^{-z}} \\ \\ z &amp;= \sum_{j=1}^m w_j x_j + b \end {align*}$$</p><h3 id="heading-2-how-adam-adaptive-moment-estimation-optimizer-works">2. How Adam (Adaptive Moment Estimation) Optimizer Works</h3>
<p>Adam is an optimization algorithm that computes <strong>individual adaptive learning rates</strong> for different parameters from estimates of first and second moments of the gradients.</p>
<p>Adam optimizer combines the advantages of <a target="_blank" href="https://keras.io/api/optimizers/rmsprop/"><strong>RMSprop</strong></a> (using squared gradients to scale the learning rate) and <a target="_blank" href="https://optimization.cbe.cornell.edu/index.php?title=Momentum"><strong>Momentum</strong></a> (using past gradients to accelerate convergence):</p>
<p>$$w_{j,t+1} = w_{j,t} - \alpha \cdot \frac{\hat{m}_{t,w_j}}{\sqrt{\hat{v}_{t,w_j}} + \epsilon}$$</p><p>where:</p>
<ul>
<li><p><code>α</code>: The learning rate (default is 0.001)</p>
</li>
<li><p><code>ϵ</code>: A small positive constant used to avoid division by zero</p>
</li>
<li><p><code>m̂</code>: First moment (mean) estimate with a bias correction, leveraging <strong>Momentum</strong>:</p>
</li>
</ul>
<p>$$\begin{align*} \hat m_t &amp;= \frac {m_t} {1 - \beta_1^t} \\ \\ m_t &amp;= \beta_1 m_{t-1} + (1-\beta_1) \underbrace{ \frac {\partial L} {\partial w_t}}_{\text{gradient}} \end{align*}$$</p><p>(<code>β1</code>: decay rate for the first-moment estimate, typically set to β1 = 0.9)</p>
<p><code>v̂</code>: Second moment (uncentered variance) estimate with a bias correction, leveraging <strong>RMSprop</strong>:</p>
<p>$$\begin{align*} \hat v_t &amp;= \frac {v_t} {1 - \beta_2^t} \\ \\ v_t &amp;=\beta_2 v_{t-1} + (1- \beta_2) \left(\frac {\partial L} {\partial w_t}\right)^2 \end {align*}$$</p><p>(<code>β2</code>: decay rate for the second-moment estimate, typically set to β2 = 0.999)</p>
<p>Since both <code>m</code> and <code>v</code> are initialized at zero, their raw estimates are biased toward zero, especially during the first few steps. Dividing by 1 − β1^t and 1 − β2^t removes this bias.</p>
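<p>The update equations translate into a few lines of NumPy. This is a minimal sketch of a single Adam step for one parameter vector, using α = 0.001, β1 = 0.9, and β2 = 0.999 from above; ε = 1e-8 and the toy quadratic loss are assumptions for illustration. On the first step, the bias-corrected ratio m̂/√v̂ is about ±1, so each parameter moves by roughly α regardless of how large its gradient is.</p>
<pre><code class="lang-python">import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad            # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment (RMSprop-style)
    m_hat = m / (1 - beta1 ** t)                  # bias corrections
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# toy loss L(w) = sum(w ** 2), whose gradient is 2w
w = np.array([1.0, -50.0])
m = np.zeros_like(w)
v = np.zeros_like(w)

w_new, m, v = adam_step(w, 2 * w, m, v, t=1)
print(w_new - w)   # roughly [-0.001, 0.001]: each weight moves by about alpha
</code></pre>
<p>Although the second weight’s gradient is 50 times larger, both weights take a step of the same size, which illustrates Adam’s per-parameter adaptive scaling.</p>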
<p>Learn More: <a target="_blank" href="https://medium.com/@kuriko-iwai/a-comprehensive-guide-on-neural-network-in-deep-learning-9c795a1f1648">A Comprehensive Guide on Neural Network in Deep Learning</a></p>
<h2 id="heading-how-to-build-an-mlp-classifier-with-sgd-optimizer">How to Build an MLP Classifier with SGD Optimizer</h2>
<h3 id="heading-custom-classifier">Custom Classifier</h3>
<p>Each training step runs a <strong>forward pass</strong> and <strong>backpropagation</strong> on a mini-batch, and SGD then updates the weights and biases using the resulting gradients:</p>
<pre><code class="lang-python">for i in range(0, n_samples, self.batch_size):
    # SGD takes the next mini-batch from the data shuffled for this epoch
    X_batch = X_shuffled[i : i + self.batch_size]
    y_batch = np.asarray(y_shuffled[i : i + self.batch_size]).reshape(-1, 1)  # column vector, same shape as y_pred

    # A. forward pass
    activations, zs = self._forward_pass(X_batch)
    y_pred = activations[-1]  # final output of the network

    # B. backpropagation: compute every gradient before changing any weights
    grads_W = [None] * len(self.weights)
    grads_b = [None] * len(self.biases)

    # 1) gradient at the output layer (sigmoid + binary cross-entropy gives y_pred - y)
    delta = y_pred - y_batch
    grads_W[-1] = np.dot(activations[-2].T, delta) / X_batch.shape[0]
    grads_b[-1] = np.sum(delta, axis=0) / X_batch.shape[0]

    # 2) iterate backward from the last hidden layer down to the first layer of weights
    for l in range(len(self.weights) - 2, -1, -1):
        delta = np.dot(delta, self.weights[l + 1].T) * self._relu_derivative(zs[l])  # d_activation(z)
        grads_W[l] = np.dot(activations[l].T, delta) / X_batch.shape[0]
        grads_b[l] = np.sum(delta, axis=0) / X_batch.shape[0]

    # 3) SGD update for all layers, using the pre-update weights' gradients
    for l in range(len(self.weights)):
        self.weights[l] -= self.learning_rate * grads_W[l]
        self.biases[l] -= self.learning_rate * grads_b[l]
</code></pre>
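<p>The output-layer line <code>delta = y_pred - y_batch</code> relies on a standard identity: with a sigmoid output and binary cross-entropy loss, the derivative of the loss with respect to the pre-activation z is simply ŷ − y, so no separate sigmoid-derivative term is needed. A quick finite-difference check confirms it; the values of z and y are arbitrary.</p>
<pre><code class="lang-python">import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_logit(z, y):
    # binary cross-entropy of the sigmoid output, as a function of z
    y_hat = sigmoid(z)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

z, y, h = 0.7, 1.0, 1e-6
numeric = (bce_from_logit(z + h, y) - bce_from_logit(z - h, y)) / (2 * h)
analytic = sigmoid(z) - y          # the delta used at the output layer

print(f"numeric: {numeric:.6f}, analytic: {analytic:.6f}")
</code></pre>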
<p>During the forward pass, each hidden layer computes a weighted sum of its inputs plus a bias (z) and applies the ReLU activation; the output layer then applies a sigmoid function to produce the predicted probability (y_pred).</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_forward_pass</span>(<span class="hljs-params">self, X</span>):</span>
    activations = [X]
    zs = []

    <span class="hljs-comment"># forward through hidden layers</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(self.weights) - <span class="hljs-number">1</span>):
        z = np.dot(activations[<span class="hljs-number">-1</span>], self.weights[i]) + self.biases[i]
        zs.append(z)
        a = self._relu(z) <span class="hljs-comment"># using ReLU for hidden layers</span>
        activations.append(a)

    <span class="hljs-comment"># forward through output layer</span>
    z_output = np.dot(activations[<span class="hljs-number">-1</span>], self.weights[<span class="hljs-number">-1</span>]) + self.biases[<span class="hljs-number">-1</span>]
    zs.append(z_output)

    <span class="hljs-comment"># computes the final output using sigmoid function</span>
    y_pred = <span class="hljs-number">1</span> / (<span class="hljs-number">1</span> + np.exp(-np.clip(x, <span class="hljs-number">-500</span>, <span class="hljs-number">500</span>)))
    activations.append(y_pred)
    <span class="hljs-keyword">return</span> activations, zs
</code></pre>
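<p>To see the shapes involved, here is a small standalone forward pass through one ReLU hidden layer and a sigmoid output. It assumes a batch of 4 samples with 3 features and a hidden layer of 5 units (sizes chosen only for illustration), and the function name <code>forward_pass</code> is hypothetical. It mirrors <code>_forward_pass</code> but takes the weights and biases as arguments.</p>
<pre><code class="lang-python">import numpy as np

def forward_pass(X, weights, biases):
    # returns (activations, zs) in the same layout as MLP_SGD._forward_pass
    activations, zs = [X], []
    for W, b in zip(weights[:-1], biases[:-1]):
        z = activations[-1] @ W + b           # weighted sum of inputs plus bias
        zs.append(z)
        activations.append(np.maximum(0, z))  # ReLU for hidden layers
    z_output = activations[-1] @ weights[-1] + biases[-1]
    zs.append(z_output)
    y_pred = 1 / (1 + np.exp(-np.clip(z_output, -500, 500)))  # sigmoid on the output pre-activation
    activations.append(y_pred)
    return activations, zs

rng = np.random.default_rng(42)
X = rng.normal(size=(4, 3))
weights = [rng.normal(size=(3, 5)), rng.normal(size=(5, 1))]
biases = [np.zeros((1, 5)), np.zeros((1, 1))]

activations, zs = forward_pass(X, weights, biases)
print([a.shape for a in activations])  # [(4, 3), (4, 5), (4, 1)]
print([z.shape for z in zs])           # [(4, 5), (4, 1)]
</code></pre>
<p>Note that <code>activations</code> holds one more entry than <code>zs</code> because it starts with the raw input <code>X</code>, which is why the backward pass pairs <code>activations[l]</code> with <code>zs[l]</code>.</p>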
<p>So the final classifier looks like this:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MLP_SGD</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, hidden_layer_sizes=(<span class="hljs-params"><span class="hljs-number">10</span>,</span>), learning_rate=<span class="hljs-number">0.01</span>, n_epochs=<span class="hljs-number">1000</span>, batch_size=<span class="hljs-number">32</span></span>):</span>
        self.hidden_layer_sizes = hidden_layer_sizes
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs
        self.batch_size = batch_size
        self.weights = []
        self.biases = []
        self.weights_history = []
        self.biases_history = []
        self.loss_history = []

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_sigmoid</span>(<span class="hljs-params">self, x</span>):</span>
        <span class="hljs-keyword">return</span> <span class="hljs-number">1</span> / (<span class="hljs-number">1</span> + np.exp(-np.clip(x, <span class="hljs-number">-500</span>, <span class="hljs-number">500</span>)))

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_sigmoid_derivative</span>(<span class="hljs-params">self, x</span>):</span>
        s = self._sigmoid(x)
        <span class="hljs-keyword">return</span> s * (<span class="hljs-number">1</span> - s)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_relu</span>(<span class="hljs-params">self, x</span>):</span>
        <span class="hljs-keyword">return</span> np.maximum(<span class="hljs-number">0</span>, x)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_relu_derivative</span>(<span class="hljs-params">self, x</span>):</span>
        <span class="hljs-keyword">return</span> (x &gt; <span class="hljs-number">0</span>).astype(float)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_initialize_parameters</span>(<span class="hljs-params">self, n_features</span>):</span>
        layer_sizes = [n_features] + list(self.hidden_layer_sizes) + [<span class="hljs-number">1</span>]
        self.weights = []
        self.biases = []

        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(layer_sizes) - <span class="hljs-number">1</span>):
            fan_in = layer_sizes[i]
            fan_out = layer_sizes[i+<span class="hljs-number">1</span>]
            limit = np.sqrt(<span class="hljs-number">6</span> / (fan_in + fan_out))
            self.weights.append(np.random.uniform(-limit, limit, (fan_in, fan_out)))
            self.biases.append(np.zeros((<span class="hljs-number">1</span>, fan_out)))

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_forward_pass</span>(<span class="hljs-params">self, X</span>):</span>
        activations = [X]
        zs = []

        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(self.weights) - <span class="hljs-number">1</span>):
            z = np.dot(activations[<span class="hljs-number">-1</span>], self.weights[i]) + self.biases[i]
            zs.append(z)
            a = self._relu(z)
            activations.append(a)

        z_output = np.dot(activations[<span class="hljs-number">-1</span>], self.weights[<span class="hljs-number">-1</span>]) + self.biases[<span class="hljs-number">-1</span>]
        zs.append(z_output)
        y_pred = self._sigmoid(z_output)
        activations.append(y_pred)

        <span class="hljs-keyword">return</span> activations, zs

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_compute_loss</span>(<span class="hljs-params">self, y_true, y_pred</span>):</span>
        y_pred = np.clip(y_pred, <span class="hljs-number">1e-10</span>, <span class="hljs-number">1</span> - <span class="hljs-number">1e-10</span>)
        loss = -np.mean(y_true * np.log(y_pred) + (<span class="hljs-number">1</span> - y_true) * np.log(<span class="hljs-number">1</span> - y_pred))
        <span class="hljs-keyword">return</span> loss

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fit</span>(<span class="hljs-params">self, X, y</span>):</span>
        n_samples, n_features = X.shape
        y = np.asarray(y).reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        X = np.asarray(X)
        self._initialize_parameters(n_features)
        self.weights_history.append([w.copy() <span class="hljs-keyword">for</span> w <span class="hljs-keyword">in</span> self.weights])
        self.biases_history.append([b.copy() <span class="hljs-keyword">for</span> b <span class="hljs-keyword">in</span> self.biases])
        activations, _ = self._forward_pass(X)
        initial_loss = self._compute_loss(y, activations[<span class="hljs-number">-1</span>])
        self.loss_history.append(initial_loss)

        <span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(self.n_epochs):
            <span class="hljs-comment"># shuffle datasets</span>
            permutation = np.random.permutation(n_samples)
            X_shuffled = X[permutation]
            y_shuffled = y[permutation]

            <span class="hljs-comment"># mini-batch loop</span>
            <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>, n_samples, self.batch_size):
                X_batch = X_shuffled[i : i + self.batch_size]
                y_batch = y_shuffled[i : i + self.batch_size]

                activations, zs = self._forward_pass(X_batch)
                y_pred = activations[<span class="hljs-number">-1</span>]

                delta = y_pred - y_batch
                dW = np.dot(activations[<span class="hljs-number">-2</span>].T, delta) / X_batch.shape[<span class="hljs-number">0</span>]
                db = np.sum(delta, axis=<span class="hljs-number">0</span>) / X_batch.shape[<span class="hljs-number">0</span>]
                self.weights[<span class="hljs-number">-1</span>] -= self.learning_rate * dW
                self.biases[<span class="hljs-number">-1</span>] -= self.learning_rate * db

                <span class="hljs-keyword">for</span> l <span class="hljs-keyword">in</span> range(len(self.weights) - <span class="hljs-number">2</span>, <span class="hljs-number">-1</span>, <span class="hljs-number">-1</span>):
                    delta = np.dot(delta, self.weights[l+<span class="hljs-number">1</span>].T) * self._relu_derivative(zs[l]) <span class="hljs-comment"># d_activation(z)</span>
                    dW = np.dot(activations[l].T, delta) / X_batch.shape[<span class="hljs-number">0</span>]
                    db = np.sum(delta, axis=<span class="hljs-number">0</span>) / X_batch.shape[<span class="hljs-number">0</span>]

                    self.weights[l] -= self.learning_rate * dW
                    self.biases[l] -= self.learning_rate * db

            self.weights_history.append([w.copy() <span class="hljs-keyword">for</span> w <span class="hljs-keyword">in</span> self.weights])
            self.biases_history.append([b.copy() <span class="hljs-keyword">for</span> b <span class="hljs-keyword">in</span> self.biases])

            activations, _ = self._forward_pass(X)
            epoch_loss = self._compute_loss(y, activations[<span class="hljs-number">-1</span>])
            self.loss_history.append(epoch_loss)

            <span class="hljs-keyword">if</span> (epoch + <span class="hljs-number">1</span>) % <span class="hljs-number">100</span> == <span class="hljs-number">0</span>:
                print(<span class="hljs-string">f"Epoch <span class="hljs-subst">{epoch+<span class="hljs-number">1</span>}</span>/<span class="hljs-subst">{self.n_epochs}</span>, Loss: <span class="hljs-subst">{epoch_loss:<span class="hljs-number">.4</span>f}</span>"</span>)
        <span class="hljs-keyword">return</span> self

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict_proba</span>(<span class="hljs-params">self, X</span>):</span>
        activations, _ = self._forward_pass(X)
        <span class="hljs-keyword">return</span> activations[<span class="hljs-number">-1</span>]

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict</span>(<span class="hljs-params">self, X, threshold=<span class="hljs-number">0.5</span></span>):</span>
        probabilities = self.predict_proba(X)
        <span class="hljs-keyword">return</span> (probabilities &gt;= threshold).astype(int).flatten() <span class="hljs-comment"># for 1D output</span>
</code></pre>
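<p>One detail of the mini-batch loop in <code>fit</code>: slicing with <code>range(0, n_samples, self.batch_size)</code> produces a smaller final batch whenever <code>n_samples</code> is not a multiple of <code>batch_size</code>, which is why the gradients are divided by <code>X_batch.shape[0]</code> rather than by <code>self.batch_size</code>. The short sketch below uses a hypothetical <code>iter_minibatches</code> helper to show the batch sizes produced for 100 samples and a batch size of 32.</p>
<pre><code class="lang-python">import numpy as np

def iter_minibatches(X, y, batch_size, rng):
    # shuffle once per epoch, then yield consecutive slices (the last one may be shorter)
    permutation = rng.permutation(X.shape[0])
    X_shuffled, y_shuffled = X[permutation], y[permutation]
    for i in range(0, X.shape[0], batch_size):
        yield X_shuffled[i : i + batch_size], y_shuffled[i : i + batch_size]

X = np.arange(200).reshape(100, 2)
y = np.arange(100).reshape(-1, 1)
sizes = [xb.shape[0] for xb, _ in iter_minibatches(X, y, 32, np.random.default_rng(0))]
print(sizes)  # [32, 32, 32, 4]
</code></pre>
<p>Shuffling <code>X</code> and <code>y</code> with the same permutation keeps every feature row aligned with its label.</p>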
<h3 id="heading-training-prediction">Training / Prediction</h3>
<p>Train the model, then make predictions on both the training and validation datasets:</p>
<pre><code class="lang-python"><span class="hljs-comment"># 1. define the model</span>
mlp_sgd = MLP_SGD(
  hidden_layer_sizes=(<span class="hljs-number">30</span>, <span class="hljs-number">30</span>, ), <span class="hljs-comment"># 2 hidden layers with 30 neurons each</span>
  learning_rate=<span class="hljs-number">0.001</span>,           <span class="hljs-comment"># a step size</span>
  n_epochs=<span class="hljs-number">1000</span>,                 <span class="hljs-comment"># number of epochs</span>
  batch_size=<span class="hljs-number">32</span>                  <span class="hljs-comment"># mini-batch size</span>
)

<span class="hljs-comment"># 2. train the model</span>
mlp_sgd.fit(X_train_processed, y_train)

<span class="hljs-comment"># 3. make a prediction with training and validation datasets</span>
y_pred_train = mlp_sgd.predict(X_train_processed)
y_pred_val = mlp_sgd.predict(X_val_processed)

<span class="hljs-comment"># 4. compute evaluation matrics</span>
conf_matrix = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, pos_label=<span class="hljs-number">1</span>)
recall = recall_score(y_true, y_pred, pos_label=<span class="hljs-number">1</span>)
f1 = f1_score(y_true, y_pred, pos_label=<span class="hljs-number">1</span>)


print(<span class="hljs-string">f"\nMLP (Custom SGD) Accuracy (Train): <span class="hljs-subst">{acc_train:<span class="hljs-number">.3</span>f}</span>"</span>)
print(<span class="hljs-string">f"MLP (Custom SGD) Accuracy (Validation): <span class="hljs-subst">{acc_val:<span class="hljs-number">.3</span>f}</span>"</span>)
</code></pre>
<h3 id="heading-results-2">Results</h3>
<ul>
<li><p>Recall: <em>0.7930 — 0.6650 (from training to validation)</em></p>
</li>
<li><p>Precision: <em>0.7790 — 0.6786 (from training to validation)</em></p>
</li>
</ul>
<p>The model learned the patterns effectively, reaching a <strong>Recall of 79.3%</strong> on the training set (it correctly flagged about 79% of the fraudulent transactions), although Recall dropped by about 13 points to 66.5% on the validation set.</p>
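<p>Recall and precision answer different questions: recall is TP / (TP + FN), the share of actual fraud cases the model catches, while precision is TP / (TP + FP), the share of flagged transactions that really are fraud. The labels below are made up purely to illustrate the arithmetic and are not taken from the article’s dataset; <code>recall_precision</code> is a hypothetical helper.</p>
<pre><code class="lang-python">def recall_precision(y_true, y_pred, pos_label=1):
    # count true positives, false negatives, and false positives for the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == pos_label and p == pos_label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == pos_label and p != pos_label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != pos_label and p == pos_label)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# hypothetical labels: 10 fraud cases (1) followed by 10 legitimate ones (0)
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6  # 8 TP, 2 FN, 4 FP, 6 TN
print(recall_precision(y_true, y_pred))  # (0.8, 0.6666666666666666)
</code></pre>
<p>Here the model catches 80% of the fraud cases, but only two thirds of its fraud alerts are correct, which is why the article reports both metrics.</p>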
<p><strong>Loss history:</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748441103897/088deb38-846d-4026-a706-701be93036ca.png" alt="Loss by epoch, weight history, bias history (Source: Kuriko Iwai)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>We visualized the <strong>decision boundary</strong> using the first two principal components (PCA) as the x and y axes. Note that the boundary is non-linear.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748442430297/032ee809-1b7e-4bb1-81c0-8715361658a5.png" alt="Image: Decision Boundary of MLP Classifier with SGD optimizer (Source: Kuriko Iwai)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-leverage-sckitlearns-mcp-classifier">Leverage SckitLearn’s MCP Classifier</h3>
<p>We can use scikit-learn’s <code>MLPClassifier</code> to define a similar model, adding:</p>
<ul>
<li><p><strong>Early stopping</strong> on an internal validation split to prevent overfitting,</p>
</li>
<li><p><strong>L2 regularization</strong> with a small penalty strength (<code>alpha</code>), and</p>
</li>
<li><p><strong>Nesterov momentum</strong> for the SGD updates.</p>
</li>
</ul>
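<p>The <code>alpha</code>, <code>momentum</code>, and <code>nesterovs_momentum</code> settings in the following configuration change how each SGD step is computed. As a rough sketch (not scikit-learn’s internal code, whose exact scaling of the penalty may differ), an L2 penalty of <code>0.5 * alpha * sum(W**2)</code> adds <code>alpha * W</code> to the weight gradient, and Nesterov momentum keeps a velocity that accumulates past steps. The function name <code>sgd_step</code> is hypothetical.</p>
<pre><code class="lang-python">import numpy as np

def sgd_step(W, grad_W, velocity, lr=0.001, alpha=1e-5, momentum=0.9, nesterov=True):
    # L2 regularization: the penalty 0.5 * alpha * sum(W**2) contributes alpha * W to the gradient
    grad = grad_W + alpha * W
    # momentum: the velocity is a decaying sum of past steps
    velocity = momentum * velocity - lr * grad
    if nesterov:
        # Nesterov variant: apply the momentum look-ahead on top of the current gradient
        step = momentum * velocity - lr * grad
    else:
        step = velocity
    return W + step, velocity

W = np.array([[0.5, -0.3]])
grad_W = np.array([[0.2, 0.1]])
velocity = np.zeros_like(W)
W_new, velocity = sgd_step(W, grad_W, velocity)
</code></pre>
<p>With <code>momentum=0</code> and <code>alpha=0</code>, the step reduces to the plain update <code>W -= learning_rate * dW</code> used in <code>MLP_SGD</code>.</p>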
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.neural_network <span class="hljs-keyword">import</span> MLPClassifier

<span class="hljs-comment"># define a model</span>
model_sklearn_mlp_sgd = MLPClassifier(
    hidden_layer_sizes=(<span class="hljs-number">30</span>, <span class="hljs-number">30</span>),
    activation=<span class="hljs-string">'relu'</span>,
    solver=<span class="hljs-string">'sgd'</span>,
    learning_rate_init=<span class="hljs-number">0.001</span>,
    learning_rate=<span class="hljs-string">'constant'</span>,
    momentum=<span class="hljs-number">0.9</span>,
    nesterovs_momentum=<span class="hljs-literal">True</span>,
    alpha=<span class="hljs-number">0.00001</span>,           <span class="hljs-comment"># l2 regulation strength</span>
    max_iter=<span class="hljs-number">3000</span>,           <span class="hljs-comment"># max epochs (keep it high)</span>
    batch_size=<span class="hljs-number">16</span>,           <span class="hljs-comment"># mini-batch size</span>
    random_state=<span class="hljs-number">42</span>,
    early_stopping=<span class="hljs-literal">True</span>,     <span class="hljs-comment"># apply early stopping</span>
    n_iter_no_change=<span class="hljs-number">50</span>,     <span class="hljs-comment"># stop the iteration if internal validation score doesn't improve for 50 epochs</span>
    validation_fraction=<span class="hljs-number">0.1</span>, <span class="hljs-comment"># proportion of training data for internal validation (default is 0.1)</span>
    tol=<span class="hljs-number">1e-4</span>,                <span class="hljs-comment"># tolerance for optimization</span>
    verbose=<span class="hljs-literal">False</span>,
)

<span class="hljs-comment"># training</span>
model_sklearn_mlp_sgd.fit(X_train_processed, y_train)

<span class="hljs-comment"># make a prediction</span>
y_pred_train_sklearn = model_sklearn_mlp_sgd.predict(X_train_processed)
y_pred_val_sklearn = model_sklearn_mlp_sgd.predict(X_val_processed)
</code></pre>
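<p>The <code>early_stopping</code>, <code>n_iter_no_change</code>, and <code>tol</code> settings work together: training stops once the internal validation score has failed to improve by at least <code>tol</code> for <code>n_iter_no_change</code> consecutive epochs. The helper below is a simplified, hypothetical re-creation of that rule on a list of per-epoch validation scores, not scikit-learn’s actual implementation. The Keras <code>EarlyStopping</code> callback in the next section applies a similar idea through <code>patience</code> and <code>min_delta</code>.</p>
<pre><code class="lang-python">def early_stopping_epoch(val_scores, n_iter_no_change=50, tol=1e-4):
    # simplified rule: stop after n_iter_no_change epochs without an improvement of at least tol
    best_score = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score + tol:
            best_score = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement == n_iter_no_change:
            return epoch  # training would stop after this epoch
    return None  # the rule never triggered within the given scores

# hypothetical validation scores: improve for 5 epochs, then plateau
scores = [0.60, 0.65, 0.68, 0.70, 0.71] + [0.71] * 60
print(early_stopping_epoch(scores, n_iter_no_change=50))  # 54
</code></pre>
<p>A larger <code>n_iter_no_change</code> tolerates longer plateaus, which is why <code>max_iter</code> is set well above the number of epochs the model is expected to need.</p>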
<h3 id="heading-results-3">Results</h3>
<ul>
<li><p>Recall: <em>0.7830 - 0.6200 (from training to validation)</em></p>
</li>
<li><p>Precision: <em>0.8208  - 0.6703 (from training to validation)</em></p>
</li>
</ul>
<p>The model performed well during training, achieving a Recall <strong>of 78.30%</strong>, but Recall dropped to 62.00% on the validation set.</p>
<p>This suggests that while the model learned effectively from the training data, it may be overfitting and not generalizing as well to unseen data.</p>
<h3 id="heading-leverage-keras-sequential-classifier">Leverage Keras Sequential Classifier</h3>
<p>With a Keras Sequential model, we can enhance the classifier further by:</p>
<ul>
<li><p>Initializing the output layer’s bias with the log-odds of the positive class in the training data (y_train) to address dataset imbalance and promote faster convergence,</p>
</li>
<li><p>Integrating 10% dropout between hidden layers to prevent overfitting by randomly deactivating neurons during training,</p>
</li>
<li><p>Including Precision, Recall, and AUC in the model’s compilation metrics to track classification performance on the minority class during training,</p>
</li>
<li><p>Applying class weights to penalize misclassifications of the minority class more heavily, improving the model’s ability to learn rare patterns, and</p>
</li>
<li><p>Using a separate validation dataset to monitor performance during training, which helps detect overfitting and guides hyperparameter tuning.</p>
</li>
</ul>
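<p>The bias initialization and class weights come down to simple arithmetic. With a sigmoid output, setting the initial bias to b0 = log(pos / neg) makes the untrained model predict the positive-class rate pos / (pos + neg) for every input. The <code>'balanced'</code> weights from scikit-learn’s <code>compute_class_weight</code> are n_samples / (n_classes * count_per_class), so the rarer class gets a proportionally larger weight. The toy label vector below (50 positives out of 1,000) is made up for illustration.</p>
<pre><code class="lang-python">import numpy as np

y_toy = np.array([1] * 50 + [0] * 950)  # hypothetical imbalanced labels

# initial output bias: log-odds of the positive class
n_pos, n_neg = np.sum(y_toy == 1), np.sum(y_toy == 0)
initial_bias = np.log(n_pos / n_neg)
initial_pred = 1 / (1 + np.exp(-initial_bias))  # sigmoid(b0) equals the positive rate
print(round(float(initial_bias), 4), round(float(initial_pred), 4))  # -2.9444 0.05

# 'balanced' class weights: n_samples / (n_classes * bincount(y))
classes = np.unique(y_toy)
weights = len(y_toy) / (len(classes) * np.bincount(y_toy))
class_weights_dict = dict(zip(classes.tolist(), weights.tolist()))
print({k: round(v, 4) for k, v in class_weights_dict.items()})  # {0: 0.5263, 1: 10.0}
</code></pre>
<p>Each misclassified positive therefore costs about 19 times as much as a misclassified negative in the weighted loss, matching the 950:50 class ratio.</p>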
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow <span class="hljs-keyword">import</span> keras
<span class="hljs-keyword">from</span> keras.models <span class="hljs-keyword">import</span> Sequential
<span class="hljs-keyword">from</span> keras.layers <span class="hljs-keyword">import</span> Dense, Dropout, Input
<span class="hljs-keyword">from</span> keras.optimizers <span class="hljs-keyword">import</span> SGD
<span class="hljs-keyword">from</span> keras.callbacks <span class="hljs-keyword">import</span> EarlyStopping
<span class="hljs-keyword">from</span> sklearn.utils <span class="hljs-keyword">import</span> class_weight


<span class="hljs-comment"># calculates an initial bias for the output layer </span>
initial_bias = np.log([np.sum(y_train == <span class="hljs-number">1</span>) / np.sum(y_train == <span class="hljs-number">0</span>)])


<span class="hljs-comment"># defines the model</span>
model_keras_sgd = Sequential([
    Input(shape=(X_train_processed.shape[<span class="hljs-number">1</span>],)), 
    Dense(<span class="hljs-number">30</span>, activation=<span class="hljs-string">'relu'</span>),
    Dropout(<span class="hljs-number">0.1</span>), <span class="hljs-comment"># 10% of the neurons in that layer randomly dropped out</span>
    Dense(<span class="hljs-number">30</span>, activation=<span class="hljs-string">'relu'</span>),
    Dropout(<span class="hljs-number">0.1</span>),
    Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>, <span class="hljs-comment"># binary classification</span>
          bias_initializer=tf.keras.initializers.Constant(initial_bias)) <span class="hljs-comment"># to address the imbalanced datasets</span>
])



<span class="hljs-comment"># compiles the model with the SGD optimizer</span>
opt = SGD(learning_rate=<span class="hljs-number">0.001</span>)
model_keras_sgd.compile(
    optimizer=opt, 
    loss=<span class="hljs-string">'binary_crossentropy'</span>,
    metrics=[
        <span class="hljs-string">'accuracy'</span>, <span class="hljs-comment"># add several metrics to return</span>
        tf.keras.metrics.Precision(name=<span class="hljs-string">'precision'</span>),
        tf.keras.metrics.Recall(name=<span class="hljs-string">'recall'</span>),
        tf.keras.metrics.AUC(name=<span class="hljs-string">'auc'</span>) 
    ]
)


<span class="hljs-comment"># defines early stopping to prevent overfitting</span>
early_stopping_callback = EarlyStopping(
    monitor=<span class="hljs-string">'val_recall'</span>,  <span class="hljs-comment"># monitor recall </span>
    mode=<span class="hljs-string">'max'</span>,         <span class="hljs-comment"># maximize recall</span>
    patience=<span class="hljs-number">50</span>,        <span class="hljs-comment"># stop after 50 epochs without loss improvement</span>
    min_delta=<span class="hljs-number">1e-4</span>,     <span class="hljs-comment"># minimum change to be considered an improvement (tol)</span>
    verbose=<span class="hljs-number">0</span>
)


<span class="hljs-comment"># compute the class weight</span>
class_weights = class_weight.compute_class_weight(
    class_weight=<span class="hljs-string">'balanced'</span>,
    classes=np.unique(y_train),
    y=y_train
)
class_weights_dict = dict(zip(np.unique(y_train), class_weights))


<span class="hljs-comment"># train the model</span>
history = model_keras_sgd.fit(
    X_train_processed, y_train,
    epochs=<span class="hljs-number">1000</span>,
    batch_size=<span class="hljs-number">32</span>,
    validation_data=(X_val_processed, y_val), <span class="hljs-comment"># use our external val set</span>
    callbacks=[early_stopping_callback], <span class="hljs-comment"># early stopping to prevent overfitting</span>
    class_weight=class_weights_dict, <span class="hljs-comment"># penalize misclassifying the minority class more heavily</span>
    verbose=<span class="hljs-number">0</span>
)

<span class="hljs-comment"># evaluate</span>
loss_train, accuracy_train, precision_train, recall_train, auc_train = model_keras_sgd.evaluate(X_train_processed, y_train, verbose=<span class="hljs-number">0</span>)
print(<span class="hljs-string">f"\n--- Keras Model Accuracy (Train) ---"</span>)
print(<span class="hljs-string">f"Loss: <span class="hljs-subst">{loss_train:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Accuracy: <span class="hljs-subst">{accuracy_train:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Precision: <span class="hljs-subst">{precision_train:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Recall: <span class="hljs-subst">{recall_train:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"AUC: <span class="hljs-subst">{auc_train:<span class="hljs-number">.4</span>f}</span>"</span>)

loss_val, accuracy_val, precision_val, recall_val, auc_val = model_keras_sgd.evaluate(X_val_processed, y_val, verbose=<span class="hljs-number">0</span>)
print(<span class="hljs-string">f"\n--- Keras Model Accuracy (Validation) ---"</span>)
print(<span class="hljs-string">f"Loss: <span class="hljs-subst">{loss_val:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Accuracy: <span class="hljs-subst">{accuracy_val:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Precision: <span class="hljs-subst">{precision_val:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Recall: <span class="hljs-subst">{recall_val:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"AUC: <span class="hljs-subst">{auc_val:<span class="hljs-number">.4</span>f}</span>"</span>)

<span class="hljs-comment"># display model summary</span>
model_keras_sgd.summary()
</code></pre>
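<p>During training, each <code>Dropout(0.1)</code> layer zeroes roughly 10% of its inputs at random and, per the Keras documentation, scales the remaining ones by 1 / (1 - rate) so the expected sum stays the same; at inference time it passes inputs through unchanged. A minimal NumPy sketch of this inverted-dropout behavior follows; the function name <code>dropout</code> is hypothetical and this is not Keras’s own code.</p>
<pre><code class="lang-python">import numpy as np

def dropout(a, rate, training, rng):
    # inverted dropout: zero a random fraction `rate` of units and rescale survivors during training
    if not training or rate == 0:
        return a
    keep_mask = rng.random(a.shape) >= rate
    return a * keep_mask / (1 - rate)

rng = np.random.default_rng(0)
a = np.ones((1000, 30))  # stand-in for a batch of hidden activations
out = dropout(a, rate=0.1, training=True, rng=rng)
print(round(float(np.mean(out == 0)), 2))  # fraction dropped, close to 0.1
print(round(float(out.mean()), 2))         # mean stays close to 1.0
print(np.array_equal(dropout(a, 0.1, training=False, rng=rng), a))  # True
</code></pre>
<p>Rescaling during training, rather than at inference, is what lets the trained model be used for prediction without any extra correction.</p>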
<h3 id="heading-results-4">Results</h3>
<ul>
<li><p>Recall: <em>0.7125 — 0.7250 (from training to validation)</em></p>
</li>
<li><p>Precision: <em>0.7607 — 0.7545 (from training to validation)</em></p>
</li>
</ul>
<p>The gaps between training and validation are small: Recall even rises slightly on the validation set, and Precision drops by less than one point. This indicates that the model generalizes reasonably well.</p>
<p>It also suggests that the regularization techniques are effective in preventing significant overfitting.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748441165170/4e0528e3-514a-454c-b52a-2a0318ba405a.png" alt="Image: Summary of the Keras Sequential Model with SGD Optimizer" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-build-an-mlp-classifier-with-adam-optimizer">How to Build an MLP Classifier with Adam Optimizer</h2>
<h3 id="heading-custom-classifier-1">Custom Classifier</h3>
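<p>Inside the mini-batch loop, Adam updates every weight and bias using running averages of the gradient (<code>m</code>) and of the squared gradient (<code>v</code>), corrected for their zero initialization. For the output layer parameters, the updates look like this:</p>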
<pre><code class="lang-python"><span class="hljs-comment"># apply Adam updates for output layer parameters</span>
<span class="hljs-comment"># 1) weights (w)</span>
self.m_weights[<span class="hljs-number">-1</span>] = self.beta1 * self.m_weights[<span class="hljs-number">-1</span>] + (<span class="hljs-number">1</span> - self.beta1) * grad_w_output
self.v_weights[<span class="hljs-number">-1</span>] = self.beta2 * self.v_weights[<span class="hljs-number">-1</span>] + (<span class="hljs-number">1</span> - self.beta2) * (grad_w_output ** <span class="hljs-number">2</span>)
m_w_hat = self.m_weights[<span class="hljs-number">-1</span>] / (<span class="hljs-number">1</span> - self.beta1**t)
v_w_hat = self.v_weights[<span class="hljs-number">-1</span>] / (<span class="hljs-number">1</span> - self.beta2**t)
self.weights[<span class="hljs-number">-1</span>] -= self.learning_rate * m_w_hat / (np.sqrt(v_w_hat) + self.epsilon)

<span class="hljs-comment"># 2) bias (b)</span>
self.m_biases[<span class="hljs-number">-1</span>] = self.beta1 * self.m_biases[<span class="hljs-number">-1</span>] + (<span class="hljs-number">1</span> - self.beta1) * grad_b_output
self.v_biases[<span class="hljs-number">-1</span>] = self.beta2 * self.v_biases[<span class="hljs-number">-1</span>] + (<span class="hljs-number">1</span> - self.beta2) * (grad_b_output ** <span class="hljs-number">2</span>)
m_b_hat = self.m_biases[<span class="hljs-number">-1</span>] / (<span class="hljs-number">1</span> - self.beta1**t)
v_b_hat = self.v_biases[<span class="hljs-number">-1</span>] / (<span class="hljs-number">1</span> - self.beta2**t)
self.biases[<span class="hljs-number">-1</span>] -= self.learning_rate * m_b_hat / (np.sqrt(v_b_hat) + self.epsilon)
</code></pre>
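<p>Because <code>m</code> and <code>v</code> start at zero, the bias correction matters most at the beginning of training: at t = 1, <code>m_hat</code> equals the gradient g and <code>v_hat</code> equals g squared, so the first step has a magnitude of roughly <code>learning_rate</code> in the direction opposite each gradient, whatever the gradient’s scale. The sketch below uses a hypothetical <code>adam_step</code> helper that mirrors the update above for a single parameter array to check this for gradients of very different magnitudes.</p>
<pre><code class="lang-python">import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # update the biased first and second moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # bias correction compensates for initializing m and v at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v

param = np.zeros(3)
grad = np.array([1e-3, 1.0, -1e3])  # gradients spanning six orders of magnitude
new_param, m, v = adam_step(param, grad, np.zeros(3), np.zeros(3), t=1)
print(np.round(new_param, 6))  # approximately [-0.001 -0.001  0.001]
</code></pre>
<p>This per-parameter normalization is one reason Adam is often less sensitive to the choice of learning rate than plain SGD.</p>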
<p>Reusing the forward pass and backpropagation logic of <code>MLP_SGD</code>, we build the final classifier by adding Adam’s hyperparameters (<code>beta1</code>, <code>beta2</code>, and <code>epsilon</code>) and the moment estimates it keeps for every weight and bias:</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MLP_Adam</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, hidden_layer_sizes=(<span class="hljs-params"><span class="hljs-number">10</span>,</span>), learning_rate=<span class="hljs-number">0.001</span>, n_epochs=<span class="hljs-number">1000</span>, batch_size=<span class="hljs-number">32</span>,
                 beta1=<span class="hljs-number">0.9</span>, beta2=<span class="hljs-number">0.999</span>, epsilon=<span class="hljs-number">1e-8</span></span>):</span>
        self.hidden_layer_sizes = hidden_layer_sizes
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs
        self.batch_size = batch_size
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon

        self.weights = [] 
        self.biases = []

        <span class="hljs-comment"># Adam optimizer internal states for each parameter (weights and biases)</span>
        self.m_weights = []
        self.v_weights = []
        self.m_biases = []
        self.v_biases = []

        self.weights_history = []
        self.biases_history = []
        self.loss_history = []

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_sigmoid</span>(<span class="hljs-params">self, x</span>):</span>
        <span class="hljs-keyword">return</span> <span class="hljs-number">1</span> / (<span class="hljs-number">1</span> + np.exp(-np.clip(x, <span class="hljs-number">-500</span>, <span class="hljs-number">500</span>)))

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_sigmoid_derivative</span>(<span class="hljs-params">self, x</span>):</span>
        s = self._sigmoid(x)
        <span class="hljs-keyword">return</span> s * (<span class="hljs-number">1</span> - s)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_relu</span>(<span class="hljs-params">self, x</span>):</span>
        <span class="hljs-keyword">return</span> np.maximum(<span class="hljs-number">0</span>, x)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_relu_derivative</span>(<span class="hljs-params">self, x</span>):</span>
        <span class="hljs-keyword">return</span> (x &gt; <span class="hljs-number">0</span>).astype(float)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_initialize_parameters</span>(<span class="hljs-params">self, n_features</span>):</span>
        layer_sizes = [n_features] + list(self.hidden_layer_sizes) + [<span class="hljs-number">1</span>]

        self.weights = []
        self.biases = []
        self.m_weights = []
        self.v_weights = []
        self.m_biases = []
        self.v_biases = []

        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(layer_sizes) - <span class="hljs-number">1</span>):
            fan_in = layer_sizes[i]
            fan_out = layer_sizes[i+<span class="hljs-number">1</span>]
            limit = np.sqrt(<span class="hljs-number">6</span> / (fan_in + fan_out))

            self.weights.append(np.random.uniform(-limit, limit, (fan_in, fan_out)))
            self.biases.append(np.zeros((<span class="hljs-number">1</span>, fan_out)))

            self.m_weights.append(np.zeros((fan_in, fan_out)))
            self.v_weights.append(np.zeros((fan_in, fan_out)))
            self.m_biases.append(np.zeros((<span class="hljs-number">1</span>, fan_out)))
            self.v_biases.append(np.zeros((<span class="hljs-number">1</span>, fan_out)))


    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_forward_pass</span>(<span class="hljs-params">self, X</span>):</span>
        activations = [X]
        zs = []

        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(self.weights) - <span class="hljs-number">1</span>):
            z = np.dot(activations[<span class="hljs-number">-1</span>], self.weights[i]) + self.biases[i]
            zs.append(z)
            a = self._relu(z)
            activations.append(a)

        z_output = np.dot(activations[<span class="hljs-number">-1</span>], self.weights[<span class="hljs-number">-1</span>]) + self.biases[<span class="hljs-number">-1</span>]
        zs.append(z_output)
        y_pred = self._sigmoid(z_output)
        activations.append(y_pred)

        <span class="hljs-keyword">return</span> activations, zs

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_compute_loss</span>(<span class="hljs-params">self, y_true, y_pred</span>):</span>
        y_pred = np.clip(y_pred, <span class="hljs-number">1e-10</span>, <span class="hljs-number">1</span> - <span class="hljs-number">1e-10</span>)
        loss = -np.mean(y_true * np.log(y_pred) + (<span class="hljs-number">1</span> - y_true) * np.log(<span class="hljs-number">1</span> - y_pred))
        <span class="hljs-keyword">return</span> loss

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fit</span>(<span class="hljs-params">self, X, y</span>):</span>
        n_samples, n_features = X.shape
        y = np.asarray(y).reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
        X = np.asarray(X)

        self._initialize_parameters(n_features)
        self.weights_history.append([w.copy() <span class="hljs-keyword">for</span> w <span class="hljs-keyword">in</span> self.weights])
        self.biases_history.append([b.copy() <span class="hljs-keyword">for</span> b <span class="hljs-keyword">in</span> self.biases])
        activations, _ = self._forward_pass(X)
        initial_loss = self._compute_loss(y, activations[<span class="hljs-number">-1</span>])
        self.loss_history.append(initial_loss)

        <span class="hljs-comment"># global time step for Adam bias correction</span>
        t = <span class="hljs-number">0</span>

        <span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(self.n_epochs):
            permutation = np.random.permutation(n_samples)
            X_shuffled = X[permutation]
            y_shuffled = y[permutation]

            <span class="hljs-comment"># Mini-batch loop</span>
            <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>, n_samples, self.batch_size):
                X_batch = X_shuffled[i : i + self.batch_size]
                y_batch = y_shuffled[i : i + self.batch_size]

                t += <span class="hljs-number">1</span>

                <span class="hljs-comment"># 1. forward pass</span>
                activations, zs = self._forward_pass(X_batch)
                y_pred = activations[<span class="hljs-number">-1</span>] <span class="hljs-comment"># Output of the network</span>

                <span class="hljs-comment"># 2. backpropagation</span>
                delta = y_pred - y_batch
                <span class="hljs-comment"># snapshot the forward-pass weights: hidden-layer deltas must use these, not the already-updated ones</span>
                weights_before_update = [w.copy() <span class="hljs-keyword">for</span> w <span class="hljs-keyword">in</span> self.weights]
                grad_w_output = np.dot(activations[<span class="hljs-number">-2</span>].T, delta) / X_batch.shape[<span class="hljs-number">0</span>] <span class="hljs-comment"># Average over batch</span>
                grad_b_output = np.sum(delta, axis=<span class="hljs-number">0</span>) / X_batch.shape[<span class="hljs-number">0</span>]

                <span class="hljs-comment"># apply Adam updates to weights</span>
                self.m_weights[<span class="hljs-number">-1</span>] = self.beta1 * self.m_weights[<span class="hljs-number">-1</span>] + (<span class="hljs-number">1</span> - self.beta1) * grad_w_output
                self.v_weights[<span class="hljs-number">-1</span>] = self.beta2 * self.v_weights[<span class="hljs-number">-1</span>] + (<span class="hljs-number">1</span> - self.beta2) * (grad_w_output ** <span class="hljs-number">2</span>)
                m_w_hat = self.m_weights[<span class="hljs-number">-1</span>] / (<span class="hljs-number">1</span> - self.beta1**t)
                v_w_hat = self.v_weights[<span class="hljs-number">-1</span>] / (<span class="hljs-number">1</span> - self.beta2**t)
                self.weights[<span class="hljs-number">-1</span>] -= self.learning_rate * m_w_hat / (np.sqrt(v_w_hat) + self.epsilon)

                <span class="hljs-comment"># apply Adam updates to bias</span>
                self.m_biases[<span class="hljs-number">-1</span>] = self.beta1 * self.m_biases[<span class="hljs-number">-1</span>] + (<span class="hljs-number">1</span> - self.beta1) * grad_b_output
                self.v_biases[<span class="hljs-number">-1</span>] = self.beta2 * self.v_biases[<span class="hljs-number">-1</span>] + (<span class="hljs-number">1</span> - self.beta2) * (grad_b_output ** <span class="hljs-number">2</span>)
                m_b_hat = self.m_biases[<span class="hljs-number">-1</span>] / (<span class="hljs-number">1</span> - self.beta1**t)
                v_b_hat = self.v_biases[<span class="hljs-number">-1</span>] / (<span class="hljs-number">1</span> - self.beta2**t)
                self.biases[<span class="hljs-number">-1</span>] -= self.learning_rate * m_b_hat / (np.sqrt(v_b_hat) + self.epsilon)


                <span class="hljs-comment"># Propagate gradients backward through hidden layers</span>
                <span class="hljs-keyword">for</span> l <span class="hljs-keyword">in</span> range(len(self.weights) - <span class="hljs-number">2</span>, <span class="hljs-number">-1</span>, <span class="hljs-number">-1</span>):
                    delta = np.dot(delta, weights_before_update[l+<span class="hljs-number">1</span>].T) * self._relu_derivative(zs[l]) <span class="hljs-comment"># d_activation(z)</span>
                    grad_w_hidden = np.dot(activations[l].T, delta) / X_batch.shape[<span class="hljs-number">0</span>]
                    grad_b_hidden = np.sum(delta, axis=<span class="hljs-number">0</span>) / X_batch.shape[<span class="hljs-number">0</span>]

                    <span class="hljs-comment"># apply Adam updates to weights</span>
                    self.m_weights[l] = self.beta1 * self.m_weights[l] + (<span class="hljs-number">1</span> - self.beta1) * grad_w_hidden
                    self.v_weights[l] = self.beta2 * self.v_weights[l] + (<span class="hljs-number">1</span> - self.beta2) * (grad_w_hidden ** <span class="hljs-number">2</span>)
                    m_w_hat = self.m_weights[l] / (<span class="hljs-number">1</span> - self.beta1**t)
                    v_w_hat = self.v_weights[l] / (<span class="hljs-number">1</span> - self.beta2**t)
                    self.weights[l] -= self.learning_rate * m_w_hat / (np.sqrt(v_w_hat) + self.epsilon)

                    <span class="hljs-comment"># apply Adam updates to bias</span>
                    self.m_biases[l] = self.beta1 * self.m_biases[l] + (<span class="hljs-number">1</span> - self.beta1) * grad_b_hidden
                    self.v_biases[l] = self.beta2 * self.v_biases[l] + (<span class="hljs-number">1</span> - self.beta2) * (grad_b_hidden ** <span class="hljs-number">2</span>)
                    m_b_hat = self.m_biases[l] / (<span class="hljs-number">1</span> - self.beta1**t)
                    v_b_hat = self.v_biases[l] / (<span class="hljs-number">1</span> - self.beta2**t)
                    self.biases[l] -= self.learning_rate * m_b_hat / (np.sqrt(v_b_hat) + self.epsilon)


            self.weights_history.append([w.copy() <span class="hljs-keyword">for</span> w <span class="hljs-keyword">in</span> self.weights])
            self.biases_history.append([b.copy() <span class="hljs-keyword">for</span> b <span class="hljs-keyword">in</span> self.biases])

            activations, _ = self._forward_pass(X)
            epoch_loss = self._compute_loss(y, activations[<span class="hljs-number">-1</span>])
            self.loss_history.append(epoch_loss)

            <span class="hljs-keyword">if</span> (epoch + <span class="hljs-number">1</span>) % <span class="hljs-number">100</span> == <span class="hljs-number">0</span>:
                print(<span class="hljs-string">f"Epoch <span class="hljs-subst">{epoch+<span class="hljs-number">1</span>}</span>/<span class="hljs-subst">{self.n_epochs}</span>, Loss: <span class="hljs-subst">{epoch_loss:<span class="hljs-number">.4</span>f}</span>"</span>)
        <span class="hljs-keyword">return</span> self


    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict_proba</span>(<span class="hljs-params">self, X</span>):</span>
        activations, _ = self._forward_pass(X)
        <span class="hljs-keyword">return</span> activations[<span class="hljs-number">-1</span>]

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict</span>(<span class="hljs-params">self, X, threshold=<span class="hljs-number">0.5</span></span>):</span>
        probabilities = self.predict_proba(X)
        <span class="hljs-keyword">return</span> (probabilities &gt;= threshold).astype(int).flatten()
</code></pre>
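<p>To see what the Adam lines in <code>fit()</code> do in isolation, here is a minimal NumPy sketch of a single Adam update for one parameter array. The helper name <code>adam_step</code> is hypothetical, and the defaults (<code>beta1=0.9</code>, <code>beta2=0.999</code>, <code>eps=1e-8</code>) are the commonly used Adam values, assumed here rather than read from the class. It illustrates why bias correction matters: on the first step the corrected moments equal the gradient and its square, so the step size is roughly the learning rate whatever the gradient's scale.</p>

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter array (hypothetical helper)."""
    m = beta1 * m + (1 - beta1) * grad            # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction (m and v start at zero)
    v_hat = v / (1 - beta2 ** t)
    new_param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return new_param, m, v

# First step (t=1): m_hat == grad and v_hat == grad**2, so the update is about
# lr * sign(grad), whether the gradient is huge or tiny.
for g in (1000.0, 0.001, -5.0):
    p, _, _ = adam_step(np.zeros(1), np.array([g]), np.zeros(1), np.zeros(1), t=1)
    print(f"grad={g:>8}: new param = {p[0]:+.6f}")
```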
<h3 id="heading-training-prediction-1">Training / Prediction</h3>
<p>Train the model, then make predictions on the training and validation datasets:</p>
<pre><code class="lang-python">mlp_adam = MLP_Adam(hidden_layer_sizes=(<span class="hljs-number">30</span>, <span class="hljs-number">10</span>), learning_rate=<span class="hljs-number">0.001</span>, n_epochs=<span class="hljs-number">500</span>, batch_size=<span class="hljs-number">32</span>)
mlp_adam.fit(X_train_processed, y_train)

y_pred_train = mlp_adam.predict(X_train_processed)
y_pred_val = mlp_adam.predict(X_val_processed)

acc_train = accuracy_score(y_train, y_pred_train)
acc_val = accuracy_score(y_val, y_pred_val)

print(<span class="hljs-string">f"\nMLP (Custom Adam) Accuracy (Train): <span class="hljs-subst">{acc_train:<span class="hljs-number">.3</span>f}</span>"</span>)
print(<span class="hljs-string">f"MLP (Custom Adam) Accuracy (Validation): <span class="hljs-subst">{acc_val:<span class="hljs-number">.3</span>f}</span>"</span>)
</code></pre>
<h3 id="heading-results-5">Results</h3>
<ul>
<li><p>Recall: <em>0.9870–0.6150 (from training to validation)</em></p>
</li>
<li><p>Precision: <em>0.9811–0.6474 (from training to validation)</em></p>
</li>
</ul>
<p>While the Adam optimizer outperformed SGD, the model exhibited significant overfitting: from training to validation, Recall fell by about 37 points and Precision by about 33 points.</p>
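<p>The training code prints accuracy, while the results are reported as Recall and Precision. Here is a minimal NumPy sketch for computing both, plus the train-to-validation gap; <code>precision_recall</code> and <code>generalization_gap</code> are hypothetical helpers (scikit-learn's <code>precision_score</code> and <code>recall_score</code> return the same values for 0/1 labels). The gap below is computed from the figures reported above.</p>

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall for binary 0/1 labels (hypothetical helper)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = np.sum((y_pred == 1) & (y_true == 1))   # predicted 1, actually 1
    fp = np.sum((y_pred == 1) & (y_true == 0))   # predicted 1, actually 0
    fn = np.sum((y_pred == 0) & (y_true == 1))   # predicted 0, actually 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def generalization_gap(train_metrics, val_metrics):
    """Training minus validation score per metric; large positive values suggest overfitting."""
    return {name: train_metrics[name] - val_metrics[name] for name in train_metrics}

# Recall/Precision reported above for the custom Adam MLP.
gap = generalization_gap({"recall": 0.9870, "precision": 0.9811},
                         {"recall": 0.6150, "precision": 0.6474})
print({k: round(v, 4) for k, v in gap.items()})   # {'recall': 0.372, 'precision': 0.3337}
```

<p>With the trained model, you could compare <code>precision_recall(y_train, y_pred_train)</code> against <code>precision_recall(y_val, y_pred_val)</code>.</p>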
<p><strong>Loss History</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748442341394/3183a9b1-5df0-4f74-9473-6b5b595dc9c0.png" alt="Left: loss by epoch, middle: weights history by epoch, right: bias history by epoch (source: Kuriko Iwai)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>We visualized the decision boundary using the first two principal components (PCA) as the x and y axes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748442311514/34f004c9-bf1d-41e5-a0af-08c62802b78c.png" alt="Decision Boundary of MLP with Adam Optimizer (source: Kuriko Iwai)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-leverage-sckitlearns-mcp-classifier-1">Leverage Scikit-learn’s MLPClassifier</h3>
<p>We’ve switched the optimizer from SGD to Adam, keeping all other settings constant:</p>
<pre><code class="lang-python">model_sklearn_mlp_adam = MLPClassifier(
    hidden_layer_sizes=(<span class="hljs-number">30</span>, <span class="hljs-number">30</span>),
    activation=<span class="hljs-string">'relu'</span>,
    solver=<span class="hljs-string">'adam'</span>,             <span class="hljs-comment"># update the optimizer from SGD to Adam</span>
    learning_rate_init=<span class="hljs-number">0.001</span>,
    learning_rate=<span class="hljs-string">'constant'</span>,
    alpha=<span class="hljs-number">0.0001</span>,
    max_iter=<span class="hljs-number">3000</span>,
    batch_size=<span class="hljs-number">16</span>,
    random_state=<span class="hljs-number">42</span>,
    early_stopping=<span class="hljs-literal">True</span>,
    n_iter_no_change=<span class="hljs-number">50</span>,
    validation_fraction=<span class="hljs-number">0.1</span>,
    tol=<span class="hljs-number">1e-4</span>,
    verbose=<span class="hljs-literal">False</span>,
)

model_sklearn_mlp_adam.fit(X_train_processed, y_train)

y_pred_train_sklearn = model_sklearn_mlp_adam.predict(X_train_processed)
y_pred_val_sklearn = model_sklearn_mlp_adam.predict(X_val_processed)
</code></pre>
<h3 id="heading-results-6">Results</h3>
<ul>
<li><p><em>Recall: 0.8975–0.6400 (from training to validation)</em></p>
</li>
<li><p><em>Precision: 0.8864–0.6305 (from training to validation)</em></p>
</li>
</ul>
<p>Despite a performance improvement compared to the SGD optimizer, the significant drop in both Recall (from 0.8975 to 0.6400) and Precision (from 0.8864 to 0.6305) from training to validation data indicates that the model is still overfitting.</p>
<h3 id="heading-leverage-keras-sequential-classifier-1">Leverage Keras Sequential Classifier</h3>
<p>Similar to MLPClassifier, we’ve switched the optimizer from SGD to Adam with all the other conditions remaining the same:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow <span class="hljs-keyword">import</span> keras
<span class="hljs-keyword">from</span> keras.models <span class="hljs-keyword">import</span> Sequential
<span class="hljs-keyword">from</span> keras.layers <span class="hljs-keyword">import</span> Dense, Dropout, Input
<span class="hljs-keyword">from</span> keras.optimizers <span class="hljs-keyword">import</span> Adam
<span class="hljs-keyword">from</span> keras.callbacks <span class="hljs-keyword">import</span> EarlyStopping
<span class="hljs-keyword">from</span> sklearn.utils <span class="hljs-keyword">import</span> class_weight


initial_bias = np.log([np.sum(y_train == <span class="hljs-number">1</span>) / np.sum(y_train == <span class="hljs-number">0</span>)])
model_keras_adam = Sequential([
    Input(shape=(X_train_processed.shape[<span class="hljs-number">1</span>],)), 
    Dense(<span class="hljs-number">30</span>, activation=<span class="hljs-string">'relu'</span>),
    Dropout(<span class="hljs-number">0.1</span>),
    Dense(<span class="hljs-number">30</span>, activation=<span class="hljs-string">'relu'</span>),
    Dropout(<span class="hljs-number">0.1</span>),
    Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>, 
          bias_initializer=tf.keras.initializers.Constant(initial_bias))
])


optimizer_keras = Adam(learning_rate=<span class="hljs-number">0.001</span>)
model_keras_adam.compile(
    optimizer=optimizer_keras, 
    loss=<span class="hljs-string">'binary_crossentropy'</span>, 
    metrics=[
        <span class="hljs-string">'accuracy'</span>,
        tf.keras.metrics.Precision(name=<span class="hljs-string">'precision'</span>),
        tf.keras.metrics.Recall(name=<span class="hljs-string">'recall'</span>),
        tf.keras.metrics.AUC(name=<span class="hljs-string">'auc'</span>) 
    ]
)

early_stopping_callback = EarlyStopping(
    monitor=<span class="hljs-string">'val_recall'</span>,
    mode=<span class="hljs-string">'max'</span>,
    patience=<span class="hljs-number">50</span>,
    min_delta=<span class="hljs-number">1e-4</span>,
    verbose=<span class="hljs-number">0</span>
)

class_weights = class_weight.compute_class_weight(
    class_weight=<span class="hljs-string">'balanced'</span>,
    classes=np.unique(y_train),
    y=y_train
)
class_weights_dict = dict(zip(np.unique(y_train), class_weights))

model_keras_adam.fit(
    X_train_processed, y_train,
    epochs=<span class="hljs-number">1000</span>,
    batch_size=<span class="hljs-number">32</span>,
    validation_data=(X_val_processed, y_val),
    callbacks=[early_stopping_callback],
    class_weight=class_weights_dict,
    verbose=<span class="hljs-number">0</span>
)


loss_train, accuracy_train, precision_train, recall_train, auc_train = model_keras_adam.evaluate(X_train_processed, y_train, verbose=<span class="hljs-number">0</span>)
print(<span class="hljs-string">f"\n--- Keras Model Accuracy (Train) ---"</span>)
print(<span class="hljs-string">f"Loss: <span class="hljs-subst">{loss_train:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Accuracy: <span class="hljs-subst">{accuracy_train:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Precision: <span class="hljs-subst">{precision_train:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Recall: <span class="hljs-subst">{recall_train:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"AUC: <span class="hljs-subst">{auc_train:<span class="hljs-number">.4</span>f}</span>"</span>)


loss_val, accuracy_val, precision_val, recall_val, auc_val = model_keras_adam.evaluate(X_val_processed, y_val, verbose=<span class="hljs-number">0</span>)
print(<span class="hljs-string">f"\n--- Keras Model Accuracy (Validation) ---"</span>)
print(<span class="hljs-string">f"Loss: <span class="hljs-subst">{loss_val:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Accuracy: <span class="hljs-subst">{accuracy_val:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Precision: <span class="hljs-subst">{precision_val:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Recall: <span class="hljs-subst">{recall_val:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"AUC: <span class="hljs-subst">{auc_val:<span class="hljs-number">.4</span>f}</span>"</span>)


model_keras_adam.summary()
</code></pre>
<h3 id="heading-results-7">Results</h3>
<ul>
<li><p><em>Recall: 0.7995–0.7500 (from training to validation)</em></p>
</li>
<li><p><em>Precision: 0.8409–0.8065 (from training to validation)</em></p>
</li>
</ul>
<p>The model exhibits good performance, with Recall slightly decreasing from 0.7995 (training) to 0.7500 (validation), and Precision similarly dropping from 0.8409 (training) to 0.8065 (validation).</p>
<p>This indicates good generalization, with only minor performance degradation on unseen data.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748441767800/fe43f181-4323-461f-b56a-125fc78e9c84.png" alt="Image: Keras Sequential Model with Adam Optimizer (Source: Kuriko Iwai)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
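<p>The Keras code above also sets two imbalance-related values: balanced class weights from <code>class_weight.compute_class_weight(class_weight='balanced', ...)</code>, and an output-layer <code>initial_bias</code> of <code>log(positives / negatives)</code>. This NumPy sketch (hypothetical helper names, made-up labels) reproduces what they compute. Each balanced weight is <code>n_samples / (n_classes * count_of_that_class)</code>, and with that bias the untrained sigmoid output equals the fraction of positive samples.</p>

```python
import numpy as np

def balanced_class_weights(y):
    """Per-class weights using the formula scikit-learn documents for
    class_weight='balanced': n_samples / (n_classes * count_per_class)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up imbalanced labels: 20 positives, 80 negatives.
y_example = np.array([1] * 20 + [0] * 80)
print(balanced_class_weights(y_example))   # {0: 0.625, 1: 2.5}: the minority class weighs more

# Same ratio as initial_bias in the Keras code.
initial_bias = np.log(np.sum(y_example == 1) / np.sum(y_example == 0))
print(sigmoid(initial_bias))               # approximately 0.2, i.e. 20 / (20 + 80)
```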
<h2 id="heading-final-results-generalization">Final Results: Generalization</h2>
<p>Finally, we’ll evaluate the model’s ultimate performance on the test dataset, which has remained completely separate from all prior training and validation processes.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Custom classifiers</span>
y_pred_test_custom_sgd = mlp_sgd.fit(X_train_processed, y_train).predict(X_test_processed)
y_pred_test_custom_adam = mlp_adam.fit(X_train_processed, y_train).predict(X_test_processed)

<span class="hljs-comment"># MLPClassifier</span>
y_pred_test_sk_sgd = model_sklearn_mlp_sgd.fit(X_train_processed, y_train).predict(X_test_processed)
y_pred_test_sk_adam = model_sklearn_mlp_adam.fit(X_train_processed, y_train).predict(X_test_processed)

<span class="hljs-comment"># Keras Sequential</span>
_, accuracy_test_sgd, precision_test_sgd, recall_test_sgd, auc_test_sgd = model_keras_sgd.evaluate(X_test_processed, y_test, verbose=<span class="hljs-number">0</span>)
_, accuracy_test_adam, precision_test_adam, recall_test_adam, auc_test_adam = model_keras_adam.evaluate(X_test_processed, y_test, verbose=<span class="hljs-number">0</span>)
</code></pre>
<p>Overall, the Keras Sequential model optimized with SGD achieved the best performance, with an <strong>AUPRC (Area Under the Precision-Recall Curve) of 0.72.</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748874699534/f0f008c4-9067-4e2a-b070-4bb5cbae8f23.png" alt="Precision-Recall Curves for Six Classifier Models: Comparing Custom, MLP, and Keras Sequential Classifiers with SGD and Adam Optimizers (Source: Kuriko Iwai)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
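<p>AUPRC is computed from predicted probabilities (for example, <code>predict_proba</code>, or a Keras model's <code>predict</code> on a sigmoid output), not from the 0/1 labels that <code>predict()</code> returns. One common way to summarize the precision-recall curve is average precision. Below is a minimal NumPy sketch under the simplifying assumption that all scores are distinct; scikit-learn's <code>average_precision_score</code> also handles ties and is the safer choice in practice.</p>

```python
import numpy as np

def average_precision(y_true, scores):
    """Average precision: sum over cut-offs of (R_k - R_{k-1}) * P_k, taking
    cut-offs in descending score order. Assumes distinct scores and at least one positive."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float))   # highest score first
    hits = (y_true[order] == 1)
    tp = np.cumsum(hits)                           # true positives among the top k
    k = np.arange(1, len(hits) + 1)                # number of samples predicted positive
    precision = tp / k
    recall = tp / tp[-1]                           # tp[-1] is the total number of positives
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))

# Toy labels and predicted probabilities (made up for illustration).
y = [1, 0, 1, 1, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.2]
print(average_precision(y, probs))   # 29/36, about 0.806
```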
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this exploration, we experimented with custom classifiers, Scikit-learn models, and Keras deep learning architectures.</p>
<p>Our findings underscore that effective machine learning hinges on three critical factors:</p>
<ol>
<li><p><strong>robust data preprocessing</strong> (tailored to objectives and data distribution),</p>
</li>
<li><p><strong>judicious model selection</strong>, and</p>
</li>
<li><p><strong>strategic framework or library choices</strong>.</p>
</li>
</ol>
<h3 id="heading-choosing-the-right-framework"><strong>Choosing the right framework</strong></h3>
<p>Generally speaking, choose <code>MLPClassifier</code> when:</p>
<ul>
<li><p>You’re primarily working with <strong>tabular data,</strong></p>
</li>
<li><p>You want to prioritize <strong>simplicity, quick iteration, and seamless integration,</strong></p>
</li>
<li><p>You have simple, shallow architectures, and</p>
</li>
<li><p>You have a moderate dataset size (manageable on a CPU).</p>
</li>
</ul>
<p>Choose Keras <code>Sequential</code> when:</p>
<ul>
<li><p>You’re dealing with <strong>image, text, audio, or other sequential data,</strong></p>
</li>
<li><p>You’re building <strong>deep learning models</strong> such as CNNs, RNNs, or LSTMs,</p>
</li>
<li><p>You need <strong>fine-grained control</strong> over the model architecture, training process, or custom components,</p>
</li>
<li><p>You need to leverage <strong>GPU acceleration</strong>,</p>
</li>
<li><p>You’re planning for <strong>production deployment</strong>, and</p>
</li>
<li><p>You want to experiment with more advanced deep learning techniques.</p>
</li>
</ul>
<h3 id="heading-limitation-of-mlps">Limitations of MLPs</h3>
<p>While Multilayer Perceptrons (MLPs) proved valuable, their computational cost and susceptibility to overfitting emerged as key challenges.</p>
<p>Looking ahead, we’ll delve into how Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) offer powerful solutions to these inherent MLP limitations.</p>
<p>You can find more info about me on my <a target="_blank" href="https://kuriko.vercel.app/">Portfolio</a> / <a target="_blank" href="https://www.linkedin.com/in/k-i-i">LinkedIn</a> / <a target="_blank" href="https://github.com/versionhq/multi-agent-system">Github</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How AI Models Think: The Key Role of Activation Functions with Code Examples ]]>
                </title>
                <description>
                    <![CDATA[ In Artificial Intelligence, Machine Learning is the foundation of most revolutionary AI applications. From language processing to image recognition, Machine Learning is everywhere. Machine Learning relies on algorithms, statistical models, and neural... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/activation-functions-in-neural-networks/</link>
                <guid isPermaLink="false">66ba5319ba2ef92905bfa81b</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Wed, 10 Apr 2024 15:44:31 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/07/abigail-keenan-8-s5QuUBtyM-unsplash.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In Artificial Intelligence, Machine Learning is the foundation of most revolutionary AI applications. From language processing to image recognition, Machine Learning is everywhere.</p>
<p>Machine Learning relies on algorithms, statistical models, and neural networks. And Deep Learning is the subfield of Machine Learning focused only on neural networks.</p>
<p>Activation functions are a key piece of any neural network. But why exactly they are essential is a common question, and it can be a difficult one to answer.</p>
<p>This tutorial focuses on explaining, in a simple manner with analogies, why exactly activation functions are necessary.</p>
<p>By understanding this, you will understand how AI models think.</p>
<p>Before that, we will explore neural networks in AI. We will also explore the most commonly used activation functions.</p>
<p>We're also going to analyze every line of a very simple PyTorch code example of a neural network.</p>
<h3 id="heading-in-this-article-we-will-explore">In this article, we will explore:</h3>
<ul>
<li><a class="post-section-overview" href="#Artificial">Artificial Intelligence and the Rise of Deep Learning</a></li>
<li><a class="post-section-overview" href="#activation_functions_explanation">Understanding Activation Functions: Simplifying Neural Network Mechanics</a></li>
<li><a class="post-section-overview" href="#simple">Simple Analogy: Why Activation Functions are Necessary</a></li>
<li><a class="post-section-overview" href="#what">What Happens Without Activation Functions?</a></li>
<li><a class="post-section-overview" href="#heading-pytorch-activation-function-code-example">PyTorch Activation Function Code Example</a> </li>
<li><a class="post-section-overview" href="#heading-conclusion-the-unsung-heroes-of-ai-neural-networks">Conclusion: The Unsung Heroes of AI Neural Networks</a></li>
</ul>
<p>This article won't cover dropout or other regularization techniques, hyperparameter optimization, complex architectures like CNNs, or detailed differences in gradient descent variants.</p>
<p>I just want to showcase <strong>why activation functions are needed</strong> and what happens when they are not applied to neural networks.</p>
<p>If you don't know much about deep learning, I personally recommend this Deep Learning crash course on freeCodeCamp's YouTube channel:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/VyWAvY2CF9c" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<h2 id="Artificial">Artificial Intelligence and the Rise of Deep Learning</h2>

<h3 id="heading-what-is-deep-learning-in-artificial-intelligence">What is Deep Learning in Artificial Intelligence?</h3>
<p>Deep learning is a subfield of machine learning, and therefore of artificial intelligence. It uses neural networks to learn complex patterns, much like a sports team develops strategies to win matches.</p>
<p>In general, the bigger the neural network, the more complex the tasks it can handle. ChatGPT, for example, is built on a very large neural network and uses natural language processing to answer questions and interact with users.</p>
<p>To truly understand the basics of neural networks (what every neural network has in common that enables it to work), we need to understand activation functions.</p>
<h3 id="heading-deep-learning-training-neural-networks">Deep Learning = Training Neural Networks</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/4-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Simple neural network</em></p>
<p>At the core of deep learning is the training of neural networks.</p>
<p>That means using data to find the weight values that let the network predict what we want.</p>
<p>Neural networks are made of neurons organized in layers. Each layer extracts different features from the data.</p>
<p>This layered structure allows deep learning models to analyze and interpret complex data.</p>
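<p>To make the layered structure concrete, here is a minimal NumPy sketch of a forward pass through a small network. The layer sizes and random weights are arbitrary, assumed values. Each layer applies a linear transformation (weights and bias) followed by an activation function (ReLU here), and training means adjusting the values in <code>weights</code> and <code>biases</code> until the outputs match what we want.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features -> 5 hidden neurons -> 3 hidden neurons -> 1 output.
layer_sizes = [4, 5, 3, 1]
weights = [rng.normal(size=(n_in, n_out)) for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def relu(z):
    return np.maximum(0.0, z)

def forward(x):
    """Pass inputs through each layer in turn: linear transformation, then activation."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)               # hidden layers
    return a @ weights[-1] + biases[-1]   # output layer left linear in this sketch

x = rng.normal(size=(2, 4))               # a batch of 2 samples
print(forward(x).shape)                   # (2, 1)
```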
<h2 id="activation_functions_explanation">Understanding Activation Functions: Simplifying Neural Network Mechanics</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/aaaaaaaaaaaaaaaaaaa.png" alt="Image" width="600" height="400" loading="lazy">
<em>Leaky ReLU activation function</em></p>
<p>Activation functions help neural networks handle complex data. They transform each neuron's value based on the input it receives.</p>
<p>It is almost like a filter every neuron has before sending its value to the next neuron.</p>
<p>Essentially, activation functions control the information flow of neural networks – they decide which data is relevant and which is not.</p>
<p>Some activation functions, such as ReLU, also help prevent the vanishing gradient problem, which helps the network learn properly.</p>
<p>The vanishing gradient problem happens when the neural network's learning signals (its gradients) become too weak to change the weight values. This makes learning from data very difficult.</p>
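<p>A quick way to see the vanishing gradient problem: during backpropagation, the gradient that reaches an early layer is a product that includes one activation derivative per layer. This toy NumPy sketch looks only at those derivative factors (it ignores the weight terms that also appear in the chain rule) and compares sigmoid with ReLU over 10 layers.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)              # never larger than 0.25 (reached at z = 0)

def relu_grad(z):
    return (z > 0).astype(float)      # 1 for positive inputs, 0 otherwise

n_layers = 10

# Best case for sigmoid: every pre-activation is 0, so each factor is 0.25.
print(np.prod(sigmoid_grad(np.zeros(n_layers))))   # 0.25 ** 10, roughly 9.5e-07

# ReLU with positive pre-activations: each factor is exactly 1.
print(np.prod(relu_grad(np.ones(n_layers))))       # 1.0
```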
<h2 id="simple">Simple Analogy: Why Activation Functions are Necessary</h2>

<p>In a soccer game, players decide whether to pass, dribble, or shoot the ball.</p>
<p>These decisions are based on the current game situation, just as neurons in a neural network process data.</p>
<p>Activation functions play this decision-making role for neurons.</p>
<p>Without them, neurons would pass data <strong>without any selective analysis</strong> – like players <strong>mindlessly kicking the ball</strong> regardless of the game context.</p>
<p>In this way, activation functions introduce non-linearity into a neural network, which allows it to learn complex patterns.</p>
<h2 id="what">What Happens Without Activation Functions?</h2>

<p>To understand what would happen without activation functions, let's first think about what happens if players mindlessly kick the ball in a soccer match.</p>
<p>They'd likely lose the match because there would be no team decision-making. The ball still goes somewhere, but most of the time it will not go where it's intended.</p>
<p>This is similar to what happens in a neural network without activation functions: the network doesn't make good predictions because each neuron just passes a weighted sum of its inputs along, without any selective analysis. However many layers you stack, the result is still a single linear transformation of the input.</p>
<p>We still get a prediction. Just not what we wanted, or what's helpful.</p>
<p>This dramatically limits the capability – of both the soccer team and the neural network.</p>
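<p>Here is why stacking layers without activation functions doesn't help. In this minimal NumPy sketch with arbitrary random weights, two linear layers applied one after the other produce exactly the same output as a single linear layer with weights <code>W1 @ W2</code> and bias <code>b1 @ W2 + b2</code>. Extra layers add no expressive power until a non-linear activation sits between them.</p>

```python
import numpy as np

rng = np.random.default_rng(42)

# Two layers with no activation function between them (shapes chosen arbitrarily).
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

x = rng.normal(size=(5, 3))               # 5 samples, 3 features
two_layers = (x @ W1 + b1) @ W2 + b2

# One linear layer with combined parameters gives the same result.
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True
```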
<h3 id="heading-intuitive-explanation-of-activation-functions">Intuitive Explanation of Activation Functions</h3>
<p>Let's now look at an example so you can understand this intuitively.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/7-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>ReLU activation function</em></p>
<p>Let's start with the most widely used activation function in deep learning (it's also one of the simpler ones).</p>
<p>This is a ReLU activation function. It acts as a filter before a neuron sends a value to the next neuron.</p>
<p>This filter is essentially two conditions:</p>
<ul>
<li>If the input value is negative, the output becomes 0</li>
<li>If the input value is positive, it passes through unchanged</li>
</ul>
<p>With this, we are adding a decision-making process to each neuron. It decides which data to send and which not to send.</p>
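<p>In code, ReLU is a one-liner. This NumPy sketch applies it element-wise to a few example pre-activation values (the numbers are arbitrary):</p>

```python
import numpy as np

def relu(z):
    """ReLU: negative inputs become 0, positive inputs pass through unchanged."""
    return np.maximum(0.0, z)

z = np.array([-3.0, -0.5, 0.0, 2.0, 7.0])   # example pre-activation values
print(relu(z))   # [0. 0. 0. 2. 7.]
```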
<p>Now let's look at some examples of other activation functions.</p>
<h3 id="heading-sigmoid-activation-functions">Sigmoid Activation Functions</h3>
<p>This activation function maps any input value to a value between 0 and 1. Sigmoids are widely used in the output neuron of binary classification models.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/9-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Sigmoid activation function</em></p>
<p>There is a problem with sigmoid activation functions, though. Take these sigmoid outputs for values produced by a linear transformation:</p>
<ul>
<li>0.00000003</li>
<li>0.99999992</li>
<li>0.00000247</li>
<li>0.99993320</li>
</ul>
<p>There are some questions about these values we can ask:</p>
<ul>
<li>Are values like 0.00000003 and 0.00000247 really important? Can't they be just 0 so that the computer has less to compute? Remember, many of today's models have millions of weights. Can't millions of values like 0.00000003 and 0.00000247 be 0?</li>
<li>And for positive values, how can we distinguish a <strong>big value</strong> from a <strong>very big value</strong>? For example, from 0.99993320 and 0.99999992 alone, can we tell whether the inputs were something like <em>7 and 13</em> or <em>7 and 55</em>? These outputs do not <strong>accurately</strong> describe their input values.</li>
</ul>
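<p>You can check the second question numerically. This minimal NumPy sketch applies the sigmoid to the inputs 7, 13, and 55 mentioned above: all three outputs are squeezed against 1, and in 64-bit floating point <code>sigmoid(55)</code> rounds to exactly 1.0.</p>

```python
import numpy as np

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([7.0, 13.0, 55.0])
outputs = sigmoid(inputs)
print(outputs)                    # all three are pressed up against 1
print(outputs[2] - outputs[1])    # a tiny gap, although the inputs differ by 42
```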
<p>How can we distinguish the subtle differences in outputs so that accuracy is maintained?</p>
<p>This is what ReLU activation functions solve: setting negative numbers to exactly zero while passing positive ones through unchanged boosts neural network computational efficiency, and it keeps big and very big positive inputs distinguishable.</p>
<h3 id="heading-tanh-hyperbolic-tangent-activation-functions">Tanh (Hyperbolic Tangent) Activation Functions</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/10-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>tanh activation function</em></p>
<p>Tanh has an S-shaped curve similar to the sigmoid, but it outputs values between -1 and 1 instead of between 0 and 1.</p>
<p>They're often used in <a target="_blank" href="https://www.freecodecamp.org/news/the-ultimate-guide-to-recurrent-neural-networks-in-python/">recurrent neural networks (RNNs) and long short-term memory networks (LSTMs).</a></p>
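<p>Tanh is also used because it is zero-centered: its outputs are symmetric around zero, so the mean of the output values tends to be near zero. This property, along with a steeper slope than the sigmoid around zero, helps when dealing with the vanishing gradient problem, although tanh still saturates for large inputs.</p>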
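<p>A short plain-Python sketch makes the relationship to the sigmoid concrete: tanh is a stretched and shifted sigmoid, <code>tanh(x) = 2 * sigmoid(2x) - 1</code>, which is why it has the same S shape but is centered on zero. The standard library's <code>math.tanh</code> serves as the reference; <code>tanh_via_sigmoid</code> is an illustrative helper:</p>
<pre><code class="lang-python">import math

def sigmoid(x: float) -> float:
    """Numerically stable sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def tanh_via_sigmoid(x: float) -> float:
    """tanh rebuilt from the sigmoid: tanh(x) = 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    print(f"x = {x:+.1f}   tanh = {math.tanh(x):+.4f}   via sigmoid = {tanh_via_sigmoid(x):+.4f}")
</code></pre>
<p>Negative inputs give negative outputs and positive inputs give positive ones, so the outputs are symmetric around zero.</p>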
<h3 id="heading-leaky-relu">Leaky ReLU</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/11-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Leaky ReLU activation function</em></p>
<p>Instead of <strong>ignoring</strong> negative values by setting them to zero, the Leaky ReLU activation function multiplies them by a small slope (PyTorch's <code>nn.LeakyReLU</code> uses 0.01 by default), so they become small negative values.</p>
<p>This way, negative values are also used when training neural networks.</p>
<p>With the ReLU activation function, neurons with negative values output zero and receive zero gradient, so they are inactive and do not contribute to the learning process.</p>
<p>With the Leaky ReLU activation function, neurons with negative values still output a small value and receive a small gradient, so they stay active and contribute to the learning process.</p>
<p>This decision-making process is implemented by activation functions. Without them, a neuron would simply pass its data on to the next neuron (just like a player mindlessly kicking the ball).</p>
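<p>Here is a plain-Python sketch comparing ReLU and Leaky ReLU, together with the slope each one passes back during training (the function names are illustrative). The default <code>negative_slope</code> of 0.01 matches the default of PyTorch's <code>nn.LeakyReLU</code>, and the slope value at exactly 0 is a convention:</p>
<pre><code class="lang-python">def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    # Zero slope for negative inputs: no learning signal flows back through the neuron
    return 1.0 if x > 0 else 0.0

def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    # Negative inputs are scaled down instead of being set to zero
    return x if x > 0 else negative_slope * x

def leaky_relu_grad(x: float, negative_slope: float = 0.01) -> float:
    # A small but non-zero slope keeps a learning signal flowing for negative inputs
    return 1.0 if x > 0 else negative_slope

for x in [-2.0, 3.0]:
    print(f"x = {x:+.1f}  relu = {relu(x):+.2f} (slope {relu_grad(x)})  "
          f"leaky_relu = {leaky_relu(x):+.2f} (slope {leaky_relu_grad(x)})")
</code></pre>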
<h3 id="heading-mathematical-explanation-of-activation-functions">Mathematical Explanation of Activation Functions</h3>
<p>Neurons do two things:</p>
<ul>
<li>They apply a linear transformation to the values from the previous neurons, using their weights (plus a bias).</li>
<li>They apply an activation function to filter those values and selectively pass them on.</li>
</ul>
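<p>The two steps can be sketched for a single neuron in plain Python. The weights, bias, and inputs below are made-up numbers, the function names are illustrative, and ReLU stands in for whichever activation function the network uses:</p>
<pre><code class="lang-python">def relu(x: float) -> float:
    return max(0.0, x)

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    # Step 1: linear transformation (weighted sum of the previous neurons' values plus a bias)
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 2: the activation function decides what gets passed on
    return relu(z)

print(neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], 0.1))  # weighted sum is negative, so 0.0
print(neuron([1.0, 2.0, 3.0], [0.5, 1.0, 0.25], 0.1))   # weighted sum (about 3.35) passes through
</code></pre>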
<p>Without activation functions, the neural network just does one thing: <strong>Linear transformations.</strong></p>
<p>If it <strong>only</strong> does linear transformations, it is a <strong>linear system</strong>.</p>
<p>In very simple terms, a linear system obeys the <a target="_blank" href="https://www.allaboutcircuits.com/textbook/direct-current/chpt-10/superposition-theorem/">superposition theorem</a>, and any chain of two or more linear transformations can be simplified into one single linear transformation.</p>
<p>Essentially, it means that, without activation functions, this complex neural network:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/12-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Long neural network without activation functions</em></p>
<p>Is the same as this simple one:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/13-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Short neural network without activation functions</em></p>
<p>This is because, in matrix form, each layer multiplies the previous layer's output by a weight matrix (and adds a bias), so a chain of layers is a product of linear transformations, which is itself one linear transformation.</p>
<p>And since any chain of two or more linear transformations can be simplified into one single transformation, any stack of hidden layers (that is, the layers between the input layer and the output layer) in a neural network without activation functions can be collapsed into a single layer.</p>
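<p>A quick numerical check of this claim, as a plain-Python sketch with small made-up integer matrices (the helpers <code>matmul</code>, <code>matvec</code>, and <code>add</code> are illustrative): two layers with no activation function, <code>W2 (W1 x + b1) + b2</code>, give exactly the same result as one layer with weights <code>W2 W1</code> and bias <code>W2 b1 + b2</code>. Strictly, a layer with a bias is an affine transformation, but the same collapsing argument applies.</p>
<pre><code class="lang-python">def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(a, v):
    """Multiply a matrix by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def add(u, v):
    """Add two vectors element-wise."""
    return [x + y for x, y in zip(u, v)]

# Two layers with NO activation function in between
W1 = [[1, 2], [3, -1], [0, 4]]  # 2 inputs to 3 hidden units
b1 = [1, 0, -2]
W2 = [[2, -1, 1]]               # 3 hidden units to 1 output
b2 = [5]

x = [3, -2]
two_layers = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The same mapping collapsed into ONE layer
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)

print(two_layers, one_layer)  # [-16] [-16]
</code></pre>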
<p><strong>What does this all mean?</strong></p>
<p>It means that the network can only model linear relationships in the data. But most real-life systems, and the data they produce, are non-linear. So we need activation functions.</p>
<p>We introduce non-linearity into a neural network so that it learns non-linear patterns.</p>
<h2 id="pytorch">PyTorch Activation Function Code Example </h2>

<p>In this section, we are going to train the neural network below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/nn-1.svg" alt="Image" width="600" height="400" loading="lazy">
<em>Simple feed forward neural network</em></p>
<p>This is a simple neural network AI model with an input layer followed by four fully connected (linear) layers:</p>
<ul>
<li>Input layer with 10 features</li>
<li>Two hidden layers with 18 neurons each</li>
<li>One hidden layer with 4 neurons</li>
<li>One output layer with 1 neuron</li>
</ul>
<p>In the code, we can choose any of the four activation functions mentioned in this tutorial. </p>
<p>Here is the full code. We'll discuss it in detail below:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> torch.optim <span class="hljs-keyword">as</span> optim

<span class="hljs-comment"># Choose which activation function to use</span>
defined_activation_function = <span class="hljs-string">'relu'</span>

activation_functions = {
    <span class="hljs-string">'relu'</span>: nn.ReLU(),
    <span class="hljs-string">'sigmoid'</span>: nn.Sigmoid(),
    <span class="hljs-string">'tanh'</span>: nn.Tanh(),
    <span class="hljs-string">'leaky_relu'</span>: nn.LeakyReLU()
}

<span class="hljs-comment"># Initializing hyperparameters</span>
num_samples = <span class="hljs-number">100</span>
batch_size = <span class="hljs-number">10</span>
num_epochs = <span class="hljs-number">150</span>
learning_rate = <span class="hljs-number">0.001</span>

<span class="hljs-comment"># Define a simple synthetic dataset</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_data</span>(<span class="hljs-params">num_samples</span>):</span>
    X = torch.randn(num_samples, <span class="hljs-number">10</span>)
    y = torch.randn(num_samples, <span class="hljs-number">1</span>)
    <span class="hljs-keyword">return</span> X, y

<span class="hljs-comment"># Generate synthetic data</span>
X, y = generate_data(num_samples)

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">SimpleModel</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, activation=defined_activation_function</span>):</span>
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(in_features=<span class="hljs-number">10</span>, out_features=<span class="hljs-number">18</span>)
        self.fc2 = nn.Linear(in_features=<span class="hljs-number">18</span>, out_features=<span class="hljs-number">18</span>)
        self.fc3 = nn.Linear(in_features=<span class="hljs-number">18</span>, out_features=<span class="hljs-number">4</span>)
        self.fc4 = nn.Linear(in_features=<span class="hljs-number">4</span>, out_features=<span class="hljs-number">1</span>)
        self.activation = activation_functions[activation]

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, x</span>):</span>
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x) 
        x = self.activation(x)
        x = self.fc3(x) 
        x = self.activation(x)  
        x = self.fc4(x) 
        <span class="hljs-keyword">return</span> x

<span class="hljs-comment"># Initialize the model, define loss function and optimizer</span>
model = SimpleModel(activation=defined_activation_function)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

<span class="hljs-comment"># Training loop</span>
<span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(num_epochs):
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>, num_samples, batch_size):
        <span class="hljs-comment"># Get the mini-batch</span>
        inputs = X[i:i+batch_size]
        labels = y[i:i+batch_size]

        <span class="hljs-comment"># Zero the parameter gradients</span>
        optimizer.zero_grad()

        <span class="hljs-comment"># Forward pass</span>
        outputs = model(inputs)

        <span class="hljs-comment"># Compute the loss</span>
        loss = criterion(outputs, labels)

        <span class="hljs-comment"># Backward pass and optimize</span>
        loss.backward()
        optimizer.step()

    print(<span class="hljs-string">f'Epoch <span class="hljs-subst">{epoch+<span class="hljs-number">1</span>}</span>/<span class="hljs-subst">{num_epochs}</span>, Loss: <span class="hljs-subst">{loss.item():.4f}</span>'</span>)

print(<span class="hljs-string">"Training complete."</span>)
</code></pre>
<p>Looks like a lot, doesn't it? Don't worry – we'll take it piece by piece.</p>
<h3 id="heading-1-importing-libraries-and-defining-activation-functions">1: Importing libraries and defining activation functions</h3>
<pre><code><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> torch.optim <span class="hljs-keyword">as</span> optim

<span class="hljs-comment"># Choose which activation function to use</span>
defined_activation_function = <span class="hljs-string">'relu'</span>

activation_functions = {
    <span class="hljs-string">'relu'</span>: nn.ReLU(),
    <span class="hljs-string">'sigmoid'</span>: nn.Sigmoid(),
    <span class="hljs-string">'tanh'</span>: nn.Tanh(),
    <span class="hljs-string">'leaky_relu'</span>: nn.LeakyReLU()
}
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing libraries and defining dictionary with activation functions</em></p>
<p>In this code:</p>
<ul>
<li><code>import torch</code>: <a target="_blank" href="https://pytorch.org/docs/stable/torch.html">Imports the PyTorch library.</a></li>
<li><code>import torch.nn as nn</code>: <a target="_blank" href="https://pytorch.org/docs/stable/nn.html">Imports the neural network module from PyTorch.</a></li>
<li><code>import torch.optim as optim</code>: <a target="_blank" href="https://pytorch.org/docs/stable/optim.html">Imports the optimization module from PyTorch.</a></li>
</ul>
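<p>The <code>defined_activation_function</code> variable and the <code>activation_functions</code> dictionary let you easily choose which activation function this deep learning model uses. To switch, change <code>'relu'</code> to <code>'sigmoid'</code>, <code>'tanh'</code>, or <code>'leaky_relu'</code>.</p>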
<h3 id="heading-2-defining-hyperparameters-and-generating-a-dataset">2: Defining hyperparameters and generating a dataset</h3>
<pre><code># Initializing hyperparameters
num_samples = <span class="hljs-number">100</span>
batch_size = <span class="hljs-number">10</span>
num_epochs = <span class="hljs-number">150</span>
learning_rate = <span class="hljs-number">0.001</span>

# Define a simple synthetic dataset
def generate_data(num_samples):
    X = torch.randn(num_samples, <span class="hljs-number">10</span>)
    y = torch.randn(num_samples, <span class="hljs-number">1</span>)
    <span class="hljs-keyword">return</span> X, y

# Generate synthetic data
X, y = generate_data(num_samples)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Initializing hyperparameters and creating, with a function, a synthetic dataset</em></p>
<p>In this code:</p>
<ul>
<li><code>num_samples</code> is the number of samples in the synthetic dataset.</li>
<li><code>batch_size</code> is the size of each mini-batch during training.</li>
<li><code>num_epochs</code> is the number of full passes over the entire dataset during training.</li>
<li><code>learning_rate</code> is the step size the optimization algorithm uses when updating the weights.</li>
</ul>
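<p>We also define a <code>generate_data</code> function that creates two tensors of random values drawn from a standard normal distribution: <code>X</code> with shape <code>(num_samples, 10)</code> for the inputs and <code>y</code> with shape <code>(num_samples, 1)</code> for the targets. Calling it generates our synthetic dataset. Because <code>y</code> is generated independently of <code>X</code>, there is no real pattern to learn here; the dataset only demonstrates the mechanics of training.</p>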
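<p>With these hyperparameters, a quick plain-Python calculation (mirroring the <code>range(0, num_samples, batch_size)</code> loop used in the training step) shows how many mini-batches each epoch contains and how many weight updates training performs in total:</p>
<pre><code class="lang-python">num_samples = 100
batch_size = 10
num_epochs = 150

# Start index of every mini-batch in one epoch, as produced by the training loop's range()
batch_starts = list(range(0, num_samples, batch_size))
batches_per_epoch = len(batch_starts)

# One optimizer.step() per mini-batch, repeated for every epoch
total_updates = num_epochs * batches_per_epoch

print(batch_starts)                      # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
print(batches_per_epoch, total_updates)  # 10 1500
</code></pre>
<p>If <code>num_samples</code> were not a multiple of <code>batch_size</code>, the last slice <code>X[i:i+batch_size]</code> would simply be shorter than the others.</p>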
<h3 id="heading-3-creating-the-deep-learning-model">3: Creating the deep learning model</h3>
<pre><code><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">SimpleModel</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, activation=defined_activation_function</span>):</span>
        <span class="hljs-built_in">super</span>(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(in_features=<span class="hljs-number">10</span>, out_features=<span class="hljs-number">18</span>)
        self.fc2 = nn.Linear(in_features=<span class="hljs-number">18</span>, out_features=<span class="hljs-number">18</span>)
        self.fc3 = nn.Linear(in_features=<span class="hljs-number">18</span>, out_features=<span class="hljs-number">4</span>)
        self.fc4 = nn.Linear(in_features=<span class="hljs-number">4</span>, out_features=<span class="hljs-number">1</span>)
        self.activation = activation_functions[activation]

    def forward(self, x):
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x) 
        x = self.activation(x)
        x = self.fc3(x) 
        x = self.activation(x)  
        x = self.fc4(x) 
        <span class="hljs-keyword">return</span> x
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/3.png" alt="Image" width="600" height="400" loading="lazy">
<em>A simple feed forward neural network deep learning AI model</em></p>
<p>The <code>__init__</code> method in the <code>SimpleModel</code> class <strong>initializes</strong> the neural network architecture. It initializes four fully connected layers and defines the activation function we are going to use.</p>
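<p>We create each layer with <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.nn.Linear.html"><code>nn.Linear</code></a>, which applies a linear transformation (a weighted sum of its inputs plus a bias). The <code>forward</code> method defines how the data flows through the neural network: every linear layer except the last is followed by the activation function, and the output of the last layer is the model's prediction.</p>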
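<p>To get a feel for the size of this model, here is a small plain-Python sketch that counts its trainable parameters from the layer sizes in <code>SimpleModel</code>, assuming each <code>nn.Linear</code> layer has a weight matrix plus one bias per output (PyTorch's default, <code>bias=True</code>). The helper name is illustrative:</p>
<pre><code class="lang-python"># (in_features, out_features) for fc1..fc4, copied from SimpleModel
layer_sizes = [(10, 18), (18, 18), (18, 4), (4, 1)]

def linear_param_count(in_features: int, out_features: int) -> int:
    # Weight matrix (out_features x in_features) plus one bias per output neuron
    return in_features * out_features + out_features

counts = [linear_param_count(i, o) for i, o in layer_sizes]
print(counts, sum(counts))  # [198, 342, 76, 5] 621
</code></pre>
<p>With PyTorch installed, <code>sum(p.numel() for p in model.parameters())</code> on the model created in step 4 should report the same total.</p>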
<h3 id="heading-4-initializing-the-model-and-defining-the-loss-function-and-optimizer">4: Initializing the model and defining the loss function and optimizer</h3>
<pre><code>model = SimpleModel(activation=defined_activation_function)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/4.png" alt="Image" width="600" height="400" loading="lazy">
<em>Defining the activation function, loss function, and gradient descent variant to be used</em></p>
<p>In this code:</p>
<ol>
<li><code>model = SimpleModel(activation=defined_activation_function)</code> creates a neural network model with a specified activation function.</li>
<li><code>criterion = nn.MSELoss()</code> defines the <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html">Mean Squared Error (MSE) Loss function</a>.</li>
<li><code>optimizer = optim.Adam(model.parameters(), lr=learning_rate)</code> sets up the <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.optim.Adam.html">Adam optimizer</a> for updating the model parameters during training, with a specified learning rate.</li>
</ol>
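<p>The loss that <code>nn.MSELoss()</code> computes with its default <code>reduction='mean'</code> is the average of the squared differences between predictions and targets. Here is a plain-Python sketch of the same formula (the function name <code>mse_loss</code> is illustrative):</p>
<pre><code class="lang-python">def mse_loss(predictions: list[float], targets: list[float]) -> float:
    """Mean squared error: the average of the squared differences."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)

# Squared errors are 1, 0, 1, and 4, so their mean is 6 / 4
print(mse_loss([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]))  # 1.5
</code></pre>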
<h3 id="heading-5-training-the-deep-learning-model">5: Training the deep learning model</h3>
<pre><code><span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(num_epochs):
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>, num_samples, batch_size):
        # Get the mini-batch
        inputs = X[i:i+batch_size]
        labels = y[i:i+batch_size]

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Compute the loss
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

    print(f<span class="hljs-string">'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}'</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/5.png" alt="Image" width="600" height="400" loading="lazy">
<em>Training the model</em></p>
<ul>
<li>The outer loop, based on <code>num_epochs</code> (the number of epochs), controls how many times the entire dataset is processed.</li>
<li>The inner loop divides the dataset into mini-batches using the <code>range</code> function.</li>
</ul>
<p>In each iteration of the inner loop:</p>
<ol>
<li>We slice the current mini-batch into <code>inputs</code> and <code>labels</code>.</li>
<li>We reset the gradients from the previous mini-batch iteration with <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html"><code>optimizer.zero_grad()</code></a>. Gradients tell us how to adjust each weight to make more accurate predictions. PyTorch accumulates them by default, so this reset prevents gradient information from different mini-batches from being mixed.</li>
<li>The forward pass gets us the model predictions (<code>outputs</code>), and the loss is calculated using the specified loss function (<code>criterion</code>).</li>
<li>With <code>loss.backward()</code>, we <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html">calculate the gradients of the loss with respect to the weights</a> (backpropagation).</li>
<li>Finally, <code>optimizer.step()</code> <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html">updates the model's weights</a> based on those gradients to minimize the loss function.</li>
</ol>
<p>This is the full code to train a very simple deep learning model on a very simple dataset.</p>
<p>It does not have anything more advanced like convolutional neural networks.</p>
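<p>The five training steps can be mimicked by hand on a toy problem. This plain-Python sketch fits a single weight <code>w</code> in the made-up model <code>y = w * x</code> using mean squared error. It swaps Adam for plain gradient descent and works out the gradient by hand instead of with <code>loss.backward()</code>, so it illustrates the idea rather than reproducing what PyTorch does internally:</p>
<pre><code class="lang-python"># Made-up toy data for the one-weight model y = w * x (the "true" w is 2)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0
learning_rate = 0.01

for _ in range(200):
    # Forward pass: predictions for the whole (tiny) dataset
    preds = [w * x for x in xs]
    # Compute the loss: mean squared error
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # Backward pass: d(loss)/dw, worked out by hand with the chain rule.
    # grad is recomputed from scratch every iteration, so there is nothing to reset;
    # PyTorch instead accumulates gradients, which is why it needs optimizer.zero_grad().
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # Optimize: a plain gradient descent step (Adam also adapts the step size per parameter)
    w -= learning_rate * grad

print(f"learned w = {w:.6f}, last loss = {loss:.2e}")
</code></pre>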
<h2 id="conclusion">Conclusion: The Unsung Heroes of AI Neural Networks</h2>

<p>Activation functions are like gatekeepers. By controlling which information flows through, they help the neural network learn better.</p>
<p>Activation functions are just like people when they study, or soccer players when deciding what to do with a ball.</p>
<p>These functions give neural networks the ability to learn and predict correctly.</p>
<p>Mathematically, non-linear activation functions are what allow neural networks to approximate non-linear functions, not just linear ones. Without them, neural networks can only approximate linear functions.</p>
<p>And I leave you with this:</p>
<p>The mathematical result that a neural network with non-linear activation functions can approximate any continuous function, as closely as you want given enough neurons, is called the <a target="_blank" href="https://towardsai.net/p/deep-learning/understanding-the-universal-approximation-theorem">Universal Approximation Theorem</a>.</p>
<p>You can find the full code on GitHub here:</p>
<div class="embed-wrapper"><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Machine Learning and Neural Networks without Frameworks ]]>
                </title>
                <description>
                    <![CDATA[ A lot of machine learning courses rely on frameworks that abstract the inner workings of what's going on. If you want to become proficient, it's helpful to understand how things work under the hood. We just published a course on the freeCodeCamp.org... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-machine-learning-and-neural-networks-without-frameworks/</link>
                <guid isPermaLink="false">66b204b9f31aa965000e5868</guid>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Wed, 30 Aug 2023 16:22:01 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/08/mlscratch.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>A lot of machine learning courses rely on frameworks that abstract the inner workings of what's going on. If you want to become proficient, it's helpful to understand how things work under the hood.</p>
<p>We just published a course on the freeCodeCamp.org YouTube channel that will teach you how to use machine learning and neural networks without any frameworks.</p>
<p>Dr. Radu Mariescu-Istodor created this course. He has a PhD in computer science and creates engaging and creative software tutorials.</p>
<p>In a world where machine learning is becoming increasingly prevalent, understanding the underlying concepts and algorithms has never been more crucial. The course doesn't rely on pre-existing libraries. Instead, it takes you through building a machine learning system from scratch.</p>
<p>The course centers around a captivating project: creating a web app that learns to recognize drawings. This is Phase 2 of the course, where the focus shifts towards enhancing the accuracy of the methods developed in Phase 1 (but you can still follow along if you did not watch Phase 1). </p>
<p>Through a series of comprehensive sections, you'll explore advanced features, classification methods, data cleaning techniques, and a wide array of fundamental concepts that are essential to machine learning.</p>
<p>The course is structured to gradually build your understanding from the ground up. Here are the topic sections of this course:</p>
<ul>
<li>Introduction</li>
<li>Phase 1 Code Review</li>
<li>Data Cleaning</li>
<li>Confusion Matrix</li>
<li>Euclidean Distance Marker</li>
<li>Measuring the Elongation</li>
<li>Measuring the Roundness</li>
<li>Vector vs Raster (Pixels)</li>
<li>Neural Networks</li>
<li>Optimizing Neural Networks</li>
<li>Deep Neural Networks</li>
</ul>
<p>By unveiling the inner workings of machine learning systems, you'll not only develop a profound understanding of the subject but also hone your software development skills. </p>
<p>Watch the full course on <a target="_blank" href="https://youtu.be/3wwiOSxDAmg">the freeCodeCamp.org YouTube channel</a> (4-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/3wwiOSxDAmg" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Brain-Inspired Approach to AI – Explained for Developers ]]>
                </title>
                <description>
                    <![CDATA[ By Edem Gold "Our intelligence is what makes us human, and AI is an extension of that quality." – Yann LeCun Since the advent of Neural Networks (also known as artificial neural networks), the AI industry has enjoyed unparalleled success. Neural Net... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-brain-inspired-approach-to-ai/</link>
                <guid isPermaLink="false">66d84fc963d2055c664a1a65</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 08 May 2023 15:43:38 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/05/pexels-tara-winstead-8386365.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Edem Gold</p>
<blockquote>
<p>"Our intelligence is what makes us human, and AI is an extension of that quality." – Yann LeCun</p>
</blockquote>
<p>Since the advent of Neural Networks (also known as artificial neural networks), the AI industry has enjoyed unparalleled success. Neural Networks are the driving force behind modern AI systems, and they are modeled after the human brain.</p>
<p>Modern AI research involves creating and implementing algorithms that aim to mimic the neural processes of the human brain. The goal is to create systems that learn and act in ways similar to human beings.</p>
<p>In this article, we will attempt to understand the brain-inspired approach to building AI systems.</p>
<p>Here's what we'll cover:</p>
<ol>
<li><a class="post-section-overview" href="#heading-how-well-approach-this">How we'll approach this</a></li>
<li><a class="post-section-overview" href="#heading-the-history-of-the-brain-inspired-approach-to-ai">The history of the brain-inspired approach to AI</a></li>
<li><a class="post-section-overview" href="#heading-how-the-human-brain-works-and-how-its-related-to-ai-systems">How the human brain works and how it's related to AI systems</a></li>
<li><a class="post-section-overview" href="#heading-core-principles-behind-the-brain-inspired-approach-to-ai">Core principles behind the brain-inspired approach to AI</a></li>
<li><a class="post-section-overview" href="#heading-challenges-in-building-brain-inspired-ai-systems">Challenges in building brain-inspired AI systems</a></li>
<li><a class="post-section-overview" href="#heading-summary">Summary</a></li>
</ol>
<h2 id="heading-how-well-approach-this">How We'll Approach This</h2>
<p>This article will begin by providing background history on how researchers began to model AI to mimic the human brain and end by discussing the challenges currently being faced by researchers in attempting to imitate the human brain. Below is an in-depth description of what to expect from each section.</p>
<p>It is worth noting that while this topic is an inherently broad one, I will try to be as brief and succinct as possible to keep this engaging. I plan to treat sub-topics which have more intricate sub-branches as standalone articles. I'll also leave references at the end of the article.</p>
<p>Here's a brief outline of what we'll cover:</p>
<p><strong>History of the brain-inspired approach to AI:</strong> Here we'll discuss how scientists Norbert Wiener and Warren McCulloch brought about the convergence of neuroscience and computer science. We'll also discuss how Frank Rosenblatt's Perceptron was the first real attempt to mimic human intelligence. And we'll learn how its failure brought about ground-breaking work which would serve as the platform for Neural Networks.</p>
<p><strong>How the human brain works and how it relates to AI systems:</strong> In this section, we'll dive into the biological basis for the brain-inspired approach to AI. We will discuss the basic structure and functions of the human brain, understand its core building block, the neuron, and how they work together to process information and enable complex actions.</p>
<p><strong>The Core Principles behind the brain-inspired Approach to AI:</strong> Here we will discuss the fundamental concepts behind the brain-inspired approach to AI. We will explain how concepts such as neural networks, hierarchical processing, and plasticity work. We'll also learn how the techniques of parallel processing, distributed representations, and recurrent feedback help AI mimic the brain's functioning.</p>
<p><strong>Challenges in building AI systems modeled after the human brain:</strong> Here we will talk about the challenges and limitations inherent in attempting to build systems that mimic the human brain, such as the complexity of the brain and the lack of a unified theory of cognition. We'll also explore the ways these challenges and limitations are being addressed.</p>
<p>Let's begin!</p>
<h2 id="heading-the-history-of-the-brain-inspired-approach-to-ai">The History of the Brain-inspired Approach to AI</h2>
<p>The drive to build machines that are capable of intelligent behaviour owes much inspiration to MIT Professor <a target="_blank" href="https://en.wikipedia.org/wiki/Norbert_Wiener">Norbert Wiener</a>. Norbert Wiener was a child prodigy who could read by the age of three. He had broad knowledge of various fields such as mathematics, neurophysiology, medicine, and physics.</p>
<p>Norbert Wiener believed that the main opportunities in science lay in exploring what he termed <em>Boundary Regions</em>. These are areas of study that are not clearly within a certain discipline but rather a mixture of disciplines, like the study of medicine and engineering coming together to create the field of Medical Engineering. He was quoted as saying:</p>
<blockquote>
<p>"If the difficulty of a physiological problem is mathematical in nature, ten physiologists ignorant of mathematics will get precisely as far as one physiologist ignorant of mathematics."</p>
</blockquote>
<p>In 1934, Wiener and a couple of other academics gathered monthly to discuss papers involving boundary region science.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc008a7-d0e0-4d6f-83ed-ab2a320263e0_2048x1251.jpeg" alt="Image" width="600" height="400" loading="lazy">
<em>Norbert Wiener</em></p>
<p>He described these gatherings as "a perfect catharsis for half-baked ideas, insufficient self-criticism, exaggerated self-confidence and pomposity."</p>
<p>From these sessions and from his own personal research, Wiener learned about new research on biological nervous systems as well as about pioneering work on electronic computers.</p>
<p>His natural inclination was to blend these two fields, so a relationship between neuroscience and computer science was formed. This relationship became the cornerstone for the creation of artificial intelligence as we know it.</p>
<p>After World War II, Wiener began forming theories about intelligence in both humans and machines, and this new field was named <a target="_blank" href="https://en.wikipedia.org/wiki/Cybernetics"><em>Cybernetics</em></a>. Wiener’s foray into Cybernetics was successful in getting scientists talking about the possibility of biology fusing with engineering.</p>
<p>One of these scientists was a neurophysiologist named <a target="_blank" href="https://en.wikipedia.org/wiki/Warren_Sturgis_McCulloch">Warren McCulloch</a>. He dropped out of Haverford College and went to Yale to study philosophy and psychology. While attending a scientific conference in New York, he came across papers written by colleagues on biological feedback mechanisms.</p>
<p>The following year, in collaboration with his brilliant 18-year-old protégé named Walter Pitts, McCulloch proposed a theory about how the brain works. This theory would help foster the widespread perception that computers and brains function essentially in the same way.</p>
<p>They based their conclusions on research by McCulloch on the possibility of neurons processing Binary Numbers (computers communicate via binary numbers). This theory became the foundation for what became the first model of an artificial neural network, which was named the McCulloch-Pitts Neuron (MCP).</p>
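<p>To make the idea concrete, here is a heavily simplified McCulloch-Pitts-style unit in plain Python: binary inputs, no learned weights, and a fixed threshold (the original model also included inhibitory inputs, which this sketch leaves out; the function name is illustrative). Changing only the threshold turns the same unit into an AND gate or an OR gate:</p>
<pre><code class="lang-python">def mcp_neuron(inputs: list[int], threshold: int) -> int:
    """Simplified McCulloch-Pitts unit: fires (1) if enough binary inputs are on, else 0."""
    return 1 if sum(inputs) >= threshold else 0

# With two binary inputs, the threshold alone decides which logic gate the unit computes
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", mcp_neuron([a, b], threshold=2), "OR:", mcp_neuron([a, b], threshold=1))
</code></pre>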
<p>The MCP served as the foundation for the creation of the first-ever neural network, known as <a target="_blank" href="https://edemgold.substack.com/p/the-history-of-ai">the perceptron</a>. The Perceptron was created by psychologist <a target="_blank" href="https://en.wikipedia.org/wiki/Frank_Rosenblatt">Frank Rosenblatt</a>. Rosenblatt drew his inspiration from the synapses in the brain. He reasoned that since the human brain could process and classify information through synapses (connections between neurons), perhaps a digital computer could do the same via a neural network.</p>
<p>The Perceptron essentially scaled the MCP neuron from one artificial neuron into a network of neurons. Unfortunately, the perceptron had some technical challenges that limited its practical application. The most notable was that a single-layer perceptron can only separate data that is linearly separable. As a result, it cannot learn even simple non-linear patterns such as the XOR (exclusive OR) function.</p>
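<p>A standard way to see the perceptron's limits is the XOR function. The sketch below uses the classic Rosenblatt learning rule: after each mistake, every weight is nudged by <code>lr * error * input</code>. The function names and training settings are hypothetical. A single perceptron trained this way learns the linearly separable AND function. It can never classify all four XOR cases correctly, because no straight line separates them.</p>
<pre><code class="lang-python">def predict(w, b, x1, x2):
    """Step activation: fire (1) if the weighted sum exceeds zero."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Train a single perceptron with the classic Rosenblatt update rule."""
    w = [0.0, 0.0]  # one weight per input
    b = 0.0         # bias, which plays the role of a learnable threshold
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(w, b, x1, x2)
            # Weights change only when the prediction is wrong (error is +1 or -1).
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def accuracy(samples, w, b):
    hits = sum(predict(w, b, x1, x2) == t for (x1, x2), t in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in (("AND", AND), ("XOR", XOR)):
    w, b = train_perceptron(data)
    print(name, "accuracy:", accuracy(data, w, b))
</code></pre>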
<p>In 1969, <a target="_blank" href="https://en.wikipedia.org/wiki/Marvin_Minsky">Marvin Minsky</a> and <a target="_blank" href="https://en.wikipedia.org/wiki/Seymour_Papert">Seymour Papert</a> published a book titled <em>Perceptrons</em> that laid out the flaws of the Perceptron in detail. Because of this, research on Artificial Neural Networks stagnated until the proposal of Back Propagation by <a target="_blank" href="https://en.wikipedia.org/wiki/Paul_Werbos">Paul Werbos</a>.</p>
<p>Back Propagation aimed to solve the problem of classifying complex data, which had hindered the industrial application of Neural Networks at the time. It was inspired by synaptic plasticity: the way the brain modifies the strengths of connections between neurons and thereby improves performance.</p>
<p>Back Propagation was designed to mimic this process. It strengthens or weakens the connections between artificial neurons through a process called weight adjustment.</p>
<p>Despite the early proposal by Paul Werbos, back propagation only gained widespread adoption after researchers such as <a target="_blank" href="https://en.wikipedia.org/wiki/David_Rumelhart">David Rumelhart</a>, <a target="_blank" href="https://en.wikipedia.org/wiki/Geoffrey_Hinton">Geoffrey Hinton</a>, and <a target="_blank" href="https://en.wikipedia.org/wiki/Ronald_J._Williams">Ronald Williams</a> published papers demonstrating its effectiveness for training neural networks.</p>
<p>Back propagation paved the way for Deep Learning, which powers most of the AI systems in use today.</p>
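<p>Here is a minimal backpropagation sketch in plain Python. It assumes a tiny 2-2-1 network with sigmoid activations and a squared-error loss. All names and starting weights are made up for illustration. The backward pass uses the chain rule to work out how much each weight contributed to the error. The update step is the weight adjustment described above.</p>
<pre><code class="lang-python">import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """2-2-1 network. params = (W1 [2x2], b1 [2], w2 [2], b2)."""
    W1, b1, w2, b2 = params
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    y = sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss for every weight, computed with the chain rule."""
    w2 = params[2]
    h, y = forward(params, x)
    # Error signal at the output neuron (squared-error slope times sigmoid slope).
    delta_out = (y - target) * y * (1 - y)
    grad_w2 = [delta_out * h[j] for j in range(2)]
    # Send the error backwards through w2 to each hidden neuron.
    delta_h = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(2)]
    grad_W1 = [[delta_h[j] * x[i] for i in range(2)] for j in range(2)]
    return grad_W1, delta_h, grad_w2, delta_out


def update(params, grads, lr):
    """Weight adjustment: move every parameter a small step against its gradient."""
    (W1, b1, w2, b2), (gW1, gb1, gw2, gb2) = params, grads
    W1 = [[W1[j][i] - lr * gW1[j][i] for i in range(2)] for j in range(2)]
    b1 = [b1[j] - lr * gb1[j] for j in range(2)]
    w2 = [w2[j] - lr * gw2[j] for j in range(2)]
    return W1, b1, w2, b2 - lr * gb2


params = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], [0.7, -0.6], 0.05)
x, target = (1.0, 0.0), 1.0
before = loss(params, x, target)
params = update(params, backprop(params, x, target), lr=0.5)
print("loss before:", round(before, 4), "after:", round(loss(params, x, target), 4))
</code></pre>
<p>After one update, the printed loss is lower than before. Deep learning frameworks compute the same gradients automatically and repeat the update over many examples.</p>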
<blockquote>
<p>"People are smarter than today's computers because the brain employs a basic computational architecture that is more suited to deal with a central aspect of the natural information processing tasks that people are so good at." – Parallel Distributed Processing</p>
</blockquote>
<h2 id="heading-how-the-human-brain-works-and-how-its-related-to-ai-systems">How the Human Brain Works and How it's Related to AI Systems</h2>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8360703d-bbe7-4637-ba4a-5e898d5e3110_602x376.png" alt="Image" width="600" height="400" loading="lazy">
<em>Illustration of how the brain's cells process information</em></p>
<p>We have discussed how researchers began to model AI to mimic the human brain. Now let's look at how the brain works and define the relationship between the brain and AI systems.</p>
<h3 id="heading-how-the-brain-works-a-simplified-description">How the brain works – a simplified description</h3>
<p>The human brain processes thoughts using neurons. A neuron is made up of three core parts: the dendrites, the soma (cell body), and the axon.</p>
<p>The dendrites receive signals from other neurons. The soma processes the information received from the dendrites. The axon then transfers the processed information to the dendrites of the next neuron in the sequence.</p>
<p>To grasp how the brain processes thought, imagine you see a car coming towards you. Your eyes immediately send electrical signals to your brain through the optic nerve. A chain of neurons in the brain then works to make sense of the incoming signal.</p>
<p>So the first neuron in the chain collects the signal through its <strong>dendrites</strong> and sends it to the <strong>soma</strong> to process the signal. After the soma finishes with its task, it sends the signal to the <strong>axon</strong> which then sends it to the dendrite of the next neuron in the chain. </p>
<p>The junction where one neuron's axon passes information to the next neuron's dendrites is called a synapse. The process continues as neurons integrate <strong>spatiotemporal synaptic input</strong> (signals arriving at many synapses over time) until the brain arrives at a suitable response to the signal. The brain then sends signals to the necessary effectors, in this case the muscles in your legs, so that you run away from the oncoming car.</p>
<h3 id="heading-the-relationship-between-the-brain-and-ai-systems">The relationship between the brain and AI systems</h3>
<p>The relationship between the brain and AI is largely mutually beneficial. The brain is the main source of inspiration behind the design of AI systems. In turn, advances in AI lead to a better understanding of the brain and how it works.</p>
<p>There is a reciprocal exchange of knowledge and ideas when it comes to the brain and AI. There are several examples that attest to the positively symbiotic nature of this relationship:</p>
<ul>
<li><strong>Neural Networks:</strong> Arguably the most significant impact the human brain has had on the field of Artificial Intelligence is the creation of Neural Networks. In essence, Neural Networks are computational models that mimic the function and structure of biological neurons. The architecture of neural networks and their learning algorithms are largely inspired by the way neurons in the brain interact and adapt.</li>
<li><strong>Brain Simulations:</strong> AI systems have been used to <a target="_blank" href="https://www.frontiersin.org/articles/10.3389/fncom.2020.00016/full">simulate</a> the human brain and study its interactions with the physical world. For example, researchers have used Machine Learning techniques to simulate the activity of biological neurons involved in visual processing. The results have provided insight into how the brain handles visual information.</li>
<li><strong>Insights into the brain:</strong> Researchers have begun using Machine Learning algorithms to analyse brain data, such as fMRI scans, and gain insights from it. These algorithms can identify patterns and relationships that would otherwise have remained hidden. The findings can help us understand cognitive functions, memory, and decision-making. They may also help in the diagnosis and treatment of neurological diseases such as Alzheimer's.</li>
</ul>
<h2 id="heading-core-principles-behind-the-brain-inspired-approach-to-ai">Core Principles Behind the Brain-inspired Approach to AI</h2>
<p>Here we will discuss several concepts which help AI imitate the way the human brain functions. These concepts have helped AI researchers create more powerful and intelligent systems which are capable of performing complex tasks.</p>
<h3 id="heading-neural-networks">Neural Networks</h3>
<p>As discussed earlier, neural networks have arguably derived the most significant inspiration from the human brain and have made the biggest impact on the field of Artificial Intelligence. </p>
<p>In essence, Neural Networks are computational models that mimic the function and structure of biological neurons. The networks are made up of various layers of interconnected nodes, called artificial neurons, which aid in the processing and transmitting of information. This is similar to what is done by dendrites, somas, and axons in biological neural networks. </p>
<p>Neural Networks are designed to learn from past experience (training data), loosely mirroring the way the brain learns.</p>
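<p>As a rough illustration of layers of interconnected nodes, here is a plain-Python sketch of a forward pass through a tiny network. The weights are hand-picked assumptions. The comments map each step onto the dendrite, soma, and axon analogy only loosely.</p>
<pre><code class="lang-python">import math


def neuron(inputs, weights, bias):
    """One artificial neuron."""
    # Like dendrites: receive incoming signals, each scaled by a connection weight.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Like the soma: combine the signals and decide how strongly to "fire" (sigmoid).
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    # Like axons: every neuron's output is passed on to all neurons in the next layer.
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def network(x):
    # Hypothetical hand-picked weights: 3 inputs -> 2 hidden neurons -> 1 output.
    hidden = layer(x, [[0.2, -0.5, 0.1], [0.7, 0.3, -0.2]], [0.0, 0.1])
    return layer(hidden, [[1.5, -1.0]], [0.2])[0]


print(network([1.0, 0.5, -1.0]))
</code></pre>
<p>Training would adjust these weights from data instead of fixing them by hand.</p>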
<h3 id="heading-distributed-representations">Distributed Representations</h3>
<p>Distributed representations are a way of encoding concepts or ideas in a neural network as a pattern of activity spread across several nodes, rather than in a single dedicated node.</p>
<p>For example, the concept of smoking could be represented (encoded) using a certain set of nodes in a neural network. If the network comes across an image of a person smoking, it uses those nodes to make sense of the image. (It's a lot more complex than that, but for the sake of simplicity we'll leave it at that.)</p>
<p>This technique helps AI systems remember complex concepts or relationships between concepts the same way the brain recognizes and remembers complex stimuli.</p>
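<p>Here is a minimal sketch of the idea, using made-up 6-unit activity patterns for three concepts. Related concepts share overlapping patterns. Cosine similarity measures how much two patterns overlap.</p>
<pre><code class="lang-python">import math

# Hypothetical activity patterns over the same six units. Every unit takes part in
# several concepts, and no single unit "means" smoking on its own.
concepts = {
    "smoking":   [0.9, 0.1, 0.8, 0.0, 0.3, 0.7],
    "cigarette": [0.8, 0.2, 0.9, 0.1, 0.2, 0.6],
    "bicycle":   [0.0, 0.9, 0.1, 0.8, 0.7, 0.1],
}


def cosine(a, b):
    """Similarity of two patterns: 1.0 means they point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(name):
    others = [c for c in concepts if c != name]
    return max(others, key=lambda c: cosine(concepts[name], concepts[c]))


print(most_similar("smoking"))  # cigarette
</code></pre>
<p>With one-hot codes (one dedicated node per concept), every pair of different concepts would have a cosine similarity of 0. The representation itself could not express that "smoking" and "cigarette" are related.</p>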
<h3 id="heading-recurrent-feedback">Recurrent Feedback</h3>
<p>In this technique, the output (or internal state) of a neural network at one step is fed back as input at the next step. This lets the network combine its previous outputs with new input data. It is similar to how the brain uses feedback loops to adjust its model based on previous experiences.</p>
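<p>Here is a minimal sketch of recurrent feedback, assuming a single recurrent unit with a tanh activation and hand-picked weights (<code>w_x</code> and <code>w_h</code> are illustrative). The state produced at one step is fed back in at the next. As a result, an input keeps influencing later outputs even after it is gone.</p>
<pre><code class="lang-python">import math


def rnn_step(x, h_prev, w_x=0.8, w_h=0.5, b=0.0):
    """One step of a minimal recurrent unit: the previous state is an extra input."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run(sequence):
    h = 0.0  # state before any input has been seen
    states = []
    for x in sequence:
        h = rnn_step(x, h)  # this step's output is fed back in at the next step
        states.append(h)
    return states


print(run([1.0, 0.0, 0.0]))
</code></pre>
<p>With the input sequence <code>[1.0, 0.0, 0.0]</code>, the state stays positive after the input drops to zero and fades a little at each step.</p>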
<h3 id="heading-parallel-processing">Parallel Processing</h3>
<p>Parallel processing involves breaking a complex computational task into smaller pieces and processing those pieces simultaneously on separate processors to improve speed. This approach enables AI systems to process more input data faster, similar to how the brain is able to perform different tasks at the same time (multi-tasking).</p>
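<p>The split-process-combine pattern can be sketched with Python's standard <code>concurrent.futures</code> module. The task (a sum of squares) is only a stand-in for real work. A <code>ThreadPoolExecutor</code> keeps the example portable. However, CPython's global interpreter lock prevents threads from running pure-Python code in parallel. CPU-bound work like this would therefore usually use <code>ProcessPoolExecutor</code>, which has the same interface, to get a real speedup.</p>
<pre><code class="lang-python">from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Split data into at most n_parts contiguous slices of roughly equal size."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(part):
    return sum(x * x for x in part)


def parallel_sum_of_squares(data, workers=4):
    parts = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each slice goes to a worker; the partial results are combined at the end.
        return sum(pool.map(sum_of_squares, parts))


print(parallel_sum_of_squares(list(range(1_000))))
</code></pre>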
<h3 id="heading-attention-mechanisms">Attention Mechanisms</h3>
<p>Attention is a technique that enables AI models to focus on the most relevant parts of their input data. It is commonly used in fields such as Natural Language Processing, where the data (long sentences and documents, for example) is complex and cumbersome.</p>
<p>It is inspired by the brain's ability to attend to only specific parts of a largely distracting environment – like your ability to tune into and interact in one conversation out of a cacophony of conversations.</p>
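<p>To show what "focusing" means numerically, here is a minimal sketch of scaled dot-product attention for a single query. This is the variant used in Transformer models; the sketch uses plain Python and made-up vectors. Each input position gets a weight from a softmax over query-key similarity scores. The output is the weighted average of the value vectors.</p>
<pre><code class="lang-python">import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # how much focus each input position receives
    # The output is the attention-weighted average of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, out


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, out = attention([4.0, -1.0], keys, values)
print([round(w, 3) for w in weights], [round(o, 2) for o in out])
</code></pre>
<p>Here the query is most similar to the first key, so the first position receives the largest weight and dominates the output.</p>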
<h3 id="heading-reinforcement-learning">Reinforcement Learning</h3>
<p>Reinforcement Learning is a technique used to train AI systems. It was inspired by how human beings learn skills through trial and error. An AI agent receives rewards or punishments based on its actions. This enables it to learn from its mistakes and act more effectively in the future. The technique is widely used to build game-playing agents.</p>
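<p>The reward-driven trial-and-error loop can be sketched with tabular Q-learning, one common reinforcement learning algorithm. The environment below is a hypothetical five-cell corridor. Reaching the last cell earns a reward, and every other move costs a little. The learning rate, discount, and exploration settings are illustrative assumptions.</p>
<pre><code class="lang-python">import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else -0.01  # reward at the goal, small punishment otherwise
    return nxt, reward, done


def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if epsilon > rng.random():
                action = rng.choice(ACTIONS)  # explore: try a random action
            else:
                best = max(q[(state, a)] for a in ACTIONS)  # exploit, breaking ties randomly
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the next state.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy action for each non-goal cell
</code></pre>
<p>After training, the greedy policy moves right (<code>1</code>) from every cell. The agent learned from rewards and penalties alone that this is the shortest path to the goal.</p>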
<h3 id="heading-unsupervised-learning">Unsupervised Learning</h3>
<p>The brain is constantly receiving new streams of data in the form of sounds, visual content, sensory feelings to the skin, and so on. It has to make sense of it all and attempt to form a coherent and logical understanding of how all these seemingly disparate events affect its physical state.</p>
<p>Take this analogy as an example: you feel water drop on your skin, you hear the sound of water droplets dropping quickly on rooftops, you feel your clothes getting heavy and in that instant, you know rain is falling. </p>
<p>You then search your memory bank to ascertain if you carried an umbrella. If you did, you are fine, otherwise you check to see the distance from your current location to your home. If it is close, you are fine, but otherwise you try to gauge how intense the rain is going to become. If it is a light drizzle you can attempt to continue the journey back to your home, but if it is becoming a heavier shower, then you have to find shelter.</p>
<p>Artificial Intelligence implements the ability to make sense of seemingly disparate data points (water, sound, feeling, distance) in the form of a technique called Unsupervised Learning. In this training technique, AI systems learn to make sense of raw, unstructured data without explicit labelling (no one tells you rain is falling when it is falling, do they?).</p>
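<p>k-means clustering is one of the simplest unsupervised techniques. In the plain-Python sketch below, made-up 2-D points stand in for unlabeled observations. The algorithm groups them without ever being told which group each point belongs to. Initializing with the first <code>k</code> points is a simplification; libraries typically use smarter seeding such as k-means++.</p>
<pre><code class="lang-python">import math


def kmeans(points, k, iterations=10):
    """Plain k-means: group unlabeled points by distance, no labels needed."""
    centroids = [list(p) for p in points[:k]]  # naive initialization: first k points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = [sum(c) / len(cluster) for c in zip(*cluster)]
    return centroids, clusters


points = [(1.0, 1.2), (8.0, 8.1), (0.8, 1.0), (1.1, 0.9), (7.9, 8.3), (8.2, 7.8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
</code></pre>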
<h2 id="heading-challenges-in-building-brain-inspired-ai-systems">Challenges in Building Brain-Inspired AI Systems</h2>
<p>So far, you've learned how researchers used the brain as inspiration for AI systems. We've also discussed how the brain relates to AI and the core principles behind brain-inspired AI. </p>
<p>In this section, we are going to talk about some of the technical and conceptual challenges inherent in building AI systems modeled after the human brain.</p>
<h3 id="heading-complexity">Complexity</h3>
<p>This is a pretty daunting challenge. The brain-inspired approach to AI is based on modeling the brain and building AI systems after that model. But the human brain is an inherently complex system. It has roughly 100 billion neurons and approximately 600 trillion synaptic connections, which works out to several thousand connections per neuron on average. These synapses are constantly interacting in dynamic and unpredictable ways.</p>
<p>Building AI systems that aim to mimic, and perhaps exceed, that complexity is in itself a challenge and requires equally complex statistical models.</p>
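<p>A back-of-the-envelope calculation gives a sense of that scale. It assumes that each synapse could be stored as a single 32-bit number, although real synapses have far richer dynamics. It then compares the synapse count with a hypothetical model of one billion weights.</p>
<pre><code class="lang-python">SYNAPSES = 600e12                 # figure quoted above
BYTES_PER_WEIGHT = 4              # assumption: one 32-bit float per synapse
HYPOTHETICAL_MODEL_WEIGHTS = 1e9  # illustrative model with one billion weights

storage_petabytes = SYNAPSES * BYTES_PER_WEIGHT / 1e15
ratio = SYNAPSES / HYPOTHETICAL_MODEL_WEIGHTS

print(f"About {storage_petabytes:.1f} PB just to store one number per synapse")
print(f"{ratio:,.0f} times as many connections as a one-billion-weight model")
</code></pre>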
<h3 id="heading-data-requirements-for-training-large-models">Data Requirements for Training Large Models</h3>
<p>OpenAI's GPT-4 is, at the moment, the cutting edge of text-based AI models. OpenAI has not publicly disclosed the size of its training dataset. However, its predecessor GPT-3 was already trained on roughly 570 gigabytes of filtered text. Imagine how much data GPT-5 will be trained on.</p>
<p>To get acceptable results, brain-inspired AI systems require vast amounts of data for tasks, especially auditory and visual tasks. This places a lot of emphasis on the creation of data collection pipelines. For instance, Tesla has 780 million miles of driving data and its data collection pipeline adds another million every 10 hours.</p>
<h3 id="heading-energy-efficiency">Energy Efficiency</h3>
<p>Building brain-inspired AI systems that match the brain's energy efficiency is a huge challenge. The human brain consumes approximately 20 watts of power. In comparison, Tesla's Autopilot, running on specialized chips, consumes about 2,500 watts, and <a target="_blank" href="https://ts2.space/en/exploring-the-environmental-footprint-of-gpt-4-energy-consumption-and-sustainability/#:~:text=The%20paper%20found%20that%20the,hours%20(MWh)%20of%20energy.">it takes around</a> 7.5 megawatt-hours (MWh) to train an AI model the size of ChatGPT.</p>
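<p>To put those figures side by side, here is a back-of-the-envelope calculation. It uses only the numbers quoted above and ignores everything else, such as inference, hardware, and cooling.</p>
<pre><code class="lang-python">BRAIN_WATTS = 20         # figure quoted above
TRAINING_MWH = 7.5       # figure quoted above
HOURS_PER_YEAR = 24 * 365

brain_kwh_per_year = BRAIN_WATTS * HOURS_PER_YEAR / 1000  # watt-hours to kWh
brain_years = TRAINING_MWH * 1000 / brain_kwh_per_year    # MWh to kWh, then divide

print(f"A 20 W brain uses about {brain_kwh_per_year:.1f} kWh per year")
print(f"One 7.5 MWh training run is about {brain_years:.0f} brain-years of energy")
</code></pre>
<p>By this rough measure, one training run uses about as much energy as a human brain does in roughly 43 years.</p>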
<h3 id="heading-the-explainability-problem">The Explainability Problem</h3>
<p>Developing brain-inspired AI systems that can be trusted by users is crucial to the growth and adoption of AI – but therein lies the problem. </p>
<p>The brain, which AI systems are meant to be modeled after, is essentially a black box. The inner workings of the brain are not easy to understand, partly because of a lack of information surrounding how the brain processes thought. </p>
<p>There is no lack of research on the biological structure of the human brain. There is, however, a real lack of empirical information on the functional qualities of the brain, that is, how thought is formed, how déjà vu occurs, and so on. This makes it harder to build brain-inspired AI systems.</p>
<h3 id="heading-the-interdisciplinary-requirements">The Interdisciplinary Requirements</h3>
<p>The act of building brain-inspired AI systems requires the knowledge of experts in different fields, like Neuroscience, Computer Science, Engineering, Philosophy, and Psychology. </p>
<p>But this presents challenges, both logistical and foundational. Bringing together experts from different fields is expensive. There's also the problem of knowledge conflict: it can be really difficult to get an engineer to care about the psychological effects of what they're building, not to mention the problem of colliding egos.</p>
<h2 id="heading-summary">Summary</h2>
<p>While the brain-inspired approach seems like the obvious route to building AI systems, it has its challenges. But efforts are being made to solve these problems, so we can look to the future with hope.</p>
<p>If you enjoyed this article, consider subscribing to my <a target="_blank" href="https://edemgold.substack.com">newsletter</a> to get more articles like this.</p>
<h2 id="heading-references">References</h2>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/learn/machine-learning-with-python">freeCodeCamp Machine Learning certification</a></li>
<li><a target="_blank" href="https://www.tesla.com/VehicleSafetyReport#:~:text=Because%20every%20Tesla%20is%20connected,the%20different%20ways%20accidents%20happen.">Tesla's Vehicle Safety Report</a></li>
<li><a target="_blank" href="https://arxiv.org/abs/1906.01703">Basic Neural Units of the Brain: Neurons, Synapses and Action Potential</a></li>
<li><a target="_blank" href="https://arxiv.org/pdf/2303.15935.pdf">When Brain-inspired AI meets AGI</a></li>
<li><a target="_blank" href="https://towardsdatascience.com/perceptron-the-artificial-neuron-4d8c70d5cc8d">Perceptron: The artificial Neuron (An Essential Upgrade To The McCulloch-Pitts Neuron)</a></li>
<li><a target="_blank" href="https://medium.com/towards-data-science/mcculloch-pitts-model-5fdf65ac5dd1">McCulloch-Pitts Neuron — Mankind’s First Mathematical Model Of A Biological Neuron</a></li>
<li><a target="_blank" href="https://axon.cs.byu.edu/Dan/678/papers/Recurrent/Werbos.pdf">BackPropagation through Time: What it does and How to do it</a></li>
<li><a target="_blank" href="https://edemgold.substack.com/p/the-history-of-ai">The History of AI</a></li>
<li><a target="_blank" href="https://www.frontiersin.org/articles/10.3389/fncom.2020.00016/full">BrainOS: A Novel Artificial Brain-Alike Automatic Machine Learning Framework</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Detect Objects in Images Using the YOLOv8 Neural Network ]]>
                </title>
                <description>
                    <![CDATA[ By Andrey Germanov Object detection is a computer vision task that involves identifying and locating objects in images or videos. It is an important part of many applications, such as self-driving cars, robotics, and video surveillance. Over the year... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-detect-objects-in-images-using-yolov8/</link>
                <guid isPermaLink="false">66d45ee37df3a1f32ee7f85c</guid>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 04 May 2023 18:17:42 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/05/n2auv9i8405cgnxhru40.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Andrey Germanov</p>
<p>Object detection is a computer vision task that involves identifying and locating objects in images or videos. It is an important part of many applications, such as self-driving cars, robotics, and video surveillance.</p>
<p>Over the years, many methods and algorithms have been developed to find objects in images and their positions. The best quality in performing these tasks comes from using convolutional neural networks. </p>
<p>One of the most popular neural networks for this task is YOLO, created in 2015 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi in their famous research paper "You Only Look Once: Unified, Real-Time Object Detection".</p>
<p>Since that time, there have been quite a few versions of YOLO. Recent releases can do even more than object detection. The newest release is <a target="_blank" href="https://ultralytics.com/yolov8">YOLOv8</a>, which we are going to use in this tutorial.</p>
<p>Here, I will show you the main features of this network for object detection. First, we will use a pre-trained model to detect common object classes like cats and dogs. Then, I will show how to train your own model to detect specific object types that you select, and how to prepare the data for this process. Finally, we will create a web application that uses the custom-trained model to detect objects in images directly in a web browser.</p>
<p>To follow this tutorial, you should be familiar with <a target="_blank" href="https://python.org">Python</a> and have a basic understanding of machine learning, neural networks, and their application in object detection. You can watch <a target="_blank" href="https://www.youtube.com/playlist?list=PL_IHmaMAvkVxdDOBRg2CbcJBq9SY7ZUvs">this short video course</a> to familiarize yourself with all required machine learning theory.</p>
<p>Once you've refreshed the theory, let's get started with the practice! Here's what we'll cover:</p>
<ol>
<li><a class="post-section-overview" href="#heading-problems-yolov8-can-solve">Problems YOLOv8 Can Solve</a></li>
<li><a class="post-section-overview" href="#heading-how-to-get-started-with-yolov8">How to Get Started with YOLOv8</a></li>
<li><a class="post-section-overview" href="#heading-how-to-prepare-data-to-train-the-yolov8-model">How to Prepare Data to Train the YOLOv8 Model</a></li>
<li><a class="post-section-overview" href="#heading-how-to-train-the-yolov8-model">How to Train the YOLOv8 Model</a></li>
<li><a class="post-section-overview" href="#heading-how-to-create-an-object-detection-web-service">How to Create an Object Detection Web Service</a></li>
<li><a class="post-section-overview" href="#heading-how-to-create-the-frontend">How to Create the Frontend</a></li>
<li><a class="post-section-overview" href="#heading-how-to-create-the-backend">How to Create the Backend</a></li>
<li><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></li>
</ol>
<h1 id="problems_can_solve">Problems YOLOv8 Can Solve</h1>

<p>You can use the YOLOv8 network to solve classification, object detection, and image segmentation problems. All these methods detect objects in images or in videos in different ways, as you can see in the image below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/compvision_tasks.png" alt="Image" width="600" height="400" loading="lazy">
<em>Common computer vision problems - classification, detection, and segmentation</em></p>
<p>The neural network that's created and trained for <strong>image classification</strong> determines the class of the object in the image. It returns the class name and the probability of this prediction.</p>
<p>For example, for the left image, it returned that this is a "cat" and that the confidence level of this prediction is 92% (0.92).</p>
<p>The neural network for <strong>object detection</strong> returns the object type and probability, plus the coordinates of the object in the image: x, y, width and height, as shown in the second image. Object detection neural networks can also detect several objects in the image, along with their bounding boxes.</p>
<p>Finally, in addition to object types and bounding boxes, the neural network trained for <strong>image segmentation</strong> detects the shapes of the objects, as shown in the right image.</p>
<p>Researchers have developed many different neural network architectures for these tasks, and in the past you had to use a separate network for each of them. Fortunately, things changed after <a target="_blank" href="https://docs.ultralytics.com/">YOLO</a> was created. Now you can use a single platform for all these problems.</p>
<p>In this article, we will explore <strong>object detection</strong> using YOLOv8. I will guide you through how to create a web application that will detect traffic lights and road signs in images. In later articles I will cover other features, including image segmentation.</p>
<p>In the next sections, we will go through all the steps required to create an object detector. By the end of this tutorial, you will have a complete AI-powered web application.</p>
<h1 id="get_started">How to Get Started with YOLOv8</h1>

<p>Technically speaking, <a target="_blank" href="https://ultralytics.com/">YOLOv8</a> is a group of convolutional neural network models, created and trained using the <a target="_blank" href="https://pytorch.org/">PyTorch</a> framework.</p>
<p>In addition, the YOLOv8 package provides a single Python API to work with all of them using the same methods. That is why, to use it, you need an environment to run Python code. I highly recommend using <a target="_blank" href="https://jupyter.org/">Jupyter Notebook</a>.</p>
<p>After making sure that you have Python and Jupyter installed on your computer, run the notebook and install the YOLOv8 package in it by running the following command:</p>
<pre><code class="lang-python">!pip install ultralytics
</code></pre>
<p>The <code>ultralytics</code> package has the <code>YOLO</code> class, used to create neural network models.</p>
<p>To get access to it, import it to your Python code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> ultralytics <span class="hljs-keyword">import</span> YOLO
</code></pre>
<p>Now everything is ready to create the neural network model:</p>
<pre><code class="lang-python">model = YOLO(<span class="hljs-string">"yolov8m.pt"</span>)
</code></pre>
<p>As I mentioned before, YOLOv8 is a group of neural network models. These models were created and trained using PyTorch and exported to files with the <code>.pt</code> extension. </p>
<p>There are three types of models, and five models of different sizes for each type:</p>
<table>
<tbody>
<tr>
<td>
<strong>Classification</strong>
</td>
<td>
<strong>Detection</strong>
</td>
<td>
<strong>Segmentation</strong>
</td>
<td>
<strong>Size</strong>
</td>
</tr>
<tr>
<td>
yolov8n-cls.pt
</td>
<td>
yolov8n.pt
</td>
<td>
yolov8n-seg.pt
</td>
<td>Nano</td>
</tr>
<tr>
<td>
yolov8s-cls.pt
</td>
<td>
yolov8s.pt
</td>
<td>
yolov8s-seg.pt
</td>
<td>Small</td>
</tr>
<tr>
<td>
yolov8m-cls.pt
</td>
<td>
yolov8m.pt
</td>
<td>
yolov8m-seg.pt
</td>
<td>Medium</td>
</tr>
<tr>
<td>
yolov8l-cls.pt
</td>
<td>
yolov8l.pt
</td>
<td>
yolov8l-seg.pt
</td>
<td>Large</td>
</tr>
<tr>
<td>
yolov8x-cls.pt
</td>
<td>
yolov8x.pt
</td>
<td>
yolov8x-seg.pt
</td>
<td>Huge</td>
</tr>
</tbody>
</table>
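<p>Because the filenames follow a regular pattern, you can build them from a task and a size. The helper below is a hypothetical convenience function, not part of the <code>ultralytics</code> package. It only encodes the naming scheme shown in the table.</p>
<pre><code class="lang-python"># Naming pattern read off the table above: yolov8 + size letter + task suffix + .pt
SIZES = {"nano": "n", "small": "s", "medium": "m", "large": "l", "huge": "x"}
SUFFIXES = {"classification": "-cls", "detection": "", "segmentation": "-seg"}


def weights_file(task, size):
    """Hypothetical helper: pretrained weights filename for a task and model size."""
    if task not in SUFFIXES or size not in SIZES:
        raise ValueError(f"unknown task {task!r} or size {size!r}")
    return f"yolov8{SIZES[size]}{SUFFIXES[task]}.pt"


print(weights_file("detection", "medium"))     # yolov8m.pt
print(weights_file("segmentation", "nano"))    # yolov8n-seg.pt
</code></pre>
<p>You could then pass the result to the constructor, for example <code>YOLO(weights_file("detection", "medium"))</code>.</p>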

<p>The bigger the model you choose, the better the prediction quality you can achieve, but the slower it will work. </p>
<p>In this tutorial I will cover object detection, which is why I selected "yolov8m.pt" in the previous code snippet. It is a medium-sized model for object detection.</p>
<p>When you run this code for the first time, it will download the <code>yolov8m.pt</code> file from the Ultralytics server to the current folder. Then it will construct the <code>model</code> object. Now you can train this <code>model</code>, detect objects, and export it to use in production. For all these tasks, there are convenient methods:</p>
<ul>
<li><a target="_blank" href="https://docs.ultralytics.com/modes/train/">train({path to dataset descriptor file})</a> – used to train the model on the images dataset.</li>
<li><a target="_blank" href="https://docs.ultralytics.com/modes/predict">predict({image})</a> – used to make a prediction for a specified image, for example to detect bounding boxes of all objects that the model can find in the image.</li>
<li><a target="_blank" href="https://docs.ultralytics.com/modes/export/">export({format})</a> – used to export the model from the default PyTorch format to a specified format.</li>
</ul>
<p>All YOLOv8 models for object detection come pre-trained on the <a target="_blank" href="https://cocodataset.org/">COCO dataset</a>, a huge collection of images containing objects from 80 different classes. So, if you do not have specific needs, you can just run a model as is, without additional training.</p>
<p>For example, you can download this image as "cat_dog.jpg":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/cat_dog.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>A sample image with cat and dog</em></p>
<p>and run <code>predict</code> to detect all objects in it:</p>
<pre><code class="lang-python">results = model.predict(<span class="hljs-string">"cat_dog.jpg"</span>)
</code></pre>
<p>The <code>predict</code> method accepts many different input types, including a path to a single image, an array of paths to images, the Image object of the well-known <a target="_blank" href="https://pillow.readthedocs.io/en/stable/">PIL</a> Python library, and <a target="_blank" href="https://docs.ultralytics.com/modes/predict/#sources">others</a>.</p>
<p>After running the input through the model, it returns an array of results, one for each input image. As we provided only a single image, it returns an array with a single item that you can extract like this:</p>
<pre><code class="lang-python">result = results[<span class="hljs-number">0</span>]
</code></pre>
<p>The <a target="_blank" href="https://docs.ultralytics.com/modes/predict/#working-with-results">result</a> contains detected objects and convenient properties to work with them. The most important one is the <code>boxes</code> array with information about detected bounding boxes on the image. You can determine how many objects it detected by running the <code>len</code> function:</p>
<pre><code class="lang-python">len(result.boxes)
</code></pre>
<p>When I ran this, I got "2", which means that there are two boxes detected: one for the dog and one for the cat.</p>
<p>Then you can analyze each box either in a loop or manually. Let's get the first one:</p>
<pre><code class="lang-python">box = result.boxes[<span class="hljs-number">0</span>]
</code></pre>
<p>The <a target="_blank" href="https://docs.ultralytics.com/modes/predict/#boxes">box</a> object contains the properties of the bounding box, including:</p>
<ul>
<li><code>xyxy</code> – the coordinates of the box as an array [x1,y1,x2,y2]</li>
<li><code>cls</code> – the ID of object type</li>
<li><code>conf</code> – the confidence level of the model about this object. If it's very low, like &lt; 0.5, then you can just ignore the box.</li>
</ul>
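<p>Once the values are unpacked into plain Python numbers (as shown later in this section), filtering and measuring boxes is ordinary Python. In this sketch the detections are made-up tuples of <code>(class_id, [x1, y1, x2, y2], confidence)</code>, and the helper names are hypothetical. The 0.5 threshold follows the rule of thumb above.</p>
<pre><code class="lang-python"># Made-up detections in the shape (class_id, [x1, y1, x2, y2], confidence).
detections = [
    (16, [261.19, 94.34, 460.56, 312.99], 0.95),
    (15, [140.00, 170.00, 270.00, 380.00], 0.91),
    (0, [10.00, 20.00, 30.00, 40.00], 0.21),
]


def keep_confident(dets, threshold=0.5):
    """Drop boxes the model is not confident about."""
    return [d for d in dets if d[2] >= threshold]


def box_size(xyxy):
    """Turn corner coordinates [x1, y1, x2, y2] into (width, height)."""
    x1, y1, x2, y2 = xyxy
    return x2 - x1, y2 - y1


for class_id, xyxy, conf in keep_confident(detections):
    width, height = box_size(xyxy)
    print(class_id, round(width), round(height), conf)
</code></pre>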
<p>Let's print information about the detected box:</p>
<pre><code class="lang-python">print(<span class="hljs-string">"Object type:"</span>, box.cls)
print(<span class="hljs-string">"Coordinates:"</span>, box.xyxy)
print(<span class="hljs-string">"Probability:"</span>, box.conf)
</code></pre>
<p>For the first box, you will receive the following information:</p>
<pre><code class="lang-bash">Object <span class="hljs-built_in">type</span>: tensor([16.])
Coordinates: tensor([[261.1901,  94.3429, 460.5649, 312.9910]])
Probability: tensor([0.9528])
</code></pre>
<p>As I explained above, YOLOv8 contains PyTorch models. The outputs from the PyTorch models are encoded as an array of PyTorch <a target="_blank" href="https://pytorch.org/docs/stable/tensors.html">Tensor</a> objects, so you need to extract the first item from each of these arrays:</p>
<pre><code class="lang-python">print(<span class="hljs-string">"Object type:"</span>,box.cls[<span class="hljs-number">0</span>])
print(<span class="hljs-string">"Coordinates:"</span>,box.xyxy[<span class="hljs-number">0</span>])
print(<span class="hljs-string">"Probability:"</span>,box.conf[<span class="hljs-number">0</span>])
</code></pre>
<pre><code class="lang-bash">Object <span class="hljs-built_in">type</span>: tensor(16.)
Coordinates: tensor([261.1901,  94.3429, 460.5649, 312.9910])
Probability: tensor(0.9528)
</code></pre>
<p>Now you see the data as <code>Tensor</code> objects. To unpack the actual values, use the <code>.tolist()</code> method for tensors that contain arrays and the <code>.item()</code> method for tensors that contain a single scalar value.</p>
<p>Let's extract the data to the appropriate variables:</p>
<pre><code class="lang-python">cords = box.xyxy[<span class="hljs-number">0</span>].tolist()
class_id = box.cls[<span class="hljs-number">0</span>].item()
conf = box.conf[<span class="hljs-number">0</span>].item()
print(<span class="hljs-string">"Object type:"</span>, class_id)
print(<span class="hljs-string">"Coordinates:"</span>, cords)
print(<span class="hljs-string">"Probability:"</span>, conf)
</code></pre>
<pre><code class="lang-bash">Object <span class="hljs-built_in">type</span>: 16.0
Coordinates: [261.1900634765625, 94.3428955078125, 460.5649108886719, 312.9909973144531]
Probability: 0.9528293609619141
</code></pre>
<p>Now you see the actual data. You can round the coordinates to whole pixels and the probability to two decimal places.</p>
<p>The object type is <code>16</code> here. What does this mean? Let's talk more about that. </p>
<p>All objects that the neural network can detect have numeric IDs. The pretrained YOLOv8 models were trained on the COCO dataset, which has 80 object classes with IDs from 0 to 79. The COCO classes are well known, and you can easily find the list online. In addition, the YOLOv8 result object has a convenient <code>names</code> property that returns these classes:</p>
<pre><code class="lang-python">print(result.names)
</code></pre>
<pre><code class="lang-bash">{0: <span class="hljs-string">'person'</span>,
 1: <span class="hljs-string">'bicycle'</span>,
 2: <span class="hljs-string">'car'</span>,
 3: <span class="hljs-string">'motorcycle'</span>,
 4: <span class="hljs-string">'airplane'</span>,
 5: <span class="hljs-string">'bus'</span>,
 6: <span class="hljs-string">'train'</span>,
 7: <span class="hljs-string">'truck'</span>,
 8: <span class="hljs-string">'boat'</span>,
 9: <span class="hljs-string">'traffic light'</span>,
 10: <span class="hljs-string">'fire hydrant'</span>,
 11: <span class="hljs-string">'stop sign'</span>,
 12: <span class="hljs-string">'parking meter'</span>,
 13: <span class="hljs-string">'bench'</span>,
 14: <span class="hljs-string">'bird'</span>,
 15: <span class="hljs-string">'cat'</span>,
 16: <span class="hljs-string">'dog'</span>,
 17: <span class="hljs-string">'horse'</span>,
 18: <span class="hljs-string">'sheep'</span>,
 19: <span class="hljs-string">'cow'</span>,
 20: <span class="hljs-string">'elephant'</span>,
 21: <span class="hljs-string">'bear'</span>,
 22: <span class="hljs-string">'zebra'</span>,
 23: <span class="hljs-string">'giraffe'</span>,
 24: <span class="hljs-string">'backpack'</span>,
 25: <span class="hljs-string">'umbrella'</span>,
 26: <span class="hljs-string">'handbag'</span>,
 27: <span class="hljs-string">'tie'</span>,
 28: <span class="hljs-string">'suitcase'</span>,
 29: <span class="hljs-string">'frisbee'</span>,
 30: <span class="hljs-string">'skis'</span>,
 31: <span class="hljs-string">'snowboard'</span>,
 32: <span class="hljs-string">'sports ball'</span>,
 33: <span class="hljs-string">'kite'</span>,
 34: <span class="hljs-string">'baseball bat'</span>,
 35: <span class="hljs-string">'baseball glove'</span>,
 36: <span class="hljs-string">'skateboard'</span>,
 37: <span class="hljs-string">'surfboard'</span>,
 38: <span class="hljs-string">'tennis racket'</span>,
 39: <span class="hljs-string">'bottle'</span>,
 40: <span class="hljs-string">'wine glass'</span>,
 41: <span class="hljs-string">'cup'</span>,
 42: <span class="hljs-string">'fork'</span>,
 43: <span class="hljs-string">'knife'</span>,
 44: <span class="hljs-string">'spoon'</span>,
 45: <span class="hljs-string">'bowl'</span>,
 46: <span class="hljs-string">'banana'</span>,
 47: <span class="hljs-string">'apple'</span>,
 48: <span class="hljs-string">'sandwich'</span>,
 49: <span class="hljs-string">'orange'</span>,
 50: <span class="hljs-string">'broccoli'</span>,
 51: <span class="hljs-string">'carrot'</span>,
 52: <span class="hljs-string">'hot dog'</span>,
 53: <span class="hljs-string">'pizza'</span>,
 54: <span class="hljs-string">'donut'</span>,
 55: <span class="hljs-string">'cake'</span>,
 56: <span class="hljs-string">'chair'</span>,
 57: <span class="hljs-string">'couch'</span>,
 58: <span class="hljs-string">'potted plant'</span>,
 59: <span class="hljs-string">'bed'</span>,
 60: <span class="hljs-string">'dining table'</span>,
 61: <span class="hljs-string">'toilet'</span>,
 62: <span class="hljs-string">'tv'</span>,
 63: <span class="hljs-string">'laptop'</span>,
 64: <span class="hljs-string">'mouse'</span>,
 65: <span class="hljs-string">'remote'</span>,
 66: <span class="hljs-string">'keyboard'</span>,
 67: <span class="hljs-string">'cell phone'</span>,
 68: <span class="hljs-string">'microwave'</span>,
 69: <span class="hljs-string">'oven'</span>,
 70: <span class="hljs-string">'toaster'</span>,
 71: <span class="hljs-string">'sink'</span>,
 72: <span class="hljs-string">'refrigerator'</span>,
 73: <span class="hljs-string">'book'</span>,
 74: <span class="hljs-string">'clock'</span>,
 75: <span class="hljs-string">'vase'</span>,
 76: <span class="hljs-string">'scissors'</span>,
 77: <span class="hljs-string">'teddy bear'</span>,
 78: <span class="hljs-string">'hair drier'</span>,
 79: <span class="hljs-string">'toothbrush'</span>}
</code></pre>
<p>This dictionary lists every class this model can detect. Here you can see that <code>16</code> is "dog", so this bounding box belongs to the detected dog.</p>
<p>Let's modify the output to make the results more readable:</p>
<pre><code class="lang-python">cords = box.xyxy[<span class="hljs-number">0</span>].tolist()
cords = [round(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> cords]
class_id = result.names[box.cls[<span class="hljs-number">0</span>].item()]
conf = round(box.conf[<span class="hljs-number">0</span>].item(), <span class="hljs-number">2</span>)
print(<span class="hljs-string">"Object type:"</span>, class_id)
print(<span class="hljs-string">"Coordinates:"</span>, cords)
print(<span class="hljs-string">"Probability:"</span>, conf)
</code></pre>
<p>In this code, I rounded all coordinates using a Python <a target="_blank" href="https://www.freecodecamp.org/news/list-comprehension-in-python-with-code-examples/">list comprehension</a>. Then I got the name of the detected object class from its ID using the <code>result.names</code> dictionary, and I rounded the probability to two decimal places. You should get the following output:</p>
<pre><code class="lang-bash">Object <span class="hljs-built_in">type</span>: dog
Coordinates: [261, 94, 461, 313]
Probability: 0.95
</code></pre>
<p>This data is good enough to show in the user interface. Let's now write some code to get this information for all detected boxes in a loop:</p>
<pre><code class="lang-python"><span class="hljs-keyword">for</span> box <span class="hljs-keyword">in</span> result.boxes:
  class_id = result.names[box.cls[<span class="hljs-number">0</span>].item()]
  cords = box.xyxy[<span class="hljs-number">0</span>].tolist()
  cords = [round(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> cords]
  conf = round(box.conf[<span class="hljs-number">0</span>].item(), <span class="hljs-number">2</span>)
  print(<span class="hljs-string">"Object type:"</span>, class_id)
  print(<span class="hljs-string">"Coordinates:"</span>, cords)
  print(<span class="hljs-string">"Probability:"</span>, conf)
  print(<span class="hljs-string">"---"</span>)
</code></pre>
<p>This code will do the same for each box and will output the following:</p>
<pre><code class="lang-python">Object type: dog
Coordinates: [<span class="hljs-number">261</span>, <span class="hljs-number">94</span>, <span class="hljs-number">461</span>, <span class="hljs-number">313</span>]
Probability: <span class="hljs-number">0.95</span>
---
Object type: cat
Coordinates: [<span class="hljs-number">140</span>, <span class="hljs-number">170</span>, <span class="hljs-number">256</span>, <span class="hljs-number">316</span>]
Probability: <span class="hljs-number">0.92</span>
---
</code></pre>
<p>This way you can run object detection for other images and see everything that a COCO-trained model can detect in them.</p>
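<p>If you need the results as data rather than as printed text, for example to skip low-confidence boxes as suggested earlier, you can collect them into plain Python values first. Here is a minimal sketch that assumes each detection has already been extracted into a <code>(coords, label, probability)</code> tuple, like the values printed by the loop above. The <code>filter_detections</code> helper is hypothetical (it is not part of Ultralytics), and the 0.5 threshold is only the rule of thumb mentioned earlier.</p>
<pre><code class="lang-python">from typing import List, Tuple

# Hypothetical plain-Python shape for one detection: (coords, label, probability)
Detection = Tuple[List[int], str, float]


def filter_detections(detections: List[Detection], min_conf: float = 0.5):
    """Return only the detections whose probability is at least min_conf."""
    return [d for d in detections if d[2] >= min_conf]


detections = [
    ([261, 94, 461, 313], "dog", 0.95),
    ([140, 170, 256, 316], "cat", 0.92),
    ([10, 10, 40, 40], "cat", 0.31),  # made-up low-confidence box for illustration
]
for coords, label, conf in filter_detections(detections):
    print(label, coords, conf)
</code></pre>
<p>The first two detections are the ones from the loop output above, and the third is invented to show the filtering. Ultralytics can also apply a confidence threshold during inference through the <code>conf</code> argument of <code>predict</code>; check the predict mode documentation for your version.</p>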
<p>This video shows the whole coding session of this section in Jupyter Notebook, assuming you have it <a target="_blank" href="https://jupyter.org/install">installed</a>.</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/8Q87QYlonRU" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p>Models pre-trained on well-known objects are fine for getting started. But in practice, you may need to detect specific objects to solve a concrete business problem.</p>
<p>For example, someone may need to detect specific products on supermarket shelves or find brain tumors on x-rays. This kind of data is often not available in public datasets, and no free model can detect everything.</p>
<p>So, you have to teach your own model to detect these types of objects. To do that, you need to create a dataset of annotated images for your problem and train the model on these images.</p>
<h1 id="data">How to Prepare Data to Train the YOLOv8 Model</h1>

<p>To train the model, you need to prepare annotated images and split them into training and validation datasets. </p>
<p>You'll use the training set to teach the model and the validation set to evaluate the trained model and measure its quality. A common choice is to put 80% of the images in the training set and 20% in the validation set.</p>
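<p>If your images are not split yet, the 80/20 split takes only a few lines of Python. This is a sketch that assumes you already have a list of image file names; <code>split_train_val</code> is a hypothetical helper. When you move the files, put each image's label file into the same dataset as the image.</p>
<pre><code class="lang-python">import random
from typing import List


def split_train_val(image_names: List[str], val_fraction: float = 0.2, seed: int = 42):
    """Shuffle image names reproducibly and split them into (train, val) lists."""
    names = sorted(image_names)  # sort first so the result does not depend on input order
    rng = random.Random(seed)    # a fixed seed makes the split repeatable
    rng.shuffle(names)
    n_val = round(len(names) * val_fraction)
    return names[n_val:], names[:n_val]


images = [f"img_{i:03d}.jpg" for i in range(10)]
train, val = split_train_val(images)
print(len(train), len(val))  # 8 2
</code></pre>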
<p>These are the steps that you need to follow to create each of the datasets:</p>
<ol>
<li>Decide which classes of objects you want your model to detect and assign each one a numeric ID. For example, if you want to detect only cats and dogs, you can decide that "0" is cat and "1" is dog.</li>
<li>Create a folder for your dataset and two subfolders in it: "images" and "labels".</li>
<li>Add the images to the "images" subfolder. The more images you collect, the better for training.</li>
<li>For each image, create an annotation text file in the "labels" subfolder. Annotation text files should have the same names as the image files, with the ".txt" extension. In each annotation file, add one line for every object in the corresponding image, in the following format:</li>
</ol>
<pre><code>{object_class_id} {x_center} {y_center} {width} {height}
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/bounding_box.png" alt="Image" width="600" height="400" loading="lazy">
<em>Bounding box parameters</em></p>
<p>Measuring the bounding boxes of all objects and adding them to the annotation files is the most time-consuming manual part of the machine learning process.</p>
<p>You should also normalize the coordinates to fit in a range from 0 to 1. To calculate them, you need to use the following formulas:</p>
<ul>
<li>x_center = (box_x_left+box_width/2)/image_width</li>
<li>y_center = (box_y_top+box_height/2)/image_height</li>
<li>width = box_width/image_width</li>
<li>height = box_height/image_height</li>
</ul>
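<p>The same formulas can be written as a small Python function. This is a sketch: <code>to_yolo</code> and <code>label_line</code> are hypothetical helper names, and the nine decimal places simply match the example values used later in this section.</p>
<pre><code class="lang-python">from typing import Tuple


def to_yolo(box_x_left: float, box_y_top: float, box_width: float, box_height: float,
            image_width: int, image_height: int) -> Tuple[float, float, float, float]:
    """Convert a pixel box (top-left corner plus size) to normalized YOLO values."""
    x_center = (box_x_left + box_width / 2) / image_width
    y_center = (box_y_top + box_height / 2) / image_height
    width = box_width / image_width
    height = box_height / image_height
    return x_center, y_center, width, height


def label_line(class_id: int, box: Tuple[float, float, float, float]) -> str:
    """Format one annotation line: the class ID followed by four normalized values."""
    return f"{class_id} " + " ".join(f"{value:.9f}" for value in box)


# Dog from the "cat_dog.jpg" example (class ID 1, image size 612 x 415)
print(label_line(1, to_yolo(261, 94, 200, 219, 612, 415)))
# 1 0.589869281 0.490361446 0.326797386 0.527710843
</code></pre>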
<p>For example, to add the "cat_dog.jpg" image that we used before to the dataset, copy it to the "images" folder, then measure and collect the following data about the image and its bounding boxes:</p>
<p><strong>Image:</strong></p>
<p>image_width = 612<br>image_height = 415</p>
<p><strong>Objects:</strong></p>
<table>
<tbody>
<tr>
<td><strong>Dog</strong></td>
<td><strong>Cat</strong></td>
</tr>
<tr>
<td>
box_x_left=261<br> 
box_y_top=94<br>
box_width=200<br>
box_height=219
</td>
<td>
box_x_left=140<br>
box_y_top=170<br>
box_width=116<br>
box_height=146
</td>
</tr>
</tbody>
</table>

<p>Then, create the "cat_dog.txt" file in the "labels" folder and, using the formulas above, calculate the coordinates:</p>
<p>Dog (class id=1):</p>
<p>x_center = (261+200/2)/612 = 0.589869281<br>y_center = (94+219/2)/415 = 0.490361446<br>width = 200/612 = 0.326797386<br>height = 219/415 = 0.527710843</p>
<p>Cat (class id=0):</p>
<p>x_center = (140+116/2)/612 = 0.323529412<br>y_center = (170+146/2)/415 = 0.585542169<br>width = 116/612 = 0.189542484<br>height = 146/415 = 0.351807229</p>
<p>and add the following lines to the file:</p>
<pre><code><span class="hljs-number">1</span> <span class="hljs-number">0.589869281</span> <span class="hljs-number">0.490361446</span> <span class="hljs-number">0.326797386</span> <span class="hljs-number">0.527710843</span>
<span class="hljs-number">0</span> <span class="hljs-number">0.323529412</span> <span class="hljs-number">0.585542169</span> <span class="hljs-number">0.189542484</span> <span class="hljs-number">0.351807229</span>
</code></pre><p>The first line contains a bounding box for the dog (class id=1). The second line contains a bounding box for the cat (class id=0). Of course, you can have the image with many dogs and many cats at the same time, and you can add bounding boxes for all of them.</p>
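<p>To check an annotation, you can reverse the formulas and convert a label line back to pixel corners, in the same <code>[x1, y1, x2, y2]</code> order that <code>box.xyxy</code> uses. Here is a minimal sketch, with <code>yolo_line_to_xyxy</code> as a hypothetical helper name:</p>
<pre><code class="lang-python">from typing import List, Tuple


def yolo_line_to_xyxy(line: str, image_width: int, image_height: int) -> Tuple[int, List[float]]:
    """Parse one annotation line and return (class_id, [x1, y1, x2, y2]) in pixels."""
    class_id, x_c, y_c, w, h = line.split()
    # Undo the normalization: scale the values back to pixels
    cx, bw = float(x_c) * image_width, float(w) * image_width
    cy, bh = float(y_c) * image_height, float(h) * image_height
    # Convert center plus size into the top-left and bottom-right corners
    return int(class_id), [cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2]


class_id, box = yolo_line_to_xyxy("1 0.589869281 0.490361446 0.326797386 0.527710843", 612, 415)
print(class_id, [round(v) for v in box])  # 1 [261, 94, 461, 313]
</code></pre>
<p>The rounded corners match the coordinates that the pretrained model detected for the dog earlier, which is a quick sanity check that the annotation is correct.</p>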
<p>After adding and annotating all images, the dataset is ready. You need to create two datasets and place them in different folders. The final folder structure can look like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/dataset_structure.png" alt="Image" width="600" height="400" loading="lazy">
<em>Dataset structure</em></p>
<p>As you can see, the training dataset is located in the "train" folder and the validation dataset is located in the "val" folder.</p>
<p>Finally, you need to create a dataset descriptor YAML file that points to the created datasets and describes the object classes in them. Here is a sample of this file for the data created above:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">train:</span> <span class="hljs-string">../train/images</span>
<span class="hljs-attr">val:</span> <span class="hljs-string">../val/images</span>

<span class="hljs-attr">nc:</span> <span class="hljs-number">2</span>
<span class="hljs-attr">names:</span> [<span class="hljs-string">'cat'</span>,<span class="hljs-string">'dog'</span>]
</code></pre>
<p>In the first two lines, you need to specify paths to the images of the training and the validation datasets. The paths can be either relative to the current folder or absolute. </p>
<p>Then, the <code>nc</code> line specifies the <strong>n</strong>umber of <strong>c</strong>lasses that exist in these datasets, and <code>names</code> is an array of class names in correct order. </p>
<p>The indexes of these items are the numbers you used when annotating the images, and the model returns these indexes when it detects objects with the <code>predict</code> method. So, if you used "0" for cats, "cat" must be the first item in the <code>names</code> array.</p>
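<p>Because the order of <code>names</code> must match the class IDs in the label files, it can help to generate the descriptor from a single list of class names. Here is a sketch that builds the same text as the sample above; <code>make_data_yaml</code> is a hypothetical helper, and it assumes the class names contain no single quotes.</p>
<pre><code class="lang-python">from typing import List


def make_data_yaml(train_dir: str, val_dir: str, class_names: List[str]) -> str:
    """Build a dataset descriptor where names[i] is the class annotated with ID i."""
    names = ",".join(f"'{name}'" for name in class_names)
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n"
        "\n"
        f"nc: {len(class_names)}\n"  # nc always equals the number of names
        f"names: [{names}]\n"
    )


# Class IDs used in the labels: 0 = cat, 1 = dog
yaml_text = make_data_yaml("../train/images", "../val/images", ["cat", "dog"])
print(yaml_text)
</code></pre>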
<p>This YAML file should be passed to the <code>train</code> method of the model to start the training process.</p>
<p>To make the image annotation process easier, there are a lot of programs you can use to visually annotate images for machine learning. You can search for something like "software to annotate images for machine learning" to get a list of these programs. </p>
<p>There are also many online tools that can do all this work, like <a target="_blank" href="https://roboflow.com/annotate">Roboflow Annotate</a>. With this service, you just upload your images, draw bounding boxes on them, and set a class for each bounding box. The tool then automatically creates the annotation files, splits your data into training and validation datasets, and creates a YAML descriptor file. Finally, you can export and download the annotated data as a ZIP file.</p>
<p>In the video below, I show you how to use Roboflow to create the "cats and dogs" micro-dataset.</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/sLZRfzaRBwg" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p>For real-life problems, the dataset should be much bigger. To train a good model, you should have hundreds or thousands of annotated images.</p>
<p>Also, when preparing the image dataset, try to keep it balanced: it should contain roughly the same number of objects of each class, which in this example means the same number of dogs and cats. Otherwise, the trained model may predict one class better than another.</p>
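<p>To check the balance, you can count how many objects of each class your label files contain. Here is a minimal sketch that assumes the folder layout described above; <code>count_objects_per_class</code> is a hypothetical helper, and the example writes the "cat_dog.txt" annotation into a temporary folder so that it runs on its own.</p>
<pre><code class="lang-python">import tempfile
from collections import Counter
from pathlib import Path


def count_objects_per_class(labels_dir: str) -> Counter:
    """Count annotated objects per class ID across all .txt files in a labels folder."""
    counts: Counter = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():  # skip empty lines
                counts[int(line.split()[0])] += 1  # the first value is the class ID
    return counts


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "cat_dog.txt").write_text(
        "1 0.589869281 0.490361446 0.326797386 0.527710843\n"
        "0 0.323529412 0.585542169 0.189542484 0.351807229\n"
    )
    counts = count_objects_per_class(tmp)

print(sorted(counts.items()))  # [(0, 1), (1, 1)]
</code></pre>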
<p>After the data is ready, copy it to the folder with the Python code you will use for training, and go back to your Jupyter Notebook to start the training process.</p>
<h1 id="train">How to Train the YOLOv8 Model</h1>

<p>After the data is ready, you can use it to train the model. To make things more interesting, we will not use the small "cats and dogs" dataset. Instead, we will train on another custom dataset that contains <a target="_blank" href="https://universe.roboflow.com/roboflow-100/road-signs-6ih4y">traffic lights and road signs</a>. This is a free dataset that I got from Roboflow Universe. Press "Download Dataset" and select "YOLOv8" as the format.</p>
<p>If it's not available on Roboflow when you read this, you can get it from <a target="_blank" href="https://drive.google.com/file/d/1PNktsghBqIJVgxa-34FqO3yODNJbH3B0/view?usp=sharing">my Google Drive</a>. You can use this dataset to teach YOLOv8 to detect different objects on roads, as shown in the next screenshot.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/traffic_lights.png" alt="Image" width="600" height="400" loading="lazy">
<em>Traffic lights detection demo</em></p>
<p>You can open the downloaded zip file and ensure that it's already annotated and structured using the rules described above. You can find the dataset descriptor file <code>data.yaml</code> in the archive as well.</p>
<p>If you downloaded the archive from Roboflow, it will contain the additional "test" dataset, which is not used by the training process. You can use the images from it for additional testing on your own after training.</p>
<p>Extract the archive to the folder with your Python code and execute the <code>train</code> method to start a training loop:</p>
<pre><code class="lang-python">model.train(data=<span class="hljs-string">"data.yaml"</span>, epochs=<span class="hljs-number">30</span>)
</code></pre>
<p>The <code>data</code> is the only required option. You have to pass the YAML descriptor file to it. The <code>epochs</code> option specifies the number of training cycles (100 by default). There are other <a target="_blank" href="https://docs.ultralytics.com/modes/train/#arguments">options</a> that can affect the process and quality of the trained model.</p>
<p>Each training cycle consists of two phases: a training phase and a validation phase.</p>
<p>During the training phase, the <code>train</code> method does the following:</p>
<ul>
<li>Extracts a random batch of images from the training dataset (you can set the number of images in a batch with the <code>batch</code> option).</li>
<li>Passes these images through the model and receives the resulting bounding boxes of all detected objects and their classes.</li>
<li>Passes the result to the loss function, which compares the model's output with the correct result from the annotation files for these images and calculates the amount of error.</li>
<li>Passes the result of the loss function to the <code>optimizer</code>, which adjusts the model weights in the direction that reduces the error, so the model makes fewer errors in the next cycle. In the YOLOv8 version used in this article, the default optimizer was <a target="_blank" href="https://towardsdatascience.com/stochastic-gradient-descent-clearly-explained-53d239905d31">SGD (Stochastic Gradient Descent)</a> (newer Ultralytics releases default to <code>optimizer="auto"</code>), but you can try others, like <a target="_blank" href="https://www.linkedin.com/pulse/understanding-adam-optimizer-gradient-descent-evan-dunbar/">Adam</a>, to see the difference.</li>
</ul>
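<p>Ultralytics runs all of this for you, but the core idea of "compute the loss, then step against its gradient" fits in a few lines. The toy sketch below is not YOLO's loss or optimizer: it fits a straight line <code>y = w * x + b</code> with mini-batch SGD on a mean squared error loss, using made-up points that lie exactly on <code>y = 2x + 1</code>. Each pass over the data plays the role of an epoch.</p>
<pre><code class="lang-python">import random


def sgd_fit_line(points, lr=0.05, epochs=500, batch_size=4, seed=0):
    """Fit y = w * x + b with mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    data = list(points)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # draw new random batches every epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Gradients of the batch's mean squared error with respect to w and b
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Step against the gradient, in the direction that reduces the error
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


points = [(x / 10, 2 * (x / 10) + 1) for x in range(11)]
w, b = sgd_fit_line(points)
print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
</code></pre>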
<p>During the validation phase, <code>train</code> does the following:</p>
<ul>
<li>Extracts the images from the validation dataset.</li>
<li>Passes them through the model and receives the detected bounding boxes for these images.</li>
<li>Compares the received result with true values for these images from annotation text files.</li>
<li>Calculates quality metrics of the model, such as precision, recall, and mAP, based on the difference between the actual and expected results.</li>
</ul>
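<p>To compare a detected box with the annotated one, object detection metrics usually rely on Intersection over Union (IoU): the area where the two boxes overlap divided by the area they cover together. The "50" in mAP50 means a detection of the right class counts as correct when its IoU with the true box is at least 0.5, and mAP50-95 averages the results over IoU thresholds from 0.50 to 0.95 in steps of 0.05. Here is a minimal sketch of IoU for boxes in <code>[x1, y1, x2, y2]</code> format (an illustration, not the Ultralytics implementation):</p>
<pre><code class="lang-python">from typing import Sequence


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the overlapping rectangle (empty if the boxes do not overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou([0, 0, 10, 10], [0, 0, 10, 5]))    # 0.5
print(iou([0, 0, 10, 10], [20, 20, 30, 30]))  # 0.0
</code></pre>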
<p>The progress and results of each phase for each epoch are displayed on the screen. This way you can see how the model learns and improves from epoch to epoch.</p>
<p>When you run the <code>train</code> code, you will see a similar output to the following during the training loop:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/training.png" alt="Image" width="600" height="400" loading="lazy">
<em>Training process</em></p>
<p>For each epoch it shows a summary for both the training and validation phases: lines 1 and 2 show results of the training phase and lines 3 and 4 show the results of the validation phase for each epoch.</p>
<p>The training phase includes a calculation of the amount of error in a loss function, so the most valuable metrics here are <code>box_loss</code> and <code>cls_loss</code>.</p>
<ul>
<li><code>box_loss</code> shows the amount of error in detected bounding boxes.</li>
<li><code>cls_loss</code> shows the amount of error in detected object classes.</li>
</ul>
<p>Why is the loss split into different metrics? Because the model might correctly detect the bounding box around an object but assign the wrong class to it. For example, in my experiments, it detected a dog as a horse, but the dimensions of the object were correct.</p>
<p>If the model really learns from the data, these values should decrease from epoch to epoch. In the previous screenshot, <code>box_loss</code> decreased: 0.7751, 0.7473, 0.742, and <code>cls_loss</code> decreased too: 0.702, 0.6422, 0.6211.</p>
<p>In the validation phase, it measures the quality of the model after training, using the images from the validation dataset.</p>
<p>The most valuable quality metric is mAP50-95, which is the <a target="_blank" href="https://www.v7labs.com/blog/mean-average-precision">Mean Average Precision</a>. If the model learns and improves, this value should grow from epoch to epoch. In the previous screenshot, you can see that it slowly grew: 0.788, 0.788, 0.791.</p>
<p>If after the last epoch you did not get acceptable precision, you can increase the number of epochs and run the training again. Also, you can tune other parameters like <code>batch</code>, <code>lr0</code>, <code>lrf</code> or change the <code>optimizer</code> you're using. There are no clear rules on what to do here, but there are a lot of recommendations.</p>
<p>Tuning the parameters of the training process is beyond the scope of this article. Entire books have been written about it, and you can easily find them online. In short, most of them recommend experimenting with different options and comparing the results.</p>
<p>In addition to the metrics shown during training, the process writes a lot of statistics to disk. When training starts, it creates the <code>runs/detect/train</code> subfolder in the current folder (later runs use <code>train2</code>, <code>train3</code>, and so on), and after each epoch it writes various log files to it.</p>
<p>It also saves the trained model after each epoch to the <code>runs/detect/train/weights/last.pt</code> file, and the best-performing model according to the validation metrics to the <code>runs/detect/train/weights/best.pt</code> file. So, after training finishes, you can take the <code>best.pt</code> file and use it in production.</p>
<p>You can watch this video to learn more about how the training process works. I used <a target="_blank" href="https://colab.research.google.com/">Google Colab</a>, a cloud-hosted version of Jupyter Notebook, to get access to a more powerful GPU and speed up training.</p>
<p>The video shows how to train the model for 5 epochs and download the final <code>best.pt</code> model. For real-world problems, you need to run many more epochs and be prepared to wait hours or even days for training to finish.</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/HZobbSjbAUc" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p>After it's finished, it's time to run the trained model in production. In the next section, we will create a web service to detect objects in images online in a web browser.</p>
<h1 id="detect">How to Create an Object Detection Web Service
</h1>

<p>At this point, we're finished experimenting with the model in the Jupyter Notebook. You'll need to write the next batch of code as a separate project, using any Python IDE like <a target="_blank" href="https://code.visualstudio.com/">VS Code</a> or <a target="_blank" href="https://www.jetbrains.com/pycharm/">PyCharm</a>.</p>
<p>The web service that we are going to create will have a web page with a file input field and an HTML5 canvas element. </p>
<p>When the user selects an image file using the input field, the interface will send it to the backend. Then, the backend will pass the image through the model that we created and trained and return the array of detected bounding boxes to the web page. </p>
<p>When it receives this, the frontend will draw the image on the canvas element and the detected bounding boxes on top of it. </p>
<p>The service will look and work as demonstrated in this video:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/iOIfm_5QIiw" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p>In the video, I used the model trained on 30 epochs, and it still does not detect some traffic lights. You can try to train it more to get better results. But the best way to improve the quality of a machine learning model is by adding more and more data. </p>
<p>So, as an additional exercise, you can import the dataset folder to Roboflow, add and annotate more images to it, and then use the updated data to continue training the model.</p>
<h2 id="frontend">How to Create the Frontend</h2>

<p>To start with, create a folder for a new Python project and an <code>index.html</code> file in it for the frontend web page. Here are the contents of this file:</p>
<pre><code class="lang-html"><span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">html</span> <span class="hljs-attr">lang</span>=<span class="hljs-string">"en"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">charset</span>=<span class="hljs-string">"UTF-8"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>YOLOv8 Object Detection<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">style</span>&gt;</span><span class="css">
        <span class="hljs-selector-tag">canvas</span> {
            <span class="hljs-attribute">display</span>:block;
            <span class="hljs-attribute">border</span>: <span class="hljs-number">1px</span> solid black;
            <span class="hljs-attribute">margin-top</span>:<span class="hljs-number">10px</span>;
        }
    </span><span class="hljs-tag">&lt;/<span class="hljs-name">style</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"uploadInput"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"file"</span>/&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">script</span>&gt;</span><span class="javascript">
       <span class="hljs-comment">/**
       * File input "change" event handler: uploads the selected
       * image file to backend, receives an array of
       * detected objects and draws them on top of image
       */</span>
       <span class="hljs-keyword">const</span> input = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"uploadInput"</span>);
       input.addEventListener(<span class="hljs-string">"change"</span>,<span class="hljs-keyword">async</span>(event) =&gt; {
           <span class="hljs-keyword">const</span> file = event.target.files[<span class="hljs-number">0</span>];
           <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">new</span> FormData();
           data.append(<span class="hljs-string">"image_file"</span>,file,<span class="hljs-string">"image_file"</span>);
           <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"/detect"</span>,{
               <span class="hljs-attr">method</span>:<span class="hljs-string">"post"</span>,
               <span class="hljs-attr">body</span>:data
           });
           <span class="hljs-keyword">const</span> boxes = <span class="hljs-keyword">await</span> response.json();
           draw_image_and_boxes(file,boxes);
       })

       <span class="hljs-comment">/**
       * Function draws the image from provided file
       * and bounding boxes of detected objects on
       * top of the image
       * <span class="hljs-doctag">@param </span>file Uploaded file object
       * <span class="hljs-doctag">@param </span>boxes Array of bounding boxes in format
         [[x1,y1,x2,y2,object_type,probability],...]
       */</span>
       <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">draw_image_and_boxes</span>(<span class="hljs-params">file,boxes</span>) </span>{
          <span class="hljs-keyword">const</span> img = <span class="hljs-keyword">new</span> Image()
          img.src = URL.createObjectURL(file);
          img.onload = <span class="hljs-function">() =&gt;</span> {
              <span class="hljs-keyword">const</span> canvas = <span class="hljs-built_in">document</span>.querySelector(<span class="hljs-string">"canvas"</span>);
              canvas.width = img.width;
              canvas.height = img.height;
              <span class="hljs-keyword">const</span> ctx = canvas.getContext(<span class="hljs-string">"2d"</span>);
              ctx.drawImage(img,<span class="hljs-number">0</span>,<span class="hljs-number">0</span>);
              ctx.strokeStyle = <span class="hljs-string">"#00FF00"</span>;
              ctx.lineWidth = <span class="hljs-number">3</span>;
              ctx.font = <span class="hljs-string">"18px serif"</span>;
              boxes.forEach(<span class="hljs-function">(<span class="hljs-params">[x1,y1,x2,y2,label]</span>) =&gt;</span> {
                  ctx.strokeRect(x1,y1,x2-x1,y2-y1);
                  ctx.fillStyle = <span class="hljs-string">"#00ff00"</span>;
                  <span class="hljs-keyword">const</span> width = ctx.measureText(label).width;
                  ctx.fillRect(x1,y1,width+<span class="hljs-number">10</span>,<span class="hljs-number">25</span>);
                  ctx.fillStyle = <span class="hljs-string">"#000000"</span>;
                  ctx.fillText(label,x1,y1+<span class="hljs-number">18</span>);
              });
          }
       }
  </span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>  
<span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<p>The HTML part is minimal. It consists only of a file input field with the "uploadInput" ID and a canvas element below it.</p>
<p>Then, in the JavaScript part, we define a handler for the input field's <code>change</code> event. When the user selects an image file, the handler uses <code>fetch</code> to send a POST request to the <code>/detect</code> backend endpoint (which we will create later), with this image file in the request body.</p>
<p>The backend should detect objects in this image and return a response with a <code>boxes</code> array as JSON. The handler decodes this response and passes it to the <code>draw_image_and_boxes</code> function along with the image file itself.</p>
<p>The <code>draw_image_and_boxes</code> function loads the image from the file. As soon as the image is loaded, the function draws it on the canvas. Then it draws each bounding box, with its class label, on top of the image.</p>
<p>Now let's create the backend with the <code>/detect</code> endpoint.</p>
<h2 id="backend">How to Create the Backend</h2>

<p>We'll create the backend using <a target="_blank" href="https://flask.palletsprojects.com/en/2.2.x/">Flask</a>. Flask has its own built-in web server, but the Flask documentation recommends against using it in production. So we will use the <a target="_blank" href="https://flask.palletsprojects.com/en/2.2.x/deploying/waitress/">Waitress</a> web server and run our Flask app in it.</p>
<p>We will also use the <a target="_blank" href="https://pillow.readthedocs.io/en/stable/">Pillow</a> library to read uploaded binary files as images. Make sure you have all these packages installed on your system before continuing:</p>
<pre><code class="lang-bash">pip3 install flask
pip3 install waitress
pip3 install pillow
</code></pre>
<p>The backend will be in a single file. Let's name it <code>object_detector.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> ultralytics <span class="hljs-keyword">import</span> YOLO
<span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> request, Response, Flask
<span class="hljs-keyword">from</span> waitress <span class="hljs-keyword">import</span> serve
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">import</span> json

app = Flask(__name__)

<span class="hljs-meta">@app.route("/")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">root</span>():</span>
    <span class="hljs-string">"""
    Site main page handler function.
    :return: Content of index.html file
    """</span>
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">"index.html"</span>) <span class="hljs-keyword">as</span> file:
        <span class="hljs-keyword">return</span> file.read()


<span class="hljs-meta">@app.route("/detect", methods=["POST"])</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">detect</span>():</span>
    <span class="hljs-string">"""
        Handler of /detect POST endpoint
        Receives uploaded file with a name "image_file", 
        passes it through YOLOv8 object detection 
        network and returns an array of bounding boxes.
        :return: a JSON array of objects bounding 
        boxes in format 
        [[x1,y1,x2,y2,object_type,probability],..]
    """</span>
    buf = request.files[<span class="hljs-string">"image_file"</span>]
    boxes = detect_objects_on_image(Image.open(buf.stream))
    <span class="hljs-keyword">return</span> Response(
      json.dumps(boxes),  
      mimetype=<span class="hljs-string">'application/json'</span>
    )


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">detect_objects_on_image</span>(<span class="hljs-params">buf</span>):</span>
    <span class="hljs-string">"""
    Function receives an image,
    passes it through YOLOv8 neural network
    and returns an array of detected objects
    and their bounding boxes
    :param buf: Input image as a PIL Image object
    :return: Array of bounding boxes in format 
    [[x1,y1,x2,y2,object_type,probability],..]
    """</span>
    model = YOLO(<span class="hljs-string">"best.pt"</span>)
    results = model.predict(buf)
    result = results[<span class="hljs-number">0</span>]
    output = []
    <span class="hljs-keyword">for</span> box <span class="hljs-keyword">in</span> result.boxes:
        x1, y1, x2, y2 = [
          round(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> box.xyxy[<span class="hljs-number">0</span>].tolist()
        ]
        class_id = box.cls[<span class="hljs-number">0</span>].item()
        prob = round(box.conf[<span class="hljs-number">0</span>].item(), <span class="hljs-number">2</span>)
        output.append([
          x1, y1, x2, y2, result.names[class_id], prob
        ])
    <span class="hljs-keyword">return</span> output

serve(app, host=<span class="hljs-string">'0.0.0.0'</span>, port=<span class="hljs-number">8080</span>)
</code></pre>
<p>First, we import the required libraries:</p>
<ul>
<li><a target="_blank" href="https://github.com/ultralytics/ultralytics">ultralytics</a> for the YOLOv8 model.</li>
<li><a target="_blank" href="https://flask.palletsprojects.com/en/2.2.x/">flask</a> to create a <code>Flask</code> web application, to receive <code>requests</code> from the frontend and send <code>responses</code> back to it.</li>
<li><a target="_blank" href="https://flask.palletsprojects.com/en/2.2.x/deploying/waitress/">waitress</a> to run a web server and <code>serve</code> the Flask web <code>app</code> in it.</li>
<li><a target="_blank" href="https://pillow.readthedocs.io/en/stable/">PIL</a> to load an uploaded file as an <code>Image</code> object, that required for YOLOv8.</li>
<li><a target="_blank" href="https://docs.python.org/3/library/json.html">json</a> to convert the array of bounding boxes to JSON before returning it to the frontend.</li>
</ul>
<p>Then, we define two routes:</p>
<ul>
<li><code>/</code>, which serves as the root of the web service. It just returns the content of the "index.html" file.</li>
<li><code>/detect</code>, which responds to an image upload request from the frontend. It converts the raw file to a Pillow Image object, then passes this image to the <code>detect_objects_on_image</code> function.</li>
</ul>
<p>The <code>detect_objects_on_image</code> function creates a model object based on the <code>best.pt</code> model that we trained in the previous section. Make sure that this file is in the same folder as <code>object_detector.py</code>, and run the service from that folder.</p>
<p>Then it calls the <code>predict</code> method for the image. <code>predict</code> returns a list of results, one per input image. Because we pass a single image, the function takes the first result, whose <code>boxes</code> property holds the detected bounding boxes.</p>
<p>Next, for each box it extracts the coordinates, class name, and probability the same way we did at the beginning of the tutorial, and adds this info to the output array.</p>
<p>Finally, the function returns the array of detected object coordinates and their classes.</p>
<p>After this, the array gets encoded to JSON and is returned to the frontend.</p>
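<p>You can try out the loop in <code>detect_objects_on_image</code> without a model or a GPU. The sketch below is a plain-Python mirror of it. It assumes the box coordinates, class indices, and confidences have already been pulled out of the tensors as Python numbers. The helper name <code>boxes_to_output</code> and the sample values are made up for illustration.</p>
<pre><code class="lang-python">import json


def boxes_to_output(xyxy, class_ids, confs, names):
    """Plain-Python mirror of the loop in detect_objects_on_image.

    xyxy      -- list of [x1, y1, x2, y2] float coordinates, one per box
    class_ids -- list of class indices (Tensor.item() returns them as floats)
    confs     -- list of confidence scores between 0 and 1
    names     -- dict that maps a class index to a class name, like result.names
    """
    output = []
    for (x1, y1, x2, y2), class_id, conf in zip(xyxy, class_ids, confs):
        output.append([
            round(x1), round(y1), round(x2), round(y2),  # integer pixel coordinates
            names[int(class_id)],                         # class name
            round(conf, 2),                               # probability, 2 decimals
        ])
    return output


boxes = boxes_to_output([[10.4, 20.6, 110.2, 220.7]], [9.0], [0.876], {9: "traffic light"})
print(json.dumps(boxes))  # [[10, 21, 110, 221, "traffic light", 0.88]]
</code></pre>
<p>Note that Python's <code>round</code> rounds exact halves to the nearest even number, so a coordinate of 110.5 becomes 110, not 111.</p>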
<p>The last line of code starts the web server on port 8080 that serves the <code>app</code> Flask application.</p>
<p>To run the service, execute the following command:</p>
<pre><code class="lang-bash">python3 object_detector.py
</code></pre>
<p>If everything is working properly, you can open <code>http://localhost:8080</code> in a web browser. It should show the <code>index.html</code> page. When you select an image file, the app will process it and display bounding boxes around all detected objects (or just display the image if nothing is detected in it).</p>
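<p>You can also call the endpoint from a script instead of the browser, which is handy for quick checks. The sketch below uses only the Python standard library. It sends the file in a <code>multipart/form-data</code> field named <code>image_file</code>, the same name the frontend uses and the backend reads from <code>request.files</code>. The helper names and the <code>test.jpg</code> path are hypothetical.</p>
<pre><code class="lang-python">import json
import urllib.request
import uuid


def build_multipart(field, filename, content, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def detect_file(image_path, url="http://localhost:8080/detect"):
    """POST an image to the /detect endpoint and return the decoded list of boxes."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart("image_file", "image_file", f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
</code></pre>
<p>With the service running, <code>print(detect_file("test.jpg"))</code> should print the same list of boxes that the web page draws.</p>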
<p>The web service we just created is generic: you can use it with any YOLOv8 model. At the moment, it detects traffic lights and road signs using the <code>best.pt</code> model we created. But you can change it to use another model, like the <code>yolov8m.pt</code> model we used earlier to detect cats, dogs, and the other object classes that pretrained YOLOv8 models can detect.</p>
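<p>Whichever model you choose, <code>detect_objects_on_image</code> as written calls <code>YOLO("best.pt")</code> inside the function, so the weights are loaded from disk on every request. A simple fix is to create the model once at module level. Another option, sketched below, is to cache the loader with <code>functools.lru_cache</code> so that each model file is loaded only the first time it is requested. Here <code>load_model</code> is a hypothetical stand-in that lets the sketch run without Ultralytics. In <code>object_detector.py</code>, it would return <code>YOLO(path)</code> instead.</p>
<pre><code class="lang-python">from functools import lru_cache

load_calls = 0  # how many times the expensive loader actually ran


def load_model(path):
    """Hypothetical stand-in for YOLO(path); in object_detector.py this would
    be `return YOLO(path)` (requires the ultralytics package)."""
    global load_calls
    load_calls += 1
    return {"weights": path}


@lru_cache(maxsize=None)
def get_model(path):
    """Load the model the first time a path is requested, then reuse it."""
    return load_model(path)


# Every request can call get_model("best.pt"); only the first call loads weights.
first = get_model("best.pt")
second = get_model("best.pt")
print(first is second, load_calls)  # True 1
</code></pre>
<p>Caching by path also lets you switch between <code>best.pt</code> and another model file without reloading either one on every request.</p>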
<h1 id="conclusion">Conclusion</h1>

<p>In this tutorial, I guided you through the process of creating an AI-powered web application that uses YOLOv8, a state-of-the-art convolutional neural network for object detection.</p>
<p>I showed you how to use pre-trained models and how to prepare the data to train custom models. Finally, we created a web application with a frontend and a backend that uses the custom-trained YOLOv8 model to detect traffic lights and road signs.</p>
<p>You can find the source code of this app in <a target="_blank" href="https://github.com/AndreyGermanov/yolov8_pytorch_python">this GitHub repository</a>.</p>
<p>For all these tasks, we used the high-level Ultralytics APIs that come with the YOLOv8 package by default. These APIs are built on the PyTorch framework, which is widely used to create today's neural networks.</p>
<p>This is convenient, but depending on these high-level APIs has a downside as well. If you need to run this web app in production, you have to install all of these dependencies there, including Python, PyTorch, and the other libraries.</p>
<p>To run this on a clean new server, you'll need to download and install more than 1 GB of third-party libraries! This is definitely not the best way to go.</p>
<p>Also, what if you do not have Python in your production environment? What if all your other code is written in another programming language, and you do not plan to use Python? Or what if you want to run the model on a mobile phone with Android or iOS?</p>
<p>All this is to say that the Ultralytics packages are great for experimenting, training, and preparing the models for production. But in production itself, you have to load and use the model directly rather than through those high-level APIs.</p>
<p>To do this, you need to understand how the YOLOv8 neural network works under the hood and write more code to provide input to the model and to process its output. This will make your apps faster and less resource-intensive. You will not need PyTorch installed to run your object detection model.</p>
<p>Also, you will be able to run your models without Python, using many other programming languages, including Julia, C++, Go, and Node.js on the backend, or even without a backend at all. You can run YOLOv8 models right in a browser, using only JavaScript on the frontend.</p>
<p>Want to know how? This will be the topic of my next article about YOLOv8.</p>
<p>You can find me on <a target="_blank" href="https://www.linkedin.com/in/andrey-germanov-dev/">LinkedIn</a>, <a target="_blank" href="https://twitter.com/GermanovDev">Twitter</a>, and <a target="_blank" href="https://www.facebook.com/AndreyGermanovDev">Facebook</a> to be the first to know about new articles like this one and other software development news.</p>
<p>Have fun coding and never stop learning!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Deep Learning with Julia – How to Build and Train a Model using a Neural Network ]]>
                </title>
                <description>
                    <![CDATA[ By Andrey Germanov Julia is a general purpose programming language well suited for numerical analysis and computational science. Some consider it the future of machine learning and the most natural replacement for Python in this field. In the previou... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/deep-learning-with-julia/</link>
                <guid isPermaLink="false">66d45edb230dff01669057f5</guid>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Julia ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Tue, 07 Mar 2023 21:34:07 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/03/cover-1.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Andrey Germanov</p>
<p><a target="_blank" href="https://julialang.org/">Julia</a> is a general purpose programming language well suited for numerical analysis and computational science. Some consider it the future of machine learning and the most natural replacement for Python in this field.</p>
<p>In the previous post "<a target="_blank" href="https://www.freecodecamp.org/news/machine-learning-using-julia/">Machine learning with Julia – How to Build and Deploy a Trained AI Model as a Web Service</a>" I introduced the basic machine learning features of Julia and explained why it's so good for this.</p>
<p>In this article, I want to go one step further and explore Julia's deep learning features to show how you can use it to solve computer vision tasks with neural networks.</p>
<p>Computer vision is one of the most impressive areas of artificial intelligence. It includes tasks such as image classification, text recognition, object detection, and image segmentation. Neural networks have shown the best performance in solving computer vision problems.</p>
<p>In this tutorial, I will guide you through the process of building and training a neural network to recognize handwritten digits using Julia. I will also explain how to create a website that will use the trained model to read handwritten phone numbers. </p>
<p>Here's what we'll cover:</p>
<ol>
<li><a class="post-section-overview" href="#heading-what-should-you-know-in-advance">What should you know in advance</a></li>
<li><a class="post-section-overview" href="#heading-handwritten-digits-recognition-workflow">Handwritten digits recognition workflow</a></li>
<li><a class="post-section-overview" href="#heading-how-to-collect-initial-image-data">How to collect initial image data</a></li>
<li><a class="post-section-overview" href="#heading-how-to-work-with-images-in-julia">How to work with images in Julia</a></li>
<li><a class="post-section-overview" href="#heading-how-to-prepare-the-image-data-for-machine-learning">How to prepare the image data for machine learning</a></li>
<li><a class="post-section-overview" href="#heading-how-to-create-a-machine-learning-model">How to create a machine learning model</a></li>
<li><a class="post-section-overview" href="#heading-how-to-train-the-model">How to train the model</a></li>
<li><a class="post-section-overview" href="#heading-how-to-evaluate-the-accuracy-of-the-trained-model">How to evaluate the accuracy of the trained model</a></li>
<li><a class="post-section-overview" href="#heading-how-to-create-and-train-the-convolutional-neural-network">How to create and train the convolutional neural network</a></li>
<li><a class="post-section-overview" href="#how-to-export-the-trained-model-to-a-file">How to export the trained model to a file</a></li>
<li><a class="post-section-overview" href="#heading-how-to-create-a-frontend">How to create a frontend</a></li>
<li><a class="post-section-overview" href="#heading-how-to-create-a-backend">How to create a backend</a></li>
<li><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></li>
</ol>
<h2 id="what-should-you-know-in-advance">What should you know in advance</h2>

<p>This tutorial assumes that you have basic Julia knowledge, which you can get by reading my <a target="_blank" href="https://www.freecodecamp.org/news/machine-learning-using-julia">previous article</a>. That article also includes instructions on how to install Julia and integrate it with Jupyter Notebook, which we will use to write most of the code.</p>
<p>The "handwritten digit recognition using deep learning" problem and the theory behind it are well known, so I will cover them only briefly. There are many good resources that explain how neural networks are used to solve image classification tasks. Personally, I recommend watching <a target="_blank" href="https://www.youtube.com/watch?v=aircAruvnKk">this video</a> and reading the first chapter of this great <a target="_blank" href="http://neuralnetworksanddeeplearning.com/chap1.html">online book</a>.</p>
<p>The goal of this tutorial is to show you how to implement the theory explained in those resources using Julia.</p>
<h2 id="handwritten-numbers-recognition-workflow">Handwritten digits recognition workflow</h2>

<p>To build a machine learning model, we will use the <a target="_blank" href="https://fluxml.ai/">Flux.jl</a> framework, a pure Julia library that implements the most well-known neural network types, including <a target="_blank" href="https://deepai.org/machine-learning-glossary-and-terms/feed-forward-neural-network">feed forward</a>, <a target="_blank" href="https://deepai.org/machine-learning-glossary-and-terms/convolutional-neural-network">convolutional</a>, and <a target="_blank" href="https://deepai.org/machine-learning-glossary-and-terms/recurrent-neural-network">recurrent</a> networks.</p>
<p>Recognizing handwritten digits is a supervised image classification task. To implement it, you need a labeled dataset of handwritten digits to train the machine learning model.</p>
<p>This is how the ML workflow looks:</p>
<ul>
<li>Collect the images of handwritten digits for recognition.</li>
<li>Prepare a labeled dataset for machine learning by cleaning and labeling the data.</li>
<li>Create a machine learning model to recognize handwritten digits.</li>
<li>Train the model using the training dataset.</li>
<li>Evaluate the accuracy of the trained model by feeding it with data from a testing dataset.</li>
<li>After achieving good accuracy, export the model to a file to use in applications.</li>
</ul>
<h2 id="how-to-collect-initial-image-data">How to collect initial image data</h2>

<p>The first step of any machine learning task is to collect the data that will be used for training. Usually this is the largest part of the whole process.</p>
<p>How do you collect handwritten digits for this? Well, for example, you can ask all your friends on social networks to write down the digits from 0 to 9 and save them as images. They can also ask their friends to do the same and finally send all these images to you.</p>
<p>The more data you collect, the better for machine learning.</p>
<p>Then, you could create folders named "0" to "9" and sort the images into them. You also need to bring all the images to the same format by converting them to grayscale and resizing them, so that all of them have the same size and color format.</p>
<p>Finally, you'll have a labeled collection of handwritten digits that are ready to work with.</p>
<p>Fortunately, you do not need to do all this manual work, because it was already done back in 1998. The resulting database of handwritten digits, called MNIST (Modified National Institute of Standards and Technology database), was built from samples collected by NIST, and you can download it from Kaggle and many other places. For example, you can download and extract the MNIST archive using <a target="_blank" href="https://www.kaggle.com/datasets/jidhumohan/mnist-png">this link</a>.</p>
<p>This database is already split into training and testing data in the corresponding folders. Each of these folders contains images of handwritten digits, sorted into subfolders from "0" to "9". There are 60000 images in the <code>training</code> folder and 10000 images in the <code>testing</code> folder:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/minst.png" alt="Image" width="600" height="400" loading="lazy">
<em>MNIST database images</em></p>
<p>Each file is a 28x28 grayscale image. We will use the content of the <code>training</code> folder to prepare the dataset for training the neural network model. Then we will use the content of the <code>testing</code> folder to evaluate the accuracy of the trained model. Before doing that, we need to convert this raw data into datasets.</p>
<p>To continue, start Jupyter Notebook and create a new notebook, selecting "Julia" as the language. Then, copy the <code>training</code> and <code>testing</code> folders with images to the folder in which you created the notebook.</p>
<h2 id="how-to-work-with-images-in-julia">How to work with images in Julia</h2>

<p>An image is not a natural data format for machine learning models. The models understand only numbers. That is why, to prepare the images for machine learning, you need to load them and convert them to numbers.</p>
<p>To work with images in Julia, we will use the <a target="_blank" href="https://juliaimages.org/stable/">Julia Images</a> library. With this library, you can load an image, convert it to a matrix of pixels, and apply the transformations it may need before you pass it to a machine learning model. These transformations include resizing, converting from color to grayscale, inverting, cropping, and more.</p>
<p>To start working with these functions, you need to install the <code>Images</code> package and import it to your notebook:</p>
<pre><code class="lang-julia"><span class="hljs-keyword">using</span> Pkg
Pkg.add(<span class="hljs-string">"Images"</span>)
<span class="hljs-keyword">using</span> Images
</code></pre>
<h3 id="heading-how-to-load-and-view-the-image">How to load and view the image</h3>
<p>You can use the <code>load</code> function to load the image. Let's load the first digit from our training dataset. If this file exists, the following code loads it into the <code>img</code> variable and displays the image itself:</p>
<pre><code class="lang-julia">img = load(<span class="hljs-string">"training/0/1.png"</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/image1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Loaded digit image</em></p>
<p>This is a loaded digit. Let's see the shape of the <code>img</code> variable:</p>
<pre><code class="lang-julia">size(img)
</code></pre>
<p>(28,28)</p>
<p>As you see, the <code>img</code> variable is a 2D array, or matrix, of image pixels. The first dimension of the array is the number of rows and the second dimension is the number of columns. That is why the height of the image is the first value and the width of the image is the second value.</p>
<p>Let's see the type of this variable now:</p>
<pre><code class="lang-julia">typeof(img)
</code></pre>
<p>Matrix{Gray{N0f8}} (alias for Array{Gray{Normed{UInt8, 8}}, 2})</p>
<p>It shows that this is a matrix of "Gray" objects. The <code>Gray</code> type defines a gray pixel. It means that the image that we loaded does not have color information. </p>
<p>The <code>Gray</code> data type defines the pixel by a single value: the intensity of gray in a range between 0 and 1. So 0 is completely black and 1 is completely white.</p>
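<p>The <code>N0f8</code> type in the output above (an alias for <code>Normed{UInt8, 8}</code>) stores each intensity in a single byte and interprets a raw value from 0 to 255 as <code>raw / 255</code>. For readers coming from Python, here is a small sketch of that mapping. The function names are made up for illustration.</p>
<pre><code class="lang-python">def n0f8_to_float(raw):
    """Intensity represented by a raw byte in N0f8: raw / 255 (0 is black, 255 is white)."""
    if raw not in range(256):
        raise ValueError("N0f8 stores each value in one byte (0 to 255)")
    return raw / 255


def float_to_n0f8(value):
    """Nearest raw byte for an intensity between 0 and 1."""
    return round(value * 255)


print(n0f8_to_float(0), n0f8_to_float(255))  # 0.0 1.0
print(float_to_n0f8(0.5))                    # 128
print(round(n0f8_to_float(128), 3))          # 0.502
</code></pre>
<p>This is why an intensity such as 0.5 cannot be stored exactly in an 8-bit image. The nearest representable value is 128/255, which is about 0.502.</p>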
<p>You can change a color of any pixel using the following code:</p>
<pre><code class="lang-julia">img[<span class="hljs-number">5</span>,<span class="hljs-number">5</span>] = Gray(<span class="hljs-number">0.5</span>)
</code></pre>
<p>This sets the specified pixel (which was previously black) to a mid-gray color.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/pixel_changed.png" alt="Image" width="600" height="400" loading="lazy">
<em>The image with modified pixel</em></p>
<p>If you load the full color image and request its type, it will show something like this:</p>
<p>Matrix{RGB{N0f8}} (alias for Array{RGB{Normed{UInt8, 8}}, 2})</p>
<p>In this case, each pixel has the type <code>RGB</code>, which is defined by 3 values: the intensity of <strong>R</strong>ed, the intensity of <strong>G</strong>reen, and the intensity of <strong>B</strong>lue. <code>size(img)</code> still returns <code>(28,28)</code> for a colored image of this size, because each element of the matrix is a single <code>RGB</code> value. But if you look at its color channels separately with <code>size(channelview(img))</code>, you will see a 3D array, like this:</p>
<p>(3,28,28)</p>
<p>where the first dimension is the number of color channels, the second dimension is the height, and the third dimension is the width.</p>
<p>In other words, this color image consists of three 28x28 matrices, each containing the intensities of one color.</p>
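<p>To see what this channels-first layout means, here is a rough Python sketch that splits an image, given as rows of <code>(r, g, b)</code> tuples, into one matrix per channel. It only illustrates the shape. Unlike <code>channelview</code>, which returns a view of the same memory, this function copies the values into new lists. The function name is made up for illustration.</p>
<pre><code class="lang-python">def channel_view(pixels):
    """Rearrange an image given as rows of (r, g, b) tuples (height x width)
    into three separate matrices, one per channel: shape (3, height, width)."""
    return [
        [[pixel[c] for pixel in row] for row in pixels]
        for c in range(3)
    ]


# A 2x2 image: orange and black in the first row, white and blue in the second
img = [[(1.0, 0.5, 0.0), (0.0, 0.0, 0.0)],
       [(1.0, 1.0, 1.0), (0.0, 0.0, 1.0)]]

red, green, blue = channel_view(img)
print(red)  # [[1.0, 0.0], [1.0, 0.0]]
</code></pre>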
<p>To set the color of any pixel in this image, you need to specify intensities of 3 channels in the <code>RGB</code> type constructor:</p>
<pre><code class="lang-julia">img[<span class="hljs-number">5</span>,<span class="hljs-number">5</span>] = RGB(<span class="hljs-number">1</span>,<span class="hljs-number">0.5</span>,<span class="hljs-number">0</span>)
</code></pre>
<p>This code sets the pixel color to orange.</p>
<h3 id="heading-how-to-implement-basic-image-transformations">How to implement basic image transformations</h3>
<p>Because the image is an array, you can use the array syntax to get access to any part of the image or even to individual pixels. </p>
<p>For example, you can run this to extract the first 10 rows and 20 columns of this image and write them to the new image:</p>
<pre><code class="lang-julia">img2 = img[<span class="hljs-number">1</span>:<span class="hljs-number">10</span>,<span class="hljs-number">1</span>:<span class="hljs-number">20</span>]
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/top_crop_image.png" alt="Image" width="600" height="400" loading="lazy">
<em>Part of image</em></p>
<p>You can crop the image by 5 pixels from all sides:</p>
<pre><code class="lang-julia">img3 = img[<span class="hljs-number">5</span>:<span class="hljs-number">22</span>,<span class="hljs-number">5</span>:<span class="hljs-number">22</span>]
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/cropped_image.png" alt="Image" width="600" height="400" loading="lazy">
<em>Cropped image</em></p>
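<p>Julia ranges are 1-based and include both ends. This is easy to get wrong if you come from Python, where slices are 0-based and exclude the end. On a 28-pixel axis, removing five pixels from each side means keeping pixels 6 through 23. The sketch below, with made-up helper names, converts a Julia range into the equivalent Python slice and counts how many pixels a range drops from each end:</p>
<pre><code class="lang-python">def julia_range_to_slice(first, last):
    """Julia's 1-based, inclusive range first:last as a 0-based, end-exclusive Python slice."""
    return slice(first - 1, last)


def pixels_removed(first, last, size):
    """How many pixels first:last drops from the start and from the end of an axis."""
    return first - 1, size - last


print(pixels_removed(5, 22, 28))  # (4, 6)
print(pixels_removed(6, 23, 28))  # (5, 5)

# Crop a 28x28 image (rows of pixel values) the same way as img[6:23, 6:23] in Julia.
image = [[row * 28 + col for col in range(28)] for row in range(28)]
keep = julia_range_to_slice(6, 23)
cropped = [row[keep] for row in image[keep]]
print(len(cropped), len(cropped[0]))  # 18 18
</code></pre>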
<p>You can apply different filters to the image by applying the specified function to each element of the matrix, using the <a target="_blank" href="https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting">Julia broadcasting</a> feature via "dot" syntax. </p>
<p>For example, this code applies the <code>Gray</code> function to each pixel of the image. This approach can be used to convert images from colored to grayscale:</p>
<pre><code class="lang-julia">img4 = Gray.(img)
</code></pre>
<p>Similarly, you can convert gray images to colored:</p>
<pre><code class="lang-julia">img5 = RGB.(img)
</code></pre>
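<p>To convert a color pixel to <code>Gray</code>, you have to combine three intensities into one. A common choice is a weighted sum with the ITU-R BT.601 luma weights. As far as I know, this is also what Colors.jl uses for <code>RGB</code> to <code>Gray</code> conversion, but treat that as an assumption and check the Colors.jl documentation if exact values matter. Converting the other way simply copies the gray intensity into all three channels. Here is a minimal Python sketch with made-up function names:</p>
<pre><code class="lang-python">def rgb_to_gray(r, g, b):
    """Gray intensity from red, green, and blue intensities (each between 0 and 1),
    using the BT.601 luma weights, which sum to 1."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_to_rgb(v):
    """A gray pixel as a color pixel: the same intensity in every channel."""
    return (v, v, v)


print(round(rgb_to_gray(1, 0.5, 0), 4))  # 0.5925 (the orange pixel from earlier)
print(gray_to_rgb(0.5))                  # (0.5, 0.5, 0.5)
</code></pre>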
<p>You can also apply custom functions to each pixel. For example, if you apply the following anonymous function to the gray image this way:</p>
<pre><code class="lang-julia">img6 = (x-&gt; Gray(<span class="hljs-number">1</span>)-x.val).(img)
</code></pre>
<p>it will invert the image colors by subtracting the color value of each pixel from 1. If the <code>img</code> has a white digit on a black background, then the <code>img6</code> will have a black digit on a white background:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/inverted_image.png" alt="Image" width="600" height="400" loading="lazy">
<em>Inverted image</em></p>
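<p>The dot syntax used above is shorthand for "apply this function to every element". In Python terms it behaves roughly like a nested comprehension. The following sketch performs the same inversion on a tiny image whose values are intensities between 0 and 1. The <code>broadcast</code> helper is made up for illustration, and unlike Julia's broadcasting it only handles 2D lists.</p>
<pre><code class="lang-python">def broadcast(f, image):
    """Apply f to every pixel of a 2D image, similar to f.(image) in Julia."""
    return [[f(pixel) for pixel in row] for row in image]


# A white vertical stroke (1.0) on a black background (0.0)
digit = [[0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0]]

inverted = broadcast(lambda x: 1.0 - x, digit)
print(inverted)  # [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
</code></pre>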
<p>Finally, to resize the image, you can use the <code>imresize</code> function. For example, to resize the <code>img</code> to 50x50 pixels, you can use the following code:</p>
<pre><code class="lang-julia">img6 = imresize(img,(<span class="hljs-number">50</span>,<span class="hljs-number">50</span>))
</code></pre>
<p>We will use only the features described above to prepare the images for handwritten digit recognition. But the <code>Images</code> module has many more interesting and fun things. Watch <a target="_blank" href="https://www.youtube.com/watch?v=DGojI9xcCfg">this video</a> to see some of them. Also, you can find a lot of interesting information in <a target="_blank" href="https://www.packtpub.com/product/hands-on-computer-vision-with-julia/9781788998796">this book</a>.</p>
<h3 id="heading-how-to-convert-the-image-to-numeric-matrix">How to convert the image to numeric matrix</h3>
<p>The last image preprocessing step is converting the pixels to numbers, because objects of type <code>Gray</code> or <code>RGB</code> are not suitable as input for a machine learning model.</p>
<p>You can do this in two steps. First, apply the <code>channelview</code> function to the image to get a numeric array view of the image object, and then convert the result to floating-point numbers. So, if you run this command:</p>
<pre><code class="lang-julia">data = <span class="hljs-built_in">Float32</span>.(channelview(img))
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/channelview2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Image matrix</em></p>
<p>you will get a matrix in which each value is a floating-point number that represents the intensity of the corresponding pixel. This data is ready to go to the neural network.</p>
<h2 id="how-to-prepare-the-image-data-for-machine-learning">How to prepare the image data for machine learning</h2>

<p>As I wrote in a <a target="_blank" href="https://www.freecodecamp.org/news/machine-learning-using-julia/#how-to-prepare-the-training-data-for-machine-learning">previous article</a>, the training dataset should consist of a feature matrix and a labels vector. Both should contain only numbers.</p>
<p>Let's go back to our image collections in the <code>training</code> and <code>testing</code> folders. The labels are the names of the subfolders where the images are located, and they are already numbers. The features of an image are its pixels, and each pixel is defined by its color intensity.</p>
<p>So, to create a dataset that is ready for training from the images folder, you need to read all files from all subfolders, convert them to matrices of floating-point numbers, and put them into an array.</p>
<pre><code class="lang-julia">path = <span class="hljs-string">"training"</span>
X = []
y = []
<span class="hljs-keyword">for</span> label <span class="hljs-keyword">in</span> readdir(path)
    <span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> readdir(<span class="hljs-string">"<span class="hljs-variable">$path</span>/<span class="hljs-variable">$label</span>"</span>)
        img = load(<span class="hljs-string">"<span class="hljs-variable">$path</span>/<span class="hljs-variable">$label</span>/<span class="hljs-variable">$file</span>"</span>)
        data = reshape(<span class="hljs-built_in">Float32</span>.(channelview(img)),<span class="hljs-number">28</span>,<span class="hljs-number">28</span>,<span class="hljs-number">1</span>)
        <span class="hljs-keyword">if</span> length(X) == <span class="hljs-number">0</span>
            X = data
        <span class="hljs-keyword">else</span>
            X = cat(X,data,dims=<span class="hljs-number">3</span>)
        <span class="hljs-keyword">end</span>
        push!(y,parse(<span class="hljs-built_in">Float32</span>,label))
    <span class="hljs-keyword">end</span>
<span class="hljs-keyword">end</span>
</code></pre>
<p>Ensure that the "training" and "testing" folders with the MNIST images exist in the current folder before running this program. This code will take a while to execute, because it loads 60000 images and converts them to matrices. </p>
<p>The outer loop reads the contents of the "training" folder, which contains subfolders named 0 to 9. These names will be used as labels. </p>
<p>Then, the inner loop loads every image file in each of these subfolders using the <code>load</code> function from the <code>Images</code> package. </p>
<p>Next, it converts each image to a matrix of color intensities, reshapes it to 28x28x1, and places it in the <code>data</code> variable. After that, it concatenates this matrix to <code>X</code> along the third dimension, so <code>X</code> grows into a 28x28xN array with one slice per image. </p>
<p>Finally, it parses the name of the subfolder (which is the actual digit) as a <code>Float32</code> number and appends it to the labels vector <code>y</code>. </p>
<p>This way, you will have a dataset with the feature matrix in <code>X</code> and the labels vector in <code>y</code>. Let's refactor this code into a function, so that you can reuse it to convert any folder of images organized this way into a dataset.</p>
<pre><code class="lang-julia"><span class="hljs-keyword">using</span> Images
<span class="hljs-keyword">function</span> createDataset(path)
    X = []
    y = []
    <span class="hljs-keyword">for</span> label <span class="hljs-keyword">in</span> readdir(path)
        <span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> readdir(<span class="hljs-string">"<span class="hljs-variable">$path</span>/<span class="hljs-variable">$label</span>"</span>)
            img = load(<span class="hljs-string">"<span class="hljs-variable">$path</span>/<span class="hljs-variable">$label</span>/<span class="hljs-variable">$file</span>"</span>)
            data = reshape(<span class="hljs-built_in">Float32</span>.(channelview(img)),<span class="hljs-number">28</span>,<span class="hljs-number">28</span>,<span class="hljs-number">1</span>)
            <span class="hljs-keyword">if</span> length(X) == <span class="hljs-number">0</span>
                X = data
            <span class="hljs-keyword">else</span>
                X = cat(X,data,dims=<span class="hljs-number">3</span>)
            <span class="hljs-keyword">end</span>
            push!(y,parse(<span class="hljs-built_in">Float32</span>,label))
        <span class="hljs-keyword">end</span>
    <span class="hljs-keyword">end</span>
    <span class="hljs-keyword">return</span> X,y
<span class="hljs-keyword">end</span>
</code></pre>
<p>Using this function, you can now easily create both training and testing datasets:</p>
<pre><code class="lang-julia">x_train, y_train = createDataset(<span class="hljs-string">"training"</span>)
x_test, y_test = createDataset(<span class="hljs-string">"testing"</span>)
</code></pre>
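<p>Because <code>cat(X,data,dims=3)</code> allocates a new array and copies everything loaded so far on every iteration, the loading loop does roughly quadratic work, which is likely a big part of why it is slow for 60000 images. A minimal sketch of one alternative: collect the image matrices first and copy them into an array that is allocated only once. The <code>stack_images</code> name is hypothetical, and random matrices stand in for the loaded MNIST images; inside <code>createDataset</code>, you would push each <code>Float32.(channelview(img))</code> matrix into a vector instead of calling <code>cat</code>.</p>
<pre><code class="lang-julia"># Sketch: stack many 28x28 matrices into one 28x28xN array without calling
# cat inside the loop. Random matrices stand in for loaded images here.
function stack_images(images::Vector{Matrix{Float32}})
    n = length(images)
    X = Array{Float32}(undef, 28, 28, n)   # allocate the result once
    for (i, img) in enumerate(images)
        X[:, :, i] = img                   # copy image i into slice i
    end
    return X
end

images = [rand(Float32, 28, 28) for _ in 1:1000]
X = stack_images(images)
println(size(X))   # (28, 28, 1000)
</code></pre>
<p>On Julia 1.9 or later, the built-in <code>stack(images)</code> function should produce the same 28x28xN array in a single call.</p>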
<h2 id="how-to-create-a-machine-learning-model">How to create a machine learning model</h2>

<p>We will use a neural network to create a model and train it using the training data. To work with neural networks, we will use the <a target="_blank" href="https://fluxml.ai/">Flux.jl framework</a>, which allows you to create and train neural networks of various types, including feed forward, convolutional, and recurrent. </p>
<p>For handwritten digit classification, we will implement both a feed forward and a convolutional network and compare their accuracy. If you need to, you can review the basics of neural networks by <a target="_blank" href="https://www.youtube.com/watch?v=aircAruvnKk&amp;t=313s">watching this video</a>. Now is a good time to watch it, before you continue reading.</p>
<h3 id="heading-neural-network-basics">Neural network basics</h3>
<p>A neural network is a chain of layers. Each layer has a defined number of neurons with inputs and outputs. </p>
<p>To convert inputs to outputs, the neurons of each layer use the activation function defined for that layer. The features of the image are the inputs of the first layer, and the classification results are the outputs of the last layer.</p>
<p>The best way to understand all this is to visualize a neural network architecture. Consider the following basic three-layer network:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/dense_net.png" alt="Image" width="600" height="400" loading="lazy">
<em>Feed forward neural network for digits recognition. Source: http://neuralnetworksanddeeplearning.com/chap1.html</em></p>
<p>In this picture, the input layer contains 784 neurons that should receive the features of each image. As you remember, the training dataset consists of 28x28 images, which is 784 pixels. This is how this neural network works:</p>
<ul>
<li>The color value of each pixel goes to its own neuron in the input layer.</li>
<li>Each neuron of the input layer sends its value to every neuron of the hidden layer. </li>
<li>Each neuron of the hidden layer has a weight coefficient for each input. Initially, these weights are random numbers. Each hidden neuron multiplies every input value from the previous layer by the corresponding weight, sums these products (plus a bias term), and applies the activation function to that sum.</li>
<li>Each neuron of the hidden layer sends its result to every neuron of the output layer, which has 10 neurons.</li>
<li>The output layer does the same with its inputs as the hidden layer did, producing one value per neuron.</li>
<li>These values are treated as the probabilities of the corresponding digits: for example, the first neuron should contain the probability that the input image is "0", the second neuron the probability that the image is "1", and so on. </li>
</ul>
<p>Then, the application should look at which of these 10 neurons has the highest value and make the appropriate prediction.</p>
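<p>The forward pass described above can be sketched with plain Julia arrays, without Flux, to show the shapes involved. This is an illustration under simplifying assumptions: the weights are random (as in an untrained network), the input is a random stand-in for a flattened image, and <code>relu</code> and <code>sigmoid</code> are defined by hand here rather than taken from Flux.</p>
<pre><code class="lang-julia"># Sketch of the forward pass for a 784-15-10 network using plain arrays.
# Weights are random, as in an untrained network.
relu(z) = max(zero(z), z)
sigmoid(z) = 1 / (1 + exp(-z))

x  = rand(Float32, 784)          # one flattened 28x28 image
W1 = randn(Float32, 15, 784)     # hidden layer: one row of weights per neuron
b1 = zeros(Float32, 15)          # hidden layer biases
W2 = randn(Float32, 10, 15)      # output layer: 10 neurons, 15 inputs each
b2 = zeros(Float32, 10)          # output layer biases

hidden = relu.(W1 * x .+ b1)           # weighted sums plus bias, then activation
output = sigmoid.(W2 * hidden .+ b2)   # one value per digit

println(size(hidden), " ", size(output))   # (15,) (10,)
predicted_digit = argmax(output) - 1       # neuron 1 stands for digit 0
</code></pre>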
<h3 id="heading-how-to-create-the-neural-network-with-flux">How to create the neural network with Flux</h3>
<p>Let's create this neural network using Flux. If you haven't installed and imported it yet, do this in your notebook:</p>
<pre><code class="lang-julia"><span class="hljs-keyword">using</span> Pkg
Pkg.add(<span class="hljs-string">"Flux"</span>)
<span class="hljs-keyword">using</span> Flux
</code></pre>
<p>As you have seen, the neural network is a chain of layers with different parameters. So, Flux has a <code>Chain</code> function that you use to construct neural networks. Let's construct that network:</p>
<pre><code class="lang-julia">model = Chain(
    Flux.flatten,
    Dense(<span class="hljs-number">784</span>=&gt;<span class="hljs-number">15</span>,relu),
    Dense(<span class="hljs-number">15</span>=&gt;<span class="hljs-number">10</span>,sigmoid),
    softmax
)
</code></pre>
<p>The <code>Chain</code> receives a sequence of functions as arguments. Each function defines a layer and its parameters. Each of these functions receives some inputs, performs its operation, and forwards its outputs as inputs to the next function in the chain. </p>
<p>So, this is how the defined neural network works:</p>
<ul>
<li>The input image, which is a 28x28 array of pixel color intensities, goes to the <code>Flux.flatten</code> function. This function just converts the 28x28 matrix to a vector of 784 elements, which becomes the input for the first Dense layer.</li>
<li>Then, each of the 15 neurons of the first Dense layer receives all 784 values, multiplies them by its weights, sums these products (plus a bias), and applies the <a target="_blank" href="https://fluxml.ai/Flux.jl/stable/models/activation/#NNlib.relu"><code>relu</code></a> activation function to the sum. The layer then forwards these 15 values to the 10 neurons of the next layer.</li>
<li>Next, each of the 10 neurons of the second Dense layer multiplies the 15 inputs by its weights, sums them, and applies the <code>sigmoid</code> activation function, which squashes each sum into the range between 0 and 1.</li>
<li>The final <a target="_blank" href="https://en.wikipedia.org/wiki/Softmax_function"><code>softmax</code></a> function doesn't add a new layer of neurons. It just converts the values accumulated in the 10 neurons of the output layer into a proper probability distribution: all values are positive and their sum is equal to 1. The model returns the array of these probabilities as its result.</li>
</ul>
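<p>The softmax step can be written in a few lines of plain Julia. This sketch exponentiates each value and divides by the total; subtracting the maximum first is a common trick to avoid overflow and does not change the result. The <code>scores</code> values are made up for illustration, and this hand-written <code>softmax</code> is not Flux's implementation.</p>
<pre><code class="lang-julia"># Softmax: exponentiate each value, then divide by the total so the
# results are positive and sum to 1.
function softmax(v::AbstractVector)
    e = exp.(v .- maximum(v))   # shifting by the maximum avoids overflow
    return e ./ sum(e)
end

scores = [0.2, 0.9, 0.1, 0.4, 0.3, 0.5, 0.6, 0.2, 0.8, 0.7]   # made-up output values
probs = softmax(scores)
println(sum(probs))
println(argmax(probs) - 1)   # 1, the digit with the highest score
</code></pre>
<p>Softmax never changes which value is the largest, so <code>argmax</code> picks the same digit before and after it. Also, when all 10 inputs lie between 0 and 1, as the <code>sigmoid</code> outputs do here, no resulting probability can exceed e/(e + 9), which is about 0.23.</p>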
<p>You can call the <code>model</code> which you just created as a function by passing an image matrix as an input argument. </p>
<p>You can run the model to predict the digit for the first image from the training dataset using the following code:</p>
<pre><code class="lang-julia">predict = model(Flux.unsqueeze(x_train[:,:,<span class="hljs-number">1</span>],dims=<span class="hljs-number">3</span>))
</code></pre>
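<p>We use the <a target="_blank" href="https://fluxml.ai/Flux.jl/stable/utilities/#Flux.unsqueeze"><code>unsqueeze</code></a> function here to convert the (28,28) image into an array of shape (28,28,1). Flux treats the last dimension of the input as the batch dimension, so this is a batch that contains a single image; <code>Flux.flatten</code> then turns it into a 784x1 matrix that the first <code>Dense</code> layer can process.</p>
<p>This is an important rule for neural network processing in Flux: a model expects a batch of inputs, even if the batch contains only one image. Convolutional layers, which you will see later, additionally expect each image to have width, height, and color channel dimensions, so even a single channel must be specified.</p>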
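<p>To see why the extra dimension matters, here is a plain-Julia sketch of what <code>Flux.flatten</code> does according to its documentation: it keeps the size of the last dimension (the batch) and merges all the other dimensions. <code>flatten_like</code> is a hypothetical stand-in, and random numbers replace a real image.</p>
<pre><code class="lang-julia"># Mimics Flux.flatten: keep the last dimension, merge all the others.
flatten_like(x::AbstractArray) = reshape(x, :, size(x, ndims(x)))

img = rand(Float32, 28, 28)          # one image, no batch dimension
batch = reshape(img, 28, 28, 1)      # same shape as Flux.unsqueeze(img, dims=3)

println(size(flatten_like(batch)))   # (784, 1): one column of 784 features
println(size(flatten_like(img)))     # (28, 28): wrong shape for a layer expecting 784 inputs
</code></pre>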
<p>The model function receives the input image matrix, passes it through a chain of layers, and returns the array of probabilities.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/probs1-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>New neural network probabilities</em></p>
<p>As you can see, neuron number 2 has the highest probability (0.12457416), which means that the model predicted the digit "1". However, if you check the real answer in the labels vector:</p>
<pre><code class="lang-julia">y_train[<span class="hljs-number">1</span>]
</code></pre>
<p>you will see "0", so the prediction is incorrect. This is because the model is untrained and just uses random weights to calculate the output of each layer. You need to train it to adjust these weights so that it produces more accurate probabilities.</p>
<h2 id="how-to-train-the-model">How to train the model</h2>

<p>Flux.jl has different approaches to training a model. The most straightforward one is the <a target="_blank" href="https://fluxml.ai/Flux.jl/stable/training/reference/#Flux.Optimise.train!-NTuple{4,%20Any}"><code>Flux.train!</code></a> function, which runs the following training process:</p>
<ul>
<li>The function receives the training dataset as an argument, including the features matrix and the labels vector.</li>
<li>The function runs the <code>model</code> on each item of the training dataset and receives the resulting array of probabilities.</li>
<li>The function compares these probabilities with the true values from the labels vector and calculates the <strong>amount of error</strong> (more about this below).</li>
<li>Using information about the error, the function adjusts the weights and biases of each neuron on each layer.</li>
</ul>
<p>Usually you need to run this training process many times in a loop. On each iteration, it adjusts the weights of each neuron, reducing the error further.</p>
<p>This visualization shows how the training loop works for a single neuron on a single layer. For the whole network, it works similarly.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/1677823812311.gif" alt="Image" width="600" height="400" loading="lazy">
<em>The training process in a loop for a single neuron</em></p>
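<p>The loop in the animation can be sketched in plain Julia for a single linear neuron. This is an illustration under stated assumptions, not what Flux does internally: the neuron computes <code>w*x + b</code>, the error is measured with mean squared error, the weights are updated with plain gradient descent, and the training points are synthetic values on the line y = 2x + 1, so <code>w</code> should approach 2 and <code>b</code> should approach 1.</p>
<pre><code class="lang-julia"># Gradient descent for one linear neuron y = w*x + b with mean squared error.
# Synthetic data on y = 2x + 1.
xs = Float64[0, 1, 2, 3, 4]
ys = 2 .* xs .+ 1

function train_neuron(xs, ys; lr = 0.05, epochs = 500)
    w, b = 0.0, 0.0                            # arbitrary starting weights
    losses = Float64[]
    for _ in 1:epochs
        err = (w .* xs .+ b) .- ys             # prediction minus truth
        push!(losses, sum(err .^ 2) / length(xs))   # mean squared error
        gw = 2 * sum(err .* xs) / length(xs)   # gradient with respect to w
        gb = 2 * sum(err) / length(xs)         # gradient with respect to b
        w -= lr * gw                           # step against the gradient
        b -= lr * gb
    end
    return w, b, losses
end

w, b, losses = train_neuron(xs, ys)
println(round(w; digits = 3), " ", round(b; digits = 3))
</code></pre>
<p>The <code>losses</code> vector records the error at each epoch, so you can check that it goes down over the iterations.</p>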
<p>This is the syntax of the <code>train!</code> function:</p>
<pre><code class="lang-julia">Flux.train!(loss_function, model, data, optimizer)
</code></pre>
<p>Let's break this down:</p>
<ul>
<li><code>loss_function</code> – as I described before, during the training process, the <code>train!</code> function measures the amount of error. To do this, it uses the <code>loss_function</code>, which you define and provide here.
<p>This function receives the model, a batch of training data, and the true labels. Based on these arguments, the loss function should make a prediction by passing the data through the model, compare this prediction with the true labels, and return the amount of error between them as a float number.</p>
<p>Different algorithms exist to calculate the amount of error for different types of machine learning problems. For classification problems, we will use <strong><a target="_blank" href="https://fluxml.ai/Flux.jl/stable/models/losses/#Flux.Losses.crossentropy">cross entropy</a></strong>.</p></li>
<li><code>model</code> – the neural network model to train.</li>
<li><code>data</code> – the training data, which combines <code>x_train</code> and <code>y_train</code> into a collection of (features, labels) pairs. The simplest way to build it is the <a target="_blank" href="https://fluxml.ai/Flux.jl/v0.10/data/dataloader/"><code>Flux.DataLoader</code></a> function, which we will use below. By default, it yields one image and its label at a time.</li>
<li><code>optimizer</code> – as described above, after measuring the amount of error, the function adjusts the weights to decrease the error. The weights are not adjusted randomly: the <code>optimizer</code> defines the algorithm that moves them in the direction that reduces the error.</li>
</ul>
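<p>Cross entropy for a single image can be computed by hand, which shows why it works as an error measure: with a one-hot target, it reduces to minus the logarithm of the probability the model gave to the correct digit. In this sketch, <code>onehot_digit</code> is a hypothetical helper playing the role of <code>Flux.onehotbatch</code> for one label, the probability vectors are made up, and the hand-written <code>crossentropy</code> is only the basic formula (Flux's version also averages the result over a batch by default).</p>
<pre><code class="lang-julia"># One-hot encode a digit: digit 0 goes to position 1, digit 9 to position 10.
function onehot_digit(label::Integer)
    v = zeros(Float64, 10)
    v[label + 1] = 1.0
    return v
end

# Basic cross entropy between predicted probabilities p and a one-hot target.
crossentropy(p, target) = -sum(target .* log.(p))

probs_good = [0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
probs_bad  = fill(0.1, 10)   # an undecided model
target = onehot_digit(0)

println(crossentropy(probs_good, target))   # about 0.094 (small error)
println(crossentropy(probs_bad, target))    # about 2.303 (large error)
</code></pre>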
<p>Most weight adjustment algorithms are based on <a target="_blank" href="https://builtin.com/data-science/gradient-descent">gradient descent</a>. In particular, we will use the <a target="_blank" href="https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.Adam">Adam</a> optimizer, which is one of the most widely used today.</p>
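<p>For reference, the standard Adam update rule can be sketched in plain Julia. This is a simplified illustration, not Flux's implementation: <code>AdamState</code> and <code>adam_step!</code> are hypothetical names, the default hyperparameters are the commonly used values (learning rate 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-8), and the example minimizes a simple quadratic instead of a neural network loss. Adam keeps running averages of each weight's gradient and squared gradient and uses them to scale every step.</p>
<pre><code class="lang-julia">mutable struct AdamState
    m::Vector{Float64}   # running average of gradients
    v::Vector{Float64}   # running average of squared gradients
    t::Int               # number of steps taken so far
end
AdamState(n::Integer) = AdamState(zeros(n), zeros(n), 0)

# One Adam step: update the parameters θ in place using the gradient g.
function adam_step!(θ, g, s::AdamState; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    s.t += 1
    s.m .= β1 .* s.m .+ (1 - β1) .* g
    s.v .= β2 .* s.v .+ (1 - β2) .* g .^ 2
    m̂ = s.m ./ (1 - β1^s.t)              # bias correction for the early steps
    v̂ = s.v ./ (1 - β2^s.t)
    θ .-= η .* m̂ ./ (sqrt.(v̂) .+ ϵ)
    return θ
end

# Minimize f(θ) = sum((θ .- 3) .^ 2), whose gradient is 2 .* (θ .- 3).
θ = [0.0, 10.0]
s = AdamState(2)
for _ in 1:5000
    adam_step!(θ, 2 .* (θ .- 3), s; η = 0.01)
end
println(θ)   # both values should end up close to 3
</code></pre>
<p>With <code>Flux.setup(Adam(), model)</code>, Flux keeps this kind of state for every weight array in the model, and <code>Flux.train!</code> applies the updates for you.</p>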
<p>Let's connect all these parts together in the following code:</p>
<pre><code class="lang-julia"><span class="hljs-comment"># Assemble the training data</span>
data = Flux.DataLoader((x_train,y_train), shuffle=<span class="hljs-literal">true</span>)

<span class="hljs-comment"># Initialize the ADAM optimizer with default settings</span>
optimizer = Flux.setup(Adam(), model)

<span class="hljs-comment"># Define the loss function that uses the cross-entropy to </span>
<span class="hljs-comment"># measure the error by comparing model predictions of data </span>
<span class="hljs-comment"># row "x" with true data label in the "y"</span>
<span class="hljs-keyword">function</span> loss(model, x, y)
    <span class="hljs-keyword">return</span> Flux.crossentropy(model(x),Flux.onehotbatch(y,<span class="hljs-number">0</span>:<span class="hljs-number">9</span>))
<span class="hljs-keyword">end</span>

<span class="hljs-comment"># Train the model 10 times in a loop</span>
<span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:<span class="hljs-number">10</span>
    Flux.train!(loss, model, data, optimizer)
<span class="hljs-keyword">end</span>
</code></pre>
<p>For each batch of data, <code>Flux.train!</code> calls the <code>loss</code> function, which runs the <code>model</code> and uses cross entropy to measure the difference between the predictions and the true labels. Flux then computes the gradient of this loss with respect to the model's weights, and the <code>optimizer</code> uses that gradient to adjust the weights. Over the iterations, the loss should generally go down.</p>
<p>Finally, after running the training process, you can check how it predicts the digit for the first image using the trained model:</p>
<pre><code class="lang-julia">predict = model(Flux.unsqueeze(x_train[:,:,<span class="hljs-number">1</span>],dims=<span class="hljs-number">3</span>))
</code></pre>
<p>When I did that, I received the following probabilities:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/probs2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Trained model probabilities</em></p>
<p>The first one, corresponding to "0", is the highest, which is correct. You can try checking other images, like image number 100 or 200. But it doesn't make much sense to measure model quality this way, because this is training data that the model has already seen. Only the testing data should be used to measure the accuracy of the model.</p>
<h2 id="how-to-evaluate-the-accuracy-of-the-trained-model">How to evaluate the accuracy of the trained model</h2>

<p>We have the testing dataset in the <code>x_test</code> features matrix and in the <code>y_test</code> labels vector. We will run the <code>model</code> for each row of this data and measure the accuracy: the number of correct predictions divided by the number of all predictions.</p>
<p>Let's create a function for this:</p>
<pre><code class="lang-julia"><span class="hljs-keyword">function</span> accuracy()
    correct = <span class="hljs-number">0</span>
    <span class="hljs-keyword">for</span> index <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(y_test)
        probs = model(Flux.unsqueeze(x_test[:,:,index],dims=<span class="hljs-number">3</span>))
        predicted_digit = argmax(probs)[<span class="hljs-number">1</span>]-<span class="hljs-number">1</span>
        <span class="hljs-keyword">if</span> predicted_digit == y_test[index]
            correct +=<span class="hljs-number">1</span>
        <span class="hljs-keyword">end</span>
    <span class="hljs-keyword">end</span>
    <span class="hljs-keyword">return</span> correct/length(y_test)
<span class="hljs-keyword">end</span>
</code></pre>
<p>The function goes over all items of the testing dataset. For each item, it runs the model and receives the <code>probs</code> array. Then, it uses the <a target="_blank" href="https://docs.julialang.org/en/v1/base/collections/#Base.argmax"><code>argmax</code></a> function to find the index of the highest probability, subtracts 1 to turn that index (1 to 10) into a digit (0 to 9), and stores the result in the <code>predicted_digit</code> variable. Next, it compares the predicted digit with the true value from the <code>y_test</code> labels vector and increments the number of correct predictions if they match. Finally, the function returns the number of correct predictions divided by the total number of rows.</p>
<p>Now you can run this function to see the accuracy. For example, when I ran it, I got 0.9455, which is about 94.6%. </p>
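<p>In the <code>accuracy</code> function, <code>probs</code> is a 10x1 matrix (10 probabilities for a batch of one image), so <code>argmax</code> returns a <code>CartesianIndex</code>, <code>[1]</code> takes its row number, and subtracting 1 maps rows 1 to 10 onto digits 0 to 9. The sketch below shows this with made-up probabilities and adds a hypothetical helper, <code>accuracy_from_probs</code>, that applies the same logic to every column of a 10xN matrix of probabilities.</p>
<pre><code class="lang-julia"># Made-up output for one image: a 10x1 matrix, as returned by the model.
probs = reshape([0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.05, 0.03, 0.02], 10, 1)
idx = argmax(probs)
println(idx)                  # CartesianIndex(4, 1)
predicted_digit = idx[1] - 1
println(predicted_digit)      # 3

# Accuracy for many images at once: one column of probabilities per image.
function accuracy_from_probs(P::AbstractMatrix, labels::AbstractVector)
    predicted = [argmax(col) - 1 for col in eachcol(P)]   # one digit per column
    return count(predicted .== labels) / length(labels)
end

P = fill(0.05, 10, 3)   # 3 made-up images
P[1, 1] = 0.55          # image 1 predicted as 0
P[4, 2] = 0.55          # image 2 predicted as 3
P[10, 3] = 0.55         # image 3 predicted as 9
println(accuracy_from_probs(P, [0, 3, 5]))   # 0.6666666666666666
</code></pre>
<p>Because the feed forward model flattens all but the last dimension, calling it on the whole 28x28xN <code>x_test</code> array should return such a 10xN matrix in one call, which could then be passed to <code>accuracy_from_probs</code> together with <code>y_test</code>.</p>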
<p>However, it's better to place this function call inside the training loop, right after the <code>Flux.train!</code> line to see how the accuracy changes after each training iteration.</p>
<pre><code class="lang-julia"><span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:<span class="hljs-number">10</span>
    Flux.train!(loss, model, data, optimizer)
    println(accuracy())
<span class="hljs-keyword">end</span>
</code></pre>
<p>Then run the training again. You should receive output similar to this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/accuracies.png" alt="Image" width="600" height="400" loading="lazy">
<em>Accuracy of the neural network</em></p>
<p>It shows that the accuracy was going up until the 6th iteration. After that, it started to go down, which could be a sign that the model started to overfit.</p>
<p>To increase the prediction quality, you can either add more data to the training dataset or change the model architecture. </p>
<p>For example, you can add more Dense layers, increase the number of neurons on the hidden layer, or change activation functions from <code>relu</code> to <code>sigmoid</code> or vice versa.</p>
<p>When I increased the number of neurons on the hidden layer from 15 to 42 and removed the <code>sigmoid</code> activation from the output layer, I achieved about 97% accuracy. But when I added one more hidden layer before the output, the accuracy dropped to 90%. </p>
<p>So, building a neural network architecture is like an art: you need to try many different options and finally select the one that works best. </p>
<p>Regardless of the options I chose, I could never achieve more than 97%. Also, when I finally tried to use this network architecture in production with real handwritten digits from users, the prediction quality was poor. Very often it could not recognize the digit 7 properly, and it misclassified 1 as 4 and 6 as 5.</p>
<p>This is because feeding all 784 raw pixels of the image into a feed forward neural network, without any filtering, is not the best approach.</p>
<p>For most machine learning tasks with images, <strong>convolutional</strong> neural networks are the better option. We will create and try one in the next section.</p>
<h2 id="how-to-create-and-train-the-convolutional-neural-network">How to create and train the convolutional neural network</h2>

<p>The most important step during the machine learning process is data preprocessing. If input features are processed properly, then the prediction accuracy will be better. </p>
<p>To increase the model quality, you need to remove noise from data, or features that are not relevant for the value that you need to predict. </p>
<p>Also, oftentimes you need to create new features from existing ones that could be more relevant to the result. </p>
<p>For example, for the Titanic machine learning problem, you can remove features such as "Passenger ID" and "Passenger name", because they can't help predict whether a passenger survived. </p>
<p>Also, if you have a task to predict the price of a flat and have input data with fields of room areas like "Area 1", "Area 2" and so on, you can create a new field "Total Flat Area" and write the sum of all room areas to it. </p>
<p>Perhaps this new feature that you generated is more relevant than others for the model, so you can remove the fields from which you generated that new column.</p>
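<p>As a small illustration of creating such a feature, the sketch below computes a "Total Flat Area" value from individual room areas. The data is made up: each row is one flat and each column one room area in square meters (0.0 marks a missing room).</p>
<pre><code class="lang-julia"># Made-up room areas: one row per flat, one column per room (m²).
areas = [12.0 15.5  8.0;
         20.0 10.0  0.0;
          9.5  9.5 14.0]

total_area = vec(sum(areas; dims = 2))   # new feature: one total per flat
println(total_area)                      # [35.5, 30.0, 33.0]
</code></pre>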
<p>Using these techniques, you generalize the data by keeping and creating the features that are important and by removing others that can only confuse the machine learning model.</p>
<p>When working with tabular data, you can use your own experience or statistical methods to find which features to generate or remove from input data. But when working with images, things are not as clear as with strings or numbers.</p>
<p>For example, the model for the handwritten digit recognition task receives the 784 pixel colors in a single row as its input. From a human point of view, they all look equally important, and it's unknown which of them matter more and which matter less.</p>
<p>To help with this, you can use <strong>convolutional neural networks</strong> to preprocess this kind of data. They do the feature engineering automatically.</p>
<p>You build a convolutional neural network from two types of layers:</p>
<ul>
<li><strong>Convolution layers</strong> used to generate new features from input image pixels.</li>
<li><strong>Pooling layers</strong> used to generalize features by summarizing neighboring values with a fixed rule, and this way reduce their number.</li>
</ul>
<p>By combining these two types of layers in a chain, you can preprocess the input image matrix into a smaller number of the most valuable features. Then, you can train the network using these generated features as input data, the same way as you did before.</p>
<p>I think it's difficult to describe CNNs better than it's done in <a target="_blank" href="https://www.youtube.com/watch?v=JB8T_zN7ZC0">this video</a>, so I highly recommend watching it (or at least the first 15 minutes) before continuing. It clearly explains the theoretical aspects of all the steps that you will do below.</p>
<p>So, let's review the neural network that you have now:</p>
<pre><code class="lang-julia">model = Chain(
    Flux.flatten,
    Dense(<span class="hljs-number">784</span>=&gt;<span class="hljs-number">15</span>,relu),
    Dense(<span class="hljs-number">15</span>=&gt;<span class="hljs-number">10</span>,sigmoid),
    softmax
)
</code></pre>
<p>The only data preprocessing step here is <code>Flux.flatten</code>, which receives the 28x28 image and flattens it into a single row of 784 numbers. We need to add some convolution layers before <code>Flux.flatten</code> to give our network the ability to generate better features than just raw pixels.</p>
<p>To create a convolution layer, Flux.jl provides the <a target="_blank" href="https://fluxml.ai/Flux.jl/stable/models/layers/#Flux.Conv"><code>Conv</code></a> function with the following main parameters:</p>
<pre><code class="lang-julia">Conv(filter,<span class="hljs-keyword">in</span>=&gt;out,activation_function)
</code></pre>
<ul>
<li><strong>filter</strong> defines the dimensions of the kernel matrix, which slides over the input matrix; at each position, the kernel values are multiplied by the pixels under it and summed to create one feature. For example, the value (3,3) defines a 3x3 kernel matrix. This is how convolution with such a kernel generates features for an image of size 6x6:</li>
</ul>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/2D_Convolution_Animation.gif" alt="Image" width="600" height="400" loading="lazy">
<em>How convolution layer works</em></p>
<ul>
<li><strong>in</strong> is the number of input image channels. Grayscale images like ours have a single channel. For other layers, the number of <strong>in</strong> channels of the current layer must be equal to the number of <strong>out</strong> channels of the previous layer.</li>
<li><strong>out</strong> is the number of output channels after applying the convolution. Each output channel is produced by its own kernel, so this is the number of features that will be generated for each pixel position.</li>
<li><strong>activation_function</strong> is the function that will be applied to each feature after convolution and before sending to the next layer of the network, the same as we did before for <code>Dense</code> layers.</li>
</ul>
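<p>The sliding-window computation from the animation can be sketched in plain Julia for a single channel, stride 1, and no padding. <code>conv2d_valid</code> is a hypothetical helper, and the kernel is an arbitrary vertical-edge filter chosen for the example. Strictly speaking, this computes a cross-correlation; Flux's <code>Conv</code> flips the kernel first (Flux provides <code>CrossCor</code> for the unflipped version), which changes neither the output size nor what a learned kernel can represent. The sketch also shows the size rule without padding: an n x n image and a k x k kernel give an (n - k + 1) x (n - k + 1) output.</p>
<pre><code class="lang-julia"># Single-channel 2D convolution (cross-correlation form), stride 1, no padding.
function conv2d_valid(img::AbstractMatrix, kernel::AbstractMatrix)
    kh, kw = size(kernel)
    oh = size(img, 1) - kh + 1        # output height
    ow = size(img, 2) - kw + 1        # output width
    out = zeros(oh, ow)
    for j in 1:ow, i in 1:oh
        window = img[i:i+kh-1, j:j+kw-1]   # patch whose top-left corner is (i, j)
        out[i, j] = sum(window .* kernel)  # weighted sum of the patch
    end
    return out
end

img = reshape(collect(1.0:36.0), 6, 6)   # a 6x6 "image"
kernel = [1.0 0.0 -1.0;
          1.0 0.0 -1.0;
          1.0 0.0 -1.0]                  # vertical-edge filter
out = conv2d_valid(img, kernel)
println(size(out))                                     # (4, 4)
println(size(conv2d_valid(rand(28, 28), rand(5, 5))))  # (24, 24)
</code></pre>
<p>Flux's <code>Conv</code> also uses no padding by default, so a 5x5 kernel applied to a 28x28 image gives a 24x24 output per channel unless you pass a <code>pad</code> argument.</p>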
<p>For example, if you add the following <code>Conv</code> layer on top of the others to the Chain:</p>
<pre><code class="lang-julia">model = Chain(
    Conv((<span class="hljs-number">5</span>,<span class="hljs-number">5</span>),<span class="hljs-number">1</span>=&gt;<span class="hljs-number">6</span>,relu;pad=<span class="hljs-number">2</span>),
    Flux.flatten,
    Dense(<span class="hljs-number">4704</span>=&gt;<span class="hljs-number">15</span>,relu),
    Dense(<span class="hljs-number">15</span>=&gt;<span class="hljs-number">10</span>,sigmoid),
    softmax
)
</code></pre>
<p>this network will get a single-channel image of shape (28,28,1). The convolution layer will produce 6 feature maps from this image by applying 6 different 5x5 convolution kernels to the input data. By default, <code>Conv</code> uses no padding, and a 5x5 kernel fits at only 24x24 positions inside a 28x28 image, so the output would shrink to 24x24. The <code>pad=2</code> argument adds a two-pixel border of zeros around the image, so the output keeps its 28x28 size. </p>
<p>The output of this layer will therefore be an image of shape (28,28,6). In other words, this convolution layer will generate 28*28*6 = 4704 features from 784 input pixels for our network.</p>
<p>But having more features does not mean that they are all good. You may need to generalize them and keep only the most valuable ones. This is what pooling layers are for. </p>
<p>In Flux.jl, a pooling layer can be defined using the <a target="_blank" href="https://fluxml.ai/Flux.jl/stable/models/layers/#Flux.MaxPool"><code>MaxPool</code></a> function. It receives the pooling window dimensions as an argument.</p>
<p>For example, if you create the following MaxPool layer:</p>
<pre><code class="lang-julia">MaxPool((<span class="hljs-number">2</span>,<span class="hljs-number">2</span>))
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/maxpool_animation.gif" alt="Image" width="600" height="400" loading="lazy">
<em>How Max pool layer works</em></p>
<p>it will apply the 2x2 window to the input image. As you can see, for each window it selects the maximum value and adds it to the output. This way it reduces the input data by leaving only maximums in it. That is why it's called the MAX pool layer.</p>
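<p>Max pooling is equally short in plain Julia. This sketch assumes a 2x2 window that moves by 2 pixels at a time, so windows don't overlap (Flux's <code>MaxPool</code> uses the window size as its default stride). <code>maxpool2x2</code> is a hypothetical helper, and it assumes the input has even width and height.</p>
<pre><code class="lang-julia"># 2x2 max pooling with stride 2: keep the largest value of each window.
function maxpool2x2(x::AbstractMatrix)
    oh, ow = size(x, 1) ÷ 2, size(x, 2) ÷ 2
    out = similar(x, oh, ow)
    for j in 1:ow, i in 1:oh
        out[i, j] = maximum(x[2i-1:2i, 2j-1:2j])   # largest value in the window
    end
    return out
end

x = [1 3 2 1;
     4 2 0 1;
     5 1 3 8;
     0 2 4 6]
println(maxpool2x2(x))                    # [4 2; 5 8]
println(size(maxpool2x2(rand(28, 28))))   # (14, 14)
</code></pre>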
<p>Let's add the MaxPool layer to the chain:</p>
<pre><code class="lang-julia">model = Chain(
    Conv((<span class="hljs-number">5</span>,<span class="hljs-number">5</span>),<span class="hljs-number">1</span>=&gt;<span class="hljs-number">6</span>,relu;pad=<span class="hljs-number">2</span>),
    MaxPool((<span class="hljs-number">2</span>,<span class="hljs-number">2</span>)),
    Flux.flatten,
    Dense(<span class="hljs-number">1176</span>=&gt;<span class="hljs-number">15</span>,relu),
    Dense(<span class="hljs-number">15</span>=&gt;<span class="hljs-number">10</span>,sigmoid),
    softmax
)
</code></pre>
<p>So, the MaxPool layer receives the (28,28,6) image from the padded convolution layer, applies the 2x2 max pool window to it, and outputs a (14,14,6) image. After this, the 14*14*6 = 1176 generalized features are forwarded to the network layers below.</p>
<p>The main question is how to know which number of convolution and max pool layers to add, and which parameters to set for each of them to achieve good prediction accuracy. </p>
<p>Well, the first way is to try different options. But to build a good neural network architecture this way could take days, months, or even years.</p>
<p>Fortunately, for many machine learning tasks, it has already been done by other people. You can find suitable architectures for most of your problems, including the model for the handwritten digit recognition.</p>
<p>The best-known architecture for this task was created by Yann LeCun and is named LeNet. You can find a full description and implementations of this model for different ML platforms <a target="_blank" href="https://d2l.ai/chapter_convolutional-neural-networks/lenet.html">here</a>. It was created specifically for digit images like those in the MNIST dataset. It's relatively old, but still used in many ATMs to recognize digits when processing deposits.</p>
<p>This is how this architecture looks: </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/LeNet-5.png" alt="Image" width="600" height="400" loading="lazy">
<em>LeNet architecture</em></p>
<p>Just like the network we created, this one consists of a convolutional part and a feed forward part. The convolutional net part consists of 2 Conv and 2 MaxPool layers. The feed forward neural network part consists of 3 dense layers.</p>
<p>You can create this network using Flux.jl this way:</p>
<pre><code class="lang-julia">model = Chain(
    Conv((<span class="hljs-number">5</span>,<span class="hljs-number">5</span>),<span class="hljs-number">1</span> =&gt; <span class="hljs-number">6</span>, relu),
    MaxPool((<span class="hljs-number">2</span>,<span class="hljs-number">2</span>)),
    Conv((<span class="hljs-number">5</span>,<span class="hljs-number">5</span>),<span class="hljs-number">6</span> =&gt; <span class="hljs-number">16</span>, relu),
    MaxPool((<span class="hljs-number">2</span>,<span class="hljs-number">2</span>)),
    Flux.flatten,
    Dense(<span class="hljs-number">256</span>=&gt;<span class="hljs-number">120</span>,relu),
    Dense(<span class="hljs-number">120</span>=&gt;<span class="hljs-number">84</span>, relu),
    Dense(<span class="hljs-number">84</span>=&gt;<span class="hljs-number">10</span>, sigmoid),
    softmax
)
</code></pre>
<p>After applying 2 convolutions and 2 pooling layers to the input image matrix, the <code>Flux.flatten</code> layer receives a 4x4x16 image and converts it to 4*4*16 = 256 generalized features. Then they go through 3 dense layers that finally calculate the probabilities of the 10 digits.</p>
<p>Before training this model using the data from <code>x_train</code>, you need to reshape it a little bit. The convolution layer expects the data in the 4-dimensional shape (width, height, channels, batch size), but <code>x_train</code> has the shape (28,28,60000), which is 60000 images of 28x28. </p>
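<p>The sizes in the LeNet chain can be checked with a little arithmetic. This sketch assumes what the model above uses: <code>Conv</code> layers with stride 1 and no padding, and 2x2 <code>MaxPool</code> layers with stride 2. <code>conv_out</code> and <code>pool_out</code> are hypothetical helpers.</p>
<pre><code class="lang-julia"># Output width/height of a convolution (stride 1) and of a pooling layer.
conv_out(n, k; pad = 0) = n + 2pad - k + 1
pool_out(n, window) = n ÷ window

n = 28
n = conv_out(n, 5)      # first Conv: 24
n = pool_out(n, 2)      # first MaxPool: 12
n = conv_out(n, 5)      # second Conv: 8
n = pool_out(n, 2)      # second MaxPool: 4
features = n * n * 16   # 16 output channels of the second Conv
println(features)       # 256
</code></pre>
<p>The same formula shows that <code>pad=2</code> keeps a 28x28 image at 28x28 after a 5x5 convolution, since 28 + 2*2 - 5 + 1 = 28. If you change kernel sizes, padding, or the number of channels, the input size of the first <code>Dense</code> layer must change accordingly.</p>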
<p>To make it compatible, you need to reshape it to (28, 28, 1, 60000). You can do this using the following code:</p>
<pre><code class="lang-julia">x_train = reshape(x_train, <span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>, :)
</code></pre>
<p>You'll need to do the same with <code>x_test</code>:</p>
<pre><code class="lang-julia">x_test = reshape(x_test, <span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>, :)
</code></pre>
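<p>As a quick check of what <code>reshape</code> does here, this sketch uses a small random array as a hypothetical stand-in for <code>x_train</code>. The <code>:</code> tells Julia to infer the last dimension, and every image keeps its pixels; only the shape changes. For a regular <code>Array</code>, <code>reshape</code> also shares memory with the original instead of copying it.</p>
<pre><code class="lang-julia"># Stand-in for x_train: 5 random "images" of 28x28 (the real set has 60000)
images = rand(Float32, 28, 28, 5)

# Insert a channel dimension of size 1; `:` lets Julia infer the last size
batched = reshape(images, 28, 28, 1, :)

println(size(batched))                           # (28, 28, 1, 5)
println(batched[:, :, 1, 3] == images[:, :, 3])  # true: same pixels, new shape
</code></pre>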
<p>To run this model, you also need to pass a 4-dimensional array to the <code>model</code> function, even for a single image. For example, to make a prediction for the first test image, you can add a batch dimension of size 1 like this:</p>
<pre><code class="lang-julia">model(Flux.unsqueeze(x_test[:,:,:,<span class="hljs-number">1</span>],dims=<span class="hljs-number">4</span>))
</code></pre>
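<p>Indexing with an integer, as in <code>x_test[:,:,:,1]</code>, drops the fourth dimension, which is why <code>Flux.unsqueeze</code> is needed to add it back. The sketch below uses a random stand-in array to show two Base-only ways to build the same 28x28x1x1 batch of one image; either result should work as model input in the same way.</p>
<pre><code class="lang-julia"># Stand-in for the reshaped x_test: 3 images of shape 28x28x1
x = rand(Float32, 28, 28, 1, 3)

single = x[:, :, :, 1]          # indexing with an integer drops the 4th dim
println(size(single))           # (28, 28, 1)

# Two ways to get a batch of one image without Flux
batch_a = x[:, :, :, 1:1]                       # a range keeps the dimension
batch_b = reshape(single, size(single)..., 1)   # append a trailing size-1 dim
println(size(batch_a), " ", batch_a == batch_b) # (28, 28, 1, 1) true
</code></pre>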
<p>Then you can train the model the same way as you did before. </p>
<p>This is the whole code to define and train the convolutional network:</p>
<pre><code class="lang-julia"><span class="hljs-comment"># Create a LeNet model</span>
model = Chain(
    Conv((<span class="hljs-number">5</span>,<span class="hljs-number">5</span>),<span class="hljs-number">1</span> =&gt; <span class="hljs-number">6</span>, relu),
    MaxPool((<span class="hljs-number">2</span>,<span class="hljs-number">2</span>)),
    Conv((<span class="hljs-number">5</span>,<span class="hljs-number">5</span>),<span class="hljs-number">6</span> =&gt; <span class="hljs-number">16</span>, relu),
    MaxPool((<span class="hljs-number">2</span>,<span class="hljs-number">2</span>)),
    Flux.flatten,
    Dense(<span class="hljs-number">256</span>=&gt;<span class="hljs-number">120</span>,relu),
    Dense(<span class="hljs-number">120</span>=&gt;<span class="hljs-number">84</span>, relu),
    Dense(<span class="hljs-number">84</span>=&gt;<span class="hljs-number">10</span>, sigmoid),
    softmax
)

<span class="hljs-comment"># Function to measure the model accuracy</span>
<span class="hljs-keyword">function</span> accuracy()
    correct = <span class="hljs-number">0</span>
    <span class="hljs-keyword">for</span> index <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(y_test)
        probs = model(Flux.unsqueeze(x_test[:,:,:,index],dims=<span class="hljs-number">4</span>))
        predicted_digit = argmax(probs)[<span class="hljs-number">1</span>]-<span class="hljs-number">1</span>
        <span class="hljs-keyword">if</span> predicted_digit == y_test[index]
            correct +=<span class="hljs-number">1</span>
        <span class="hljs-keyword">end</span>
    <span class="hljs-keyword">end</span>
    <span class="hljs-keyword">return</span> correct/length(y_test)
<span class="hljs-keyword">end</span>

<span class="hljs-comment"># Reshape the data</span>
x_train = reshape(x_train, <span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>, :)
x_test = reshape(x_test, <span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>, :)

<span class="hljs-comment"># Assemble the training data</span>
train_data = Flux.DataLoader((x_train,y_train), shuffle=<span class="hljs-literal">true</span>)

<span class="hljs-comment"># Initialize the ADAM optimizer with default settings</span>
optimizer = Flux.setup(Adam(), model)

<span class="hljs-comment"># Define the loss function that uses the cross-entropy to </span>
<span class="hljs-comment"># measure the error by comparing model predictions of </span>
<span class="hljs-comment"># data row "x" with true data from label "y"</span>
<span class="hljs-keyword">function</span> loss(model, x, y)
    <span class="hljs-keyword">return</span> Flux.crossentropy(model(x),Flux.onehotbatch(y,<span class="hljs-number">0</span>:<span class="hljs-number">9</span>))
<span class="hljs-keyword">end</span>

<span class="hljs-comment"># Train the model for 10 epochs, printing the test accuracy after each one</span>
<span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:<span class="hljs-number">10</span>
    Flux.train!(loss, model, train_data, optimizer)
    println(accuracy())
<span class="hljs-keyword">end</span>
</code></pre>
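<p>The <code>accuracy()</code> function above calls the model once per test image. Since a Flux <code>Chain</code> also accepts a whole batch and returns a matrix with one column of probabilities per image, accuracy could instead be computed from a single 10xN matrix. This is a minimal sketch of that idea; the model call itself (something like <code>model(x_test)</code>) is left out so the snippet runs without Flux, and <code>batch_accuracy</code> is a hypothetical helper, not a Flux function.</p>
<pre><code class="lang-julia"># Fraction of images whose most probable class matches the label.
# `probs` is a 10xN matrix with one column of class probabilities per
# image (row 1 = digit 0, ..., row 10 = digit 9); `labels` holds digits 0-9.
function batch_accuracy(probs::AbstractMatrix, labels::AbstractVector)
    # argmax over dims=1 gives a 1xN array of CartesianIndex values;
    # the first coordinate of each one is the winning row
    winners = vec(argmax(probs; dims=1))
    predicted = [I[1] - 1 for I in winners]   # rows 1-10 become digits 0-9
    return count(predicted .== labels) / length(labels)
end

# Toy example: 3 images, the second one is misclassified
probs = zeros(Float32, 10, 3)
probs[8, 1] = 1   # predicts 7
probs[3, 2] = 1   # predicts 2
probs[1, 3] = 1   # predicts 0
println(batch_accuracy(probs, [7, 1, 0]))   # 0.6666666666666666
</code></pre>
<p>Running all 10000 test images through the model at once needs more memory than the per-image loop, so splitting the test set into a few chunks may be a reasonable middle ground.</p>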
<p>After running this code, I got about 99% accuracy on the test set, which is close to ideal:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/conv_accuracy-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Accuracy of the convolutional network</em></p>
<p>Now it's time to save this model to a file and move it to production.</p>
<h2 id="how-to-export-trained-model-to-a-file">How to export the trained model to a file</h2>

<p>Flux.jl models can be saved to BSON files. You need to import the <code>BSON</code> package and use its <code>@save</code> macro to export the <code>model</code> object:</p>
<pre><code class="lang-julia"><span class="hljs-keyword">using</span> BSON
BSON.<span class="hljs-meta">@save</span> <span class="hljs-string">"digits.bson"</span> model
</code></pre>
<p>This saves the model to the <code>digits.bson</code> file in the current folder.</p>
<p>This is the end of the work in the Jupyter notebook. You'll implement the rest of the code as a separate application.</p>
<h2 id="how-to-create-a-frontend">How to create a frontend</h2>

<p>The application you're going to create lets a user draw their phone number by hand and recognizes it using the model you trained before. The frontend page will look like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/frontend.png" alt="Image" width="600" height="400" loading="lazy">
<em>Frontend</em></p>
<p>Using this interface, the user can draw the digits of a phone number in the boxes with the mouse, then press the "Recognise" button to display the recognized digits in the "Result" input field.</p>
<p>There is also a "Switch to eraser" button. When the user presses it, the drawing mode changes to eraser mode, and the user can erase strokes in any box.</p>
<p>Let's start building the web application. Create a new folder with any name that you like. Then create an <code>index.html</code> file in it and copy the following code to this file:</p>
<pre><code class="lang-html"><span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">html</span> <span class="hljs-attr">lang</span>=<span class="hljs-string">"en"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">charset</span>=<span class="hljs-string">"UTF-8"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">http-equiv</span>=<span class="hljs-string">"X-UA-Compatible"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"IE=edge"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"viewport"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"width=device-width, initial-scale=1.0"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>Phones reader<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span>Draw phone number and recognise it<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"digits"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">strong</span>&gt;</span>+<span class="hljs-tag">&lt;/<span class="hljs-name">strong</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">strong</span>&gt;</span>(<span class="hljs-tag">&lt;/<span class="hljs-name">strong</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">strong</span>&gt;</span>)<span class="hljs-tag">&lt;/<span class="hljs-name">strong</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">strong</span>&gt;</span>-<span class="hljs-tag">&lt;/<span class="hljs-name">strong</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">canvas</span> <span class="hljs-attr">width</span>=<span class="hljs-string">"50"</span> <span class="hljs-attr">height</span>=<span class="hljs-string">"50"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">canvas</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"buttons"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"mode"</span>&gt;</span>Switch to eraser<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"result"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"recognise"</span>&gt;</span>Recognise<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">label</span>&gt;</span>Result:<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"result"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span>&gt;</span><span class="javascript">
    <span class="hljs-keyword">let</span> mode = <span class="hljs-string">"brush"</span>;
    <span class="hljs-comment">// "Switch" button handler. Switches mode from </span>
    <span class="hljs-comment">// brush to eraser and back</span>
    <span class="hljs-built_in">document</span>.querySelector(<span class="hljs-string">"#mode"</span>).addEventListener(<span class="hljs-string">"click"</span>, <span class="hljs-function">(<span class="hljs-params">event</span>) =&gt;</span> {
        <span class="hljs-keyword">if</span> (mode === <span class="hljs-string">"brush"</span>) {
            mode = <span class="hljs-string">"eraser"</span>;
            event.target.innerHTML = <span class="hljs-string">"Switch to brush"</span>;
        } <span class="hljs-keyword">else</span> {
            mode = <span class="hljs-string">"brush"</span>;
            event.target.innerHTML = <span class="hljs-string">"Switch to eraser"</span>;
        }
    });
    <span class="hljs-comment">// Digits canvases mouse move handler.</span>
    <span class="hljs-comment">// If mouse button pressed while user moves the mouse</span>
    <span class="hljs-comment">// on canvas, it draws circles in cursor position.</span>
    <span class="hljs-comment">// If mode="brush" then circles are black, otherwise</span>
    <span class="hljs-comment">// they are white</span>
    <span class="hljs-built_in">document</span>.querySelectorAll(<span class="hljs-string">"canvas"</span>).forEach(<span class="hljs-function"><span class="hljs-params">item</span> =&gt;</span> {
        <span class="hljs-keyword">const</span> ctx = item.getContext(<span class="hljs-string">"2d"</span>);
        ctx.fillStyle=<span class="hljs-string">"#FFFFFF"</span>;
        ctx.fillRect(<span class="hljs-number">0</span>,<span class="hljs-number">0</span>,<span class="hljs-number">50</span>,<span class="hljs-number">50</span>);
        item.addEventListener(<span class="hljs-string">"mousemove"</span>, <span class="hljs-function">(<span class="hljs-params">event</span>) =&gt;</span> {
            <span class="hljs-keyword">if</span> (event.buttons) {
                <span class="hljs-comment">// ctx is this canvas's 2D context, captured from the enclosing callback</span>
                <span class="hljs-keyword">if</span> (mode === <span class="hljs-string">"brush"</span>) {
                    ctx.fillStyle = <span class="hljs-string">"#000000"</span>;         
                } <span class="hljs-keyword">else</span> {
                    ctx.fillStyle = <span class="hljs-string">"#FFFFFF"</span>;         
                }
                ctx.beginPath();               
                ctx.arc(event.offsetX<span class="hljs-number">-1</span>,event.offsetY<span class="hljs-number">-1</span>,<span class="hljs-number">2</span>,<span class="hljs-number">0</span>, <span class="hljs-number">2</span> * <span class="hljs-built_in">Math</span>.PI);
                ctx.fill();   
            }
        })
    })
    <span class="hljs-comment">// "Recognise" button handler. Captures the</span>
    <span class="hljs-comment">// content of each digit canvas as a PNG blob,</span>
    <span class="hljs-comment">// wraps each blob in a File, and posts all</span>
    <span class="hljs-comment">// files to the backend as a multipart form</span>
    <span class="hljs-built_in">document</span>.querySelector(<span class="hljs-string">"#recognise"</span>).addEventListener(<span class="hljs-string">"click"</span>, <span class="hljs-keyword">async</span>() =&gt; {
        <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">new</span> FormData();
        <span class="hljs-keyword">const</span> canvases = <span class="hljs-built_in">document</span>.querySelectorAll(<span class="hljs-string">"canvas"</span>);
        <span class="hljs-keyword">const</span> getPng = <span class="hljs-function">(<span class="hljs-params">canvas</span>) =&gt;</span> {
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Promise</span>(<span class="hljs-function"><span class="hljs-params">resolve</span> =&gt;</span> {
                canvas.toBlob(<span class="hljs-function"><span class="hljs-params">png</span> =&gt;</span> {
                    resolve(png)
                })
            })
        }
        <span class="hljs-keyword">let</span> index = <span class="hljs-number">0</span>
        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> canvas <span class="hljs-keyword">of</span> canvases) {
            <span class="hljs-keyword">const</span> png = <span class="hljs-keyword">await</span> getPng(canvas);
            data.append((++index)+<span class="hljs-string">".png"</span>,<span class="hljs-keyword">new</span> File([png],index+<span class="hljs-string">".png"</span>));
        }
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"http://localhost:8080/api/recognize"</span>, {
            <span class="hljs-attr">body</span>: data,
            <span class="hljs-attr">method</span>: <span class="hljs-string">"POST"</span>
        })
        <span class="hljs-built_in">document</span>.querySelector(<span class="hljs-string">"#result"</span>).value = <span class="hljs-keyword">await</span> response.text();
    })

</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">style</span>&gt;</span><span class="css">
    <span class="hljs-selector-tag">body</span> {
        <span class="hljs-attribute">display</span>:flex;
        <span class="hljs-attribute">flex-direction</span>: column;
        <span class="hljs-attribute">justify-content</span>: flex-start;
        <span class="hljs-attribute">align-items</span>: flex-start;
    }
    <span class="hljs-selector-tag">canvas</span> {
        <span class="hljs-attribute">border-width</span>:<span class="hljs-number">1px</span>;
        <span class="hljs-attribute">border-color</span>:black;
        <span class="hljs-attribute">border-style</span>: solid;
        <span class="hljs-attribute">margin-right</span>:<span class="hljs-number">5px</span>;
        <span class="hljs-attribute">cursor</span>:crosshair;
    }
    <span class="hljs-selector-class">.digits</span> {
        <span class="hljs-attribute">display</span>:flex;
        <span class="hljs-attribute">flex-direction</span>: row;
        <span class="hljs-attribute">align-items</span>: center;
        <span class="hljs-attribute">justify-content</span>: flex-start;
    }
    <span class="hljs-selector-class">.digits</span> <span class="hljs-selector-tag">strong</span> {
        <span class="hljs-attribute">font-size</span>: <span class="hljs-number">72px</span>;
        <span class="hljs-attribute">margin</span>:<span class="hljs-number">10px</span>;
    }
    <span class="hljs-selector-class">.buttons</span> {
        <span class="hljs-attribute">display</span>:flex;
        <span class="hljs-attribute">flex-direction</span>: column;
        <span class="hljs-attribute">justify-content</span>: flex-start;
        <span class="hljs-attribute">align-items</span>: center;
    }
    <span class="hljs-selector-tag">button</span> {
        <span class="hljs-attribute">width</span>:<span class="hljs-number">100px</span>;
        <span class="hljs-attribute">margin-bottom</span>:<span class="hljs-number">5px</span>;
        <span class="hljs-attribute">margin-right</span>:<span class="hljs-number">10px</span>;
    }
    <span class="hljs-selector-class">.result</span> {
        <span class="hljs-attribute">margin-top</span>:<span class="hljs-number">10px</span>;
        <span class="hljs-attribute">display</span>:flex;
        <span class="hljs-attribute">flex-direction</span>: row;
        <span class="hljs-attribute">align-items</span>: flex-start;
        <span class="hljs-attribute">justify-content</span>: flex-start;
    }
    <span class="hljs-selector-tag">input</span> {
        <span class="hljs-attribute">margin-left</span>:<span class="hljs-number">10px</span>;
    }
</span><span class="hljs-tag">&lt;/<span class="hljs-name">style</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<p>The HTML part of this code contains 11 <a target="_blank" href="https://www.w3schools.com/html/html5_canvas.asp">HTML5 canvas</a> elements that display the boxes where you can draw. Each box is 50x50 pixels, and the script fills it with white when the page loads. The HTML also contains the "Switch to ..." and "Recognise" buttons and the "Result" input field.</p>
<p>The JavaScript part defines the "mode" global variable, which is equal to "brush" by default. When the user presses the "Switch to ..." button, it changes the mode to the "eraser". If they press it again, it switches back to the "brush".</p>
<p>Next, the JavaScript code defines "mousemove" event handlers for all canvas boxes. If the user holds down a mouse button in brush mode and moves the mouse over a box, the handler draws small black circles at the cursor position. This is how the user draws digits. In eraser mode, it draws white circles instead, which erases the black marks.</p>
<p>Finally, the script defines the "Recognise" button click handler. When the user clicks this button, the handler collects the 11 digit images from the <code>canvas</code> elements and converts them to PNG-encoded <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Blob">Blob</a> objects.</p>
<p>Then it creates a POST request, adds the 11 images to it as files named 1.png, 2.png, and so on, and sends it to the <code>/api/recognize</code> endpoint of the backend service on port 8080 of localhost (which you'll create in the next section).</p>
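<p>Because the backend will look up the uploaded files by these exact field names, it can be worth checking that all of them arrived before processing. Here is a minimal Julia sketch of that check. It uses a plain <code>Dict</code> as a hypothetical stand-in for the parsed request, and <code>expected_fields</code> and <code>missing_fields</code> are illustrative names, not Genie functions.</p>
<pre><code class="lang-julia"># Field names the frontend uses: "1.png", "2.png", ..., "11.png"
expected_fields(n) = ["$i.png" for i in 1:n]

# Return the expected field names that are absent from `payload`
# (any dictionary keyed by field name)
function missing_fields(payload::AbstractDict, n::Integer = 11)
    return [name for name in expected_fields(n) if !haskey(payload, name)]
end

# Hypothetical payload in which the last box was not sent
payload = Dict{String, Vector{UInt8}}()
for i in 1:10
    payload["$i.png"] = UInt8[]
end
println(missing_fields(payload))   # ["11.png"]
</code></pre>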
<p>The backend should receive these images, recognise digits in them, and return the recognition result as a string. This string will be displayed in the "Result" input field.</p>
<p>Lastly, I defined some CSS to apply basic styles to this page. You can modify them as you want. Now, let's move to the most interesting part – the digits recognition backend.</p>
<h2 id="how-to-create-a-backend">How to create a backend</h2>

<p>As a modern and mature programming language, Julia has many libraries and frameworks for different tasks, and web frameworks are no exception. We will use the <a target="_blank" href="https://genieframework.com/">Genie.jl</a> framework, which is similar to Express in Node.js or Flask in Python.</p>
<p>With Genie.jl you can run a basic web service in two lines of code:</p>
<pre><code class="lang-julia"><span class="hljs-keyword">using</span> Genie
up(<span class="hljs-number">8080</span>, async=<span class="hljs-literal">false</span>)
</code></pre>
<p>This starts a web server on port 8080 of localhost. With <code>async=false</code>, the call blocks, so the script keeps running and serving requests until you stop it.</p>
<p>Using any text editor, for example VSCode with the <a target="_blank" href="https://www.julia-vscode.org/">Julia extension</a>, create a new Julia file such as <code>digits.jl</code> in the same folder as <code>index.html</code>. This is where you'll write the rest of the code.</p>
<p>This web service will have two endpoints:</p>
<ul>
<li><strong><code>/</code></strong> to display the index.html web page that you created before.</li>
<li><strong><code>/api/recognize</code></strong> to receive POST requests with the images of digits, recognize them, and return a string with recognized numbers.</li>
</ul>
<p>Like most web frameworks, Genie.jl uses routes to receive and process HTTP requests. This application will have two routes:</p>
<pre><code class="lang-julia"><span class="hljs-keyword">using</span> Genie, Genie.Router, Genie.Requests

route(<span class="hljs-string">"/"</span>) <span class="hljs-keyword">do</span> 
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">String</span>(read(<span class="hljs-string">"index.html"</span>))
<span class="hljs-keyword">end</span>

route(<span class="hljs-string">"/api/recognize"</span>, method=POST) <span class="hljs-keyword">do</span>
    result = <span class="hljs-string">""</span>
    <span class="hljs-comment"># TODO: in a loop, extract each image </span>
    <span class="hljs-comment"># from POST request body, send it to </span>
    <span class="hljs-comment"># the digit recognition function, </span>
    <span class="hljs-comment"># receive recognized digit and add </span>
    <span class="hljs-comment"># it to the result</span>
    <span class="hljs-keyword">return</span> result
<span class="hljs-keyword">end</span>

up(<span class="hljs-number">8080</span>, async=<span class="hljs-literal">false</span>)
</code></pre>
<p>To work with routes and requests, you need to import two additional subpackages: <code>Genie.Router</code> and <code>Genie.Requests</code>.</p>
<p>The first route just returns the content of the <code>index.html</code> file.</p>
<p>The second route processes the POST requests to the <code>/api/recognize</code> endpoint. This is how you can define it:</p>
<pre><code class="lang-julia"><span class="hljs-keyword">using</span> Images
route(<span class="hljs-string">"/api/recognize"</span>, method=POST) <span class="hljs-keyword">do</span>
    result = <span class="hljs-string">""</span>
    files = filespayload();   
    <span class="hljs-keyword">for</span> index <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:<span class="hljs-number">11</span>
        file = files[<span class="hljs-string">"<span class="hljs-variable">$index</span>.png"</span>]
        img = load(<span class="hljs-built_in">IOBuffer</span>(file.data))
        result *= recognizeDigit(img)        
    <span class="hljs-keyword">end</span>    
    <span class="hljs-keyword">return</span> result
<span class="hljs-keyword">end</span>
</code></pre>
<p>To load the received files as images, we use the Images.jl package, imported on the first line.</p>
<p>Then, the <a target="_blank" href="https://github.com/GenieFramework/Genie.jl/blob/7eb45c9ec32f0e4659abb08559b0b2729451421a/src/Requests.jl#L50"><code>filespayload()</code></a> function extracts all files from the received request.</p>
<p>Then, we assume that the request contains 11 files and process them in a loop. The data of each file arrives as an array of bytes, but the <a target="_blank" href="https://juliaimages.org/stable/function_reference/#FileIO.load"><code>load</code></a> function expects a file name or an IO stream. That is why <a target="_blank" href="https://docs.julialang.org/en/v1/base/io-network/#Base.IOBuffer"><code>IOBuffer</code></a> wraps the byte array in an in-memory stream that <code>load</code> can read.</p>
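<p>To see what <code>IOBuffer</code> does, this sketch wraps a byte vector and reads it back like a file. The eight bytes are the standard PNG signature that starts every PNG file, used here as a stand-in for <code>file.data</code>. As far as I know, it is leading "magic" bytes like these that let FileIO recognize the stream as a PNG image.</p>
<pre><code class="lang-julia"># The first 8 bytes of every PNG file (the PNG signature), standing in
# for the `file.data` byte vector received in the request
bytes = UInt8[0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]

io = IOBuffer(bytes)           # wrap the bytes in an in-memory IO stream
println(io isa IO)             # true: functions that accept an IO can read it
header = read(io, 8)           # read bytes just like from a file
println(header == bytes)       # true
println(String(header[2:4]))   # PNG
</code></pre>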
<p>Then, the loaded image is passed to the <code>recognizeDigit</code> function, which you'll write next. It receives the image, recognizes it using the trained model, and returns the recognized digit as a string. This digit is appended to the <code>result</code> string. Finally, the result with 11 recognized digits is returned to the web page.</p>
<p>Before writing the <code>recognizeDigit</code> function, make sure you've copied the saved model file <code>digits.bson</code> to the folder with your backend code.</p>
<p>Also, it's important to understand that you can't pass the input image to the model as is: it has a size of 50x50, and it shows a black digit on a white background.</p>
<p>A model trained on 28x28 images can't be applied directly to images of other sizes. In this network, a 50x50 input would reach <code>Flux.flatten</code> as a 9x9x16 feature map, giving 1296 features, while the first dense layer expects exactly 256.</p>
<p>Also, the MNIST images the model was trained on show white digits on a black background, so the model works poorly on colored images and on images with black digits on a white background.</p>
<p>So, before you send an image to the model for recognition, you need to preprocess it using the following steps:</p>
<ul>
<li>Convert the image to grayscale</li>
<li>Invert the colors</li>
<li>Resize it to 28x28</li>
</ul>
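<p>The steps above can be sketched on a plain Julia matrix. This is a simplified illustration under a few assumptions: the hypothetical <code>canvas</code> array is already grayscale (so step 1 is skipped), and the resize uses nearest-neighbor sampling through an illustrative <code>nearest_resize</code> helper, whereas <code>imresize</code> from Images.jl interpolates. The last step adds the channel and batch dimensions the model expects.</p>
<pre><code class="lang-julia"># Hypothetical stand-in for a decoded 50x50 canvas, already grayscale:
# 1.0 = white background, 0.0 = black ink
canvas = ones(Float32, 50, 50)
canvas[10:40, 24:27] .= 0          # a vertical black stroke, like a "1"

# Invert the colors: white digit (1.0) on black background (0.0), as in MNIST
inverted = 1 .- canvas

# Simplified nearest-neighbor resize. Images.jl's imresize interpolates
# instead; this only illustrates how each output pixel samples the input.
function nearest_resize(img::AbstractMatrix, h::Integer, w::Integer)
    H, W = size(img)
    rows = [clamp(floor(Int, (i - 0.5) * H / h) + 1, 1, H) for i in 1:h]
    cols = [clamp(floor(Int, (j - 0.5) * W / w) + 1, 1, W) for j in 1:w]
    return img[rows, cols]
end
small = nearest_resize(inverted, 28, 28)

# Add channel and batch dimensions: shape (28, 28, 1, 1)
input = reshape(small, 28, 28, 1, 1)
println(size(input))       # (28, 28, 1, 1)
println(extrema(input))    # (0.0f0, 1.0f0)
</code></pre>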
<p>Now you are ready to implement the digits recognition function:</p>
<pre><code class="lang-julia"><span class="hljs-keyword">using</span> Flux, MLUtils, BSON
<span class="hljs-comment"># Load the trained model once, when the server starts,</span>
<span class="hljs-comment"># instead of on every call to recognizeDigit</span>
BSON.<span class="hljs-meta">@load</span> <span class="hljs-string">"digits.bson"</span> model

<span class="hljs-keyword">function</span> recognizeDigit(img)
    <span class="hljs-comment"># Convert image to grayscale</span>
    img = Gray.(img)
    <span class="hljs-comment"># Invert each pixel color</span>
    img = (x-&gt;Gray(<span class="hljs-number">1</span>)-x.val).(img)
    <span class="hljs-comment"># resize image to 28x28 pixels</span>
    img = imresize(img,(<span class="hljs-number">28</span>,<span class="hljs-number">28</span>))
    <span class="hljs-comment"># Get matrix of image</span>
    digit_data = <span class="hljs-built_in">Float32</span>.(channelview(img))
    <span class="hljs-comment"># predict the digit (get probabilities)</span>
    probs = model(cat(digit_data,dims=<span class="hljs-number">4</span>))
    <span class="hljs-comment"># return the digit with the largest </span>
    <span class="hljs-comment"># probability, converted to a string</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">"<span class="hljs-subst">$(argmax(probs)</span>[1]-1)"</span>
<span class="hljs-keyword">end</span>
</code></pre>
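<p>The last line of <code>recognizeDigit</code> relies on <code>argmax</code> returning a <code>CartesianIndex</code> for a matrix, so <code>[1]</code> picks the winning row, and subtracting 1 maps rows 1 to 10 onto digits 0 to 9. This sketch isolates that step together with the route's string building, using hypothetical probability matrices in place of real model output so it runs without Flux. <code>digit_string</code> is an illustrative helper name.</p>
<pre><code class="lang-julia"># Turn a 10x1 probability matrix (row 1 = digit 0, ..., row 10 = digit 9)
# into a one-character string, like the last line of recognizeDigit.
# argmax on a matrix returns a CartesianIndex, so [1] picks the row.
digit_string(probs::AbstractMatrix) = string(argmax(probs)[1] - 1)

# Hypothetical model outputs for three boxes
outputs = [zeros(Float32, 10, 1) for _ in 1:3]
outputs[1][6, 1] = 0.9    # row 6 is digit 5
outputs[2][1, 1] = 0.8    # row 1 is digit 0
outputs[3][10, 1] = 0.7   # row 10 is digit 9

# Concatenate the digits, as the route does with result *= ...
result = join(digit_string(p) for p in outputs)
println(result)   # 509
</code></pre>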
<p>When all this is done, you are almost ready to run the app. Before doing that, ensure that all required packages are installed. Run the <code>julia</code> REPL in a project folder. Then run the following code line by line, to install all packages mentioned in the <code>using</code> lines:</p>
<pre><code class="lang-julia"><span class="hljs-keyword">using</span> Pkg
Pkg.add(<span class="hljs-string">"Genie"</span>)
Pkg.add(<span class="hljs-string">"Images"</span>)
Pkg.add(<span class="hljs-string">"Flux"</span>)
Pkg.add(<span class="hljs-string">"MLUtils"</span>)
Pkg.add(<span class="hljs-string">"BSON"</span>)
</code></pre>
<p>Then exit the REPL using the <code>exit()</code> command.</p>
<p>Now you can run the app. To do that, either execute the <code>julia digits.jl</code> command from the terminal or press Ctrl+F5 in VSCode. </p>
<p>Then, go to <code>http://localhost:8080</code> in a web browser, draw the digits, press the "Recognise" button, and in a few moments you will see the recognized number as text in the "Result" field.</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/e5ScpCggVbs" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<h2 id="conclusion">Conclusion</h2>

<p>In this tutorial, I demonstrated how to create and train both feedforward and convolutional neural networks using Julia. You also learned how to export them and use them in a web application.</p>
<p>In addition, I tried to show that you should not reinvent the wheel when creating neural networks.</p>
<p>When solving real-life problems, you usually don't need to build neural network architectures from scratch. Most of them have already been created by data scientists and enthusiasts around the world, and in practice you will mostly reuse them.</p>
<p>You'll just need to find a suitable architecture and either use it as is or change the last few layers to adjust the outputs to your needs.</p>
<p>For example, you can search <a target="_blank" href="https://huggingface.co/models">this collection</a>, where models are classified by problem type. Even though many of them were not created with Julia, you can recreate them with Flux.jl by following their descriptions.</p>
<p>The way we created and trained our neural network is neither the best nor the only possible one. In some places I may have oversimplified things, because I wanted to explain all this as simply as possible.</p>
<p>But if you've understood the examples here, you can learn and reuse the following more advanced Julia solutions of the handwritten digits recognition task:</p>
<ul>
<li><a target="_blank" href="https://fluxml.ai/Flux.jl/stable/tutorials/2021-01-26-mlp/">Tutorial: Simple Multi-Layer Perceptron</a> </li>
<li><a target="_blank" href="https://github.com/FluxML/model-zoo/tree/master/vision/conv_mnist">MNIST example in the Julia model-zoo</a></li>
</ul>
<p>You can find the source code for this article, including the Jupyter Notebook and the web service, in <a target="_blank" href="https://github.com/AndreyGermanov/phones_reader">this repository</a>.</p>
<p>Have fun coding and never stop learning!</p>
<p>You can find me on <a target="_blank" href="https://www.linkedin.com/in/andrey-germanov-dev/">LinkedIn</a>, <a target="_blank" href="https://twitter.com/GermanovDev">Twitter</a>, and <a target="_blank" href="https://www.facebook.com/AndreyGermanovDev">Facebook</a> to be the first to know about new articles like this one and other software development news.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use TensorFlow for Deep Learning – Basics for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ TensorFlow is a library that helps engineers build and train deep learning models. It provides all the tools we need to create neural networks. We can use TensorFlow to train simple to complex neural networks using large sets of data. TensorFlow is u... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/tensorflow-basics/</link>
                <guid isPermaLink="false">66d0362131fbfb6c3390f1f3</guid>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tensor ]]>
                    </category>
                
                    <category>
                        <![CDATA[ TensorFlow ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Tue, 14 Feb 2023 23:46:51 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/02/tensorflow.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>TensorFlow is a library that helps engineers build and train deep learning models. It provides all the tools we need to create neural networks.</p>
<p>We can use TensorFlow to train simple to complex neural networks using large sets of data.</p>
<p>TensorFlow is used in a variety of applications, from image and speech recognition to natural language processing and robotics. TensorFlow enables us to quickly and easily build powerful AI models with high accuracy and performance.</p>
<p>TensorFlow also works with GPUs and TPUs, specialized processors that speed up the large numerical computations involved in training neural networks. Running TensorFlow on these chips makes it much faster, which is helpful when you have a lot of data to work with.</p>
<p>In this article, we will learn about tensors and how to work with tensors using TensorFlow. Let’s dive right in.</p>
<h2 id="heading-what-is-a-tensor">What is a Tensor?</h2>
<p>A simple explanation would be that a tensor is a multi-dimensional array.</p>
<p><img src="https://miro.medium.com/max/1050/1*rLcM-j8b61Xlfk81k_exKw.png" alt="Image" width="600" height="400" loading="lazy">
<em>Scalar, Vector, Matrix and Tensor</em></p>
<p>A scalar is a single number. A vector is an array of numbers. A matrix is a 2-dimensional array. A tensor is an n-dimensional array.</p>
<p>In TensorFlow, everything can be treated as a tensor, including a scalar. A scalar is a tensor of dimension 0, a vector is a tensor of dimension 1, and a matrix is a tensor of dimension 2.</p>
<p>This is useful because a single data structure can represent anything from a single number to a large batch of images, so TensorFlow can take almost any type of data and feed it to machine learning models.</p>
<h2 id="heading-what-is-tensorflow">What is TensorFlow?</h2>
<p>TensorFlow is an open-source software library for building neural networks. It was built by the Google Brain team and is one of the most popular deep learning libraries today.</p>
<p>You can use TensorFlow to build AI models including image and speech recognition, natural language processing, and predictive modeling.</p>
<p><img src="https://miro.medium.com/max/1050/1*mPUeOmKYoWvPcZFjMdsiUQ.gif" alt="Image" width="600" height="400" loading="lazy">
<em>Classification neural network</em></p>
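<p>TensorFlow can represent computations as a dataflow graph, in which nodes are operations and edges are the tensors that flow between them. (In TensorFlow 2, code runs eagerly by default, and tf.function can turn Python code into such a graph.) Put simply, this design makes it easier to build complex machine learning models.</p>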
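<p>To make the idea of a dataflow graph concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented. The <code>Node</code> class and the <code>constant</code> and <code>evaluate</code> functions are hypothetical names chosen for illustration. Each node is an operation, its inputs are the graph's edges, and evaluating the output node pulls values through the graph.</p>

```python
import operator


class Node:
    """A hypothetical graph node: an operation plus the nodes feeding into it."""

    def __init__(self, op, *inputs, value=None):
        self.op = op          # a callable, or None for a constant node
        self.inputs = inputs  # upstream nodes (the graph's edges)
        self.value = value    # the stored value of a constant node


def constant(value):
    """Create a node that simply holds a value."""
    return Node(None, value=value)


def evaluate(node, cache=None):
    """Recursively compute a node's value, evaluating each node only once."""
    if cache is None:
        cache = {}
    if id(node) in cache:
        return cache[id(node)]
    if node.op is None:
        result = node.value
    else:
        args = [evaluate(parent, cache) for parent in node.inputs]
        result = node.op(*args)
    cache[id(node)] = result
    return result


# Build the graph for (a + b) * b without computing anything yet.
a = constant(3)
b = constant(4)
total = Node(operator.add, a, b)
product = Node(operator.mul, total, b)

# Only now are the values pulled through the graph.
print(evaluate(product))  # 28
```

<p>Describing the computation first and running it afterwards is part of what lets a framework inspect the whole graph before executing it, for example to decide where each operation should run.</p>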
<p>TensorFlow takes care of a lot of work behind the scenes which makes it useful while building and training any type of machine learning model. TensorFlow also manages the computation, including parallelization and optimization, on the user’s behalf.</p>
<h2 id="heading-tensorflow-and-keras">TensorFlow and Keras</h2>
<p><img src="https://miro.medium.com/max/1050/1*X7QA_c8KHk7nD0tywv-OVg.png" alt="Image" width="600" height="400" loading="lazy">
<em>TensorFlow and Keras</em></p>
<p>TensorFlow has a high-level API called Keras. Keras was a standalone project which is now available within the TensorFlow library. Keras makes it easy to define and train models while TensorFlow provides more control over the computation.</p>
<p>TensorFlow supports a wide range of hardware, including CPUs, GPUs, and TPUs. TPUs (Tensor Processing Units) are chips built specifically to work with tensors and TensorFlow.</p>
<p>We can also run TensorFlow on mobile and IoT devices using TensorFlow Lite. TensorFlow also has a large community of developers and is regularly updated with new features and capabilities.</p>
<h2 id="heading-how-to-build-tensors-with-tensorflow">How to Build Tensors with TensorFlow</h2>
<p>Let’s start writing some code. If you don't have TensorFlow installed, you can use a <a target="_blank" href="https://colab.research.google.com/">Google Colab notebook</a> to follow along.</p>
<p>Let’s start by importing TensorFlow and printing out the version.</p>
<pre><code><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
print(tf.__version__)
</code></pre><pre><code>OUTPUT:
<span class="hljs-number">2.9</span><span class="hljs-number">.2</span>
</code></pre><p>Let’s first create a scalar using tf.constant, which creates a tensor whose value can’t be changed. (To create a tensor whose values can change, we use tf.Variable, which we’ll look at later.) We will then print the scalar and check its number of dimensions using the ndim property. It will be zero because a scalar is a single value.</p>
<pre><code>scalar = tf.constant(<span class="hljs-number">7</span>)
print(scalar)
print(scalar.ndim)
</code></pre><pre><code>OUTPUT:
tf.Tensor(<span class="hljs-number">7</span>, shape=(), dtype=int32)
<span class="hljs-number">0</span>
</code></pre><p>Now let’s create a vector and print its dimensions. You can see that the dimension is 1.</p>
<pre><code>vector = tf.constant([<span class="hljs-number">10</span>,<span class="hljs-number">10</span>])
print(vector)
print(vector.ndim)
</code></pre><pre><code>OUTPUT:
tf.Tensor([<span class="hljs-number">10</span> <span class="hljs-number">10</span>], shape=(<span class="hljs-number">2</span>,), dtype=int32)
<span class="hljs-number">1</span>
</code></pre><p>Now let’s try creating a matrix and printing its dimensions.</p>
<pre><code>matrix = tf.constant([
    [<span class="hljs-number">10</span>,<span class="hljs-number">11</span>],
    [<span class="hljs-number">12</span>,<span class="hljs-number">13</span>]
])
print(matrix)
print(matrix.ndim)
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[<span class="hljs-number">10</span> <span class="hljs-number">11</span>]
 [<span class="hljs-number">12</span> <span class="hljs-number">13</span>]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), dtype=int32)
<span class="hljs-number">2</span>
</code></pre><p>You will see that the dimension is now 2. You can also see that the shape of the matrix is 2 by 2.</p>
<p>Shapes and dimensions matter when working with TensorFlow because we will often change them when preparing data to train neural networks.</p>
<p>So far, TensorFlow has inferred the int32 data type for our integer values. What if we want to create a tensor with a different data type?</p>
<p>tf.constant provides a dtype argument for this. Let’s create a 3-dimensional tensor with float32 as the data type.</p>
<pre><code>tensor_1 = tf.constant([
    [
        [<span class="hljs-number">1</span>,<span class="hljs-number">2</span>,<span class="hljs-number">3</span>]
    ],
    [
        [<span class="hljs-number">4</span>,<span class="hljs-number">5</span>,<span class="hljs-number">6</span>]
    ],
    [
        [<span class="hljs-number">7</span>,<span class="hljs-number">8</span>,<span class="hljs-number">9</span>]
    ]
],dtype=<span class="hljs-string">'float32'</span>)
print(tensor_1)
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[[<span class="hljs-number">1.</span> <span class="hljs-number">2.</span> <span class="hljs-number">3.</span>]]

 [[<span class="hljs-number">4.</span> <span class="hljs-number">5.</span> <span class="hljs-number">6.</span>]]

 [[<span class="hljs-number">7.</span> <span class="hljs-number">8.</span> <span class="hljs-number">9.</span>]]], shape=(<span class="hljs-number">3</span>, <span class="hljs-number">1</span>, <span class="hljs-number">3</span>), dtype=float32)
</code></pre><p>As the output shows, the values are now stored as float32. Next, let’s create the same 3-dimensional tensor without the dtype argument, and also print its number of dimensions.</p>
<pre><code>tensor = tf.constant([
    [
        [<span class="hljs-number">1</span>,<span class="hljs-number">2</span>,<span class="hljs-number">3</span>]
    ],
    [
        [<span class="hljs-number">4</span>,<span class="hljs-number">5</span>,<span class="hljs-number">6</span>]
    ],
    [
        [<span class="hljs-number">7</span>,<span class="hljs-number">8</span>,<span class="hljs-number">9</span>]
    ]
])
print(tensor)
print(tensor.ndim)
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[[<span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span>]]
 [[<span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span>]]
 [[<span class="hljs-number">7</span> <span class="hljs-number">8</span> <span class="hljs-number">9</span>]]], shape=(<span class="hljs-number">3</span>, <span class="hljs-number">1</span>, <span class="hljs-number">3</span>), dtype=int32)
<span class="hljs-number">3</span>
</code></pre><p>Now we have a tensor of dimension 3 and shape 3 by 1 by 3. Because we didn’t pass a dtype this time, TensorFlow chose int32. This is still a very small tensor. In real-world scenarios, we will deal with tensors that have more dimensions and much larger shapes.</p>
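<p>The shape TensorFlow reports follows directly from how deeply the lists are nested and how long each level is. As a rough illustration, here is a plain-Python sketch. It is not TensorFlow’s own code, and <code>infer_shape</code> is a hypothetical helper. It reads the shape off level by level. The number of dimensions is the length of the shape, and the total number of values is the product of its entries.</p>

```python
import math


def infer_shape(data):
    """Return the shape of a regular nested list, e.g. (3, 1, 3)."""
    shape = []
    level = data
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        # Assumes every sub-list at the same depth has the same length.
        level = level[0]
    return tuple(shape)


tensor_data = [
    [[1, 2, 3]],
    [[4, 5, 6]],
    [[7, 8, 9]],
]

shape = infer_shape(tensor_data)
print(shape)             # (3, 1, 3)
print(len(shape))        # 3, the number of dimensions (ndim)
print(math.prod(shape))  # 9, the total number of values (size)

print(infer_shape(7))         # () for a scalar
print(infer_shape([10, 10]))  # (2,) for a vector
```

<p>The sketch only handles regular (rectangular) nesting, which is the kind of input used throughout this article.</p>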
<p>Now let’s look at how to create a variable tensor. We won’t be using variable tensors very often compared to constant tensors, but it is good to know that we have an option.</p>
<p>We will use tf.Variable to create a variable tensor. The difference between a constant tensor and a variable tensor is that you can change the values stored in a variable tensor, but you can’t change the values of a constant tensor. Let’s create a variable tensor and print it.</p>
<pre><code>var_tensor = tf.Variable([
    [
        [<span class="hljs-number">1</span>,<span class="hljs-number">2</span>,<span class="hljs-number">3</span>]
    ],
    [
        [<span class="hljs-number">4</span>,<span class="hljs-number">5</span>,<span class="hljs-number">6</span>]
    ],
    [
        [<span class="hljs-number">7</span>,<span class="hljs-number">8</span>,<span class="hljs-number">9</span>]
    ]
])
print(var_tensor)
</code></pre><pre><code>OUTPUT:
&lt;tf.Variable 'Variable:0' shape=(3, 1, 3) dtype=int32, numpy=
array([[[1, 2, 3]],
       [[4, 5, 6]],
       [[7, 8, 9]]], dtype=int32)&gt;
</code></pre><h2 id="heading-how-to-generate-and-load-tensors">How to Generate and Load Tensors</h2>
<p>In most cases, you won’t type the values of a tensor by hand. Instead, you will load a dataset, convert data from other formats, such as NumPy arrays, into tensors, or generate tensors programmatically. Let’s start with generating them.</p>
<p>Let’s create a tensor with random values. Two common ways to do this are to draw the values from a normal distribution or from a uniform distribution.</p>
<p><img src="https://miro.medium.com/max/1050/0*tRWkwBjuQvgi2rGG.png" alt="Image" width="600" height="400" loading="lazy">
<em>Normal distribution</em></p>
<p>The normal distribution has a bell-shaped probability curve. Most values fall close to the average (the mean), and values become rarer the farther they are from it. In other words, a value near the mean is more likely than one far away from it.</p>
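<p>You can check the "most values are close to the mean" property empirically. The sketch below uses Python’s standard <code>random</code> module rather than TensorFlow. It draws samples from a normal distribution with mean 0 and standard deviation 1, then counts how many land within one standard deviation of the mean. For a normal distribution, that fraction is about 68%.</p>

```python
import random

rng = random.Random(42)  # seeded so the result is reproducible
mean, std_dev = 0.0, 1.0
samples = [rng.gauss(mean, std_dev) for _ in range(100_000)]

# Count the samples that fall within one standard deviation of the mean.
within_one_std = sum(1 for x in samples if abs(x - mean) <= std_dev)
fraction = within_one_std / len(samples)
print(f"{fraction:.3f}")  # roughly 0.68
```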
<p><img src="https://miro.medium.com/max/1050/0*SOM1PR1htTNzuRMS.png" alt="Image" width="600" height="400" loading="lazy">
<em>Uniform distribution</em></p>
<p>The uniform distribution has a flat probability curve over a given range: every value within that range is equally likely to occur.</p>
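<p>A similar standard-library sketch shows the flat shape of the uniform distribution. If you split the range from 0 to 1 into equal-width bins, each bin receives roughly the same number of samples. As before, this uses Python’s <code>random</code> module, not TensorFlow.</p>

```python
import random

rng = random.Random(42)
num_bins = 5
counts = [0] * num_bins

for _ in range(100_000):
    value = rng.random()                # uniform in [0.0, 1.0)
    counts[int(value * num_bins)] += 1  # index of the equal-width bin

print(counts)  # each count is close to 100_000 / 5 = 20_000
```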
<p>Before we generate random values, you need to understand what a seed is. A seed is the starting value for a random number generator: the same seed always produces the same sequence of "random" numbers, so we can regenerate the same data as many times as we like. This is useful when we want to test our machine learning model against the same data after tweaking it.</p>
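<p>The effect of a seed is easy to see with Python’s standard <code>random</code> module, and the same idea applies to <code>tf.random.Generator.from_seed</code>. Two generators created from the same seed produce identical sequences, while a generator with a different seed produces a different one. The <code>draw</code> function is a hypothetical helper for this illustration.</p>

```python
import random


def draw(seed, n=5):
    """Return n uniform random numbers from a generator created with the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


first_run = draw(42)
second_run = draw(42)
other_seed = draw(7)

print(first_run == second_run)  # True: same seed, same numbers
print(first_run == other_seed)  # False: different seed, different numbers
```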
<p>Let’s create two tensors of random values. First, we create a random number generator from a seed.</p>
<pre><code>generator = tf.random.Generator.from_seed(<span class="hljs-number">42</span>)
</code></pre><p>Now we will draw one 3 by 2 tensor from a normal distribution and another 3 by 2 tensor from a uniform distribution.</p>
<pre><code>normal_tensor = generator.normal(shape=(<span class="hljs-number">3</span>,<span class="hljs-number">2</span>))
print(normal_tensor)
uniform_tensor = generator.uniform(shape=(<span class="hljs-number">3</span>,<span class="hljs-number">2</span>))
print(uniform_tensor)
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[<span class="hljs-number">-0.7565803</span>  <span class="hljs-number">-0.06854702</span>]
 [ <span class="hljs-number">0.07595026</span> <span class="hljs-number">-1.2573844</span> ]
 [<span class="hljs-number">-0.23193765</span> <span class="hljs-number">-1.8107855</span> ]], shape=(<span class="hljs-number">3</span>, <span class="hljs-number">2</span>), dtype=float32)
tf.Tensor(
[[<span class="hljs-number">0.7647915</span>  <span class="hljs-number">0.03845465</span>]
 [<span class="hljs-number">0.8506975</span>  <span class="hljs-number">0.20781887</span>]
 [<span class="hljs-number">0.711869</span>   <span class="hljs-number">0.8843919</span> ]], shape=(<span class="hljs-number">3</span>, <span class="hljs-number">2</span>), dtype=float32)
</code></pre><p>We now have two tensors: one filled with normally distributed random numbers and the other with uniformly distributed random numbers (by default, the uniform values fall between 0 and 1).</p>
<p>Next, we will create a tensor with zeros and ones. In TensorFlow, tensors filled with zeros or ones are often used as a starting point for creating other tensors. They can also be placeholders for inputs in a computational graph.</p>
<p>To create a tensor of zeros, we use the tf.zeros function with a shape as the input argument. To create a tensor of ones, we use tf.ones in the same way.</p>
<pre><code>zeros = tf.zeros(shape=(<span class="hljs-number">3</span>,<span class="hljs-number">2</span>))
print(zeros)
ones = tf.ones(shape=(<span class="hljs-number">3</span>,<span class="hljs-number">2</span>))
print(ones)
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[<span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
 [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
 [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]], shape=(<span class="hljs-number">3</span>, <span class="hljs-number">2</span>), dtype=float32)
tf.Tensor(
[[<span class="hljs-number">1.</span> <span class="hljs-number">1.</span>]
 [<span class="hljs-number">1.</span> <span class="hljs-number">1.</span>]
 [<span class="hljs-number">1.</span> <span class="hljs-number">1.</span>]], shape=(<span class="hljs-number">3</span>, <span class="hljs-number">2</span>), dtype=float32)
</code></pre><p>Now, let’s look at converting NumPy arrays into tensors. If you don’t know <a target="_blank" href="https://numpy.org/">what NumPy is</a>, it is a Python library for numerical computing. It helps us handle large datasets and perform a variety of computations on them.</p>
<p>Let’s import NumPy and create an array of the integers 1 to 24 using NumPy’s arange function.</p>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
numpy_arr = np.arange(<span class="hljs-number">1</span>,<span class="hljs-number">25</span>,dtype=np.int32)
</code></pre><p>Now we can create a tensor using the tf.constant function with the NumPy array as input. TensorFlow has built-in support for NumPy arrays, so we can pass the array directly, along with the shape we want.</p>
<pre><code>print(numpy_arr)
numpy_tensor = tf.constant(numpy_arr,shape=[<span class="hljs-number">2</span>,<span class="hljs-number">4</span>,<span class="hljs-number">3</span>])
print(numpy_tensor)
</code></pre><pre><code>OUTPUT:
[ <span class="hljs-number">1</span>  <span class="hljs-number">2</span>  <span class="hljs-number">3</span>  <span class="hljs-number">4</span>  <span class="hljs-number">5</span>  <span class="hljs-number">6</span>  <span class="hljs-number">7</span>  <span class="hljs-number">8</span>  <span class="hljs-number">9</span> <span class="hljs-number">10</span> <span class="hljs-number">11</span> <span class="hljs-number">12</span> <span class="hljs-number">13</span> <span class="hljs-number">14</span> <span class="hljs-number">15</span> <span class="hljs-number">16</span> <span class="hljs-number">17</span> <span class="hljs-number">18</span> <span class="hljs-number">19</span> <span class="hljs-number">20</span> <span class="hljs-number">21</span> <span class="hljs-number">22</span> <span class="hljs-number">23</span> <span class="hljs-number">24</span>]
tf.Tensor(
[[[ <span class="hljs-number">1</span>  <span class="hljs-number">2</span>  <span class="hljs-number">3</span>]
  [ <span class="hljs-number">4</span>  <span class="hljs-number">5</span>  <span class="hljs-number">6</span>]
  [ <span class="hljs-number">7</span>  <span class="hljs-number">8</span>  <span class="hljs-number">9</span>]
  [<span class="hljs-number">10</span> <span class="hljs-number">11</span> <span class="hljs-number">12</span>]]
 [[<span class="hljs-number">13</span> <span class="hljs-number">14</span> <span class="hljs-number">15</span>]
  [<span class="hljs-number">16</span> <span class="hljs-number">17</span> <span class="hljs-number">18</span>]
  [<span class="hljs-number">19</span> <span class="hljs-number">20</span> <span class="hljs-number">21</span>]
  [<span class="hljs-number">22</span> <span class="hljs-number">23</span> <span class="hljs-number">24</span>]]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">4</span>, <span class="hljs-number">3</span>), dtype=int32)
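</code></pre><p>You can see both the NumPy array and our tensor. The original NumPy array is one-dimensional, with 24 values, but our tensor has the shape 2x4x3. The values and their order stay the same; only their grouping changes, which works because 2 × 4 × 3 = 24. This is called reshaping a tensor, and we will do it often while training deep neural networks.</p>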
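<p>Reshaping does not change or reorder the values; it only regroups them. The plain-Python sketch below uses a hypothetical <code>reshape</code> helper, not <code>tf.reshape</code> itself. It fills the new shape in row-major order, which matches the output above, and it rejects shapes whose total size differs from the number of values.</p>

```python
import math


def reshape(flat, shape):
    """Regroup a flat list into nested lists of the given shape (row-major order)."""
    if math.prod(shape) != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} values into shape {shape}")
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]  # number of values in each top-level block
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


values = list(range(1, 25))  # the same values as np.arange(1, 25)
nested = reshape(values, (2, 4, 3))
print(nested[0])     # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(nested[1][3])  # [22, 23, 24]

try:
    reshape(values, (5, 5))
except ValueError as err:
    print(err)  # cannot reshape 24 values into shape (5, 5)
```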
<h2 id="heading-basic-operations-using-tensorflow">Basic Operations using TensorFlow</h2>
<p>We have learned how tensors are created in TensorFlow. Now let’s look at some basic operations using tensors.</p>
<p>We will start by getting some information about our tensors. Let’s create a 4D tensor of zeros with the shape 2x3x4x5.</p>
<pre><code>rank4_tensor = tf.zeros([<span class="hljs-number">2</span>,<span class="hljs-number">3</span>,<span class="hljs-number">4</span>,<span class="hljs-number">5</span>])
print(rank4_tensor)
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[[[<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]]
  [[<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]]
  [[<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]]]
 [[[<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]]
  [[<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]]
  [[<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]
   [<span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span> <span class="hljs-number">0.</span>]]]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>), dtype=float32)
</code></pre><p>We have created our rank 4 tensor. Now let's get some information about it: its size (the total number of values), its shape, and its number of dimensions.</p>
<p>We will use the tf.size function to get the size. The shape and ndim properties give us the shape and the number of dimensions of the tensor.</p>
<pre><code>print(<span class="hljs-string">"Size"</span>,tf.size(rank4_tensor))
print(<span class="hljs-string">"shape"</span>,rank4_tensor.shape)
print(<span class="hljs-string">"Dimension"</span>,rank4_tensor.ndim)
</code></pre><pre><code>OUTPUT: 

Size tf.Tensor(<span class="hljs-number">120</span>, shape=(), dtype=int32)
shape (<span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>)
Dimension <span class="hljs-number">4</span>
</code></pre><p>Notice that the size, 120, is the product of the entries in the shape (2 × 3 × 4 × 5). Now let’s look at some simple calculations. I will create a new basic tensor.</p>
<pre><code>basic_tensor = tf.constant([[<span class="hljs-number">10</span>,<span class="hljs-number">11</span>],[<span class="hljs-number">12</span>,<span class="hljs-number">13</span>]])
print(basic_tensor)
</code></pre><pre><code>OUTPUT: 

tf.Tensor(
[[<span class="hljs-number">10</span> <span class="hljs-number">11</span>]
 [<span class="hljs-number">12</span> <span class="hljs-number">13</span>]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), dtype=int32)
</code></pre><p>Let’s try some simple operations. When we combine a tensor with a single number using the basic arithmetic operators, the operation is applied to every value in the tensor.</p>
<pre><code>print(basic_tensor + <span class="hljs-number">10</span>)
print(basic_tensor - <span class="hljs-number">10</span>)
print(basic_tensor * <span class="hljs-number">10</span>)
print(basic_tensor / <span class="hljs-number">10</span>)
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[<span class="hljs-number">20</span> <span class="hljs-number">21</span>]
 [<span class="hljs-number">22</span> <span class="hljs-number">23</span>]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), dtype=int32)
tf.Tensor(
[[<span class="hljs-number">0</span> <span class="hljs-number">1</span>]
 [<span class="hljs-number">2</span> <span class="hljs-number">3</span>]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), dtype=int32)
tf.Tensor(
[[<span class="hljs-number">100</span> <span class="hljs-number">110</span>]
 [<span class="hljs-number">120</span> <span class="hljs-number">130</span>]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), dtype=int32)
tf.Tensor(
[[<span class="hljs-number">1.</span>  <span class="hljs-number">1.1</span>]
 [<span class="hljs-number">1.2</span> <span class="hljs-number">1.3</span>]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), dtype=float64)
</code></pre><p>Notice that dividing the int32 tensor by 10 produced a float64 tensor. Now let’s try matrix multiplication. I will create two simple tensors, tensor_011 and tensor_012.</p>
<pre><code>tensor_011 = tf.constant([[<span class="hljs-number">2</span>,<span class="hljs-number">2</span>],[<span class="hljs-number">4</span>,<span class="hljs-number">4</span>]])
tensor_012 = tf.constant([[<span class="hljs-number">2</span>,<span class="hljs-number">3</span>],[<span class="hljs-number">4</span>,<span class="hljs-number">5</span>]])
</code></pre><p>Keep in mind that in matrix multiplication, the inner dimensions should match. For example, a (3, 5) <em> (3, 5) multiplication won’t work but (3, 5) </em> (5, 3) will work.</p>
<p>The final shape of the resulting matrix will be its outer dimension. so, a 3x5 tensor multiplied by a 5x3 tensor will give us a 5x5 tensor. We will use the tf.matmul function to perform matrix multiplication.</p>
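<p>Before calling TensorFlow, here is a minimal pure-Python sketch of that rule. The <code>matmul</code> function is a hypothetical helper for illustration, not TensorFlow’s implementation. It checks that the inner dimensions match and builds a result with the outer dimensions. It also reproduces the product of <code>tensor_011</code> and <code>tensor_012</code>.</p>

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), giving an n x m matrix."""
    n, k = len(a), len(a[0])
    k_b, m = len(b), len(b[0])
    if k != k_b:
        raise ValueError(f"inner dimensions differ: ({n}, {k}) x ({k_b}, {m})")
    # Entry (i, j) is the dot product of row i of a and column j of b.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


tensor_011 = [[2, 2], [4, 4]]
tensor_012 = [[2, 3], [4, 5]]
print(matmul(tensor_011, tensor_012))  # [[12, 16], [24, 32]]

# A (3, 5) matrix times a (5, 3) matrix gives a (3, 3) result.
a = [[1] * 5 for _ in range(3)]
b = [[1] * 3 for _ in range(5)]
result = matmul(a, b)
print(len(result), len(result[0]))  # 3 3
```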
<pre><code>print(tf.matmul(tensor_011,tensor_012))
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[<span class="hljs-number">12</span> <span class="hljs-number">16</span>]
 [<span class="hljs-number">24</span> <span class="hljs-number">32</span>]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), dtype=int32)
</code></pre><p>Next, let’s look at reshaping and transposing a matrix. As we saw before, we will often use reshaping to change our matrix structure while training neural networks.</p>
<p>For example, an image pixel matrix of 28x28 will be converted into a 1-dimensional 784-pixel array for an image classification neural network.</p>
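<p>As a sketch of that flattening step, here it is with NumPy, whose <code>reshape</code> behaves like <code>tf.reshape</code> for this purpose. The 28x28 "image" and the batch of 32 images below are made-up data. Passing <code>-1</code> for one dimension lets the library infer it from the total size, which is handy when the batch size varies.</p>

```python
import numpy as np

# A made-up 28x28 grayscale "image" with pixel values 0..783.
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

flat = image.reshape(784)  # one row of 784 pixel values
print(flat.shape)          # (784,)

# A hypothetical batch of 32 images; -1 tells NumPy to infer the remaining dimension.
batch = np.zeros((32, 28, 28), dtype=np.float32)
flat_batch = batch.reshape(32, -1)
print(flat_batch.shape)    # (32, 784)

# Flattening keeps row-major order: pixel (row 1, column 0) becomes index 28.
print(flat[28] == image[1, 0])  # True
```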
<p>To reshape, we use the tf.reshape function. To transpose, we use the tf.transpose function. Transposing a matrix turns its rows into columns and its columns into rows.</p>
<pre><code>print(tf.reshape(tensor_011,[<span class="hljs-number">4</span>,<span class="hljs-number">1</span>]))
print(tf.transpose(tensor_011))
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[<span class="hljs-number">2</span>]
 [<span class="hljs-number">2</span>]
 [<span class="hljs-number">4</span>]
 [<span class="hljs-number">4</span>]], shape=(<span class="hljs-number">4</span>, <span class="hljs-number">1</span>), dtype=int32)
tf.Tensor(
[[<span class="hljs-number">2</span> <span class="hljs-number">4</span>]
 [<span class="hljs-number">2</span> <span class="hljs-number">4</span>]], shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), dtype=int32)
</code></pre><p>Finally, let’s look at some aggregate operations (min, max, sum, standard deviation, and variance) and a few element-wise operations (square, square root, and log).</p>
<p>To find the minimum and maximum values, we use the tf.reduce_min and tf.reduce_max functions. And to find the sum of the array, we use the tf.reduce_sum function.</p>
<pre><code>tensor_013 = tf.constant([
    [<span class="hljs-number">1</span>,<span class="hljs-number">2</span>,<span class="hljs-number">3</span>],
    [<span class="hljs-number">4</span>,<span class="hljs-number">5</span>,<span class="hljs-number">6</span>],
    [<span class="hljs-number">7</span>,<span class="hljs-number">8</span>,<span class="hljs-number">9</span>]
],dtype=<span class="hljs-string">'float32'</span>)
print(tf.reduce_min(tensor_013))
print(tf.reduce_max(tensor_013))
print(tf.reduce_sum(tensor_013))
</code></pre><pre><code>OUTPUT:
tf.Tensor(<span class="hljs-number">1.0</span>, shape=(), dtype=float32)
tf.Tensor(<span class="hljs-number">9.0</span>, shape=(), dtype=float32)
tf.Tensor(<span class="hljs-number">45.0</span>, shape=(), dtype=float32)
</code></pre><p>Now for the standard deviation and variance, we use the tf.math.reduce_std function and tf.math.reduce_variance function.</p>
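<p>Both functions compute population statistics, which means they divide by the number of elements N rather than by N - 1. The plain-Python check below runs on the same nine values and reproduces the TensorFlow output that follows, up to float32 rounding. It is a sketch for illustration, not TensorFlow code.</p>
<pre><code class="lang-python">import math

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # the elements of tensor_013
n = len(values)
mean = sum(values) / n  # 5.0

squared_deviations = [(v - mean) ** 2 for v in values]

# Population variance divides by n
variance = sum(squared_deviations) / n
std = math.sqrt(variance)
print(round(variance, 4), round(std, 4))  # 6.6667 2.582

# The sample variance (dividing by n - 1) would give a different number
print(sum(squared_deviations) / (n - 1))  # 7.5
</code></pre>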
<pre><code>print(tf.math.reduce_std(tensor_013))
print(tf.math.reduce_variance(tensor_013))
</code></pre><pre><code>OUTPUT:
tf.Tensor(<span class="hljs-number">2.5819888</span>, shape=(), dtype=float32)
tf.Tensor(<span class="hljs-number">6.6666665</span>, shape=(), dtype=float32)
</code></pre><p>Let’s find the square, square root, and log of each value in a tensor.</p>
<pre><code>print(tf.sqrt(tensor_013))
print(tf.square(tensor_013))
print(tf.math.log(tensor_013))
</code></pre><pre><code>OUTPUT:
tf.Tensor(
[[<span class="hljs-number">1.</span>        <span class="hljs-number">1.4142135</span> <span class="hljs-number">1.7320508</span>]
 [<span class="hljs-number">2.</span>        <span class="hljs-number">2.236068</span>  <span class="hljs-number">2.4494898</span>]
 [<span class="hljs-number">2.6457512</span> <span class="hljs-number">2.828427</span>  <span class="hljs-number">3.</span>       ]], shape=(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), dtype=float32)
tf.Tensor(
[[ <span class="hljs-number">1.</span>  <span class="hljs-number">4.</span>  <span class="hljs-number">9.</span>]
 [<span class="hljs-number">16.</span> <span class="hljs-number">25.</span> <span class="hljs-number">36.</span>]
 [<span class="hljs-number">49.</span> <span class="hljs-number">64.</span> <span class="hljs-number">81.</span>]], shape=(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), dtype=float32)
tf.Tensor(
[[<span class="hljs-number">0.</span>        <span class="hljs-number">0.6931472</span> <span class="hljs-number">1.0986123</span>]
 [<span class="hljs-number">1.3862944</span> <span class="hljs-number">1.609438</span>  <span class="hljs-number">1.7917595</span>]
 [<span class="hljs-number">1.9459102</span> <span class="hljs-number">2.0794415</span> <span class="hljs-number">2.1972246</span>]], shape=(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), dtype=float32)
</code></pre><p>We have learned the basics of TensorFlow in this article. You are now equipped to work with TensorFlow and use it to model data.</p>
<p>If you want to start using this knowledge and build a project, you can check out my course on <a target="_blank" href="https://learn.manishmshiva.com/tensorflow-basics-handwriting-recognition-using-computer-vision">building a handwriting recognition neural network using TensorFlow</a>. You can also learn advanced TensorFlow concepts using the <a target="_blank" href="https://www.tensorflow.org/overview">official documentation.</a></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>TensorFlow is a powerful library for building deep learning models. It has all the tools we need to construct neural networks that solve problems like image classification, sentiment analysis, stock market prediction, and more.</p>
<p>With the advent of technologies like ChatGPT, learning TensorFlow can give you a head start in the current job market.</p>
<p><em>Hope you liked this article. You can learn more about me and my articles/videos at</em> <a target="_blank" href="https://www.manishmshiva.com/"><em>manishmshiva.com</em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Neural Networks by Building a Self-Driving Car Sim Using JavaScript ]]>
                </title>
                <description>
                    <![CDATA[ "Any application that can be written in JavaScript, will eventually be written in JavaScript." – Jeff Atwood It's time for you to create a self-driving car using JavaScript! We just published a course on the freeCodeCamp.org YouTube channel that will... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/self-driving-car-javascript/</link>
                <guid isPermaLink="false">66b2066343f24c1bb1598186</guid>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Thu, 12 May 2022 16:29:42 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/05/NN.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>"Any application that can be written in JavaScript, will eventually be written in JavaScript." – Jeff Atwood</p>
<p>It's time for you to create a self-driving car using JavaScript!</p>
<p>We just published a course on the freeCodeCamp.org YouTube channel that will help you learn about neural networks by teaching you how to build a self-driving car simulator in JavaScript (with no libraries!).</p>
<p>Radu Mariescu-Istodor developed this course. Radu has a PhD in computer science and is known for his creative tutorials on machine learning and programming.</p>
<p>In this course you will learn how to implement the car driving mechanics, how to define the environment, how to simulate some sensors, how to detect collisions, and how to make the car control itself using a neural network.</p>
<p>The course covers how artificial neural networks work, by comparing them with the real neural networks in our brain. You will learn how to implement a neural network and how to visualize it so we can see it in action.</p>
<p>Radu uses JavaScript to implement the system, and he teaches modern JavaScript techniques along the way. This course is perfect for people interested in becoming software engineers or machine learning specialists (like Radu, who has over 10 years of research experience with machine learning).</p>
<p>Here are the sections covered in this course:</p>
<ul>
<li>Car driving mechanics</li>
<li>Defining the road</li>
<li>Artificial sensors</li>
<li>Collision detection</li>
<li>Simulating traffic</li>
<li>Neural network</li>
<li>Parallelization</li>
<li>Genetic algorithm</li>
</ul>
<p>Watch the full course below or <a target="_blank" href="https://youtu.be/Rs_rAxEsAvI">on the freeCodeCamp.org YouTube channel</a> (2.5-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/Rs_rAxEsAvI" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What Are Graph Neural Networks? How GNNs Work, Explained with Examples ]]>
                </title>
                <description>
                    <![CDATA[ By Rishit Dagli Graph Neural Networks are getting more and more popular and are being used extensively in a wide variety of projects. In this article, I help you get started and understand how graph neural networks work while also trying to address t... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/graph-neural-networks-explained-with-examples/</link>
                <guid isPermaLink="false">66d460c7c7632f8bfbf1e48f</guid>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MathJax ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Tue, 01 Feb 2022 16:50:35 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/01/download-1.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Rishit Dagli</p>
<p>Graph Neural Networks are getting more and more popular and are being used extensively in a wide variety of projects.</p>
<p>In this article, I help you get started and understand how graph neural networks work while also trying to address the question "why" at each stage. </p>
<p>Finally, we will look at how to implement some of the methods we talk about in code.</p>
<p>And don't worry – you won't need to know very much math to understand these concepts and learn how to apply them.</p>
<h2 id="heading-what-is-a-graph">What is a graph?</h2>
<p>Put simply, a graph is a collection of nodes and the edges between them. In the diagram below, the white circles represent the nodes, and the red lines connecting them are the edges.</p>
<p>You could continue adding nodes and edges to the graph. You could also add directions to the edges which would make it a directed graph.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/01/image-89.png" alt="Image" width="600" height="400" loading="lazy">
<em>A simple representation of a graph</em></p>
<p>A handy way to express a graph is its adjacency matrix. The values of this matrix (A_{ij}) are defined as:</p>
<p>$$A_{ij} = \left\{\begin{array}{ c l }1 &amp; \quad \textrm{if there exists an edge } j \rightarrow i \\  0  &amp; \quad \textrm{if no edge exists} \end{array} \right.$$</p><p>Another way to define the adjacency matrix is to flip the direction, so that (A_{ij}) is 1 if there is an edge (i \rightarrow j) instead.</p>
<p>The latter convention is the one I studied in school, but machine learning papers often use the first one, so we will stick to the first representation in this article.</p>
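<p>Here is a small sketch of the convention used in this article. The graph and the helper function are made up for illustration. It builds the adjacency matrix from a list of directed edges by writing a 1 in row i, column j for every edge from j to i.</p>
<pre><code class="lang-python">def adjacency_matrix(num_nodes, edges):
    # edges holds (source, target) pairs.
    # Convention used in this article: A[i][j] = 1 if there is an edge from j to i,
    # so the row is the edge's target and the column is its source.
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for source, target in edges:
        A[target][source] = 1
    return A


# Edges: 0 to 1, 1 to 2, 2 to 0, and 0 to 2
A = adjacency_matrix(3, [(0, 1), (1, 2), (2, 0), (0, 2)])
for row in A:
    print(row)
# [0, 0, 1]
# [1, 0, 0]
# [1, 1, 0]
</code></pre>
<p>The other convention, where (A_{ij}) is 1 for an edge (i \rightarrow j), is simply the transpose of this matrix.</p>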
<p>There are many interesting things you might notice about the adjacency matrix. For example, if the graph is undirected, the adjacency matrix is symmetric, which gives it useful properties, especially for its eigenvalues (a real symmetric matrix has only real eigenvalues).</p>
<p>One property that is especially helpful in this context: the entry ((A^n)_{ij}) of the (n)-th power of the matrix gives the number of (directed or undirected) walks of length (n) between nodes (i) and (j). With the convention above, for a directed graph these are walks that go from (j) to (i).</p>
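<p>Here is a quick plain-Python check of the walk-counting property on a made-up undirected path graph 0 - 1 - 2. The helper functions are written only for this sketch.</p>
<pre><code class="lang-python">def matmul(X, Y):
    # Plain product of two n x n matrices stored as lists of lists
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matrix_power(A, n):
    # A multiplied by itself n times (n must be at least 1)
    result = A
    for _ in range(n - 1):
        result = matmul(result, A)
    return result


# Undirected path graph 0 - 1 - 2, so A is symmetric
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

A2 = matrix_power(A, 2)
print(A2)  # [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
# A2[1][1] == 2: the walks 1-0-1 and 1-2-1
# A2[0][2] == 1: the walk 0-1-2
# A2[0][1] == 0: no walk of length 2 joins nodes 0 and 1
</code></pre>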
<h2 id="heading-why-work-with-data-in-graphs">Why work with data in Graphs?</h2>
<p>Well, graphs show up in all kinds of common scenarios, and they have many possible applications.</p>
<p>Probably the most common application of representing data with graphs is using molecular graphs to represent chemical structures. These have helped predict bond lengths, charges, and new molecules.</p>
<p>With molecular graphs, you can use machine learning to predict whether a molecule is a potent drug.</p>
<p>For example, you could train a graph neural network on a variety of compounds whose effects you already know, teaching it to predict whether a molecule will inhibit certain bacteria.</p>
<p>Then you could essentially apply your model to any molecule and end up discovering that a previously overlooked molecule would in fact work as an excellent antibiotic. This is how <a target="_blank" href="https://www.sciencedirect.com/science/article/pii/S0092867420301021">Stokes et al.</a> in their paper (2020) predicted a new antibiotic called Halicin.</p>
<p>Another interesting paper by DeepMind (<a target="_blank" href="https://arxiv.org/abs/2108.11482">ETA Prediction with Graph Neural Networks in Google Maps</a>, 2021) modeled transportation maps as graphs and ran a graph neural network to improve the accuracy of ETAs by up to 50% in Google Maps. </p>
<p>In this paper, they partition travel routes into supersegments, each of which models a part of the route. This gave them a graph structure on which they could run a graph neural network.</p>
<p>There have been other interesting papers that represent naturally occurring data as graphs (social networks, electrical circuits, Feynman diagrams and more) that made significant discoveries as well. </p>
<p>And if you think about it, a standard neural network can be represented as a graph too 🤯.</p>
<h2 id="heading-what-can-we-do-with-graph-neural-networks">What can we do with Graph Neural Networks?</h2>
<p>Let's first start with what we might want to do with our graph neural network before understanding how we would do that. </p>
<p>One kind of output we might want from our graph neural network is at the level of the entire graph: a single output vector. The ETA prediction example and predicting the binding energy of a molecular structure both produce this kind of output.</p>
<p>Another kind of output is a node-level or edge-level prediction, which gives a vector for each node or edge. Examples include tasks where you need to <em>rank</em> every node in the graph, or predicting the bond angle of every bond given a molecular structure.</p>
<p>You might also be interested in answering the question "Where should I place a new edge or node?", that is, predicting where an edge or a node might appear. Not only could we get that prediction from the graph, but we could also use it to turn some other data into a graph.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/01/image-90.png" alt="Image" width="600" height="400" loading="lazy">
<em>Defining what we want our GNN to do</em></p>
<p>As you might have guessed, with a graph neural network we first want to generate an output graph, or latents, from which we can then tackle this wide variety of standard tasks.</p>
<p>So essentially what we need to do <em>from the latent graph</em> (features for each node represented as (\vec{h_i})) for the graph level predictions is:</p>
<ul>
<li>first figure out some way to aggregate all the vectors (like simply summing), and </li>
<li>then create some function to get the predictions:</li>
</ul>
<p>$$\vec{Z_G} = f(\sum_i \vec{h_i})$$</p><p>And now it is quite simple to show on a high level what we need to do from the latents to get our outputs. </p>
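<p>Here is a minimal sketch of the graph-level rule above, together with the node-level rule described next. The latent features are made up, and a single linear map stands in for f to keep the arithmetic easy to follow. In practice f is usually a small neural network.</p>
<pre><code class="lang-python">def f(vec, weights):
    # Stand-in for the prediction function f: one linear map without bias
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


# Made-up latent features h_i: 3 nodes with 2 features each
h = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps 2 features to 3 outputs

# Graph level: aggregate every node vector by summing, then apply f once
h_sum = [sum(node[k] for node in h) for k in range(2)]
z_graph = f(h_sum, W)

# Node level: apply the same f to each node vector separately
z_nodes = [f(node, W) for node in h]

print(z_graph)  # [4.5, 1.0, 5.5]
print(z_nodes)  # [[1.0, 2.0, 3.0], [0.5, -1.0, -0.5], [3.0, 0.0, 3.0]]
</code></pre>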
<p>For node level outputs we would just have one node vector passed into our function and get the predictions for that node:</p>
<p>$$\vec{Z_i} = f(\vec{h_i})$$</p><h2 id="heading-the-problem-with-variable-sized-inputs">The problem with variable sized inputs</h2>
<p>Now that we know what we can do with the graph neural networks and why you might want to represent your data in graphs, let's see how we would go about training on graph data.</p>
<p>But first off, we have a problem on our hands: graphs are essentially variable size inputs. In a standard neural network, as shown in the figure below, the input layer (shown in the figure as (x_i)) has a fixed number of neurons. In this network you cannot suddenly apply the network to a variable sized input.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/01/image-99.png" alt="Image" width="600" height="400" loading="lazy">
<em>Why a standard neural network won't work</em></p>
<p>But if you recall, you can apply convolutional neural networks on variable sized inputs. </p>
<p>Let's put this in terms of an example: you have a convolution with filter count (K=5), spatial extent (F=2), stride (S=4), and no zero padding (P=0). You can pass in ((256 \times 256 \times 3)) inputs and get ((64 \times 64 \times 5)) outputs, since (\left\lfloor \frac{256-2+2 \cdot 0}{4} \right\rfloor + 1 = 64). You can also pass in ((96 \times 96 \times 3)) inputs and get ((24 \times 24 \times 5)) outputs, and so on. The layer works regardless of the spatial size of the input; only the number of input channels has to stay the same.</p>
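<p>You can check the output-size arithmetic with a few lines of Python. This sketch uses the standard formula (\left\lfloor \frac{W - F + 2P}{S} \right\rfloor + 1) for an input of width (or height) (W). The helper name is made up for illustration.</p>
<pre><code class="lang-python">def conv_output_size(input_size, filter_size, stride, padding):
    # floor((W - F + 2P) / S) + 1; // is floor division for these non-negative values
    return (input_size - filter_size + 2 * padding) // stride + 1


K, F, S, P = 5, 2, 4, 0  # filter count, spatial extent, stride, zero padding

shapes = []
for height, width in [(256, 256), (96, 96)]:
    # The output depth is always K, whatever the spatial size of the input
    shapes.append((conv_output_size(height, F, S, P), conv_output_size(width, F, S, P), K))
print(shapes)  # [(64, 64, 5), (24, 24, 5)]
</code></pre>
<p>The same filters work for both input sizes. What must stay fixed is the number of input channels, because each filter spans all of them.</p>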
<p>This does make us wonder if we can draw some inspiration from convolutional neural networks.</p>
<p>Another really interesting way of solving the problem of variable input sizes, one that takes inspiration from physics, comes from the paper <a target="_blank" href="https://arxiv.org/abs/2002.09405">Learning to Simulate Complex Physics with Graph Networks</a> by DeepMind (2020).</p>
<p>Let's start off by taking some particles (i), each of which has a certain location (\vec{r_i}) and some velocity (\vec{v_i}). Let's say that these particles are connected by springs, which model the interactions between them.</p>
<p>This system is, of course, a graph: you can take the particles to be nodes and the springs to be edges. Now recall high-school physics: (force = mass \cdot acceleration). What is another way to express the total force acting on a particle in this system? It is the sum of the forces exerted on it by all its neighboring particles.</p>
<p>You can now write ((e_{ij}) represents the properties of the edge or spring between i and j):</p>
<p>$$m\frac{\mathrm{d} \vec{v_i}}{\mathrm{d}t} = \sum_{j \in \textrm{ neighbours of } i } \vec{F}(\vec{r_i}, \vec{r_j}, e_{ij})$$</p><p>Something I would like to draw your attention to here is that this force law is always the same. Maybe there are differences in the properties of the spring or edge, but you can still apply the same law. You can have different numbers of nodes and edges and you can still apply the exact same equation of motion.</p>
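<p>To see the "same law on every edge" idea in code, here is a sketch that assumes a specific force law, which the article leaves open. It models ideal springs with zero rest length (Hooke's law, (\vec{F} = -k_{ij}(\vec{r_i} - \vec{r_j}))), and the edge property (e_{ij}) is just the spring constant (k_{ij}). The particles and springs are made up.</p>
<pre><code class="lang-python"># Made-up system: 3 particles in 2-D; positions r_i
positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 2.0)}
# Edge properties e_ij: one spring constant k per (undirected) spring
springs = {(0, 1): 1.0, (0, 2): 0.5}


def neighbours(i):
    # Yield (j, k_ij) for every spring attached to particle i
    for (a, b), k in springs.items():
        if a == i:
            yield b, k
        elif b == i:
            yield a, k


def spring_force(r_i, r_j, k):
    # Assumed force law F(r_i, r_j, e_ij): zero-rest-length spring, F = -k (r_i - r_j)
    return tuple(-k * (a - b) for a, b in zip(r_i, r_j))


def total_force(i):
    # Right-hand side of the equation: sum the same law over all neighbours of i
    fx, fy = 0.0, 0.0
    for j, k in neighbours(i):
        dx, dy = spring_force(positions[i], positions[j], k)
        fx, fy = fx + dx, fy + dy
    return (fx, fy)


for i in positions:
    print(i, total_force(i))
# 0 (1.0, 1.0)
# 1 (-1.0, 0.0)
# 2 (0.0, -1.0)
</code></pre>
<p>Adding particles or springs requires no change to total_force. The same law is simply applied over more edges.</p>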
<p><img src="https://www.freecodecamp.org/news/content/images/2022/01/download-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Visualizing the presented solutions to variable sized inputs</em></p>
<p>If you look closely, the two intuitions we discussed for getting around the problem of fixed-size inputs are similar. The second approach clearly takes the neighboring nodes and edges into account and computes some function of them (here, the force). Convolutional neural networks work in much the same way, since each output depends only on a local neighborhood of the input.</p>
<h2 id="heading-how-to-learn-from-data-in-a-graph">How to learn from data in a graph</h2>
<p>Now that we've discussed what might give us inspiration to create a graph neural network, let's now try actually building one. Here we'll see how we can learn from the data residing in a graph. </p>
<p>We will start by talking about "<strong>Neural Message Passing</strong>" which is <em>analogous</em> to filters in a convolutional neural network or force which we talked about in the earlier section.</p>
<p>So let's say we have a graph with 3 nodes (directed or undirected). As you might have guessed, we have a corresponding value for each node (x_1), (x_2) and (x_3). </p>
<p>Just like any neural network, our goal is to find an algorithm to update these node values which is analogous to a layer in the graph neural network. And then you can of course keep on adding such layers.</p>
<p>So how do you do these updates? One idea would be to use the edges in our graph. For the purposes of this article, let's assume that from the 3 nodes we have an edge pointing from (x_3 \rightarrow x_1). We can send a message along this edge which will carry a value that will be computed by some neural network. </p>
<p>For this case we can write this down like below (and we will break down what this means too):</p>
<p>$$\vec{m_{31}}=f_e(\vec{h_3}, \vec{h_1}, \vec{e_{31}})$$</p><p>Here:</p>
<ul>
<li>(\vec{m_{31}}) is the message passed from node 3 to node 1,</li>
<li>(\vec{h_3}) and (\vec{h_1}) are the values of nodes 3 and 1,</li>
<li>(\vec{e_{31}}) is the value of the edge from node 3 to node 1, and</li>
<li>(f_e) is the "some neural network" function that depends on all these values, often called the message function.</li>
</ul>
<p>And let's say we have an edge from (x_2 \rightarrow x_1) as well. We can apply the same expression we created above, just replacing the node numbers. </p>
<p>If you have more nodes, you would want to do this for every edge pointing to node 1. And the easiest way to accumulate all these is to simply sum them up. Look closely and you will see this is really similar to the intuition from particles we had discussed earlier!</p>
<p>Now you have an aggregated value of the messages coming into node 1, but you still need to update the node's value. So we will use another neural network (f_v), often called the update network. It depends on two things: the original value of node 1, of course, and the aggregate of the messages it received.</p>
<p>Putting these together, not just for node 1 in our example but for any node in any graph, we can write:</p>
<p>$$\vec{h_i^{\prime}} = f_v\left(\vec{h_i}, \sum_{j \in N_i} \vec{m_{ji}}\right)$$</p><p>Here (\vec{h_i^{\prime}}) is the updated value of node (i), (N_i) is the set of nodes with an edge into node (i), and (\vec{m_{ji}}) is the message from node (j) to node (i) that we calculated earlier.</p>
<p>You would then apply these same two neural networks (f_e) and (f_v) for each of the nodes comprising the graph. </p>
<p>A really important thing to note here is that the two neural networks used to update our node values, (f_e) and (f_v), operate on <strong>fixed-size</strong> inputs, just like a standard neural network. Generally, they are small MLPs.</p>
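<p>Here is a minimal plain-Python sketch of one message-passing layer that follows the two equations above. It is not the original paper's implementation. (f_e) and (f_v) are stand-ins: each is a single linear layer with a tanh, using hand-picked weights rather than trained MLPs. The graph, features, and weights are all made up. As in the example, nodes 3 and 2 send messages to node 1; these are 0-indexed here as nodes 2 and 1 sending to node 0.</p>
<pre><code class="lang-python">import math


def dense_tanh(weights, vec):
    # Stand-in for a small MLP: one linear layer followed by tanh
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


def mpnn_layer(h, edges, edge_feats, W_e, W_v):
    # h: list of node vectors; edges: (j, i) pairs for a directed edge from j to i
    msg_dim = len(W_e)
    incoming = [[0.0] * msg_dim for _ in h]
    for (j, i), e_ji in zip(edges, edge_feats):
        # Message function f_e: fixed-size input [h_j, h_i, e_ji]
        m_ji = dense_tanh(W_e, h[j] + h[i] + e_ji)
        # Aggregate by summing every message that arrives at node i
        incoming[i] = [a + b for a, b in zip(incoming[i], m_ji)]
    # Update function f_v: fixed-size input [h_i, sum of incoming messages]
    return [dense_tanh(W_v, h[i] + incoming[i]) for i in range(len(h))]


h = [[1.0], [0.5], [-1.0]]                # one feature per node
edges = [(2, 0), (1, 0)]                  # node 2 to node 0, node 1 to node 0
edge_feats = [[1.0], [0.0]]               # one feature per edge
W_e = [[0.5, 0.1, 1.0], [0.0, 1.0, 0.0]]  # 3 inputs to a 2-D message
W_v = [[1.0, 0.5, 0.5]]                   # 1 + 2 inputs to a 1-D node value

new_h = mpnn_layer(h, edges, edge_feats, W_e, W_v)
print(new_h)
</code></pre>
<p>Because new_h has the same shape as h, you can stack layers by calling mpnn_layer again on its output.</p>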
<p><img src="https://www.freecodecamp.org/news/content/images/2022/01/image-130.png" alt="Image" width="600" height="400" loading="lazy">
<em>Visualizing Message Passing Neural Networks</em></p>
<p>Earlier we talked about the different kinds of outputs we want to obtain from our graph neural networks. You might have already noticed that a model built the way we described produces node-level predictions: a vector for each node.</p>
<p>To perform graph classification, we need to aggregate all the node values produced by our network. For this we use a readout (or pooling) layer, which, as the name suggests, reads out or pools the node values into a single result.</p>
<p>Generally, we can create a function (f_r) that depends on the set of node values. It should also be permutation invariant, meaning the result should not depend on how you label the nodes. It should look something like this:</p>
<p>$$y^{\prime} = f_r(\{x_i \mid i \in \textrm{graph}\})$$</p><p>The simplest readout function sums over all node values. You could also take the mean, maximum, or minimum, or even a combination of these or other permutation-invariant functions, whichever best suits the situation. As you might have guessed, (f_r) can also be a neural network, and this is often done in practice.</p>
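<p>Here is a quick way to see what permutation invariance means. The sketch below uses made-up node vectors. It applies sum, mean, max, and min readouts to every possible ordering of the nodes and gets a single answer for each readout. Naive concatenation, by contrast, changes when the order changes.</p>
<pre><code class="lang-python">import itertools

node_values = [[1.0, 4.0], [3.0, -2.0], [2.0, 0.0]]  # made-up final node vectors


def readout(nodes, how):
    # Aggregate each feature over the set of nodes
    columns = list(zip(*nodes))
    if how == "sum":
        return [sum(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    if how == "max":
        return [max(c) for c in columns]
    if how == "min":
        return [min(c) for c in columns]
    raise ValueError(f"unknown readout: {how}")


for how in ["sum", "mean", "max", "min"]:
    # Try every way of labelling (ordering) the nodes
    outputs = {tuple(readout(list(order), how)) for order in itertools.permutations(node_values)}
    print(how, len(outputs), outputs.pop())
# sum 1 (6.0, 2.0)
# mean 1 (2.0, 0.6666666666666666)
# max 1 (3.0, 4.0)
# min 1 (1.0, -2.0)

# Concatenating the node vectors, by contrast, depends on the order
concat = [x for node in node_values for x in node]
concat_reversed = [x for node in reversed(node_values) for x in node]
print(concat == concat_reversed)  # False
</code></pre>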
<p>The ideas and intuitions we just talked about create the Message Passing Neural Networks (MPNNs), one of the most potent graph neural networks first proposed in <a target="_blank" href="http://proceedings.mlr.press/v70/gilmer17a.html">Neural Message Passing for Quantum Chemistry</a> (Gilmer et al. 2017).</p>
<h3 id="heading-how-to-change-edge-values">How to change edge values</h3>
<p>It now seems like we have indeed created a general graph neural network. But notice that our message function also takes (e_{ij}), the edge features, as input.</p>
<p>You set the edge values at the start, just as you initialize the node values. But while the node values are updated at each step, the edge values never change. So we need to generalize this as well, as an extension of what we just saw.</p>
<p>Now that you understand how the node updates work, you can apply something very similar to create an edge update function.</p>
<p>(U_{edge}) is another standard neural network:</p>
<p>$$e_{ij}^{\prime} = U_{edge}(e_{ij}, x_i, x_j)$$</p><p>Within this framework, the outputs of (U_{edge}) are already edge-level properties, so why not use them directly as messages? You can do this as well.</p>
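<p>The edge update can be sketched the same way. This is again only an illustration with made-up values. (U_{edge}) is a stand-in, one linear layer plus a tanh with hand-picked weights, and the same weights are applied to every edge.</p>
<pre><code class="lang-python">import math


def dense_tanh(weights, vec):
    # Stand-in for U_edge: one linear layer followed by tanh
    return [math.tanh(sum(w * v for w, v in zip(row, vec))) for row in weights]


def update_edges(x, edges, W_edge):
    # edges maps (i, j) to e_ij; every edge is updated by the same U_edge
    return {
        (i, j): dense_tanh(W_edge, e_ij + x[i] + x[j])
        for (i, j), e_ij in edges.items()
    }


x = [[1.0], [0.0], [-1.0]]              # made-up node values
edges = {(0, 1): [0.5], (2, 1): [2.0]}  # made-up edge values e_ij
W_edge = [[1.0, 0.5, 0.5]]              # input [e_ij, x_i, x_j], output size 1

new_edges = update_edges(x, edges, W_edge)
print(new_edges)  # the two values are tanh(1.0) and tanh(1.5)
</code></pre>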
<h3 id="heading-message-passing-neural-network-discussion">Message Passing Neural Network discussion</h3>
<p>Message Passing Neural Networks (MPNNs) are the most general graph neural network layers. But this generality requires storing and manipulating edge messages as well as node features.</p>
<p>This can get troublesome in terms of memory and representation, so MPNNs can suffer from scalability issues and in practice are mostly applied to small graphs.</p>
<p>As Petar Veličković says "MPNNs are the MLPs of the graph domain". We will be looking at some extensions of MPNNs as well as how to implement an MPNN in code.</p>
<p>You can quite easily apply exactly what we talked about in either PyTorch or TensorFlow – but try doing so and you will see that this just blows up the memory.</p>
<p>Usually what we do with standard neural networks is work on batches of data. So you usually pass in an input array of shape [batch size, # of input neurons] to the neural network to make it work efficiently. </p>
<p>Here, however, the number of input neurons is not fixed, as highlighted earlier. And yes, convolutional neural networks do handle arbitrarily sized images, but when you work in batches, all the images in a batch need to have the same dimensions.</p>
<p>There are multiple things you could do:</p>
<ul>
<li>Operate with a single graph at a time (of course very inefficient)</li>
<li>Combine your graphs into one big graph, without allowing messages to pass from one of the smaller graphs to another. This complicates graph-level predictions, since you would have to adapt your readout function to pool each smaller graph separately.</li>
<li>Use ragged tensors, which are variable-length tensors: a great tutorial can be found <a target="_blank" href="https://www.tensorflow.org/guide/ragged_tensor">here</a>.</li>
<li>Take inspiration from CNNs again and use padding, so that every graph in a batch has the same number of nodes. For example, if every graph is padded to 10 nodes, a graph with 7 nodes gets 3 extra nodes set to 0, and a graph with 8 nodes gets 2 extra nodes set to 0.</li>
</ul>
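<p>Here is a minimal sketch of the second option: merging a batch into one big disconnected graph. The graphs and helper names are made up for illustration. Each graph's node indices are shifted by an offset so they don't collide. No edge ever links two different graphs, so no messages cross between them. A list recording which graph each node belongs to lets the readout pool each graph separately. Libraries such as PyTorch Geometric take a similar approach, stacking the adjacency matrices block-diagonally.</p>
<pre><code class="lang-python">def batch_graphs(graphs):
    # Each graph: {"x": node values, "edges": (source, target) pairs with local indices}
    x, edges, graph_id = [], [], []
    offset = 0
    for g_index, g in enumerate(graphs):
        x.extend(g["x"])
        # Shift indices past the nodes of earlier graphs; edges stay inside one graph
        edges.extend((s + offset, t + offset) for s, t in g["edges"])
        graph_id.extend([g_index] * len(g["x"]))
        offset += len(g["x"])
    return x, edges, graph_id


def sum_readout(values, graph_id, num_graphs):
    # Adapted readout: sum only over the nodes that belong to each graph
    totals = [0.0] * num_graphs
    for value, g in zip(values, graph_id):
        totals[g] += value
    return totals


graphs = [
    {"x": [1.0, 2.0, 3.0], "edges": [(0, 1), (1, 2)]},
    {"x": [10.0, 20.0], "edges": [(1, 0)]},
]
x, edges, graph_id = batch_graphs(graphs)
print(edges)     # [(0, 1), (1, 2), (4, 3)]
print(graph_id)  # [0, 0, 0, 1, 1]
print(sum_readout(x, graph_id, len(graphs)))  # [6.0, 30.0]
</code></pre>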
<h2 id="heading-other-popular-gnn-architectures">Other popular GNN architectures</h2>
<p>In this section I will give you an overview of some other widely used graph neural network layers. </p>
<p>We won't be looking at the intuition behind each of these layers or at how the parts of each update function fit together. Instead, I'll give you a high-level overview of these methods. You can read the original papers for a deeper understanding.</p>
<h3 id="heading-graph-convolutional-networks">Graph Convolutional Networks</h3>
<p>One of the most popular GNN architectures is <a target="_blank" href="https://arxiv.org/abs/1609.02907">Graph Convolutional Networks</a> (GCN) by Kipf and Welling, which is essentially a spectral method.</p>
<p>Spectral methods work with the representation of a graph in the <a target="_blank" href="https://arxiv.org/abs/1312.6203">spectral domain</a>. Spectral here means that we will utilize the Laplacian eigenvectors. </p>
<p>GCNs build on ChebNets, which propose that the feature representation of any node should be affected only by its (K)-hop neighborhood. The convolution is then computed using Chebyshev polynomials.</p>
<p>In a GCN, this is simplified to (K=1). We start off by adding self-loops to the adjacency matrix, (\tilde{A} = A + I), and defining the corresponding degree matrix, a diagonal matrix that holds the row-wise sums of (\tilde{A}):</p>
<p>$$\tilde{D}_{ii}=\sum_j \tilde{A}_{ij}$$</p>
<p>Using symmetric normalization, the graph convolutional network update rule can be written as follows, where (H) is the feature matrix, (W) is the trainable weight matrix, and (\sigma) is a nonlinearity:</p>
<p>$$H^{\prime}=\sigma(\tilde{D}^{-1/2} \tilde{A}\tilde{D}^{-1/2} HW)$$</p><p>Node-wise, you can write this as follows, where (|N_i|) and (|N_j|) are the sizes of the neighborhoods of nodes (i) and (j), self-loops included:</p>
<p>$$\vec{h_i^{\prime}} = \sigma\left(\sum_{j \in N_i} \frac{1}{\sqrt{|N_i||N_j|}} W \vec{h_j}\right)$$</p><p>Note that a GCN no longer has edge features, nor the idea, from the MPNNs we discussed earlier, that a node computes a message to send along each edge.</p>
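<p>A plain-Python sketch of one GCN layer in matrix form may help. Following Kipf and Welling's paper, it adds self-loops ((\tilde{A} = A + I)). It uses ReLU as (\sigma), which is an assumption for this sketch. The graph, features, and weights are made up, and the weights are not trained.</p>
<pre><code class="lang-python">import math


def gcn_layer(A, H, W):
    n = len(A)
    # Add self-loops: A_tilde = A + I
    A_t = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Degrees: row sums of A_tilde (the diagonal of D_tilde)
    deg = [sum(row) for row in A_t]
    # Symmetric normalization D^-1/2 A_tilde D^-1/2, computed entry by entry
    A_hat = [[A_t[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Propagate: A_hat times H mixes each node with its normalized neighbourhood
    AH = [[sum(A_hat[i][k] * H[k][f] for k in range(n)) for f in range(len(H[0]))]
          for i in range(n)]
    # Transform with W, then apply sigma = ReLU
    out = [[sum(AH[i][f] * W[f][o] for f in range(len(W))) for o in range(len(W[0]))]
           for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]


# Undirected path graph 0 - 1 - 2 with one feature per node
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
H = [[1.0], [2.0], [3.0]]
W = [[1.0]]  # 1 input feature to 1 output feature

print(gcn_layer(A, H, W))
</code></pre>
<p>Row (i) of this computation is exactly the node-wise rule: node (i) sums (W \vec{h_j}) over its neighborhood (itself included), scaling each term by (\frac{1}{\sqrt{|N_i||N_j|}}).</p>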
<h3 id="heading-graph-attention-network">Graph Attention Network</h3>
<p>Recall the node-wise GCN update rule we just saw: the coefficient (\frac{1}{\sqrt{|N_i||N_j|}}) is fixed by the structure of the graph, since it is derived from the degree matrix.</p>
<p>In <a target="_blank" href="https://arxiv.org/abs/1710.10903">Graph Attention Network</a> (GAT) by Veličković et al., this fixed coefficient is replaced by a coefficient (\alpha_{ij}) that is computed implicitly by an attention mechanism. For a particular edge, you take the features of the sender node, the receiver node, and the edge itself, and pass them through an attention function:</p>
<p>$$a_{ij}=a(\vec{h_i}, \vec{h_j}, \vec{e_{ij}})$$</p><p>Here (a) could be any learnable, shared self-attention mechanism, such as the one used in transformers. The resulting scores are then normalized with a softmax across the neighborhood:</p>
<p>$$\alpha_{ij}=\frac{e^{a_{ij}}}{\sum_{k \in N_i} e^{a_{ik}}}$$</p><p>The updated node value is then the attention-weighted sum (\vec{h_i^{\prime}} = \sigma(\sum_{j \in N_i} \alpha_{ij} W \vec{h_j})), which constitutes the GAT update rule. The authors found that extending this mechanism to multi-head attention helps stabilize the learning process. Here is a visualization by the paper's authors showing a step of the GAT.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/01/image-131.png" alt="Image" width="600" height="400" loading="lazy">
<em>A single GAT step</em></p>
<p>This method is also very scalable, because it only has to compute a <em>scalar</em> for the influence from node i to node j, not a vector as in an MPNN. But it is probably not as general as MPNNs.</p>
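<p>The attention coefficients can be sketched in plain Python too. This example leaves out edge features and uses the scoring function from the GAT paper: a LeakyReLU (negative slope 0.2) applied to (\vec{a}^T [W\vec{h_i} \Vert W\vec{h_j}]). The vector (\vec{a}), the weights (W), and the graph are made up rather than learned, and (\sigma) is omitted. Each neighbor (j) gets a single scalar (\alpha_{ij}), and these scalars sum to 1 over the neighborhood.</p>
<pre><code class="lang-python">import math


def linear(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_update(h, neighbours, i, W, a, slope=0.2):
    Wh = [linear(W, h_j) for h_j in h]
    scores = []
    for j in neighbours[i]:
        # a_ij = LeakyReLU(a . [W h_i || W h_j]), as in the GAT paper
        s = sum(a_k * v for a_k, v in zip(a, Wh[i] + Wh[j]))
        scores.append(max(s, slope * s))  # LeakyReLU for a slope between 0 and 1
    alphas = softmax(scores)  # alpha_ij: one scalar per neighbour
    # New value: attention-weighted sum of W h_j (sigma omitted)
    new_h = [0.0] * len(W)
    for alpha, j in zip(alphas, neighbours[i]):
        new_h = [acc + alpha * v for acc, v in zip(new_h, Wh[j])]
    return alphas, new_h


h = [[1.0], [2.0], [-1.0]]   # made-up node features
neighbours = {0: [0, 1, 2]}  # node 0 attends to itself and to nodes 1 and 2
W = [[1.0]]
a = [0.5, 1.0]               # made-up attention vector

alphas, new_h0 = gat_update(h, neighbours, 0, W, a)
print([round(x, 3) for x in alphas], new_h0)
</code></pre>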
<h2 id="heading-code-implementation-for-graph-neural-networks">Code Implementation for Graph Neural Networks</h2>
<p>With frameworks like PyTorch Geometric, TF-GNN, Spektral (based on TensorFlow), and more, it is quite simple to implement graph neural networks. We will look at a couple of examples here, starting with MPNNs.</p>
<p>Here is how you create a message passing neural network similar to the one in the original paper "Neural Message Passing for Quantum Chemistry" with PyTorch Geometric:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> torch.nn.functional <span class="hljs-keyword">as</span> F
<span class="hljs-keyword">import</span> torch_geometric.transforms <span class="hljs-keyword">as</span> T
<span class="hljs-keyword">from</span> torch_geometric.utils <span class="hljs-keyword">import</span> normalized_cut
<span class="hljs-keyword">from</span> torch_geometric.nn <span class="hljs-keyword">import</span> NNConv, global_mean_pool, graclus, max_pool, max_pool_x


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">normalized_cut_2d</span>(<span class="hljs-params">edge_index, pos</span>):</span>
    row, col = edge_index
    edge_attr = torch.norm(pos[row] - pos[col], p=<span class="hljs-number">2</span>, dim=<span class="hljs-number">1</span>)
    <span class="hljs-keyword">return</span> normalized_cut(edge_index, edge_attr, num_nodes=pos.size(<span class="hljs-number">0</span>))


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Net</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        super().__init__()
        nn1 = nn.Sequential(
            nn.Linear(<span class="hljs-number">2</span>, <span class="hljs-number">25</span>), nn.ReLU(), nn.Linear(<span class="hljs-number">25</span>, d.num_features * <span class="hljs-number">32</span>)
        )
        self.conv1 = NNConv(d.num_features, <span class="hljs-number">32</span>, nn1, aggr=<span class="hljs-string">"mean"</span>)

        nn2 = nn.Sequential(nn.Linear(<span class="hljs-number">2</span>, <span class="hljs-number">25</span>), nn.ReLU(), nn.Linear(<span class="hljs-number">25</span>, <span class="hljs-number">32</span> * <span class="hljs-number">64</span>))
        self.conv2 = NNConv(<span class="hljs-number">32</span>, <span class="hljs-number">64</span>, nn2, aggr=<span class="hljs-string">"mean"</span>)

        self.fc1 = torch.nn.Linear(<span class="hljs-number">64</span>, <span class="hljs-number">128</span>)
        self.fc2 = torch.nn.Linear(<span class="hljs-number">128</span>, d.num_classes)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, data</span>):</span>
        data.x = F.elu(self.conv1(data.x, data.edge_index, data.edge_attr))
        weight = normalized_cut_2d(data.edge_index, data.pos)
        cluster = graclus(data.edge_index, weight, data.x.size(<span class="hljs-number">0</span>))
        data.edge_attr = <span class="hljs-literal">None</span>
        data = max_pool(cluster, data, transform=transform)

        data.x = F.elu(self.conv2(data.x, data.edge_index, data.edge_attr))
        weight = normalized_cut_2d(data.edge_index, data.pos)
        cluster = graclus(data.edge_index, weight, data.x.size(<span class="hljs-number">0</span>))
        x, batch = max_pool_x(cluster, data.x, data.batch)

        x = global_mean_pool(x, batch)
        x = F.elu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        <span class="hljs-keyword">return</span> F.log_softmax(self.fc2(x), dim=<span class="hljs-number">1</span>)
</code></pre>
<p>You can find a complete Colab Notebook demonstrating the implementation <a target="_blank" href="https://colab.research.google.com/drive/11gtwzl_E4TWqEswwv5mZh4ZWHRz0b3PA?usp=sharing">here</a>; be aware that it is quite heavy to run. It is also straightforward to implement this in TensorFlow, and you can find a full-length tutorial in the <a target="_blank" href="https://keras.io/examples/graph/mpnn-molecular-graphs">Keras Examples here</a>.</p>
<p>Implementing a GCN is also quite simple with PyTorch Geometric. You can easily implement it with TensorFlow as well, and you can find a complete Colab Notebook <a target="_blank" href="https://colab.research.google.com/drive/1Dgs2rpYleGGTYg0ciCX792zGpfQrtp4p?usp=sharing">here</a>.</p>
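<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn.functional <span class="hljs-keyword">as</span> F
<span class="hljs-keyword">import</span> torch_geometric.transforms <span class="hljs-keyword">as</span> T
<span class="hljs-keyword">from</span> torch_geometric.datasets <span class="hljs-keyword">import</span> Planetoid
<span class="hljs-keyword">from</span> torch_geometric.nn <span class="hljs-keyword">import</span> GCNConv

<span class="hljs-comment"># Cora citation graph: a single graph with node features and labels</span>
dataset = Planetoid(<span class="hljs-string">"data/Planetoid"</span>, <span class="hljs-string">"Cora"</span>, transform=T.NormalizeFeatures())
data = dataset[<span class="hljs-number">0</span>]
<span class="hljs-comment"># Set to True only if the graph was preprocessed with T.GDC (graph diffusion),</span>
<span class="hljs-comment"># in which case the edge weights are already normalized</span>
use_gdc = <span class="hljs-literal">False</span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Net</span>(<span class="hljs-params">torch.nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        super().__init__()
        <span class="hljs-comment"># cached=True reuses the normalized adjacency, which is valid because the graph is fixed</span>
        self.conv1 = GCNConv(dataset.num_features, <span class="hljs-number">16</span>, cached=<span class="hljs-literal">True</span>,
                             normalize=<span class="hljs-keyword">not</span> use_gdc)
        self.conv2 = GCNConv(<span class="hljs-number">16</span>, dataset.num_classes, cached=<span class="hljs-literal">True</span>,
                             normalize=<span class="hljs-keyword">not</span> use_gdc)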

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self</span>):</span>
        x, edge_index, edge_weight = data.x, data.edge_index, data.edge_attr
        x = F.relu(self.conv1(x, edge_index, edge_weight))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index, edge_weight)
        <span class="hljs-keyword">return</span> F.log_softmax(x, dim=<span class="hljs-number">1</span>)
</code></pre>
<p>And now let's try implementing a GAT. You can find the complete Colab Notebook <a target="_blank" href="https://colab.research.google.com/drive/1gzRJsRbUUVesxj5bxMz3zkapwdeTuR8F?usp=sharing">here</a>.</p>
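<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn.functional <span class="hljs-keyword">as</span> F
<span class="hljs-keyword">from</span> torch_geometric.nn <span class="hljs-keyword">import</span> GATConv

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Net</span>(<span class="hljs-params">torch.nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, in_channels, out_channels</span>):</span>
        super().__init__()

        <span class="hljs-comment"># 8 attention heads with 8 features each; their outputs are concatenated (8 * 8 = 64)</span>
        self.conv1 = GATConv(in_channels, <span class="hljs-number">8</span>, heads=<span class="hljs-number">8</span>, dropout=<span class="hljs-number">0.6</span>)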
        <span class="hljs-comment"># On the Pubmed dataset, use heads=8 in conv2.</span>
        self.conv2 = GATConv(<span class="hljs-number">8</span> * <span class="hljs-number">8</span>, out_channels, heads=<span class="hljs-number">1</span>, concat=<span class="hljs-literal">False</span>,
                             dropout=<span class="hljs-number">0.6</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, x, edge_index</span>):</span>
        x = F.dropout(x, p=<span class="hljs-number">0.6</span>, training=self.training)
        x = F.elu(self.conv1(x, edge_index))
        x = F.dropout(x, p=<span class="hljs-number">0.6</span>, training=self.training)
        x = self.conv2(x, edge_index)
        <span class="hljs-keyword">return</span> F.log_softmax(x, dim=<span class="hljs-number">-1</span>)
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Thank you for sticking with me until the end. I hope that you've taken away a thing or two about graph neural networks and enjoyed reading through how these intuitions for graph neural networks form in the first place.</p>
<p>If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post!</p>
<p>Lastly, for the motivated reader, I would also encourage you to read the original paper "The Graph Neural Network Model", where GNNs were first proposed, as it is really interesting. An open-access archive of the paper can be found <a target="_blank" href="https://persagen.com/files/misc/scarselli2009graph.pdf">here</a>. This article also takes inspiration from <a target="_blank" href="https://www.youtube.com/watch?v=uF53xsT7mjc">Theoretical Foundations of Graph Neural Networks</a> and <a target="_blank" href="http://web.stanford.edu/class/cs224w/index.html">CS224W</a>, which I suggest you check out.</p>
<p>You can also find me on Twitter <a target="_blank" href="https://twitter.com/rishit_dagli">@rishit_dagli</a>, where I tweet about machine learning, and a bit of Android.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Improve the Accuracy of Your Image Recognition Models ]]>
                </title>
                <description>
                    <![CDATA[ These 7 tricks and tips will take you from 50% to 90% accuracy for your image recognition models in literally minutes. So, you have gathered a dataset, built a neural network, and trained your model. But despite the hours (and sometimes days) of work... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/improve-image-recognition-model-accuracy-with-these-hacks/</link>
                <guid isPermaLink="false">66d45f379208fb118cc6cfc9</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image recognition ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Jason ]]>
                </dc:creator>
                <pubDate>Mon, 29 Nov 2021 17:09:30 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/11/image-recognition-model-image.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>These 7 tricks and tips will take you from 50% to 90% accuracy for your image recognition models in literally minutes.</p>
<p>So, you have gathered a dataset, built a neural network, and trained your model.</p>
<p>But despite the hours (and sometimes days) of work you've invested to create the model, it spits out predictions with an accuracy of 50–70%. Chances are, this is not what you expected.</p>
<p>Here are a few strategies, or hacks, to boost your model’s performance metrics.</p>
<h2 id="heading-1-get-more-data">1. Get More Data</h2>
<p>Deep learning models are only as powerful as the data you bring in. One of the easiest ways to increase validation accuracy is to add more data. This is especially useful if you don’t have many training instances.</p>
<p>If you’re working on image recognition models, you may consider increasing the diversity of your available dataset by employing data augmentation. These techniques include anything from flipping an image over an axis and adding noise to zooming in on the image. If you are a strong machine learning engineer, you could also try data augmentation with GANs.</p>
<p>Read more about <a target="_blank" href="https://bair.berkeley.edu/blog/2019/06/07/data_aug/">data augmentation here</a>.</p>
<p>Keras has an amazing image preprocessing class to perform data augmentation: <a target="_blank" href="https://keras.io/api/preprocessing/image/#imagedatagenerator-class">ImageDataGenerator</a>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-119.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Be careful that the augmentation technique you use doesn’t change the class of an image. For example, the image of a 3 flipped over the y-axis doesn’t make sense! <a target="_blank" href="https://bair.berkeley.edu/blog/2019/06/07/data_aug/">Source</a></em></p>
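<p>If you want to see what these augmentations do without any framework, here is a minimal NumPy sketch of three of them: a horizontal flip, additive Gaussian noise, and a center zoom. It assumes images are float arrays of shape (H, W) or (H, W, C) with values in [0, 1]; the function names are hypothetical helpers, not part of Keras.</p>

```python
import numpy as np


def horizontal_flip(img):
    # Mirror the image left-to-right (a flip over the vertical axis)
    return img[:, ::-1]


def add_gaussian_noise(img, std=0.05, rng=None):
    # Add zero-mean Gaussian noise, then clip back to the valid [0, 1] range
    rng = np.random.default_rng() if rng is None else rng
    return np.clip(img + rng.normal(0.0, std, size=img.shape), 0.0, 1.0)


def center_zoom(img, factor=1.25):
    # Crop the central 1/factor region and upscale it back to the original
    # size with nearest-neighbour sampling
    h, w = img.shape[:2]
    ch, cw = int(h / factor), int(w / factor)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]


rng = np.random.default_rng(0)
image = rng.random((28, 28))
augmented = [horizontal_flip(image), add_gaussian_noise(image, rng=rng), center_zoom(image)]
```

<p>Pick only augmentations that preserve the label: for a digit dataset you would drop <code>horizontal_flip</code>, for exactly the reason the caption gives.</p>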
<h2 id="heading-2-add-more-layers">2. Add More Layers</h2>
<p>Adding more layers to your model increases its ability to learn your dataset’s features more deeply. This means that it will be able to recognize subtle differences that you, as a human, might not have picked up on.</p>
<p>Whether this hack helps depends entirely on the nature of the task you are trying to solve.</p>
<p>For complex tasks, such as differentiating between the breeds of cats and dogs, adding more layers makes sense because your model will be able to learn the subtle features that differentiate a poodle from a Shih Tzu.</p>
<p>For simple tasks, such as classifying cats and dogs, a simple model with few layers will do.</p>
<p>More layers -&gt; More nuanced model.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-120.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Photo by <a target="_blank" href="https://unsplash.com/@alvannee?utm_source=medium&amp;utm_medium=referral">Alvan Nee</a> on <a target="_blank" href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></em></p>
<h2 id="heading-3-change-your-image-size">3. Change Your Image Size</h2>
<p>When you preprocess your images for training and evaluation, there is a lot of experimentation you can do with regards to the image size.</p>
<p>If you choose an image size that is too small, your model will not be able to pick up on the distinctive features that help with image recognition.</p>
<p>Conversely, if your images are too big, it increases the computational resources required by your computer and/or your model might not be sophisticated enough to process them.</p>
<p>Common image sizes include 64x64, 128x128, 28x28 (MNIST), and 224x224 (VGG-16).</p>
<p>Keep in mind that many simple resizing routines do not preserve the aspect ratio of the image, so resized images might appear squashed along one axis.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-121.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Converting an image from a large resolution to a small size, like 28x28, usually ends up with a lot of pixelation that tends to have negative effects on your model’s performance. <a target="_blank" href="https://dribbble.com/shots/4829233-Pixelated-Mona-Lisa">Source</a></em></p>
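<p>One way to avoid aspect-ratio distortion is to scale the longer side to the target size and pad the shorter side, often called letterboxing. Here is a minimal NumPy sketch using nearest-neighbour sampling, assuming zero (black) padding is acceptable for your images; <code>resize_keep_aspect</code> is a hypothetical helper, and libraries such as Pillow offer higher-quality resampling.</p>

```python
import numpy as np


def resize_keep_aspect(img, size):
    """Fit an (H, W) or (H, W, C) image inside a size x size square without distortion.

    The longer side is scaled to `size` (nearest-neighbour sampling) and the
    shorter side is zero-padded equally on both sides.
    """
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    resized = img[rows][:, cols]
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out


wide = np.ones((300, 600, 3))          # a 2:1 image
square = resize_keep_aspect(wide, 224)
print(square.shape)                    # (224, 224, 3)
```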
<h2 id="heading-4-increase-epochs">4. Increase Epochs</h2>
<p><em>Epochs</em> are basically how many times you pass the entire dataset through the neural network. Incrementally train your model with more epochs with intervals of +25, +100, and so on.</p>
<p>Increasing epochs makes sense only if you have a lot of data in your dataset. However, your model will eventually reach a point where increasing epochs will not improve accuracy.</p>
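<p>Rather than guessing how many epochs to add, you can stop training automatically once validation accuracy stops improving, a technique called early stopping. Keras ships this as <code>tf.keras.callbacks.EarlyStopping</code>; the framework-free sketch below shows the logic, with made-up accuracy values standing in for a real training run.</p>

```python
class EarlyStopping:
    """Signal when validation accuracy has stopped improving.

    patience:  how many epochs without improvement to tolerate
    min_delta: smallest increase that counts as an improvement
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_accuracy):
        # Returns True when training should stop
        if val_accuracy > self.best + self.min_delta:
            self.best = val_accuracy
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


history = [0.52, 0.61, 0.68, 0.71, 0.71, 0.70, 0.71, 0.70]  # hypothetical values
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, acc in enumerate(history):
    if stopper.step(epoch, acc):
        stopped_at = epoch
        print(f"Stopping at epoch {epoch}; best was {stopper.best} at epoch {stopper.best_epoch}")
        break
```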
<p>At this point, you should consider playing around with your model’s learning rate. This little hyperparameter strongly influences whether your model reaches its global minimum (the ultimate goal for neural nets) or gets stuck in a local minimum.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-122.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>The global minimum is the ultimate goal for neural networks. <a target="_blank" href="https://www.dna-ghost.com/single-post/2018/03/13/Neural-network-Escaping-from-variety-of-non-global-minimum-traps">Source</a></em></p>
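<p>The effect of the learning rate is easy to see on a toy problem. The sketch below runs plain gradient descent on f(x) = x², where each step multiplies x by (1 - 2 × lr). This function has a single minimum, so it illustrates slow convergence, oscillation, and divergence rather than local minima.</p>

```python
def gradient_descent(lr, steps=20, x0=5.0):
    # Minimize f(x) = x**2, whose gradient is 2x and whose minimum is at x = 0
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x


for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"lr={lr}: x after 20 steps = {gradient_descent(lr):.4f}")
```

<p>With 0.01 progress is slow, 0.1 converges quickly, 0.9 still converges but jumps back and forth across the minimum, and 1.1 diverges because each step multiplies x by -1.2, whose magnitude is greater than 1.</p>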
<h2 id="heading-5-decrease-colour-channels">5. Decrease Colour Channels</h2>
<p>Colour channels reflect the dimensionality of your image arrays. Most colour (RGB) images are composed of three colour channels, while grayscale images have just one channel.</p>
<p>The more colour channels your images have, the more data each image contains and the longer it will take to train the model.</p>
<p>If colour is not such a significant factor in your model, you can go ahead and convert your colour images to grayscale.</p>
<p>You can even consider other colour spaces, like HSV and L*a*b*.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-123.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>RGB images are composed of three colour channels: red, green, and blue. <a target="_blank" href="https://www.youtube.com/watch?v=ZqUotba3V5Y">Source</a></em></p>
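<p>Converting to grayscale collapses three channels into one, so the first layer of your model sees a third as many input values. Here is a minimal NumPy sketch using the ITU-R BT.601 luma weights (the same weights Pillow uses for <code>Image.convert("L")</code>); it assumes an (H, W, 3) RGB array.</p>

```python
import numpy as np

# ITU-R BT.601 luma weights for red, green, and blue
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def rgb_to_grayscale(img):
    """Convert an (H, W, 3) RGB array to an (H, W, 1) grayscale array."""
    gray = img[..., :3] @ LUMA_WEIGHTS   # weighted sum over the channel axis
    return gray[..., None]               # keep an explicit single-channel axis


rgb = np.random.default_rng(0).random((32, 32, 3))
gray = rgb_to_grayscale(rgb)
print(rgb.shape, "->", gray.shape)       # (32, 32, 3) -> (32, 32, 1)
```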
<h2 id="heading-6-transfer-learning">6. Transfer Learning</h2>
<p>Transfer learning involves using a pre-trained model, such as YOLO or ResNet, as the starting point for many computer vision and natural language processing tasks.</p>
<p>Pre-trained models are state-of-the-art deep learning models that were trained on millions and millions of samples, and often for months. These models have a remarkable ability to detect nuances in different images.</p>
<p>These models can be used as a base for your model. Many of them are so good that you often won’t need to add your own convolutional and pooling layers.</p>
<p>Read more about <a target="_blank" href="https://machinelearningmastery.com/transfer-learning-for-deep-learning/">using transfer learning</a>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-124.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Transfer learning can greatly improve your model’s accuracy from ~50% to 90%! Source: <a target="_blank" href="https://www.nvidia.com/content/dam/en-zz/en_sg/ai-innovation-day-2019/assets/pdf/9_NVIDIA-Transfer-Learning-Toolkit-for-Intelligent-Video-Analytics.pdf">Nvidia blog</a></em></p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>The hacks above offer a base for you to optimize a model. To really fine tune a model, you’ll need to consider tuning the various hyperparameters and functions involved in your model, such as the learning rate (as discussed above), activation functions, loss functions, and so on.</p>
<p>This hack comes as an “I hope you know what you’re doing” warning because there is a wider scope to mess up your model.</p>
<h3 id="heading-always-save-your-models">Always Save Your Models</h3>
<p>Always save your model every time you make a change to your deep learning model. This will help you reuse a previous configuration of the model if it provides greater accuracy.</p>
<p>Most deep learning frameworks, like TensorFlow and PyTorch, have a “save model” method.</p>
<pre><code class="lang-python"><span class="hljs-comment"># In TensorFlow</span>
model.save(<span class="hljs-string">'model.h5'</span>) <span class="hljs-comment"># Saves the entire model to a single artifact</span>

<span class="hljs-comment"># In PyTorch (saving the state_dict, i.e. only the learned parameters, is the recommended approach)</span>
torch.save(model.state_dict(), PATH)
</code></pre>
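<p>To make "reuse the configuration that scored best" systematic, you can record the validation score alongside each saved version. The sketch below is a minimal, framework-agnostic version that pickles any object (for example a dict of weights); <code>CheckpointKeeper</code> is a hypothetical helper, and in practice you would call <code>model.save</code> or <code>torch.save</code> inside it instead of <code>pickle.dump</code>. The scores are made up.</p>

```python
import os
import pickle
import tempfile


class CheckpointKeeper:
    """Save every model version and remember which one scored best."""

    def __init__(self, directory):
        self.directory = directory
        self.best_path = None
        self.best_score = float("-inf")
        os.makedirs(directory, exist_ok=True)

    def save(self, model_state, version, score):
        # One file per version, storing the state together with its score
        path = os.path.join(self.directory, f"model_v{version}.pkl")
        with open(path, "wb") as f:
            pickle.dump({"state": model_state, "score": score}, f)
        if score > self.best_score:
            self.best_score, self.best_path = score, path
        return path

    def load_best(self):
        if self.best_path is None:
            raise RuntimeError("no checkpoint has been saved yet")
        with open(self.best_path, "rb") as f:
            return pickle.load(f)["state"]


with tempfile.TemporaryDirectory() as tmp:
    keeper = CheckpointKeeper(tmp)
    keeper.save({"weights": [0.1, 0.2]}, version=1, score=0.62)
    keeper.save({"weights": [0.3, 0.1]}, version=2, score=0.81)
    keeper.save({"weights": [0.5, 0.4]}, version=3, score=0.74)
    best_state = keeper.load_best()
    print(os.path.basename(keeper.best_path), keeper.best_score)
```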
<p>There are countless other ways to further optimize your deep learning models, but the hacks described above serve as a foundation for the optimization part of deep learning.</p>
<p><a target="_blank" href="http://twitter.com/jasmcaus"><em>Tweet at me</em></a> letting me know what your favourite hack is!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Deep Learning Tutorial – How to Use PyTorch and Transfer Learning to Diagnose COVID-19 Patients ]]>
                </title>
                <description>
                    <![CDATA[ Ever since the outbreak of COVID-19 in December 2019, researchers in the field of artificial intelligence and machine learning have been trying to find better ways to diagnose the disease. They've worked on developing algorithms that would detect the... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/deep-learning-with-pytorch/</link>
                <guid isPermaLink="false">66d039da64be048ac359a35b</guid>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ pytorch ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Transfer learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Juan Cruz Martinez ]]>
                </dc:creator>
                <pubDate>Wed, 03 Nov 2021 19:49:35 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/11/Featured-Orange.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Ever since the outbreak of COVID-19 in December 2019, researchers in the field of artificial intelligence and machine learning have been trying to find better ways to diagnose the disease.</p>
<p>They've worked on developing algorithms that would detect the disease within a matter of seconds – and only by looking at chest X-rays and/or CT scan images. </p>
<p>Some of these techniques have proven to be extremely useful and accurate in diagnosing COVID-19 cases.</p>
<p>There are multiple approaches that use both machine and deep learning to detect and/or classify the disease, and researchers have proposed newly developed architectures along with transfer learning approaches.</p>
<p>In this article, we will look at a transfer learning approach that classifies COVID-19 cases using chest X-ray images. </p>
<p>The model we are going to use is one of the variants of the EfficientNet architecture, pre-trained on the large ImageNet dataset. EfficientNet is an advanced and complex convolutional neural network-based architecture.</p>
<p>We will further investigate the details of Convolutional Neural Networks, pre-trained models, and EfficientNet during the course of this article. I've divided it into five parts:</p>
<ol>
<li>What are convolutional neural networks?</li>
<li>A dive into transfer learning.</li>
<li>What is EfficientNet?</li>
<li>An introduction to PyTorch.</li>
<li>Implementation of COVID-19 classifier using EfficientNet with PyTorch.</li>
</ol>
<p>This tutorial assumes that you have prior knowledge of both machine learning and deep learning. If you want to further develop your foundation in these topics, check out this article on <a target="_blank" href="https://livecodestream.dev/post/artificial-intelligence-vs-machine-learning-vs-deep-learning/">Artificial Intelligence vs Machine Learning vs Deep Learning</a>.</p>
<p>Also, although the dataset we'll work with here is COVID-related, you can apply the actual code implementation and analysis to other datasets.</p>
<h2 id="heading-what-is-a-convolutional-neural-network">What is a Convolutional Neural Network?</h2>
<p>Convolutional neural networks (CNNs) are a type of deep neural network that works on visual data, that is, images. A CNN takes an image as an input and performs two- or three-dimensional convolution operations on it with several filters, also referred to as kernels.</p>
<p>Each filter contains learnable weights and a bias, and each convolution outputs a 2D or 3D array that captures spatial information about the input image. This output is referred to as the feature map of the image.</p>
<p>Training a convolutional neural network can, in some cases, be extremely slow. This is why it's a good idea to use GPUs or TPUs when training deep learning models, especially convolutional neural networks.</p>
<p>Convolutional neural networks learn spatial information about an image far better than a basic feed-forward neural network. CNNs can also reduce the size of the image while retaining its most important information, which is crucial for predictive analysis of images.</p>
<p><img src="https://lh6.googleusercontent.com/vma10ZOrxzyEEbJVvIZuygeDyqlkAKEUxWkJ8of7spwvrA9zktP1FYJQWZC6ZhMqrP2V0gMh04nqb74gNGNM3eO_g1ZwuvI753j-oS7fN_E0Txn4T3TXTW65MG3ubi67pBcX19o" alt="Deep Learning – Introduction to Convolutional Neural Networks - Vinod  Sharma's Blog" width="600" height="400" loading="lazy">
<em><a target="_blank" href="https://i0.wp.com/vinodsblog.com/wp-content/uploads/2018/10/CNN-2.png?resize=1300%2C479&amp;ssl=1">Source</a></em></p>
<p>The early layers of a convolutional neural network learn simple, low-level features in an image, such as lines and edges. But as we move deeper into the network, the feature maps capture more complex structures in the image.</p>
<p>It starts to learn the more specific features of the image, such as a cat, a dog, or a person, the same way we would, as humans, perceive the world around us. This is a core concept in modern deep learning-based computer vision. </p>
<p>Now before we move on to advanced concepts, it is important to learn the basics of 2D convolution.</p>
<h2 id="heading-what-is-2d-convolution">What is 2D Convolution?</h2>
<p>2D convolution is a bit complex to explain, but here it goes: if the convolution operation (which is used extensively in <a target="_blank" href="https://www.tutorialspoint.com/signals_and_systems/convolution_and_correlation.htm">1-D signal processing</a>) is performed between two signals not along a single dimension, but along two mutually perpendicular dimensions, it is called 2D convolution.</p>
<p>In the case of images, the two mutually perpendicular dimensions are the rows and columns of a greyscale image. The convolutional operation is mathematically done by multiplying and then accumulating the values of the overlapping samples of the two input signals, where one of the signals is flipped. The output of this multiplication and accumulation gives a single point on the feature map.</p>
<p>In the case of CNNs, the image is one signal and the filter/kernel is the second signal, which is flipped. The kernel is typically much smaller than the image.</p>
<p>The flipped kernel is then swept across the whole image both row by row and column by column to output the feature map.</p>
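<p>The flip, multiply, and accumulate steps can be written out directly. Here is a minimal NumPy sketch of a "valid" 2D convolution (no padding, stride 1) applied to a 6x6 greyscale image with a 3x3 kernel; the edge-detecting kernel and the image values are arbitrary choices for illustration.</p>

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2D convolution (no padding, stride 1) of a greyscale image."""
    flipped = kernel[::-1, ::-1]             # true convolution flips the kernel
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):                   # sweep row by row ...
        for c in range(out_w):               # ... and column by column
            window = image[r:r + kh, c:c + kw]
            out[r, c] = np.sum(window * flipped)  # multiply and accumulate
    return out


image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (4, 4)
```

<p>Deep learning libraries such as PyTorch actually compute cross-correlation, which skips the flip. Because the kernel weights are learned, the network simply learns the flipped kernel, so the distinction does not matter in practice.</p>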
<p><img src="https://lh3.googleusercontent.com/p5ht8HdKUxxCwcNoas2qAusdT8dYq_XzLS2YqVORYqb0cCnXPPAlPu40Z73kVEXerQ5s6epDozQdYRsleeUncnSV4Opx2Q1CNk8wseTdXEPz8eHt5dJ0R2TSFnnhRRZzjO7xH4A" alt="https://miro.medium.com/max/700/1*kOThnLR8Fge_AJcHrkR3dg.gif" width="600" height="400" loading="lazy">
<em>2d convolution</em></p>
<p>Here a 3x3 kernel is swept across a 6x6 image to output a 4x4 feature map. As you can see, the dimensions of the output feature map are smaller than the input image. So there are a few concepts used in convolution to control the dimensions of the output feature map. These include padding, stride, and kernel size.</p>
<p><strong>Padding</strong> is the addition of extra rows and columns (usually filled with zeros) around the input, either to keep the output dimension the same as the input dimension or to otherwise control it.</p>
<p><strong>Stride</strong> refers to the jump the kernel takes during the sweep, both in columns and rows. In the example above, the stride of the convolution is 1 as the kernel is moving one unit in both rows and columns. </p>
<p><strong>Kernel size</strong> refers to the dimensions of the kernel used. Changing the dimensions of the kernel to be swept changes the output size of the feature map. </p>
<p>The image below describes the convolution with the same kernel size but with a padding of 1 and stride of 2.</p>
<p><img src="https://lh3.googleusercontent.com/ceNWhzTPHzqGi5wMyUrqCSS2kp6-mF75BHxlNaEnGVwrsIiGamEq4pm_Mndmaz0weJnZfgOnl7L0CPy1OF19lRyRTAkDWZEzREBr8H36_mW_6bqJ-P8XzuJqTbzwNvPKXd_7N9U" alt="https://miro.medium.com/max/395/1*1VJDP6qDY9-ExTuQVEOlVg.gif" width="600" height="400" loading="lazy"></p>
<p>The equation that describes the relationship of stride, padding, and kernel size to input and output dimensions is as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-2.png" alt="Image" width="600" height="400" loading="lazy"></p>
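<p>In its standard form, for an input of size I, kernel size K, padding P, and stride S along one dimension, the output size is O = floor((I - K + 2P) / S) + 1. Here is that arithmetic as a small Python function, applied separately to height and width.</p>

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    # O = floor((I - K + 2P) / S) + 1
    return (input_size - kernel_size + 2 * padding) // stride + 1


print(conv_output_size(6, 3))                       # 4: a 6x6 input gives a 4x4 map
print(conv_output_size(6, 3, padding=1, stride=2))  # 3
print(conv_output_size(224, 3, padding=1))          # 224: "same" padding
```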
<p>The concept of 3D convolution is just an extension of 2D convolution where both the input image and the kernel are three-dimensional. </p>
<p>Like 2D convolution, we sweep the three-dimensional kernel across the whole image in two mutually perpendicular dimensions, namely the rows and the columns. </p>
<p>We do not usually sweep the kernel across the color channels because the kernel has the same depth (number of channels) as the input image. This gives an output feature map that is two-dimensional instead of three.</p>
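<p>Here is a minimal NumPy sketch of that idea for an RGB input: the kernel has the same depth as the image, so it slides only over rows and columns and each position produces a single number. Like most deep learning libraries, it computes cross-correlation (no kernel flip), and the averaging kernel is an arbitrary example.</p>

```python
import numpy as np


def conv2d_multichannel(image, kernel):
    """Convolve an (H, W, C) image with a (K, K, C) kernel into a 2D map."""
    kh, kw, kc = kernel.shape
    if image.shape[2] != kc:
        raise ValueError("kernel depth must equal the number of image channels")
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # The sum runs over rows, columns, AND channels, giving one number
            out[r, c] = np.sum(image[r:r + kh, c:c + kw, :] * kernel)
    return out


rgb = np.random.default_rng(0).random((6, 6, 3))
kernel = np.ones((3, 3, 3)) / 27.0   # averages a 3x3 patch across all channels
feature_map = conv2d_multichannel(rgb, kernel)
print(feature_map.shape)             # (4, 4)
```

<p>A convolutional layer applies many such kernels and stacks their 2D maps, so N kernels produce an output of shape (H', W', N).</p>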
<p>To learn more about the details of 3D convolution, you can read <a target="_blank" href="https://paperswithcode.com/method/3d-convolution">this article</a>.</p>
<h2 id="heading-what-is-transfer-learning">What is Transfer Learning?</h2>
<p>In transfer learning, you take a machine or deep learning model that is pre-trained on a previous dataset and use it to solve a different problem without needing to re-train the whole model. </p>
<p>Instead, you can just use the weights and biases of the pre-trained model to make a prediction. You transfer the weights from one model to your own model and adjust them to your own dataset without re-training all the previous layers of the architecture.</p>
<p>We use transfer learning in the applications of convolutional neural networks and natural language processing because it decreases the computation time and complexity of the training process. And, in many cases, it performs surprisingly well. </p>
<p>This also helps in cases where we have limited data available – since neural networks demand an extremely large amount of data to achieve good performance.</p>
<p>This means that transfer learning can greatly reduce how much data you need: the weights and biases are already tuned, so adjusting them slightly on a small dataset is often enough.</p>
<p>But transfer learning models do not always give you great performance (although the newer architectures perform efficiently on almost every problem). Still, sometimes the problem at hand needs an architecture that is pre-trained on data that's similar to what you have. This factor depends upon the complexity of the problem you are trying to solve. </p>
<p>There are a couple of ways you can perform transfer learning:</p>
<ol>
<li><strong>Using a pre-trained model.</strong></li>
<li><strong>Developing a new model.</strong></li>
</ol>
<p>You can use a pre-trained model in two ways. First, you can use the pre-trained weights and biases as initial parameters for your own model, and then train a whole convolutional model using those weights. </p>
<p>The other way is to perform feature extraction from the pre-trained model. You use the parameters of the pre-trained model to extract features from your input image and just train a simple classifier on top of it.</p>
<p>The second option, developing a new model, applies when your problem has only a small amount of data: you train a model on a similar problem that has plenty of data, and then use its trained weights to solve the original problem with less data.</p>
<p>In this tutorial, we will be using a pre-trained model as a feature extractor and we'll train a simple classifier on top of it to output the prediction.</p>
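<p>The feature-extraction idea can be shown without any deep learning framework. In the minimal NumPy sketch below, a fixed random projection followed by ReLU stands in for the frozen pre-trained backbone (EfficientNet-B0 in this tutorial), and only a logistic-regression head is trained on synthetic two-class data. Everything here, including the data, is hypothetical and only meant to show which parameters stay frozen and which are learned.</p>

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a pre-trained backbone: fixed weights that are never updated
FROZEN_W = rng.normal(size=(20, 64)) / np.sqrt(20)


def extract_features(x):
    return np.maximum(x @ FROZEN_W, 0.0)   # frozen linear layer + ReLU


def train_classifier(features, labels, lr=0.1, epochs=200):
    # Logistic-regression head: the only trainable parameters
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = probs - labels                  # gradient of BCE w.r.t. the logits
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b


# Synthetic two-class data (hypothetical, for illustration only)
x = np.vstack([rng.normal(-1.0, 1.0, size=(100, 20)),
               rng.normal(1.0, 1.0, size=(100, 20))])
y = np.concatenate([np.zeros(100), np.ones(100)])

feats = extract_features(x)                # the backbone is only run forward
w, b = train_classifier(feats, y)
accuracy = np.mean(((feats @ w + b) > 0) == y)
print(f"training accuracy: {accuracy:.2f}")
```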
<p>There are many well-known architectures in the field of deep learning that are nowadays used for transfer learning. Almost all of these are pre-trained on ImageNet, one of the largest publicly available image datasets. The subset most commonly used for pre-training has 1,000 classes and about 1.2 million training images, while the full dataset contains over 14 million images.</p>
<p>Among well-known CNN architectures, LeNet, proposed in 1998, is one of the earliest. Other well-known models include VGG, ResNet, AlexNet, GoogLeNet, Inception, and Xception.</p>
<p>EfficientNet, proposed in 2019, is a more recent addition to this series.</p>
<h2 id="heading-what-is-efficientnet">What is EfficientNet?</h2>
<p>EfficientNet (or perhaps it's better to say EfficientNets) is a family of convolutional neural network-based image classification models. They achieved state-of-the-art accuracy on ImageNet when they were published, and they also transfer well to other popular datasets such as CIFAR-100 and Flowers.</p>
<p>Besides being accurate, the models are much smaller and faster at inference than earlier models of similar accuracy. The family has variants ranging from EfficientNet-B0 up to EfficientNet-B7.</p>
<p>B1 to B7 are obtained by scaling up the B0 baseline with the compound scaling method, which scales the network's depth, width, and input resolution together. EfficientNet-B7 achieved a Top-1 accuracy of 84.4% on ImageNet, which was state-of-the-art at the time of publication.</p>
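<p>The compound scaling rule from the paper can be sketched in a few lines. Scaling by a coefficient φ multiplies depth by α^φ, width by β^φ, and input resolution by γ^φ. The paper constrains α·β²·γ² ≈ 2, so FLOPs roughly double per unit of φ, and it reports α = 1.2, β = 1.1, γ = 1.15 from a grid search on B0. The helper <code>compound_scale</code> is illustrative only. The released B1 to B7 configurations use rounded, hand-set values (B1, for example, uses 240 × 240 inputs), so it will not reproduce them exactly.</p>

```python
# Compound scaling sketch: depth = ALPHA**phi, width = BETA**phi,
# resolution = GAMMA**phi, with ALPHA * BETA**2 * GAMMA**2 close to 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # coefficients reported in the EfficientNet paper


def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution) for coefficient phi."""
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth_mult, width_mult, resolution


# FLOPs grow roughly by this factor for each unit increase of phi
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}px")
print(f"FLOPs factor per step: {flops_factor:.2f}")
```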
<p>If you want to learn more about how EfficientNets work, you can read the paper ‘<a target="_blank" href="https://arxiv.org/abs/1905.11946v5">EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks</a>’.</p>
<p><img src="https://lh3.googleusercontent.com/FvX6r1u1vR9kfoSb7tJbQ5I7aDgGQNhZCtU_OTGkHpOLTX3ZZnc-zIc-AO1MLaE-eLCsyfaj_grRXAJapYb9pJqhbzwH5R0qcXAxGUWIsHqm9zvDy6h4EQB63GOwaFZP1fV43mk" alt="Image" width="600" height="400" loading="lazy">
<em><a target="_blank" href="https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet">Source</a></em></p>
<p>In the coding tutorial further along in this article, we'll use EfficientNet-B0 as a feature extractor, with a classifier on top of it, to classify chest X-ray images as COVID-19 or normal.</p>
<h2 id="heading-an-introduction-to-pytorch">An Introduction to PyTorch</h2>
<p>PyTorch is a Python library for building deep learning models. Compared with Keras (another deep learning library), PyTorch exposes more of the training loop and gives the developer more direct control.</p>
<p>Its tensor operations resemble NumPy's, but they can also run on a GPU for faster computation. To learn more about NumPy and its features, you can check out <a target="_blank" href="https://www.freecodecamp.org/news/the-ultimate-guide-to-the-numpy-scientific-computing-library-for-python/">this in-depth guide</a> along with its <a target="_blank" href="https://numpy.org/doc/stable/user/whatisnumpy.html">documentation</a>.</p>
<p>PyTorch's core data structure, the ‘Tensor’, is similar to the NumPy ndarray but can live on either the CPU or a GPU.</p>
<p>PyTorch makes it easy to switch computation between a CPU and a GPU. It also provides built-in functions, <code>torch.from_numpy()</code> and <code>Tensor.numpy()</code>, that convert NumPy arrays into tensors and vice versa.</p>
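<p>Here is a minimal sketch of these conversions; the array values are arbitrary. It moves a NumPy array into a tensor, runs a computation on the GPU if one is available, and brings the result back to NumPy.</p>

```python
import numpy as np
import torch

# torch.from_numpy creates a tensor that shares memory with the array
arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
t = torch.from_numpy(arr)

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
doubled = t.to(device) * 2  # computed on whichever device was selected

# .cpu() moves the result to host memory; .numpy() turns it into an ndarray
back = doubled.cpu().numpy()
print(device, back)
```

<p>Because <code>torch.from_numpy</code> shares memory with the source array, changing one changes the other. Use <code>torch.tensor(arr)</code> instead if you want an independent copy.</p>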
<p>One of the handiest features in PyTorch is autograd (the <code>torch.autograd</code> module). It records the operations applied to tensors during the forward pass, so it can compute gradients for you without you having to derive and store them manually.</p>
<p>This is what makes backpropagation easy during training. After computing the loss, a single call to <code>loss.backward()</code> fills in the gradient of every parameter, and the optimizer uses those gradients to adjust the model's parameters.</p>
<p>You can also stop PyTorch from computing a tensor's gradient by setting its <code>requires_grad</code> attribute to <code>False</code>. To learn more about tensors and how to perform gradient computations in PyTorch, you can <a target="_blank" href="https://www.freecodecamp.org/news/pytorch-tensor-methods/">check out this tutorial</a> and <a target="_blank" href="https://www.freecodecamp.org/news/pytorch-full-course/">this course</a>.</p>
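<p>Here is a small sketch of autograd on scalar tensors with made-up values. <code>w</code> requires gradients and <code>b</code> does not, and a single <code>backward()</code> call computes d(loss)/dw for us.</p>

```python
import torch

# w is trainable; b is frozen (requires_grad=False)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=False)
x = torch.tensor(2.0)

y = w * x + b          # forward pass: autograd records these operations
loss = (y - 5.0) ** 2  # y = 7, so loss = 4

loss.backward()        # backpropagation fills in .grad where it is required
print(w.grad)          # d(loss)/dw = 2 * (y - 5) * x = 8
print(b.grad)          # None: no gradient is tracked for b
```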
<h2 id="heading-how-to-implement-a-covid-19-classifier-using-efficientnet-with-pytorch">How to Implement a COVID-19 Classifier using EfficientNet with PyTorch</h2>
<p>Now let's move on to the practical implementation of EfficientNet in PyTorch. We will use the B0 variant of the EfficientNet family.</p>
<p>First, we'll examine the data and preprocess it. <a target="_blank" href="https://www.kaggle.com">Kaggle</a> has a vast library of datasets available for open-source use in projects and research. You aren't tied to one particular dataset for this project: you can use any dataset containing chest X-ray images of COVID-19 patients and of people without COVID.</p>
<p>For the sake of this tutorial, we'll use <a target="_blank" href="https://www.kaggle.com/asraf047/covid19-pneumonia-normal-chest-xray-pa-dataset">this dataset</a>. For the code to work on your own dataset, you must divide your data into three directories: train, test, and valid.</p>
<p>Each directory should contain two subdirectories named <code>covid</code> and <code>normal</code>, which hold the images of that class for that split.</p>
<p><img src="https://lh5.googleusercontent.com/aaZIPn8TEUsfqo3rA7xtJf7T-3PMSRU_jSZZ60DCeloIyadr40u1oguQycDMDeL-puqjdZ40xEGIu8i_PYdpufi_o-8pcGTlarJ37A_KJm_R0lV4mwGFKPAIhQmKd3Lr7b6dNHM" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The original dataset we'll use in this article contains three folders: covid, normal, and pneumonia. We discard the pneumonia folder completely and divide the remaining data as described above.</p>
<p>We do this to create a clear separation between the data used for training and the data used for validation and testing. Also, torchvision's <code>ImageFolder</code> uses the name of the folder an image is stored in as its class label, so we don't need a separate label file for the dataset.</p>
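<p>If your own dataset comes as one folder per class, as the original covid/normal/pneumonia dataset does, you could produce the split described above with a short script. This is a sketch, not the author's procedure. The function name <code>split_class_folders</code>, the 70/15/15 default ratios, and the fixed random seed are all assumptions. Folders not listed in <code>classes</code>, such as pneumonia, are skipped.</p>

```python
import random
import shutil
from pathlib import Path


def split_class_folders(src, dst, classes=("covid", "normal"),
                        ratios=(0.7, 0.15, 0.15), seed=0):
    """Copy images from src/CLASS/ into dst/{train,valid,test}/CLASS/.

    Class folders in src that are not listed in `classes` are ignored.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    for cls in classes:
        files = sorted(p for p in (Path(src) / cls).iterdir() if p.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * ratios[0])
        n_valid = int(len(files) * ratios[1])
        splits = {
            "train": files[:n_train],
            "valid": files[n_train:n_train + n_valid],
            "test": files[n_train + n_valid:],  # whatever remains
        }
        for split, split_files in splits.items():
            out_dir = Path(dst) / split / cls
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in split_files:
                shutil.copy2(f, out_dir / f.name)
```

<p>For example, with a hypothetical <code>raw_data</code> source folder, <code>split_class_folders('raw_data', 'data')</code> would produce <code>data/train/covid</code>, <code>data/valid/normal</code>, and so on. ImageFolder assigns class indices in alphabetical order of the folder names, so <code>covid</code> becomes 0 and <code>normal</code> becomes 1.</p>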
<h3 id="heading-the-data-and-the-architecture">The data and the architecture</h3>
<p>Let's have a look at the data. Below we can see the x-ray images of patients with COVID-19:</p>
<p><img src="https://lh4.googleusercontent.com/cB8kT-qcFsIqly9wi2yHiDZpD3of9wOgr7j9XggMWC0Yehva5H1QHiGmLq1g-qIz5wyk_6Kdy_roJiyTxUNFtPmGr6-0BKLy5KscJesZddQUGpKSDn8ZH5cRqDTWeSXswCxH8W8" alt="Image" width="600" height="400" loading="lazy"></p>
<p>And here we can see the normal category’s x-ray images:</p>
<p><img src="https://lh4.googleusercontent.com/CRu82skVkh6fIaLuSD5ucOyjhjCk9o_j6ZO0zQLw8J4_UKk5nSJhxfiEtdwhmSCFVakoG0RLSwr6IL7b-ij30thBD_S6WYumx6XUYLSMkPdHfjvxzAfuwF_MaoUG89VmFGXUa9Y" alt="Image" width="600" height="400" loading="lazy"></p>
<p>There are 237 layers in total in the B0 architecture. The whole architecture can be condensed into the following diagrams. We feed the X-ray data to the input layer.</p>
<p><img src="https://lh3.googleusercontent.com/de9n3HWqb4kqVLV4VkPiCphCbfSDDSmKFXu826ITg1Z-LkWaB28JCkzfVlHaOVSrBHbSToDe5k45-bSGwUpQLglgoa4ai_YhhYAe9_th6pJIKts64kzbhgNS3GihARgRscJABlw" alt="Image" width="600" height="400" loading="lazy"></p>
<p><img src="https://lh5.googleusercontent.com/rCTjM83oPyAi-RddlHJufeDAql0ee_ExJmxqTbL7BgPk6unoZXmL5cabb0zuDrM7EBdDupxE1YXOmRCQt5Ntyn2gZYpzdEDb7kI0ea3BifBZp3q1MBYkVzxV9N4Mwd-882ciO7o" alt="Image" width="600" height="400" loading="lazy"></p>
<p><img src="https://lh4.googleusercontent.com/sZ34-xflacMLYBg33trm8RxJypHPxRqAHWtt_dm8fEdwhW1eFV0eEL66g8Yr8GcX8mo_6Sz4N6PkL7M_UbhG7S5n1eU5dpyrKZoJL7ROQ8TQLJjh_Nm4vokmtwi-4pOfCMzFHRk" alt="Image" width="600" height="400" loading="lazy">
<em><a target="_blank" href="https://towardsdatascience.com/complete-architectural-details-of-all-efficientnet-models-5fd5b736142">Source</a></em></p>
<p>We will freeze the weights in all of these blocks, because we will use the pre-trained weights to extract features from our own input.</p>
<p>Feature extraction happens once the input has passed through Module 7. We then pass the feature map obtained from Module 7 into our own final classification layers. Reusing what a network learned on one task for a new task is what gives transfer learning its name. We replace the original top of the architecture with the following layers:</p>
<ul>
<li>BatchNorm1d()</li>
<li>Linear(output neurons = 512)</li>
<li>ReLU()</li>
<li>BatchNorm1d()</li>
<li>Linear(output neurons = 128)</li>
<li>ReLU()</li>
<li>BatchNorm1d()</li>
<li>Dropout(probability of zeroing each activation = 0.4)</li>
<li>Linear(output neurons = 2)</li>
</ul>
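<p>To check the shapes flowing through this head, here is a sketch that builds the same layers on their own and feeds them random features. It assumes 1,280 input features, which is the size of the pooled feature vector EfficientNet-B0 passes to its final layer. In the real code, this value is read from <code>model_transfer._fc.in_features</code>.</p>

```python
import torch
import torch.nn as nn

in_features = 1280  # assumption: EfficientNet-B0's pooled feature size

head = nn.Sequential(
    nn.BatchNorm1d(in_features),
    nn.Linear(in_features, 512),
    nn.ReLU(),
    nn.BatchNorm1d(512),
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.BatchNorm1d(128),
    nn.Dropout(0.4),
    nn.Linear(128, 2),  # one score (logit) per class: covid, normal
)

# Random stand-in for a batch of 32 extracted feature vectors
features = torch.randn(32, in_features)
logits = head(features)
print(logits.shape)  # torch.Size([32, 2])

# In training mode, BatchNorm1d cannot normalize a batch with a single sample
rejected = False
try:
    head(torch.randn(1, in_features))
except ValueError as err:
    rejected = True
    print("single-sample batch rejected:", err)
```

<p>The last part shows a practical consequence of using BatchNorm1d. In training mode, it cannot normalize a batch of one sample, so a final training batch of size 1 would raise an error. Passing <code>drop_last=True</code> to the training DataLoader is one way to avoid this.</p>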
<h3 id="heading-lets-head-over-to-the-code">Let's head over to the code</h3>
<p>Before we start the code, there are a couple of dependencies to install. First, you'll need PyTorch on your machine, which you can install with pip in your Python environment. Refer to <a target="_blank" href="https://pytorch.org/get-started/locally/">the official installation page</a> for the right command for your machine, depending on whether or not it has a GPU.</p>
<p>As you follow along, I strongly recommend that you type out and run the code yourself. This makes it much easier to understand. With that said, you can access the full code in a Jupyter notebook <a target="_blank" href="https://drive.google.com/file/d/1m_ATQIrNN-dVVZwZjux5305yhuseZ58R/view?usp=sharing">here</a>.</p>
<p>You also need to install EfficientNet support for PyTorch (the <code>efficientnet_pytorch</code> package) into the same Python environment. Run the command below to install it:</p>
<pre><code class="lang-bash">pip install efficientnet_pytorch
</code></pre>
<p>Apart from this, the code imports a few other libraries at the start.</p>
<p>Now we start building the classification model. To start, we import all the necessary modules:</p>
<pre><code class="lang-python"><span class="hljs-comment">#importing required modules</span>
<span class="hljs-keyword">import</span> gdown
<span class="hljs-keyword">import</span> zipfile
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> glob <span class="hljs-keyword">import</span> glob
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">from</span> torchsummary <span class="hljs-keyword">import</span> summary
<span class="hljs-keyword">from</span> torchvision <span class="hljs-keyword">import</span> datasets, transforms <span class="hljs-keyword">as</span> T
<span class="hljs-keyword">from</span> efficientnet_pytorch <span class="hljs-keyword">import</span> EfficientNet
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> torch.optim <span class="hljs-keyword">as</span> optim
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> ImageFile
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score
</code></pre>
<p>These modules handle downloading the data, loading images, building the model, and computing metrics. If any are missing, you can install them with pip (for example, <code>pip install gdown torchsummary scikit-learn matplotlib</code>).</p>
<p>Then we download and extract the data we prepared for the model:</p>
<pre><code class="lang-python"><span class="hljs-comment">#importing data</span>
<span class="hljs-comment">#Dataset address</span>
url = <span class="hljs-string">'https://drive.google.com/uc?export=download&amp;id=1B75cOYH7VCaiqdeQYvMuUuy_Mn_5tPMY'</span>
output = <span class="hljs-string">'data.zip'</span>
gdown.download(url, output, quiet=<span class="hljs-literal">False</span>)
<span class="hljs-comment">#giving zip file name</span>
data_dir=<span class="hljs-string">'./data.zip'</span>
<span class="hljs-comment">#Extracting data from zip file</span>
<span class="hljs-keyword">with</span> zipfile.ZipFile(data_dir, <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> zf:
    zf.extractall(<span class="hljs-string">'./data/'</span>)
</code></pre>
<p>The <code>gdown.download</code> function downloads the data from the URL provided, and <code>ZipFile.extractall</code> extracts it into a <code>data</code> folder inside your current working directory (or inside the runtime if you are working on Google Colab).</p>
<p>If you don't have a GPU on your local machine, I highly recommend working on Google Colab for this project.</p>
<p>Next, create a variable that records whether a GPU is available.</p>
<pre><code class="lang-python"><span class="hljs-comment">#Checking the availability of a GPU</span>
use_cuda = torch.cuda.is_available()
</code></pre>
<p><code>torch.cuda.is_available()</code> returns <code>True</code> if a CUDA-capable GPU is available and <code>False</code> if not.</p>
<p>Next, we need to pre-process the data. Since our data is already augmented, we don't need much pre-processing. We only resize all the images to a single size of (224, 224), because the images in our dataset have different dimensions and the model needs inputs of a consistent size.</p>
<p>We also convert the images to tensors so PyTorch can process them, and then normalize them. <code>T.Normalize</code> normalizes each channel with a mean and standard deviation of 0.5, computing (x - 0.5) / 0.5. This maps the pixel values produced by <code>ToTensor</code> from the [0, 1] range to [-1, 1].</p>
<p>After that, we define the locations of the train, validation, and test sets and pass them to <code>datasets.ImageFolder</code>, so that PyTorch knows exactly where each split is stored. Each dataset is then wrapped in a <code>DataLoader</code>, which serves the images in batches that can be moved to the GPU during training. We use a batch size of 32.</p>
<pre><code class="lang-python"><span class="hljs-comment">#declaring batch size</span>
batch_size = <span class="hljs-number">32</span>

<span class="hljs-comment">#applying required transformations on the dataset</span>
img_transforms = {
    <span class="hljs-string">'train'</span>:
    T.Compose([
        T.Resize(size=(<span class="hljs-number">224</span>,<span class="hljs-number">224</span>)), 
        T.ToTensor(),
        T.Normalize([<span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>], [<span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>]), 
        ]),

    <span class="hljs-string">'valid'</span>:
    T.Compose([
        T.Resize(size=(<span class="hljs-number">224</span>,<span class="hljs-number">224</span>)),
        T.ToTensor(),
        T.Normalize([<span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>], [<span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>])
        ]),

    <span class="hljs-string">'test'</span>:
    T.Compose([
        T.Resize(size=(<span class="hljs-number">224</span>,<span class="hljs-number">224</span>)),
        T.ToTensor(),
        T.Normalize([<span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>], [<span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>, <span class="hljs-number">0.5</span>])
        ]),
     }

<span class="hljs-comment"># creating Location of data: train, validation, test</span>
data=<span class="hljs-string">'./data/'</span>

train_path=os.path.join(data,<span class="hljs-string">'train'</span>)
valid_path=os.path.join(data,<span class="hljs-string">'valid'</span>)
test_path=os.path.join(data,<span class="hljs-string">'test'</span>)


<span class="hljs-comment"># creating a Dataset for each of the folders defined above</span>
train_file=datasets.ImageFolder(train_path,transform=img_transforms[<span class="hljs-string">'train'</span>])
valid_file=datasets.ImageFolder(valid_path,transform=img_transforms[<span class="hljs-string">'valid'</span>])
test_file=datasets.ImageFolder(test_path,transform=img_transforms[<span class="hljs-string">'test'</span>])


<span class="hljs-comment">#Creating loaders for the dataset</span>
loaders_transfer={
    <span class="hljs-comment"># drop_last avoids a final training batch of size 1, which BatchNorm1d cannot handle in training mode</span>
    <span class="hljs-string">'train'</span>:torch.utils.data.DataLoader(train_file,batch_size,shuffle=<span class="hljs-literal">True</span>,drop_last=<span class="hljs-literal">True</span>),
    <span class="hljs-string">'valid'</span>:torch.utils.data.DataLoader(valid_file,batch_size,shuffle=<span class="hljs-literal">False</span>),
    <span class="hljs-string">'test'</span>: torch.utils.data.DataLoader(test_file,batch_size,shuffle=<span class="hljs-literal">False</span>)
}
</code></pre>
<p>After pre-processing, we move on to building the model.</p>
<pre><code class="lang-python"><span class="hljs-comment">#importing the pretrained EfficientNet model</span>

model_transfer = EfficientNet.from_pretrained(<span class="hljs-string">'efficientnet-b0'</span>)

<span class="hljs-comment"># Freeze weights</span>
<span class="hljs-keyword">for</span> param <span class="hljs-keyword">in</span> model_transfer.parameters():
    param.requires_grad = <span class="hljs-literal">False</span>
in_features = model_transfer._fc.in_features


<span class="hljs-comment"># Defining Dense top layers after the convolutional layers</span>
model_transfer._fc = nn.Sequential(
    nn.BatchNorm1d(num_features=in_features),    
    nn.Linear(in_features, <span class="hljs-number">512</span>),
    nn.ReLU(),
    nn.BatchNorm1d(<span class="hljs-number">512</span>),
    nn.Linear(<span class="hljs-number">512</span>, <span class="hljs-number">128</span>),
    nn.ReLU(),
    nn.BatchNorm1d(num_features=<span class="hljs-number">128</span>),
    nn.Dropout(<span class="hljs-number">0.4</span>),
    nn.Linear(<span class="hljs-number">128</span>, <span class="hljs-number">2</span>),
    )
<span class="hljs-keyword">if</span> use_cuda:
    model_transfer = model_transfer.cuda()
</code></pre>
<p>First, we import the EfficientNet-B0 model with its pre-trained weights. Next, we freeze the model's parameters by setting <code>requires_grad</code> to <code>False</code>, because we are going to use the pre-trained parameters to extract features from our data.</p>
<p>Then we replace the model's final fully connected layer, <code>_fc</code>, with our own classifier. Because the new layers are created after the freezing loop, their parameters remain trainable.</p>
<p>BatchNorm1d normalizes each of its input features (the number of features is given as the argument) to zero mean and unit variance across the batch, then applies a learnable scale and shift. This stabilizes and speeds up training and has a mild regularizing effect. Dropout helps prevent overfitting: during training, it randomly zeroes each activation with the probability given as the argument, and it is switched off at evaluation time.</p>
<p>The Linear layer is a simple fully connected layer that computes a weighted sum of its inputs plus a bias.</p>
<p>Finally, we transfer our model to the GPU, if available.</p>
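<p>You can observe the BatchNorm1d and Dropout behavior described above directly on random data. This sketch uses arbitrary sizes (a batch of 64 samples with 3 features, and 10,000 ones for dropout) and a fixed seed.</p>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# BatchNorm1d(3): each of the 3 features is normalized across the batch
bn = nn.BatchNorm1d(3)
x = torch.randn(64, 3) * 5 + 10  # features with mean around 10 and std around 5
y = bn(x)                        # training mode: uses this batch's statistics
print(y.mean(dim=0), y.std(dim=0))  # close to 0 and 1 for each feature

# Dropout(0.4): in training mode, zeroes each element with probability 0.4
# and scales the survivors by 1 / (1 - 0.4); in eval mode it does nothing
drop = nn.Dropout(0.4)
ones = torch.ones(10000)
train_out = drop(ones)
drop.eval()
eval_out = drop(ones)
print((train_out == 0).float().mean())  # roughly 0.4
print(torch.equal(eval_out, ones))      # eval mode leaves the input unchanged
```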
<pre><code class="lang-python"><span class="hljs-comment"># selecting loss function</span>
criterion_transfer = nn.CrossEntropyLoss()

<span class="hljs-comment">#using the Adam optimizer</span>
optimizer_transfer = optim.Adam(model_transfer.parameters(), lr=<span class="hljs-number">0.0005</span>)
</code></pre>
<p>Here, we select the loss function and the optimizer for the training phase and set the optimizer's learning rate. You can change this value to see how different learning rates affect the model. Passing all of <code>model_transfer.parameters()</code> to Adam is fine: the frozen parameters never receive gradients, so the optimizer leaves them unchanged.</p>
<p>Next, we move on to the training of the model.</p>
<pre><code class="lang-python">ImageFile.LOAD_TRUNCATED_IMAGES = <span class="hljs-literal">True</span>

<span class="hljs-comment"># Creating the function for training</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train</span>(<span class="hljs-params">n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path</span>):</span>
    <span class="hljs-string">"""returns trained model"""</span>
    <span class="hljs-comment"># initialize tracker for minimum validation loss</span>
    valid_loss_min = np.inf
    trainingloss = []
    validationloss = []

    <span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, n_epochs+<span class="hljs-number">1</span>):
        <span class="hljs-comment"># initialize the variables to monitor training and validation loss</span>
        train_loss = <span class="hljs-number">0.0</span>
        valid_loss = <span class="hljs-number">0.0</span>

        <span class="hljs-comment">###################</span>
        <span class="hljs-comment"># training the model #</span>
        <span class="hljs-comment">###################</span>
        model.train()
        <span class="hljs-keyword">for</span> batch_idx, (data, target) <span class="hljs-keyword">in</span> enumerate(loaders[<span class="hljs-string">'train'</span>]):
            <span class="hljs-comment"># move to GPU</span>
            <span class="hljs-keyword">if</span> use_cuda:
                data, target = data.cuda(), target.cuda()

            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            train_loss = train_loss + ((<span class="hljs-number">1</span> / (batch_idx + <span class="hljs-number">1</span>)) * (loss.item() - train_loss))

        <span class="hljs-comment">######################    </span>
        <span class="hljs-comment"># validating the model #</span>
        <span class="hljs-comment">######################</span>
        model.eval()
        <span class="hljs-keyword">with</span> torch.no_grad():  <span class="hljs-comment"># no gradients are needed for validation</span>
            <span class="hljs-keyword">for</span> batch_idx, (data, target) <span class="hljs-keyword">in</span> enumerate(loaders[<span class="hljs-string">'valid'</span>]):
                <span class="hljs-keyword">if</span> use_cuda:
                    data, target = data.cuda(), target.cuda()

                output = model(data)
                loss = criterion(output, target)
                valid_loss = valid_loss + ((<span class="hljs-number">1</span> / (batch_idx + <span class="hljs-number">1</span>)) * (loss.item() - valid_loss))

        <span class="hljs-comment"># train_loss and valid_loss are already running averages of the batch losses, so no further division is needed</span>

        trainingloss.append(train_loss)
        validationloss.append(valid_loss)

        <span class="hljs-comment"># printing training/validation statistics </span>
        print(<span class="hljs-string">'Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'</span>.format(
            epoch, 
            train_loss,
            valid_loss
            ))

        <span class="hljs-comment">## saving the model if validation loss has decreased</span>
        <span class="hljs-keyword">if</span> valid_loss &lt; valid_loss_min:
            torch.save(model.state_dict(), save_path)

            valid_loss_min = valid_loss

    <span class="hljs-comment"># return trained model</span>
    <span class="hljs-keyword">return</span> model, trainingloss, validationloss
</code></pre>
<p>We create a function for the training and validation phases of the model. Setting <code>ImageFile.LOAD_TRUNCATED_IMAGES = True</code> lets PIL load image files that are truncated (incomplete) instead of raising an error. ImageFolder's default loader also converts every image to RGB, so grayscale X-rays reach the model with three channels. We initialize the training and validation losses and start the training loop, loading the data batch by batch from the data loaders and performing the training operations on each batch.</p>
<p>After the training loop, we start the validation loop. There, we only compute the loss and the output predictions, without updating the parameters as we did in the training loop. We save the model with the lowest validation loss.</p>
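<p>The loss updates in the loop, of the form <code>avg + (1 / (batch_idx + 1)) * (loss - avg)</code>, compute an incremental (running) mean. After k batches, the value equals the plain average of the first k batch losses. The sketch below checks this on hypothetical loss values. It also means the reported losses are already per-batch averages and don't need to be divided by the dataset size.</p>

```python
def running_mean(values):
    """Incremental mean, using the same update the train() loop applies per batch."""
    avg = 0.0
    for i, v in enumerate(values):
        avg = avg + (1 / (i + 1)) * (v - avg)
    return avg


batch_losses = [0.9, 0.7, 0.5, 0.3]  # hypothetical per-batch losses
print(running_mean(batch_losses))             # approximately 0.6
print(sum(batch_losses) / len(batch_losses))  # the plain average, for comparison
```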
<pre><code class="lang-python"><span class="hljs-comment"># training the model</span>

n_epochs=<span class="hljs-number">10</span>

model_transfer, train_loss, valid_loss = train(n_epochs, loaders_transfer, model_transfer, optimizer_transfer, criterion_transfer, use_cuda, <span class="hljs-string">'model.pt'</span>)
</code></pre>
<p>We train the model for 10 epochs, that is, 10 full passes over the training data. You can change the number of epochs and compare the loss values. The checkpoint with the lowest validation loss is saved as <code>model.pt</code>. Now we load that model and move on to the testing phase.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Defining the test function</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test</span>(<span class="hljs-params">loaders, model, criterion, use_cuda</span>):</span>

    <span class="hljs-comment"># monitoring test loss and accuracy</span>
    test_loss = <span class="hljs-number">0.</span>
    correct = <span class="hljs-number">0.</span>
    total = <span class="hljs-number">0.</span>
    preds = []
    targets = []

    model.eval()
    <span class="hljs-keyword">with</span> torch.no_grad():  <span class="hljs-comment"># gradients are not needed for testing</span>
        <span class="hljs-keyword">for</span> batch_idx, (data, target) <span class="hljs-keyword">in</span> enumerate(loaders[<span class="hljs-string">'test'</span>]):
            <span class="hljs-comment"># moving to GPU</span>
            <span class="hljs-keyword">if</span> use_cuda:
                data, target = data.cuda(), target.cuda()
            <span class="hljs-comment"># forward pass</span>
            output = model(data)
            <span class="hljs-comment"># calculate the loss</span>
            loss = criterion(output, target)
            <span class="hljs-comment"># updating average test loss </span>
            test_loss = test_loss + ((<span class="hljs-number">1</span> / (batch_idx + <span class="hljs-number">1</span>)) * (loss.item() - test_loss))
            <span class="hljs-comment"># converting the output scores to the predicted class index</span>
            pred = output.argmax(dim=<span class="hljs-number">1</span>)
            preds.append(pred)
            targets.append(target)
            <span class="hljs-comment"># compare predictions</span>
            correct += (pred == target).sum().item()
            total += data.size(<span class="hljs-number">0</span>)

    <span class="hljs-keyword">return</span> preds, targets

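<span class="hljs-comment"># loading the checkpoint with the lowest validation loss</span>
model_transfer.load_state_dict(torch.load(<span class="hljs-string">'model.pt'</span>))

<span class="hljs-comment"># calling test function</span>
preds, targets = test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)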
</code></pre>
<p>We now create a test function to apply our model to our test dataset and evaluate its performance. </p>
<p>We pass the test data batch by batch, as we did in the training and validation phases, but only once here instead of for 10 epochs. This is because we are only evaluating the model, not updating its parameters.</p>
<p>The function returns the predictions it computed for the input test set and also the original target values of the test set. </p>
<p>Now we compute the accuracy of the model. First, we need to convert the tensors (that is, the predictions and targets) into NumPy arrays. We do this by first moving them from the GPU to the CPU and then converting them to NumPy arrays. The following code does this:</p>
<pre><code class="lang-python"><span class="hljs-comment">#concatenating the per-batch tensors and converting them to NumPy arrays for the metric functions</span>

preds2 = torch.cat(preds).view(<span class="hljs-number">-1</span>).cpu().numpy()
targets2 = torch.cat(targets).view(<span class="hljs-number">-1</span>).cpu().numpy()
</code></pre>
<p>Now we compute the accuracy using scikit-learn's <code>accuracy_score</code> function.</p>
<pre><code class="lang-python"><span class="hljs-comment">#Computing the accuracy</span>
acc = accuracy_score(targets2, preds2)
print(<span class="hljs-string">"Accuracy: "</span>, acc)
</code></pre>
<p>Our model had an accuracy of 95.45%.</p>
<p><img src="https://lh3.googleusercontent.com/4_gMnxj_l_xGKOPr0Zg5V8IIA78NJIloxe9FNsKwAAW480WUpojW6PQWWgYzT7k839c27hA7svWPi4m_8XuR0ZSWY6TJ0TIc22xtCqqixeSq9mVBZzDIHW0edaueH1IE3VRW68M" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The next image is the confusion matrix for the test run of the classifier, which visualizes the model's performance. The actual labels indicate whether the person had COVID or not, while the predicted labels indicate how our model classified the images.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/confusion-matrix.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>As we can see, our model predicted most of the labels correctly. Among the wrong predictions are 7 people who did not have COVID but whom our model predicted did. These are called false positives, and they are less alarming than the opposite kind of error.</p>
<p>On the other hand, there were 14 examples where our model predicted that people did not have COVID when they actually did. In machine learning, these are called false negatives. This is a very alarming situation, because we would have sent home people suffering from COVID-19, increasing the risk that their disease would get worse.</p>
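<p>The tutorial shows the confusion matrix as an image. One way to compute those counts yourself is scikit-learn's <code>confusion_matrix</code>. The sketch below uses toy labels, not the tutorial's predictions. It assumes ImageFolder's alphabetical mapping (covid = 0, normal = 1), so COVID is the positive class, and its recall (sensitivity) measures the share of COVID cases the model catches.</p>

```python
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels for illustration only (covid = 0, normal = 1)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
# Rows are actual classes, columns are predicted classes
false_negatives = cm[0, 1]  # actually covid, predicted normal
false_positives = cm[1, 0]  # actually normal, predicted covid
sensitivity = recall_score(y_true, y_pred, pos_label=0)  # share of covid cases caught

print(cm)
print("false negatives:", false_negatives, "false positives:", false_positives)
print("sensitivity:", sensitivity)
```

<p>On the real test set, you would pass <code>targets2</code> and <code>preds2</code> instead of the toy lists.</p>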
<h2 id="heading-conclusion">Conclusion</h2>
<p>Convolutional neural networks have proved extremely useful in computer vision, and we can also use them effectively in medical imaging and diagnosis.</p>
<p>Transfer learning is an effective way to reuse pre-trained architectures in new applications.</p>
<p>But as we saw above, whether these models are suitable depends on the kind of problem we have and on our objectives. In COVID-19 detection, for example, we would prefer a model that gives us zero false negatives. Still, deep learning has great potential to be useful in COVID diagnosis as well as in other medical diagnosis techniques.</p>
<p>Thanks for reading! If you enjoyed the article and would like to read more interesting articles around computer science, Python and JavaScript, please follow me on <a target="_blank" href="https://twitter.com/bajcmartinez">Twitter</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Data Science Interview Questions for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ By Davis David In 2012, Harvard Business Review named data science the sexiest job of the 21st century. But if you want to get a job as a data scientist, you'll need to go through a tough interview process. During data science job interviews, the int... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/23-common-data-science-interview-questions-for-beginners/</link>
                <guid isPermaLink="false">66d84ea592b237d9f6a7c50b</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ interview questions ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 25 Aug 2021 21:39:37 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/08/pexels-tima-miroshnichenko-5439141.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Davis David</p>
<p>In 2012, Harvard Business Review named data science the sexiest job of the 21st century. But if you want to get a job as a data scientist, you'll need to go through a tough interview process.</p>
<p>During data science job interviews, the interviewer will likely ask questions from different data science topics such as statistics, programming, data analysis, data pre-processing, and modeling. </p>
<p>Your skills will be put to the test, and you need to prepare yourself if you want to get through the interview successfully.</p>
<p>In this article, I have compiled a list of common data science interview questions with tips on how you can answer them. I've also shared a list of resources that will help you learn more about the specific topic presented in each interview question.</p>
<h1 id="heading-data-science-interview-questions">Data Science Interview Questions</h1>
<h2 id="heading-what-is-logistic-regression-how-have-you-used-logistic-regression-recently">What is Logistic Regression? How Have You Used Logistic Regression Recently?</h2>
<p>Logistic regression is a popular algorithm used to solve classification problems. In this question, you need to explain what logistic regression is, how it works, and give an example of a data science problem you solved by using logistic regression.</p>
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/logistic-regression-the-good-parts-55efa68e11df/">Logistic Regression: The good parts</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/the-least-squares-regression-method-explained/">The Least Squares Regression Method – How to Find the Line of Best Fit</a></li>
</ul>
<h2 id="heading-why-do-we-need-evaluation-metrics-what-is-a-confusion-matrix">Why Do We Need Evaluation Metrics? What is a Confusion Matrix?</h2>
<p>Machine learning models must be evaluated to check their performance. In this question, you need to explain how you can use the confusion matrix to evaluate the model's performance. You can further mention other metrics to evaluate regression and classification models.</p>
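<p>A small sketch in plain Python, assuming a binary problem where 1 is the positive class, shows how the four cells of a confusion matrix turn into common classification metrics. The helper names and example labels are made up for illustration; in practice you would typically use a library such as scikit-learn.</p>
<pre><code class="lang-py">def confusion_counts(y_true, y_pred):
    # Binary problem: 1 is the positive class, 0 the negative class
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # actually positive, predicted positive
    tn = pairs.count((0, 0))  # actually negative, predicted negative
    fp = pairs.count((0, 1))  # actually negative, predicted positive
    fn = pairs.count((1, 0))  # actually positive, predicted negative
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # how many predicted positives are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # how many real positives were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

print(confusion_counts(y_true, y_pred))  # (3, 2, 2, 1)
print(classification_metrics(y_true, y_pred))
</code></pre>
<p>Here accuracy is 0.625, but precision (0.6) and recall (0.75) tell you which kind of mistake the model makes more often, which matters when false positives and false negatives have different costs.</p>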
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/a-no-code-intro-to-the-9-most-important-machine-learning-algorithms-today/">9 Key Machine Learning Algorithms Explained in Plain English</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-i-used-deep-learning-to-classify-medical-images-with-fast-ai-cc4cfd64173c/">How I used Deep Learning to classify medical images with Fast.ai</a></li>
</ul>
<h2 id="heading-how-is-data-science-different-from-traditional-application-programming">How is Data Science Different from Traditional Application Programming?</h2>
<p>A good way to answer this question is by using examples of how the program is created in both cases.</p>
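<p>Traditional programming approach: you write the rules, and the program applies them to the input data to produce the output.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/08/image-1.PNG" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Data science approach: you provide the input data together with example outputs, and a learning algorithm infers the rules (the model) from them.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/08/image-2.PNG" alt="Image" width="600" height="400" loading="lazy"></p>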
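<p>A toy example can make the contrast concrete. This sketch assumes NumPy and uses temperature conversion purely as an illustration: in the first part a programmer writes the Celsius-to-Fahrenheit rule by hand, while in the second part the rule is estimated from example inputs and outputs with a least-squares line fit (<code>np.polyfit</code>).</p>
<pre><code class="lang-py">import numpy as np

# Traditional programming: a programmer writes the rule explicitly
def celsius_to_fahrenheit(temp_c):
    return temp_c * 1.8 + 32

# Data science approach: give the computer example inputs and expected outputs...
celsius = np.array([-40.0, -10.0, 0.0, 8.0, 15.0, 22.0, 38.0])
fahrenheit = np.array([-40.0, 14.0, 32.0, 46.4, 59.0, 71.6, 100.4])

# ...and let it estimate the rule: a least-squares straight-line fit
slope, intercept = np.polyfit(celsius, fahrenheit, deg=1)

print(celsius_to_fahrenheit(100.0))  # 212.0 (hand-written rule)
print(slope * 100.0 + intercept)     # approximately 212 (learned rule)
</code></pre>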
<p><strong>Here is a good resource to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/data-science-course-for-beginners/">Free 6-Hour Data Science Course for Beginners</a></li>
</ul>
<h2 id="heading-explain-the-difference-between-supervised-and-unsupervised-learning">Explain the Difference Between Supervised and Unsupervised Learning.</h2>
<p>Supervised and unsupervised learning are two types of machine learning techniques. The best way to answer this question is by explaining their differences in terms of the kind of datasets you can use in each technique and examples of algorithms.</p>
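<p>A minimal NumPy sketch of the contrast on made-up data: the supervised model (a simple nearest-centroid classifier) needs the labels <code>y</code> to learn, while the unsupervised model (a bare-bones two-cluster k-means) only sees the features <code>X</code> and has to discover the groups itself. The helper names are hypothetical.</p>
<pre><code class="lang-py">import numpy as np

rng = np.random.default_rng(42)
# Two well-separated groups of 2-D points (made-up data)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(5.0, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)  # labels: only the supervised model uses them

def nearest_center(points, centers):
    # Index of the closest center for every point (Euclidean distance)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Supervised: learn one centroid per labeled class (nearest-centroid classifier)
class_centers = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
supervised_pred = nearest_center(X, class_centers)

# Unsupervised: two-cluster k-means never looks at y
def kmeans_two_clusters(points, iters=20):
    # Deterministic start: the first point and the point farthest from it
    farthest = np.linalg.norm(points - points[0], axis=1).argmax()
    centers = points[[0, farthest]]
    for _ in range(iters):
        assign = nearest_center(points, centers)  # assignment step
        centers = np.array([points[assign == j].mean(axis=0) for j in (0, 1)])  # update step
    return assign

cluster_ids = kmeans_two_clusters(X)
print(supervised_pred)  # matches y
print(cluster_ids)      # same grouping, but the cluster numbers carry no meaning
</code></pre>
<p>k-means finds the two groups, but its cluster IDs are arbitrary: without labels it cannot tell you what each group means.</p>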
<p><strong>Here is a good resource to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/when-to-use-different-machine-learning-algorithms-a-simple-guide-ba615b19fb3b/">When to use different machine learning algorithms: a simple guide</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/want-to-know-how-deep-learning-works-heres-a-quick-guide-for-everyone-1aedeca88076/">Want to know how deep learning works? Here's a quick guide</a></li>
</ul>
<h2 id="heading-what-is-a-decision-tree">What is a Decision Tree?</h2>
<p>A decision tree is another supervised learning algorithm that you can use to solve regression or classification problems. </p>
<p>You should be able to explain how the decision tree algorithm learns from the data and the advantages and disadvantages of using a decision tree algorithm.</p>
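<p>The core learning step of a decision tree is choosing a split. This minimal sketch (assuming NumPy and binary 0/1 labels) shows how a CART-style tree can pick the threshold on one numeric feature that minimizes Gini impurity. A full tree repeats this search over every feature and then recurses into each child node until a stopping rule, such as a maximum depth, is met. The function names and data are illustrative.</p>
<pre><code class="lang-py">import numpy as np

def gini(labels):
    # Gini impurity of 0/1 labels: 0 for a pure node, 0.5 for a 50/50 node
    if labels.size == 0:
        return 0.0
    p = labels.mean()  # share of class 1
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def best_split(x, y):
    # Candidate thresholds: midpoints between adjacent distinct feature values
    values = np.unique(x)
    thresholds = (values[:-1] + values[1:]) / 2
    impurities = []
    for t in thresholds:
        goes_left = np.less_equal(x, t)
        left, right = y[goes_left], y[~goes_left]
        # Size-weighted average impurity of the two child nodes
        impurities.append((left.size * gini(left) + right.size * gini(right)) / y.size)
    best = int(np.argmin(impurities))
    return thresholds[best], impurities[best]

# Hypothetical data: hours studied vs. passed the exam (1) or not (0)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])
threshold, impurity = best_split(hours, passed)
print(threshold, impurity)  # 4.5 0.0
</code></pre>
<p>This greedy, one-split-at-a-time search is also behind a typical disadvantage you can mention: a deep tree can keep splitting until it memorizes the training data, which is why depth limits and pruning are used.</p>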
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-use-the-tree-based-algorithm-for-machine-learning/">How to Use Tree-Based Algorithms in Machine Learning</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/a-no-code-intro-to-the-9-most-important-machine-learning-algorithms-today/">9 Key Machine Learning Algorithms Explained in Plain English</a></li>
</ul>
<h2 id="heading-what-is-cross-validation">What is Cross-Validation?</h2>
<p>The purpose of this question is to determine whether you know techniques for assessing how well a machine learning model generalizes to unseen data, for example to detect overfitting. Cross-validation does this by repeatedly training the model on part of the data and evaluating it on the held-out remainder.</p>
<p>When answering this question, you should explain the cross-validation methods (such as k-fold cross-validation) you have applied in your data science projects.</p>
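<p>A minimal k-fold cross-validation loop, sketched with NumPy: the data is split into k folds, and each fold takes one turn as the validation set while the model is trained on the remaining folds. The model here is just a least-squares line (<code>np.polyfit</code>) on synthetic data, and the helper names are hypothetical; scikit-learn provides ready-made versions of this loop.</p>
<pre><code class="lang-py">import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    # Shuffle the row indices once, then cut them into k nearly equal folds
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(x, y, k=5, degree=1):
    # Each fold is the validation set exactly once; the model (a polynomial fit)
    # is trained on the other k - 1 folds
    folds = k_fold_indices(len(x), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coefs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coefs, x[val_idx])
        scores.append(np.mean((pred - y[val_idx]) ** 2))  # validation MSE
    return np.mean(scores), np.std(scores)

# Synthetic data: a noisy straight line
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 3 * x + 2 + rng.normal(0, 1, size=x.size)

mean_mse, std_mse = cross_validate(x, y, k=5)
print(f"5-fold CV MSE: {mean_mse:.2f} +/- {std_mse:.2f}")
</code></pre>
<p>Averaging over the k folds gives a more reliable estimate than a single train/validation split, and the spread across folds shows how stable that estimate is.</p>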
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-get-a-grip-on-cross-validations-bb0ba779e21c/">Get a Grip on Cross-Validation in Machine Learning</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/key-machine-learning-concepts-explained-dataset-splitting-and-random-forest/">Key Machine Learning Concepts Explained</a></li>
</ul>
<h2 id="heading-what-is-a-normal-distribution">What is a Normal Distribution?</h2>
<p>This term is commonly used when you're solving a data science problem. In this question, you can explain the meaning of normal distribution, its properties, and why it is important to check if your data is normally distributed.</p>
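<p>One quick way to check whether data looks normally distributed, sketched with NumPy on a simulated sample (the mean of 50 and standard deviation of 10 are arbitrary): compare the share of values within 1, 2, and 3 standard deviations of the mean to the 68-95-99.7 rule, and check that skewness and excess kurtosis are close to zero. On real data you would usually complement this with a histogram, a Q-Q plot, or a formal test such as Shapiro-Wilk.</p>
<pre><code class="lang-py">import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=50.0, scale=10.0, size=100_000)

# Convert to z-scores: distance from the mean in standard deviations
z = (sample - sample.mean()) / sample.std()

# Empirical rule: about 68%, 95%, and 99.7% of values within 1, 2, and 3 standard deviations
within = {k: np.mean(np.less_equal(np.abs(z), k)) for k in (1, 2, 3)}

# A normal distribution is symmetric (skewness near 0) with excess kurtosis near 0
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

print(within)
print(skewness, excess_kurtosis)
</code></pre>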
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/normal-distribution-explained/">Normal Distribution Explained in Plain English</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=rzFX5NWojp0">Normal Distribution Clearly Explained</a></li>
</ul>
<h2 id="heading-what-is-a-random-forest-algorithm">What is a Random Forest Algorithm?</h2>
<p>Random forest is one of the most popular machine learning algorithms. When answering this question, you should explain how the algorithm learns from the data and when you should use the random forest algorithm over other machine learning algorithms.</p>
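<p>A short NumPy sketch of the two sources of randomness that give random forests their name (the sizes are arbitrary, and no trees are actually trained here): each tree is fit on a bootstrap sample of the rows, and only a random subset of features is considered at each split. Because rows are drawn with replacement, each tree sees only about 63% of the distinct rows; the left-out "out-of-bag" rows can be used to estimate performance. The forest then averages (regression) or takes a majority vote over (classification) the trees' predictions.</p>
<pre><code class="lang-py">import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_trees = 1000, 16, 100
# sqrt(n_features) features per split is a common choice for classification
max_features = int(np.sqrt(n_features))

unique_fractions = []
for _ in range(n_trees):
    # Bagging: each tree gets n_samples rows drawn with replacement
    rows = rng.integers(0, n_samples, size=n_samples)
    unique_fractions.append(np.unique(rows).size / n_samples)

# Feature randomness: at each split, a tree only considers a random subset of features
split_features = rng.choice(n_features, size=max_features, replace=False)

print(split_features)             # e.g. 4 of the 16 feature indices
print(np.mean(unique_fractions))  # about 0.632, i.e. 1 - 1/e
</code></pre>
<p>These two kinds of randomness make the individual trees less correlated with each other, which is why averaging them reduces variance compared with a single decision tree.</p>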
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-use-the-tree-based-algorithm-for-machine-learning/">Random Forest Classifier Tutorial</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/key-machine-learning-concepts-explained-dataset-splitting-and-random-forest/">Dataset Splitting and Random Forest Algorithms</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=eM4uJ6XGnSM">Random Forest Algorithm Explained</a></li>
</ul>
<h2 id="heading-explain-univariate-bivariate-and-multivariate-analyses">Explain Univariate, Bivariate, and Multivariate Analyses</h2>
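<p>These three types of analyses are used to summarize the variables in a dataset and gain insights from them: univariate analysis describes a single variable, bivariate analysis examines the relationship between two variables, and multivariate analysis looks at three or more variables at once. You can also talk about when to apply each one, and make sure to show some examples.</p>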
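<p>A minimal NumPy sketch of all three on simulated study data (the variable names and coefficients are made up for illustration):</p>
<pre><code class="lang-py">import numpy as np

rng = np.random.default_rng(0)
# Simulated data for 200 hypothetical students
hours_studied = rng.uniform(0, 10, size=200)
sleep_hours = rng.uniform(4, 9, size=200)
exam_score = 5 * hours_studied + 2 * sleep_hours + rng.normal(0, 3, size=200)

# Univariate: summarize one variable on its own
print("mean", hours_studied.mean(), "std", hours_studied.std(), "median", np.median(hours_studied))

# Bivariate: measure the relationship between two variables (Pearson correlation)
r = np.corrcoef(hours_studied, exam_score)[0, 1]
print("correlation", r)

# Multivariate: look at several variables together, e.g. a correlation matrix
# and a linear model that explains the score with both predictors at once
corr_matrix = np.corrcoef([hours_studied, sleep_hours, exam_score])
design = np.column_stack([hours_studied, sleep_hours, np.ones_like(hours_studied)])
coefs, *_ = np.linalg.lstsq(design, exam_score, rcond=None)
print("score ~", coefs[0], "* hours +", coefs[1], "* sleep +", coefs[2])
</code></pre>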
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.youtube.com/watch?v=JG8GRlMjp3c">Univariate, Bivariate and Multivariate Analysis</a> </li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-how-to-select-the-best-performing-linear-regression-for-univariate-models-e9d429c40581/">How to Select the Best Performing Linear Regression for Univariate Models</a></li>
</ul>
<h2 id="heading-how-can-we-handle-missing-data">How Can We Handle Missing Data?</h2>
<p>Some datasets have missing values, which can cause problems when training machine learning models.</p>
<p>It is important to mention some techniques that can be used to handle missing data. You can also share your experience of how you handled missing data in your last data science project.</p>
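<p>Two of the simplest techniques, dropping incomplete rows and mean imputation, look like this in a NumPy sketch with a made-up age/income table where missing entries are encoded as <code>NaN</code>.</p>
<pre><code class="lang-py">import numpy as np

# Hypothetical data: columns are age and income, NaN marks a missing value
data = np.array([
    [25.0, 50_000.0],
    [np.nan, 62_000.0],
    [40.0, np.nan],
    [35.0, 58_000.0],
])

# 1. Count the missing values in each column
missing_per_column = np.isnan(data).sum(axis=0)

# 2. Option A: drop every row that contains a missing value
complete_rows = data[~np.isnan(data).any(axis=1)]

# 3. Option B: replace each missing value with its column mean (computed ignoring NaNs)
column_means = np.nanmean(data, axis=0)
imputed = np.where(np.isnan(data), column_means, data)

print(missing_per_column)  # [1 1]
print(complete_rows.shape) # (2, 2)
print(imputed)
</code></pre>
<p>Dropping rows throws data away, and mean imputation can distort a variable's distribution, so the right choice depends on how much data is missing and why. Median imputation, model-based imputation, or adding a "was missing" indicator feature are common alternatives.</p>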
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/the-penalty-of-missing-values-in-data-science-91b756f95a32/">The Penalty of Missing Values in Data Science</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/feature-engineering-and-feature-selection-for-beginners/">Feature Engineering and Feature Selection for Beginners</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=P_iMSYQnqac">Handling Missing Data Easily Explained</a></li>
</ul>
<h2 id="heading-what-is-the-benefit-of-dimensionality-reduction">What is the Benefit of Dimensionality Reduction?</h2>
<p>Dimensionality reduction is a technique to reduce the number of features or variables in the dataset. </p>
<p>Dimensionality reduction has several advantages you can explain when answering this question. You should also explain why and when you need to apply this technique.</p>
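<p>Principal component analysis (PCA) is one of the most common dimensionality reduction techniques. This minimal NumPy sketch computes PCA through the singular value decomposition on synthetic data where one feature is redundant, so three features can be compressed into two components with essentially no loss of information. The <code>pca</code> helper is hypothetical; scikit-learn's <code>PCA</code> class is the usual choice in practice.</p>
<pre><code class="lang-py">import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Three features, but the third is just the sum of the first two
X = np.column_stack([a, b, a + b])

def pca(X, n_components):
    X_centered = X - X.mean(axis=0)
    # Principal directions are the right singular vectors of the centered data
    _, singular_values, vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance_ratio = singular_values ** 2 / np.sum(singular_values ** 2)
    # Project the data onto the first n_components directions
    return X_centered @ vt[:n_components].T, explained_variance_ratio

X_reduced, ratio = pca(X, n_components=2)
print(X_reduced.shape)  # (200, 2)
print(ratio.round(4))   # the third component explains (almost) nothing
</code></pre>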
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/an-illustrative-introduction-to-fishers-linear-discriminant-9484efee15ac/">How to use dimensionality reduction</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/the-curse-of-dimensionality-how-we-can-save-big-data-from-itself-d9fa0f872335/">Escaping the curse of dimensionality</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=1eYJKD0TQ8U">Pros and Cons of Dimensionality Reduction</a> </li>
</ul>
<h2 id="heading-how-can-we-deal-with-outliers">How Can We Deal with Outliers?</h2>
<p>An outlier is a data point that deviates significantly from the rest. In this question, you can explain how one can identify outliers and different techniques used to deal with outliers.</p>
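<p>One common way to identify outliers is the interquartile range (IQR) rule, shown here in a small NumPy sketch with made-up measurements, followed by two ways to handle the flagged points. The 1.5 multiplier is a convention rather than a law, and z-scores are another option for roughly normal data.</p>
<pre><code class="lang-py">import numpy as np

values = np.array([12, 14, 13, 15, 14, 13, 16, 15, 14, 95], dtype=float)

# IQR rule: flag points more than 1.5 * IQR outside the middle 50% of the data
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = np.less(values, lower) | np.greater(values, upper)

print(values[is_outlier])  # [95.]

# Two common ways to deal with flagged points:
cleaned = values[~is_outlier]           # remove them
capped = np.clip(values, lower, upper)  # or cap (winsorize) them at the fences
</code></pre>
<p>Whether to remove, cap, or keep an outlier should depend on whether it is a data error or a genuine extreme value that the model needs to learn from.</p>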
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/what-is-an-outlier-definition-and-how-to-find-outliers-in-statistics/">What is an Outlier in Statistics?</a> </li>
<li><a target="_blank" href="https://www.kdnuggets.com/2017/01/3-methods-deal-outliers.html">Three Ways to Deal with Outliers</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=DDpym2j_ILY">How to Remove Outliers from a Dataset</a></li>
</ul>
<h2 id="heading-what-is-ensemble-learning">What is Ensemble Learning?</h2>
<p>In machine learning, ensemble learning is a process of using multiple algorithms to obtain better predictive performance than could be obtained from any one algorithm alone. </p>
<p>When answering this question, you can also describe the last time you implemented ensemble methods in a data science project.</p>
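<p>A toy illustration in plain Python of why combining models can help: three hypothetical classifiers are each 70% accurate, but because they make their mistakes on different samples, a simple majority vote gets every sample right. Real models' errors are usually correlated, so the gains in practice are smaller.</p>
<pre><code class="lang-py">from collections import Counter

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

# Predictions of three hypothetical models: each is wrong on 3 of the 10 samples,
# but on different samples
model_a = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]
model_b = [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]
model_c = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def accuracy(pred):
    return sum(p == t for p, t in zip(pred, y_true)) / len(y_true)

def majority_vote(*predictions):
    # For each sample, take the most common label across the models
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

ensemble = majority_vote(model_a, model_b, model_c)
print([accuracy(m) for m in (model_a, model_b, model_c)])  # [0.7, 0.7, 0.7]
print(accuracy(ensemble))  # 1.0
</code></pre>
<p>Bagging (as in random forests) and boosting are two standard ways of producing models whose errors are diverse enough for this kind of combination to pay off.</p>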
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.analyticsvidhya.com/blog/2015/08/introduction-ensemble-learning/">Introduction to Ensemble Learning</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=WtWxOhhZWX0">Ensemble Learning in Machine Learning</a></li>
</ul>
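<h2 id="heading-explain-how-machine-learning-is-different-from-deep-learning">Explain How Machine Learning is Different from Deep Learning</h2>
<p>The best way to explain the difference between machine learning and deep learning is to compare how they solve problems. For example, classical machine learning algorithms usually rely on features engineered by hand, while deep learning uses multi-layer neural networks that learn useful features directly from raw data such as images or text.</p>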
<p>You can go further by explaining some of the problems that can be solved by either machine learning or deep learning techniques.</p>
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/convolutional-neural-network-tutorial-for-beginners/">A beginner's guide to Machine Learning and Deep Learning</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/ai-vs-ml-whats-the-difference/">AI vs ML – What's the Difference between Artificial Intelligence and Machine Learning?</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-scikit-learn/">Machine Learning Crash Course</a> and <a target="_blank" href="https://www.freecodecamp.org/news/deep-learning-crash-course-learn-the-key-concepts-and-terms/">Deep Learning Crash Course</a></li>
</ul>
<h2 id="heading-what-are-the-differences-between-overfitting-and-underfitting">What are the Differences Between Overfitting and Underfitting?</h2>
<p>The best way to explain the difference between overfitting and underfitting is not just with a definition but through examples. </p>
<p>You can also share your personal experience of facing overfitting or underfitting problems in a data science project.</p>
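<p>A classic way to show both problems at once, sketched with NumPy on synthetic data (a noisy sine curve): fit polynomials of increasing degree and compare training and test error. A straight line is too simple and underfits, while a degree-9 polynomial has enough parameters to pass through all 10 training points, driving training error to about zero while typically generalizing worse, which is overfitting.</p>
<pre><code class="lang-py">import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    return np.sin(2 * np.pi * x)

# Ten noisy training points and ten noisy test points from the same curve
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_train = true_function(x_train) + rng.normal(0, 0.2, size=x_train.size)
y_test = true_function(x_test) + rng.normal(0, 0.2, size=x_test.size)

def mse(model, x, y):
    return np.mean((model(x) - y) ** 2)

results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit of the given degree
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, test MSE {results[degree][1]:.4f}")
</code></pre>
<p>The gap between training and test error is the signal to look for: both high means underfitting, low training error with much higher test error means overfitting.</p>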
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/handling-overfitting-in-deep-learning-models/">How to Handle Overfitting in Deep Learning Models</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-build-better-machine-learning-models/">How to Build Better Machine Learning Models</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/free-deep-learning-with-pytorch-live-course/">Deep Learning with PyTorch Course</a></li>
</ul>
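<h2 id="heading-what-is-regularisation-why-is-it-useful">What is Regularization? Why is it Useful?</h2>
<p>Regularization adds a penalty on the size of the model's weights to the loss function, which discourages overly complex models and helps reduce overfitting. When answering this question, you can go further by explaining the two common regularization techniques: L1 regularization, which penalizes the sum of the absolute values of the weights (as in lasso regression), and L2 regularization, which penalizes the sum of their squares (as in ridge regression).</p>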
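<p>To make the L2 case concrete, here is a NumPy sketch of ridge regression using its closed-form solution on synthetic data (the true weights and penalty strengths are arbitrary). Increasing the penalty strength <code>lam</code> shrinks the weights toward zero, trading a little training accuracy for a simpler model.</p>
<pre><code class="lang-py">import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + rng.normal(0, 0.5, size=50)

def ridge(X, y, lam):
    # L2-regularized least squares: minimize ||y - Xw||^2 + lam * ||w||^2
    # Closed-form solution: w = (X^T X + lam * I)^(-1) X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

for lam in (0.0, 10.0, 100.0):
    w = ridge(X, y, lam)
    print(lam, np.round(w, 2), "L2 norm:", round(float(np.linalg.norm(w)), 3))

# An L1 penalty (lam * sum(|w|)) has no closed form; it is usually solved iteratively
# and tends to push some weights exactly to zero, which also performs feature selection.
</code></pre>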
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-build-your-first-neural-network-to-predict-house-prices-with-keras-f8db83049159/">How to Build your First Neural Network</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/deep-learning-crash-course-learn-the-key-concepts-and-terms/">Deep Learning Crash Course</a></li>
</ul>
<h2 id="heading-what-is-selection-bias">What is Selection Bias?</h2>
<p>It is not enough just to define selection bias. If possible, you should explain different types of selection bias, their effects, and how to avoid them.</p>
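<p>A small simulation can illustrate the effect of selection bias. The population and the response rule below are invented for illustration: if only very satisfied customers answer a survey, the sample average badly overstates the population average, no matter how large the sample is, while a simple random sample stays close to the truth.</p>
<pre><code class="lang-py">import numpy as np

rng = np.random.default_rng(0)
# Hypothetical population: customer satisfaction scores from 1 to 10
population = rng.integers(1, 11, size=100_000)

# Unbiased: a simple random sample from the whole population
random_sample = rng.choice(population, size=1_000, replace=False)

# Biased (self-selection): only customers who scored 8 or higher respond
responders = population[np.greater_equal(population, 8)]
biased_sample = rng.choice(responders, size=1_000, replace=False)

print("population mean:", population.mean())
print("random sample mean:", random_sample.mean())
print("biased sample mean:", biased_sample.mean())
</code></pre>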
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://imotions.com/blog/selection-bias/">What is Selection Bias?</a></li>
<li><a target="_blank" href="https://academy4sc.org/video/selection-bias-dont-forget-about-me/">Selection Bias – Don't forget about me!</a></li>
</ul>
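<h2 id="heading-can-you-explain-the-difference-between-a-validation-set-and-a-test-set">Can You Explain the Difference Between a Validation Set and a Test Set?</h2>
<p>In this question, start by explaining the difference: the validation set is used during development to tune hyperparameters and compare models, while the test set is held back until the end to estimate how the final model performs on unseen data. Then you can explain the advantage of having both a validation set and a test set in a data science project.</p>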
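<p>A minimal NumPy sketch of a three-way split (the 60/20/20 proportions and the helper name are just common illustrative choices): the index sets are disjoint, so no sample used to choose a model can leak into the final evaluation.</p>
<pre><code class="lang-py">import numpy as np

def train_val_test_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    # Shuffle the indices once, then carve out disjoint test, validation, and training sets
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = train_val_test_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200

# Typical workflow:
#   fit candidate models on train_idx,
#   compare them and tune hyperparameters on val_idx,
#   report the chosen model's score on test_idx once, at the very end.
</code></pre>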
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/key-machine-learning-concepts-explained-dataset-splitting-and-random-forest/">Key Machine Learning Concepts Explained</a></li>
<li><a target="_blank" href="https://machinelearningmastery.com/difference-test-validation-datasets/">Difference between Test Sets and Validation Sets</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/what-to-do-when-your-training-and-testing-data-come-from-different-distributions-d89674c6ecd8/">What to do when your training and testing data come from different distributions</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=pGlQLMPI46g">Machine Learning – Validation vs Testing</a></li>
</ul>
<h2 id="heading-what-is-the-difference-between-regression-and-classification-ml-techniques">What is the Difference Between Regression and Classification ML Techniques?</h2>
<p>Regression and classification are both supervised learning tasks, and the main difference is their output: regression predicts a continuous value (such as a price), while classification predicts a discrete class label (such as spam or not spam). When you answer this question, you can mention a few algorithms that can be used to solve regression or classification problems. Also, try to explain how each type of model is evaluated.</p>
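<p>The difference also shows up in how the two kinds of models are evaluated. A NumPy sketch with made-up predictions: regression errors are measured on a continuous scale (here mean squared error and mean absolute error), while classification is scored by how many labels are right (here accuracy).</p>
<pre><code class="lang-py">import numpy as np

# Regression: the target is a continuous number (e.g. a house price in $1000s)
price_true = np.array([250.0, 310.0, 180.0, 420.0])
price_pred = np.array([240.0, 330.0, 190.0, 400.0])
mse = np.mean((price_true - price_pred) ** 2)
mae = np.mean(np.abs(price_true - price_pred))

# Classification: the target is a discrete label (e.g. spam = 1, not spam = 0)
label_true = np.array([1, 0, 0, 1, 1])
label_pred = np.array([1, 0, 1, 1, 0])
accuracy = np.mean(label_true == label_pred)

print(mse, mae)   # 250.0 15.0
print(accuracy)   # 0.6
</code></pre>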
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-build-and-train-linear-and-logistic-regression-ml-models-in-python/">How to Build and Train Linear and Logistic Regression ML Models</a></li>
<li><a target="_blank" href="https://www.javatpoint.com/regression-vs-classification-in-machine-learning">Regression vs Classification in Machine Learning</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/machine-learning-basics-for-developers/">Machine Learning Basics for Developers</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=TJveOYsK6MY">Classification and Regression in Machine Learning</a></li>
</ul>
<h2 id="heading-what-are-artificial-neural-networks">What are Artificial Neural Networks?</h2>
<p>In this question, don't just define artificial neural networks; also explain their advantages and where you can use them.</p>
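<p>At its core, a neural network is a stack of layers, each computing weighted sums followed by an activation function. A minimal NumPy sketch of a forward pass: a tiny two-layer network with hand-picked weights solves XOR, a problem no single linear model can solve. A real network would learn its weights, typically with backpropagation and a smooth activation such as ReLU or sigmoid; the weights and step activation here are illustrative assumptions.</p>
<pre><code class="lang-py">import numpy as np

def step(z):
    # Threshold activation: 1 where z is positive, 0 otherwise
    return np.heaviside(z, 0.0)

# Hand-picked (not learned) weights for a 2-2-1 network that computes XOR
W1 = np.array([[1.0, 1.0],   # hidden unit 1 acts like OR
               [1.0, 1.0]])  # hidden unit 2 acts like AND
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])   # output: OR and not AND
b2 = -0.5

def forward(X):
    hidden = step(X @ W1.T + b1)   # layer 1: weighted sums + activation
    return step(hidden @ W2 + b2)  # layer 2: weighted sum + activation

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(forward(X))  # [0. 1. 1. 0.]
</code></pre>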
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://hackernoon.com/overview-of-artificial-neural-networks-and-its-applications-2525c1addff7">Overview of Artificial Neural Networks and their Applications</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/deep-learning-neural-networks-explained-in-plain-english/">Deep Learning Neural Networks Explained in Plain English</a></li>
</ul>
<h2 id="heading-what-tools-and-devices-do-you-plan-to-use-in-your-role-as-a-data-scientist">What Tools and Devices Do You Plan to Use in Your Role as a Data Scientist?</h2>
<p>This question is straightforward, but you should mention tools you have used before or plan to use in future projects.</p>
<p>You can also share your experience of how various tools help you implement data science projects successfully.</p>
<p>Keep in mind that you will use different tools for different projects. For example, some tools are suited to an NLP project and others to a time-series project.</p>
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://sdacademy.dev/13-tools-every-data-scientist-needs-to-know/">13 Tools Every Data Scientist Needs to Know</a></li>
</ul>
<h2 id="heading-what-is-natural-language-processing-state-some-real-life-examples-of-nlp">What is Natural Language Processing? State Some Real-Life Examples of NLP.</h2>
<p>You should define natural language processing in simple terms and explain how it can be used to solve business problems. Then share some real-life examples. If possible, you can also share NLP projects you have worked on, either on your own or in collaboration with others.</p>
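<p>Many NLP pipelines start by turning text into numbers. A minimal plain-Python sketch of one classic representation, bag of words, on three made-up product reviews: each document becomes a vector of word counts over a shared vocabulary, which a classifier could then use for a real-life task such as sentiment analysis. The helper names are hypothetical.</p>
<pre><code class="lang-py">import re
from collections import Counter

reviews = [
    "The delivery was fast and the product is great",
    "Terrible product, the screen broke after a day",
    "Great value, great battery life",
]

def tokenize(text):
    # Lowercase the text and keep only alphabetic words
    return re.findall(r"[a-z]+", text.lower())

# Shared vocabulary across all documents
vocabulary = sorted({word for review in reviews for word in tokenize(review)})

def bag_of_words(text, vocabulary):
    # Count how often each vocabulary word appears in the text
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

vectors = [bag_of_words(review, vocabulary) for review in reviews]
print(len(vocabulary), "words in the vocabulary")
print(Counter(tokenize(reviews[2])).most_common(1))  # [('great', 2)]
</code></pre>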
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/what-is-natural-language-processing-an-nlp-definition-and-tutorial-for-beginners/">What is Natural Language Processing? A tutorial for beginners</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-natural-language-processing-no-experience-required/">Learn Natural Language Processing with Python and TensorFlow</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/natural-language-processing-basics-for-developers/">What Every Developer Needs to Know about NLP</a></li>
<li><a target="_blank" href="https://intellipaat.com/blog/applications-of-nlp/">Applications of NLP</a></li>
</ul>
<h2 id="heading-what-is-normalisation-difference-between-normalisation-and-standardization">What is Normalization? What is the Difference Between Normalization and Standardization?</h2>
<p>Normalization and standardization are techniques used to pre-process the data before applying machine learning algorithms. </p>
<p>The purpose of the question is to see whether you can explain the differences between these two techniques and the conditions under which you should apply one over the other.</p>
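<p>The two formulas side by side, in a NumPy sketch with made-up ages: min-max normalization maps values into the range [0, 1] and is sensitive to extreme values, while standardization (z-scores) centers the data at mean 0 with standard deviation 1.</p>
<pre><code class="lang-py">import numpy as np

ages = np.array([18.0, 25.0, 32.0, 47.0, 63.0])

# Normalization (min-max scaling): rescale values into the range [0, 1]
normalized = (ages - ages.min()) / (ages.max() - ages.min())

# Standardization (z-score): shift to mean 0 and scale to standard deviation 1
standardized = (ages - ages.mean()) / ages.std()

print(normalized)
print(standardized.mean(), standardized.std())
</code></pre>
<p>In practice, compute the scaling statistics (min and max, or mean and standard deviation) on the training set only, and reuse them to transform the validation and test data so no information leaks from those sets.</p>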
<p><strong>Here are resources to help you get started crafting your response:</strong></p>
<ul>
<li><a target="_blank" href="https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/">The Difference Between Normalization and Standardization</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/all-you-need-to-know-about-text-preprocessing-for-nlp-and-machine-learning-bc1c5765ff67/">Text Preprocessing for NLP and Machine Learning</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/feature-engineering-and-feature-selection-for-beginners/">Feature Engineering and Feature Selection for Beginners</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=mnKm3YP56PY">Standardization vs Normalization – Feature Scaling</a> </li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/https-medium-com-hadrienj-preprocessing-for-deep-learning-9e2b9c75165c/">Preprocessing for Deep Learning</a></li>
</ul>
<h2 id="heading-final-thoughts-on-data-science-interview-questions">Final Thoughts on Data Science Interview Questions</h2>
<p>Reviewing these common data science interview questions will help boost your confidence during the interview.</p>
<p>Don't expect the interviewer to ask you all the questions mentioned in this article. But most of the questions will come from these same topics.</p>
<p>For example, instead of asking "<strong>Explain the difference between supervised and unsupervised learning</strong>", the interviewer might ask you to "<strong>Explain some supervised learning algorithms and how they learn from the data</strong>".</p>
<p>If you are interested in learning and reading more data science interview questions, take your time and read through <a target="_blank" href="https://www.springboard.com/blog/data-science/data-science-interview-questions/">these</a> <a target="_blank" href="https://huyenchip.com/ml-interviews-book/">additional</a> <a target="_blank" href="https://hackernoon.com/160-data-science-interview-questions-415s3y2a">resources</a> for inspiration.</p>
<p>And don't forget to practice your coding skills because some questions during the interview require you to code the solution.</p>
<p>I hope these data science interview questions will help you prepare for your interview and I wish you the best of luck in your data science career.</p>
<p>If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post!</p>
<p>You can also find me on Twitter <a target="_blank" href="https://twitter.com/Davis_McDavid">@Davis_McDavid</a>.</p>
<p>And you can read more articles like this <a target="_blank" href="https://hackernoon.com/u/davisdavid">here</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build Better Machine Learning Models ]]>
                </title>
                <description>
                    <![CDATA[ By Rishit Dagli Hello developers 👋. If you have built Deep Neural Networks before, you might know that it can involve a lot of experimentation.  In this article, I will share with you some useful tips and guidelines that you can use to better build ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-better-machine-learning-models/</link>
                <guid isPermaLink="false">66d460c9d14641365a050965</guid>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 23 Apr 2021 16:22:43 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/04/pexels-pixabay-373543.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Rishit Dagli</p>
<p>Hello developers 👋. If you have built Deep Neural Networks before, you might know that it can involve a lot of experimentation. </p>
<p>In this article, I will share with you some useful tips and guidelines that you can use to build better deep learning models. These tricks should make it a lot easier for you to develop a good network.</p>
<p>You can pick and choose which tips you use, as some will be more helpful than others for the projects you are working on. Not everything mentioned in this article will straight up improve your models’ performance.</p>
<h2 id="heading-a-high-level-approach-to-hyperparameter-tuning">A high-level approach to Hyperparameter tuning🕹️</h2>
<p>One of the more painful things about training Deep Neural Networks is the large number of hyperparameters you have to deal with. </p>
<p>These could be your learning rate <strong>α</strong>, the discounting factor <strong>ρ</strong>, and epsilon <strong>ε</strong> if you are using the RMSprop optimizer (<a target="_blank" href="https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf">Hinton et al.</a>) or the exponential decay rates <strong>β₁</strong> and <strong>β₂</strong> if you are using the Adam optimizer (<a target="_blank" href="https://arxiv.org/abs/1412.6980">Kingma et al.</a>). </p>
<p>You also need to choose the number of layers in the network and the number of hidden units in each layer. You might also be using learning rate schedulers that need configuring, and a lot more 😩! We definitely need ways to better organize our hyperparameter tuning process.</p>
<p>A common algorithm I tend to use to organize my hyperparameter search process is Random Search. Though there are other algorithms that might be better, I usually end up using it anyway. </p>
<p>Let’s say for the purpose of this example you want to tune two hyperparameters and you suspect that the optimal values for both would be somewhere between one and five. </p>
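<p>The idea here is that instead of systematically trying a grid of twenty-five combinations like (1, 1), (1, 2), and so on, it is more effective to select twenty-five points at random. Often one hyperparameter matters much more than the other, and a 5 × 5 grid only ever tests five distinct values of each hyperparameter, while twenty-five random points test twenty-five distinct values of each.</p>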
<p><img src="https://lh3.googleusercontent.com/MLzfMgeWASsgXEsq2XUGxo8QFl99R-4TA_--azr_k7F9rkEhh31esm47zemiPDTIPrjNWQjmlpEXtstqgcopQnEgF0R2CsNDPTuwaPq-_54IgaGp0Dkjd7TCMe3oWe-gjiVnrc2Y" alt="Image" width="1360" height="686" loading="lazy">
<em>Based on Lecture Notes of <a target="_blank" href="https://www.andrewng.org/">Andrew Ng</a></em></p>
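<p>A few lines of plain Python make the comparison concrete. Both strategies get the same budget of 25 trials over the range from one to five (the seed and ranges are arbitrary), and we count how many distinct values of the first hyperparameter each one actually tries.</p>
<pre><code class="lang-py">import itertools
import random

random.seed(0)

# Grid search: every combination of 5 values per hyperparameter (5 x 5 = 25 trials)
grid_trials = list(itertools.product([1, 2, 3, 4, 5], repeat=2))

# Random search: the same budget of 25 trials, sampled uniformly from [1, 5] x [1, 5]
random_trials = [(random.uniform(1, 5), random.uniform(1, 5)) for _ in range(25)]

# Distinct values of the first hyperparameter each strategy actually tries
print(len({first for first, _ in grid_trials}))    # 5
print(len({first for first, _ in random_trials}))  # 25
</code></pre>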
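<p>Here is a simple example using TensorFlow and Keras Tuner, where I use Random Search on the Fashion MNIST dataset to tune the learning rate and the number of units in the first Dense layer:</p>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> keras_tuner <span class="hljs-keyword">as</span> kt
<span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf

<span class="hljs-comment"># Load Fashion MNIST and scale the pixel values to the range [0, 1]</span>
(img_train, label_train), (img_test, label_test) = tf.keras.datasets.fashion_mnist.load_data()
img_train = img_train.astype(<span class="hljs-string">'float32'</span>) / <span class="hljs-number">255.0</span>
img_test = img_test.astype(<span class="hljs-string">'float32'</span>) / <span class="hljs-number">255.0</span>
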
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">model_builder</span>(<span class="hljs-params">hp</span>):</span>
  model = tf.keras.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=(<span class="hljs-number">28</span>, <span class="hljs-number">28</span>)))

  <span class="hljs-comment"># Tune the number of units in the first Dense layer</span>
  <span class="hljs-comment"># Choose an optimal value between 32-512</span>
  hp_units = hp.Int(<span class="hljs-string">'units'</span>, min_value = <span class="hljs-number">32</span>, max_value = <span class="hljs-number">512</span>, step = <span class="hljs-number">32</span>)
  model.add(tf.keras.layers.Dense(units = hp_units, activation = <span class="hljs-string">'relu'</span>))
  model.add(tf.keras.layers.Dense(<span class="hljs-number">10</span>))

  <span class="hljs-comment"># Tune the learning rate for the optimizer </span>
  <span class="hljs-comment"># Choose an optimal value from 0.01, 0.001, or 0.0001</span>
  hp_learning_rate = hp.Choice(<span class="hljs-string">'learning_rate'</span>, values = [<span class="hljs-number">1e-2</span>, <span class="hljs-number">1e-3</span>, <span class="hljs-number">1e-4</span>]) 

  model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = hp_learning_rate),
                loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits = <span class="hljs-literal">True</span>), 
                metrics = [<span class="hljs-string">'accuracy'</span>])

  <span class="hljs-keyword">return</span> model

tuner = kt.RandomSearch(model_builder,
                        objective = <span class="hljs-string">'val_accuracy'</span>, 
                        max_trials = <span class="hljs-number">10</span>,
                        directory = <span class="hljs-string">'random_search_starter'</span>,
                        project_name = <span class="hljs-string">'intro_to_kt'</span>) 

tuner.search(img_train, label_train, epochs = <span class="hljs-number">10</span>, validation_data = (img_test, label_test))

<span class="hljs-comment"># Which was the best model?</span>
best_model = tuner.get_best_models(<span class="hljs-number">1</span>)[<span class="hljs-number">0</span>]

<span class="hljs-comment"># What were the best hyperparameters?</span>
best_hyperparameters = tuner.get_best_hyperparameters(<span class="hljs-number">1</span>)[<span class="hljs-number">0</span>]
</code></pre>
<p>Here I suspect that an optimal number of units in the first Dense layer would be somewhere between 32 and 512, and my learning rate would be one of 1e-2, 1e-3, or 1e-4. </p>
<p>Consequently, as shown in this example, I set the minimum value for the number of units to 32 and the maximum value to 512, with a step size of 32. Then, instead of hardcoding a value for the number of units, I specify a range to try out.</p>
<pre><code class="lang-py">hp_units = hp.Int(<span class="hljs-string">'units'</span>, min_value = <span class="hljs-number">32</span>, max_value = <span class="hljs-number">512</span>, step = <span class="hljs-number">32</span>)
model.add(tf.keras.layers.Dense(units = hp_units, activation = <span class="hljs-string">'relu'</span>))
</code></pre>
<p>We do the same for our learning rate, but our learning rate is simply one of 1e-2, 1e-3, or 1e-4 rather than a range.</p>
<pre><code class="lang-py">hp_learning_rate = hp.Choice(<span class="hljs-string">'learning_rate'</span>, values = [<span class="hljs-number">1e-2</span>, <span class="hljs-number">1e-3</span>, <span class="hljs-number">1e-4</span>])
optimizer = tf.keras.optimizers.Adam(learning_rate = hp_learning_rate)
</code></pre>
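<p>Finally, we perform Random Search. Setting <code>objective = 'val_accuracy'</code> tells the tuner that, among all the models it builds, the one with the highest validation accuracy is the best model; in other words, maximizing validation accuracy is the goal. <code>max_trials = 10</code> limits the search to ten hyperparameter combinations.</p>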
<pre><code class="lang-py">tuner = kt.RandomSearch(model_builder,
                        objective = <span class="hljs-string">'val_accuracy'</span>, 
                        max_trials = <span class="hljs-number">10</span>,
                        directory = <span class="hljs-string">'random_search_starter'</span>,
                        project_name = <span class="hljs-string">'intro_to_kt'</span>) 

tuner.search(img_train, label_train, epochs = <span class="hljs-number">10</span>, validation_data = (img_test, label_test))
</code></pre>
<p>After doing so, I also want to retrieve the best model and the best hyperparameter choice. Note, though, that using <code>get_best_models</code> is usually considered a shortcut.</p>
<p>To get the best performance, you should build a new model with the best hyperparameters and retrain it on the full dataset.</p>
<pre><code class="lang-py"><span class="hljs-comment"># Which was the best model?</span>
best_model = tuner.get_best_models(<span class="hljs-number">1</span>)[<span class="hljs-number">0</span>]

<span class="hljs-comment"># What were the best hyperparameters?</span>
best_hyperparameters = tuner.get_best_hyperparameters(<span class="hljs-number">1</span>)[<span class="hljs-number">0</span>]
</code></pre>
<p>I won't go into this code in detail here, but you can read more about it in <a target="_blank" href="https://towardsdatascience.com/the-art-of-hyperparameter-tuning-in-deep-neural-nets-by-example-685cb5429a38">this article</a> I wrote a while back.</p>
<h2 id="heading-use-mixed-precision-training-for-large-networks">Use Mixed Precision Training for large networks🎨</h2>
<p>In general, the bigger your neural network, the more accurate your results. But as model sizes grow, so do the memory and compute requirements for training them. </p>
<p>The idea behind Mixed Precision Training (NVIDIA, <a target="_blank" href="https://arxiv.org/abs/1710.03740">Micikevicius et al.</a>) is to train deep neural networks using half-precision (float16) floating-point numbers for most of the computation, while keeping a single-precision (float32) master copy of the weights. This lets you train large neural networks a lot faster with no, or only a negligible, decrease in the performance of the networks. </p>
<p>That said, I'd point out that this technique is really only worth using for large models, with roughly 100 million parameters or more.</p>
<p>While mixed precision will run on most hardware, it will only speed up models on recent NVIDIA GPUs (for example, the Tesla V100 and Tesla T4) and Cloud TPUs. </p>
<p>I want to give you an idea of the performance gains when using Mixed Precision. When I trained a ResNet model on my GCP Notebook instance (with a Tesla V100), training was almost three times faster, and on a Cloud TPU instance almost 1.5 times faster, with almost no difference in accuracy. The code to measure these speed-ups was taken from <a target="_blank" href="https://www.tensorflow.org/guide/mixed_precision">this example</a>. </p>
<p>To further increase your training throughput, you could also consider using a larger batch size. Since float16 tensors take half the memory of float32 tensors, you are less likely to run out of memory when you do.</p>
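<p>To put rough numbers on the memory savings: a float16 value takes 2 bytes versus 4 bytes for float32. The sketch below checks those sizes with Python's <code>struct</code> module (<code>'e'</code> is its half-precision format) and computes the memory for one layer's activations, assuming a hypothetical batch of 256 examples passing through one of the 4096-unit Dense layers used in the TensorFlow example in this section.</p>

```python
import struct

FLOAT32_BYTES = struct.calcsize('f')  # IEEE 754 single precision: 4 bytes
FLOAT16_BYTES = struct.calcsize('e')  # IEEE 754 half precision: 2 bytes


def activation_mib(batch_size: int, units: int, bytes_per_value: int) -> float:
    """Memory in MiB needed to store one layer's activations for a batch."""
    return batch_size * units * bytes_per_value / 2**20


fp32 = activation_mib(256, 4096, FLOAT32_BYTES)  # 4.0 MiB
fp16 = activation_mib(256, 4096, FLOAT16_BYTES)  # 2.0 MiB
```

<p>Halving these tensors is what frees room for a larger batch. The real gain is smaller than 2x, though, since under mixed precision the weights stay in float32 and other overheads remain.</p>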
<p>It is also rather easy to implement Mixed Precision with TensorFlow: the <a target="_blank" href="https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision">tf.keras.mixed_precision</a> module lets you set a data type policy (to use float16) and also apply loss scaling to prevent small gradient values from underflowing to zero. </p>
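<p>To see why loss scaling matters, here's a small pure-Python sketch (not TensorFlow's implementation) that rounds values to float16 using <code>struct</code>'s half-precision <code>'e'</code> format. A tiny gradient such as 1e-8 is below the smallest positive float16 value (about 6e-8), so it rounds to zero. If the loss, and therefore every gradient, is first multiplied by a scale factor, the value survives in float16 and can be divided back out in full precision. The fixed factor of 1024 is an arbitrary choice for illustration; as far as I know, Keras adjusts the scale dynamically by default.</p>

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]


grad = 1e-8      # a tiny gradient value
scale = 1024.0   # illustrative, fixed loss-scale factor

unscaled = to_float16(grad)          # underflows to 0.0 in float16
scaled = to_float16(grad * scale)    # about 1.025e-5, representable in float16
recovered = scaled / scale           # unscaled in full precision: about 1.001e-8
```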
<p>Here is a minimalistic example of using Mixed Precision Training on a network:</p>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf

policy = tf.keras.mixed_precision.Policy(<span class="hljs-string">'mixed_float16'</span>)
tf.keras.mixed_precision.set_global_policy(policy)

inputs = tf.keras.Input(shape=(<span class="hljs-number">784</span>,))
x = tf.keras.layers.Dense(<span class="hljs-number">4096</span>, activation=<span class="hljs-string">'relu'</span>)(inputs)
x = tf.keras.layers.Dense(<span class="hljs-number">4096</span>, activation=<span class="hljs-string">'relu'</span>)(x)
x = tf.keras.layers.Dense(<span class="hljs-number">10</span>)(x)
<span class="hljs-comment"># Keep the output layer in float32 to avoid numeric issues</span>
outputs = tf.keras.layers.Activation(<span class="hljs-string">'softmax'</span>, dtype=<span class="hljs-string">'float32'</span>)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

model.compile(...)
model.fit(...)
</code></pre>
<p>In this example, we first set the global dtype policy to <code>mixed_float16</code>, which means our layers will automatically do their computations in float16 while keeping their variables (weights) in float32. </p>
<p>We then build a model, but override the data type of the final (output) softmax layer to be float32 to prevent numeric issues. Ideally, your output layers should be float32.</p>
<p>Note: I've built a model with this many units so that we can see a difference in training time with Mixed Precision Training, since it works best for large models.</p>
<p>If you are looking for more inspiration to use Mixed Precision Training, here is an image from Google Cloud demonstrating the speedup for multiple models on a TPU:</p>
<p><img src="https://lh6.googleusercontent.com/jDx-lq4Ll6Ihre2G5_JIYRDr1ogkMUCHiNcQ8g_WXz3cpGeylmICsQtQkV5JE9wcwZswzImi57AfNzWPEqBuLWfabl405AbH4HsZH6eOKs8kEF_zjZRKkQ6qQjLGk-JSca3rCGU7" alt="Image" width="991" height="612" loading="lazy">
<em>Speedups on a Cloud TPU</em></p>
<h2 id="heading-use-grad-check-for-backpropagation">Use Grad Check for backpropagation ✔️</h2>
<p>On multiple occasions, I have had to implement a neural network from scratch. Backpropagation is typically the part that's most prone to mistakes, and it is also the hardest to debug. </p>
<p>With incorrect backpropagation, your model could still learn something that looks reasonable, which makes the bug even harder to spot. So, how cool would it be if we had a way to debug our neural nets easily?</p>
<p>I often use Gradient Check when implementing backpropagation to help me debug it. The idea is to approximate the gradients numerically. If the approximation is close to the gradients calculated by the backpropagation algorithm, you can be more confident that backpropagation was implemented correctly.</p>
<p>You can use the following expression to compute a vector which we will call <code>dθ[approx]</code>:</p>
<p><img src="https://lh5.googleusercontent.com/BMyeu-1N1INBGjyDzdc_MNpRVToTt6lmidWN5CYualOQ67wvF_rki1axuSeCGkWNxr4dHnp1kA0zP6E3HmUw3SeofkUHhwsElB0kEvtst2220ycNfQCZGoumHnNQzWb8r_mST8Ep" alt="Image" width="667" height="147" loading="lazy">
<em>Calculate approx gradients</em></p>
<p>In case you are looking for the reasoning behind this, you can find more about it in <a target="_blank" href="https://towardsdatascience.com/debugging-your-neural-nets-and-checking-your-gradients-f4d7f55da167">this article</a> I wrote. </p>
<p>So now we have two vectors, <code>dθ[approx]</code> and <code>dθ</code> (calculated by backprop), which should be almost equal to each other. You can compute the Euclidean distance between these two vectors and use this reference table to help you debug your nets:</p>
<p><img src="https://lh5.googleusercontent.com/R-vrp1hq3psZmldrPYkupqofV7KOSWi0URLihhHAN5etHlR8U2kHdGE1XEAu-A9E_4w2Q8OmLXBZoYyyxJzIYwxG50dDPUSGL2gYw8U_lKCQtHXauUIUMa62H0mYp4eUO1LiJNnP" alt="Image" width="789" height="291" loading="lazy">
<em>Reference table</em></p>
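<p>Here is a minimal pure-Python sketch of the whole check, assuming a toy cost function J(θ) = Σθᵢ² + θ₀θ₁ whose gradient is easy to write by hand. <code>numerical_gradient</code> applies the two-sided difference (J(θ + ε·eᵢ) − J(θ − ε·eᵢ)) / 2ε to each parameter. For the comparison I use a normalized Euclidean distance, ‖dθ[approx] − dθ‖₂ / (‖dθ[approx]‖₂ + ‖dθ‖₂), a common variant that doesn't depend on the scale of the gradients; whether it lines up exactly with the thresholds in the table above is an assumption.</p>

```python
import math
from typing import Callable, List

Vector = List[float]


def numerical_gradient(cost: Callable[[Vector], float], theta: Vector,
                       eps: float = 1e-7) -> Vector:
    """Approximate dJ/dtheta_i with the two-sided difference for every i."""
    approx = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += eps
        minus[i] -= eps
        approx.append((cost(plus) - cost(minus)) / (2 * eps))
    return approx


def relative_difference(a: Vector, b: Vector) -> float:
    """Normalized Euclidean distance ||a - b|| / (||a|| + ||b||)."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norms = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return diff / norms


# Toy cost J(theta) = sum(theta_i^2) + theta_0 * theta_1
def cost(theta: Vector) -> float:
    return sum(t * t for t in theta) + theta[0] * theta[1]


def correct_grad(theta: Vector) -> Vector:
    g = [2 * t for t in theta]
    g[0] += theta[1]  # d(theta_0 * theta_1) / d theta_0
    g[1] += theta[0]  # d(theta_0 * theta_1) / d theta_1
    return g


def buggy_grad(theta: Vector) -> Vector:
    return [2 * t for t in theta]  # "forgets" the cross term


theta = [1.0, -2.0, 3.0]
approx = numerical_gradient(cost, theta)
good = relative_difference(approx, correct_grad(theta))  # tiny, far below 1e-6
bad = relative_difference(approx, buggy_grad(theta))     # about 0.16
```

<p>The buggy gradient stands out by several orders of magnitude. In a real network you would flatten all weights and biases into <code>theta</code>, and since the check is slow, you would only run it occasionally rather than on every training step.</p>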
<h2 id="heading-cache-your-datasets">Cache Your Datasets 💾</h2>
<p>Caching datasets is a simple idea, but not one I have seen used much. The idea is to go over the entire dataset once and cache it, either in a file or in memory (if it is a small dataset). </p>
<p>This should save you from performing expensive CPU operations like opening files and reading data during every single epoch. </p>
<p>This also means that your first epoch will take comparatively more time📉, since that is when all the operations like opening files and reading data happen before the results are cached. But the subsequent epochs should be a lot faster since they use the cached data.</p>
<p>This seems like a very simple idea to implement, right? Here is an example with TensorFlow showing how easily you can cache datasets, along with the speedup 🚀 it gives. You can find the complete code for the example below in <a target="_blank" href="https://gist.github.com/Rishit-dagli/5d06c69c69e990f9e15249e15002bb07">this gist</a> of mine.  </p>
<p><img src="https://lh5.googleusercontent.com/uMIS-r7tn2VD85nNQ1mNTyqaDwcTUeyV2mY47q1UkJvEEGoemFcuYPVgcyVDyG3E2a0iz9rrdimRGG9m9mOOEVZai_UiS1IRmiuvWYwOrmxHNuh711H0UVYum3o4u-8sWqcHrmvt" alt="Image" width="1600" height="699" loading="lazy">
<em>A simple example of caching datasets and the speedup with it</em></p>
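<p>In TensorFlow, <code>tf.data.Dataset.cache()</code> gives you this behavior (pass it a filename to cache to a file instead of memory). The sketch below is a minimal pure-Python illustration of the same idea, not how tf.data implements it: <code>CachedDataset</code> and <code>fake_load</code> are hypothetical names, and the "files" are just strings.</p>

```python
from typing import Callable, Iterator, List, Optional


class CachedDataset:
    """Loads every item once, then serves later epochs from an in-memory cache."""

    def __init__(self, paths: List[str], load_fn: Callable[[str], bytes]) -> None:
        self.paths = paths
        self.load_fn = load_fn
        self._cache: Optional[List[bytes]] = None

    def __iter__(self) -> Iterator[bytes]:
        if self._cache is not None:
            yield from self._cache  # later epochs: no file opening or reading
            return
        items = []
        for path in self.paths:
            item = self.load_fn(path)  # first epoch: the expensive part
            items.append(item)
            yield item
        self._cache = items  # only cache after a complete pass


load_calls: List[str] = []


def fake_load(path: str) -> bytes:
    """Hypothetical stand-in for opening and decoding an image file."""
    load_calls.append(path)
    return path.encode()


dataset = CachedDataset(['a.png', 'b.png', 'c.png'], fake_load)
epochs = [list(dataset) for _ in range(3)]
# fake_load ran 3 times in total (once per file), not 9 times
```

<p>The cache is only marked complete after a full pass, so an epoch that is interrupted part-way through simply loads everything again next time.</p>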
<h2 id="heading-how-to-tackle-overfitting">How to tackle overfitting ⭐</h2>
<p>When you're working with neural networks, overfitting and underfitting might be two of the most common problems you face. This section talks about some common approaches that I use when tackling these problems. </p>
<p>You might know this already, but high bias will cause the model to miss the relationship between features and labels (underfitting), and high variance will cause the model to capture the noise and overfit to the training data.</p>
<p>I believe the most effective way to solve overfitting is to get more data, though you could also augment your data. A benefit of deep neural networks is that their performance generally keeps improving as they are fed more and more data. </p>
<p>But in a lot of situations, it might be too expensive to get more data or it simply might not be possible to do so. In that case, let's talk about a couple of other methods you could use to tackle overfitting.</p>
<p>Apart from getting more data or augmenting your data, you could also tackle overfitting either by changing the architecture of the network or by applying some modifications to the network’s weights. Let's look at these two methods.</p>
<h3 id="heading-changing-the-model-architecture">Changing the Model Architecture</h3>
<p>A simple way to change the architecture such that it doesn’t overfit would be to use Random Search to stumble upon a good architecture. Or you could try pruning nodes from your model, essentially lowering the capacity of your model. </p>
<p>We already talked about Random Search, but in case you want to see an example of pruning you could take a look at the <a target="_blank" href="https://www.tensorflow.org/model_optimization/guide/pruning">TensorFlow Model Optimization Pruning Guide</a>.</p>
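<p>To give a feel for what pruning does, here's a minimal sketch of magnitude-based pruning, the approach behind TensorFlow's <code>tfmot.sparsity.keras.prune_low_magnitude</code>: the weights with the smallest absolute values are set to zero until a target sparsity is reached. Note that this prunes individual weights rather than whole nodes, the function name and the plain-list weight matrix are my own assumptions for illustration, and real pruning is typically applied gradually during training following a schedule.</p>

```python
from typing import List

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Return a copy of `weights` with the `sparsity` fraction of smallest-|w| entries set to 0."""
    entries = sorted(
        (abs(w), r, c) for r, row in enumerate(weights) for c, w in enumerate(row)
    )
    num_to_prune = int(len(entries) * sparsity)
    pruned = [list(row) for row in weights]  # leave the input untouched
    for _, r, c in entries[:num_to_prune]:
        pruned[r][c] = 0.0
    return pruned


layer_weights = [[0.1, -0.9],
                 [0.5, -0.05]]
sparse = magnitude_prune(layer_weights, sparsity=0.5)
# [[0.0, -0.9], [0.5, 0.0]]: the two smallest-magnitude weights are removed
```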
<h3 id="heading-modifying-network-weights">Modifying Network Weights</h3>
<p>In this section we will see some methods I commonly use to prevent overfitting by modifying a network's weights.</p>
<h4 id="heading-weight-regularization">Weight Regularization</h4>
<p>Coming back to what we discussed, "simpler models are less likely to overfit than complex ones". We try to keep the complexity of the network in check by forcing its weights to take only small values. </p>
<p>To do so, we add a term to our loss function that penalizes the model for having large weights. L1 and L2 regularization are often used, the difference being:</p>
<ul>
<li>L1: the penalty added is ∝ |weight coefficients|</li>
<li>L2: the penalty added is ∝ |weight coefficients|<strong>²</strong></li>
</ul>
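<p>where ∝ means "proportional to" and |x| represents the absolute value of x. </p>
<p>Notice the difference between L1 and L2: the square term. Because of it, L1 can push weights to exactly zero, whereas L2 drives weights toward zero without making them exactly zero. The L1 penalty keeps pushing with the same strength however small a weight gets, while the pull of the L2 penalty fades as the weight shrinks. </p>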
<p>In case you are curious about exploring this further, <a target="_blank" href="https://towardsdatascience.com/solving-overfitting-in-neural-nets-with-regularization-301c31a7735f">this article</a> goes deep into regularizations and might help. </p>
<p>This is also the reason why I tend to use L2 more often than L1 regularization. Let's see an example of this with TensorFlow. </p>
<p>Here is some code that creates a simple Dense layer with 3 units and L2 regularization:</p>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
tf.keras.layers.Dense(<span class="hljs-number">3</span>, kernel_regularizer = tf.keras.regularizers.L2(<span class="hljs-number">0.1</span>))
</code></pre>
<p>To clarify what this does: as discussed above, it adds a term of 0.1 × (the sum of the squared weight coefficients) to the loss function, which penalizes very large weights. To use L1 for your layer instead, simply replace L2 with L1 in the code above.</p>
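<p>Here's a small pure-Python sketch of both ideas. First, the penalty terms themselves, following what I understand to be the Keras definitions (<code>l2 * sum(w**2)</code> and <code>l1 * sum(|w|)</code>, with no ½ factor). Second, a one-weight worked example of why L1 can produce exact zeros. For a single weight w pulled toward a value a by the toy loss ½(w − a)², adding λ|w| gives the minimizer sign(a)·max(|a| − λ, 0), which is exactly zero whenever |a| ≤ λ. Adding λw² instead gives a / (1 + 2λ), which shrinks toward zero but is never exactly zero unless a is. The toy loss is an assumption chosen so the minimizers have closed forms.</p>

```python
import math
from typing import List


def l1_penalty(weights: List[float], l1: float) -> float:
    """l1 * sum(|w|): the term L1 regularization adds to the loss."""
    return l1 * sum(abs(w) for w in weights)


def l2_penalty(weights: List[float], l2: float) -> float:
    """l2 * sum(w^2): the term L2 regularization adds to the loss."""
    return l2 * sum(w * w for w in weights)


def l1_minimizer(a: float, lam: float) -> float:
    """argmin_w 0.5 * (w - a)**2 + lam * |w|, i.e. soft thresholding."""
    return math.copysign(max(abs(a) - lam, 0.0), a)


def l2_minimizer(a: float, lam: float) -> float:
    """argmin_w 0.5 * (w - a)**2 + lam * w**2."""
    return a / (1 + 2 * lam)


weights = [0.5, -1.0, 2.0]
l2_term = l2_penalty(weights, 0.1)  # about 0.1 * (0.25 + 1 + 4) = 0.525
l1_term = l1_penalty(weights, 0.1)  # about 0.1 * (0.5 + 1 + 2) = 0.35

w_l1 = l1_minimizer(0.3, 0.5)  # 0.0: L1 sets this weight exactly to zero
w_l2 = l2_minimizer(0.3, 0.5)  # 0.15: L2 only shrinks it
```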
<h4 id="heading-dropouts">Dropouts</h4>
<p>The first thing I do when I am building a model and face overfitting is try dropout (<a target="_blank" href="https://jmlr.org/papers/v15/srivastava14a.html">Srivastava et al.</a>). The idea is to randomly drop out, that is, set to zero (ignore), x% of a layer's output features during training. </p>
<p>We do this to stop individual nodes from relying too much on the outputs of other nodes, preventing them from co-adapting.</p>
<p>Dropout is easy to implement with TensorFlow since it is available as a layer. Here is an example of a model I built to tell apart images of dogs and cats, using Dropout to reduce overfitting:</p>
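<pre><code class="lang-py"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf

<span class="hljs-comment"># IMG_HEIGHT and IMG_WIDTH are the height and width of your input images</span>
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(<span class="hljs-number">32</span>, (<span class="hljs-number">3</span>,<span class="hljs-number">3</span>), padding=<span class="hljs-string">'same'</span>, activation=<span class="hljs-string">'relu'</span>, input_shape=(IMG_HEIGHT, IMG_WIDTH, <span class="hljs-number">3</span>)),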
    tf.keras.layers.MaxPooling2D(<span class="hljs-number">2</span>,<span class="hljs-number">2</span>),
    tf.keras.layers.Dropout(<span class="hljs-number">0.2</span>),
    tf.keras.layers.Conv2D(<span class="hljs-number">128</span>, (<span class="hljs-number">3</span>,<span class="hljs-number">3</span>), padding=<span class="hljs-string">'same'</span>, activation=<span class="hljs-string">'relu'</span>),
    tf.keras.layers.MaxPooling2D(<span class="hljs-number">2</span>,<span class="hljs-number">2</span>),
    tf.keras.layers.Dropout(<span class="hljs-number">0.2</span>),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(<span class="hljs-number">512</span>, activation=<span class="hljs-string">'relu'</span>),
    tf.keras.layers.Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>)
])
</code></pre>
<p>As you can see in the code above, you can directly use <code>tf.keras.layers.Dropout</code> to implement dropout, passing it the fraction of output features to drop (here 20%).</p>
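<p>Under the hood, a dropout layer is conceptually simple. The following pure-Python sketch implements "inverted" dropout, which, as I understand it, is also what Keras's Dropout layer does: during training each value is zeroed with probability <code>rate</code> and the survivors are scaled by 1 / (1 − rate), so the expected sum stays the same, while at inference time the input passes through unchanged. The function and its signature are illustrative, not TensorFlow's API.</p>

```python
import random
from typing import List


def dropout(x: List[float], rate: float, training: bool,
            rng: random.Random) -> List[float]:
    """Inverted dropout over a list of activations."""
    if not training or rate == 0.0:
        return list(x)  # inference: pass the input through unchanged
    keep_prob = 1.0 - rate
    # Zero each value with probability `rate`; scale survivors by 1 / keep_prob
    return [v / keep_prob if rng.random() >= rate else 0.0 for v in x]


rng = random.Random(0)
activations = [1.0] * 10_000
out = dropout(activations, rate=0.2, training=True, rng=rng)
dropped_fraction = out.count(0.0) / len(out)  # close to 0.2
mean_activation = sum(out) / len(out)         # close to 1.0, the mean of the input
```

<p>Rescaling during training, rather than at inference time, means the trained network can be used as-is for predictions.</p>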
<h4 id="heading-early-stopping">Early stopping</h4>
<p>Early stopping is another regularization method I often use. The idea is to monitor the model's performance on a validation set at every epoch and terminate training when some specified condition on the validation performance is met (for example, stop training when the loss is &lt; 0.5).</p>
<p>It turns out that a basic condition like the one above works like a charm if your training and validation errors look something like the image below. In this case, Early Stopping would stop training when it reaches the red box (shown for demonstration) and prevent overfitting outright.</p>
<blockquote>
<p>It (Early stopping) is such a simple and efficient regularization technique that Geoffrey Hinton called it a "beautiful free lunch". – Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron</p>
</blockquote>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/04/image-99.png" alt="Image" width="600" height="400" loading="lazy">
<em>Adapted from <a target="_blank" href="https://link.springer.com/chapter/10.1007/978-3-642-35289-8_5">Lutz Prechelt</a></em></p>
<p>However, in some cases, choosing the criterion, or knowing when Early Stopping should stop training the model, won't be this straightforward. </p>
<p>Other criteria are beyond the scope of this article, but I recommend checking out "<a target="_blank" href="https://link.springer.com/chapter/10.1007/978-3-642-35289-8_5">Early Stopping - But When?</a>" by Lutz Prechelt, which I use a lot to help decide on criteria.</p>
<p>Let's see an example of Early Stopping in action with TensorFlow:</p>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf

callback = tf.keras.callbacks.EarlyStopping(monitor=<span class="hljs-string">'loss'</span>, patience=<span class="hljs-number">3</span>)
model = tf.keras.models.Sequential([...])
model.compile(...)
model.fit(..., callbacks = [callback])
</code></pre>
<p>In the above example, we create an Early Stopping callback and specify that we want to monitor the loss. We also specify that it should stop training if the loss does not improve for 3 consecutive epochs. Finally, we pass this callback to the model while training it. </p>
<p>For this example I use a Sequential model, but this works in exactly the same way with models created with the functional API or with subclassed models.</p>
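<p>To make the <code>patience</code> logic concrete, here's a simplified pure-Python sketch of a patience-based stopping rule. It's my own approximation of the idea behind the callback, not Keras's source: <code>min_delta</code> (also the name of a real <code>EarlyStopping</code> argument) is the smallest decrease that counts as an improvement, and the loss history is made up.</p>

```python
from typing import List, Optional


def early_stopping_epoch(losses: List[float], patience: int = 3,
                         min_delta: float = 0.0) -> Optional[int]:
    """Return the 0-based epoch after which training stops, or None if it runs to the end."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs in a row
                return epoch
    return None


history = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]  # hypothetical per-epoch losses
stop_at = early_stopping_epoch(history, patience=3)  # 5: epochs 3, 4 and 5 show no improvement
```

<p>Note that in this toy history the loss would have improved again at epoch 6. That is the trade-off the <code>patience</code> value manages: too small and you stop on a temporary plateau, too large and you keep training into overfitting.</p>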
<h2 id="heading-thank-you-for-reading">Thank you for reading!</h2>
<p>Thank you for sticking with me until the end. I hope you will benefit from this article and incorporate these tips in your own experiments. </p>
<p>I am excited to see if they help you improve the performance of your neural nets, too. If you have any feedback or suggestions for me please feel free to <a target="_blank" href="https://twitter.com/rishit_dagli">reach out to me on Twitter</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Automate Machine Learning Model Publishing with the Gitlab Package Registry ]]>
                </title>
                <description>
                    <![CDATA[ By Yacine Mahdid In this tutorial we'll learn how to automatically publish machine learning models in a Gitlab package registry and make them available for your teammates to use. You can also use this technique to share a packaged version of your cod... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/ml-model-publishing-with-gitlab-package-registry/</link>
                <guid isPermaLink="false">66d4617636c45a88f96b7d11</guid>
                
                    <category>
                        <![CDATA[ automation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ CI/CD ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GitLab ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 15 Apr 2021 16:33:05 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/04/photo-1510380290144-9e40d2438af5.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Yacine Mahdid</p>
<p>In this tutorial we'll learn how to automatically publish machine learning models in a Gitlab package registry and make them available for your teammates to use. You can also use this technique to share a packaged version of your code as a binary.</p>
<p>If you are a beginner Gitlab user and are unfamiliar with CI/CD techniques, this tutorial is for you! A basic understanding of how machine learning and deep learning work is a plus, but it isn't required to understand the CI/CD publishing part.</p>
<h3 id="heading-heres-what-well-cover">Here's what we'll cover:</h3>
<ul>
<li>Gitlab Code Setup</li>
<li>Deep Convolutional Neural Network Code</li>
<li>Image Recognition Code</li>
<li>Branching Methodology</li>
<li>CI/CD Uploading</li>
<li>Conclusion</li>
</ul>
<h2 id="heading-first-some-background">First, Some Background</h2>
<p>At some point during your machine learning engineering career, you might need to share a model you've trained with other developers. There are multiple ways of doing this.</p>
<h3 id="heading-give-access-to-the-repository">Give access to the repository</h3>
<p>If you don't mind showing your whole code, this is a very viable option. </p>
<p>If you use a good branching methodology, your colleagues will only need to look at the main branch to figure out which model is the most up to date. Then they can check the README.md to learn how to use it. </p>
<p>However, giving full access to the repository might not be a viable option for you.</p>
<h3 id="heading-share-the-latest-model-manually">Share the latest model manually</h3>
<p>Another way would be to extract the relevant code that you want to make public and send it to them manually. </p>
<p>This can become a bit of a mess if you are working with more than one person because the model you send might not be up to date. It also puts the burden on you to make sure that people are always using the latest version of your model. </p>
<h3 id="heading-share-the-latest-model-automatically">Share the latest model automatically</h3>
<p>A simpler solution, even in the case where the repository code is available, is to put the packaging burden on a CI/CD pipeline. </p>
<p>This is the topic of this tutorial, and our setup will look like this:</p>
<ul>
<li>The code repository, CI/CD tool set, and package registry will be on Gitlab</li>
<li>The code we'll be packaging will be a simple PyTorch neural network trained on the MNIST dataset for digit recognition.</li>
<li>All the instructions and the requirements will be available in the package.</li>
</ul>
<p>🚨 <strong>Disclaimer</strong> 🚨: This isn't how you should deploy a PyTorch production-ready model! To learn how to do this, check out this tutorial on <a target="_blank" href="https://pytorch.org/tutorials/advanced/cpp_export.html">TorchScript</a>.</p>
<p>Let's get started.</p>
<h2 id="heading-gitlab-code-setup">Gitlab Code Setup</h2>
<p>For this tutorial we will bundle four files:</p>
<ul>
<li><strong>model.pth</strong>: a pickled copy of the latest trained model.</li>
<li><strong>run_mnist.py</strong>: a simple Python script that runs the model to recognize the digit in a PNG image.</li>
<li><strong>requirements.txt</strong>: text file containing all the dependencies required to run the model.</li>
<li><strong>INSTRUCTION.md</strong>: step by step instructions to use the package.</li>
</ul>
<p>The package can then be used freely by anyone who has access to the package registry and will be automatically updated.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/04/package.png" alt="Image" width="600" height="400" loading="lazy">
<em>The package will then look like this on Gitlab Package Registry!</em></p>
<p>Let's jump into the neural network code, which is a modified version of this <a target="_blank" href="https://nextjournal.com/gkoehler/pytorch-mnist">comprehensive article on digit recognition</a>. The modified code can be found over at <a target="_blank" href="https://gitlab.com/yacineg4/example-ml-packaging-pipeline">my public Gitlab repository</a>.</p>
<h2 id="heading-deep-convolutional-neural-network-code">Deep Convolutional Neural Network Code</h2>
<p>In the section below, you will see quite a lot of terminology about deep neural networks. This isn't a tutorial on neural networks, so if you feel a bit overwhelmed by the specifics you can jump directly to the <strong>Branching Methodology</strong> section. </p>
<p>Just keep in mind that we've trained some sort of image recognition program that, given a <code>.png</code> file representing a digit, will be able to tell you what number it contains.</p>
<p>However, if you want a better understanding of how deep neural networks work under the hood, you can take a look at <a target="_blank" href="https://youtu.be/b_w4eEiogaE">my tutorial where I build one from scratch</a> or check out the <a target="_blank" href="https://github.com/yacineMahdid/artificial-intelligence-and-machine-learning">code directly on my GitHub</a>.</p>
<h3 id="heading-neural-network-definition">Neural Network Definition</h3>
<p>The network definition code is very straightforward since the network we will use is simple. It has the following characteristics:</p>
<ul>
<li>2 convolutional layers.</li>
<li><a target="_blank" href="https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/">Dropout</a> is applied on the second convolutional layer and after the first fully connected layer.</li>
<li><a target="_blank" href="https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/">ReLU</a> activation functions applied after every layer except the last one.</li>
<li>2 fully connected layers at the end for classification.</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torchvision

<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> torch.nn.functional <span class="hljs-keyword">as</span> F
<span class="hljs-keyword">import</span> torch.optim <span class="hljs-keyword">as</span> optim


<span class="hljs-comment"># Define the network</span>
<span class="hljs-comment"># Two convolutional layers followed by two fully connected layers</span>
<span class="hljs-comment"># Dropout after conv2 and fc1; ReLU after every layer except the last, which outputs log-probabilities</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Net</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(<span class="hljs-number">1</span>, <span class="hljs-number">10</span>, kernel_size=<span class="hljs-number">5</span>)
        self.conv2 = nn.Conv2d(<span class="hljs-number">10</span>, <span class="hljs-number">20</span>, kernel_size=<span class="hljs-number">5</span>)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(<span class="hljs-number">320</span>, <span class="hljs-number">50</span>)
        self.fc2 = nn.Linear(<span class="hljs-number">50</span>, <span class="hljs-number">10</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, x</span>):</span>
        x = F.relu(F.max_pool2d(self.conv1(x), <span class="hljs-number">2</span>))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), <span class="hljs-number">2</span>))
        x = x.view(<span class="hljs-number">-1</span>, <span class="hljs-number">320</span>)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        <span class="hljs-keyword">return</span> F.log_softmax(x, dim=<span class="hljs-number">1</span>)
</code></pre>
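<p>If you're wondering where the 320 in <code>nn.Linear(320, 50)</code> and <code>x.view(-1, 320)</code> comes from, it is the number of values left after the two convolution and pooling stages. Here's a small sketch that tracks the image size through the network using the standard output-size formula for convolutions and pooling. The convolutions use stride 1 and no padding, <code>F.max_pool2d(x, 2)</code> uses a 2x2 window with stride 2, and dropout doesn't change the shape. The helper function is my own, not part of PyTorch.</p>

```python
def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


side = 28                                   # MNIST digits are 28x28
side = conv_output_size(side, 5)            # conv1 (kernel_size=5): 24x24
side = conv_output_size(side, 2, stride=2)  # max_pool2d(..., 2): 12x12
side = conv_output_size(side, 5)            # conv2 (kernel_size=5): 8x8
side = conv_output_size(side, 2, stride=2)  # max_pool2d(..., 2): 4x4
flattened = 20 * side * side                # conv2 outputs 20 channels: 20 * 4 * 4
```

<p>If you change the input size or a kernel size, this is the number you'd need to update in both <code>fc1</code> and the <code>view</code> call.</p>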
<h3 id="heading-training-function">Training Function</h3>
<p>We then created a utility training function in order to iteratively improve our defined network using gradient descent. If you want to learn more about how gradient descent works check out <a target="_blank" href="https://youtu.be/IH9kqpMORLM">my short tutorial on it</a>.</p>
<p>This training regimen will do the following:</p>
<ul>
<li>Iterate over batches of training data representing 28 by 28 pixel digits.</li>
<li>Use the <a target="_blank" href="https://medium.com/deeplearningmadeeasy/negative-log-likelihood-6bd79b55d8b6">negative log likelihood cost function</a> to calculate the loss.</li>
<li>Calculate gradients.</li>
<li>Optimize the weights of the network using gradient descent.</li>
<li>Save the model at fixed intervals.</li>
</ul>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train</span>(<span class="hljs-params">network, optimizer, train_loader, epoch_id, log_interval=<span class="hljs-number">10</span></span>):</span>
  <span class="hljs-string">"""Run the training regimen on the training set using train_loader

    Args:
        network: The instantiated network.
        optimizer: The optimizer used to change the weights.
        train_loader: the loader for the training set already setup
        epoch_id: the current epoch id, used only for logging.
        log_interval: interval at which we print an output

    Returns:
        nothing, will save directly at root level the model and the optimizer state

  """</span>

  <span class="hljs-comment"># Set the network in training mode</span>
  network.train()

  <span class="hljs-comment"># Iterate over the full training set</span>
  <span class="hljs-keyword">for</span> batch_idx, (data, target) <span class="hljs-keyword">in</span> enumerate(train_loader):

    <span class="hljs-comment"># Calculate the gradients for this batch of data</span>
    optimizer.zero_grad()
    output = network(data)
    loss = F.nll_loss(output, target)
    loss.backward()

    <span class="hljs-comment"># Optimize the network</span>
    optimizer.step()

    <span class="hljs-comment"># Log and save every selected interval</span>
    <span class="hljs-keyword">if</span> batch_idx % log_interval == <span class="hljs-number">0</span>:

      print(<span class="hljs-string">'Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'</span>.format(
        epoch_id, batch_idx * len(data), len(train_loader.dataset),
        <span class="hljs-number">100.</span> * batch_idx / len(train_loader), loss.item()))

      <span class="hljs-comment"># This will save the state as a pickled object</span>
      torch.save(network.state_dict(), <span class="hljs-string">'./model.pth'</span>)
      torch.save(optimizer.state_dict(), <span class="hljs-string">'./optimizer.pth'</span>)
</code></pre>
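<p>The <code>F.nll_loss(output, target)</code> line works because <code>forward</code> ends with <code>log_softmax</code>: negative log likelihood loss picks the log-probability the network assigned to the correct class, negates it and, by default, averages over the batch, which together with log_softmax amounts to cross-entropy. Here's a pure-Python sketch of that computation with made-up logits; it mirrors the math, not PyTorch's implementation.</p>

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log(softmax(logits))."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def nll_loss(log_probs: List[List[float]], targets: List[int]) -> float:
    """Mean negative log-likelihood of the target classes (like reduction='mean')."""
    return -sum(lp[t] for lp, t in zip(log_probs, targets)) / len(targets)


batch_logits = [[2.0, 0.5, -1.0],  # hypothetical scores for 3 classes
                [0.0, 0.0, 0.0]]   # an undecided prediction
targets = [0, 2]
loss = nll_loss([log_softmax(z) for z in batch_logits], targets)
```

<p>The confident, correct first prediction contributes a small loss, while the undecided second one contributes log(3), the loss of a uniform guess over 3 classes.</p>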
<p>The data for training can be found on <a target="_blank" href="http://yann.lecun.com/exdb/mnist/">Yann LeCun's website</a>. Here we are using the datasets formatted as 28 by 28 PyTorch tensors for training.</p>
<h3 id="heading-testing-function">Testing Function</h3>
<p>The next function we create is a testing function to check whether our network has learned something, using data it was not trained on. This function is simple in the sense that it just tallies the correct predictions and computes the average loss.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test</span>(<span class="hljs-params">network, test_loader</span>):</span>
  <span class="hljs-string">"""Run the testing regimen on the test set using test_loader

    Args:
        network: The instantiated and trained network.
        test_loader: the loader for the testing set already setup

    Returns:
        nothing, will only print result

  """</span>

  <span class="hljs-comment"># Variable instantiation</span>
  test_loss = <span class="hljs-number">0</span>
  correct = <span class="hljs-number">0</span>

  <span class="hljs-comment"># Move the network to evaluate mode instead of training</span>
  network.eval()

  <span class="hljs-comment"># Tell torch not to track any gradients</span>
  <span class="hljs-keyword">with</span> torch.no_grad():

    <span class="hljs-comment"># Iterate on all the test data and accumulate the loss</span>
    <span class="hljs-keyword">for</span> data, target <span class="hljs-keyword">in</span> test_loader:
      output = network(data)
      test_loss += F.nll_loss(output, target, reduction=<span class="hljs-string">'sum'</span>).item()
      pred = output.data.max(<span class="hljs-number">1</span>, keepdim=<span class="hljs-literal">True</span>)[<span class="hljs-number">1</span>]
      correct += pred.eq(target.data.view_as(pred)).sum()

  <span class="hljs-comment"># Average loss calculation and printing   </span>
  test_loss /= len(test_loader.dataset)
  print(<span class="hljs-string">'\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'</span>.format(
    test_loss, correct, len(test_loader.dataset),
    <span class="hljs-number">100.</span> * correct / len(test_loader.dataset)))
</code></pre>
<p>This function is useful for checking how well our network has learned after each training epoch.</p>
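<p>To make the bookkeeping in <code>test()</code> concrete, here is a plain-Python sketch of the same two quantities for a single batch: the summed negative log-likelihood (what <code>F.nll_loss</code> adds up when summing over the batch) and the number of correct argmax predictions. It assumes the network outputs log-probabilities (for example from a final <code>log_softmax</code>), which is what <code>nll_loss</code> expects; the function name <code>evaluate_batch</code> is hypothetical.</p>
<pre><code class="lang-python">import math


def evaluate_batch(log_probs, targets):
    """Return (summed NLL loss, number of correct predictions) for one batch.

    log_probs: one list of per-class log-probabilities per sample.
    targets:   the true class index for each sample.
    """
    total_loss = 0.0
    correct = 0
    for row, target in zip(log_probs, targets):
        # NLL loss for one sample is minus the log-probability of the true class
        total_loss += -row[target]
        # The prediction is the class with the highest score (argmax)
        pred = max(range(len(row)), key=lambda k: row[k])
        correct += int(pred == target)
    return total_loss, correct


# Two samples over three classes; only the first is classified correctly
batch = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.6), math.log(0.3), math.log(0.1)],
]
loss, correct = evaluate_batch(batch, [0, 1])
print(round(loss, 4), correct)  # 1.5606 1
</code></pre>
<p>Dividing the accumulated loss and the correct count by the number of test samples gives the average loss and the accuracy that <code>test()</code> prints.</p>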
<h3 id="heading-training-regimen">Training Regimen</h3>
<p>Finally, we can tie all of the above together with the main body of the training script! A few things are happening, but the most important points are the following:</p>
<ul>
<li>We set our hyperparameters statically. A better approach would be to tune them on a validation set carved out of the training data.</li>
<li>We create our data loaders, which ingest the data and spit out tensors in the right shape for the network. These loaders also normalize the data using the global mean and standard deviation of the MNIST dataset.</li>
<li>We use <a target="_blank" href="https://youtu.be/7EuiXb6hFAM">stochastic gradient descent with momentum</a> as the optimization method, which is one of the many flavors of gradient descent we can use.</li>
<li>We loop over the full training dataset for a fixed number of epochs (one epoch is one full pass over the training data), testing the network on the held-out test set after each one.</li>
</ul>
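<p>The update rule behind <code>optim.SGD(..., momentum=momentum)</code> is short enough to write out by hand. The sketch below applies it to a single parameter minimizing the toy function f(x) = (x - 3)², using the same <code>learning_rate = 0.01</code> and <code>momentum = 0.5</code> as the script. It follows the formulation in the PyTorch SGD documentation with the default dampening and no Nesterov: the velocity accumulates past gradients, and the parameter moves against the velocity. The toy loss and the <code>sgd_momentum</code> helper are illustrative assumptions.</p>
<pre><code class="lang-python">def sgd_momentum(grad_fn, x0, lr=0.01, momentum=0.5, steps=500):
    """Minimize a 1-D function with SGD + momentum and return the final x.

    grad_fn: function returning the gradient of the loss at x.
    """
    x = x0
    velocity = 0.0
    for _ in range(steps):
        g = grad_fn(x)
        # The velocity keeps a decaying sum of past gradients
        velocity = momentum * velocity + g
        # Step against the accumulated direction
        x = x - lr * velocity
    return x


# Gradient of the toy loss f(x) = (x - 3)**2
x_final = sgd_momentum(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_final, 4))  # 3.0
</code></pre>
<p>Compared with plain gradient descent, the velocity term lets consistent gradients build up speed while oscillating ones partially cancel out.</p>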
<pre><code class="lang-python"><span class="hljs-comment"># Experimental Parameters that we can tweak</span>
n_epochs = <span class="hljs-number">3</span>
batch_size_train = <span class="hljs-number">64</span>
batch_size_test = <span class="hljs-number">1000</span>
learning_rate = <span class="hljs-number">0.01</span>
momentum = <span class="hljs-number">0.5</span>

<span class="hljs-comment"># Dataset statistics (MNIST pixel mean and std) that should stay as is</span>
global_mean_mnist = <span class="hljs-number">0.1307</span>
global_std_mnist = <span class="hljs-number">0.3081</span>


<span class="hljs-comment"># Random Seed for Reproducible Experimentation</span>
random_seed = <span class="hljs-number">42</span>
torch.backends.cudnn.enabled = <span class="hljs-literal">False</span>
torch.manual_seed(random_seed)


<span class="hljs-comment"># Data Loader to gather the data and then normalize them</span>
train_loader = torch.utils.data.DataLoader(
  torchvision.datasets.MNIST(<span class="hljs-string">'./data/'</span>, train=<span class="hljs-literal">True</span>, download=<span class="hljs-literal">True</span>,
                             transform=torchvision.transforms.Compose([
                               torchvision.transforms.ToTensor(),
                               torchvision.transforms.Normalize(
                                 (global_mean_mnist,), (global_std_mnist,))
                             ])),
  batch_size=batch_size_train, shuffle=<span class="hljs-literal">True</span>)

test_loader = torch.utils.data.DataLoader(
  torchvision.datasets.MNIST(<span class="hljs-string">'./data/'</span>, train=<span class="hljs-literal">False</span>, download=<span class="hljs-literal">True</span>,
                             transform=torchvision.transforms.Compose([
                               torchvision.transforms.ToTensor(),
                               torchvision.transforms.Normalize(
                                 (global_mean_mnist,), (global_std_mnist,))
                             ])),
  batch_size=batch_size_test, shuffle=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># Initialize network and optimizer</span>
network = Net()
optimizer = optim.SGD(network.parameters(), lr=learning_rate,
                      momentum=momentum)

<span class="hljs-comment"># Test once before training to get a baseline: the untrained model should score near chance level</span>
test(network, test_loader)

<span class="hljs-comment"># Train on the whole dataset multiple times, testing after each epoch</span>
<span class="hljs-keyword">for</span> epoch_id <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, n_epochs + <span class="hljs-number">1</span>):
  train(network, optimizer, train_loader, epoch_id)
  test(network, test_loader)
</code></pre>
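<p>It helps to know how much work these settings imply. MNIST has 60,000 training images and 10,000 test images, so with <code>batch_size_train = 64</code> each epoch yields 938 batches (the last one holds only the 32 leftover images), and <code>batch_size_test = 1000</code> splits the test set into 10 batches. The sketch below does that arithmetic; <code>batches_per_epoch</code> is a hypothetical helper that mirrors the default <code>drop_last=False</code> behavior of <code>DataLoader</code>.</p>
<pre><code class="lang-python">import math


def batches_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        # The incomplete final batch is discarded
        return n_samples // batch_size
    # The incomplete final batch is kept
    return math.ceil(n_samples / batch_size)


n_epochs = 3
train_batches = batches_per_epoch(60_000, 64)
print(train_batches, n_epochs * train_batches)  # 938 2814
print(batches_per_epoch(10_000, 1000))          # 10
</code></pre>
<p>Assuming one optimizer step per batch, the three epochs therefore amount to 2,814 weight updates.</p>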
<p>Note that it's very important to evaluate your network on a held-out set: a model can overfit by memorizing the training data, and only data it has never seen will reveal that.</p>
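<p>A held-out set only helps if you actually compare its numbers against the training ones. A simple rule of thumb is to flag the first epoch where the held-out loss goes up while the training loss keeps going down. The sketch below implements that check on per-epoch loss histories; the function name and the sample numbers are made up for illustration, and in a real run you would collect the losses from your training and testing loops.</p>
<pre><code class="lang-python">def first_overfit_epoch(train_losses, heldout_losses):
    """Return the first epoch (1-based) where held-out loss rises while the
    training loss still falls, or None if that never happens."""
    for i in range(1, len(train_losses)):
        train_fell = train_losses[i - 1] > train_losses[i]
        heldout_rose = heldout_losses[i] > heldout_losses[i - 1]
        if train_fell and heldout_rose:
            # Index i is the (i + 1)-th epoch when counting from 1
            return i + 1
    return None


# Illustrative numbers only: training keeps improving, held-out rises from epoch 4
train_hist = [0.60, 0.30, 0.20, 0.15, 0.11]
heldout_hist = [0.55, 0.32, 0.25, 0.27, 0.30]
print(first_overfit_epoch(train_hist, heldout_hist))  # 4
</code></pre>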
<p>All of the above scripts can be found in the file <a target="_blank" href="https://gitlab.com/yacineg4/example-ml-packaging-pipeline/-/blob/master/train_mnist.py">train_mnist.py in the repository</a>. </p>
<p>At this point, we can train a model and save it at regular intervals with <code>torch.save</code>, which uses Python's pickle format under the hood.</p>
<p>We can now use that saved, trained model to classify a digit in a <code>.png</code> file.</p>
<h2 id="heading-image-recognition-code">Image Recognition Code</h2>
<p>Let's say we have the following input image:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/01/test_image_0.png" alt="Image" width="600" height="400" loading="lazy">
<em>a small 0 digit</em></p>
<p>or this one:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/01/test_image_7.png" alt="Image" width="600" height="400" loading="lazy">
<em>a bigger 7 digit</em></p>
<p>How can we get our network, which works on 28 by 28 PyTorch tensors, to classify these digits?</p>
<p>It's fairly straightforward if we put the images through roughly the same process as the training data:</p>
<ul>
<li>Convert the images to grayscale (no color or alpha channels)</li>
<li>Resize the images to be 28 by 28 pixels</li>
<li>Normalize the images using the mean and standard deviation of the MNIST dataset</li>
</ul>
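<p>Resizing aside, those steps boil down to a little arithmetic per pixel. Pillow's <code>convert('L')</code> uses the ITU-R 601-2 luma transform L = R × 299/1000 + G × 587/1000 + B × 114/1000, <code>ToTensor()</code> divides by 255, and <code>Normalize</code> subtracts the mean and divides by the standard deviation. The plain-Python sketch below chains those three steps for one RGB pixel; it works in floating point, so it only approximates Pillow's integer rounding, and the helper names are hypothetical. Keep in mind that MNIST digits are light strokes on a dark background, so a dark digit on a white page may need to be inverted before it resembles the training data.</p>
<pre><code class="lang-python">MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def rgb_to_gray(r, g, b):
    """ITU-R 601-2 luma, the formula Pillow documents for convert('L')."""
    return r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000


def normalize_pixel(gray, mean=MNIST_MEAN, std=MNIST_STD):
    """Scale a 0..255 gray value to [0, 1] like ToTensor, then standardize."""
    return (gray / 255.0 - mean) / std


print(round(normalize_pixel(rgb_to_gray(0, 0, 0)), 4))        # -0.4242
print(round(normalize_pixel(rgb_to_gray(255, 255, 255)), 4))  # 2.8215
</code></pre>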
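<pre><code class="lang-python"><span class="hljs-keyword">import</span> glob
<span class="hljs-keyword">from</span> optparse <span class="hljs-keyword">import</span> OptionParser

<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">from</span> torchvision <span class="hljs-keyword">import</span> transforms

<span class="hljs-comment"># The Net class from the training script must also be defined (or imported) here</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:

    <span class="hljs-comment"># Variable initialization</span>
    global_mean_mnist = <span class="hljs-number">0.1307</span>
    global_std_mnist = <span class="hljs-number">0.3081</span>

    <span class="hljs-comment"># Load the network with the trained weights (on the CPU)</span>
    result_path = <span class="hljs-string">'./model.pth'</span>
    model = Net()
    model.load_state_dict(torch.load(result_path, map_location=<span class="hljs-string">'cpu'</span>))
    model.eval()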
    model.eval()

    <span class="hljs-comment"># Setup the transform from image to normalized tensors</span>
    transform = transforms.Compose([
                        transforms.Resize((<span class="hljs-number">28</span>,<span class="hljs-number">28</span>)),
                        transforms.ToTensor(),
                        transforms.Normalize(
                            (global_mean_mnist,), (global_std_mnist,))
                        ])

    <span class="hljs-comment"># Parse the input from the user which should be a filename with the --image flag</span>
    parser = OptionParser()
    parser.add_option(<span class="hljs-string">"--image"</span>, dest = <span class="hljs-string">"input_image_path"</span>,
                      help = <span class="hljs-string">"Input Image Path"</span>)
    (options, args) = parser.parse_args()

    <span class="hljs-comment"># Get the path to the image to decode</span>
    input_image_path = str(options.input_image_path)

    <span class="hljs-comment"># Open the image(s) and do the inference</span>
    images=glob.glob(input_image_path)
    <span class="hljs-keyword">for</span> image <span class="hljs-keyword">in</span> images:

        <span class="hljs-comment"># Convert the image to grayscale</span>
        img = Image.open(image).convert(<span class="hljs-string">'L'</span>)

        <span class="hljs-comment"># Transform the image to a normalized tensor</span>
        img_tensor = transform(img).unsqueeze(<span class="hljs-number">0</span>)

        <span class="hljs-comment"># Make and print the prediction (no gradients needed for inference)</span>
        <span class="hljs-keyword">with</span> torch.no_grad():
            output = model(img_tensor).argmax(dim=<span class="hljs-number">1</span>).item()
        print(<span class="hljs-string">f"Image is a <span class="hljs-subst">{output}</span>"</span>)
</code></pre>
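<p>As you can see, we use a parser to accept an image path on the command line, then open each matching image, convert it to grayscale, and apply our transformations. The resulting tensor, with a batch dimension added by <code>unsqueeze(0)</code>, is fed to the loaded model, and the index of its highest output is the predicted digit.</p>
<p>⚠️ Don't forget to include the definition of the network in the script (by importing or copy-pasting it). We saved only the model's weights (its <code>state_dict</code>), so the script needs the <code>Net</code> class to rebuild the model before loading them.</p>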
<p>We can now run our code like this:</p>
<pre><code class="lang-bash">python run_mnist.py --image NAME_OF_IMAGE.png
</code></pre>
<p>This prints the digit the model predicts for each matching image.</p>
<p>Now that we have the basic training and evaluation code set up, let's discuss how to use Git branching to our advantage when publishing this model to the package registry.</p>
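<p>Because the path goes through <code>glob.glob</code>, <code>--image</code> also accepts a wildcard pattern such as <code>"digits/*.png"</code> (quoted so the shell does not expand it first). The flip side is that a mistyped path matches nothing, and the loop then silently prints nothing. A small guard like the hypothetical <code>resolve_images</code> below, which is not part of the original script, turns that case into an explicit error.</p>
<pre><code class="lang-python">import glob
import os
import tempfile


def resolve_images(pattern):
    """Expand a path or glob pattern, failing loudly if nothing matches."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No image matches {pattern!r}")
    return matches


# Demo in a throwaway directory with two empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("test_image_0.png", "test_image_7.png"):
        open(os.path.join(tmp, name), "wb").close()
    found = resolve_images(os.path.join(tmp, "*.png"))
    print([os.path.basename(p) for p in found])  # ['test_image_0.png', 'test_image_7.png']
</code></pre>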
<h2 id="heading-branching-methodology">Branching Methodology</h2>
<p>If you are working alone on a project, it is very tempting to simply commit to master/main and be done with it. However, this way of working is very difficult to maintain and it makes incorporating proper CI/CD tools a pain. </p>
<p>A main / develop branch strategy as shown below is more maintainable:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/01/image-122.png" alt="Image" width="600" height="400" loading="lazy">
<em>Image from: https://nvie.com/posts/a-successful-git-branching-model/</em></p>
<p>By always keeping the main branch clean, we can set up our CI/CD pipeline to trigger as soon as we push to main. We are also free to commit as often as we need to the develop branch while we improve our models.</p>
<p>When we are ready for a new deployment, we only need to merge develop into the main branch (or, better yet, open a merge request / pull request and then merge it).</p>
<p>This merge to main should trigger Gitlab to upload the new version of our model to the package registry.</p>
<p>Let's take a look at a simple way to automate publishing to the package registry using the <code>.gitlab-ci.yml</code> file.</p>
<h2 id="heading-cicd-pipeline">CI/CD Pipeline</h2>
<p>The <code>.gitlab-ci.yml</code> file is a special file in your repository used by Gitlab to define what the Gitlab server should do when you push to a repository.</p>
<p>To learn more about how CI/CD works in Gitlab, head over to this <a target="_blank" href="https://medium.com/faun/gitlab-ci-cd-crash-course-6e7bcf696940">Gitlab CI/CD crash course</a>.</p>
<p>In this tutorial our <code>.gitlab-ci.yml</code> file looks like this:</p>
<pre><code class="lang-yml"><span class="hljs-attr">image:</span> <span class="hljs-string">pytorch/pytorch</span>

<span class="hljs-attr">variables:</span>
  <span class="hljs-attr">VERSION:</span> <span class="hljs-string">"0.0.4"</span> <span class="hljs-comment"># Change this for each new release</span>

<span class="hljs-attr">stages:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">upload</span>

<span class="hljs-attr">upload:</span>
  <span class="hljs-attr">stage:</span> <span class="hljs-string">upload</span>
  <span class="hljs-attr">only:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">master</span>
  <span class="hljs-attr">script:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">apt-get</span> <span class="hljs-string">update</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">apt-get</span> <span class="hljs-string">install</span> <span class="hljs-string">-y</span> <span class="hljs-string">curl</span> <span class="hljs-string">wget</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">'curl --header "JOB-TOKEN: $CI_JOB_TOKEN" --upload-file ./model.pth "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/example-ml-packaging-pipeline/${VERSION}/model.pth"'</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">'curl --header "JOB-TOKEN: $CI_JOB_TOKEN" --upload-file ./run_mnist.py "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/example-ml-packaging-pipeline/${VERSION}/run_mnist.py"'</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">'curl --header "JOB-TOKEN: $CI_JOB_TOKEN" --upload-file ./requirements.txt "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/example-ml-packaging-pipeline/${VERSION}/requirements.txt"'</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">'curl --header "JOB-TOKEN: $CI_JOB_TOKEN" --upload-file ./INSTRUCTION.md "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/example-ml-packaging-pipeline/${VERSION}/INSTRUCTION.md"'</span>
</code></pre>
<p>This <code>.yml</code> file is very bare-bones. Our pipeline has only one stage: the <code>upload</code> stage.</p>
<p>In the upload stage, the <code>script</code> section runs only when the <code>master</code> branch (this repository's main branch) gets updated. After installing <code>curl</code> with <code>apt-get</code>, the script simply uses <code>curl</code> to upload four files from the repository into the package registry.</p>
<p>Let's take a look at the anatomy of the <code>curl</code> command we are using:</p>
<pre><code class="lang-yml"><span class="hljs-bullet">-</span> <span class="hljs-string">'curl --header "JOB-TOKEN: $CI_JOB_TOKEN" --upload-file ./NAME_OF_FILE "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/example-ml-packaging-pipeline/${VERSION}/NAME_OF_FILE"'</span>
</code></pre>
<ul>
<li><code>--header</code> tells curl to add an <a target="_blank" href="https://curl.se/docs/manpage.html#-H">extra header to the request</a>.</li>
<li><code>JOB-TOKEN</code> is our header and <code>$CI_JOB_TOKEN</code> is its value: a predefined variable that Gitlab generates for each CI job so the job can authenticate against the API while it runs.</li>
<li><code>--upload-file</code> tells curl to <a target="_blank" href="https://curl.se/docs/manpage.html#-T">transfer a local file to the remote URL</a> (for an HTTP URL, curl sends it with a PUT request).</li>
<li><code>./NAME_OF_FILE</code> is the name of the local file we want to transfer.</li>
<li><code>${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/example-ml-packaging-pipeline/${VERSION}/NAME_OF_FILE</code> is the remote URL that the file is uploaded to.</li>
</ul>
<p>Here <code>$CI_API_V4_URL</code> is the base URL of the Gitlab v4 API, <code>$CI_PROJECT_ID</code> is the ID of our project (both are predefined by Gitlab CI), <code>example-ml-packaging-pipeline</code> is the package name, and finally <code>VERSION</code> is the version number we defined at the top of the <code>.yml</code> file.</p>
<p>That's it! When you push to the main branch (<code>master</code> in this repository) on Gitlab, it will fire up a pipeline that runs your packaging job.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/04/gitlab-ci.png" alt="Image" width="600" height="400" loading="lazy">
<em>The job will then be available and you will be able to check the trace on Gitlab!</em></p>
<p>You and your teammates will then be able to find the package in the Package Registry section and download the right versioned files from it:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/04/package-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>This is our v.0.0.5 of the example package!</em></p>
<p>To get a more complete idea of what is possible with the Packages API, head over to the <a target="_blank" href="https://docs.gitlab.com/ee/api/packages.html">official documentation</a>.</p>
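<p>The same requests can be scripted outside the CI job, which is handy for checking the URL layout. The Python sketch below assembles the generic-package URL from the pieces described earlier and builds a <code>PUT</code> request carrying the <code>JOB-TOKEN</code> header, mirroring the <code>curl --upload-file</code> call. Teammates can fetch a file back with a <code>GET</code> on the same URL, typically authenticating with a personal access token in a <code>PRIVATE-TOKEN</code> header instead. The helper names are hypothetical, the host and project ID are placeholders, and nothing is sent until you pass a request to <code>urllib.request.urlopen</code>.</p>
<pre><code class="lang-python">import urllib.request


def package_file_url(api_url, project_id, package, version, filename):
    """Generic-package file URL, laid out like the one in .gitlab-ci.yml."""
    return (f"{api_url}/projects/{project_id}/packages/generic/"
            f"{package}/{version}/{filename}")


def upload_request(url, path, job_token):
    """PUT the file's bytes with a JOB-TOKEN header (like curl --upload-file)."""
    with open(path, "rb") as f:
        body = f.read()
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"JOB-TOKEN": job_token})


def download_request(url, access_token):
    """GET a package file, authenticating with a personal access token."""
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": access_token})


# In a CI job the first two arguments would come from the CI_API_V4_URL and
# CI_PROJECT_ID environment variables; these values are placeholders
url = package_file_url("https://gitlab.com/api/v4", "12345",
                       "example-ml-packaging-pipeline", "0.0.4", "model.pth")
print(url)
# Sending is left to the caller, e.g. urllib.request.urlopen(download_request(url, token))
</code></pre>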
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial you've learned how to bundle and upload a machine learning model, and how to automate its packaging using Gitlab CI/CD.</p>
<p>Congratulations! 🎉🎉🎉</p>
<p>There is still a lot more you can do with Gitlab CI/CD, for instance:</p>
<ul>
<li>Add a testing stage before the bundling in order to make sure that there is no regression in the code.</li>
<li>Add a testing stage after the bundling to make sure that the performance of your model is satisfactory in terms of inference latency.</li>
<li>Use a more optimized version of the model with TorchScript.</li>
<li>Add an automatic social media announcement of each new release after the upload step.</li>
</ul>
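<p>As a taste of the second idea, a post-bundling stage could time the model's inference and fail the job when it is too slow. The sketch below is a framework-agnostic version of such a check: it times any callable with <code>time.perf_counter</code> and compares the median against a budget. The function names, the budget, and the stand-in workload are assumptions; in practice the callable would run the packaged model on a sample image.</p>
<pre><code class="lang-python">import statistics
import time


def median_latency_ms(fn, runs=20, warmup=3):
    """Median wall-clock time of fn() in milliseconds, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


def check_latency(fn, budget_ms):
    """Raise if the median latency exceeds the budget (this would fail a CI job)."""
    latency = median_latency_ms(fn)
    if latency > budget_ms:
        raise RuntimeError(f"Median latency {latency:.2f} ms exceeds {budget_ms} ms")
    return latency


# Stand-in workload; replace with a call to the loaded model
print(f"{check_latency(lambda: sum(range(10_000)), budget_ms=50):.3f} ms")
</code></pre>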
<p>To learn more about Gitlab CI/CD, the official docs are a great place to start, and the <a target="_blank" href="https://docs.gitlab.com/ee/ci/quick_start/">get started section is very beginner-friendly</a>.</p>
<p>If you want to read more of this type of content, check out my <a target="_blank" href="https://grad4.com/en/category/blog/grad4-engineering-blog/">mechanical/software engineering articles</a>. If you want to discuss any of this feel free to send me a DM on <a target="_blank" href="https://www.linkedin.com/in/yacine-mahdid-809425163/">LinkedIn</a> or <a target="_blank" href="https://twitter.com/CodeThisCodeTh1">Twitter</a> :) </p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
