<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ keras - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ keras - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Mon, 25 May 2026 05:06:45 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/keras/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Convert a Keras SavedModel into a Browser-based Web App ]]>
                </title>
                <description>
                    <![CDATA[ By Suchandra Datta If you're a Python developer who works with Keras SavedModels, this article is for you.  Perhaps you're not sure how to use SavedModels to leverage the power of machine learning in browser-based web apps. But don't worry – we'll co... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/convert-keras-savedmodel-into-browser-based-webapp/</link>
                <guid isPermaLink="false">66d8525e9fbd5815f63ef040</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ keras ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Tue, 18 May 2021 16:36:22 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/05/beach-photo.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Suchandra Datta</p>
<p>If you're a Python developer who works with Keras SavedModels, this article is for you. </p>
<p>Perhaps you're not sure how to use SavedModels to leverage the power of machine learning in browser-based web apps. But don't worry – we'll cover all the basic steps you need to get started. </p>
<p>Along with that, we'll go over some important concepts that'll help make it easier for you to transition to JavaScript from Python.</p>
<p>Before we dive into the process, let's address some questions that are likely to pop into your mind at this point.</p>
<h2 id="heading-what-is-a-keras-savedmodel">What is a Keras SavedModel?</h2>
<p>A Keras model is made up of the network architecture, the model weights, and the training configuration (the optimizer and the loss function). </p>
<p>At the time of writing, the default format for saving models on disk is the SavedModel format. This format lets us save models that contain custom objects with minimal hassle. </p>
<p>SavedModel stores the optimizer, loss, and network architecture in the saved_model.pb file while the weights are stored in the variables directory. </p>
<p>For more detailed information on the SavedModel format, check out the official docs <a target="_blank" href="https://www.tensorflow.org/guide/keras/save_and_serialize">here</a>.</p>
<h2 id="heading-how-do-i-train-a-keras-savedmodel-if-i-dont-have-a-gpu">How do I train a Keras SavedModel if I don't have a GPU?</h2>
<p>Most machine learning enthusiasts without access to GPU facilities start off with model development on Google Colaboratory. </p>
<p>I've been an avid admirer of Google Colab and its features ever since I first became interested in the field of machine learning. It offers a Jupyter Notebook environment with free access to GPUs and a maximum training time of 12 hours. </p>
<p>If you've got any questions regarding Google Colaboratory, head over to their FAQ section linked <a target="_blank" href="https://research.google.com/colaboratory/faq.html#:~:text=How%20long%20can%20notebooks%20run,or%20based%20on%20your%20usage.">here</a>. </p>
<h2 id="heading-why-would-i-want-to-convert-a-savedmodel-into-a-web-app">Why would I want to convert a SavedModel into a web app?</h2>
<p>Web-based products are everywhere, and they're generally pretty easy to use. You're probably reading this article from a browser right now, either from your phone, desktop, or laptop. </p>
<p>Machine learning models, at the end of the day, are meant to be used in the real world, not kept inside a glass box. So what better way to bring your model to users than through a web-based medium? </p>
<p>On top of that, browser-based apps don't require any installation overhead and can be accessed uniformly from multiple devices.</p>
<h2 id="heading-okay-then-lets-get-started">Okay then, let's get started</h2>
<p>I built a simple emotion detection CNN model that predicts 7 emotions (happy, sad, neutral, angry, surprise, fear, and disgust) using Python and the Keras API. </p>
<p>Trying to convert it into a format suitable for the web without prior experience proved to be a bit difficult. The entire process, which I'll describe next, is thanks to the wonderful documentation of <a target="_blank" href="https://www.tensorflow.org/js/tutorials/setup">Tensorflow.js</a>, the <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/File/Using_files_from_web_applications">MDN Web docs</a>, and <a target="_blank" href="https://firebase.google.com/docs/hosting">Firebase hosting documentation</a>.</p>
<p>Using these resources, I was able to narrow down the process to the following steps:</p>
<ul>
<li>Convert Keras SavedModel to the Tensorflow.js Layers Format</li>
<li>Load the model via JavaScript and Promises</li>
<li>Access an image uploaded by a user</li>
<li>Preprocess the uploaded image</li>
<li>Model inference in browser and display output via a user interface</li>
</ul>
<p>Let's look at each of these steps in greater detail.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/05/image-15.png" alt="Image" width="600" height="400" loading="lazy">
<em>Photo from Unsplash</em></p>
<h2 id="heading-how-to-convert-a-keras-savedmodel-to-the-tensorflowjs-layers-format">How to Convert a Keras SavedModel to the Tensorflow.js Layers Format</h2>
<p>To convert a Keras SavedModel to the Tensorflow.js layers format, we'd need to use the tensorflowjs_converter script. We can also use the Python API as described in their official docs <a target="_blank" href="https://www.tensorflow.org/js/tutorials/conversion/import_keras">here</a>. </p>
<p>I ran into a frustrating error with the former, as for some reason the tensorflowjs_converter did not seem to work on Google Colab. </p>
<p>I had saved the model on Google Drive, and the space in the "My Drive" part of the file path seemed to be causing the trouble. I found it mentioned in GitHub issue #3618 <a target="_blank" href="https://github.com/tensorflow/tfjs/issues/3618">here</a>. </p>
<p>Using the Python API worked seamlessly, which gave me a model.json file for the model architecture and binary files for the weights. Now I was ready to use it on the web!</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/05/pic.png" alt="Image" width="600" height="400" loading="lazy">
<em>Code to convert SavedModel Format to Layers Format</em></p>
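<p>Since the conversion produces a model.json file plus one or more binary weight files, it helps to know which files have to be hosted together. Here's a minimal sketch, assuming the usual top-level layout of a Layers-format model.json (a <code>modelTopology</code> entry and a <code>weightsManifest</code> list). The object below is a hand-written stand-in, and the shard name, the weight entry, and the <code>listWeightFiles</code> helper are all hypothetical:</p>

```javascript
// Hand-written stand-in for the top-level shape of a converted model.json.
// Real files are much larger; the shard name and weight entry are examples.
const exampleModelJson = {
  format: "layers-model",
  modelTopology: { /* network architecture goes here */ },
  weightsManifest: [
    {
      paths: ["group1-shard1of1.bin"],
      weights: [{ name: "conv2d/kernel", shape: [3, 3, 1, 32], dtype: "float32" }],
    },
  ],
};

// Hypothetical helper: collect every binary weight file that must be
// hosted alongside model.json.
function listWeightFiles(modelJson) {
  return modelJson.weightsManifest.flatMap((group) => group.paths);
}

console.log(listWeightFiles(exampleModelJson));
```

<p>If any of the listed files is missing from the hosting folder, loading the model in the browser will most likely fail, so it's worth checking before you deploy.</p>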
<p>But wait! Why do we need to convert? Why don't we just train our model using Tensorflow.js itself? </p>
<p>Well, you need to do this conversion if you've already spent a lot of time training your Keras models on large datasets and don't want to rewrite and retrain them using JavaScript.</p>
<h2 id="heading-how-to-load-the-model-via-javascript-and-promises">How to Load the Model via JavaScript and Promises</h2>
<p><a target="_blank" href="https://www.tensorflow.org/js/tutorials">Tensorflow.js</a> is a JavaScript-based library for machine learning model development. You can use it in the browser as well as through the popular JavaScript runtime Node.js. </p>
<p>You can set it up in two different ways: either by including it using a script tag or using it through Node.js. </p>
<p>Since the CNN model I trained is fairly straightforward, I opted for the script tag approach.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
<p>Now that we've included the Tensorflow.js library, the next step is to load the model. We can load the model in the following ways:</p>
<ul>
<li>Browser's local storage</li>
<li>Browser's IndexedDB storage</li>
<li>From an HTTP or HTTPS endpoint</li>
<li>From native file system using Node.js</li>
</ul>
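<p>Each of the options above corresponds to a different URL scheme in the string you pass to <code>tf.loadLayersModel</code>. Here's a rough sketch of that mapping. The <code>modelUrlFor</code> helper and the example names are made up for illustration, and the browser storage options assume the model was saved there beforehand:</p>

```javascript
// Hypothetical helper: build the string passed to tf.loadLayersModel()
// for each of the loading options listed above.
function modelUrlFor(source, location) {
  switch (source) {
    case "localstorage":
      return "localstorage://" + location; // key in the browser's local storage
    case "indexeddb":
      return "indexeddb://" + location; // key in the browser's IndexedDB
    case "url":
      return location; // HTTP(S) URL or relative path to model.json
    case "file":
      return "file://" + location; // native file system, Node.js only
    default:
      throw new Error("Unknown model source: " + source);
  }
}

// Example values; "emotion-model" and the path are placeholders.
console.log(modelUrlFor("indexeddb", "emotion-model"));
console.log(modelUrlFor("file", "/tmp/model/model.json"));
```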
<p>Loading the model from an HTTPS endpoint seemed to be the most feasible way for me. So I hosted the model files on Firebase Hosting and loaded the model using the following code:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> model = <span class="hljs-keyword">await</span> tf.loadLayersModel(<span class="hljs-string">'model.json'</span>);
</code></pre>
<p>Tensorflow.js uses the <code>fetch</code> method to load resources using a Promise-based approach. <code>fetch</code> returns a Promise that resolves to the response for the requested resource. </p>
<p>A Promise in JavaScript is a proxy for a value that isn't known right now but may become known at some later point in time. </p>
<p>For example, when requesting URL-based resources, we don't know immediately if we'll actually get those resources – we'll have to wait for some time until the server responds (or doesn't). </p>
<p>But waiting in any form is detrimental to responsiveness and continued user interaction, which is critical for web pages. So JavaScript allows you to use asynchronous calls via Promises. These let you request resources AND continue with subsequent statements irrespective of the server's response. </p>
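<p>To allow cleaner and easier error handling with Promises, async/await was introduced. An <code>await</code> expression pauses the enclosing function until the Promise settles, without blocking the rest of the page, and any function that uses <code>await</code> must be declared <code>async</code>. </p>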
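<p>To make the difference concrete, here's a small sketch that handles the same request in both styles. The <code>fakeFetch</code> function is a stand-in I made up so the example runs without a server. It is not part of Tensorflow.js:</p>

```javascript
// Stand-in for a network request: resolves after a short delay, or
// rejects when asked to fail. (Hypothetical; not part of TensorFlow.js.)
function fakeFetch(url, shouldFail = false) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (shouldFail) {
        reject(new Error("Could not load " + url));
      } else {
        resolve("contents of " + url);
      }
    }, 10);
  });
}

// Promise style: register callbacks for success and failure.
function loadWithThen(url, shouldFail) {
  return fakeFetch(url, shouldFail)
    .then((body) => "ok: " + body)
    .catch((err) => "failed: " + err.message);
}

// async/await style: the same behaviour, written with ordinary try/catch.
async function loadWithAwait(url, shouldFail) {
  try {
    const body = await fakeFetch(url, shouldFail);
    return "ok: " + body;
  } catch (err) {
    return "failed: " + err.message;
  }
}

loadWithAwait("model.json").then((message) => console.log(message));
console.log("This line runs before the request has finished.");
```

<p>Both functions return a Promise, so the caller can continue with other work while the request is pending.</p>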
<h2 id="heading-how-to-access-an-image-uploaded-by-a-user">How to Access an Image Uploaded by a User</h2>
<p>Let's create a simple file upload functionality using an HTML input tag and another button that'll start the prediction computations when clicked.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"container"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"tray"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"uploadFile"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"custombutton"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">i</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"fa fa-file"</span> <span class="hljs-attr">style</span>=<span class="hljs-string">"font-size:25px;color: #1ab5e3"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">i</span>&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"file"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"fileupload"</span> <span class="hljs-attr">accept</span>=<span class="hljs-string">"image/*"</span> <span class="hljs-attr">onchange</span>=<span class="hljs-string">"display(event)"</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"custombutton"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">i</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"fa fa-bar-chart"</span> <span class="hljs-attr">style</span>=<span class="hljs-string">"font-size:25px;color: #1ab5e3"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">i</span>&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"predict"</span> <span class="hljs-attr">onclick</span>=<span class="hljs-string">"predict_emotion()"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"PREDICT"</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
</code></pre>
<p>The file upload and predict buttons look like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/05/image-24.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Next, we access the image file uploaded and display it using object URLs as described in the MDN Web docs linked <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/File/Using_files_from_web_applications">here</a>.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">let</span> input_image = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"input_image"</span>)
input_image.src = URL.createObjectURL(event.target.files[<span class="hljs-number">0</span>]);
<span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"input_image_container"</span>).style.display = <span class="hljs-string">"block"</span>;

<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"input_image_container"</span>&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"#"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"input_image"</span> <span class="hljs-attr">style</span>=<span class="hljs-string">"top:5vh;"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
</code></pre>
<p>After uploading an image, it looks like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/05/image-25.png" alt="Image" width="600" height="400" loading="lazy"></p>
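<p>One optional refinement: every call to <code>URL.createObjectURL</code> creates a new object URL that stays alive until the page is closed or the URL is revoked. If users upload several images in a row, you may want to release the previous one. Here's a minimal sketch, where <code>showSelectedFile</code> is a hypothetical helper and the <code>urlApi</code> parameter exists only so the function can be exercised outside a browser:</p>

```javascript
// Hypothetical helper: point an <img>-like object at a newly selected file
// and release the object URL it was showing before.
// `urlApi` defaults to the global URL object.
function showSelectedFile(imageElement, file, urlApi = URL) {
  const previous = imageElement.src;
  imageElement.src = urlApi.createObjectURL(file);
  if (typeof previous === "string" && previous.startsWith("blob:")) {
    urlApi.revokeObjectURL(previous); // free the memory held by the old URL
  }
  return imageElement.src;
}

// Possible use inside the display(event) handler in the browser:
//   showSelectedFile(document.getElementById("input_image"), event.target.files[0]);
```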
<h2 id="heading-how-to-preprocess-the-uploaded-image">How to Preprocess the Uploaded Image</h2>
<p>This step is specific to your model's domain, so different applications will need different preprocessing. </p>
<p>For my model, I didn't have to do much: just some simple resizing and normalization, which I easily performed using Tensorflow.js functions. </p>
<p>Do check out their <a target="_blank" href="https://js.tensorflow.org/api/latest/">official API reference</a> for a thorough understanding of the functions offered and their use cases.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">//Preprocessing steps </span>
        <span class="hljs-comment">/*
        (1)Resize to 48*48
        (2)Convert to grayscale using simple mean
        (3)Convert to float
        (4)Reshape to (1,48,48,1)
        (5)Normalize by dividing by 255.0
        */</span>
<span class="hljs-keyword">let</span> step1 = tf.browser.fromPixels(input)
.resizeNearestNeighbor([<span class="hljs-number">48</span>,<span class="hljs-number">48</span>])
.mean(<span class="hljs-number">2</span>)
.toFloat()
.expandDims(<span class="hljs-number">0</span>)
.expandDims(<span class="hljs-number">-1</span>)
.div(<span class="hljs-number">255.0</span>)
</code></pre>
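<p>If the chained tensor calls feel opaque, here's roughly what steps (2) to (5) do, written with plain nested arrays instead of tensors. This is only an illustration. It skips the resize step, and <code>preprocess</code> and <code>shapeOf</code> are hypothetical helpers, not Tensorflow.js functions:</p>

```javascript
// Plain-JavaScript illustration of steps (2) to (5) on a tiny image.
// `rgb` is a nested array of shape [height][width][3] with values 0-255.
function preprocess(rgb) {
  const gray = rgb.map((row) =>
    row.map((pixel) => {
      // (2) grayscale by simple mean over the colour channels, like .mean(2)
      const mean = (pixel[0] + pixel[1] + pixel[2]) / 3;
      // (5) normalise to the range [0, 1], like .div(255.0);
      // the wrapping array adds the channel axis, like .expandDims(-1)
      return [mean / 255.0];
    })
  );
  // The outer array adds the batch axis, like .expandDims(0)
  return [gray];
}

// Describe the shape of a rectangular nested array.
function shapeOf(nested) {
  const shape = [];
  let current = nested;
  while (Array.isArray(current)) {
    shape.push(current.length);
    current = current[0];
  }
  return shape;
}

// A made-up 2x2 image: white, black, red, and a dark blue-grey pixel.
const tiny = [
  [[255, 255, 255], [0, 0, 0]],
  [[255, 0, 0], [30, 60, 90]],
];
const batch = preprocess(tiny);
console.log(shapeOf(batch)); // [ 1, 2, 2, 1 ]
console.log(batch[0][0][0][0]); // 1
```

<p>With a 48 by 48 input, the same steps give the (1,48,48,1) shape that the model expects.</p>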
<h2 id="heading-model-inference-in-the-browser-and-displaying-the-output-via-a-user-interface">Model Inference in the Browser and Displaying the Output via a User Interface</h2>
<p>The predict function returns the predictions – in our case, a tensor with 7 probability values for the 7 emotions. </p>
<p>To display the probabilities in the browser, we draw one div per emotion and set each div's width to the scaled-up probability value.</p>
<pre><code class="lang-javascript">pred = model.predict(step1)
pred.data()
    .then(<span class="hljs-function">(<span class="hljs-params">data</span>) =&gt;</span> {<span class="hljs-built_in">console</span>.log(data)
                   output = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"output_chart"</span>)
                output.innerHTML = <span class="hljs-string">""</span>
                max_val = <span class="hljs-number">-1</span>
                max_val_index = <span class="hljs-number">-1</span>
                <span class="hljs-keyword">for</span>(<span class="hljs-keyword">let</span> i=<span class="hljs-number">0</span>;i&lt;data.length;i++)
                {
                    style_text = <span class="hljs-string">"width: "</span>+data[i]*<span class="hljs-number">150</span>+<span class="hljs-string">"px; height: 25px; position:relative; margin-top: 3vh; background-color: violet; "</span>
                    output.innerHTML+=<span class="hljs-string">"&lt;div style = '"</span> +style_text+ <span class="hljs-string">"'&gt;&lt;/div&gt;"</span>
                    <span class="hljs-keyword">if</span>(data[i] &gt; max_val)
                    {
                        max_val = data[i]
                        max_val_index = i
                    }
                }
                EMOTION_DETECTED = emotions[max_val_index]
                <span class="hljs-built_in">document</span>.getElementsByClassName(<span class="hljs-string">"output_screen"</span>)[<span class="hljs-number">0</span>].style.display=<span class="hljs-string">"flex"</span>;
                <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"output_text"</span>).innerHTML=<span class="hljs-string">""</span>
                <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"output_text"</span>).innerHTML = <span class="hljs-string">"&lt;p&gt;Emotions and corresponding scaled up probability&lt;/p&gt;&lt;p&gt;Emotion detected: "</span> + EMOTION_DETECTED + <span class="hljs-string">"("</span> + (max_val*<span class="hljs-number">100</span>).toFixed(<span class="hljs-number">2</span>) + <span class="hljs-string">"% probability)&lt;/p&gt;"</span>
    })
</code></pre>
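<p>The loop above does two jobs at once: it finds the most likely emotion and it builds the bars. If you'd like to keep that logic separate from the DOM code, one option is a small pure function like the sketch below. The <code>summarizePrediction</code> name and the example probabilities are made up for illustration:</p>

```javascript
// Hypothetical helper: summarise a probability array without touching the DOM.
function summarizePrediction(probabilities, labels, maxBarWidthPx = 150) {
  let bestIndex = 0;
  for (let i = 1; i < probabilities.length; i++) {
    if (probabilities[i] > probabilities[bestIndex]) {
      bestIndex = i;
    }
  }
  return {
    label: labels[bestIndex],
    percent: (probabilities[bestIndex] * 100).toFixed(2),
    // One bar width per emotion, scaled the same way as in the loop above.
    barWidthsPx: Array.from(probabilities, (p) => p * maxBarWidthPx),
  };
}

const emotions = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"];
// Made-up probabilities for illustration only.
const example = [0.02, 0.01, 0.05, 0.2, 0.5, 0.02, 0.2];
const summary = summarizePrediction(example, emotions);
console.log(summary.label, summary.percent); // Sad 50.00
```

<p>Because the function works on any array-like input, it should also accept the typed array that <code>pred.data()</code> resolves to.</p>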
<p>Great – we've got all the building blocks ready! Now let's put it all together. We'll integrate the following parts:</p>
<ul>
<li>The HTML markup which serves as a simple UI</li>
<li>Script tag for accessing Tensorflow.js</li>
<li>Script tag for our Font Awesome icons</li>
<li>JavaScript code for model loading, inference, and output</li>
</ul>
<p>Here is the final JavaScript code:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">//Display image uploaded by user</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">display</span>(<span class="hljs-params">event</span>)
    </span>{
        <span class="hljs-keyword">let</span> input_image = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"input_image"</span>)
        input_image.src = URL.createObjectURL(event.target.files[<span class="hljs-number">0</span>]);
        <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"input_image_container"</span>).style.display = <span class="hljs-string">"block"</span>;
    }

<span class="hljs-comment">//Predict emotion and display output</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">predict_emotion</span>(<span class="hljs-params"></span>)
    </span>{
        <span class="hljs-keyword">let</span> input = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"input_image"</span>);
        <span class="hljs-comment">//Preprocessing steps </span>
        <span class="hljs-comment">/*
        (1)Resize to 48*48
        (2)Convert to grayscale using simple mean
        (3)Convert to float
        (4)Reshape to (1,48,48,1)
        (5)Normalize by dividing by 255.0
        */</span>
        <span class="hljs-keyword">let</span> step1 = tf.browser.fromPixels(input).resizeNearestNeighbor([<span class="hljs-number">48</span>,<span class="hljs-number">48</span>]).mean(<span class="hljs-number">2</span>).toFloat().expandDims(<span class="hljs-number">0</span>).expandDims(<span class="hljs-number">-1</span>).div(<span class="hljs-number">255.0</span>)
        <span class="hljs-keyword">const</span> model = <span class="hljs-keyword">await</span> tf.loadLayersModel(<span class="hljs-string">'model.json'</span>);
        pred = model.predict(step1)
        pred.print()
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"End of predict function"</span>)
        <span class="hljs-comment">//This array is encoded with index i = corresponding emotion. In dataset, 0 = Angry, 1 = Disgust, 2 = Fear, 3 = Happy, 4 = Sad, 5 = Surprise and 6 = Neutral</span>
        emotions = [<span class="hljs-string">"Angry"</span>, <span class="hljs-string">"Disgust"</span>, <span class="hljs-string">"Fear"</span>, <span class="hljs-string">"Happy"</span>, <span class="hljs-string">"Sad"</span>, <span class="hljs-string">"Surprise"</span>, <span class="hljs-string">"Neutral"</span>]
        <span class="hljs-comment">//At which index in tensor we get the largest value ?</span>
        pred.data()
            .then(<span class="hljs-function">(<span class="hljs-params">data</span>) =&gt;</span> {<span class="hljs-built_in">console</span>.log(data)
                output = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"output_chart"</span>)
                output.innerHTML = <span class="hljs-string">""</span>
                max_val = <span class="hljs-number">-1</span>
                max_val_index = <span class="hljs-number">-1</span>
                <span class="hljs-keyword">for</span>(<span class="hljs-keyword">let</span> i=<span class="hljs-number">0</span>;i&lt;data.length;i++)
                {
                    style_text = <span class="hljs-string">"width: "</span>+data[i]*<span class="hljs-number">150</span>+<span class="hljs-string">"px; height: 25px; position:relative; margin-top: 3vh; background-color: violet; "</span>
                    output.innerHTML+=<span class="hljs-string">"&lt;div style = '"</span> +style_text+ <span class="hljs-string">"'&gt;&lt;/div&gt;"</span>
                    <span class="hljs-keyword">if</span>(data[i] &gt; max_val)
                    {
                        max_val = data[i]
                        max_val_index = i
                    }
                }
                EMOTION_DETECTED = emotions[max_val_index]
                <span class="hljs-built_in">document</span>.getElementsByClassName(<span class="hljs-string">"output_screen"</span>)[<span class="hljs-number">0</span>].style.display=<span class="hljs-string">"flex"</span>;
                <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"output_text"</span>).innerHTML=<span class="hljs-string">""</span>
                <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">"output_text"</span>).innerHTML = <span class="hljs-string">"&lt;p&gt;Emotions and corresponding scaled up probability&lt;/p&gt;&lt;p&gt;Emotion detected: "</span> + EMOTION_DETECTED + <span class="hljs-string">"("</span> + (max_val*<span class="hljs-number">100</span>).toFixed(<span class="hljs-number">2</span>) + <span class="hljs-string">"% probability)&lt;/p&gt;"</span>
        })    

    }
</code></pre>
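<p>One thing to note about the code above: it calls <code>tf.loadLayersModel</code> every time the predict button is clicked. That works, but you might prefer to load the model once and reuse it. Here's a possible sketch, where <code>makeCachedLoader</code> is a hypothetical helper and a fake loader stands in for Tensorflow.js so the example runs on its own:</p>

```javascript
// Hypothetical helper: wrap any Promise-returning loader so it runs at most once.
// `loadFn` is whatever actually loads the model.
function makeCachedLoader(loadFn) {
  let cached = null;
  return function getModel() {
    if (cached === null) {
      cached = loadFn().catch((err) => {
        cached = null; // allow a retry after a failed load
        throw err;
      });
    }
    return cached;
  };
}

// In the browser this might be used as (assuming `tf` is loaded):
//   const getModel = makeCachedLoader(() => tf.loadLayersModel('model.json'));
//   const model = await getModel();

// Stand-in loader so the sketch runs without TensorFlow.js.
let loadCount = 0;
const getModel = makeCachedLoader(() => {
  loadCount += 1;
  return Promise.resolve({ name: "fake model" });
});

getModel()
  .then(() => getModel())
  .then(() => console.log("loader ran", loadCount, "time(s)"));
```

<p>The first click still pays the loading cost, but later clicks reuse the same Promise.</p>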
<p>Here's the final HTML and script tags:</p>
<pre><code class="lang-html"><span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"viewport"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"width=device-width, initial-scale=1"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"stylesheet"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"stylesheet"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"text/css"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"styles/page_styling.css"</span>&gt;</span>

    <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"input_image_container"</span>&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"#"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"input_image"</span> <span class="hljs-attr">style</span>=<span class="hljs-string">"top:5vh;"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"container"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"tray"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"uploadFile"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"custombutton"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">i</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"fa fa-file"</span> <span class="hljs-attr">style</span>=<span class="hljs-string">"font-size:25px;color: #1ab5e3"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">i</span>&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"file"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"fileupload"</span> <span class="hljs-attr">accept</span>=<span class="hljs-string">"image/*"</span> <span class="hljs-attr">onchange</span>=<span class="hljs-string">"display(event)"</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"custombutton"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">i</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"fa fa-bar-chart"</span> <span class="hljs-attr">style</span>=<span class="hljs-string">"font-size:25px;color: #1ab5e3"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">i</span>&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"predict"</span> <span class="hljs-attr">onclick</span>=<span class="hljs-string">"predict_emotion()"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"PREDICT"</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"container output_screen"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"emotion_tags"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">ul</span>&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Angry<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Disgust<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Fear<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Happy<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Sad<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Surprise<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Neutral<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
            <span class="hljs-tag">&lt;/<span class="hljs-name">ul</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"output_chart"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"output_text"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"scripts/script.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<p>Here's a sample output, where the top three predicted emotions are sad, happy, and neutral:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/05/image-8.png" alt="Image" width="600" height="400" loading="lazy">
<em>Predictions and UI</em></p>
<h2 id="heading-wrapping-up">Wrapping up</h2>
<p>In this article, we walked through the basic steps for converting a Keras SavedModel to a web-friendly format. We learned how to load the model, preprocess input, and run inference in the browser using TensorFlow.js, and how to display the output in a user interface.</p>
<p>I hope you enjoyed reading this article and found it helpful. Have a good day and I wish you good luck in your coding journey!</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/05/image-16.png" alt="Image" width="600" height="400" loading="lazy">
<em>Photo from Unsplash</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Keras Course – Learn Python Deep Learning and Neural Networks ]]>
                </title>
                <description>
                    <![CDATA[ Keras is a neural network API written in Python and integrated with TensorFlow. You can learn how to use Keras in a new video course on the freeCodeCamp.org YouTube channel. In this course from deeplizard, you will learn how to prepare and process da... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/keras-video-course-python-deep-learning/</link>
                <guid isPermaLink="false">66b203d9a8b92c9329236478</guid>
                
                    <category>
                        <![CDATA[ keras ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Thu, 18 Jun 2020 20:51:08 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/10/keras.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Keras is a neural network API written in Python and integrated with TensorFlow. You can learn how to use Keras in a new <a target="_blank" href="https://www.youtube.com/watch?v=qFJeN9V1ZsI">video course on the freeCodeCamp.org YouTube channel</a>.</p>
<p>In this course from <a target="_blank" href="https://deeplizard.com">deeplizard</a>, you will learn how to prepare and process data for artificial neural networks, build and train artificial neural networks from scratch, build and train convolutional neural networks (CNNs), implement fine-tuning and transfer learning, and more.</p>
<p><img src="https://deeplizard.com/images/keras%20logo%20with%20text.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Each section of the course focuses on a specific concept, and shows how the full implementation is done in code using Keras and Python.</p>
<p>You will learn to build some networks from scratch. Others will be pre-trained state-of-the-art models that you'll get to fine-tune to the data. Then you'll learn how to deploy models using both front-end and back-end deployment techniques.</p>
<p>Here's the full course syllabus:</p>
<h3 id="heading-part-1-artificial-neural-network-basics">Part 1: Artificial Neural Network Basics</h3>
<p>Section 1: Intro to Keras and neural networks</p>
<ul>
<li>Processing data</li>
<li>Building and training neural networks</li>
<li>Validation and inference</li>
<li>Saving and loading models</li>
</ul>
<p>Section 2: Convolutional Neural Networks (CNNs)</p>
<ul>
<li>Image processing</li>
<li>Building and training CNNs</li>
<li>Using CNNs for inference</li>
</ul>
<p>Section 3: Fine-tuning and transfer learning</p>
<ul>
<li>Intro to fine-tuning and VGG16 model</li>
<li>Implement fine-tuning on VGG16 model</li>
<li>Using fine-tuned models for inference</li>
<li>Intro to MobileNet</li>
<li>Fine-tuning MobileNet on subset of data</li>
</ul>
<p>Section 4: Additional topics</p>
<ul>
<li>Data augmentation</li>
<li>Keras' image labeling implementation</li>
<li>Achieving reproducible results</li>
<li>Learnable parameters</li>
</ul>
<h3 id="heading-part-2-neural-network-model-deployment">Part 2: Neural network model deployment</h3>
<p>Section 1: Deployment with Flask</p>
<ul>
<li>Introduction to Flask and web services</li>
<li>Build a simple Flask app and web app</li>
<li>Send and receive data with Flask</li>
<li>Host neural network with Flask</li>
<li>Build neural network web app to interact with Flask service</li>
<li>Integrating data visualization with D3, DC, Crossfilter</li>
<li>Alternative ways to access neural network from PowerShell and curl</li>
<li>Information privacy and data protection</li>
</ul>
<p>Section 2: Deployment with TensorFlow.js</p>
<ul>
<li>Introduction to client-side neural networks</li>
<li>Convert Keras model to TFJS model</li>
<li>Set up Node.js and Express</li>
<li>Build UI for neural network web app</li>
<li>Host a neural network with TFJS</li>
<li>Explore tensor operations through image processing</li>
<li>Examine tensor operations with debugger</li>
<li>Broadcasting tensors</li>
<li>Efficiency of hosting MobileNet in the browser</li>
</ul>
<p><a target="_blank" href="https://www.youtube.com/watch?v=qFJeN9V1ZsI">You can watch the full course on the freeCodeCamp.org YouTube channel</a> (3 hour watch).</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Handle Overfitting in Deep Learning Models ]]>
                </title>
                <description>
                    <![CDATA[ By Bert Carremans Overfitting occurs when you achieve a good fit of your model on the training data, but it does not generalize well on new, unseen data. In other words, the model learned patterns specific to the training data, which are irrelevant i... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/handling-overfitting-in-deep-learning-models/</link>
                <guid isPermaLink="false">66d45dd9bc9760a197a10351</guid>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ keras ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Overfitting ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ regularization ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Sun, 05 Jan 2020 22:36:48 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/01/1_XXtWMdH-YVL8z1VtnfG_iw.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Bert Carremans</p>
<p>Overfitting occurs when you achieve a good fit of your model on the training data, but it does not generalize well on new, unseen data. In other words, the model learned patterns specific to the training data, which are irrelevant in other data.</p>
<p>We can identify overfitting by looking at validation metrics like loss or accuracy. Usually, the validation metric stops improving after a certain number of epochs and begins to get worse afterward (the validation loss rises, or the validation accuracy falls). The training metric continues to improve because the model seeks to find the best fit for the training data.</p>
<p>There are several ways to reduce overfitting in deep learning models. The best option is to <strong>get more training data</strong>. Unfortunately, in real-world situations, you often do not have this possibility due to time, budget, or technical constraints.</p>
<p>Another way to reduce overfitting is to <strong>lower the capacity of the model to memorize the training data</strong>. As such, the model will need to focus on the relevant patterns in the training data, which results in better generalization. In this post, we’ll discuss three options to achieve this.</p>
<h1 id="heading-set-up-the-project">Set up the project</h1>
<p>We start by importing the necessary packages and configuring some parameters. We will use <a target="_blank" href="https://keras.io/">Keras</a> to fit the deep learning models. The training data is the <a target="_blank" href="https://www.kaggle.com/crowdflower/twitter-airline-sentiment">Twitter US Airline Sentiment data set from Kaggle</a>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Basic packages</span>
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd 
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> re
<span class="hljs-keyword">import</span> collections
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">from</span> pathlib <span class="hljs-keyword">import</span> Path
<span class="hljs-comment"># Packages for data preparation</span>
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> nltk.corpus <span class="hljs-keyword">import</span> stopwords
<span class="hljs-keyword">from</span> keras.preprocessing.text <span class="hljs-keyword">import</span> Tokenizer
<span class="hljs-keyword">from</span> keras.utils <span class="hljs-keyword">import</span> to_categorical
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> LabelEncoder
<span class="hljs-comment"># Packages for modeling</span>
<span class="hljs-keyword">from</span> keras <span class="hljs-keyword">import</span> models
<span class="hljs-keyword">from</span> keras <span class="hljs-keyword">import</span> layers
<span class="hljs-keyword">from</span> keras <span class="hljs-keyword">import</span> regularizers
NB_WORDS = <span class="hljs-number">10000</span>  <span class="hljs-comment"># Parameter indicating the number of words we'll put in the dictionary</span>
NB_START_EPOCHS = <span class="hljs-number">20</span>  <span class="hljs-comment"># Number of epochs we usually start to train with</span>
BATCH_SIZE = <span class="hljs-number">512</span>  <span class="hljs-comment"># Size of the batches used in the mini-batch gradient descent</span>
MAX_LEN = <span class="hljs-number">20</span>  <span class="hljs-comment"># Maximum number of words in a sequence</span>
root = Path(<span class="hljs-string">'../'</span>)
input_path = root / <span class="hljs-string">'input/'</span> 
ouput_path = root / <span class="hljs-string">'output/'</span>
source_path = root / <span class="hljs-string">'source/'</span>
</code></pre>
<h1 id="heading-some-helper-functions">Some helper functions</h1>
<p>We will use some helper functions throughout this post.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">deep_model</span>(<span class="hljs-params">model, X_train, y_train, X_valid, y_valid</span>):</span>
    <span class="hljs-string">'''
    Function to train a multi-class model. The number of epochs and 
    batch_size are set by the constants at the top of the
    notebook. 

    Parameters:
        model : model with the chosen architecture
        X_train : training features
        y_train : training target
        X_valid : validation features
        y_valid : validation target
    Output:
        model training history
    '''</span>
    model.compile(optimizer=<span class="hljs-string">'rmsprop'</span>
                  , loss=<span class="hljs-string">'categorical_crossentropy'</span>
                  , metrics=[<span class="hljs-string">'accuracy'</span>])

    history = model.fit(X_train
                       , y_train
                       , epochs=NB_START_EPOCHS
                       , batch_size=BATCH_SIZE
                       , validation_data=(X_valid, y_valid)
                       , verbose=<span class="hljs-number">0</span>)
    <span class="hljs-keyword">return</span> history
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">eval_metric</span>(<span class="hljs-params">model, history, metric_name</span>):</span>
    <span class="hljs-string">'''
    Function to evaluate a trained model on a chosen metric. 
    Training and validation metric are plotted in a
    line chart for each epoch.

    Parameters:
        model : trained model, whose name is used in the plot title
        history : model training history
        metric_name : loss or accuracy
    Output:
        line chart with epochs on the x-axis and the metric on
        the y-axis
    '''</span>
    metric = history.history[metric_name]
    val_metric = history.history[<span class="hljs-string">'val_'</span> + metric_name]
    e = range(<span class="hljs-number">1</span>, NB_START_EPOCHS + <span class="hljs-number">1</span>)
    plt.plot(e, metric, <span class="hljs-string">'bo'</span>, label=<span class="hljs-string">'Train '</span> + metric_name)
    plt.plot(e, val_metric, <span class="hljs-string">'b'</span>, label=<span class="hljs-string">'Validation '</span> + metric_name)
    plt.xlabel(<span class="hljs-string">'Epoch number'</span>)
    plt.ylabel(metric_name)
    plt.title(<span class="hljs-string">'Comparing training and validation '</span> + metric_name + <span class="hljs-string">' for '</span> + model.name)
    plt.legend()
    plt.show()
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_model</span>(<span class="hljs-params">model, X_train, y_train, X_test, y_test, epoch_stop</span>):</span>
    <span class="hljs-string">'''
    Function to test the model on new data after training it
    on the full training data with the optimal number of epochs.

    Parameters:
        model : trained model
        X_train : training features
        y_train : training target
        X_test : test features
        y_test : test target
        epoch_stop : optimal number of epochs
    Output:
        test accuracy and test loss
    '''</span>
    model.fit(X_train
              , y_train
              , epochs=epoch_stop
              , batch_size=BATCH_SIZE
              , verbose=<span class="hljs-number">0</span>)
    results = model.evaluate(X_test, y_test)
    print()
    print(<span class="hljs-string">'Test accuracy: {0:.2f}%'</span>.format(results[<span class="hljs-number">1</span>]*<span class="hljs-number">100</span>))
    <span class="hljs-keyword">return</span> results

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">remove_stopwords</span>(<span class="hljs-params">input_text</span>):</span>
    <span class="hljs-string">'''
    Function to remove English stopwords from a text string.
    It is applied to each element of a Pandas Series with .apply().

    Parameters:
        input_text : text to clean
    Output:
        cleaned text string
    '''</span>
    stopwords_list = stopwords.words(<span class="hljs-string">'english'</span>)
    <span class="hljs-comment"># Some words which might indicate a certain sentiment are kept via a whitelist</span>
    whitelist = [<span class="hljs-string">"n't"</span>, <span class="hljs-string">"not"</span>, <span class="hljs-string">"no"</span>]
    words = input_text.split() 
    clean_words = [word <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> words <span class="hljs-keyword">if</span> (word <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> stopwords_list <span class="hljs-keyword">or</span> word <span class="hljs-keyword">in</span> whitelist) <span class="hljs-keyword">and</span> len(word) &gt; <span class="hljs-number">1</span>] 
    <span class="hljs-keyword">return</span> <span class="hljs-string">" "</span>.join(clean_words) 

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">remove_mentions</span>(<span class="hljs-params">input_text</span>):</span>
    <span class="hljs-string">'''
    Function to remove mentions, preceded by @, from a text string.
    It is applied to each element of a Pandas Series with .apply().

    Parameters:
        input_text : text to clean
    Output:
        cleaned text string
    '''</span>
    <span class="hljs-keyword">return</span> re.sub(<span class="hljs-string">r'@\w+'</span>, <span class="hljs-string">''</span>, input_text)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">compare_models_by_metric</span>(<span class="hljs-params">model_1, model_2, model_hist_1, model_hist_2, metric</span>):</span>
    <span class="hljs-string">'''
    Function to compare a metric between two models 

    Parameters:
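        model_1 : first model, whose name labels its curve
        model_2 : second model, whose name labels its curve
        model_hist_1 : training history of model 1
        model_hist_2 : training history of model 2
        metric : metric to compare, loss, acc, val_loss or val_acc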

    Output:
        plot of metrics of both models
    '''</span>
    metric_model_1 = model_hist_1.history[metric]
    metric_model_2 = model_hist_2.history[metric]
    e = range(<span class="hljs-number">1</span>, NB_START_EPOCHS + <span class="hljs-number">1</span>)

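    metrics_dict = {
        <span class="hljs-string">'acc'</span> : <span class="hljs-string">'Training accuracy'</span>,
        <span class="hljs-string">'accuracy'</span> : <span class="hljs-string">'Training accuracy'</span>,  <span class="hljs-comment"># key name used by newer Keras versions</span>
        <span class="hljs-string">'loss'</span> : <span class="hljs-string">'Training loss'</span>,
        <span class="hljs-string">'val_acc'</span> : <span class="hljs-string">'Validation accuracy'</span>,
        <span class="hljs-string">'val_accuracy'</span> : <span class="hljs-string">'Validation accuracy'</span>,  <span class="hljs-comment"># key name used by newer Keras versions</span>
        <span class="hljs-string">'val_loss'</span> : <span class="hljs-string">'Validation loss'</span>
    }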

    metric_label = metrics_dict[metric]
    plt.plot(e, metric_model_1, <span class="hljs-string">'bo'</span>, label=model_1.name)
    plt.plot(e, metric_model_2, <span class="hljs-string">'b'</span>, label=model_2.name)
    plt.xlabel(<span class="hljs-string">'Epoch number'</span>)
    plt.ylabel(metric_label)
    plt.title(<span class="hljs-string">'Comparing '</span> + metric_label + <span class="hljs-string">' between models'</span>)
    plt.legend()
    plt.show()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">optimal_epoch</span>(<span class="hljs-params">model_hist</span>):</span>
    <span class="hljs-string">'''
    Function to return the epoch number where the validation loss is
    at its minimum

    Parameters:
        model_hist : training history of model
    Output:
        epoch number with minimum validation loss
    '''</span>
    min_epoch = np.argmin(model_hist.history[<span class="hljs-string">'val_loss'</span>]) + <span class="hljs-number">1</span>
    print(<span class="hljs-string">"Minimum validation loss reached in epoch {}"</span>.format(min_epoch))
    <span class="hljs-keyword">return</span> min_epoch
</code></pre>
<h1 id="heading-data-preparation">Data preparation</h1>
<h2 id="heading-data-cleaning">Data cleaning</h2>
<p>We load the CSV with the tweets and perform a random shuffle. It’s good practice to shuffle the data before splitting it into a train and test set. That way the sentiment classes are distributed roughly equally over the train and test sets. We’ll only keep the <strong>text</strong> column as input and the <strong>airline_sentiment</strong> column as the target.</p>
<p>The next thing we’ll do is <strong>remove stopwords</strong>. Stopwords add little value for predicting the sentiment. Furthermore, as we want to build a model that can be used for other airline companies as well, we <strong>remove the mentions</strong>.</p>
<pre><code class="lang-python">df = pd.read_csv(input_path / <span class="hljs-string">'Tweets.csv'</span>)
df = df.reindex(np.random.permutation(df.index))  
df = df[[<span class="hljs-string">'text'</span>, <span class="hljs-string">'airline_sentiment'</span>]]
df.text = df.text.apply(remove_stopwords).apply(remove_mentions)
</code></pre>
<h2 id="heading-train-test-split">Train-Test split</h2>
<p>The evaluation of the model performance needs to be done on a separate test set. That way, we can estimate how well the model generalizes. This is done with the <strong>train_test_split</strong> function of scikit-learn.</p>
<pre><code class="lang-python">X_train, X_test, y_train, y_test = train_test_split(df.text, df.airline_sentiment, test_size=<span class="hljs-number">0.1</span>, random_state=<span class="hljs-number">37</span>)
</code></pre>
<h2 id="heading-converting-words-to-numbers">Converting words to numbers</h2>
<p>To use the text as input for a model, we first need to convert the words into tokens, which simply means converting the words into integers that refer to an index in a dictionary. Here we will only keep the most frequent words in the training set.</p>
<p>We clean up the text by applying <strong>filters</strong> and converting the words to <strong>lowercase</strong>. Words are separated by spaces.</p>
<pre><code class="lang-python">tk = Tokenizer(num_words=NB_WORDS,
               filters=<span class="hljs-string">'!"#$%&amp;()*+,-./:;&lt;=&gt;?@[\\]^_`{|}~\t\n'</span>,
               lower=<span class="hljs-literal">True</span>,
               char_level=<span class="hljs-literal">False</span>,
               split=<span class="hljs-string">' '</span>)
tk.fit_on_texts(X_train)
</code></pre>
<p>After creating the dictionary, we can convert the text of a tweet to a vector with NB_WORDS values. With <strong>mode=binary</strong>, each value indicates whether the corresponding word appeared in the tweet or not. This is done with the <strong>texts_to_matrix</strong> method of the Tokenizer.</p>
<pre><code class="lang-python">X_train_oh = tk.texts_to_matrix(X_train, mode=<span class="hljs-string">'binary'</span>)
X_test_oh = tk.texts_to_matrix(X_test, mode=<span class="hljs-string">'binary'</span>)
</code></pre>
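<p>To make the binary encoding concrete, here is a minimal pure-Python sketch of the same idea. It is a simplification and not the Tokenizer's actual implementation: the helper names <strong>fit_vocabulary</strong> and <strong>texts_to_binary_matrix</strong> and the sample tweets are hypothetical, and it assumes, as the Tokenizer does, that word indices start at 1 so that column 0 stays unused.</p>

```python
def fit_vocabulary(texts, num_words):
    """Map the most frequent words to indices 1..num_words-1 (index 0 stays unused)."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # Most frequent words first; ties are broken alphabetically for determinism
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ranked[:num_words - 1])}


def texts_to_binary_matrix(texts, vocab, num_words):
    """Return one row per text, with a 1 in the column of every known word."""
    rows = []
    for text in texts:
        row = [0] * num_words
        for word in text.lower().split():
            idx = vocab.get(word)
            if idx is not None:  # words outside the vocabulary are ignored
                row[idx] = 1
        rows.append(row)
    return rows


# Hypothetical sample tweets, only for illustration
sample = ["flight delayed again", "great flight great crew"]
vocab = fit_vocabulary(sample, num_words=5)
matrix = texts_to_binary_matrix(sample, vocab, num_words=5)
print(vocab)
print(matrix)
```

<p>Note that a word appearing twice in a tweet still yields a single 1, and that the least frequent word falls outside the vocabulary and is dropped.</p>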
<h2 id="heading-converting-the-target-classes-to-numbers">Converting the target classes to numbers</h2>
<p>We need to convert the target classes to numbers as well, which in turn are one-hot-encoded with the <strong>to_categorical</strong> method in Keras.</p>
<pre><code class="lang-python">le = LabelEncoder()
y_train_le = le.fit_transform(y_train)
y_test_le = le.transform(y_test)
y_train_oh = to_categorical(y_train_le)
y_test_oh = to_categorical(y_test_le)
</code></pre>
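<p>The two steps above can be sketched without any library to show what the resulting arrays look like. This is an illustrative sketch only: <strong>encode_labels</strong> and <strong>one_hot</strong> are hypothetical helper names, the sample labels are made up, and the alphabetical class ordering mirrors how LabelEncoder sorts its classes.</p>

```python
def encode_labels(labels):
    """Assign an integer to each class, with classes sorted alphabetically."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return classes, [index[label] for label in labels]


def one_hot(indices, num_classes):
    """Turn each integer label into a vector with a single 1.0."""
    return [[1.0 if i == j else 0.0 for j in range(num_classes)]
            for i in indices]


# Hypothetical sample of target values, only for illustration
labels = ["negative", "positive", "neutral", "negative"]
classes, encoded = encode_labels(labels)
targets = one_hot(encoded, len(classes))
print(classes, encoded)
print(targets)
```

<p>Each row of the one-hot result has exactly one non-zero entry, which is the format that the categorical cross-entropy loss expects.</p>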
<h2 id="heading-splitting-off-a-validation-set">Splitting off a validation set</h2>
<p>Now that our data is ready, we split off a validation set. This validation set will be used to evaluate the model performance when we tune the parameters of the model.</p>
<pre><code class="lang-python">X_train_rest, X_valid, y_train_rest, y_valid = train_test_split(X_train_oh, y_train_oh, test_size=<span class="hljs-number">0.1</span>, random_state=<span class="hljs-number">37</span>)
</code></pre>
<h1 id="heading-deep-learning">Deep learning</h1>
<h2 id="heading-creating-a-model-that-overfits">Creating a model that overfits</h2>
<p>We start with a model that overfits. It has 2 densely connected layers of 64 elements. The <strong>input_shape</strong> for the first layer is equal to the number of words we kept in the dictionary and for which we created one-hot-encoded features.</p>
<p>As we need to predict 3 different sentiment classes, the last layer has 3 elements. The <strong>softmax</strong> activation function makes sure the three probabilities sum up to 1.</p>
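<p>As a small illustration of why the softmax outputs can be read as probabilities, here is a standalone sketch of the function. The input scores are made-up values, and subtracting the maximum is a common numerical-stability trick rather than something the article's code does explicitly.</p>

```python
import math


def softmax(scores):
    """Convert raw scores into positive values that sum to 1."""
    largest = max(scores)
    # Shifting by the maximum avoids overflow and does not change the result
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the last layer for one tweet
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```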
<p>The number of parameters to train in a layer is computed as <strong>(nb inputs x nb elements in the layer) + nb bias terms</strong>. The number of inputs for the first layer equals the number of words we kept in the dictionary (NB_WORDS). The subsequent layers have the number of outputs of the previous layer as inputs. So the number of parameters per layer is:</p>
<ul>
<li>First layer : (10000 x 64) + 64 = 640064</li>
<li>Second layer : (64 x 64) + 64 = 4160</li>
<li>Last layer : (64 x 3) + 3 = 195</li>
</ul>
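<p>The arithmetic in the list above can be checked with a few lines of Python. This is only a sketch of the formula; <strong>dense_param_count</strong> is a hypothetical helper name, not a Keras function.</p>

```python
def dense_param_count(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


# Layer sizes of the baseline model: 10000 inputs, then 64, 64 and 3 units
layer_sizes = [10000, 64, 64, 3]
per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)
```

<p>Almost all of the parameters sit in the first layer, which is why the size of the dictionary has such a large influence on the capacity of the model.</p>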
<pre><code class="lang-python">base_model = models.Sequential()
base_model.add(layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>, input_shape=(NB_WORDS,)))
base_model.add(layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>))
base_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
base_model.name = <span class="hljs-string">'Baseline model'</span>
</code></pre>
<p>Because this project is a multi-class, single-label prediction problem, we use <strong>categorical_crossentropy</strong> as the loss function and <strong>softmax</strong> as the final activation function. We fit the model on the train data and validate on the validation set. We run for a predetermined number of epochs and will see when the model starts to overfit.</p>
<pre><code class="lang-python">base_history = deep_model(base_model, X_train_rest, y_train_rest, X_valid, y_valid)
base_min = optimal_epoch(base_history)
eval_metric(base_model, base_history, <span class="hljs-string">'loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_ZwKhGQkYF3FqQlhe.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>In the beginning, the <strong>validation loss</strong> goes down. But at epoch 3 this stops and the validation loss starts increasing rapidly. This is when the model begins to overfit.</p>
<p>The <strong>training loss</strong> continues to go down and almost reaches zero at epoch 20. This is normal as the model is trained to fit the train data as well as possible.</p>
<h1 id="heading-handling-overfitting">Handling overfitting</h1>
<p>Now, we can try to do something about the overfitting. There are different options to do that.</p>
<ul>
<li><strong>Reduce the network’s capacity</strong> by removing layers or reducing the number of elements in the hidden layers</li>
<li>Apply <strong>regularization</strong>, which comes down to adding a cost to the loss function for large weights</li>
<li>Use <strong>Dropout layers</strong>, which will randomly remove certain features by setting them to zero</li>
</ul>
<h2 id="heading-reducing-the-networks-capacity">Reducing the network’s capacity</h2>
<p>Our first model has a large number of trainable parameters. The higher this number, the more easily the model can memorize the target class for each training sample. This is not ideal for generalizing on new data.</p>
<p>By lowering the capacity of the network, you force it to learn the patterns that matter or that minimize the loss. On the other hand, reducing the network’s capacity too much will lead to <strong>underfitting</strong>. The model will not be able to learn the relevant patterns in the train data.</p>
<p>We reduce the network’s capacity by removing one hidden layer and lowering the number of elements in the remaining layer to 16.</p>
<pre><code class="lang-python">reduced_model = models.Sequential()
reduced_model.add(layers.Dense(<span class="hljs-number">16</span>, activation=<span class="hljs-string">'relu'</span>, input_shape=(NB_WORDS,)))
reduced_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
reduced_model.name = <span class="hljs-string">'Reduced model'</span>
reduced_history = deep_model(reduced_model, X_train_rest, y_train_rest, X_valid, y_valid)
reduced_min = optimal_epoch(reduced_history)
eval_metric(reduced_model, reduced_history, <span class="hljs-string">'loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_ZDi9EJn6dORo4LCY.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We can see that it takes more epochs before the reduced model starts overfitting. The validation loss also goes up more slowly than in our first model.</p>
<pre><code class="lang-python">compare_models_by_metric(base_model, reduced_model, base_history, reduced_history, <span class="hljs-string">'val_loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_W8RSZtaBR-SDIGn5.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>When we compare the validation loss of the reduced model with that of the baseline model, it is clear that the reduced model starts overfitting at a later epoch. Its validation loss also stays lower for much longer than that of the baseline model.</p>
<h2 id="heading-applying-regularization">Applying regularization</h2>
<p>To address overfitting, we can apply weight regularization to the model. This will add a cost to the loss function of the network for large weights (or parameter values). As a result, you get a simpler model that will be forced to learn only the relevant patterns in the train data.</p>
<p>There are <strong>L1 regularization and L2 regularization</strong>.</p>
<ul>
<li>L1 regularization will add a cost proportional to the <strong>absolute value of the parameters</strong>. It tends to drive some of the weights to exactly zero.</li>
<li>L2 regularization will add a cost proportional to the <strong>squared value of the parameters</strong>. This results in smaller weights.</li>
</ul>
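<p>To illustrate the two penalty terms, here is a minimal sketch that computes them for a handful of weights. The weight values are made up, <strong>l1_penalty</strong> and <strong>l2_penalty</strong> are hypothetical helper names, and the factor of 0.001 matches the value passed to the regularizer in this post. The penalty is added to the regular loss during training.</p>

```python
def l1_penalty(weights, factor):
    """Cost proportional to the sum of absolute weight values."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """Cost proportional to the sum of squared weight values."""
    return factor * sum(w * w for w in weights)


# Hypothetical weights of one layer, only for illustration
weights = [0.5, -2.0, 0.0, 1.5]
l1 = l1_penalty(weights, 0.001)  # 0.001 * (0.5 + 2.0 + 0.0 + 1.5)
l2 = l2_penalty(weights, 0.001)  # 0.001 * (0.25 + 4.0 + 0.0 + 2.25)
print(l1, l2)
```

<p>Because the L2 term squares each weight, the single large weight of -2.0 accounts for most of the L2 penalty, which is why this form of regularization mainly discourages large individual weights.</p>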
<p>Let’s try with L2 regularization.</p>
<pre><code class="lang-python">reg_model = models.Sequential()
reg_model.add(layers.Dense(<span class="hljs-number">64</span>, kernel_regularizer=regularizers.l2(<span class="hljs-number">0.001</span>), activation=<span class="hljs-string">'relu'</span>, input_shape=(NB_WORDS,)))
reg_model.add(layers.Dense(<span class="hljs-number">64</span>, kernel_regularizer=regularizers.l2(<span class="hljs-number">0.001</span>), activation=<span class="hljs-string">'relu'</span>))
reg_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
reg_model.name = <span class="hljs-string">'L2 Regularization model'</span>
reg_history = deep_model(reg_model, X_train_rest, y_train_rest, X_valid, y_valid)
reg_min = optimal_epoch(reg_history)
</code></pre>
<p>For the regularized model we notice that it starts overfitting in the same epoch as the baseline model. However, the validation loss increases much more slowly afterward.</p>
<pre><code class="lang-python">eval_metric(reg_model, reg_history, <span class="hljs-string">'loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_4UV4tgegrn6UOdRX.png" alt="Image" width="600" height="400" loading="lazy"></p>
<pre><code class="lang-python">compare_models_by_metric(base_model, reg_model, base_history, reg_history, <span class="hljs-string">'val_loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_5ybMcTqkOXyC7xc4.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-adding-dropout-layers">Adding dropout layers</h2>
<p>The last option we’ll try is to add dropout layers. During training, a dropout layer randomly sets a fraction of the output features of the previous layer to zero.</p>
<pre><code class="lang-python">drop_model = models.Sequential()
drop_model.add(layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>, input_shape=(NB_WORDS,)))
drop_model.add(layers.Dropout(<span class="hljs-number">0.5</span>))
drop_model.add(layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>))
drop_model.add(layers.Dropout(<span class="hljs-number">0.5</span>))
drop_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
drop_model.name = <span class="hljs-string">'Dropout layers model'</span>
drop_history = deep_model(drop_model, X_train_rest, y_train_rest, X_valid, y_valid)
drop_min = optimal_epoch(drop_history)
eval_metric(drop_model, drop_history, <span class="hljs-string">'loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_B0SuqLpCTYGP4Bwz.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The model with dropout layers starts overfitting later than the baseline model. Its loss also increases more slowly than that of the baseline model.</p>
<pre><code class="lang-python">compare_models_by_metric(base_model, drop_model, base_history, drop_history, <span class="hljs-string">'val_loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_ykk8IT9v3wgsq1Wf.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The model with the dropout layers starts overfitting later. Compared to the baseline model the loss also remains much lower.</p>
<h1 id="heading-training-on-the-full-train-data-and-evaluation-on-test-data">Training on the full train data and evaluation on test data</h1>
<p>At first sight, the reduced model seems to be the best model for generalization. But let’s check that on the test set.</p>
<pre><code class="lang-python">base_results = test_model(base_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, base_min)
reduced_results = test_model(reduced_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, reduced_min)
reg_results = test_model(reg_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, reg_min)
drop_results = test_model(drop_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, drop_min)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/1_faz4dZBCsh3yB6tAZzSXkg.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>As shown above, all three options help to reduce overfitting. We manage to increase the accuracy on the test data substantially. Among these three options, the model with the dropout layers performs the best on the test data.</p>
<p>You can find the notebook on <a target="_blank" href="https://github.com/bertcarremans/TwitterUSAirlineSentiment/blob/master/source/Handling%20overfitting%20in%20deep%20learning%20models.ipynb">GitHub</a>. Have fun with it!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ A Deep Dive into Word Embeddings for Sentiment Analysis ]]>
                </title>
                <description>
                    <![CDATA[ By Bert Carremans When applying one-hot encoding to words, we end up with sparse (containing many zeros) vectors of high dimensionality. On large data sets, this could cause performance issues.  Additionally, one-hot encoding does not take into accou... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/word-embeddings-for-sentiment-analysis/</link>
                <guid isPermaLink="false">66d45de6c7632f8bfbf1e411</guid>
                
                    <category>
                        <![CDATA[ keras ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Sentiment analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ text mining ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Sun, 05 Jan 2020 14:27:33 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/01/1_u9pwb9JShvDIU7j1G9iszQ.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Bert Carremans</p>
<p>When applying one-hot encoding to words, we end up with sparse (containing many zeros) vectors of high dimensionality. On large data sets, this could cause performance issues. </p>
<p>Additionally, one-hot encoding does not take into account the semantics of the words. So words like <em>airplane</em> and <em>aircraft</em> are considered to be two different features while we know that they have a very similar meaning. Word embeddings address these two issues.</p>
<p>First, word embeddings are dense vectors with much lower dimensionality. Second, the semantic relationships between words are reflected in the distance and direction of the vectors.</p>
<p>We will work with the <a target="_blank" href="https://www.kaggle.com/crowdflower/twitter-airline-sentiment">TwitterAirlineSentiment data set on Kaggle</a>. This data set contains roughly 15K tweets with 3 possible classes for the sentiment (positive, negative and neutral). In my previous post, we tried to <a target="_blank" href="https://www.freecodecamp.org/news/sentiment-analysis-with-text-mining/">classify the tweets</a> by tokenizing the words and applying two classifiers. Let’s see if word embeddings can outperform that.</p>
<p>After reading this tutorial you will know how to compute task-specific word embeddings with the Embedding layer of <strong>Keras</strong>. We will also investigate whether word embeddings trained on a larger corpus can improve the accuracy of our model.</p>
<p>The structure of this tutorial is:</p>
<ul>
<li>Intuition behind word embeddings</li>
<li>Project set-up</li>
<li>Data preparation</li>
<li>Keras and its Embedding layer</li>
<li>Pre-trained word embeddings — GloVe</li>
<li>Training word embeddings with more dimensions</li>
</ul>
<h1 id="heading-intuition-behind-word-embeddings">Intuition behind word embeddings</h1>
<p>Before we can use words in a classifier, we need to convert them into numbers. One way to do that is to simply map words to integers. Another way is to one-hot encode words. Each tweet could then be represented as a vector whose dimension equals the number of words in (a limited subset of) the corpus vocabulary. The words occurring in the tweet have a value of 1 in the vector. All other vector values equal zero.</p>
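<p>The one-hot representation of a tweet can be sketched in a few lines. The five-word vocabulary and the helper <em>one_hot_tweet</em> below are hypothetical and only serve to show the idea; with a real vocabulary of thousands of words the vector would be almost entirely zeros.</p>

```python
import numpy as np

# Hypothetical five-word vocabulary mapped to integer indices
vocab = {'flight': 0, 'late': 1, 'great': 2, 'service': 3, 'lost': 4}

def one_hot_tweet(tweet, vocab):
    """Return a vector with a 1 for every vocabulary word present in the tweet."""
    vec = np.zeros(len(vocab))
    for word in tweet.lower().split():
        if word in vocab:
            vec[vocab[word]] = 1.0
    return vec

# 'but' is not in the vocabulary, so it is ignored
encoded = one_hot_tweet('Great service but flight late', vocab)
```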
<p>Word embeddings are computed differently. Each word is positioned into a <strong><em>multi-dimensional space</em></strong>. The number of dimensions in this space is chosen by the data scientist. You can experiment with different dimensions and see what provides the best result.</p>
<p>The <strong><em>vector values for a word represent its position</em></strong> in this embedding space. Synonyms are found close to each other while words with opposite meanings have a large distance between them. You can also apply mathematical operations on the vectors which should produce semantically correct results. A typical example is that subtracting the embedding of <em>man</em> from that of <em>king</em> and adding the embedding of <em>woman</em> gives a vector close to the embedding of <em>queen</em>.</p>
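<p>The sketch below illustrates such vector arithmetic with the often-cited analogy king - man + woman, which should land near queen. The two-dimensional vectors are hand-made for the example (real embeddings are learned and have many more dimensions), and <em>nearest_word</em> is a hypothetical helper.</p>

```python
import numpy as np

# Hand-made 2-D vectors: [royalty, femaleness]. Purely illustrative values.
toy_emb = {
    'king':  np.array([0.9, 0.1]),
    'queen': np.array([0.9, 0.9]),
    'man':   np.array([0.1, 0.1]),
    'woman': np.array([0.1, 0.9]),
}

def nearest_word(vector, embeddings):
    """Return the word whose embedding has the smallest Euclidean distance."""
    return min(embeddings, key=lambda w: np.linalg.norm(embeddings[w] - vector))

target = toy_emb['king'] - toy_emb['man'] + toy_emb['woman']
closest = nearest_word(target, toy_emb)
```

<p>With these toy values the nearest word to the computed vector is <em>queen</em>. With learned embeddings the match is only approximate.</p>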
<h1 id="heading-project-set-up">Project set-up</h1>
<p>Let’s start by importing all packages for this project.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> re
<span class="hljs-keyword">import</span> collections
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">from</span> pathlib <span class="hljs-keyword">import</span> Path
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> nltk.corpus <span class="hljs-keyword">import</span> stopwords
<span class="hljs-keyword">from</span> keras.preprocessing.text <span class="hljs-keyword">import</span> Tokenizer
<span class="hljs-keyword">from</span> keras.preprocessing.sequence <span class="hljs-keyword">import</span> pad_sequences
<span class="hljs-keyword">from</span> keras.utils.np_utils <span class="hljs-keyword">import</span> to_categorical
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> LabelEncoder
<span class="hljs-keyword">from</span> keras <span class="hljs-keyword">import</span> models
<span class="hljs-keyword">from</span> keras <span class="hljs-keyword">import</span> layers
</code></pre>
<p>We define some parameters and paths used throughout the project. Most of them are self-explanatory; the others are explained where they are used in the code.</p>
<pre><code class="lang-python">NB_WORDS = <span class="hljs-number">10000</span>  <span class="hljs-comment"># Parameter indicating the number of words we'll put in the dictionary</span>
VAL_SIZE = <span class="hljs-number">1000</span>  <span class="hljs-comment"># Size of the validation set</span>
NB_START_EPOCHS = <span class="hljs-number">10</span>  <span class="hljs-comment"># Number of epochs we usually start to train with</span>
BATCH_SIZE = <span class="hljs-number">512</span>  <span class="hljs-comment"># Size of the batches used in the mini-batch gradient descent</span>
MAX_LEN = <span class="hljs-number">24</span>  <span class="hljs-comment"># Maximum number of words in a sequence</span>
GLOVE_DIM = <span class="hljs-number">100</span>  <span class="hljs-comment"># Number of dimensions of the GloVe word embeddings</span>
root = Path(<span class="hljs-string">'../'</span>)
input_path = root / <span class="hljs-string">'input/'</span>
output_path = root / <span class="hljs-string">'output/'</span>
source_path = root / <span class="hljs-string">'source/'</span>
</code></pre>
<p>Throughout this code, we will also use some helper functions for data preparation, modeling and visualization. These function definitions are not shown here to keep the blog post clutter-free. You can always refer to the <a target="_blank" href="https://github.com/bertcarremans/TwitterUSAirlineSentiment/blob/master/source/Using%20Word%20Embeddings%20for%20Sentiment%20Analysis.ipynb">notebook on GitHub</a> to look at the code.</p>
<h1 id="heading-data-preparation">Data preparation</h1>
<h2 id="heading-reading-the-data-and-cleaning">Reading the data and cleaning</h2>
<p>We read in the CSV file with the tweets and apply a random shuffle on its indexes. After that, we remove stop words and @ mentions. A test set of 10% is split off to evaluate the model on new data.</p>
<pre><code class="lang-python">df = pd.read_csv(input_path / <span class="hljs-string">'Tweets.csv'</span>)
df = df.reindex(np.random.permutation(df.index))
df = df[[<span class="hljs-string">'text'</span>, <span class="hljs-string">'airline_sentiment'</span>]]
df.text = df.text.apply(remove_stopwords).apply(remove_mentions)
X_train, X_test, y_train, y_test = train_test_split(df.text, df.airline_sentiment, test_size=<span class="hljs-number">0.1</span>, random_state=<span class="hljs-number">37</span>)
</code></pre>
<h2 id="heading-convert-words-into-integers">Convert words into integers</h2>
<p>With the <strong><em>Tokenizer</em></strong> from Keras, we convert the tweets into sequences of integers. We limit the number of words to the <strong><em>NB_WORDS</em></strong> most frequent words. Additionally, the tweets are cleaned with some filters, set to lowercase and split on spaces.</p>
<pre><code class="lang-python">tk = Tokenizer(num_words=NB_WORDS,
filters=<span class="hljs-string">'!"#$%&amp;()*+,-./:;&lt;=&gt;?@[\]^_`{"}~\t\n'</span>,lower=<span class="hljs-literal">True</span>, split=<span class="hljs-string">" "</span>)
tk.fit_on_texts(X_train)
X_train_seq = tk.texts_to_sequences(X_train)
X_test_seq = tk.texts_to_sequences(X_test)
</code></pre>
<h2 id="heading-equal-length-of-sequences">Equal length of sequences</h2>
<p>Each batch needs to provide sequences of equal length. We achieve this with the <strong><em>pad_sequences</em></strong> method. By specifying <strong><em>maxlen</em></strong>, the sequences are either padded with zeros or truncated.</p>
<pre><code class="lang-python">X_train_seq_trunc = pad_sequences(X_train_seq, maxlen=MAX_LEN)
X_test_seq_trunc = pad_sequences(X_test_seq, maxlen=MAX_LEN)
</code></pre>
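<p>To see what padding and truncating do to a single sequence, here is a plain-Python sketch. It assumes the default behaviour of pad_sequences, which pads and truncates at the start of the sequence. The helper <em>pad_or_truncate</em> is hypothetical and expects a positive maxlen.</p>

```python
def pad_or_truncate(seq, maxlen):
    """Keep the last `maxlen` items, then left-pad with zeros."""
    seq = list(seq)[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

short = pad_or_truncate([5, 12, 7], 5)             # [0, 0, 5, 12, 7]
long_ = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], 5)  # [3, 4, 5, 6, 7]
```

<p>The index 0 is reserved for padding, which is why the Tokenizer starts numbering words at 1.</p>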
<h2 id="heading-encoding-the-target-variable">Encoding the target variable</h2>
<p>The target classes are strings which need to be converted into numeric vectors. This is done with the <strong><em>LabelEncoder</em></strong> from Sklearn and the <strong><em>to_categorical</em></strong> method from Keras.</p>
<pre><code class="lang-python">le = LabelEncoder()
y_train_le = le.fit_transform(y_train)
y_test_le = le.transform(y_test)
y_train_oh = to_categorical(y_train_le)
y_test_oh = to_categorical(y_test_le)
</code></pre>
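<p>The two steps above (strings to integer codes, then integer codes to one-hot vectors) can be mimicked with NumPy alone. This is only a sketch with made-up labels to show what the resulting arrays look like; it is not how LabelEncoder or to_categorical are implemented.</p>

```python
import numpy as np

labels = np.array(['negative', 'positive', 'neutral', 'negative'])

# Sorted unique classes and the integer code of each label
classes, codes = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(classes))[codes]
```

<p>Here the classes are sorted alphabetically, so <em>negative</em> becomes 0, <em>neutral</em> 1 and <em>positive</em> 2, and each row of the result has a single 1 in the column of its class.</p>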
<h2 id="heading-splitting-off-the-validation-set">Splitting off the validation set</h2>
<p>From the training data, we split off a validation set of 10% to use during training.</p>
<pre><code class="lang-python">X_train_emb, X_valid_emb, y_train_emb, y_valid_emb = train_test_split(X_train_seq_trunc, y_train_oh, test_size=<span class="hljs-number">0.1</span>, random_state=<span class="hljs-number">37</span>)
</code></pre>
<h1 id="heading-modeling">Modeling</h1>
<h2 id="heading-keras-and-the-embedding-layer">Keras and the Embedding layer</h2>
<p>Keras provides a convenient way to convert each word into a multi-dimensional vector. This can be done with the <strong><em>Embedding</em></strong> layer. It will compute the word embeddings (or use pre-trained embeddings) and look up each word in a dictionary to find its vector representation. Here we will train word embeddings with 8 dimensions.</p>
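<p>Conceptually, the Embedding layer is a lookup table with one row per word index. The NumPy sketch below uses made-up sizes and random values in place of learned weights, so it only illustrates the shapes involved and is not the Keras implementation. The actual model follows after it.</p>

```python
import numpy as np

vocab_size, emb_dim, seq_len = 10, 8, 4  # small stand-ins for NB_WORDS, 8, MAX_LEN
rng = np.random.default_rng(0)

# In Keras this table is learned during training; random values stand in here
emb_table = rng.normal(size=(vocab_size, emb_dim))

tweet = np.array([0, 0, 3, 7])    # padded sequence of word indices
embedded = emb_table[tweet]       # shape (seq_len, emb_dim)
flattened = embedded.reshape(-1)  # what Flatten feeds to the Dense layer
```

<p>So each tweet enters the Dense layer as a single vector of MAX_LEN times the embedding dimension values.</p>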
<pre><code class="lang-python">emb_model = models.Sequential()
emb_model.add(layers.Embedding(NB_WORDS, <span class="hljs-number">8</span>, input_length=MAX_LEN))
emb_model.add(layers.Flatten())
emb_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
emb_history = deep_model(emb_model, X_train_emb, y_train_emb, X_valid_emb, y_valid_emb)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_-XjJ4DTQ5RQ8jZOF.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We have a validation accuracy of about 74%. The number of words in the tweets is rather low, so this result is quite good. By comparing the training and validation loss, we see that the model starts <strong>overfitting</strong> from epoch 6.</p>
<p>In a previous article, I discussed how we can <a target="_blank" href="https://www.freecodecamp.org/news/handling-overfitting-in-deep-learning-models/">avoid overfitting</a>. You might want to read that if you want to dive deeper into that topic.</p>
<p>When we train the model on all data (including the validation data, but excluding the test data) and set the number of epochs to 6, we get a test accuracy of 78%. This test result is OK, but let’s see if we can improve with pre-trained word embeddings.</p>
<pre><code class="lang-python">emb_results = test_model(emb_model, X_train_seq_trunc, y_train_oh, X_test_seq_trunc, y_test_oh, <span class="hljs-number">6</span>)
print(<span class="hljs-string">'\n'</span>)
print(<span class="hljs-string">'Test accuracy of word embeddings model: {0:.2f}%'</span>.format(emb_results[<span class="hljs-number">1</span>]*<span class="hljs-number">100</span>))
</code></pre>
<h2 id="heading-pre-trained-word-embeddings-glove">Pre-trained word embeddings — GloVe</h2>
<p>Because the training data is not so large, the model might not be able to learn good embeddings for the sentiment analysis. Instead, we can load pre-trained word embeddings built on much larger training data.</p>
<p>The <a target="_blank" href="https://nlp.stanford.edu/projects/glove/">GloVe database</a> contains multiple pre-trained word embeddings and, more specifically, <strong><em>embeddings trained on tweets</em></strong>. So this might be useful for the task at hand.</p>
<p>First, we put the word embeddings in a dictionary where the keys are the words and the values the word embeddings.</p>
<pre><code class="lang-python">glove_file = <span class="hljs-string">'glove.twitter.27B.'</span> + str(GLOVE_DIM) + <span class="hljs-string">'d.txt'</span>
emb_dict = {}
<span class="hljs-keyword">with</span> open(input_path / glove_file, encoding=<span class="hljs-string">'utf-8'</span>) <span class="hljs-keyword">as</span> glove:
    <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> glove:
        values = line.split()
        word = values[<span class="hljs-number">0</span>]
        vector = np.asarray(values[<span class="hljs-number">1</span>:], dtype=<span class="hljs-string">'float32'</span>)
        emb_dict[word] = vector
</code></pre>
<p>With the GloVe embeddings loaded in a dictionary, we can look up the embedding for each word in the corpus of the airline tweets. These will be stored in a matrix with shape (<strong><em>NB_WORDS</em></strong>, <strong><em>GLOVE_DIM</em></strong>). If a word is not found in the GloVe dictionary, the word embedding values for the word are zero.</p>
<pre><code class="lang-python">emb_matrix = np.zeros((NB_WORDS, GLOVE_DIM))
<span class="hljs-keyword">for</span> w, i <span class="hljs-keyword">in</span> tk.word_index.items():
    <span class="hljs-keyword">if</span> i &lt; NB_WORDS:
        vect = emb_dict.get(w)
        <span class="hljs-keyword">if</span> vect <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">None</span>:
            emb_matrix[i] = vect
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">break</span>
</code></pre>
<p>Then we specify the model just like we did with the model above.</p>
<pre><code class="lang-python">glove_model = models.Sequential()
glove_model.add(layers.Embedding(NB_WORDS, GLOVE_DIM, input_length=MAX_LEN))
glove_model.add(layers.Flatten())
glove_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
</code></pre>
<p>In the Embedding layer (which is layer 0 here) we <strong><em>set the weights</em></strong> for the words to those found in the GloVe word embeddings. By setting <strong><em>trainable</em></strong> to False we make sure that the GloVe word embeddings cannot be changed. After that, we run the model.</p>
<pre><code class="lang-python">glove_model.layers[<span class="hljs-number">0</span>].set_weights([emb_matrix])
glove_model.layers[<span class="hljs-number">0</span>].trainable = <span class="hljs-literal">False</span>
glove_history = deep_model(glove_model, X_train_emb, y_train_emb, X_valid_emb, y_valid_emb)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_uhsGcl8UG_JYUycb.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The model overfits fast after 3 epochs. Furthermore, the validation accuracy is lower compared to the embeddings trained on the training data.</p>
<pre><code class="lang-python">glove_results = test_model(glove_model, X_train_seq_trunc, y_train_oh, X_test_seq_trunc, y_test_oh, <span class="hljs-number">3</span>)
print(<span class="hljs-string">'\n'</span>)
print(<span class="hljs-string">'Test accuracy of word glove model: {0:.2f}%'</span>.format(glove_results[<span class="hljs-number">1</span>]*<span class="hljs-number">100</span>))
</code></pre>
<p>As a final exercise, let’s see what results we get when we train the embeddings with the same number of dimensions as the GloVe data.</p>
<h2 id="heading-training-word-embeddings-with-more-dimensions">Training word embeddings with more dimensions</h2>
<p>We will again train the word embeddings from scratch on the tweets, but this time with the same number of dimensions as the GloVe embeddings (i.e. GLOVE_DIM).</p>
<pre><code class="lang-python">emb_model2 = models.Sequential()
emb_model2.add(layers.Embedding(NB_WORDS, GLOVE_DIM, input_length=MAX_LEN))
emb_model2.add(layers.Flatten())
emb_model2.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
emb_history2 = deep_model(emb_model2, X_train_emb, y_train_emb, X_valid_emb, y_valid_emb)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_boJxTu7msbxWzexm.png" alt="Image" width="600" height="400" loading="lazy"></p>
<pre><code class="lang-python">emb_results2 = test_model(emb_model2, X_train_seq_trunc, y_train_oh, X_test_seq_trunc, y_test_oh, <span class="hljs-number">3</span>)
print(<span class="hljs-string">'\n'</span>)
print(<span class="hljs-string">'Test accuracy of word embedding model 2: {0:.2f}%'</span>.format(emb_results2[<span class="hljs-number">1</span>]*<span class="hljs-number">100</span>))
</code></pre>
<p>On the test data we get good results, but we do not outperform the LogisticRegression with the CountVectorizer from the previous post. So there is still room for improvement.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>The best result is achieved with 100-dimensional word embeddings that are trained on the available data. This even outperforms the use of word embeddings that were trained on a much larger Twitter corpus.</p>
<p>Until now we have just put a Dense layer on the flattened embeddings. By doing this, <strong><em>we do not take into account the relationships between the words</em></strong> in the tweet. This can be achieved with a recurrent neural network or a 1D convolutional network. But that’s something for a future post :)</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to classify butterflies with deep learning in Keras ]]>
                </title>
                <description>
                    <![CDATA[ By Bert Carremans A while ago I read an interesting blog post on the website of the Dutch organization Vlinderstichting. Every year they organize a count of butterflies. Volunteers help in determining the different butterfly species in their garden. ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/classify-butterfly-images-deep-learning-keras/</link>
                <guid isPermaLink="false">66d45dd5a326133d124409d7</guid>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image classification ]]>
                    </category>
                
                    <category>
                        <![CDATA[ keras ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 08 Aug 2019 18:59:32 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2019/08/1_K4agkAxY1R6zPzK8s_CqbQ-1.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Bert Carremans</p>
<p>A while ago I read an interesting blog post on the website of the Dutch organization <a target="_blank" href="https://www.vlinderstichting.nl/actueel/nieuws/nieuwsbericht/?bericht=1492">Vlinderstichting</a>. Every year they organize a count of butterflies. Volunteers help in determining the different butterfly species in their garden. The Vlinderstichting gathers and analyses the results.</p>
<p>As the determination of the butterfly species is done by the volunteers, inevitably this process is prone to errors. As a result, the Vlinderstichting has to manually check the submissions, which is time-consuming.</p>
<p>Specifically, there are three butterflies for which the Vlinderstichting receives many wrong determinations. These are:</p>
<ul>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/Meadow_brown">Meadow brown</a> or Maniola jurtina</li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/Gatekeeper_(butterfly)">Gatekeeper</a> or Pyronia tithonus</li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/Small_heath_(butterfly)">Small heath</a> or Coenonympha pamphilus</li>
</ul>
<p>In this article, I will describe the steps to fit a deep learning model that helps to make the distinction between the first two butterflies.</p>
<h1 id="heading-downloading-images-with-the-flickr-api">Downloading images with the Flickr API</h1>
<p>To train a convolutional neural network I need to find images of butterflies with the correct label. Surely I could take pictures myself of the butterflies that I want to classify. They sometimes fly around in my garden…</p>
<p>Just kidding, that would take ages. For this, I need an automated way to get the images. To do that I use the Flickr API via Python.</p>
<h2 id="heading-setting-up-the-flickr-api">Setting up the Flickr API</h2>
<p>Firstly, I install the <a target="_blank" href="https://pypi.python.org/pypi/flickrapi/2.3">flickrapi package</a> with pip. Then I create the necessary <a target="_blank" href="https://www.flickr.com/services/api/misc.api_keys.html">API keys on the Flickr website</a> to connect to the Flickr API.</p>
<p>Besides the flickrapi package, I import the os and urllib packages for downloading the images and setting up the directories.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> flickrapi <span class="hljs-keyword">import</span> FlickrAPI
<span class="hljs-keyword">import</span> urllib.request
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> config
</code></pre>
<p>In the config module, I define the public and secret keys for the Flickr API. So this is simply a Python script (config.py) with the code below:</p>
<pre><code class="lang-python">API_KEY = <span class="hljs-string">'XXXXXXXXXXXXXXXXX'</span>  // replace <span class="hljs-keyword">with</span> your key
API_SECRET = <span class="hljs-string">'XXXXXXXXXXXXXXXXX'</span>  // replace <span class="hljs-keyword">with</span> your secret
IMG_FOLDER = <span class="hljs-string">'XXXXXXXXXXXXXXXXX'</span>  // replace <span class="hljs-keyword">with</span> your folder to store the images
</code></pre>
<p>I keep these keys in a separate file for security reasons. That way, you can save the code in a public repository on GitHub or Bitbucket and list config.py in .gitignore. Consequently, you can share your code with others without having to worry about someone gaining access to your credentials.</p>
<p>To extract images of different butterfly species, I wrote a function download_flickr_photos. I will explain this function step by step. In addition, I’ve made the full code available on <a target="_blank" href="https://github.com/bertcarremans/Vlindervinder/tree/master/flickr">GitHub</a>.</p>
<h2 id="heading-input-parameters">Input parameters</h2>
<p>First of all, I check if the input parameters are of the correct type or values. If not, I raise an error. The explanation of the parameters can be found in the docstring of the function.</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> (isinstance(keywords, str) <span class="hljs-keyword">or</span> isinstance(keywords, list)):
    <span class="hljs-keyword">raise</span> AttributeError(<span class="hljs-string">'keywords must be a string or a list of strings'</span>)
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> (size <span class="hljs-keyword">in</span> [<span class="hljs-string">'thumbnail'</span>, <span class="hljs-string">'square'</span>, <span class="hljs-string">'medium'</span>, <span class="hljs-string">'original'</span>]):
    <span class="hljs-keyword">raise</span> AttributeError(<span class="hljs-string">'size must be "thumbnail", "square", "medium" or "original"'</span>)
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> (max_nb_img == <span class="hljs-number">-1</span> <span class="hljs-keyword">or</span> (max_nb_img &gt; <span class="hljs-number">0</span> <span class="hljs-keyword">and</span> isinstance(max_nb_img, int))):
    <span class="hljs-keyword">raise</span> AttributeError(<span class="hljs-string">'max_nb_img must be an integer greater than zero or equal to -1'</span>)
</code></pre>
<p>Secondly, I define some of the parameters that will be used in the walk method later on. I create a list for the keywords and determine from which URL the images need to be downloaded.</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> isinstance(keywords, str):
    keywords_list = []
    keywords_list.append(keywords)
<span class="hljs-keyword">else</span>:
    keywords_list = keywords
<span class="hljs-keyword">if</span> size == <span class="hljs-string">'thumbnail'</span>:
    size_url = <span class="hljs-string">'url_t'</span>
<span class="hljs-keyword">elif</span> size == <span class="hljs-string">'square'</span>:
    size_url = <span class="hljs-string">'url_q'</span>
<span class="hljs-keyword">elif</span> size == <span class="hljs-string">'medium'</span>:
    size_url = <span class="hljs-string">'url_c'</span>
<span class="hljs-keyword">elif</span> size == <span class="hljs-string">'original'</span>:
    size_url = <span class="hljs-string">'url_o'</span>
</code></pre>
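<p>As a side note, the same mapping from size name to URL key could also be written as a dictionary lookup. The sketch below is an alternative under that assumption, not the code used in the function; the names <em>SIZE_TO_URL_KEY</em> and <em>size_to_url_key</em> are made up for illustration.</p>

```python
# Mapping copied from the if/elif chain above
SIZE_TO_URL_KEY = {
    'thumbnail': 'url_t',
    'square': 'url_q',
    'medium': 'url_c',
    'original': 'url_o',
}

def size_to_url_key(size):
    """Return the Flickr 'extras' key for a size name (hypothetical helper)."""
    try:
        return SIZE_TO_URL_KEY[size]
    except KeyError:
        raise ValueError('size must be one of: ' + ', '.join(SIZE_TO_URL_KEY))

size_url = size_to_url_key('medium')
```

<p>A dictionary keeps the allowed sizes and their URL keys in one place, so the validation and the lookup cannot drift apart.</p>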
<h2 id="heading-connecting-to-the-flickr-api">Connecting to the Flickr API</h2>
<p>Next, I connect to the Flickr API. In the FlickrAPI call I use the API keys defined in the config module.</p>
<pre><code class="lang-python">flickr = FlickrAPI(config.API_KEY, config.API_SECRET)
</code></pre>
<h2 id="heading-creating-subfolders-per-butterfly-species">Creating subfolders per butterfly species</h2>
<p>I save the images of each butterfly species in a separate subfolder. The name of each subfolder is the butterfly species’ name, given by the keyword. If the subfolder does not exist yet, I create it.</p>
<pre><code class="lang-python">results_folder = config.IMG_FOLDER + keyword.replace(<span class="hljs-string">" "</span>, <span class="hljs-string">"_"</span>) + <span class="hljs-string">"/"</span>
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> os.path.exists(results_folder):
    os.makedirs(results_folder)
</code></pre>
<h2 id="heading-walking-around-in-the-flickr-library">Walking around in the Flickr library</h2>
<pre><code class="lang-python">photos = flickr.walk(
    text=keyword,
    extras=<span class="hljs-string">'url_m'</span>,
    license=<span class="hljs-string">'1,2,4,5'</span>,
    per_page=<span class="hljs-number">50</span>)
</code></pre>
<p>I use the walk method of the Flickr API to search for images for the specified keyword. This walk method has the same parameters as the <a target="_blank" href="http://www.flickr.com/services/api/flickr.photos.search.html">search method</a> in the Flickr API.</p>
<p>In the text parameter, I use the keyword to search for images related to this keyword. Secondly, in the extras parameter, I specify url_m to get a small-to-medium size of the images. More explanation on the image sizes and their respective URLs is given in the documentation of the <a target="_blank" href="http://librdf.org/flickcurl/api/flickcurl-searching-search-extras.html">Flickcurl C library</a>.</p>
<p>Thirdly, in the license parameter<strong><em>,</em></strong> I select images with a non-commercial license. More on the license codes and their meaning can be found on the Flickr <a target="_blank" href="https://www.flickr.com/services/api/flickr.photos.licenses.getInfo.html">API platform</a>. Finally, the per_page parameter specifies how many images I allow per page.</p>
<p>As a result, I have a generator called photos to download the images.</p>
<h2 id="heading-downloading-flickr-images">Downloading Flickr images</h2>
<p>With the photos generator, I can download all the images found for the search query. First I get the specific URL from which I will download the image. Then I increment the count variable and use this counter to create the image filenames.</p>
<p>With the urlretrieve method, I download the image and save it in the folder for the butterfly species. If an error occurs I print out the error message.</p>
<pre><code class="lang-python"><span class="hljs-keyword">for</span> photo <span class="hljs-keyword">in</span> photos:
    <span class="hljs-keyword">try</span>:
        url=photo.get(<span class="hljs-string">'url_m'</span>)
        print(url)
        count += <span class="hljs-number">1</span>
        urllib.request.urlretrieve(url,  results_folder + str(count) +<span class="hljs-string">".jpg"</span>)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        print(e, <span class="hljs-string">'Download failure'</span>)
</code></pre>
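<p>The get call returns None when a photo has no url_m value, and the loop above then relies on the exception handler. A possible variation, sketched under my own assumptions, is to skip such photos up front so the counter only advances for real downloads. The helper name plan_downloads and the sample data are hypothetical; plain dictionaries stand in for the photo objects because both offer a get method.</p>

```python
def plan_downloads(photos, results_folder):
    """Yield (url, filename) pairs, skipping photos without a 'url_m' value."""
    count = 0
    for photo in photos:
        url = photo.get('url_m')
        if url is None:
            # Nothing to download for this photo, so the counter stays unchanged
            continue
        count += 1
        yield url, results_folder + str(count) + ".jpg"


# Hypothetical sample data: the second entry has no medium-size URL
sample = [
    {'url_m': 'https://example.com/a.jpg'},
    {},
    {'url_m': 'https://example.com/b.jpg'},
]
for url, filename in plan_downloads(sample, 'img/gatekeeper_butterfly/'):
    print(url, filename)
```

<p>Each pair can then be passed to urlretrieve inside the same try/except block as before.</p>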
<p>To download multiple butterfly species, I create a list and call the function download_flickr_photos in a for loop. For simplicity, I only download two butterfly species of the three mentioned above.</p>
<pre><code class="lang-python">butterflies = [<span class="hljs-string">'meadow brown butterfly'</span>, <span class="hljs-string">'gatekeeper butterfly'</span>]
<span class="hljs-keyword">for</span> butterfly <span class="hljs-keyword">in</span> butterflies:
    download_flickr_photos(butterfly)
</code></pre>
<h1 id="heading-data-augmentation-of-images">Data augmentation of images</h1>
<p>Training a convnet on a small number of images will result in overfitting. Consequently, the model will make errors in classifying new, unseen images. Data augmentation can help to avoid this. Luckily Keras has some nice tools to transform images easily.</p>
<p>I’d like to compare it with how my son classifies cars on the road. At the moment he’s only 2 years old and hasn’t seen as many cars as an adult. So you could say his training set of images is rather small. Therefore he’s more likely to misclassify cars. For instance, he sometimes mistakes an ambulance for a police van.</p>
<p>As he grows older, he will see more ambulances and police vans, with the corresponding label that I will give him. So his training set will become larger and he will classify them more accurately.</p>
<p>For that reason, we need to provide the convnet with more butterfly images than we have at the moment. An easy solution for that is <em>data augmentation</em>. In short, this means applying a set of transformations to the Flickr images.</p>
<p>Keras provides a <a target="_blank" href="https://keras.io/preprocessing/image/">wide range of image transformations</a>. But first, we’ll have to convert the images so that Keras can work with them.</p>
<h2 id="heading-converting-an-image-to-numbers">Converting an image to numbers</h2>
<p>We start by importing the Keras module. We will demonstrate the image transformations with one example image. For that purpose, we use the load_img method.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> keras.preprocessing.image <span class="hljs-keyword">import</span> ImageDataGenerator, array_to_img, img_to_array, load_img
i = load_img(<span class="hljs-string">'data/train/maniola_jurtina/1.jpg'</span> )
x = img_to_array(i)
x = x.reshape((<span class="hljs-number">1</span>,) + x.shape)
</code></pre>
<p>The load_img method returns a PIL (Python Imaging Library) image object. We’ll need to convert this to a Numpy array to use it with the ImageDataGenerator later on. That’s done with the handy img_to_array method. As a result, we have an array of shape 75x75x3. These dimensions reflect the height, the width and the three RGB color channels.</p>
<p>In fact, each pixel of the image has 3 RGB values. These range between 0 and 255 and represent the intensity of Red, Green and Blue. A lower value stands for lower intensity and a higher value for higher intensity. For instance, one pixel can be represented as a list of these three values [78, 136, 60]. Black would be represented as [0, 0, 0].</p>
<p>Finally, we need to add an extra dimension to avoid a ValueError when applying the transformations, because the flow method expects a batch of images rather than a single image. This is done with the reshape function, which turns the shape into 1x75x75x3.</p>
<p>Alright, now we have something to work with. Let’s continue with the transformations.</p>
<h2 id="heading-rotation">Rotation</h2>
<p>When you specify a value between 0 and 180 for rotation_range, Keras will randomly choose an angle up to that value to rotate the image. It will do this clockwise or counter-clockwise. In our example, the image will be rotated by at most 90 degrees.</p>
<p>ImageDataGenerator also has a parameter fill_mode. The default value is ‘nearest’. By rotating the image within the width and height of the original image we end up with “empty” pixels. The fill_mode then uses the nearest pixels to fill this empty space.</p>
<pre><code class="lang-python">imgGen = ImageDataGenerator(rotation_range = <span class="hljs-number">90</span>)
i = <span class="hljs-number">1</span>
<span class="hljs-keyword">for</span> batch <span class="hljs-keyword">in</span> imgGen.flow(x, batch_size=<span class="hljs-number">1</span>, save_to_dir=<span class="hljs-string">'example_transformations'</span>, save_format=<span class="hljs-string">'jpeg'</span>, save_prefix=<span class="hljs-string">'trsf'</span>):
    i += <span class="hljs-number">1</span>
    <span class="hljs-keyword">if</span> i &gt; <span class="hljs-number">3</span>:
        <span class="hljs-keyword">break</span>
</code></pre>
<p>In the flow method, we specify where to save the transformed images. Make sure this directory exists! We also prefix the newly created images for convenience. The flow method would run infinitely, but for this example, we only generate three images. So when our counter reaches this value, we break the for loop. You can see the result below.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-102.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-width-shift">Width shift</h2>
<p>In the width_shift_range parameter, you specify the ratio of the original width by which the image can be shifted to the left or right. Again, the fill_mode will fill up the newly created empty pixels. For the remaining examples, I will only show how to instantiate the ImageDataGenerator with the respective parameter. The code to generate the images is the same as in the rotation example.</p>
<pre><code class="lang-python">imgGen = ImageDataGenerator(width_shift_range = <span class="hljs-number">0.2</span>)
</code></pre>
<p>In the transformed images we see that the image is shifted to the right. The empty pixels are filled which gives it a bit of a stretched look.</p>
<p>The same can be done for shifting up or down by specifying a value for the height_shift_range parameter.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-103.png" alt="Image" width="600" height="400" loading="lazy"></p>
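<p>To get a feeling for where the stretched look comes from, here is a simplified, one-dimensional sketch of a shift with nearest filling. It is my own illustration and not how Keras implements it (Keras picks the shift randomly and works on the whole image). The function name shift_row_right is hypothetical.</p>

```python
def shift_row_right(row, shift):
    """Shift a row of pixel values to the right, filling the gap 'nearest' style."""
    if shift <= 0 or not row:
        return list(row)
    shift = min(shift, len(row))
    # The vacated positions on the left repeat the first (nearest) pixel value
    return [row[0]] * shift + list(row[:len(row) - shift])


row = [10, 20, 30, 40, 50]
print(shift_row_right(row, 2))
```

<p>The repeated value at the start of the row is what shows up as smeared pixels in the transformed images.</p>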
<h2 id="heading-rescale">Rescale</h2>
<p>Rescaling an image multiplies the RGB values of each pixel by a chosen factor. Keras applies this factor after the other transformations. In our example, we apply min-max scaling by multiplying with 1/255. As a result, the values will range between 0 and 1. This makes the values smaller and easier for the model to process.</p>
<pre><code class="lang-python">imgGen = ImageDataGenerator(rescale = <span class="hljs-number">1.</span>/<span class="hljs-number">255</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-104.png" alt="Image" width="600" height="400" loading="lazy"></p>
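<p>The arithmetic behind the rescale parameter can be sketched in plain Python. The helper name rescale is an assumption for this illustration; it only mimics the multiplication, not the Keras generator.</p>

```python
def rescale(pixel, factor=1. / 255):
    """Multiply every channel value of a pixel by the rescale factor."""
    return [value * factor for value in pixel]


# The example pixel from earlier, and a pixel with extreme channel values
print(rescale([78, 136, 60]))
print(rescale([0, 255, 255]))
```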
<h2 id="heading-shear">Shear</h2>
<p>With the shear_range parameter, we can specify the maximum intensity of the shearing transformation, expressed as a shear angle. This transformation can produce rather weird images when the value is set too high. So don’t set it too high.</p>
<pre><code class="lang-python">imgGen = ImageDataGenerator(shear_range = <span class="hljs-number">0.2</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-106.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-zoom">Zoom</h2>
<p>This transformation will randomly zoom in or out on the picture. With a value of 0.2, the zoom factor is picked between 0.8 and 1.2. Just like the shear value, this value should not be exaggerated, to keep the images realistic.</p>
<pre><code class="lang-python">imgGen = ImageDataGenerator(zoom_range = <span class="hljs-number">0.2</span>)
</code></pre>
<h2 id="heading-horizontal-flip">Horizontal flip</h2>
<p>This transformation randomly flips an image horizontally. Life can be simple sometimes…</p>
<pre><code class="lang-python">imgGen = ImageDataGenerator(horizontal_flip = <span class="hljs-literal">True</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-107.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-all-transformations-combined">All transformations combined</h2>
<p>Now that we have seen the effect of each transformation separately, we apply all of them together.</p>
<pre><code class="lang-python">imgGen = ImageDataGenerator(
    rotation_range = <span class="hljs-number">40</span>,
    width_shift_range = <span class="hljs-number">0.2</span>,
    height_shift_range = <span class="hljs-number">0.2</span>,
    rescale = <span class="hljs-number">1.</span>/<span class="hljs-number">255</span>,
    shear_range = <span class="hljs-number">0.2</span>,
    zoom_range = <span class="hljs-number">0.2</span>,
    horizontal_flip = <span class="hljs-literal">True</span>)
i = <span class="hljs-number">1</span>
<span class="hljs-keyword">for</span> batch <span class="hljs-keyword">in</span> imgGen.flow(x, batch_size=<span class="hljs-number">1</span>, save_to_dir=<span class="hljs-string">'example_transformations'</span>, save_format=<span class="hljs-string">'jpeg'</span>, save_prefix=<span class="hljs-string">'all'</span>):
    i += <span class="hljs-number">1</span>
    <span class="hljs-keyword">if</span> i &gt; <span class="hljs-number">3</span>:
        <span class="hljs-keyword">break</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-108.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-setting-up-the-folder-structure">Setting up the folder structure</h2>
<p>We need to store these images in a specific folder structure. That way we can use the method flow_from_directory to augment the images and create the corresponding labels. This folder structure needs to look like this:</p>
<ul>
<li><strong>train</strong>
<ul>
<li>maniola_jurtina
<ul>
<li>0.jpg</li>
<li>1.jpg</li>
<li>…</li>
</ul>
</li>
<li>pyronia_tithonus
<ul>
<li>0.jpg</li>
<li>1.jpg</li>
<li>…</li>
</ul>
</li>
</ul>
</li>
<li><strong>validation</strong>
<ul>
<li>maniola_jurtina
<ul>
<li>0.jpg</li>
<li>1.jpg</li>
<li>…</li>
</ul>
</li>
<li>pyronia_tithonus
<ul>
<li>0.jpg</li>
<li>1.jpg</li>
<li>…</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>To create this folder structure I created a gist <a target="_blank" href="https://gist.github.com/bertcarremans/679624f369ed9270472e37f8333244f5">img_train_test_split.py</a>. Feel free to use it in your projects.</p>
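<p>For readers who want to see the idea without opening the gist, here is a rough sketch of such a split. It is not the code from the gist. The function name, the validation_ratio of 0.2 and the fixed seed are all assumptions, and the source folder is expected to contain one subfolder per species.</p>

```python
import os
import random
import shutil


def split_train_validation(src_folder, dst_folder, validation_ratio=0.2, seed=42):
    """Copy src/species/* into dst/train/species/ and dst/validation/species/."""
    rng = random.Random(seed)
    for species in sorted(os.listdir(src_folder)):
        species_path = os.path.join(src_folder, species)
        if not os.path.isdir(species_path):
            continue
        images = sorted(os.listdir(species_path))
        # Shuffle so the validation images are a random sample of the species
        rng.shuffle(images)
        nb_validation = int(len(images) * validation_ratio)
        for index, image in enumerate(images):
            subset = 'validation' if index < nb_validation else 'train'
            target = os.path.join(dst_folder, subset, species)
            os.makedirs(target, exist_ok=True)
            shutil.copy(os.path.join(species_path, image), target)
```

<p>The images are copied rather than moved, so the original downloads stay untouched.</p>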
<h2 id="heading-creating-the-generators">Creating the generators</h2>
<p>Just as before, we specify the configuration parameters for the training generator. The validation images will not be transformed like the training images. We only rescale their RGB values to make them smaller.</p>
<p>The flow_from_directory method takes the images from the train or validation folder and generates batches of 32 transformed images. By setting the class_mode to ‘binary’ a one-dimensional label is created based on the image’s folder name.</p>
<pre><code class="lang-python">train_datagen = ImageDataGenerator(
    rotation_range = <span class="hljs-number">40</span>,
    width_shift_range = <span class="hljs-number">0.2</span>,
    height_shift_range = <span class="hljs-number">0.2</span>,
    rescale = <span class="hljs-number">1.</span>/<span class="hljs-number">255</span>,
    shear_range = <span class="hljs-number">0.2</span>,
    zoom_range = <span class="hljs-number">0.2</span>,
    horizontal_flip = <span class="hljs-literal">True</span>)
validation_datagen = ImageDataGenerator(rescale=<span class="hljs-number">1.</span>/<span class="hljs-number">255</span>)
train_generator = train_datagen.flow_from_directory(
    <span class="hljs-string">'data/train'</span>,
    batch_size=<span class="hljs-number">32</span>,
    class_mode=<span class="hljs-string">'binary'</span>)
validation_generator = validation_datagen.flow_from_directory(
    <span class="hljs-string">'data/validation'</span>,
    batch_size=<span class="hljs-number">32</span>,
    class_mode=<span class="hljs-string">'binary'</span>)
</code></pre>
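<p>The labels follow from the folder names: the class subfolders are sorted alphanumerically and numbered from 0. The sketch below mimics that mapping in plain Python. The function here is my own stand-in; on a real generator the mapping can be inspected through its class_indices attribute.</p>

```python
def class_indices(species_folders):
    """Map each class folder name to an integer label, in alphanumeric order."""
    return {name: index for index, name in enumerate(sorted(species_folders))}


labels = class_indices(['pyronia_tithonus', 'maniola_jurtina'])
print(labels)
```

<p>With class_mode set to binary, these integers are the 0 and 1 that the sigmoid output will be compared against.</p>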
<h2 id="heading-what-about-different-image-sizes">What about different image sizes?</h2>
<p>The Flickr API lets you download images of specific sizes. However, in real-world applications the image sizes are not always constant. If the aspect ratio of the images is the same, we can simply resize the images. Otherwise, we can crop the images. Unfortunately, it is difficult to crop the image while keeping the object we want to classify intact.</p>
<p>Keras can deal with different image sizes. When configuring the model you can specify None for the width and height in input_shape.</p>
<pre><code class="lang-python">input_shape=(<span class="hljs-number">3</span>, <span class="hljs-literal">None</span>, <span class="hljs-literal">None</span>)  <span class="hljs-comment"># Theano</span>
input_shape=(<span class="hljs-literal">None</span>, <span class="hljs-literal">None</span>, <span class="hljs-number">3</span>)  <span class="hljs-comment"># Tensorflow</span>
</code></pre>
<p>I wanted to show that it is possible to work with different image sizes. However, it has some drawbacks:</p>
<ul>
<li>not all layers (e.g. Flatten) will work with None as an input dimension</li>
<li>it can be computationally heavy to run</li>
</ul>
<h1 id="heading-building-the-deep-learning-model">Building the deep learning model</h1>
<p>For the remainder of this article, I will discuss the structure of a convolutional neural network, illustrated with some examples for our butterfly project. At the end of this article, we’ll have our first classification results.</p>
<h2 id="heading-what-layers-does-a-convolutional-neural-network-consist-of">What layers does a convolutional neural network consist of?</h2>
<p>Of course, you can choose how many layers to add to your convolutional neural network (also called CNN or convnet), and of which type. In this project we will start with the following structure:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-111.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Let’s understand what each layer does and how we create them with Keras.</p>
<h2 id="heading-input-layer">Input layer</h2>
<p>The input layer receives the different versions of the images that were created via the transformations above. These images are then converted into a numerical representation, or a matrix.</p>
<p>The dimensions of this matrix will be width x height x number of (color) channels. For RGB images the number of channels will be three. For grayscale images, this is equal to one. Below you can see a numerical representation of a 7×7 RGB image.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-112.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>As our images are of size 75×75, we need to specify that in the input_shape parameter when adding the first convolutional layer.</p>
<pre><code class="lang-python">cnn = Sequential()
cnn.add(Conv2D(<span class="hljs-number">32</span>,(<span class="hljs-number">3</span>,<span class="hljs-number">3</span>), input_shape = (<span class="hljs-number">75</span> ,<span class="hljs-number">75</span> ,<span class="hljs-number">3</span>)))
</code></pre>
<h2 id="heading-convolutional-layer">Convolutional layer</h2>
<p>In the first layers, the convolutional neural network will look for lower-level features, like horizontal or vertical edges. The further we go in the network, the more it will look for higher-level features, such as the wing of a butterfly. But how does it detect features when it gets only numbers as input? That’s where filters come in.</p>
<h2 id="heading-filters-or-kernels">Filters (or kernels)</h2>
<p>You can think of a filter as a searchlight of a specific size that scans over the image. The filter example below has dimensions of 3x3x3 and contains weights that will detect a vertical edge. For a grayscale image, the dimensions would have been 3x3x1. Usually, a filter has smaller dimensions than the image we want to classify. 3×3, 5×5 or 7×7 are typically used. The third dimension should always be equal to the number of channels.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-113.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>While the filter scans the image, the RGB values are transformed. The filter does this by multiplying the RGB values under it with its weights. The multiplied values are then summed over all positions and channels into a single number per filter position. With our 7x7x3 image example and the 3x3x3 filter, this results in a 5x5x1 outcome.</p>
<p>The animation below illustrates this convolutional operation. For simplicity, we only look for a vertical edge in the Red channel. Thus, the weights for the Green and Blue channels are all equal to zero. But you should keep in mind that the multiplication results for these channels are added to the result of the Red channel.</p>
<p>As shown below the convolutional layer will produce numerical outcomes. When you have higher numbers, this means that the filter came across the feature it was looking for. In our example, a vertical edge.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/0_ykXVTApvty9Q0lAX-1.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We can specify that we want more than one filter. These filters could have their own feature to look for in an image. Suppose we use 32 filters of size 3x3x3. The result of all filters is stacked and we end up with a 5x5x32 volume in our example. In the code snippet above we added 32 filters of size 3x3x3.</p>
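<p>To make the operation concrete, here is a small pure-Python sketch for a square image without padding. The image values and the filter weights are made up for this illustration (they are not the ones from the figures), and the function name convolve is my own. In practice Keras does all of this for us.</p>

```python
def convolve(image, kernel, stride=1):
    """Slide a kernel over a square image (rows x cols x channels), no padding.

    Each output value is the sum, over all positions and channels, of the
    products between the kernel weights and the image patch under the kernel.
    """
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    output = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            total = 0
            for i in range(k):
                for j in range(k):
                    for ch in range(len(kernel[i][j])):
                        pixel = image[r * stride + i][c * stride + j]
                        total += pixel[ch] * kernel[i][j][ch]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 7x7x3 image: dark on the left, bright Red channel on the right
image = [[[0, 0, 0] if col < 4 else [255, 0, 0] for col in range(7)]
         for _ in range(7)]
# Vertical edge filter on the Red channel only; Green and Blue weights are zero
kernel = [[[-1, 0, 0], [0, 0, 0], [1, 0, 0]] for _ in range(3)]

result = convolve(image, kernel)
for row in result:
    print(row)
```

<p>The high numbers appear only where the filter sits on the transition from dark to bright, which is the vertical edge in this made-up image.</p>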
<h2 id="heading-stride">Stride</h2>
<p>In the example above we saw that the filter moves one pixel at a time. This is the so-called stride. We could increase the number of pixels the filter moves at each step. Increasing the stride will reduce the dimensions of the outcome much faster. In the example below, you see how the filter moves around with a stride of 2, which would result in a 3x3x1 outcome for a 3x3x3 filter and a 7x7x3 image.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/0_Ds4PLixAjvOMPF9j-1.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-padding">Padding</h2>
<p>By applying a filter, the dimensions of the original image are quickly reduced. In addition, the pixels at the edges of the image are used less often in the convolutional operation than the pixels in the center. The corner pixels are even used only once. This results in a loss of information. If you want to avoid that, you can specify padding. Padding adds “extra pixels” around the image.</p>
<p>Suppose we add padding of one pixel around the 7x7x3 image. This results in a 9x9x3 image. If we apply a 3x3x3 filter and a stride of 1, we end up with a 7x7x1 outcome. So, in that case, we preserve the dimensions of the original image and the outer pixels are used more than once.</p>
<p>You can calculate the resulting outcome of the convolutional operation with specific padding and stride as follows:</p>
<p><strong>1 + [(original dimension + padding x 2 - filter dimension) / stride size]</strong></p>
<p>For example, suppose we have this set-up of our conv layer:</p>
<ul>
<li>7x7x3 image</li>
<li>3x3x3 filter</li>
<li>padding of 1 pixel</li>
<li>stride of 2 pixels</li>
</ul>
<p>That will give 1 + [(7 + 1 x 2 - 3) / 2] = 1 + 3 = 4</p>
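<p>The formula is easy to check with a few lines of Python. The function name conv_output_size is an assumption. The floor division is my own addition for set-ups where the filter does not fit evenly.</p>

```python
def conv_output_size(original, filter_size, padding=0, stride=1):
    """Output width (or height) of a convolution, following the formula above."""
    # Floor division covers the case where the filter does not fit evenly
    return 1 + (original + 2 * padding - filter_size) // stride


print(conv_output_size(7, 3))                       # no padding, stride 1
print(conv_output_size(7, 3, stride=2))             # stride of 2
print(conv_output_size(7, 3, padding=1))            # padding of 1 pixel
print(conv_output_size(7, 3, padding=1, stride=2))  # padding 1, stride 2
```

<p>The four calls correspond to the examples from the filter, stride and padding sections.</p>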
<h2 id="heading-why-do-we-need-convolutional-layers">Why do we need convolutional layers?</h2>
<p>A benefit of using conv layers is that the number of parameters to estimate is much lower than with a normal hidden layer. Suppose we continue with our example image of 7x7x3 and a filter of 3x3x3 with no padding and stride of 1. The convolutional layer would have 3x3x3 weights + 1 bias = 28 parameters to estimate, because the same filter weights are reused at every position. In a fully connected network with 7x7x3 inputs and 5x5x1 neurons in the hidden layer, we would need to estimate 147 x 25 = 3,675 weights, plus 25 biases. Imagine what this number is when you have larger images…</p>
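<p>A short sketch of the counting: a filter has one weight per cell (filter height x filter width x channels) plus one bias, regardless of the image size, while a fully connected layer has one weight for every input-neuron pair plus one bias per neuron. The helper names below are hypothetical.</p>

```python
def conv_parameters(filter_size, channels, nb_filters=1):
    """Weights plus one bias per filter; independent of the image size."""
    return (filter_size * filter_size * channels + 1) * nb_filters


def dense_parameters(nb_inputs, nb_neurons):
    """One weight per input-neuron pair plus one bias per neuron."""
    return nb_inputs * nb_neurons + nb_neurons


print(conv_parameters(3, 3))                 # one 3x3x3 filter
print(conv_parameters(3, 3, nb_filters=32))  # 32 such filters
print(dense_parameters(7 * 7 * 3, 5 * 5))    # 147 inputs, 25 neurons
```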
<h2 id="heading-relu-layer">ReLu layer</h2>
<p>ReLu stands for Rectified Linear Unit. This layer adds nonlinearity to the network. The convolutional layer on its own is a linear layer, as it only sums up the multiplications of the filter weights and RGB values.</p>
<p>The outcome of a ReLu function is equal to zero for all values of x &lt;= 0. Otherwise, it is equal to the value of x. The code in Keras to add a ReLu layer is:</p>
<pre><code class="lang-python">cnn.add(Activation(<span class="hljs-string">'relu'</span>))
</code></pre>
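<p>The function itself fits in one line of plain Python. This is just an illustration of the math on single numbers, not what Keras runs internally.</p>

```python
def relu(x):
    """Return 0 for negative inputs or zero, and x itself otherwise."""
    return max(0, x)


print([relu(value) for value in [-765, -1, 0, 2, 765]])
```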
<h2 id="heading-pooling">Pooling</h2>
<p>Pooling aggregates the input volume in order to reduce the dimensions further. This speeds up computation, as the number of parameters to be estimated in the following layers is reduced. Besides that, it helps to avoid overfitting by making the network more robust. Below we illustrate max pooling with a size of 2×2 and stride of 2.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-115.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The code in Keras to add pooling with a size of 2×2 is:</p>
<pre><code class="lang-python">cnn.add(MaxPooling2D(pool_size = (<span class="hljs-number">2</span> ,<span class="hljs-number">2</span>)))
</code></pre>
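<p>As a sketch of what max pooling computes, here is a pure-Python version for a single channel. The 4x4 input is made up and does not reproduce the values from the figure. The function name max_pool is my own.</p>

```python
def max_pool(matrix, pool_size=2, stride=2):
    """Keep the largest value in each pool_size x pool_size window."""
    out_rows = (len(matrix) - pool_size) // stride + 1
    out_cols = (len(matrix[0]) - pool_size) // stride + 1
    return [
        [
            max(
                matrix[r * stride + i][c * stride + j]
                for i in range(pool_size)
                for j in range(pool_size)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Hypothetical 4x4 input
matrix = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(max_pool(matrix))
```

<p>A 4x4 input shrinks to 2x2, so only a quarter of the values is passed on to the next layer.</p>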
<h2 id="heading-fully-connected-layer">Fully connected layer</h2>
<p>At the end, the convnet is able to detect higher-level features in the input images. These can then serve as an input for a fully connected layer. Before we can do that, we flatten the output of the preceding layer. Flattening means we convert it to a vector. The vector values are then connected to all neurons in the fully connected layer. To do that in Python we use the following Keras functions:</p>
<pre><code class="lang-python">cnn.add(Flatten())        
cnn.add(Dense(<span class="hljs-number">64</span>))
</code></pre>
<h2 id="heading-dropout">Dropout</h2>
<p>Just like pooling, dropout can help to avoid overfitting. It randomly sets a specified fraction of the inputs to zero, during the training of the model. A dropout rate between 20 and 50% is considered to work well.</p>
<pre><code class="lang-python">cnn.add(Dropout(<span class="hljs-number">0.2</span>))
</code></pre>
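<p>Here is a rough sketch of what happens to the inputs during training. Each input is dropped with the given probability, and the remaining ones are scaled up by 1 / (1 - rate) so the expected sum stays the same. The function name and the seed argument are assumptions for this illustration, and the rate has to be below 1.</p>

```python
import random


def dropout(inputs, rate, seed=None):
    """Zero each input with probability rate, scale the rest by 1 / (1 - rate)."""
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else value * keep_scale
            for value in inputs]


activations = [1.0] * 10
print(dropout(activations, rate=0.2, seed=1))
```

<p>When the model makes predictions, dropout is switched off and all inputs pass through unchanged.</p>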
<h2 id="heading-sigmoid-activation">Sigmoid activation</h2>
<p>Because we want to produce a probability that the image is one of two butterfly species (i.e. binary classification), we can use a sigmoid activation layer.</p>
<pre><code class="lang-python">cnn.add(Activation(<span class="hljs-string">'relu'</span>))
cnn.add(Dense(<span class="hljs-number">1</span>))
cnn.add(Activation( <span class="hljs-string">'sigmoid'</span>))
</code></pre>
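<p>The sigmoid function squashes the output of the last dense layer into a value between 0 and 1. Below is a minimal sketch in plain Python. The threshold of 0.5 for turning the probability into a class is an assumption, and predict_class is a hypothetical helper.</p>

```python
import math


def sigmoid(x):
    """Squash any real number into a value between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative inputs this form avoids an overflow in exp
    e = math.exp(x)
    return e / (1.0 + e)


def predict_class(x, threshold=0.5):
    """Turn the sigmoid output into a 0/1 label (the threshold is an assumption)."""
    return 1 if sigmoid(x) >= threshold else 0


for value in [-4.0, 0.0, 4.0]:
    print(value, round(sigmoid(value), 3), predict_class(value))
```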
<h2 id="heading-applying-the-convolutional-neural-network-on-the-butterfly-images">Applying the convolutional neural network on the butterfly images</h2>
<p>Now we can define the complete convolutional neural network structure as displayed at the beginning of this post. First, we need to import the necessary Keras modules. Then we can start adding the layers that we explained above.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> keras.models <span class="hljs-keyword">import</span> Sequential
<span class="hljs-keyword">from</span> keras.layers <span class="hljs-keyword">import</span> Conv2D, MaxPooling2D
<span class="hljs-keyword">from</span> keras.layers <span class="hljs-keyword">import</span> Activation, Flatten, Dense, Dropout
<span class="hljs-keyword">from</span> keras.preprocessing.image <span class="hljs-keyword">import</span> ImageDataGenerator
<span class="hljs-keyword">import</span> time
IMG_SIZE = <span class="hljs-comment"># Replace with the size of your images</span>
NB_CHANNELS = <span class="hljs-comment"># 3 for RGB images or 1 for grayscale images</span>
BATCH_SIZE = <span class="hljs-comment"># Typical values are 8, 16 or 32</span>
NB_TRAIN_IMG = <span class="hljs-comment"># Replace with the total number of training images</span>
NB_VALID_IMG = <span class="hljs-comment"># Replace with the total number of validation images</span>
</code></pre>
<p>I made some additional parameters explicit for the conv layers. Here is a short explanation:</p>
<ul>
<li>kernel_size specifies the filter size. So for the first conv layer this is size 2×2</li>
<li>padding = ‘same’ means applying zero padding so that the original image size is preserved.</li>
<li>padding = ‘valid’ means we do not apply any padding.</li>
<li>data_format = ‘channels_last’ is just to specify that the number of color channels is specified last in the input_shape argument.</li>
</ul>
<pre><code class="lang-python">cnn = Sequential()
cnn.add(Conv2D(filters=<span class="hljs-number">32</span>, 
               kernel_size=(<span class="hljs-number">2</span>,<span class="hljs-number">2</span>), 
               strides=(<span class="hljs-number">1</span>,<span class="hljs-number">1</span>),
               padding=<span class="hljs-string">'same'</span>,
               input_shape=(IMG_SIZE,IMG_SIZE,NB_CHANNELS),
               data_format=<span class="hljs-string">'channels_last'</span>))
cnn.add(Activation(<span class="hljs-string">'relu'</span>))
cnn.add(MaxPooling2D(pool_size=(<span class="hljs-number">2</span>,<span class="hljs-number">2</span>),
                     strides=<span class="hljs-number">2</span>))
cnn.add(Conv2D(filters=<span class="hljs-number">64</span>,
               kernel_size=(<span class="hljs-number">2</span>,<span class="hljs-number">2</span>),
               strides=(<span class="hljs-number">1</span>,<span class="hljs-number">1</span>),
               padding=<span class="hljs-string">'valid'</span>))
cnn.add(Activation(<span class="hljs-string">'relu'</span>))
cnn.add(MaxPooling2D(pool_size=(<span class="hljs-number">2</span>,<span class="hljs-number">2</span>),
                     strides=<span class="hljs-number">2</span>))
cnn.add(Flatten())        
cnn.add(Dense(<span class="hljs-number">64</span>))
cnn.add(Activation(<span class="hljs-string">'relu'</span>))
cnn.add(Dropout(<span class="hljs-number">0.25</span>))
cnn.add(Dense(<span class="hljs-number">1</span>))
cnn.add(Activation(<span class="hljs-string">'sigmoid'</span>))
cnn.compile(loss=<span class="hljs-string">'binary_crossentropy'</span>, optimizer=<span class="hljs-string">'rmsprop'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])
</code></pre>
<p>Finally, we compile this network structure. We set the loss parameter to binary_crossentropy, which is suited for binary targets, and use accuracy as the evaluation metric.</p>
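<p>To see why binary cross-entropy suits binary targets, here is a small sketch of the calculation. The function is my own plain-Python version and the epsilon used for clipping is an assumption. The loss is low when the predicted probability is close to the true label and high when the model is confidently wrong.</p>

```python
import math


def binary_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Average negative log-likelihood of 0/1 labels given probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_pred):
        # Clip to avoid log(0) for predictions of exactly 0 or 1
        prob = min(max(prob, epsilon), 1.0 - epsilon)
        total += -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
    return total / len(y_true)


print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and right
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong
```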
<p>After having specified the network structure, we create the generators for the training and validation samples. On the training samples, we apply data augmentation as explained above. On the validation samples, we do not apply any augmentation as they are just used to evaluate the model performance.</p>
<pre><code class="lang-python">train_datagen = ImageDataGenerator(
    rotation_range = <span class="hljs-number">40</span>,                  
    width_shift_range = <span class="hljs-number">0.2</span>,                  
    height_shift_range = <span class="hljs-number">0.2</span>,                  
    rescale = <span class="hljs-number">1.</span>/<span class="hljs-number">255</span>,                  
    shear_range = <span class="hljs-number">0.2</span>,                  
    zoom_range = <span class="hljs-number">0.2</span>,                     
    horizontal_flip = <span class="hljs-literal">True</span>)
validation_datagen = ImageDataGenerator(rescale = <span class="hljs-number">1.</span>/<span class="hljs-number">255</span>)
train_generator = train_datagen.flow_from_directory(
    <span class="hljs-string">'../flickr/img/train'</span>,
    target_size=(IMG_SIZE,IMG_SIZE),
    class_mode=<span class="hljs-string">'binary'</span>,
    batch_size = BATCH_SIZE)
validation_generator = validation_datagen.flow_from_directory(
    <span class="hljs-string">'../flickr/img/validation'</span>,
    target_size=(IMG_SIZE,IMG_SIZE),
    class_mode=<span class="hljs-string">'binary'</span>,
    batch_size = BATCH_SIZE)
</code></pre>
<p>With the flow_from_directory method on the generators we can easily go through all the images in the specified directories.</p>
<p>Lastly, we can fit the convolutional neural network on the training data and evaluate with the validation data. The resulting weights of the model can be saved and reused later on.</p>
<pre><code class="lang-python">start = time.time()
cnn.fit_generator(
    train_generator,
    steps_per_epoch=NB_TRAIN_IMG//BATCH_SIZE,
    epochs=<span class="hljs-number">50</span>,
    validation_data=validation_generator,
    validation_steps=NB_VALID_IMG//BATCH_SIZE)
end = time.time()
print(<span class="hljs-string">'Processing time:'</span>,(end - start)/<span class="hljs-number">60</span>)
cnn.save_weights(<span class="hljs-string">'cnn_baseline.h5'</span>)
</code></pre>
<p>The number of epochs is arbitrarily set to 50. An epoch is one full pass over the training images. Within an epoch, each batch goes through a cycle of forward propagation, checking the error and then adjusting the weights during backpropagation.</p>
<p>The steps_per_epoch is set to the number of training images divided by the batch size (by the way, the double division symbol makes sure the result is an integer and not a float). The same goes for the validation_steps parameter. Specifying a batch size greater than 1 will speed up the process.</p>
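<p>A quick illustration of the double division symbol, with made-up image counts (these are not the sizes of the butterfly dataset). The helper mirrors the expression used in the fit_generator call.</p>

```python
def steps_per_epoch(nb_images, batch_size):
    """Number of full batches that fit in one pass over the images."""
    return nb_images // batch_size


# Hypothetical counts: 1000 images in batches of 32
print(steps_per_epoch(1000, 32))              # integer result
print(1000 / 32)                              # a single slash gives a float
print(1000 - steps_per_epoch(1000, 32) * 32)  # images left over
```

<p>The images that do not fill a complete batch are simply not counted in the number of steps.</p>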
<h2 id="heading-results">Results</h2>
<p>After running 50 epochs, we have a training accuracy of 0.8091 and validation accuracy of 0.7359. So the convolutional neural network still suffers from a fair amount of overfitting. We also see that the validation accuracy varies quite a lot. This is because we have a small set of validation samples. It would be better to do k-fold cross-validation for each evaluation round, but that would take quite some time.</p>
<p>To address the overfitting we could:</p>
<ul>
<li>increase the dropout rate</li>
<li>apply dropout at each layer</li>
<li>find more training data</li>
</ul>
<p>We’ll look into the first two options and monitor the result. The results of our first model will serve as a baseline. After applying an extra dropout layer and increasing the dropout rates, the model is a bit less overfitted.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/image-116.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>I hope you’ve all enjoyed reading this post and learned something new. The full code is available on <a target="_blank" href="https://github.com/bertcarremans/Vlindervinder">GitHub</a>. Cheers!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to install TensorFlow and Keras using Anaconda Navigator — without the command line ]]>
                </title>
                <description>
                    <![CDATA[ By Ekapope Viriyakovithya Say no to pip install in the command line! Here's an alternative way to install TensorFlow on your local machine in 3 steps. _Photo by [Unsplash](https://unsplash.com/@kowalikus?utm_source=ghost&utm_medium=referral&utm_camp... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/install-tensorflow-and-keras-using-anaconda-navigator-without-command-line/</link>
                <guid isPermaLink="false">66d45e404a7504b7409c3386</guid>
                
                    <category>
                        <![CDATA[ anaconda ]]>
                    </category>
                
                    <category>
                        <![CDATA[ keras ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ TensorFlow ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 24 Jul 2019 10:15:00 +0000</pubDate>
                <media:content url="https://cdn-media-2.freecodecamp.org/w1280/5f9ca140740569d1a4ca4d8c.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Ekapope Viriyakovithya</p>
<p>Say no to pip install in the command line! Here's an alternative way to install TensorFlow on your local machine in 3 steps.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/04/image-239.png" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://unsplash.com/@kowalikus?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Krzysztof Kowalik</a> / <a target="_blank" href="https://unsplash.com/?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Unsplash</a></em></p>
<h1 id="heading-why-am-i-writing-this">Why am I writing this?</h1>
<p>I played around with pip install with multiple configurations for several hours, trying to figure out how to properly set up my Python environment for TensorFlow and Keras.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/image-31.png" alt="Image" width="600" height="400" loading="lazy">
<em>why is tensorflow so hard to install — 600k+ results</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/image-32.png" alt="Image" width="600" height="400" loading="lazy">
<em>unable to install tensorflow on windows site:stackoverflow.com — 26k+ results</em></p>
<h1 id="heading-just-before-i-gave-up-i-found-this">Just before I gave up, I found this…</h1>
<p><em>“<a target="_blank" href="https://www.anaconda.com/tensorflow-in-anaconda/?source=post_page---------------------------">One key benefit of installing TensorFlow using conda rather than pip is a result of the conda package management system. When TensorFlow is installed using conda, conda installs all the necessary and compatible dependencies for the packages as well.</a>”</em></p>
<p>This article will walk you through how to install TensorFlow and Keras using the GUI version of Anaconda. I assume you have already downloaded and installed <a target="_blank" href="https://www.anaconda.com/distribution/?source=post_page---------------------------">Anaconda Navigator</a>.</p>
<h1 id="heading-lets-get-started">Let’s get started!</h1>
<ol>
<li>Launch Anaconda Navigator. Go to the Environments tab and click ‘Create’.</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/image-33.png" alt="Image" width="600" height="400" loading="lazy">
<em>Go to ‘Environments tab’, click ‘Create’</em></p>
<ol start="2">
<li>Input a new environment name - I put ‘tensorflow_env’. <strong>Make sure to select Python 3.6 here!</strong> Then click ‘Create’. This may take a few minutes.</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/image-34.png" alt="Image" width="600" height="400" loading="lazy">
<em>make sure to select Python 3.6</em></p>
<ol start="3">
<li>In your new ‘tensorflow_env’ environment, select ‘Not installed’, and type in ‘tensorflow’. Then, tick ‘tensorflow’ and click ‘Apply’. A pop-up window will appear; go ahead and apply. This may take several minutes.</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/image-35.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Do the same for ‘keras’.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/image-36.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Check your installation by importing the packages. If everything is okay, the command will return nothing. If the installation was unsuccessful, you will get an error.</p>
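<p>If you prefer a check that prints a message instead of staying silent, a sketch like the following could be run in the new environment. The helper <code>is_installed</code> is a hypothetical name. It only locates the packages, so a real <code>import</code> remains the stronger test:</p>

```python
import importlib.util

def is_installed(package_name):
    """Return True if the top-level package can be located in this environment."""
    return importlib.util.find_spec(package_name) is not None

for name in ("tensorflow", "keras"):
    print(name, "found" if is_installed(name) else "NOT found")
```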
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/image-37.png" alt="Image" width="600" height="400" loading="lazy">
<em>no error pop up — Yeah!</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/image-38.png" alt="Image" width="600" height="400" loading="lazy">
<em>You can also try with Spyder.</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/image-39.png" alt="Image" width="600" height="400" loading="lazy">
<em>no error pop up — Yeah!</em></p>
<p>And…Ta-da! It’s done! You can follow <a target="_blank" href="https://towardsdatascience.com/how-to-build-a-neural-network-with-keras-e8faa33d0ae4?source=post_page---------------------------">this article</a> to test your newly installed packages :)</p>
<hr>
<p>Thank you for reading. Please give it a try, and let me know your feedback!</p>
<p>Consider following me on <a target="_blank" href="https://github.com/ekapope?source=post_page---------------------------">GitHub</a>, <a target="_blank" href="https://medium.com/@ekapope.v?source=post_page---------------------------">Medium</a>, and <a target="_blank" href="https://twitter.com/EkapopeV?source=post_page---------------------------">Twitter</a> to get more articles and tutorials on your feed if you like what I did. :)</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Object Detection in Colab with Fizyr Retinanet ]]>
                </title>
                <description>
                    <![CDATA[ By RomRoc Let’s continue our journey to explore the best machine learning frameworks in computer vision. In the first article we explored object detection with the official Tensorflow APIs. The second article was dedicated to an excellent framework f... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/object-detection-in-colab-with-fizyr-retinanet-efed36ac4af3/</link>
                <guid isPermaLink="false">66c35c29cf1314a450f0d731</guid>
                
                    <category>
                        <![CDATA[ Google Colab ]]>
                    </category>
                
                    <category>
                        <![CDATA[ keras ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 04 Apr 2019 17:56:59 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*g5nzQWVR79PK2vyznKgPAA.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By RomRoc</p>
<p>Let’s continue our journey to explore the best machine learning frameworks in computer vision.</p>
<p>In the <a target="_blank" href="https://hackernoon.com/object-detection-in-google-colab-with-custom-dataset-5a7bb2b0e97e">first article</a> we explored object detection with the official Tensorflow APIs. The <a target="_blank" href="https://hackernoon.com/instance-segmentation-in-google-colab-with-custom-dataset-b3099ac23f35">second article</a> was dedicated to an excellent framework for instance segmentation, Matterport Mask R-CNN based on Keras.</p>
<p>In this article we examine the <strong>Keras implementation of RetinaNet object detection developed by <a target="_blank" href="https://github.com/fizyr/keras-retinanet">Fizyr</a></strong>. RetinaNet, as described in <a target="_blank" href="https://arxiv.org/abs/1708.02002">Focal Loss for Dense Object Detection</a>, is the state of the art for object detection.<br>The object to detect with the trained model will be my little goat Rosa.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/gzJo8LgsXIrXkN2K65AGcJ6cfANG7XtNzsob" alt="Image" width="596" height="453" loading="lazy">
<em>Object detection with Fizyr</em></p>
<p><strong>The colab notebook and dataset are available in <a target="_blank" href="https://github.com/RomRoc/objdet_fizyr_colab">my Github repo</a>.</strong></p>
<p>In this article, we go through all the steps in a single Google Colab notebook to train a model starting from a custom dataset.</p>
<p>We will keep in mind these principles:</p>
<ul>
<li>illustrate how to make the annotation dataset</li>
<li>describe all the steps in a single Notebook</li>
<li>use free software, Google Colab and Google Drive, so it’s based exclusively on <strong><em>free cloud resources</em></strong></li>
</ul>
<p>At the end of the article you will be surprised by the simplicity of use and the good results we will obtain through this object detection framework.</p>
<p><em>Despite its ease of use, Fizyr is a great framework, also used by the <a target="_blank" href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/70421"><strong>winner</strong></a> <strong>of the Kaggle competition</strong> “RSNA Pneumonia Detection Challenge”.</em></p>
<h3 id="heading-making-the-dataset">Making the dataset</h3>
<p>We start by creating annotations for the training and validation dataset, using the tool <a target="_blank" href="https://github.com/tzutalin/labelImg"><strong>LabelImg</strong></a>. This excellent annotation tool lets you quickly annotate the bounding boxes of the objects used to train the machine learning model.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/Id6MpAH6MV52QtprI-i9IcXRn7tIle0GQfsR" alt="Image" width="800" height="434" loading="lazy">
<em>LabelImg annotation tool</em></p>
<p>LabelImg creates annotations in Pascal VOC format, so we need to convert the annotations to Fizyr format:</p>
<ul>
<li>create a zip file containing training dataset images and annotations with the same filename (check my example dataset in Github)</li>
</ul>
<pre><code>objdet_dataset.zip
|- img1.jpg
|- img1.xml
|- img2.jpg
|- img2.xml
...
</code></pre><ul>
<li>Upload the zip file to Google Drive, get its Drive file id, and substitute it for the DATASET_DRIVEID value</li>
<li>Run the cell that iterates over the xml files and creates the annotations.csv file</li>
</ul>
<p><em>Note: you can see <a target="_blank" href="https://stackoverflow.com/a/48855034/9250875">my answer</a> on Stackoverflow to get the Drive file id.</em></p>
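<p>The conversion cell is not reproduced here, but the idea can be sketched with the standard library alone. The function name <code>voc_to_rows</code> and the sample annotation are hypothetical. The row layout <code>path,x1,y1,x2,y2,class_name</code> follows the CSV format described in the keras-retinanet README as I understand it:</p>

```python
import csv
import io
import xml.etree.ElementTree as ET

def voc_to_rows(xml_text, image_path):
    """Turn one Pascal VOC annotation into rows of path,x1,y1,x2,y2,class_name."""
    root = ET.fromstring(xml_text)
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append([
            image_path,
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
            obj.findtext("name"),
        ])
    return rows

# Made-up annotation for illustration only.
sample_xml = """<annotation>
  <filename>img1.jpg</filename>
  <object>
    <name>goat</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

rows = voc_to_rows(sample_xml, "img1.jpg")
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue().strip())  # img1.jpg,10,20,110,220,goat
```

<p>In the notebook the same loop would run over every xml file in the unzipped dataset and write the rows to annotations.csv.</p>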
<h3 id="heading-model-training">Model training</h3>
<p>Model training is the core of the notebook. Fizyr offers various parameters, described in <a target="_blank" href="https://github.com/fizyr/keras-retinanet/blob/c841da27f540084d27e971b6d00c178ff005d344/keras_retinanet/bin/train.py#L358">Github</a>, to run and optimize this step.</p>
<p>It’s a good option to start from a pretrained model instead of training a model from scratch. Fizyr released a model based on ResNet50 architecture, pretrained on Coco dataset.</p>
<pre><code>URL_MODEL = <span class="hljs-string">'https://github.com/fizyr/keras-retinanet/releases/download/0.5.0/resnet50_coco_best_v2.1.0.h5'</span>
</code></pre><p>We can even use our own pretrained model and continue training from it. This option is particularly useful: we can train for some epochs, save the model in Google Drive, and later restart the training from the saved model. In this way we can bypass the 12-hour execution limit in Colab and train the model for many epochs.</p>
<p>From my tests, high values of batch_size and steps give better results, but they greatly increase the execution time of each epoch.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/PntGODQ4dBvWoaqJGrEErgXfKuOiBRnGE8D8" alt="Image" width="800" height="751" loading="lazy">
<em>Tensorboard training charts</em></p>
<p>We can start training from our custom dataset with:</p>
<pre><code>!keras_retinanet/bin/train.py --freeze-backbone --random-transform --weights {PRETRAINED_MODEL} --batch-size <span class="hljs-number">8</span> --steps <span class="hljs-number">500</span> --epochs <span class="hljs-number">10</span> csv annotations.csv classes.csv
</code></pre><p>Let’s analyze each argument passed to the script train.py.</p>
<ul>
<li>freeze-backbone: freeze the backbone layers, particularly useful when we use a small dataset, to avoid overfitting</li>
<li>random-transform: randomly transform the dataset to get data augmentation</li>
<li>weights: initialize the model with a pretrained model (your own model or one released by Fizyr)</li>
<li>batch-size: training batch size, higher value gives smoother learning curve</li>
<li>steps: number of steps per epoch</li>
<li>epochs: number of epochs to train</li>
<li>csv: annotations files generated by the script above</li>
</ul>
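<p>To get a feel for what these values imply, here is a rough back-of-the-envelope sketch. It only multiplies the arguments from the command above, and the timing is taken from the first epoch of the training log:</p>

```python
batch_size = 8   # --batch-size
steps = 500      # --steps
epochs = 10      # --epochs

# Images drawn per epoch and over the whole run (with repeats on a small dataset).
samples_per_epoch = batch_size * steps
total_samples = samples_per_epoch * epochs
print(samples_per_epoch)  # 4000
print(total_samples)      # 40000

# Epoch 1 in the training log took 1314 s for 500 steps.
seconds_per_step = round(1314 / steps, 2)
print(seconds_per_step)   # 2.63
```

<p>Doubling either batch-size or steps roughly doubles the work per epoch, which matches the longer execution times mentioned earlier.</p>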
<p>The training process output contains a description of layers and loss metrics during training, and as you can see, loss metrics decrease during each epoch:</p>
<pre><code>Using TensorFlow backend.
...
Layer (type)                    Output Shape         Param #     Connected to
input_1 (InputLayer)            (None, None, None, <span class="hljs-number">3</span> <span class="hljs-number">0</span>
padding_conv1 (ZeroPadding2D)   (None, None, None, <span class="hljs-number">3</span> <span class="hljs-number">0</span>           input_1[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>]
...
Total params: <span class="hljs-number">36</span>,<span class="hljs-number">382</span>,<span class="hljs-number">957</span>
Trainable params: <span class="hljs-number">12</span>,<span class="hljs-number">821</span>,<span class="hljs-number">805</span>
Non-trainable params: <span class="hljs-number">23</span>,<span class="hljs-number">561</span>,<span class="hljs-number">152</span>
None
Epoch <span class="hljs-number">1</span>/<span class="hljs-number">10</span>
<span class="hljs-number">500</span>/<span class="hljs-number">500</span> [==============================] - <span class="hljs-number">1314</span>s <span class="hljs-number">3</span>s/step - loss: <span class="hljs-number">1.0659</span> - regression_loss: <span class="hljs-number">0.6996</span> - classification_loss: <span class="hljs-number">0.3663</span>
Epoch <span class="hljs-number">2</span>/<span class="hljs-number">10</span>
<span class="hljs-number">500</span>/<span class="hljs-number">500</span> [==============================] - <span class="hljs-number">1296</span>s <span class="hljs-number">3</span>s/step - loss: <span class="hljs-number">0.6747</span> - regression_loss: <span class="hljs-number">0.5698</span> - classification_loss: <span class="hljs-number">0.1048</span>
Epoch <span class="hljs-number">3</span>/<span class="hljs-number">10</span>
<span class="hljs-number">500</span>/<span class="hljs-number">500</span> [==============================] - <span class="hljs-number">1304</span>s <span class="hljs-number">3</span>s/step - loss: <span class="hljs-number">0.5763</span> - regression_loss: <span class="hljs-number">0.5010</span> - classification_loss: <span class="hljs-number">0.0753</span>
</code></pre><pre><code>Epoch <span class="hljs-number">3</span>/<span class="hljs-number">10</span>
<span class="hljs-number">500</span>/<span class="hljs-number">500</span> [==============================] - <span class="hljs-number">1257</span>s <span class="hljs-number">3</span>s/step - loss: <span class="hljs-number">0.5705</span> - regression_loss: <span class="hljs-number">0.4974</span> - classification_loss: <span class="hljs-number">0.0732</span>
</code></pre><h3 id="heading-inference">Inference</h3>
<p>The last step performs inference on test images with the trained model.<br>The Fizyr framework allows us to perform inference using a CPU, even if we trained the model with a GPU. This feature is important in typical production environments, where people usually opt for less expensive hardware infrastructure without GPUs for inference.</p>
<p>Let’s examine the following lines in detail:</p>
<pre><code>model_path = os.path.join(<span class="hljs-string">'snapshots'</span>, sorted(os.listdir(<span class="hljs-string">'snapshots'</span>), reverse=True)[<span class="hljs-number">0</span>])
print(model_path)
</code></pre><pre><code># load retinanet model
model = models.load_model(model_path, backbone_name=<span class="hljs-string">'resnet50'</span>)
model = models.convert_model(model)
</code></pre><p>The first line sets the model file to the last model generated by the training process in the snapshots directory. Then the model is loaded from the filesystem and converted to run inference.</p>
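<p>Picking the "last" model with a reverse sort relies on the snapshot filenames sorting in epoch order. The sketch below uses hypothetical filenames and assumes zero-padded epoch numbers, which is what makes string order match epoch order:</p>

```python
# Hypothetical snapshot names, assuming zero-padded epoch numbers.
snapshots = ["resnet50_csv_02.h5", "resnet50_csv_10.h5", "resnet50_csv_09.h5"]

latest = sorted(snapshots, reverse=True)[0]
print(latest)  # resnet50_csv_10.h5

# Without zero padding, string order no longer matches epoch order.
unpadded = ["resnet50_csv_2.h5", "resnet50_csv_10.h5", "resnet50_csv_9.h5"]
latest_unpadded = sorted(unpadded, reverse=True)[0]
print(latest_unpadded)  # resnet50_csv_9.h5
```

<p>If your own snapshot names are not padded, sorting by file modification time would be a safer choice.</p>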
<p>You can change the value of THRES_SCORE, which is the confidence threshold a detection must reach to be shown.</p>
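<p>The effect of the threshold can be sketched in plain Python. The detections below are made up and the tuple layout is an assumption for illustration. In the notebook the boxes, scores, and labels come from the model's predictions:</p>

```python
THRES_SCORE = 0.8  # example value; tune it for your own model

# Made-up detections as (box, score, label) tuples.
detections = [
    ((34, 50, 210, 300), 0.93, "goat"),
    ((300, 40, 380, 120), 0.81, "goat"),
    ((5, 5, 60, 70), 0.42, "goat"),
]

# Keep only detections whose confidence reaches the threshold.
shown = [d for d in detections if d[1] >= THRES_SCORE]
print(len(shown))  # 2
```

<p>A lower threshold shows more boxes but lets in more false detections, while a higher one does the opposite.</p>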
<p><img src="https://cdn-media-1.freecodecamp.org/images/mkkUoWpQY5-4mpXzEacDzy7bqP1QfGaVqUXZ" alt="Image" width="596" height="453" loading="lazy">
<em>Object detection inference</em></p>
<h3 id="heading-conclusions">Conclusions</h3>
<p>We went through the complete journey of performing object detection with the Fizyr implementation of RetinaNet. We created a dataset, trained a model, and ran inference (<a target="_blank" href="https://github.com/RomRoc/objdet_fizyr_colab">here</a> is my Github repo for the notebook and dataset).</p>
<p>I was impressed by the following aspects of this excellent framework:</p>
<ul>
<li>this framework is <strong>easy to use</strong> to get good inference, even without much customization</li>
<li>it was <strong>simple to transform annotations</strong> to Fizyr’s dataset format, compared to other frameworks.</li>
</ul>
<p>In general Fizyr is a good choice to start an object detection project, in particular if you need to quickly get good results.</p>
<p>If you enjoyed this article, leave a few claps, it will encourage me to explore further machine learning opportunities :)</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to set up NSFW content detection with Machine Learning ]]>
                </title>
                <description>
                    <![CDATA[ By Gant Laborde Teaching a machine to recognize indecent content wasn’t difficult in retrospect, but it sure was tough the first time through. Here are some lessons learned, and some tips and tricks I uncovered while building an NSFW model. Though th... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-set-up-nsfw-content-detection-with-machine-learning-229a9725829c/</link>
                <guid isPermaLink="false">66d45edfc7632f8bfbf1e42e</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ keras ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 20 Mar 2019 16:01:08 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/0*auWeZYXZjFkr33e6" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Gant Laborde</p>
<p>Teaching a machine to recognize indecent content wasn’t difficult in retrospect, but it sure was tough the first time through.</p>
<p>Here are some lessons learned, and some tips and tricks I uncovered while building an NSFW model.</p>
<p>Though there are lots of ways this could have been implemented, the hope of this post is to provide a friendly narrative so that others can understand what this process can look like.</p>
<p>If you’re new to ML, this will inspire you to train a model. If you’re familiar with it, I’d love to hear how you would have gone about building this model and ask you to share your code.</p>
<h3 id="heading-the-plan">The Plan:</h3>
<ol>
<li>Get lots and lots of data</li>
<li>Label and clean the data</li>
<li>Use Keras and transfer learning</li>
<li>Refine your model</li>
</ol>
<h3 id="heading-get-lots-and-lots-of-data">Get lots and lots of data</h3>
<p><a target="_blank" href="https://github.com/alexkimxyz/nsfw_data_scraper">Fortunately, a really cool set of scraping scripts was released for an NSFW dataset</a>. The code is simple and already comes with labeled data categories. This means that just accepting this data scraper’s defaults will give us 5 categories pulled from hundreds of subreddits.</p>
<p>The instructions are quite simple: you can just run the 6 friendly scripts. Pay attention to them, as you may decide to change things up.</p>
<p>If you have more subreddits that you’d like to add, you should edit the source URLs before running step 1.</p>
<blockquote>
<p>E.g. — If you were to add a new source of neutral examples, you’d add to the subreddit list in <code>nsfw_data_scraper/scripts/source_urls/neutral.txt</code>.</p>
</blockquote>
<p>Reddit is a great resource of content around the web, since most subreddits are slightly policed by humans to be on target for that subreddit.</p>
<h3 id="heading-label-and-clean-the-data">Label and clean the data</h3>
<p>The data we got from the NSFW data scraper is already labeled! But expect some errors. Especially since Reddit isn’t perfectly curated.</p>
<p>Duplication is also quite common, but fixable without slow human comparison.</p>
<p>The first thing I like to run is <code>duplicate-file-finder</code>, which is the fastest tool I’ve found for matching and deleting exact duplicate files. It’s written in Python.</p>
<p><a target="_blank" href="https://github.com/Qarj/duplicate-file-finder"><strong>Qarj/duplicate-file-finder</strong></a><br><a target="_blank" href="https://github.com/Qarj/duplicate-file-finder"><em>Find duplicate files. Contribute to Qarj/duplicate-file-finder development by creating an account on GitHub.</em> github.com</a></p>
<p>I can generally get a majority of duplicates knocked out with this command.</p>
<pre><code>python dff.py --path train/path --<span class="hljs-keyword">delete</span>
</code></pre><p>Now, this doesn’t catch images that are “essentially” the same. For that, I advocate using <a target="_blank" href="https://macpaw.com/gemini">a Macpaw tool called “Gemini 2”.</a></p>
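<p>For the exact-match pass, the general idea can be sketched with content hashes. This is not how <code>dff.py</code> is implemented, just a minimal illustration. The function name <code>find_exact_duplicates</code> is hypothetical and the demo runs on throwaway files:</p>

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def find_exact_duplicates(folder):
    """Group files under folder by SHA-256 digest; return groups with 2+ files."""
    by_digest = defaultdict(list)
    for dirpath, _, filenames in os.walk(folder):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]

# Small demonstration on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    for name, content in [("a.jpg", b"same"), ("b.jpg", b"same"), ("c.jpg", b"other")]:
        with open(os.path.join(tmp, name), "wb") as handle:
            handle.write(content)
    groups = find_exact_duplicates(tmp)
    duplicate_names = [[os.path.basename(p) for p in group] for group in groups]
    print(duplicate_names)  # [['a.jpg', 'b.jpg']]
```

<p>A hash only matches byte-identical files, which is why resized or re-encoded copies need a visual comparison tool.</p>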
<p><img src="https://cdn-media-1.freecodecamp.org/images/bDP2lY9uBf6Kk2f-EKjXCpabDnTl35uPuVIP" alt="Image" width="800" height="483" loading="lazy"></p>
<p>While this looks super simple, don’t forget to dig into the automatic duplicates, and select ALL the duplicates until your Gemini screen declares “Nothing Remaining” like so:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/ips7SCI7zqpXXdmvSTZJCl5eInzHNRbBWcuT" alt="Image" width="800" height="479" loading="lazy"></p>
<p>It’s safe to say this can take an extreme amount of time if you have a huge dataset. Personally, I ran it on each classification before I ran it on the parent folder in order to keep reasonable runtimes.</p>
<h3 id="heading-use-keras-and-transfer-learning">Use Keras and transfer learning</h3>
<p>I’ve looked at Tensorflow, Pytorch, and raw Python as ways to build a machine learning model from scratch. But I’m not looking to discover something new, I want to effectively do something pre-existing. So I went pragmatic.</p>
<p>I found Keras to be the most practical API for writing a simple model. Even Tensorflow agrees and is currently <a target="_blank" href="https://medium.com/tensorflow/standardizing-on-keras-guidance-on-high-level-apis-in-tensorflow-2-0-bad2b04c819a">working to be more Keras-like</a>. Also, with only one graphics card, I’m going to grab a popular pre-existing model + weights, and simply train on top of it with some transfer learning.</p>
<p>After a little research, I chose <a target="_blank" href="https://cloud.google.com/tpu/docs/inception-v3-advanced">Inception v3</a> weighted with <a target="_blank" href="http://image-net.org/about-overview">imagenet</a>. To me, that's like going to the pre-existing ML store and buying the Aston Martin. We’ll just shave off the top layer so we can use that model to our needs.</p>
<pre><code class="lang-py">conv_base = InceptionV3(    
  weights=<span class="hljs-string">'imagenet'</span>,     
  include_top=<span class="hljs-literal">False</span>,     
  input_shape=(height, width, num_channels)
)
</code></pre>
<p><img src="https://cdn-media-1.freecodecamp.org/images/Cf05CV83hyD1eVnXBMTvHpoIumCDdE5hUSeW" alt="Image" width="200" height="200" loading="lazy"></p>
<p>With the model in place, I added 3 more layers: a hidden layer of 256 neurons, followed by a hidden layer of 128 neurons, followed by a final layer of 5 neurons. The last one performs the ultimate classification into the five final classes, moderated by softmax.</p>
<pre><code class="lang-py"><span class="hljs-comment"># Add 256</span>
x = Dense(<span class="hljs-number">256</span>, activation=<span class="hljs-string">'relu'</span>, kernel_initializer=initializers.he_normal(seed=<span class="hljs-literal">None</span>), kernel_regularizer=regularizers.l2(<span class="hljs-number">.0005</span>))(x)
x = Dropout(<span class="hljs-number">0.5</span>)(x)
<span class="hljs-comment"># Add 128</span>
x = Dense(<span class="hljs-number">128</span>,activation=<span class="hljs-string">'relu'</span>, kernel_initializer=initializers.he_normal(seed=<span class="hljs-literal">None</span>))(x)
x = Dropout(<span class="hljs-number">0.25</span>)(x)
<span class="hljs-comment"># Add 5</span>
predictions = Dense(<span class="hljs-number">5</span>,  kernel_initializer=<span class="hljs-string">"glorot_uniform"</span>, activation=<span class="hljs-string">'softmax'</span>)(x)
</code></pre>
<p>Visually this code turns into this:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/lc7FWvMTeY6fic-6PoAO9iM2k4ZE7Ljynb6-" alt="Image" width="369" height="850" loading="lazy"></p>
<p>Some of the above might seem odd. After all, it’s not every day you say “glorot_uniform”. But strange words aside, my new hidden layers are being regularized to prevent overfitting.</p>
<p>I’m using dropout, which will randomly remove neural pathways so no one feature dominates the model.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/vuHimlxoXmGv4HdmwjMHjRnb11fEQ-cFGaz6" alt="Image" width="800" height="443" loading="lazy">
<em>Too soon?</em></p>
<p>Additionally, I’ve added L2 regularization to the first layer as well.</p>
<p>Now that the model is done, I augmented my dataset with some generated agitation. I rotated, shifted, cropped, sheared, zoomed, flipped, and channel shifted my training images. This helps ensure the model learns to cope with common noise in the images.</p>
<p>All the above systems are meant to prevent overfitting the model on the training data. Even if it is a ton of data, I want to keep the model as generalizable to new data as possible.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/Pq6cIzsEIcKcYmiMQekBd1zsPKtUchvrNwqF" alt="Image" width="674" height="667" loading="lazy">
<em>I gotchu model!</em></p>
<p>After running this for a long time, I got around 87% accuracy on the model! That’s a pretty good version one! Let’s make it great.</p>
<h3 id="heading-refine-your-model">Refine your model</h3>
<h4 id="heading-basic-fine-tuning">Basic fine-tuning</h4>
<p>Once the new layers are trained up, you can unlock some deeper layers in your Inception model for retraining. The following code unlocks everything from the layer <code>conv2d_56</code> onward.</p>
<pre><code class="lang-py">set_trainable = <span class="hljs-literal">False</span>
<span class="hljs-keyword">for</span> layer <span class="hljs-keyword">in</span> conv_base.layers:    
    <span class="hljs-keyword">if</span> layer.name == <span class="hljs-string">'conv2d_56'</span>:
        set_trainable = <span class="hljs-literal">True</span>
    <span class="hljs-keyword">if</span> set_trainable:
        layer.trainable = <span class="hljs-literal">True</span>
    <span class="hljs-keyword">else</span>:
        layer.trainable = <span class="hljs-literal">False</span>
</code></pre>
<p>I ran the model for a long time with these newly unlocked layers, and once I added exponential decay (via a scheduled learning rate), the model converged on a 91% accuracy on my test data!</p>
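<p>The exact schedule is not given here, so the following is only a sketch of what exponential decay means, with hypothetical values for the initial learning rate and decay rate:</p>

```python
def exponential_decay(initial_lr, decay_rate, epoch):
    """Learning rate after `epoch` epochs of exponential decay."""
    return initial_lr * decay_rate ** epoch

# Hypothetical values; the article does not state the ones actually used.
for epoch in range(5):
    print(epoch, round(exponential_decay(0.001, 0.9, epoch), 6))
```

<p>In Keras, a function of this kind can be handed to the <code>LearningRateScheduler</code> callback so the learning rate shrinks as training goes on.</p>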
<p>With 300,000 images, finding mistakes in the training data was impossible. But with a model with only 9% error, I could break down the errors by category, and then I could look at only around 5,400 images! Essentially, I could use the model to help me find misclassifications and clean the dataset!</p>
<p>Technically, this would find false negatives only and do nothing about bias in the false positives. But for something that detects NSFW content, I imagine recall is more important than precision.</p>
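<p>Here is a minimal sketch of using predictions to build a per-category review list. The records and category names are made up for illustration. The last line checks that the 5,400 figure is consistent with 9% of 300,000 images split over five categories:</p>

```python
from collections import defaultdict

# Made-up (filename, scraped_label, predicted_label) records.
records = [
    ("img_001.jpg", "neutral", "neutral"),
    ("img_002.jpg", "neutral", "drawing"),
    ("img_003.jpg", "drawing", "drawing"),
    ("img_004.jpg", "drawing", "neutral"),
    ("img_005.jpg", "neutral", "drawing"),
]

# Collect only the images where the model disagrees with the scraped label.
to_review = defaultdict(list)
for filename, label, predicted in records:
    if label != predicted:
        to_review[label].append(filename)

for label, files in sorted(to_review.items()):
    print(label, files)

print(round(300_000 * 0.09), round(300_000 * 0.09 / 5))  # 27000 5400
```

<p>Each disagreement is either a model mistake or a mislabeled image, and a human only has to look at this much smaller list to tell which.</p>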
<h4 id="heading-the-most-important-part-of-refining">The most important part of refining</h4>
<p>Even if you have a lot of test data, it’s usually pulled from the same well. The best test is to make it easy for others to use and check your model. This works best in open source and simple demos. I released <a target="_blank" href="http://nsfwjs.com">http://nsfwjs.com</a> which helped the community identify bias, and the community did just that!</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/ij7fLu-tghGePVI0E-da-xHaG2lwhK0Hkiwe" alt="Image" width="800" height="320" loading="lazy"></p>
<p>The community got two interesting indicators of bias fairly quickly. The fun one was that <a target="_blank" href="https://shift.infinite.red/machine-learning-has-opinions-about-jeff-goldblum-strong-opinions-5438447ead35">Jeffrey Goldblum kept getting miscategorized</a>, and the not-so-fun one was that the model was overly sensitive to females.</p>
<p>Once you start getting into hundreds of thousands of images, it’s hard for one person (like <em>moi</em>) to identify where an issue might be. Even if I looked through a thousand images in detail for bias, I wouldn’t have even scratched the surface of the dataset as a whole.</p>
<p><em>That’s why it’s important to speak up.</em> Misclassifying Jeff Goldblum is an entertaining data point, but identifying, documenting, and filing a ticket with examples does something powerful and good. I was able to get to work on fixing the bias.</p>
<p>With new images, improved training, and better validation I was able to retrain the model over a few weeks and attain a much better outcome. The resulting model was far more accurate in the wild. Well, unless you laughed as hard as I did about the Jeff Goldblum issue.</p>
<p><strong>If I could manufacture one flaw… I’d keep Jeff.</strong> But alas, we have hit 93% accuracy!</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/avOLUcFEzFYhgsl7AirD9r5CC-PZdq3DtaEe" alt="Image" width="640" height="480" loading="lazy"></p>
<h3 id="heading-in-summary">In Summary</h3>
<p>It might have taken a lot of time, but it wasn’t hard, and it was fun to build a model. I suggest you grab the source code and try it for yourself! I’ll probably even attempt to retrain the model with other frameworks for comparison.</p>
<blockquote>
<p>Show me what you’ve got. Contribute, or star/watch the repo if you’d like to see progress: <a target="_blank" href="https://github.com/GantMan/nsfw_model">https://github.com/GantMan/nsfw_model</a></p>
</blockquote>
<p><img src="https://cdn-media-1.freecodecamp.org/images/ruqh7LQQn8zhBYOaEmCVq76aF0eRD4eYHgG0" alt="Image" width="799" height="649" loading="lazy"></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/70liozeM4alstSXP78IIk8SbNCiiOFrnasPF" alt="Image" width="800" height="71" loading="lazy"></p>
<p>Gant Laborde is Chief Technology Strategist at <a target="_blank" href="http://infinite.red">Infinite Red</a>, a published author, adjunct professor, worldwide public speaker, and mad scientist in training. Follow/<a target="_blank" href="https://twitter.com/GantLaborde">tweet</a> or visit him <a target="_blank" href="http://gantlaborde.com/">at a conference</a>.</p>
<h4 id="heading-have-a-minute-check-out-a-few-more">Have a minute? Check out a few more:</h4>
<p><a target="_blank" href="https://shift.infinite.red/avoid-nightmares-nsfw-js-ab7b176978b1"><strong>Avoid Nightmares — NSFW JS</strong></a><br><em>Client-side indecent content checking for the soul</em> (shift.infinite.red)</p>
<p><a target="_blank" href="https://shift.infinite.red/5-things-that-suck-about-remote-work-506b98dd38f9"><strong>5 Things that Suck about Remote Work</strong></a><br><em>The Pitfalls of Remote Work + Proposed Solutions</em> (shift.infinite.red)</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
