<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ regularization - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ regularization - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 24 May 2026 22:25:24 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/regularization/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Handle Overfitting in Deep Learning Models ]]>
                </title>
                <description>
                    <![CDATA[ By Bert Carremans Overfitting occurs when you achieve a good fit of your model on the training data, but it does not generalize well on new, unseen data. In other words, the model learned patterns specific to the training data, which are irrelevant i... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/handling-overfitting-in-deep-learning-models/</link>
                <guid isPermaLink="false">66d45dd9bc9760a197a10351</guid>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ keras ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Overfitting ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ regularization ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Sun, 05 Jan 2020 22:36:48 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/01/1_XXtWMdH-YVL8z1VtnfG_iw.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Bert Carremans</p>
<p>Overfitting occurs when you achieve a good fit of your model on the training data, but it does not generalize well to new, unseen data. In other words, the model learned patterns specific to the training data, which do not hold in other data.</p>
<p>We can identify overfitting by looking at validation metrics like loss or accuracy. Usually, the validation metric stops improving after a certain number of epochs and then starts to get worse: the validation loss rises, or the validation accuracy drops. The training metric continues to improve because the model seeks to find the best fit for the training data.</p>
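<p>As a rough illustration of what to look for in these curves, the sketch below takes per-epoch training and validation losses as plain Python lists, similar to what Keras stores in <code>history.history['loss']</code> and <code>history.history['val_loss']</code>. It reports three things: the epoch with the lowest validation loss, how much higher the final validation loss is than that minimum, and the gap between validation and training loss at the end. The helper name <code>overfitting_report</code> and the loss values are hypothetical and only serve as an illustration.</p>
<pre><code class="lang-python">def overfitting_report(train_loss, val_loss):
    # Epoch (1-based) with the lowest validation loss; list.index returns
    # the first occurrence, like np.argmin
    best_epoch = val_loss.index(min(val_loss)) + 1
    # How much higher the final validation loss is than its minimum
    rise_after_best = val_loss[-1] - min(val_loss)
    # Validation minus training loss at the last epoch; a growing gap
    # means the model fits the training data much better than new data
    final_gap = val_loss[-1] - train_loss[-1]
    return best_epoch, rise_after_best, final_gap


# Hypothetical loss curves: training loss keeps falling,
# validation loss bottoms out at epoch 3 and then climbs
train_loss = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val_loss = [0.80, 0.65, 0.60, 0.63, 0.70, 0.78]

best_epoch, rise_after_best, final_gap = overfitting_report(train_loss, val_loss)
print(best_epoch, round(rise_after_best, 2), round(final_gap, 2))
</code></pre>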
<p>There are several ways in which we can reduce overfitting in deep learning models. The best option is to <strong>get more training data</strong>. Unfortunately, in real-world situations, you often do not have this possibility due to time, budget, or technical constraints.</p>
<p>Another way to reduce overfitting is to <strong>lower the capacity of the model to memorize the training data</strong>. As such, the model will need to focus on the relevant patterns in the training data, which results in better generalization. In this post, we’ll discuss three options to achieve this.</p>
<h1 id="heading-set-up-the-project">Set up the project</h1>
<p>We start by importing the necessary packages and configuring some parameters. We will use <a target="_blank" href="https://keras.io/">Keras</a> to fit the deep learning models. The training data is the <a target="_blank" href="https://www.kaggle.com/crowdflower/twitter-airline-sentiment">Twitter US Airline Sentiment data set from Kaggle</a>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Basic packages</span>
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd 
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> re
<span class="hljs-keyword">import</span> collections
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">from</span> pathlib <span class="hljs-keyword">import</span> Path
<span class="hljs-comment"># Packages for data preparation</span>
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> nltk.corpus <span class="hljs-keyword">import</span> stopwords  <span class="hljs-comment"># requires the corpus: nltk.download('stopwords')</span>
<span class="hljs-keyword">from</span> keras.preprocessing.text <span class="hljs-keyword">import</span> Tokenizer
<span class="hljs-keyword">from</span> keras.utils <span class="hljs-keyword">import</span> to_categorical
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> LabelEncoder
<span class="hljs-comment"># Packages for modeling</span>
<span class="hljs-keyword">from</span> keras <span class="hljs-keyword">import</span> models
<span class="hljs-keyword">from</span> keras <span class="hljs-keyword">import</span> layers
<span class="hljs-keyword">from</span> keras <span class="hljs-keyword">import</span> regularizers
NB_WORDS = <span class="hljs-number">10000</span>  <span class="hljs-comment"># Parameter indicating the number of words we'll put in the dictionary</span>
NB_START_EPOCHS = <span class="hljs-number">20</span>  <span class="hljs-comment"># Number of epochs we usually start to train with</span>
BATCH_SIZE = <span class="hljs-number">512</span>  <span class="hljs-comment"># Size of the batches used in the mini-batch gradient descent</span>
MAX_LEN = <span class="hljs-number">20</span>  <span class="hljs-comment"># Maximum number of words in a sequence</span>
root = Path(<span class="hljs-string">'../'</span>)
input_path = root / <span class="hljs-string">'input/'</span> 
output_path = root / <span class="hljs-string">'output/'</span>
source_path = root / <span class="hljs-string">'source/'</span>
</code></pre>
<h1 id="heading-some-helper-functions">Some helper functions</h1>
<p>We will use some helper functions throughout this post.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">deep_model</span>(<span class="hljs-params">model, X_train, y_train, X_valid, y_valid</span>):</span>
    <span class="hljs-string">'''
    Function to train a multi-class model. The number of epochs and 
    batch_size are set by the constants at the top of the
    notebook. 

    Parameters:
        model : model with the chosen architecture
        X_train : training features
        y_train : training target
        X_valid : validation features
        y_valid : validation target
    Output:
        model training history
    '''</span>
    model.compile(optimizer=<span class="hljs-string">'rmsprop'</span>
                  , loss=<span class="hljs-string">'categorical_crossentropy'</span>
                  , metrics=[<span class="hljs-string">'accuracy'</span>])

    history = model.fit(X_train
                       , y_train
                       , epochs=NB_START_EPOCHS
                       , batch_size=BATCH_SIZE
                       , validation_data=(X_valid, y_valid)
                       , verbose=<span class="hljs-number">0</span>)
    <span class="hljs-keyword">return</span> history
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">eval_metric</span>(<span class="hljs-params">model, history, metric_name</span>):</span>
    <span class="hljs-string">'''
    Function to evaluate a trained model on a chosen metric. 
    Training and validation metric are plotted in a
    line chart for each epoch.

    Parameters:
        model : trained model (its name is used in the title)
        history : model training history
        metric_name : loss or accuracy
    Output:
        line chart with epochs on the x-axis and the metric on
        the y-axis
    '''</span>
    metric = history.history[metric_name]
    val_metric = history.history[<span class="hljs-string">'val_'</span> + metric_name]
    e = range(<span class="hljs-number">1</span>, NB_START_EPOCHS + <span class="hljs-number">1</span>)
    plt.plot(e, metric, <span class="hljs-string">'bo'</span>, label=<span class="hljs-string">'Train '</span> + metric_name)
    plt.plot(e, val_metric, <span class="hljs-string">'b'</span>, label=<span class="hljs-string">'Validation '</span> + metric_name)
    plt.xlabel(<span class="hljs-string">'Epoch number'</span>)
    plt.ylabel(metric_name)
    plt.title(<span class="hljs-string">'Comparing training and validation '</span> + metric_name + <span class="hljs-string">' for '</span> + model.name)
    plt.legend()
    plt.show()
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_model</span>(<span class="hljs-params">model, X_train, y_train, X_test, y_test, epoch_stop</span>):</span>
    <span class="hljs-string">'''
    Function to test the model on new data after training it
    on the full training data with the optimal number of epochs.
    Note that fit() continues from the model's current weights, so a
    model that was already trained is trained further, not from scratch.

    Parameters:
        model : compiled model
        X_train : training features
        y_train : training target
        X_test : test features
        y_test : test target
        epoch_stop : optimal number of epochs
    Output:
        test accuracy and test loss
    '''</span>
    model.fit(X_train
              , y_train
              , epochs=epoch_stop
              , batch_size=BATCH_SIZE
              , verbose=<span class="hljs-number">0</span>)
    results = model.evaluate(X_test, y_test)
    print()
    print(<span class="hljs-string">'Test accuracy: {0:.2f}%'</span>.format(results[<span class="hljs-number">1</span>]*<span class="hljs-number">100</span>))
    <span class="hljs-keyword">return</span> results

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">remove_stopwords</span>(<span class="hljs-params">input_text</span>):</span>
    <span class="hljs-string">'''
    Function to remove English stopwords from a string.
    It is applied to each tweet with Series.apply.

    Parameters:
        input_text : text to clean
    Output:
        cleaned string
    '''</span>
    stopwords_list = stopwords.words(<span class="hljs-string">'english'</span>)
    <span class="hljs-comment"># Some words which might indicate a certain sentiment are kept via a whitelist</span>
    whitelist = [<span class="hljs-string">"n't"</span>, <span class="hljs-string">"not"</span>, <span class="hljs-string">"no"</span>]
    words = input_text.split() 
    clean_words = [word <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> words <span class="hljs-keyword">if</span> (word <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> stopwords_list <span class="hljs-keyword">or</span> word <span class="hljs-keyword">in</span> whitelist) <span class="hljs-keyword">and</span> len(word) &gt; <span class="hljs-number">1</span>] 
    <span class="hljs-keyword">return</span> <span class="hljs-string">" "</span>.join(clean_words) 

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">remove_mentions</span>(<span class="hljs-params">input_text</span>):</span>
    <span class="hljs-string">'''
    Function to remove mentions, preceded by @, from a string.
    It is applied to each tweet with Series.apply.

    Parameters:
        input_text : text to clean
    Output:
        cleaned string
    '''</span>
    <span class="hljs-keyword">return</span> re.sub(<span class="hljs-string">r'@\w+'</span>, <span class="hljs-string">''</span>, input_text)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">compare_models_by_metric</span>(<span class="hljs-params">model_1, model_2, model_hist_1, model_hist_2, metric</span>):</span>
    <span class="hljs-string">'''
    Function to compare a metric between two models 

    Parameters:
        model_1 : first model (its name is used in the legend)
        model_2 : second model (its name is used in the legend)
        model_hist_1 : training history of model 1
        model_hist_2 : training history of model 2
        metric : metric to compare: loss, val_loss, and acc/val_acc
                 (older Keras) or accuracy/val_accuracy (newer Keras)

    Output:
        plot of metrics of both models
    '''</span>
    metric_model_1 = model_hist_1.history[metric]
    metric_model_2 = model_hist_2.history[metric]
    e = range(<span class="hljs-number">1</span>, NB_START_EPOCHS + <span class="hljs-number">1</span>)

    metrics_dict = {
        <span class="hljs-string">'acc'</span> : <span class="hljs-string">'Training Accuracy'</span>,
        <span class="hljs-string">'accuracy'</span> : <span class="hljs-string">'Training Accuracy'</span>,
        <span class="hljs-string">'loss'</span> : <span class="hljs-string">'Training Loss'</span>,
        <span class="hljs-string">'val_acc'</span> : <span class="hljs-string">'Validation accuracy'</span>,
        <span class="hljs-string">'val_accuracy'</span> : <span class="hljs-string">'Validation accuracy'</span>,
        <span class="hljs-string">'val_loss'</span> : <span class="hljs-string">'Validation loss'</span>
    }

    metric_label = metrics_dict[metric]
    plt.plot(e, metric_model_1, <span class="hljs-string">'bo'</span>, label=model_1.name)
    plt.plot(e, metric_model_2, <span class="hljs-string">'b'</span>, label=model_2.name)
    plt.xlabel(<span class="hljs-string">'Epoch number'</span>)
    plt.ylabel(metric_label)
    plt.title(<span class="hljs-string">'Comparing '</span> + metric_label + <span class="hljs-string">' between models'</span>)
    plt.legend()
    plt.show()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">optimal_epoch</span>(<span class="hljs-params">model_hist</span>):</span>
    <span class="hljs-string">'''
    Function to return the epoch number where the validation loss is
    at its minimum

    Parameters:
        model_hist : training history of model
    Output:
        epoch number with minimum validation loss
    '''</span>
    min_epoch = np.argmin(model_hist.history[<span class="hljs-string">'val_loss'</span>]) + <span class="hljs-number">1</span>
    print(<span class="hljs-string">"Minimum validation loss reached in epoch {}"</span>.format(min_epoch))
    <span class="hljs-keyword">return</span> min_epoch
</code></pre>
<h1 id="heading-data-preparation">Data preparation</h1>
<h2 id="heading-data-cleaning">Data cleaning</h2>
<p>We load the CSV with the tweets and perform a random shuffle. It’s good practice to shuffle the data before splitting it into a train and test set, so that the sentiment classes are spread roughly evenly over both sets. We’ll only keep the <strong>text</strong> column as input and the <strong>airline_sentiment</strong> column as the target.</p>
<p>Next, we <strong>remove stopwords</strong>. Stopwords carry little value for predicting the sentiment, although we keep a few negations such as “not” and “no” via a whitelist. Furthermore, as we want to build a model that can be used for other airline companies as well, we <strong>remove the mentions</strong>.</p>
<pre><code class="lang-python">df = pd.read_csv(input_path / <span class="hljs-string">'Tweets.csv'</span>)
df = df.reindex(np.random.permutation(df.index))  
df = df[[<span class="hljs-string">'text'</span>, <span class="hljs-string">'airline_sentiment'</span>]]
df.text = df.text.apply(remove_stopwords).apply(remove_mentions)
</code></pre>
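<p>The self-contained sketch below shows what the two cleaning steps do to a single tweet by repeating the logic of <code>remove_stopwords</code> and <code>remove_mentions</code>. The stopword set is a tiny hypothetical stand-in for NLTK’s English list. Like that list, it includes “not” and “no”, which is why the whitelist is needed. The tweet is made up.</p>
<pre><code class="lang-python">import re

# Tiny stand-in for stopwords.words('english'); illustration only
STOPWORDS = {"my", "was", "on", "the", "not", "no"}
WHITELIST = {"n't", "not", "no"}


def clean_tweet(text):
    # Step 1, as in remove_stopwords: drop stopwords unless whitelisted,
    # and drop one-character words (split() never yields empty strings)
    words = text.split()
    kept = [w for w in words
            if (w not in STOPWORDS or w in WHITELIST) and len(w) != 1]
    text = " ".join(kept)
    # Step 2, as in remove_mentions: strip @mentions
    return re.sub(r"@\w+", "", text)


cleaned = clean_tweet("@united my flight was not on time")
print(repr(cleaned))
</code></pre>
<p>Removing the mention leaves a leading space. This is harmless here, because the Keras Tokenizer discards empty tokens when it splits the text.</p>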
<h2 id="heading-train-test-split">Train-Test split</h2>
<p>We need to evaluate the model on a separate test set, so that we can estimate how well it generalizes to unseen data. We hold out 10% of the tweets with the <strong>train_test_split</strong> function of scikit-learn.</p>
<pre><code class="lang-python">X_train, X_test, y_train, y_test = train_test_split(df.text, df.airline_sentiment, test_size=<span class="hljs-number">0.1</span>, random_state=<span class="hljs-number">37</span>)
</code></pre>
<h2 id="heading-converting-words-to-numbers">Converting words to numbers</h2>
<p>To use the text as input for a model, we first need to convert the words into tokens: integers that refer to an index in a dictionary. Here we only keep the most frequent words in the training set, up to the limit set by NB_WORDS.</p>
<p>We clean up the text by applying <strong>filters</strong> that strip punctuation and by converting the words to <strong>lowercase</strong>. Words are separated by spaces.</p>
<pre><code class="lang-python">tk = Tokenizer(num_words=NB_WORDS,
               filters=<span class="hljs-string">'!"#$%&amp;()*+,-./:;&lt;=&gt;?@[\\]^_`{|}~\t\n'</span>,
               lower=<span class="hljs-literal">True</span>,
               char_level=<span class="hljs-literal">False</span>,
               split=<span class="hljs-string">' '</span>)
tk.fit_on_texts(X_train)
</code></pre>
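<p>Conceptually, <code>fit_on_texts</code> counts how often each word occurs and ranks the words by frequency, so the most frequent word gets index 1. The sketch below imitates that ranking in plain Python. It is a simplification: it only lowercases and splits on whitespace, without the punctuation filters. The example tweets are hypothetical. In Keras, <code>num_words=NB_WORDS</code> then restricts the later conversions to the most common <code>NB_WORDS - 1</code> words, because index 0 is never assigned to a word.</p>
<pre><code class="lang-python">from collections import Counter


def build_word_index(texts):
    # Count every lowercased, whitespace-separated word
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common() sorts by descending count; ties keep first-seen order
    ranked = [word for word, _ in counts.most_common()]
    # Index 0 is left unused, as in the Keras Tokenizer
    return {word: i + 1 for i, word in enumerate(ranked)}


word_index = build_word_index(["Late flight again", "flight delayed", "great flight crew"])
print(word_index)
</code></pre>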
<p>After creating the dictionary, we can convert the text of each tweet into a vector of NB_WORDS values. With <strong>mode=binary</strong>, each value indicates whether the corresponding word appears in the tweet (1) or not (0). This is done with the <strong>texts_to_matrix</strong> method of the Tokenizer.</p>
<pre><code class="lang-python">X_train_oh = tk.texts_to_matrix(X_train, mode=<span class="hljs-string">'binary'</span>)
X_test_oh = tk.texts_to_matrix(X_test, mode=<span class="hljs-string">'binary'</span>)
</code></pre>
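<p>The binary encoding can be sketched in the same way. The function below is a simplified, hypothetical stand-in for <code>texts_to_matrix(..., mode='binary')</code>. For each text it builds one row of <code>num_words</code> zeros, then sets a 1 in the column of every known word whose index falls below <code>num_words</code>. A word that appears twice still yields a single 1, and column 0 stays empty.</p>
<pre><code class="lang-python">def texts_to_binary_matrix(texts, word_index, num_words):
    matrix = []
    for text in texts:
        row = [0.0] * num_words
        for word in text.lower().split():
            index = word_index.get(word)
            # Skip unknown words and words ranked outside the num_words limit
            if index is not None and index in range(1, num_words):
                row[index] = 1.0
        matrix.append(row)
    return matrix


# Hypothetical dictionary, as produced by a fitted tokenizer
word_index = {"flight": 1, "late": 2, "again": 3, "delayed": 4}
matrix = texts_to_binary_matrix(["Late flight late again", "delayed great crew"], word_index, num_words=4)
print(matrix)
</code></pre>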
<h2 id="heading-converting-the-target-classes-to-numbers">Converting the target classes to numbers</h2>
<p>We also need to convert the target classes to numbers. The <strong>LabelEncoder</strong> of scikit-learn maps each class to an integer, which we then one-hot encode with the <strong>to_categorical</strong> function in Keras.</p>
<pre><code class="lang-python">le = LabelEncoder()
y_train_le = le.fit_transform(y_train)
y_test_le = le.transform(y_test)
y_train_oh = to_categorical(y_train_le)
y_test_oh = to_categorical(y_test_le)
</code></pre>
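<p>The plain-Python sketch below imitates these two steps to show what the encoded targets look like. <code>LabelEncoder</code> assigns integers to the classes in sorted order, so negative, neutral and positive become 0, 1 and 2. <code>to_categorical</code> then turns each integer into a one-hot row. The helper names and sample labels are hypothetical.</p>
<pre><code class="lang-python">def label_encode(labels):
    # Like LabelEncoder: classes are the sorted unique labels
    classes = sorted(set(labels))
    lookup = {label: code for code, label in enumerate(classes)}
    return classes, [lookup[label] for label in labels]


def one_hot(codes, num_classes):
    # Like to_categorical: a row of zeros with a 1.0 at the class index
    return [[1.0 if j == code else 0.0 for j in range(num_classes)] for code in codes]


y_sample = ["neutral", "negative", "positive", "negative"]
classes, codes = label_encode(y_sample)
y_sample_oh = one_hot(codes, len(classes))
print(classes, codes)
print(y_sample_oh)
</code></pre>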
<h2 id="heading-splitting-off-a-validation-set">Splitting off a validation set</h2>
<p>Now that our data is ready, we split off a validation set. This validation set will be used to evaluate the model performance when we tune the parameters of the model.</p>
<pre><code class="lang-python">X_train_rest, X_valid, y_train_rest, y_valid = train_test_split(X_train_oh, y_train_oh, test_size=<span class="hljs-number">0.1</span>, random_state=<span class="hljs-number">37</span>)
</code></pre>
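<p>Because the validation set is carved out of the training portion, the final shares are smaller than the two <code>test_size</code> values suggest. The short calculation below, with a hypothetical helper name, gives roughly 81% of the tweets for training, 9% for validation, and 10% for testing. Because scikit-learn rounds the sample counts, the actual set sizes can differ by a row or so.</p>
<pre><code class="lang-python">def split_shares(test_size=0.1, valid_size=0.1):
    # First split: test_size of all tweets goes to the test set
    test = test_size
    # Second split: valid_size of the remaining tweets goes to validation
    valid = (1 - test_size) * valid_size
    train = (1 - test_size) * (1 - valid_size)
    return train, valid, test


train, valid, test = split_shares()
print(round(train, 2), round(valid, 2), round(test, 2))
</code></pre>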
<h1 id="heading-deep-learning">Deep learning</h1>
<h2 id="heading-creating-a-model-that-overfits">Creating a model that overfits</h2>
<p>We start with a model that overfits. It has two densely connected hidden layers of 64 units each. The <strong>input_shape</strong> for the first layer is equal to the number of words we kept in the dictionary and for which we created one-hot-encoded features.</p>
<p>As we need to predict 3 different sentiment classes, the last layer has 3 elements. The <strong>softmax</strong> activation function makes sure the three probabilities sum up to 1.</p>
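<p>To make the role of softmax concrete, the plain-Python sketch below turns three hypothetical output scores into probabilities. It then computes the categorical cross-entropy against a one-hot target, which is the loss that <code>deep_model</code> compiles with. The sketch shows the underlying math only; Keras’s own implementation also handles batches and numerical edge cases.</p>
<pre><code class="lang-python">import math


def softmax(scores):
    # Subtracting the largest score avoids overflow and leaves the result unchanged
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(target, probs):
    # With a one-hot target, only the probability of the true class contributes
    return -sum(t * math.log(p) for t, p in zip(target, probs))


# Hypothetical scores for the classes negative, neutral, positive
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([1.0, 0.0, 0.0], probs)
print([round(p, 3) for p in probs], round(loss, 3))
</code></pre>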
<p>The number of parameters to train in a dense layer is computed as <strong>(nb inputs x nb units) + nb bias terms</strong>, with one bias term per unit. The number of inputs for the first layer equals NB_WORDS, the number of words we kept in the dictionary. Each subsequent layer takes the outputs of the previous layer as its inputs. So the number of parameters per layer is:</p>
<ul>
<li>First layer : (10000 x 64) + 64 = 640064</li>
<li>Second layer : (64 x 64) + 64 = 4160</li>
<li>Last layer : (64 x 3) + 3 = 195</li>
</ul>
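<p>The same arithmetic can be written as a small helper. Summing the three layers gives 644,419 trainable parameters for the baseline model. The helper name is our own and only serves as an illustration.</p>
<pre><code class="lang-python">def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias term per unit
    return n_inputs * n_units + n_units


NB_WORDS = 10000
# Input size followed by the number of units in each Dense layer
layer_sizes = [NB_WORDS, 64, 64, 3]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [640064, 4160, 195] 644419
</code></pre>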
<pre><code class="lang-python">base_model = models.Sequential()
base_model.add(layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>, input_shape=(NB_WORDS,)))
base_model.add(layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>))
base_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
base_model.name = <span class="hljs-string">'Baseline model'</span>
</code></pre>
<p>Because this project is a multi-class, single-label prediction, we use <strong>categorical_crossentropy</strong> as the loss function and <strong>softmax</strong> as the final activation function. We fit the model on the train data and validate on the validation set. We run for a predetermined number of epochs and will see when the model starts to overfit.</p>
<pre><code class="lang-python">base_history = deep_model(base_model, X_train_rest, y_train_rest, X_valid, y_valid)
base_min = optimal_epoch(base_history)
eval_metric(base_model, base_history, <span class="hljs-string">'loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_ZwKhGQkYF3FqQlhe.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>In the beginning, the <strong>validation loss</strong> goes down. But at epoch 3 this stops and the validation loss starts increasing rapidly. This is when the model begins to overfit.</p>
<p>The <strong>training loss</strong> continues to go down and almost reaches zero at epoch 20. This is normal as the model is trained to fit the train data as well as possible.</p>
<h1 id="heading-handling-overfitting">Handling overfitting</h1>
<p>Now, we can try to do something about the overfitting. There are different options to do that.</p>
<ul>
<li><strong>Reduce the network’s capacity</strong> by removing layers or reducing the number of elements in the hidden layers</li>
<li>Apply <strong>regularization</strong>, which comes down to adding a cost to the loss function for large weights</li>
<li>Use <strong>Dropout layers</strong>, which will randomly remove certain features by setting them to zero</li>
</ul>
<h2 id="heading-reducing-the-networks-capacity">Reducing the network’s capacity</h2>
<p>Our first model has a large number of trainable parameters. The higher this number, the more easily the model can memorize the target class for each training sample. This does not help it generalize to new data.</p>
<p>By lowering the capacity of the network, you force it to learn the patterns that matter or that minimize the loss. On the other hand, reducing the network’s capacity too much will lead to <strong>underfitting</strong>. The model will not be able to learn the relevant patterns in the train data.</p>
<p>We reduce the network’s capacity by removing one hidden layer and lowering the number of elements in the remaining layer to 16.</p>
<pre><code class="lang-python">reduced_model = models.Sequential()
reduced_model.add(layers.Dense(<span class="hljs-number">16</span>, activation=<span class="hljs-string">'relu'</span>, input_shape=(NB_WORDS,)))
reduced_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
reduced_model.name = <span class="hljs-string">'Reduced model'</span>
reduced_history = deep_model(reduced_model, X_train_rest, y_train_rest, X_valid, y_valid)
reduced_min = optimal_epoch(reduced_history)
eval_metric(reduced_model, reduced_history, <span class="hljs-string">'loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_ZDi9EJn6dORo4LCY.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We can see that it takes more epochs before the reduced model starts overfitting. Its validation loss also goes up more slowly than that of our first model.</p>
<pre><code class="lang-python">compare_models_by_metric(base_model, reduced_model, base_history, reduced_history, <span class="hljs-string">'val_loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_W8RSZtaBR-SDIGn5.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>When we compare it with the validation loss of the baseline model, it is clear that the reduced model starts overfitting at a later epoch. Its validation loss also stays lower for much longer than that of the baseline model.</p>
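<p>Counting parameters makes the reduction concrete. Using the same rule, <code>(nb inputs x nb units) + nb bias terms</code>, the reduced model has 160,067 trainable parameters, compared with 644,419 for the baseline. That is roughly a quarter of the capacity. The quick check below uses a hypothetical helper name.</p>
<pre><code class="lang-python">def count_dense_params(layer_sizes):
    # layer_sizes: input size followed by the units of each Dense layer
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


baseline_params = count_dense_params([10000, 64, 64, 3])
reduced_params = count_dense_params([10000, 16, 3])
print(baseline_params, reduced_params, round(reduced_params / baseline_params, 2))
</code></pre>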
<h2 id="heading-applying-regularization">Applying regularization</h2>
<p>To address overfitting, we can apply weight regularization to the model. This adds a cost to the loss function of the network for large weights (or parameter values). As a result, you get a simpler model that is forced to focus on the relevant patterns in the train data.</p>
<p>The two most common forms are <strong>L1 regularization and L2 regularization</strong>.</p>
<ul>
<li>L1 regularization adds a cost proportional to the <strong>absolute value of the weights</strong>. This tends to drive some of the weights to exactly zero.</li>
<li>L2 regularization adds a cost proportional to the <strong>squared value of the weights</strong>. This pushes the weights toward smaller values.</li>
</ul>
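<p>The plain-Python sketch below shows the penalty each regularizer adds for a layer’s weights. It follows the form of Keras’s <code>regularizers.l1(factor)</code> and <code>regularizers.l2(factor)</code>: the factor times the sum of absolute values, or the factor times the sum of squares. It also compares the gradient each penalty contributes for a single weight. The L1 push toward zero has the same size no matter how small the weight is, which is why L1 tends to produce exact zeros. The L2 push, by contrast, shrinks along with the weight. The weights and helper names are hypothetical.</p>
<pre><code class="lang-python">import math


def l1_penalty(weights, factor):
    # factor * sum(|w|)
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    # factor * sum(w ** 2)
    return factor * sum(w * w for w in weights)


def penalty_gradients(w, factor):
    # Gradient of each penalty with respect to one nonzero weight w
    l1_grad = factor * math.copysign(1.0, w)  # fixed size, only the sign depends on w
    l2_grad = 2 * factor * w                  # proportional to w
    return l1_grad, l2_grad


weights = [0.5, -2.0, 0.01]
print(l1_penalty(weights, 0.001), l2_penalty(weights, 0.001))
# For a tiny weight, L1 still pushes with full strength; L2 barely pushes at all
print(penalty_gradients(0.01, 0.001))
</code></pre>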
<p>Let’s try with L2 regularization.</p>
<pre><code class="lang-python">reg_model = models.Sequential()
reg_model.add(layers.Dense(<span class="hljs-number">64</span>, kernel_regularizer=regularizers.l2(<span class="hljs-number">0.001</span>), activation=<span class="hljs-string">'relu'</span>, input_shape=(NB_WORDS,)))
reg_model.add(layers.Dense(<span class="hljs-number">64</span>, kernel_regularizer=regularizers.l2(<span class="hljs-number">0.001</span>), activation=<span class="hljs-string">'relu'</span>))
reg_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
reg_model.name = <span class="hljs-string">'L2 Regularization model'</span>
reg_history = deep_model(reg_model, X_train_rest, y_train_rest, X_valid, y_valid)
reg_min = optimal_epoch(reg_history)
</code></pre>
<p>The regularized model starts overfitting in the same epoch as the baseline model. However, its validation loss increases much more slowly afterward.</p>
<pre><code class="lang-python">eval_metric(reg_model, reg_history, <span class="hljs-string">'loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_4UV4tgegrn6UOdRX.png" alt="Image" width="600" height="400" loading="lazy"></p>
<pre><code class="lang-python">compare_models_by_metric(base_model, reg_model, base_history, reg_history, <span class="hljs-string">'val_loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_5ybMcTqkOXyC7xc4.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-adding-dropout-layers">Adding dropout layers</h2>
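<p>The last option we’ll try is to add dropout layers. During training, a dropout layer randomly sets a fraction of the output features of the previous layer to zero. This fraction is the dropout rate, which is 0.5 in the model below. At inference time, dropout is inactive.</p>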
<pre><code class="lang-python">drop_model = models.Sequential()
drop_model.add(layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>, input_shape=(NB_WORDS,)))
drop_model.add(layers.Dropout(<span class="hljs-number">0.5</span>))
drop_model.add(layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>))
drop_model.add(layers.Dropout(<span class="hljs-number">0.5</span>))
drop_model.add(layers.Dense(<span class="hljs-number">3</span>, activation=<span class="hljs-string">'softmax'</span>))
drop_model.name = <span class="hljs-string">'Dropout layers model'</span>
drop_history = deep_model(drop_model, X_train_rest, y_train_rest, X_valid, y_valid)
drop_min = optimal_epoch(drop_history)
eval_metric(drop_model, drop_history, <span class="hljs-string">'loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_B0SuqLpCTYGP4Bwz.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The model with dropout layers starts overfitting later than the baseline model. Its validation loss also increases more slowly than that of the baseline model.</p>
<pre><code class="lang-python">compare_models_by_metric(base_model, drop_model, base_history, drop_history, <span class="hljs-string">'val_loss'</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_ykk8IT9v3wgsq1Wf.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The direct comparison confirms this: the validation loss of the dropout model stays much lower than that of the baseline model.</p>
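<p>To build intuition for what a dropout layer does, here is a plain-Python sketch of inverted dropout, the scheme used by the Keras <code>Dropout</code> layer. During training, each value is set to zero with probability <code>rate</code>, and the surviving values are scaled by <code>1 / (1 - rate)</code> so that the expected sum stays the same. At inference, the values pass through unchanged. The function is an illustration, not the Keras implementation.</p>
<pre><code class="lang-python">import random


def dropout(values, rate, training, rng=random):
    # rate must be in [0, 1); at inference time dropout does nothing
    if not training:
        return list(values)
    keep = 1.0 - rate
    # Each value is either dropped (multiplied by 0) or kept and scaled up
    mask = rng.choices([0.0, 1.0 / keep], weights=[rate, keep], k=len(values))
    return [v * m for v, m in zip(values, mask)]


rng = random.Random(0)
train_out = dropout([1.0] * 10, rate=0.5, training=True, rng=rng)
infer_out = dropout([1.0, 2.0], rate=0.5, training=False)
print(train_out)  # every entry is 0.0 or 2.0
print(infer_out)
</code></pre>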
<h1 id="heading-training-on-the-full-train-data-and-evaluation-on-test-data">Training on the full train data and evaluation on test data</h1>
<p>At first sight, the reduced model seems to be the best model for generalization. But let’s check that on the test set.</p>
<pre><code class="lang-python">base_results = test_model(base_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, base_min)
reduced_results = test_model(reduced_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, reduced_min)
reg_results = test_model(reg_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, reg_min)
drop_results = test_model(drop_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, drop_min)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/1_faz4dZBCsh3yB6tAZzSXkg.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>As shown above, all three options help to reduce overfitting. We manage to increase the accuracy on the test data substantially. Among these three options, the model with the dropout layers performs the best on the test data.</p>
<p>You can find the notebook on <a target="_blank" href="https://github.com/bertcarremans/TwitterUSAirlineSentiment/blob/master/source/Handling%20overfitting%20in%20deep%20learning%20models.ipynb">GitHub</a>. Have fun with it!</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
