keras - freeCodeCamp.org

How to Convert a Keras SavedModel into a Browser-based Web App

freeCodeCamp — Tue, 18 May 2021 16:36:22 +0000

By Suchandra Datta

If you're a Python developer who works with Keras SavedModels, this article is for you.

Perhaps you're not sure how to use SavedModels to leverage the power of machine learning in browser-based web apps. But don't worry – we'll cover all the basic steps you need to get started.

Along with that, we'll go over some important concepts that'll help make it easier for you to transition to JavaScript from Python.

Before we dive into the process, let's address some questions that are likely to pop into your mind at this point.

What is a Keras SavedModel?

A Keras model is made up of the network architecture, model weights, and an optimizer for your loss function.

The default format for saving models on disk is the SavedModel format. This format allows us to save models with custom objects with minimum hassle.

SavedModel stores the optimizer, loss, and network architecture in the saved_model.pb file while the weights are stored in the variables directory.
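You can check this layout with a few lines of Python. The sketch below uses a hypothetical helper, `looks_like_savedmodel`, which is not part of Keras or TensorFlow. It only tests for the two pieces named in this paragraph. Newer TensorFlow versions may write additional files next to them.

```python
from pathlib import Path

def looks_like_savedmodel(export_dir):
    """Hypothetical helper: True if export_dir has the SavedModel layout
    described above (a saved_model.pb file plus a variables/ directory)."""
    root = Path(export_dir)
    has_graph = (root / "saved_model.pb").is_file()   # architecture, optimizer, loss
    has_weights = (root / "variables").is_dir()       # weight values
    return has_graph and has_weights
```

For example, `looks_like_savedmodel("/content/drive/My Drive/emotion_model")` would return True for a directory written by `model.save(...)` in TensorFlow 2.x. That path is only an illustration.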

For more detailed information on the SavedModel format, check out the official docs here.

How do I train a Keras SavedModel if I don't have a GPU?

Many machine learning enthusiasts without access to a GPU start off with model development on Google Colaboratory.

I've been an avid admirer of Google Colab and its features ever since I first became interested in the field of machine learning. It offers a Jupyter Notebook environment with free access to GPUs, with sessions lasting up to 12 hours.

If you've got any questions regarding Google Colaboratory, head over to their FAQ section linked here.

Why would I want to convert a SavedModel into a web app?

Web-based products are everywhere, and they're generally pretty easy to use. You're probably reading this article from a browser right now, either from your phone, desktop, or laptop.

Machine learning models, at the end of the day, are meant to be used in the real world, not kept inside a glass box. So what better way to bring your model to users than through a web-based medium?

On top of that, browser-based apps don't require any installation overhead and can be accessed uniformly from multiple devices.

Okay then, let's get started

I had built a simple emotion detection CNN model that could predict 7 emotions (happy, sad, neutral, angry, surprise, fear and disgust) using Python and the Keras API.

Converting it into a format suitable for the web without prior experience proved to be a bit difficult. I owe the entire process, which I'll describe next, to the wonderful documentation of Tensorflow.js, the MDN Web docs, and the Firebase Hosting documentation.

Using these resources, I was able to narrow down the process to the following steps:

Convert Keras SavedModel to the Tensorflow.js Layers Format
Load the model via JavaScript and Promises
Access an image uploaded by a user
Preprocess the uploaded image
Model inference in browser and display output via a user interface

Let's look at each of these steps in greater detail.


How to Convert a Keras SavedModel to the Tensorflow.js Layers Format

To convert a Keras SavedModel to the Tensorflow.js layers format, we'd need to use the tensorflowjs_converter script. We can also use the Python API as described in their official docs here.

I ran into a frustrating error with the former, as for some reason the tensorflowjs_converter did not seem to work on Google Colab.

I had saved the model on Google Drive, and the "My Drive" part of the file path, specifically the space in it, seemed to be causing trouble. I found it mentioned in this GitHub issue #3618 here.

Using the Python API worked seamlessly, which gave me a model.json file for the model architecture and binary files for the weights. Now I was ready to use it on the web!

Code to convert SavedModel Format to Layers Format
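The original code for this step isn't reproduced here. What follows is a sketch under stated assumptions, not the author's exact code. It shows both routes: building the `tensorflowjs_converter` command as an argument list, which keeps a path like `/content/drive/My Drive/...` intact as a single argument, and calling the Python API `tensorflowjs.converters.save_keras_model`. The helper names and paths are hypothetical. The `--input_format=keras_saved_model` value comes from the converter's documentation as we understand it, so check `tensorflowjs_converter --help` for your installed version. It is also an assumption that shell quoting caused the issue mentioned above.

```python
import shlex

def converter_argv(saved_model_dir, target_dir):
    """Hypothetical helper: build the tensorflowjs_converter command as a list.

    Passing a list to subprocess.run (instead of one shell string) keeps a
    path containing a space, such as 'My Drive', as a single argument."""
    return [
        "tensorflowjs_converter",
        "--input_format=keras_saved_model",  # assumed flag value, see lead-in
        saved_model_dir,
        target_dir,
    ]

def convert_with_python_api(saved_model_dir, target_dir):
    """Python API route: load the Keras model, then write model.json and the
    binary weight files into target_dir. Requires tensorflow and tensorflowjs."""
    import tensorflow as tf          # imported lazily: only needed for this route
    import tensorflowjs as tfjs
    model = tf.keras.models.load_model(saved_model_dir)
    tfjs.converters.save_keras_model(model, target_dir)

if __name__ == "__main__":
    argv = converter_argv("/content/drive/My Drive/emotion_model", "/content/tfjs_model")
    # Shell-safe rendering, e.g. for pasting into a terminal or a Colab "!" cell
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```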

But wait! Why do we need to convert? Why don't we just train our model using Tensorflow.js itself?

Well, you need this conversion if you've already spent a lot of time training your Keras models on large datasets and don't want to rewrite and retrain them in JavaScript.

How to Load the Model via JavaScript and Promises

Tensorflow.js is a JavaScript-based library for machine learning model development. You can use it in the browser as well as through the popular JavaScript runtime Node.js.

You can set it up in two different ways: either by including it using a script tag or using it through Node.js.

Since the CNN model I trained is fairly straightforward, I opted for the script tag approach.

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>

Now that we've included the Tensorflow.js library, the next step is to load the model. We can load the model in the following ways:

Browser's local storage
Browser's IndexedDB storage
From an HTTP or HTTPS endpoint
From native file system using Node.js

Loading the model from an HTTPS endpoint seemed to be the most feasible way for me. So I hosted the model files on Firebase Hosting and loaded the model using the following code:

const model = await tf.loadLayersModel('model.json');

Tensorflow.js uses the Promise-based fetch API to load resources. Fetch returns a Promise that resolves to the response containing the requested resources.

A Promise in JavaScript is a proxy for a value that isn't known yet but may become available later (or never, if the request fails).

For example, when requesting URL-based resources, we don't know immediately whether we'll actually get them. We have to wait until the server responds (or doesn't).

But blocking while we wait is detrimental to responsiveness and continued user interaction, which are critical for web pages. So JavaScript lets you make asynchronous calls via Promises. These let you request resources AND continue with subsequent statements irrespective of the server's response.

async/await was introduced to make Promise-based code easier to read and to allow error handling with ordinary try/catch blocks. Inside a function declared async, await pauses that function until the Promise settles, while the rest of the page keeps running. You can only use await inside such async functions (or at the top level of a module).
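This article is aimed at Python developers, so here is the same idea expressed with Python's asyncio. It is only an analogy, not Tensorflow.js code. `fake_fetch` is a hypothetical stand-in for a network request, and the delays are arbitrary.

```python
import asyncio

async def fake_fetch(url, delay=0.05):
    """Hypothetical stand-in for fetch(): resolves after a simulated network delay."""
    await asyncio.sleep(delay)
    return f"contents of {url}"

async def keep_ui_busy(log):
    """Other work that keeps running while the 'request' is pending."""
    for i in range(3):
        log.append(f"ui tick {i}")
        await asyncio.sleep(0.01)

async def main():
    log = []
    # Start the request without waiting for it (like calling fetch() in JS)
    request = asyncio.create_task(fake_fetch("model.json"))
    await keep_ui_busy(log)       # runs while the request is in flight
    log.append(await request)     # pause main() until the result arrives
    return log

if __name__ == "__main__":
    print(asyncio.run(main()))
```

As in JavaScript, `await` only suspends the coroutine that uses it. The event loop stays free to run other tasks.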

How to Access an Image Uploaded by a User

Let's create a simple file upload functionality using an HTML input tag and another button that'll start the prediction computations when clicked.

<div class="container" id="tray">
        <div id="uploadFile" class="custombutton">
            <i class="fa fa-file" style="font-size:25px;color: #1ab5e3"></i><br/><br/>
            <input type="file" name="fileupload" accept="image/*" onchange="display(event)">
        </div>
        <div class="custombutton">
            <i class="fa fa-bar-chart" style="font-size:25px;color: #1ab5e3"></i><br/><br/>
            <input type="button" name="predict" onclick="predict_emotion()" value="PREDICT">
        </div>
    </div>

The file upload and predict buttons look like this:

Next, we access the image file uploaded and display it using object URLs as described in the MDN Web docs linked here.

let input_image = document.getElementById("input_image")
input_image.src = URL.createObjectURL(event.target.files[0]);
document.getElementById("input_image_container").style.display = "block";

<div id="input_image_container"><img src="#" id="input_image" style="top:5vh;"></div>

After uploading an image, it looks like this:

How to Preprocess the Uploaded Image

This step is model- and domain-specific, and different applications require different steps.

For my model, I didn't have to do much, just some simple normalization and resizing which I easily performed using Tensorflow.js functions.

Do check out their official API reference for a thorough understanding of the functions offered and their use cases.

//Preprocessing steps 
        /*
        (1)Resize to 48*48
        (2)Convert to grayscale using simple mean
        (3)Convert to float
        (4)Reshape to (1,48,48,1)
        (5)Normalize by dividing by 255.0
        */
let step1 = tf.browser.fromPixels(input)
.resizeNearestNeighbor([48,48])
.mean(2)
.toFloat()
.expandDims(0)
.expandDims(-1)
.div(255.0)
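If you trained the model in Python, check that the browser preprocessing matches what the model saw during training. Below is a pure-Python sketch of the same five steps. It assumes an RGB image given as rows of (r, g, b) tuples. It also assumes the nearest-neighbour rule floor(row * in_size / out_size), which we believe corresponds to `resizeNearestNeighbor`'s default `alignCorners = false`. Treat it as an approximation for sanity checks, not a bit-exact reimplementation. The name `preprocess` is our own.

```python
def preprocess(pixels, size=48):
    """Approximate Python mirror of the tf.js pipeline above:
    nearest-neighbour resize, grayscale via channel mean, /255,
    and reshape to (1, size, size, 1) as nested lists."""
    in_h, in_w = len(pixels), len(pixels[0])
    rows = []
    for r in range(size):
        src_r = min(in_h - 1, (r * in_h) // size)      # nearest source row
        row = []
        for c in range(size):
            src_c = min(in_w - 1, (c * in_w) // size)  # nearest source column
            red, green, blue = pixels[src_r][src_c]
            gray = (red + green + blue) / 3            # simple mean, like .mean(2)
            row.append([gray / 255.0])                 # trailing channel axis, normalized
        rows.append(row)
    return [rows]                                      # leading batch axis
```

For nearest-neighbour resizing, taking the channel mean before or after the resize gives the same result, so the order here doesn't matter.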

Model Inference in the Browser and Displaying the Output via a User Interface

The predict function returns the predictions – in our case, a tensor with 7 probability values for the 7 emotions.

To display them in the browser, we scale each probability up (multiplying it by 150 px) and draw one div per emotion, using the div's width to show the scaled-up value.

const pred = model.predict(step1)
//emotions is the array of 7 labels defined in the full code below
pred.data()
    .then((data) => {
        console.log(data)
        const output = document.getElementById("output_chart")
        output.innerHTML = ""
        let max_val = -1
        let max_val_index = -1
        for(let i=0;i<data.length;i++)
        {
            //One bar per emotion; its width is the probability scaled up by 150 px
            let bar_style = "width: "+data[i]*150+"px; height: 25px; position:relative; margin-top: 3vh; background-color: violet; "
            output.innerHTML+="<div style='"+bar_style+"'></div>"
            //Track the most likely emotion
            if(data[i] > max_val)
            {
                max_val = data[i]
                max_val_index = i
            }
        }
        const EMOTION_DETECTED = emotions[max_val_index]
        document.getElementsByClassName("output_screen")[0].style.display="flex";
        document.getElementById("output_text").innerHTML = "Emotions and corresponding scaled up probability<br/>Emotion detected: " + EMOTION_DETECTED + " (" + (max_val*100).toFixed(2) + "% probability)<br/>"
    })

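For readers coming from Python, the display logic above boils down to an argmax plus a scaling step. Here is a rough Python equivalent. `summarize` is a hypothetical helper name, and the 150 px factor and label order are copied from the JavaScript code. Like the strict `>` comparison in the JavaScript loop, Python's `max` keeps the first index when two probabilities tie.

```python
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def summarize(probs, scale=150):
    """Return (bar widths in px, detected emotion, probability text)."""
    widths = [p * scale for p in probs]                     # one bar per emotion
    best = max(range(len(probs)), key=lambda i: probs[i])   # argmax, first index on ties
    return widths, EMOTIONS[best], f"{probs[best] * 100:.2f}% probability"
```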
Great – we've got all the building blocks ready! Now let's put it all together. We'll integrate the following parts:

The HTML markup which serves as a simple UI
Script tag for accessing Tensorflow.js
Link tag for our Font Awesome icons stylesheet
JavaScript code for model loading, inference, and output

Here is the final JavaScript code:

//Display image uploaded by user
function display(event)
    {
        let input_image = document.getElementById("input_image")
        input_image.src = URL.createObjectURL(event.target.files[0]);
        document.getElementById("input_image_container").style.display = "block";
    }

//Predict emotion and display output
async function predict_emotion()
    {
        let input = document.getElementById("input_image");
        //Preprocessing steps 
        /*
        (1)Resize to 48*48
        (2)Convert to grayscale using simple mean
        (3)Convert to float
        (4)Reshape to (1,48,48,1)
        (5)Normalize by dividing by 255.0
        */
        let step1 = tf.browser.fromPixels(input).resizeNearestNeighbor([48,48]).mean(2).toFloat().expandDims(0).expandDims(-1).div(255.0)
        //await is allowed here because predict_emotion is declared async
        const model = await tf.loadLayersModel('model.json');
        const pred = model.predict(step1)
        pred.print()
        console.log("End of predict function")
        //This array is encoded with index i = corresponding emotion. In dataset, 0 = Angry, 1 = Disgust, 2 = Fear, 3 = Happy, 4 = Sad, 5 = Surprise and 6 = Neutral
        const emotions = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
        //At which index in tensor we get the largest value ?
        pred.data()
            .then((data) => {
                console.log(data)
                const output = document.getElementById("output_chart")
                output.innerHTML = ""
                let max_val = -1
                let max_val_index = -1
                for(let i=0;i<data.length;i++)
                {
                    //One bar per emotion; its width is the probability scaled up by 150 px
                    let bar_style = "width: "+data[i]*150+"px; height: 25px; position:relative; margin-top: 3vh; background-color: violet; "
                    output.innerHTML+="<div style='"+bar_style+"'></div>"
                    if(data[i] > max_val)
                    {
                        max_val = data[i]
                        max_val_index = i
                    }
                }
                const EMOTION_DETECTED = emotions[max_val_index]
                document.getElementsByClassName("output_screen")[0].style.display="flex";
                document.getElementById("output_text").innerHTML = "Emotions and corresponding scaled up probability<br/>Emotion detected: " + EMOTION_DETECTED + " (" + (max_val*100).toFixed(2) + "% probability)<br/>"
        })    

    }

Here's the final HTML and script tags:

<!DOCTYPE html>
<html>
<head>
    <title></title>
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
    <link rel="stylesheet" type="text/css" href="styles/page_styling.css">
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
</head>
<body>
    <div id="input_image_container"><img src="#" id="input_image" style="top:5vh;"></div>
    <div class="container" id="tray">
        <div id="uploadFile" class="custombutton">
            <i class="fa fa-file" style="font-size:25px;color: #1ab5e3"></i><br/><br/>
            <input type="file" name="fileupload" accept="image/*" onchange="display(event)">
        </div>
        <div class="custombutton">
            <i class="fa fa-bar-chart" style="font-size:25px;color: #1ab5e3"></i><br/><br/>
            <input type="button" name="predict" onclick="predict_emotion()" value="PREDICT">
        </div>
    </div>
    <div class="container output_screen">
        <div id="emotion_tags">
            <ul>
                <li>Angry</li>
                <li>Disgust</li>
                <li>Fear</li>
                <li>Happy</li>
                <li>Sad</li>
                <li>Surprise</li>
                <li>Neutral</li>
            </ul>
        </div>
        <div id="output_chart"></div>
        <div id="output_text"></div>
    </div>
<script src="scripts/script.js"></script>
</body>
</html>

Here's a sample output, where the top three predicted emotions are sad, happy, and neutral:

Predictions and UI

Wrapping up

In this article, we went through the basic steps needed to convert a Keras SavedModel to a web-friendly format. We learned how to load the model, preprocess an image, and run inference in the browser using Tensorflow.js, and how to display the output via a user interface.

I hope you enjoyed reading this article and found it helpful. Have a good day and I wish you good luck in your coding journey!


Keras Course – Learn Python Deep Learning and Neural Networks

Beau Carnes — Thu, 18 Jun 2020 20:51:08 +0000

Keras is a neural network API written in Python and integrated with TensorFlow. You can learn how to use Keras in a new video course on the freeCodeCamp.org YouTube channel.

In this course from deeplizard, you will learn how to prepare and process data for artificial neural networks, build and train artificial neural networks from scratch, build and train convolutional neural networks (CNNs), implement fine-tuning and transfer learning, and more.

Each section of the course focuses on a specific concept, and shows how the full implementation is done in code using Keras and Python.

You will learn to build some networks from scratch. Others will be pre-trained state-of-the-art models that you'll fine-tune on your data. Then you'll learn how to deploy models using both front-end and back-end deployment techniques.

Here's the full course syllabus:

Part 1: Artificial Neural Network Basics

Section 1: Intro to Keras and neural networks

Processing data
Building and training neural networks
Validation and inference
Saving and loading models

Section 2: Convolutional Neural Networks (CNNs)

Image processing
Building and training CNNs
Using CNNs for inference

Section 3: Fine-tuning and transfer learning

Intro to fine-tuning and VGG16 model
Implement fine-tuning on VGG16 model
Using fine-tuned models for inference
Intro to MobileNet
Fine-tuning MobileNet on subset of data

Section 4: Additional topics

Data augmentation
Keras' image labeling implementation
Achieving reproducible results
Learnable parameters

Part 2: Neural network model deployment

Section 1: Deployment with Flask

Introduction to Flask and web services
Build a simple Flask app and web app
Send and receive data with Flask
Host neural network with Flask
Build neural network web app to interact with Flask service
Integrating data visualization with D3, DC, Crossfilter
Alternative ways to access neural network from PowerShell and curl
Information privacy and data protection

Section 2: Deployment with TensorFlow.js

Introduction to client-side neural networks
Convert Keras model to TFJS model
Set up Node.js and Express
Build UI for neural network web app
Host a neural network with TFJS
Explore tensor operations through image processing
Examine tensor operations with debugger
Broadcasting tensors
Efficiency of hosting MobileNet in the browser

You can watch the full course on the freeCodeCamp.org YouTube channel (3 hour watch).

How to Handle Overfitting in Deep Learning Models

freeCodeCamp — Sun, 05 Jan 2020 22:36:48 +0000

By Bert Carremans

Overfitting occurs when you achieve a good fit of your model on the training data, but it does not generalize well on new, unseen data. In other words, the model learned patterns specific to the training data, which are irrelevant in other data.

We can identify overfitting by looking at validation metrics like loss or accuracy. Usually, the validation metric stops improving after a certain number of epochs and then starts to get worse (the loss rises, the accuracy drops). The training metric continues to improve because the model seeks to find the best fit for the training data.

There are several ways in which we can reduce overfitting in deep learning models. The best option is to get more training data. Unfortunately, in real-world situations, you often do not have this possibility due to time, budget, or technical constraints.

Another way to reduce overfitting is to lower the capacity of the model to memorize the training data. As such, the model will need to focus on the relevant patterns in the training data, which results in better generalization. In this post, we’ll discuss three options to achieve this.

Set up the project

We start by importing the necessary packages and configuring some parameters. We will use Keras to fit the deep learning models. The training data is the Twitter US Airline Sentiment data set from Kaggle.

# Basic packages
import pandas as pd 
import numpy as np
import re
import collections
import matplotlib.pyplot as plt
from pathlib import Path
# Packages for data preparation
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from keras.preprocessing.text import Tokenizer
from keras.utils.np_utils import to_categorical
from sklearn.preprocessing import LabelEncoder
# Packages for modeling
from keras import models
from keras import layers
from keras import regularizers
NB_WORDS = 10000  # Parameter indicating the number of words we'll put in the dictionary
NB_START_EPOCHS = 20  # Number of epochs we usually start to train with
BATCH_SIZE = 512  # Size of the batches used in the mini-batch gradient descent
MAX_LEN = 20  # Maximum number of words in a sequence
root = Path('../')
input_path = root / 'input/' 
output_path = root / 'output/'
source_path = root / 'source/'

Some helper functions

We will use some helper functions throughout this post.

def deep_model(model, X_train, y_train, X_valid, y_valid):
    '''
    Function to train a multi-class model. The number of epochs and 
    batch_size are set by the constants at the top of the
    notebook. 

    Parameters:
        model : model with the chosen architecture
        X_train : training features
        y_train : training target
        X_valid : validation features
        y_valid : validation target
    Output:
        model training history
    '''
    model.compile(optimizer='rmsprop'
                  , loss='categorical_crossentropy'
                  , metrics=['accuracy'])

    history = model.fit(X_train
                       , y_train
                       , epochs=NB_START_EPOCHS
                       , batch_size=BATCH_SIZE
                       , validation_data=(X_valid, y_valid)
                       , verbose=0)
    return history
def eval_metric(model, history, metric_name):
    '''
    Function to evaluate a trained model on a chosen metric. 
    Training and validation metric are plotted in a
    line chart for each epoch.

    Parameters:
        model : trained model (its name is used in the plot title)
        history : model training history
        metric_name : loss or accuracy
    Output:
        line chart with epochs on the x-axis and the metric on the
        y-axis
    '''
    metric = history.history[metric_name]
    val_metric = history.history['val_' + metric_name]
    e = range(1, NB_START_EPOCHS + 1)
    plt.plot(e, metric, 'bo', label='Train ' + metric_name)
    plt.plot(e, val_metric, 'b', label='Validation ' + metric_name)
    plt.xlabel('Epoch number')
    plt.ylabel(metric_name)
    plt.title('Comparing training and validation ' + metric_name + ' for ' + model.name)
    plt.legend()
    plt.show()
def test_model(model, X_train, y_train, X_test, y_test, epoch_stop):
    '''
    Function to test the model on new data after training it
    on the full training data with the optimal number of epochs.

    Parameters:
        model : trained model
        X_train : training features
        y_train : training target
        X_test : test features
        y_test : test target
        epoch_stop : optimal number of epochs
    Output:
        test accuracy and test loss
    '''
    model.fit(X_train
              , y_train
              , epochs=epoch_stop
              , batch_size=BATCH_SIZE
              , verbose=0)
    results = model.evaluate(X_test, y_test)
    print()
    print('Test accuracy: {0:.2f}%'.format(results[1]*100))
    return results

def remove_stopwords(input_text):
    '''
    Function to remove English stopwords from a string (applied to
    each tweet with Series.apply).

    Parameters:
        input_text : text to clean
    Output:
        cleaned string
    '''
    stopwords_list = stopwords.words('english')
    # Some words which might indicate a certain sentiment are kept via a whitelist
    whitelist = ["n't", "not", "no"]
    words = input_text.split() 
    clean_words = [word for word in words if (word not in stopwords_list or word in whitelist) and len(word) > 1] 
    return " ".join(clean_words) 

def remove_mentions(input_text):
    '''
    Function to remove mentions, preceded by @, from a string (applied
    to each tweet with Series.apply).

    Parameters:
        input_text : text to clean
    Output:
        cleaned string
    '''
    return re.sub(r'@\w+', '', input_text)
def compare_models_by_metric(model_1, model_2, model_hist_1, model_hist_2, metric):
    '''
    Function to compare a metric between two models 

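    Parameters:
        model_1 : first model (its name is used in the legend)
        model_2 : second model (its name is used in the legend)
        model_hist_1 : training history of model 1
        model_hist_2 : training history of model 2
        metric : metric to compare: loss, acc, val_loss or val_acc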

    Output:
        plot of metrics of both models
    '''
    metric_model_1 = model_hist_1.history[metric]
    metric_model_2 = model_hist_2.history[metric]
    e = range(1, NB_START_EPOCHS + 1)

    metrics_dict = {
        'acc' : 'Training Accuracy',
        'loss' : 'Training Loss',
        'val_acc' : 'Validation accuracy',
        'val_loss' : 'Validation loss'
    }

    metric_label = metrics_dict[metric]
    plt.plot(e, metric_model_1, 'bo', label=model_1.name)
    plt.plot(e, metric_model_2, 'b', label=model_2.name)
    plt.xlabel('Epoch number')
    plt.ylabel(metric_label)
    plt.title('Comparing ' + metric_label + ' between models')
    plt.legend()
    plt.show()

def optimal_epoch(model_hist):
    '''
    Function to return the epoch number where the validation loss is
    at its minimum

    Parameters:
        model_hist : training history of model
    Output:
        epoch number with minimum validation loss
    '''
    min_epoch = np.argmin(model_hist.history['val_loss']) + 1
    print("Minimum validation loss reached in epoch {}".format(min_epoch))
    return min_epoch

Data preparation

Data cleaning

We load the CSV with the tweets and perform a random shuffle. It’s good practice to shuffle the data before splitting it into a train and test set, so that the sentiment classes are more likely to be evenly distributed over both sets. We’ll only keep the text column as input and the airline_sentiment column as the target.

The next thing we’ll do is remove stopwords. Most stopwords carry little value for predicting the sentiment (negations such as "not" are kept via a whitelist). Furthermore, as we want to build a model that can be used for other airline companies as well, we remove the mentions.

df = pd.read_csv(input_path / 'Tweets.csv')
df = df.reindex(np.random.permutation(df.index))  
df = df[['text', 'airline_sentiment']]
df.text = df.text.apply(remove_stopwords).apply(remove_mentions)

Train-Test split

The evaluation of the model performance needs to be done on a separate test set. As such, we can estimate how well the model generalizes. This is done with the train_test_split method of scikit-learn.

X_train, X_test, y_train, y_test = train_test_split(df.text, df.airline_sentiment, test_size=0.1, random_state=37)

Converting words to numbers

To use the text as input for a model, we first need to convert the words into tokens, which simply means converting the words into integers that refer to an index in a dictionary. Here we will only keep the most frequent words in the training set.

We clean up the text by applying filters and converting the words to lowercase. Words are separated by spaces.

tk = Tokenizer(num_words=NB_WORDS,
               filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
               lower=True,
               char_level=False,
               split=' ')
tk.fit_on_texts(X_train)

After creating the dictionary, we can convert the text of a tweet to a vector with NB_WORDS values. With mode='binary', each value indicates whether the corresponding word appears in the tweet or not. This is done with the texts_to_matrix method of the Tokenizer.

X_train_oh = tk.texts_to_matrix(X_train, mode='binary')
X_test_oh = tk.texts_to_matrix(X_test, mode='binary')
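To see what Tokenizer and texts_to_matrix(mode='binary') compute, here is a simplified pure-Python re-implementation based on our reading of the Keras behaviour. Indices start at 1 and are assigned by descending word frequency, with ties keeping first-seen order. Only indices below num_words get a column, so column 0 is never set. The function names are our own, and this is for illustration only.

```python
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def tokenize(text, filters=FILTERS):
    """Lower-case, turn filtered characters into spaces, split into words."""
    table = str.maketrans({ch: " " for ch in filters})
    return text.lower().translate(table).split()

def build_word_index(texts):
    """Rank words by frequency (ties keep first-seen order); indices start at 1."""
    counts = {}
    for text in texts:
        for word in tokenize(text):
            counts[word] = counts.get(word, 0) + 1
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)  # stable sort
    return {word: i for i, word in enumerate(ranked, start=1)}

def texts_to_binary_matrix(texts, word_index, num_words):
    """One row per text; column j is 1 if the word with index j occurs.
    Words with an index of num_words or higher are ignored."""
    rows = []
    for text in texts:
        row = [0] * num_words
        for word in tokenize(text):
            j = word_index.get(word)
            if j is not None and j < num_words:
                row[j] = 1
        rows.append(row)
    return rows
```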

Converting the target classes to numbers

We need to convert the target classes to numbers as well, which in turn are one-hot-encoded with the to_categorical method in Keras.

le = LabelEncoder()
y_train_le = le.fit_transform(y_train)
y_test_le = le.transform(y_test)
y_train_oh = to_categorical(y_train_le)
y_test_oh = to_categorical(y_test_le)
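LabelEncoder assigns integers in sorted order of the class names, and to_categorical turns each integer into a one-hot row. For this data set, that means negative = 0, neutral = 1, positive = 2. Below is a stdlib-only sketch of that behaviour. The function names are our own. As in the code above, fit the mapping on the training labels and reuse it for the test labels.

```python
def label_encode(labels, classes=None):
    """Map each class to an integer, with classes sorted alphabetically.
    Pass the classes returned from the training labels to encode test labels."""
    if classes is None:
        classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[y] for y in labels], classes

def one_hot(codes, num_classes):
    """Turn integer codes into one-hot rows of floats."""
    return [[1.0 if j == code else 0.0 for j in range(num_classes)] for code in codes]
```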

Splitting off a validation set

Now that our data is ready, we split off a validation set. This validation set will be used to evaluate the model performance when we tune the parameters of the model.

X_train_rest, X_valid, y_train_rest, y_valid = train_test_split(X_train_oh, y_train_oh, test_size=0.1, random_state=37)

Deep learning

Creating a model that overfits

We start with a model that overfits. It has 2 densely connected layers of 64 elements. The input_shape for the first layer is equal to the number of words we kept in the dictionary and for which we created one-hot-encoded features.

As we need to predict 3 different sentiment classes, the last layer has 3 elements. The softmax activation function makes sure the three probabilities sum up to 1.

The number of parameters to train is computed as (nb inputs x nb elements in hidden layer) + nb bias terms. The number of inputs for the first layer equals the number of words in our corpus. The subsequent layers have the number of outputs of the previous layer as inputs. So the number of parameters per layer are:

First layer : (10000 x 64) + 64 = 640064
Second layer : (64 x 64) + 64 = 4160
Last layer : (64 x 3) + 3 = 195

base_model = models.Sequential()
base_model.add(layers.Dense(64, activation='relu', input_shape=(NB_WORDS,)))
base_model.add(layers.Dense(64, activation='relu'))
base_model.add(layers.Dense(3, activation='softmax'))
base_model.name = 'Baseline model'
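The parameter formula above is easy to check in code. This sketch, with hypothetical helper names, counts the parameters of a stack of Dense layers. For the baseline model it gives 640064 + 4160 + 195 = 644419. The totals should agree with what model.summary() reports, though we haven't reproduced that output here.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs x n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def count_params(n_inputs, layer_sizes):
    """Total trainable parameters of a stack of Dense layers."""
    total = 0
    for n_units in layer_sizes:
        total += dense_params(n_inputs, n_units)
        n_inputs = n_units  # the next layer takes this layer's outputs as inputs
    return total
```

The same function gives count_params(10000, [16, 3]) = 160067 for the reduced model defined later, roughly a quarter of the baseline.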

Because this project is a multi-class, single-label prediction, we use categorical_crossentropy as the loss function and softmax as the final activation function. We fit the model on the train data and validate on the validation set. We run for a predetermined number of epochs and will see when the model starts to overfit.

base_history = deep_model(base_model, X_train_rest, y_train_rest, X_valid, y_valid)
base_min = optimal_epoch(base_history)
eval_metric(base_model, base_history, 'loss')

In the beginning, the validation loss goes down. But at epoch 3 this stops and the validation loss starts increasing rapidly. This is when the model begins to overfit.

The training loss continues to go down and almost reaches zero at epoch 20. This is normal as the model is trained to fit the train data as well as possible.

Handling overfitting

Now, we can try to do something about the overfitting. There are different options to do that.

Reduce the network’s capacity by removing layers or reducing the number of elements in the hidden layers
Apply regularization, which comes down to adding a cost to the loss function for large weights
Use Dropout layers, which will randomly remove certain features by setting them to zero

Reducing the network’s capacity

Our first model has a large number of trainable parameters. The higher this number, the easier the model can memorize the target class for each training sample. Obviously, this is not ideal for generalizing on new data.

By lowering the capacity of the network, you force it to learn the patterns that matter or that minimize the loss. On the other hand, reducing the network’s capacity too much will lead to underfitting. The model will not be able to learn the relevant patterns in the train data.

We reduce the network’s capacity by removing one hidden layer and lowering the number of elements in the remaining layer to 16.

reduced_model = models.Sequential()
reduced_model.add(layers.Dense(16, activation='relu', input_shape=(NB_WORDS,)))
reduced_model.add(layers.Dense(3, activation='softmax'))
reduced_model.name = 'Reduced model'
reduced_history = deep_model(reduced_model, X_train_rest, y_train_rest, X_valid, y_valid)
reduced_min = optimal_epoch(reduced_history)
eval_metric(reduced_model, reduced_history, 'loss')

We can see that it takes more epochs before the reduced model starts overfitting. The validation loss also goes up more slowly than in our first model.

compare_models_by_metric(base_model, reduced_model, base_history, reduced_history, 'val_loss')

When we compare it with the validation loss of the baseline model, it is clear that the reduced model starts overfitting at a later epoch. Its validation loss stays low for much longer than the baseline model’s.

Applying regularization

To address overfitting, we can apply weight regularization to the model. This will add a cost to the loss function of the network for large weights (or parameter values). As a result, you get a simpler model that will be forced to learn only the relevant patterns in the train data.

There are L1 regularization and L2 regularization.

L1 regularization adds a cost proportional to the absolute value of the parameters. It tends to push some of the weights to exactly zero.
L2 regularization adds a cost proportional to the squared value of the parameters. This results in smaller weights.
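Keras documents the penalties as l1(f) adding f * sum(|w|) and l2(f) adding f * sum(w²) to the loss. The stdlib sketch below, with helper names of our own, shows the arithmetic for the regularizers.l2(0.001) used in the next model. Only the kernels are penalized there, because no bias_regularizer is set.

```python
def l1_penalty(weights, factor):
    """factor * sum of absolute values: a constant pull towards zero."""
    return factor * sum(abs(w) for w in weights)

def l2_penalty(weights, factor):
    """factor * sum of squares: the pull shrinks as weights get smaller."""
    return factor * sum(w * w for w in weights)

def regularized_loss(data_loss, kernels, factor):
    """Loss seen by the optimizer: data loss plus an L2 penalty per kernel."""
    return data_loss + sum(l2_penalty(k, factor) for k in kernels)
```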

Let’s try with L2 regularization.

reg_model = models.Sequential()
reg_model.add(layers.Dense(64, kernel_regularizer=regularizers.l2(0.001), activation='relu', input_shape=(NB_WORDS,)))
reg_model.add(layers.Dense(64, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
reg_model.add(layers.Dense(3, activation='softmax'))
reg_model.name = 'L2 Regularization model'
reg_history = deep_model(reg_model, X_train_rest, y_train_rest, X_valid, y_valid)
reg_min = optimal_epoch(reg_history)

For the regularized model, we notice that it starts overfitting in the same epoch as the baseline model. However, the loss increases much more slowly afterward.

eval_metric(reg_model, reg_history, 'loss')

compare_models_by_metric(base_model, reg_model, base_history, reg_history, 'val_loss')

Adding dropout layers

The last option we’ll try is to add dropout layers. During training, a dropout layer randomly sets a fraction of the output features of the previous layer to zero. Here that fraction is 50%, set by the 0.5 argument.
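A minimal NumPy sketch of what a dropout layer does, assuming the "inverted dropout" scheme that Keras documents: during training the values that survive are scaled up by 1 / (1 - rate) so their expected sum is unchanged, and at inference time the layer passes its inputs through untouched. The dropout function and its arguments are illustrative, not a Keras API.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of the values during training
    and scale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x  # at inference time the layer is a no-op
    keep_mask = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
features = np.ones((2, 8))
print(dropout(features, rate=0.5, training=True, rng=rng))   # each value is 0.0 or 2.0
print(dropout(features, rate=0.5, training=False, rng=rng))  # unchanged: all ones
```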

drop_model = models.Sequential()
drop_model.add(layers.Dense(64, activation='relu', input_shape=(NB_WORDS,)))
drop_model.add(layers.Dropout(0.5))
drop_model.add(layers.Dense(64, activation='relu'))
drop_model.add(layers.Dropout(0.5))
drop_model.add(layers.Dense(3, activation='softmax'))
drop_model.name = 'Dropout layers model'
drop_history = deep_model(drop_model, X_train_rest, y_train_rest, X_valid, y_valid)
drop_min = optimal_epoch(drop_history)
eval_metric(drop_model, drop_history, 'loss')

The model with dropout layers starts overfitting later than the baseline model. Its loss also increases more slowly than that of the baseline model.

compare_models_by_metric(base_model, drop_model, base_history, drop_history, 'val_loss')

The model with the dropout layers starts overfitting later. Compared to the baseline model, its loss also remains much lower.

Training on the full train data and evaluation on test data

At first sight, the reduced model seems to be the best model for generalization. But let’s check that on the test set.

base_results = test_model(base_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, base_min)
reduced_results = test_model(reduced_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, reduced_min)
reg_results = test_model(reg_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, reg_min)
drop_results = test_model(drop_model, X_train_oh, y_train_oh, X_test_oh, y_test_oh, drop_min)

Conclusion

As shown above, all three options help to reduce overfitting. We manage to increase the accuracy on the test data substantially. Among these three options, the model with the dropout layers performs the best on the test data.

You can find the notebook on GitHub. Have fun with it!

A Deep Dive into Word Embeddings for Sentiment Analysis

freeCodeCamp — Sun, 05 Jan 2020 14:27:33 +0000

By Bert Carremans

When applying one-hot encoding to words, we end up with sparse (containing many zeros) vectors of high dimensionality. On large data sets, this could cause performance issues.

Additionally, one-hot encoding does not take into account the semantics of the words. So words like airplane and aircraft are considered to be two different features while we know that they have a very similar meaning. Word embeddings address these two issues.

First, word embeddings are dense vectors with much lower dimensionality. Secondly, the semantic relationships between words are reflected in the distance and direction of the vectors.

We will work with the TwitterAirlineSentiment data set on Kaggle. This data set contains roughly 15K tweets with 3 possible classes for the sentiment (positive, negative and neutral). In my previous post, we tried to classify the tweets by tokenizing the words and applying two classifiers. Let’s see if word embeddings can outperform that.

After reading this tutorial you will know how to compute task-specific word embeddings with the Embedding layer of Keras. Secondly, we will investigate whether word embeddings trained on a larger corpus can improve the accuracy of our model.

The structure of this tutorial is:

Intuition behind word embeddings
Project set-up
Data preparation
Keras and its Embedding layer
Pre-trained word embeddings — GloVe
Training word embeddings with more dimensions

Intuition behind word embeddings

Before we can use words in a classifier, we need to convert them into numbers. One way to do that is to simply map words to integers. Another way is to one-hot encode words. Each tweet could then be represented as a vector with a dimension equal to (a limited set of) the words in the corpus. The words occurring in the tweet have a value of 1 in the vector. All other vector values equal zero.

Word embeddings are computed differently. Each word is positioned into a multi-dimensional space. The number of dimensions in this space is chosen by the data scientist. You can experiment with different dimensions and see what provides the best result.

The vector values for a word represent its position in this embedding space. Synonyms are found close to each other, while words with opposite meanings have a large distance between them. You can also apply mathematical operations on the vectors, which should produce semantically meaningful results. A typical example is that the word embedding of king, minus the embedding of man, plus the embedding of woman, ends up close to the word embedding of queen.
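The analogy can be tried out on a toy example. The 2-D vectors below are invented for illustration (one axis loosely encodes royalty, the other gender); real embeddings have many more dimensions, and the analogy only holds approximately there.

```python
import numpy as np

# Invented 2-D "embeddings": axis 0 ~ royalty, axis 1 ~ gender
toy_vectors = {
    'king':  np.array([1.0, 1.0]),
    'queen': np.array([1.0, -1.0]),
    'man':   np.array([0.0, 1.0]),
    'woman': np.array([0.0, -1.0]),
    'apple': np.array([-1.0, 0.0]),
}

def cosine(a, b):
    # Cosine similarity: 1 for the same direction, -1 for opposite directions
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy('king', 'man', 'woman', toy_vectors))  # queen
```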

Project set-up

Let’s start by importing all packages for this project.

import pandas as pd
import numpy as np
import re
import collections
import matplotlib.pyplot as plt
from pathlib import Path
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils.np_utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from keras import models
from keras import layers

We define some parameters and paths used throughout the project. Most of them are self-explanatory. But others will be explained further in the code.

NB_WORDS = 10000  # Parameter indicating the number of words we'll put in the dictionary
VAL_SIZE = 1000  # Size of the validation set
NB_START_EPOCHS = 10  # Number of epochs we usually start to train with
BATCH_SIZE = 512  # Size of the batches used in the mini-batch gradient descent
MAX_LEN = 24  # Maximum number of words in a sequence
GLOVE_DIM = 100  # Number of dimensions of the GloVe word embeddings
root = Path('../')
input_path = root / 'input/'
ouput_path = root / 'output/'
source_path = root / 'source/'

Throughout this code, we will also use some helper functions for data preparation, modeling and visualization. These function definitions are not shown here to keep the blog post clutter-free. You can always refer to the notebook on GitHub to look at the code.

Data preparation

Reading the data and cleaning

We read in the CSV file with the tweets and apply a random shuffle on its indexes. After that, we remove stop words and @ mentions. A test set of 10% is split off to evaluate the model on new data.

df = pd.read_csv(input_path / 'Tweets.csv')
df = df.reindex(np.random.permutation(df.index))
df = df[['text', 'airline_sentiment']]
df.text = df.text.apply(remove_stopwords).apply(remove_mentions)
X_train, X_test, y_train, y_test = train_test_split(df.text, df.airline_sentiment, test_size=0.1, random_state=37)
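The helper functions remove_stopwords and remove_mentions live in the notebook and are not shown here. The sketch below is a hypothetical version of what they might look like, not the author's code. Stop words come from NLTK (which needs a one-time nltk.download('stopwords')), and keeping negations such as "not" is an assumption, made because those words carry sentiment.

```python
import re

# Assumption: keep negations, since they can flip the sentiment of a tweet
NEGATION_WHITELIST = {"not", "no", "n't"}

def remove_stopwords(text, stop_words=None):
    if stop_words is None:
        # Load NLTK's English stop word list lazily (requires the 'stopwords' corpus)
        from nltk.corpus import stopwords
        stop_words = set(stopwords.words('english'))
    kept = [w for w in text.split()
            if w.lower() not in stop_words or w.lower() in NEGATION_WHITELIST]
    return ' '.join(kept)

def remove_mentions(text):
    # Drop @handles such as "@united" and collapse the leftover whitespace
    return ' '.join(re.sub(r'@\w+', '', text).split())

print(remove_mentions('@united thanks for nothing'))  # thanks for nothing
print(remove_stopwords('this is not a good flight', {'this', 'is', 'not', 'a'}))  # not good flight
```

With these definitions, df.text.apply(remove_stopwords) uses the NLTK list by default.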

Convert words into integers

With the Tokenizer from Keras, we convert the tweets into sequences of integers. We limit the number of words to the NB_WORDS most frequent words. Additionally, the Tokenizer removes the characters listed in filters, converts the text to lowercase and splits it on spaces.

tk = Tokenizer(num_words=NB_WORDS,
               filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
               lower=True,
               split=" ")
tk.fit_on_texts(X_train)
X_train_seq = tk.texts_to_sequences(X_train)
X_test_seq = tk.texts_to_sequences(X_test)
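To see what num_words does, here is a simplified pure-Python imitation of the Tokenizer's indexing. It skips the filters and only lowercases and splits on spaces. As in Keras, indices start at 1 in order of decreasing frequency (0 stays free for padding), and texts_to_sequences only keeps words whose index is below num_words, so NB_WORDS = 10000 effectively keeps the 9,999 most frequent words. The function names are hypothetical.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency (1 = most frequent); ties keep first-seen order."""
    counts = Counter(w for t in texts for w in t.lower().split())
    ranked = sorted(counts, key=counts.get, reverse=True)  # stable sort
    return {w: i for i, w in enumerate(ranked, start=1)}

def texts_to_sequences(texts, word_index, num_words):
    # Keep only indices below num_words, i.e. the num_words - 1 most frequent words
    return [[word_index[w] for w in t.lower().split()
             if w in word_index and word_index[w] < num_words]
            for t in texts]

train = ['good flight good', 'bad flight']
idx = build_word_index(train)
print(idx)                                                         # {'good': 1, 'flight': 2, 'bad': 3}
print(texts_to_sequences(['bad good flight'], idx, num_words=3))  # [[1, 2]]
```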

Equal length of sequences

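Each batch needs to provide sequences of equal length. We achieve this with the pad_sequences method. By specifying maxlen, the sequences are padded with zeros or truncated to exactly MAX_LEN words. By default, both the padding and the truncation happen at the start of each sequence.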

X_train_seq_trunc = pad_sequences(X_train_seq, maxlen=MAX_LEN)
X_test_seq_trunc = pad_sequences(X_test_seq, maxlen=MAX_LEN)
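A small NumPy imitation of pad_sequences with its default settings (padding='pre', truncating='pre') shows where the zeros go and which words are dropped. pad_pre is a hypothetical name, not part of Keras.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Mimic pad_sequences' defaults: padding='pre' and truncating='pre'."""
    out = np.full((len(sequences), maxlen), value, dtype='int32')
    for row, seq in enumerate(sequences):
        trunc = seq[-maxlen:]  # 'pre' truncation keeps the last maxlen tokens
        if len(trunc):
            out[row, -len(trunc):] = trunc  # 'pre' padding puts the zeros in front
    return out

print(pad_pre([[1, 2, 3], [4]], maxlen=2))
# [[2 3]
#  [0 4]]
```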

Encoding the target variable

The target classes are strings which need to be converted into numeric vectors. This is done with the LabelEncoder from scikit-learn and the to_categorical method from Keras.

le = LabelEncoder()
y_train_le = le.fit_transform(y_train)
y_test_le = le.transform(y_test)
y_train_oh = to_categorical(y_train_le)
y_test_oh = to_categorical(y_test_le)
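LabelEncoder assigns integers to the class names in sorted order, and to_categorical turns each integer into a one-hot row. The same result can be sketched in plain NumPy, which makes the mapping easy to inspect; the labels below are made up.

```python
import numpy as np

labels = np.array(['positive', 'negative', 'neutral', 'negative'])

# Sorted unique classes plus each label's position in that list (like LabelEncoder)
classes, encoded = np.unique(labels, return_inverse=True)

# One row per label with a 1 in that class's column (like to_categorical)
one_hot = np.eye(len(classes))[encoded]

print(classes)   # ['negative' 'neutral' 'positive']
print(encoded)   # [2 0 1 0]
print(one_hot)
```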

Splitting off the validation set

From the training data, we split off a validation set of 10% to use during training.

X_train_emb, X_valid_emb, y_train_emb, y_valid_emb = train_test_split(X_train_seq_trunc, y_train_oh, test_size=0.1, random_state=37)

Modeling

Keras and the Embedding layer

Keras provides a convenient way to convert each word into a multi-dimensional vector. This can be done with the Embedding layer. It will compute the word embeddings (or use pre-trained embeddings) and look up each word in a dictionary to find its vector representation. Here we will train word embeddings with 8 dimensions.

emb_model = models.Sequential()
emb_model.add(layers.Embedding(NB_WORDS, 8, input_length=MAX_LEN))
emb_model.add(layers.Flatten())
emb_model.add(layers.Dense(3, activation='softmax'))
emb_history = deep_model(emb_model, X_train_emb, y_train_emb, X_valid_emb, y_valid_emb)
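Under the hood, an Embedding layer is a trainable lookup table: row i of a (NB_WORDS, 8) weight matrix is the vector for word index i. The NumPy sketch below uses random numbers in place of learned weights to show the shapes involved and the parameter counts of the two layers in emb_model.

```python
import numpy as np

NB_WORDS, EMB_DIM, MAX_LEN = 10000, 8, 24

rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(NB_WORDS, EMB_DIM))  # stands in for learned weights

batch = rng.integers(0, NB_WORDS, size=(2, MAX_LEN))  # two padded tweets as word indices
embedded = embedding_matrix[batch]                     # row lookup: shape (2, 24, 8)
flattened = embedded.reshape(len(batch), -1)           # what Flatten produces: (2, 192)

embedding_params = NB_WORDS * EMB_DIM     # 80,000 trainable weights
dense_params = MAX_LEN * EMB_DIM * 3 + 3  # 192 inputs x 3 classes + 3 biases = 579
print(embedded.shape, flattened.shape, embedding_params, dense_params)
```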

We have a validation accuracy of about 74%. The number of words in the tweets is rather low, so this result is quite good. By comparing the training and validation loss, we see that the model starts overfitting from epoch 6.

In a previous article, I discussed how we can avoid overfitting. You might want to read that if you want to dive deeper into that topic.

When we train the model on all data (including the validation data, but excluding the test data) and set the number of epochs to 6, we get a test accuracy of 78%. This test result is OK, but let’s see if we can improve with pre-trained word embeddings.

emb_results = test_model(emb_model, X_train_seq_trunc, y_train_oh, X_test_seq_trunc, y_test_oh, 6)
print('\n')
print('Test accuracy of word embeddings model: {0:.2f}%'.format(emb_results[1]*100))

Pre-trained word embeddings — Glove

Because the training data set is rather small, the model might not be able to learn good embeddings for sentiment analysis. Alternatively, we can load pre-trained word embeddings built on a much larger corpus.

The GloVe project provides several sets of pre-trained word embeddings, including embeddings trained specifically on tweets. These might be useful for the task at hand.

First, we put the word embeddings in a dictionary where the keys are the words and the values the word embeddings.

glove_file = 'glove.twitter.27B.' + str(GLOVE_DIM) + 'd.txt'
emb_dict = {}
# GloVe files are UTF-8 encoded; 'with' closes the file even if an error occurs
with open(input_path / glove_file, encoding='utf8') as glove:
    for line in glove:
        values = line.split()
        word = values[0]
        vector = np.asarray(values[1:], dtype='float32')
        emb_dict[word] = vector

With the GloVe embeddings loaded in a dictionary, we can look up the embedding for each word in the corpus of the airline tweets. These will be stored in a matrix of shape (NB_WORDS, GLOVE_DIM). If a word is not found in the GloVe dictionary, its row in the matrix stays zero.

emb_matrix = np.zeros((NB_WORDS, GLOVE_DIM))
for w, i in tk.word_index.items():
    # word_index is ordered by frequency, so we can stop at the first index >= NB_WORDS
    if i < NB_WORDS:
        vect = emb_dict.get(w)
        if vect is not None:
            emb_matrix[i] = vect
    else:
        break
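The same lookup can be wrapped in a function that also counts how many words were found in GloVe, which is a quick way to check vocabulary coverage. This is a sketch with a hypothetical name and toy data; it uses continue instead of break, so it does not rely on word_index being ordered by frequency.

```python
import numpy as np

def build_embedding_matrix(word_index, emb_dict, nb_words, dim):
    """Copy pre-trained vectors into the rows of known words; unknown words stay zero."""
    matrix = np.zeros((nb_words, dim))
    found = 0
    for word, i in word_index.items():
        if i >= nb_words:
            continue  # outside the vocabulary kept by the Tokenizer
        vector = emb_dict.get(word)
        if vector is not None:
            matrix[i] = vector
            found += 1
    return matrix, found

# Toy data: 'delayyy' is missing from the "pre-trained" dictionary
word_index = {'good': 1, 'delayyy': 2, 'flight': 3}
emb_dict = {'good': np.array([0.1, 0.2]), 'flight': np.array([0.3, 0.4])}
matrix, found = build_embedding_matrix(word_index, emb_dict, nb_words=4, dim=2)
print(found, 'of', len(word_index), 'words found')  # 2 of 3 words found
```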

Then we specify the model just like we did with the model above.

glove_model = models.Sequential()
glove_model.add(layers.Embedding(NB_WORDS, GLOVE_DIM, input_length=MAX_LEN))
glove_model.add(layers.Flatten())
glove_model.add(layers.Dense(3, activation='softmax'))

In the Embedding layer (which is layer 0 here) we set the weights for the words to those found in the GloVe word embeddings. By setting trainable to False we make sure that the GloVe word embeddings cannot be changed. After that, we run the model.

glove_model.layers[0].set_weights([emb_matrix])
glove_model.layers[0].trainable = False
glove_history = deep_model(glove_model, X_train_emb, y_train_emb, X_valid_emb, y_valid_emb)

The model starts overfitting quickly, after only 3 epochs. Furthermore, the validation accuracy is lower compared to the embeddings trained on the training data.

glove_results = test_model(glove_model, X_train_seq_trunc, y_train_oh, X_test_seq_trunc, y_test_oh, 3)
print('\n')
print('Test accuracy of GloVe model: {0:.2f}%'.format(glove_results[1]*100))

As a final exercise, let’s see what results we get when we train the embeddings with the same number of dimensions as the GloVe data.

Training word embeddings with more dimensions

We will train the word embeddings with the same number of dimensions as the GloVe embeddings (i.e. GLOVE_DIM).

emb_model2 = models.Sequential()
emb_model2.add(layers.Embedding(NB_WORDS, GLOVE_DIM, input_length=MAX_LEN))
emb_model2.add(layers.Flatten())
emb_model2.add(layers.Dense(3, activation='softmax'))
emb_history2 = deep_model(emb_model2, X_train_emb, y_train_emb, X_valid_emb, y_valid_emb)

emb_results2 = test_model(emb_model2, X_train_seq_trunc, y_train_oh, X_test_seq_trunc, y_test_oh, 3)
print('\n')
print('Test accuracy of word embedding model 2: {0:.2f}%'.format(emb_results2[1]*100))

On the test data we get good results, but we do not outperform the LogisticRegression with the CountVectorizer. So there is still room for improvement.

Conclusion

The best result is achieved with 100-dimensional word embeddings that are trained on the available data. This even outperforms the use of word embeddings that were trained on a much larger Twitter corpus.

Until now we have just put a Dense layer on the flattened embeddings. By doing this, we do not take into account the relationships between the words in the tweet. This can be achieved with a recurrent neural network or a 1D convolutional network. But that’s something for a future post :)

How to classify butterflies with deep learning in Keras

freeCodeCamp — Thu, 08 Aug 2019 18:59:32 +0000

By Bert Carremans

A while ago I read an interesting blog post on the website of the Dutch organization Vlinderstichting. Every year they organize a count of butterflies. Volunteers help in determining the different butterfly species in their garden. The Vlinderstichting gathers and analyses the results.

As the determination of the butterfly species is done by the volunteers, inevitably this process is prone to errors. As a result, the Vlinderstichting has to manually check the submissions, which is time-consuming.

Specifically, there are three butterflies for which the Vlinderstichting receives many wrong determinations. These are

Meadow brown or Maniola jurtina
Gatekeeper or Pyronia tithonus
Small heath or Coenonympha pamphilus

In this article, I will describe the steps to fit a deep learning model that helps to make the distinction between the first two butterflies.

Downloading images with the Flickr API

To train a convolutional neural network I need to find images of butterflies with the correct label. Surely I could take pictures myself of the butterflies that I want to classify. They sometimes fly around in my garden…

Just kidding, that would take ages. For this, I need an automated way to get the images. To do that I use the Flickr API via Python.

Setting up the Flickr API

Firstly, I install the flickrapi package with pip. Then I create the necessary API keys on the Flickr website to connect to the Flickr API.

Besides the flickrapi package, I import the os and urllib packages for downloading the images and setting up the directories.

from flickrapi import FlickrAPI
import urllib.request
import os
import config

In the config module, I define the public and secret keys for the Flickr API. So this is simply a Python script (config.py) with the code below:

API_KEY = 'XXXXXXXXXXXXXXXXX'  # replace with your key
API_SECRET = 'XXXXXXXXXXXXXXXXX'  # replace with your secret
IMG_FOLDER = 'XXXXXXXXXXXXXXXXX'  # replace with your folder to store the images

I keep these keys in a separate file for security reasons. That way, you can push the code to a public repository such as GitHub or Bitbucket while listing config.py in .gitignore. Consequently, you can share your code with others without worrying about someone getting access to your credentials.

To extract images of different butterfly species, I wrote a function download_flickr_photos. I will explain this function step by step. In addition, I’ve made the full code available on GitHub.

Input parameters

First of all, I check if the input parameters are of the correct type or values. If not, I raise an error. The explanation of the parameters can be found in the docstring of the function.

if not (isinstance(keywords, str) or isinstance(keywords, list)):
    raise AttributeError('keywords must be a string or a list of strings')
if not (size in ['thumbnail', 'square', 'medium', 'original']):
    raise AttributeError('size must be "thumbnail", "square", "medium" or "original"')
if not (max_nb_img == -1 or (isinstance(max_nb_img, int) and max_nb_img > 0)):
    raise AttributeError('max_nb_img must be an integer greater than zero or equal to -1')

Secondly, I define some of the parameters that will be used in the walk method later on. I create a list for the keywords and determine from which URL the images need to be downloaded.

if isinstance(keywords, str):
    keywords_list = []
    keywords_list.append(keywords)
else:
    keywords_list = keywords
if size == 'thumbnail':
    size_url = 'url_t'
elif size == 'square':
    size_url = 'url_q'
elif size == 'medium':
    size_url = 'url_c'
elif size == 'original':
    size_url = 'url_o'
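The if/elif chain and the keywords handling can also be written with a dictionary lookup. The sketch below is one possible alternative (normalize_arguments and SIZE_TO_URL_KEY are hypothetical names); it raises TypeError and ValueError, which describe these problems more precisely than AttributeError.

```python
SIZE_TO_URL_KEY = {
    'thumbnail': 'url_t',
    'square': 'url_q',
    'medium': 'url_c',
    'original': 'url_o',
}

def normalize_arguments(keywords, size):
    """Return (list of keywords, Flickr URL key for the requested size)."""
    if not isinstance(keywords, (str, list)):
        raise TypeError('keywords must be a string or a list of strings')
    if size not in SIZE_TO_URL_KEY:
        raise ValueError('size must be one of: ' + ', '.join(SIZE_TO_URL_KEY))
    keywords_list = [keywords] if isinstance(keywords, str) else list(keywords)
    return keywords_list, SIZE_TO_URL_KEY[size]

print(normalize_arguments('gatekeeper butterfly', 'medium'))
# (['gatekeeper butterfly'], 'url_c')
```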

Connecting to the Flickr API

Next, I connect to the Flickr API. In the FlickrAPI call I use the API keys defined in the config module.

flickr = FlickrAPI(config.API_KEY, config.API_SECRET)

Creating subfolders per butterfly species

I save the images of each butterfly species in a separate subfolder. The name of each subfolder is the butterfly species’ name, given by the keyword. If the subfolder does not exist yet, I create it.

results_folder = config.IMG_FOLDER + keyword.replace(" ", "_") + "/"
if not os.path.exists(results_folder):
    os.makedirs(results_folder)
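A slightly more robust variant of this step uses os.path.join, so IMG_FOLDER does not need a trailing slash, and os.makedirs(..., exist_ok=True), which makes the separate existence check unnecessary. species_folder is a hypothetical helper name.

```python
import os

def species_folder(img_folder, keyword):
    """Return (and create if needed) the subfolder for one search keyword."""
    path = os.path.join(img_folder, keyword.replace(' ', '_'))
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path
```

Calling it twice with the same keyword simply returns the same path.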

Walking around in the Flickr library

photos = flickr.walk(
    text=keyword,
    extras='url_m',
    license='1,2,4,5',
    per_page=50)

I use the walk method of the Flickr API to search for images for the specified keyword. This walk method has the same parameters as the search method in the Flickr API.

In the text parameter, I use the keyword to search for images related to it. Secondly, in the extras parameter, I specify url_m, which asks Flickr to return the URL of the "small" version of each image (240 pixels on the longest side). More explanation on the image sizes and their respective URLs is given in this Flickcurl C library.

Thirdly, in the license parameter, I select images published under Creative Commons licenses (license codes 1, 2, 4 and 5, of which 1 and 2 are non-commercial variants). More on the license codes and their meaning can be found on the Flickr API platform. Finally, the per_page parameter specifies how many images I allow per page.

As a result, I have a generator called photos to download the images.

Downloading Flickr images

With the photos generator, I can download all the images found for the search query. First I get the specific URL at which I will download the image. Then I increment the count variable and use this counter to create the image filenames.

With the urlretrieve method, I download the image and save it in the folder for the butterfly species. If an error occurs I print out the error message.

count = 0
for photo in photos:
    try:
        url = photo.get('url_m')
        print(url)
        count += 1
        urllib.request.urlretrieve(url, results_folder + str(count) + ".jpg")
    except Exception as e:
        print(e, 'Download failure')
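The loop can also be wrapped in a function that honors the max_nb_img parameter and skips photos without a URL. This is a sketch, not the author's implementation: the download function is passed in as a parameter (defaulting to urllib.request.urlretrieve) so it can be tested offline, and the counter only increases after a successful download, so the file names stay contiguous. download_photos is a hypothetical name.

```python
import os
import urllib.request

def download_photos(photos, results_folder, url_key='url_m', max_nb_img=-1,
                    fetch=urllib.request.urlretrieve):
    """Save up to max_nb_img photos (-1 = no limit) as 1.jpg, 2.jpg, ..."""
    count = 0
    for photo in photos:
        if max_nb_img != -1 and count >= max_nb_img:
            break
        url = photo.get(url_key)
        if url is None:
            continue  # this photo has no URL for the requested size
        try:
            fetch(url, os.path.join(results_folder, f'{count + 1}.jpg'))
            count += 1
        except Exception as e:  # network errors, HTTP errors, ...
            print(e, 'Download failure')
    return count
```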

To download multiple butterfly species, I create a list and call the function download_flickr_photos in a for loop. For simplicity, I only download two butterfly species of the three mentioned above.

butterflies = ['meadow brown butterfly', 'gatekeeper butterfly']
for butterfly in butterflies:
    download_flickr_photos(butterfly)

Data augmentation of images

Training a convnet on a small number of images will result in overfitting. Consequently, the model will make errors in classifying new, unseen images. Data augmentation can help to avoid this. Luckily Keras has some nice tools to transform images easily.

I’d like to compare it with how my son classifies cars on the road. At the moment he’s only 2 years old and hasn’t seen as many cars as an adult. So you could say his training set of images is rather small. Therefore he’s more likely to misclassify cars. For instance, he sometimes mistakes an ambulance for a police van.

As he grows older, he will see more ambulances and police vans, along with the labels I give them. So his training set will become larger, and he will classify them more accurately.

For that reason, we need to provide the convnet with more butterfly images than we have at the moment. An easy solution for that is data augmentation. In short, this means applying a set of transformations to the Flickr images.

Keras provides a wide range of image transformations. But first, we’ll have to convert the images so that Keras can work with them.

Converting an image to numbers

We start by importing the Keras module. We will demonstrate the image transformations with one example image. For that purpose, we use the load_img method.

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
i = load_img('data/train/maniola_jurtina/1.jpg' )
x = img_to_array(i)
x = x.reshape((1,) + x.shape)

The load_img method returns a PIL (Python Imaging Library) image. We’ll need to convert this to a NumPy array to use it with the ImageDataGenerator class later on. That’s done with the handy img_to_array method. As a result, we have an array of shape 75x75x3. These dimensions reflect the height, the width and the three RGB color channels.

In fact, each pixel of the image has 3 RGB values. These range between 0 and 255 and represent the intensity of Red, Green and Blue. A higher value stands for a higher intensity. For instance, one pixel can be represented as a list of these three values [78, 136, 60]. Black is represented as [0, 0, 0] and white as [255, 255, 255].

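Finally, we need to add an extra (batch) dimension, because the flow method used below expects a 4-dimensional array of shape (number of images, height, width, channels) and raises a ValueError otherwise. This is done with the reshape function.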
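Both the reshape call and np.expand_dims add that leading batch axis. A quick check with a dummy array standing in for the loaded image:

```python
import numpy as np

x = np.zeros((75, 75, 3), dtype='float32')  # stands in for img_to_array(i)
batch_a = x.reshape((1,) + x.shape)         # the approach used above
batch_b = np.expand_dims(x, axis=0)         # equivalent, and a bit more explicit
print(batch_a.shape, batch_b.shape)         # (1, 75, 75, 3) (1, 75, 75, 3)
```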

Alright, now we have something to work with. Let’s continue with the transformations.

Rotation

The rotation_range parameter takes a number of degrees between 0 and 180. For each image, Keras randomly picks an angle within that range and rotates the image clockwise or counter-clockwise. In our example, the image will be rotated by at most 90 degrees.

ImageDataGenerator also has a parameter fill_mode. The default value is ‘nearest’. By rotating the image within the width and height of the original image we end up with “empty” pixels. The fill_mode then uses the nearest pixels to fill this empty space.

imgGen = ImageDataGenerator(rotation_range = 90)
i = 1
for batch in imgGen.flow(x, batch_size=1, save_to_dir='example_transformations', save_format='jpeg', save_prefix='trsf'):
    i += 1
    if i > 3:
        break

In the flow method, we specify where to save the transformed images. Make sure this directory exists! We also prefix the newly created images for convenience. The generator returned by flow loops forever, but for this example, we only generate three images. So when our counter exceeds three, we break the for loop. You can see the result below.

Width shift

In the width_shift_range parameter, you specify how far the image can be shifted to the left or right. A float below 1 is interpreted as a fraction of the total width, while a value of 1 or more is interpreted as a number of pixels, so the 90 below allows shifts of up to 90 pixels. Again, the fill_mode will fill up the newly created empty pixels. For the remaining examples, I will only show how to instantiate the ImageDataGenerator with the respective parameter. The code to generate the images is the same as in the rotation example.

imgGen = ImageDataGenerator(width_shift_range = 90)

In the transformed images we see that the image is shifted to the right. The empty pixels are filled which gives it a bit of a stretched look.

The same can be done for shifting up or down by specifying a value for the height_shift_range parameter.

Rescale

Rescaling multiplies the RGB values of each pixel by a chosen factor, after all other transformations have been applied. In our example, we multiply by 1/255, a form of min-max scaling. As a result, the values will range between 0 and 1. This makes the values smaller and easier for the model to process.

imgGen = ImageDataGenerator(rescale = 1./255)

Shear

With the shear_range parameter, we set the maximum shear intensity, which Keras documents as a counter-clockwise shear angle in degrees. Shearing slants the image along one axis. This transformation can produce rather weird images when the value is set too high, so keep it small.

imgGen = ImageDataGenerator(shear_range = 0.2)

Zoom

This transformation randomly zooms in or out of the picture. A float zoom_range of 0.2 means the zoom factor is drawn from the range [0.8, 1.2]. Just like the shearing parameter, this value should not be exaggerated to keep the images realistic.

imgGen = ImageDataGenerator(zoom_range = 0.2)

Horizontal flip

This transformation flips an image horizontally. Life can be simple sometimes…

imgGen = ImageDataGenerator(horizontal_flip = True)

All transformations combined

Now that we have seen the effect of each transformation separately, we apply all of them together.

imgGen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)
i = 1
for batch in imgGen.flow(x, batch_size=1, save_to_dir='example_transformations', save_format='jpeg', save_prefix='all'):
    i += 1
    if i > 3:
        break

Setting up the folder structure

We need to store these images in a specific folder structure. That way, we can use the flow_from_directory method to augment the images and create the corresponding labels. This folder structure needs to look like this:

train
    maniola_jurtina
        0.jpg
        1.jpg
        …
    pyronia_tithonus
        0.jpg
        1.jpg
        …
validation
    maniola_jurtina
        0.jpg
        1.jpg
        …
    pyronia_tithonus
        0.jpg
        1.jpg
        …

To create this folder structure I created a gist img_train_test_split.py. Feel free to use it in your projects.
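If you prefer to write the split yourself, the sketch below shows one way to do it. It is a hypothetical helper, not the gist mentioned above, and it assumes the downloaded images sit in one subfolder per species with a .jpg extension. A fixed seed keeps the split reproducible.

```python
import os
import random
import shutil

def split_into_folders(src_root, dst_root, val_fraction=0.2, seed=37):
    """Copy src_root/<class>/*.jpg into dst_root/train/<class> and dst_root/validation/<class>."""
    rng = random.Random(seed)
    for class_name in sorted(os.listdir(src_root)):
        class_dir = os.path.join(src_root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        files = sorted(f for f in os.listdir(class_dir) if f.lower().endswith('.jpg'))
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        for subset, names in (('validation', files[:n_val]), ('train', files[n_val:])):
            target = os.path.join(dst_root, subset, class_name)
            os.makedirs(target, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(class_dir, name), os.path.join(target, name))
```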

Creating the generators

Just as before, we specify the configuration parameters for the training generator. The validation images are not augmented like the training images; we only rescale their RGB values to make them smaller.

The flow_from_directory method takes the images from the train or validation folder and generates batches of 32 transformed images. By setting the class_mode to ‘binary’, a single 0/1 label is created for each image based on its folder name. The class indices follow the alphanumeric order of the subfolder names.

train_datagen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)
validation_datagen = ImageDataGenerator(rescale=1./255)
# flow_from_directory resizes every image to target_size, which defaults to (256, 256);
# we set it to (75, 75) to match the image size used in this article
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(75, 75),
    batch_size=32,
    class_mode='binary')
validation_generator = validation_datagen.flow_from_directory(
    'data/validation',
    target_size=(75, 75),
    batch_size=32,
    class_mode='binary')

What about different image sizes?

The Flickr API lets you download images of specific sizes. However, in real-world applications the image sizes are not always constant. If the aspect ratio of the images is the same, we can simply resize the images. Otherwise, we can crop the images. Unfortunately, it is difficult to crop the image while keeping the object we want to classify intact.

Keras can deal with different image sizes. When configuring the model you can specify None for the width and height in input_shape.

input_shape=(3, None, None)  # channels_first data format (the Theano convention)
input_shape=(None, None, 3)  # channels_last data format (the TensorFlow default)

I wanted to show that it is possible to work with different image sizes. However, this approach has some drawbacks:

not all layers (e.g. Flatten) will work with None as an input dimension
it can be computationally heavy to run

Building the deep learning model

For the remainder of this article, I will discuss the structure of a convolutional neural network, illustrated with some examples for our butterfly project. At the end of this article, we’ll have our first classification results.

What layers does a convolutional neural network consist of?

Of course, you can choose how many layers to add to your convolutional neural network (also called CNN or convnet) and of which type. In this project we will start with a structure made of the layer types described below.

Let’s understand what each layer does and how we create them with Keras.

Input layer

The input layer receives the images, including the different versions created by the transformations above. These images are converted into a numerical representation, a matrix.

The dimensions of this matrix will be width x height x number of (color) channels. For RGB images the number of channels will be three. For grayscale images, this is equal to one. Below you can see a numerical representation of a 7×7 RGB image.

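As our images are of size 75×75 with three color channels, we need to specify that in the input_shape parameter when adding the first convolutional layer. With TensorFlow’s default channels_last format, the order is (height, width, channels).

from keras.models import Sequential
from keras.layers import Conv2D
cnn = Sequential()
cnn.add(Conv2D(32, (3, 3), input_shape=(75, 75, 3)))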

Convolutional layer

In the first layers, the convolutional neural network will look for lower-level features, like horizontal or vertical edges. The deeper we go into the network, the higher-level the features it looks for, such as the wing of a butterfly. But how does it detect features when it gets only numbers as input? That’s where filters come in.

Filters (or kernels)

You can think of a filter as a searchlight of a specific size that scans over the image. The filter example below has dimensions of 3x3x3 and contains weights that will detect a vertical edge. For a grayscale image, the dimensions would have been 3x3x1. Usually, a filter has smaller dimensions than the image we want to classify. 3×3, 5×5 or 7×7 are typically used. The third dimension should always be equal to the number of channels.

While the filter scans the image, it multiplies the RGB values under it by its weights at each position. The products are then summed over the 3x3 window and over all channels into a single number. In our 7x7x3 image example with the 3x3x3 filter, this results in a 5x5x1 outcome.

The animation below illustrates this convolutional operation. For simplicity, we only look for a vertical edge in the Red channel. Thus, the weights for the Green and Blue channels are all equal to zero. But you should keep in mind that the multiplication results for these channels are added to the result of the Red channel.
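The operation in the animation can be reproduced with a few lines of NumPy. The sketch below is a naive loop, assuming no padding and a stride of 1; like CNN libraries, it does not flip the kernel. Only the Red channel has non-zero weights, and the toy image is bright on the left and dark on the right, so the large outputs appear where the window straddles that vertical edge. conv2d_single is a hypothetical name.

```python
import numpy as np

def conv2d_single(image, kernel):
    """'Valid' convolution with stride 1 and no padding.
    image: (H, W, C); kernel: (k, k, C). Returns an (H-k+1, W-k+1) map."""
    k = kernel.shape[0]
    out_h, out_w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = image[r:r + k, c:c + k, :]
            out[r, c] = np.sum(window * kernel)  # multiply, then sum over rows, columns and channels
    return out

# 7x7x3 toy image: the left three columns are bright in the Red channel
image = np.zeros((7, 7, 3))
image[:, :3, 0] = 255

# Vertical-edge detector on Red; the Green and Blue weights are all zero
kernel = np.zeros((3, 3, 3))
kernel[:, :, 0] = [[1, 0, -1]] * 3

feature_map = conv2d_single(image, kernel)
print(feature_map.shape)  # (5, 5)
print(feature_map[0])     # large values only where the window straddles the edge
```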

As shown below, the convolutional layer produces numerical outcomes. Higher numbers mean that the filter came across the feature it was looking for, in our example a vertical edge.

We can use more than one filter, and each filter can look for its own feature in an image. Suppose we use 32 filters of size 3x3x3. The results of all filters are stacked, and we end up with a 5x5x32 volume in our example. In the code snippet above we added 32 filters of size 3x3x3.

Stride

In the example above, the filter moves one pixel at a time. This step size is called the stride. We could increase the number of pixels the filter moves at each step. A larger stride reduces the output dimensions much faster. In the example below, you see how the filter moves with a stride of 2, which results in a 3x3x1 outcome for a 3x3x3 filter and a 7x7x3 image.

Padding

By applying a filter, the dimensions of the original image are quickly reduced. Moreover, the pixels at the edges of the image take part in fewer convolutional operations than the pixels in the center; a corner pixel is used only once. This results in a loss of information. If you want to avoid that, you can specify padding. Padding adds “extra pixels” (usually zeros) around the image.

Suppose we add padding of one pixel around the 7x7x3 image. This results in a 9x9x3 image. If we apply a 3x3x3 filter and a stride of 1, we end up with a 7x7x1 outcome. So, in that case, we preserve the dimensions of the original image and the outer pixels are used more than once.

You can calculate the resulting outcome of the convolutional operation with specific padding and stride as follows:

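$$\text{output dimension} = 1 + \left\lfloor \frac{n + 2p - f}{s} \right\rfloor$$

where $n$ is the original dimension, $p$ the padding, $f$ the filter dimension and $s$ the stride size; $\lfloor \cdot \rfloor$ rounds down to the nearest integer.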

For example, suppose we have this set-up of our conv layer:

7x7x3 image
3x3x3 filter
padding of 1 pixel
stride of 2 pixels

That will give $1 + \left\lfloor \frac{7 + 2 \cdot 1 - 3}{2} \right\rfloor = 1 + 3 = 4$.
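The formula translates directly into a small Python function, where integer floor division (//) plays the role of rounding down. conv_output_size is a hypothetical name; the calls reproduce the set-ups discussed in this section.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output width/height for input size n, filter size f, padding p and stride s."""
    return 1 + (n + 2 * p - f) // s

print(conv_output_size(7, 3))            # 5 (no padding, stride 1)
print(conv_output_size(7, 3, s=2))       # 3 (no padding, stride 2)
print(conv_output_size(7, 3, p=1))       # 7 (padding of 1 keeps the size)
print(conv_output_size(7, 3, p=1, s=2))  # 4 (the worked example)
```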

Why do we need convolutional layers?

A benefit of using conv layers is that the number of parameters to estimate is much lower than with a normal hidden layer. Suppose we continue with our example image of 7x7x3 and a filter of 3x3x3 with no padding and a stride of 1. The convolutional layer would have 3x3x3 weights + 1 bias = 28 parameters to estimate. In a neural network with 7x7x3 = 147 inputs and 5x5x1 = 25 neurons in the hidden layer, we would need to estimate 147 x 25 = 3,675 weights (plus 25 biases). Imagine what this number is when you have larger images…
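To check the parameter counts for this example, here is a short sketch with two hypothetical helpers, conv_params and dense_params. Note that the conv count depends only on the filter size, the number of input channels and the number of filters, not on the image size.

```python
def conv_params(filter_size, in_channels, n_filters):
    """Each filter has filter_size * filter_size * in_channels weights plus one bias."""
    return (filter_size * filter_size * in_channels + 1) * n_filters

def dense_params(n_inputs, n_neurons):
    """Every input connects to every neuron, plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

print(conv_params(3, 3, 1))             # 28: 27 weights + 1 bias, whatever the image size
print(dense_params(7 * 7 * 3, 5 * 5))   # 3700: 147 * 25 = 3,675 weights + 25 biases
```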

ReLU layer

Short for Rectified Linear Unit layer. This layer adds nonlinearity to the network. The convolutional layer is a linear operation, as it sums up the products of the filter weights and the RGB values.

The output of a ReLU function is equal to zero for all values of x <= 0; otherwise, it is equal to x. The code in Keras to add a ReLU layer is:

cnn.add(Activation('relu'))
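In NumPy, ReLU is a one-liner. This sketch applies it element-wise to a small placeholder feature map.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, applied element-wise: 0 where x <= 0, x elsewhere."""
    return np.maximum(0, x)

feature_map = np.array([[-3.0, 0.0],
                        [1.5, 4.0]])   # placeholder conv outputs
activated = relu(feature_map)          # negative values become 0, positive ones pass through
print(activated)
```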

Pooling

Pooling aggregates the input volume in order to reduce its dimensions further. Pooling itself has no weights to learn, but the smaller feature maps speed up computation and reduce the number of parameters in the layers that follow. Besides that, it helps to avoid overfitting by making the network more robust to small shifts in the input. Below we illustrate max pooling with a size of 2×2 and a stride of 2.
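A minimal NumPy sketch of 2×2 max pooling with a stride of 2 on a single 4x4 feature map (the values are made up). In a real network, pooling is applied to each channel separately.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over an H x W feature map, without padding."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()  # keep only the strongest activation in each window
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 8, 3]])
print(max_pool2d(fmap))
# [[6 2]
#  [2 8]]
```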

The code in Keras to add pooling with a size of 2×2 is (when no strides argument is given, Keras uses the pool size as the stride):

cnn.add(MaxPooling2D(pool_size=(2, 2)))

Fully connected layer

In the end, the convnet is able to detect higher-level features in the input images. These can then serve as input for a fully connected layer. Before we can do that, we flatten the output of the last convolutional block (in our model, the last pooling layer). Flattening means we convert it to a vector. The vector values are then connected to all neurons in the fully connected layer. To do that in Python we use the following Keras layers:

cnn.add(Flatten())        
cnn.add(Dense(64))
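The shapes involved in Flatten and Dense can be sketched in NumPy, assuming (for illustration only) that the last conv block outputs the 5x5x32 volume from the earlier example. The random weights stand in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((5, 5, 32))   # placeholder output of the last conv block

flat = feature_maps.reshape(-1)          # Flatten: 5 * 5 * 32 = 800 values
weights = rng.standard_normal((flat.size, 64)) * 0.01  # placeholder Dense(64) weight matrix
bias = np.zeros(64)
dense_out = flat @ weights + bias        # every input value feeds every one of the 64 neurons

print(flat.shape, dense_out.shape)       # (800,) (64,)
```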

Dropout

Just like pooling, dropout can help to avoid overfitting. During training, it randomly sets a specified fraction of the inputs to zero. A dropout rate between 20% and 50% is generally considered to work well.

cnn.add(Dropout(0.2))
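A rough NumPy sketch of what dropout does during training. It follows the "inverted dropout" convention described in the Keras Dropout documentation: kept inputs are scaled by 1 / (1 - rate), and at inference time inputs pass through unchanged. The dropout helper is hypothetical, not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of the inputs during training and scale
    the survivors by 1 / (1 - rate) so the expected sum stays the same."""
    if not training or rate == 0:
        return x  # at inference time dropout does nothing
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(42)
x = np.ones(10_000)
y = dropout(x, 0.2, rng)
print((y == 0).mean())  # close to 0.2
```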

Sigmoid activation

Because we want to produce a probability that the image is one of two butterfly species (i.e. binary classification), we can use a sigmoid activation layer.

cnn.add(Activation('relu'))
cnn.add(Dense(1))
cnn.add(Activation( 'sigmoid'))
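The sigmoid function itself is simple. This sketch shows how it maps raw scores to probabilities and how a 0.5 threshold (a common default, not something the article prescribes) turns them into a binary prediction.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-4.0, 0.0, 2.0):
    p = sigmoid(z)
    # p is read as P(class 1) and 1 - p as P(class 0); 0.5 is an adjustable cut-off.
    print(f"score={z:+.1f}  p={p:.3f}  predicted class={int(p >= 0.5)}")
```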

Applying the convolutional neural network on the butterfly images

Now we can define the complete convolutional neural network structure as displayed at the beginning of this post. First, we need to import the necessary Keras modules. Then we can start adding the layers that we explained above.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator
import time
IMG_SIZE = 128        # example value: replace with the size of your images
NB_CHANNELS = 3       # 3 for RGB images or 1 for grayscale images
BATCH_SIZE = 16       # typical values are 8, 16 or 32
NB_TRAIN_IMG = 1000   # placeholder: replace with the total number of training images
NB_VALID_IMG = 200    # placeholder: replace with the total number of validation images

I made some additional parameters explicit for the conv layers. Here is a short explanation:

kernel_size specifies the filter size. So for the first conv layer, this is 2×2.
padding='same' applies zero padding so that, with a stride of 1, the output has the same height and width as the input.
padding='valid' means we do not apply any padding.
data_format='channels_last' specifies that the number of color channels comes last in the input_shape argument.

cnn = Sequential()
cnn.add(Conv2D(filters=32, 
               kernel_size=(2,2), 
               strides=(1,1),
               padding='same',
               input_shape=(IMG_SIZE,IMG_SIZE,NB_CHANNELS),
               data_format='channels_last'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2),
                     strides=2))
cnn.add(Conv2D(filters=64,
               kernel_size=(2,2),
               strides=(1,1),
               padding='valid'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2),
                     strides=2))
cnn.add(Flatten())        
cnn.add(Dense(64))
cnn.add(Activation('relu'))
cnn.add(Dropout(0.25))
cnn.add(Dense(1))
cnn.add(Activation('sigmoid'))
cnn.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Finally, we compile this network structure. We set the loss parameter to binary_crossentropy, which suits binary targets, use the rmsprop optimizer, and use accuracy as the evaluation metric.
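To see why the Flatten and Dense part dominates the model size, here is a pure-Python sketch that walks through the output sizes and parameter counts of the network above. IMG_SIZE = 64 and NB_CHANNELS = 3 are hypothetical values chosen for illustration; cnn.summary() should report the same figures for that input size.

```python
def out_size(n, f, stride=1, padding="valid"):
    """Spatial output size of a Keras Conv2D or MaxPooling2D layer."""
    if padding == "same":
        return -(-n // stride)           # ceil(n / stride)
    return (n - f) // stride + 1          # floor((n - f) / stride) + 1

def conv_params(f, c_in, n_filters):
    return (f * f * c_in + 1) * n_filters  # weights + one bias per filter

IMG_SIZE, NB_CHANNELS = 64, 3   # hypothetical values, for illustration only

n1 = out_size(IMG_SIZE, 2, padding="same")   # Conv2D(32, 2x2, 'same'):  64
p1 = out_size(n1, 2, stride=2)               # MaxPooling2D(2x2, 2):     32
n2 = out_size(p1, 2)                         # Conv2D(64, 2x2, 'valid'): 31
p2 = out_size(n2, 2, stride=2)               # MaxPooling2D(2x2, 2):     15
flat = p2 * p2 * 64                          # Flatten:               14400

total = (conv_params(2, NB_CHANNELS, 32)     # 416
         + conv_params(2, 32, 64)            # 8,256
         + flat * 64 + 64                    # Dense(64): 921,664
         + 64 * 1 + 1)                       # Dense(1): 65
print(flat, total)  # 14400 930401
```

Activation, pooling, Flatten and Dropout layers add no parameters, so almost all of the roughly 930,000 weights sit in the first Dense layer.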

After having specified the network structure, we create the generators for the training and validation samples. On the training samples, we apply data augmentation as explained above. On the validation samples, we do not apply any augmentation as they are just used to evaluate the model performance.

train_datagen = ImageDataGenerator(
    rotation_range = 40,                  
    width_shift_range = 0.2,                  
    height_shift_range = 0.2,                  
    rescale = 1./255,                  
    shear_range = 0.2,                  
    zoom_range = 0.2,                     
    horizontal_flip = True)
validation_datagen = ImageDataGenerator(rescale = 1./255)
train_generator = train_datagen.flow_from_directory(
    '../flickr/img/train',
    target_size=(IMG_SIZE,IMG_SIZE),
    class_mode='binary',
    batch_size = BATCH_SIZE)
validation_generator = validation_datagen.flow_from_directory(
    '../flickr/img/validation',
    target_size=(IMG_SIZE,IMG_SIZE),
    class_mode='binary',
    batch_size = BATCH_SIZE)

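With the flow_from_directory method on the generators, we can easily go through all the images in the specified directories. The method expects one subdirectory per class and infers each image's label from the name of its subdirectory.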
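Because NB_TRAIN_IMG and NB_VALID_IMG have to match what is on disk, a small helper can count the images. This is a sketch under assumptions: count_images is a hypothetical name, it only looks at files directly inside each class folder, and the extension list approximates the formats Keras picks up.

```python
from pathlib import Path

# Assumption: roughly the image formats Keras's flow_from_directory recognizes.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff"}

def count_images(root):
    """Count image files directly inside each class folder: root/<class_name>/<image>."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts

# Hypothetical usage with the directories from the article:
# NB_TRAIN_IMG = sum(count_images('../flickr/img/train').values())
# NB_VALID_IMG = sum(count_images('../flickr/img/validation').values())
```

flow_from_directory assigns class indices in alphanumeric order of the folder names, so train_generator.class_indices tells you which species the sigmoid output treats as class 1.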

Lastly, we can fit the convolutional neural network on the training data and evaluate with the validation data. The resulting weights of the model can be saved and reused later on.

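start = time.time()
# fit_generator is deprecated in TensorFlow 2.x; there, cnn.fit(train_generator, ...)
# accepts the same steps_per_epoch, epochs, validation_data and validation_steps arguments.
cnn.fit_generator(
    train_generator,
    steps_per_epoch=NB_TRAIN_IMG//BATCH_SIZE,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=NB_VALID_IMG//BATCH_SIZE)
end = time.time()
print('Processing time (minutes):', (end - start) / 60)
cnn.save_weights('cnn_baseline.h5')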

The number of epochs is arbitrarily set to 50. An epoch is one full pass over the training data: for each batch, the network performs forward propagation, checks the error, and then adjusts the weights during backpropagation.

The steps_per_epoch parameter is set to the number of training images divided by the batch size (the double slash is floor division, which makes sure the result is an integer and not a float). Specifying a batch size greater than 1 speeds up the process. The same applies to the validation_steps parameter.
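A quick illustration of / versus // with hypothetical numbers (1,000 training images, batch size 16). math.ceil is an alternative if you want each epoch to include the last partial batch.

```python
import math

NB_TRAIN_IMG, BATCH_SIZE = 1000, 16   # hypothetical values

print(NB_TRAIN_IMG / BATCH_SIZE)                    # 62.5: true division returns a float
steps_floor = NB_TRAIN_IMG // BATCH_SIZE            # 62: floor division drops the partial batch
steps_ceil = math.ceil(NB_TRAIN_IMG / BATCH_SIZE)   # 63: also covers the last 8 images
print(steps_floor, steps_ceil)
```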

Results

After running 50 epochs, we have a training accuracy of 0.8091 and a validation accuracy of 0.7359. So the convolutional neural network still suffers from a fair amount of overfitting. We also see that the validation accuracy varies quite a lot, because we have a small set of validation samples. It would be better to do k-fold cross-validation for each evaluation round, but that would take quite some time.

To address the overfitting we could:

increase the dropout rate
apply dropout at each layer
find more training data

We’ll look into the first two options and monitor the result. The results of our first model will serve as a baseline. After applying an extra dropout layer and increasing the dropout rates, the model is a bit less overfitted.

I hope you’ve all enjoyed reading this post and learned something new. The full code is available on Github. Cheers!

How to install TensorFlow and Keras using Anaconda Navigator — without the command line

freeCodeCamp — Wed, 24 Jul 2019 10:15:00 +0000

By Ekapope Viriyakovithya

Say no to pip install in the command line! Here's an alternative way to install TensorFlow on your local machine in 3 steps.

Why am I writing this?

I played around with pip install in multiple configurations for several hours, trying to figure out how to properly set up my Python environment for TensorFlow and Keras.

why is tensorflow so hard to install — 600k+ results

unable to install tensorflow on windows site:stackoverflow.com — 26k+ results

Just before I gave up, I found this…

“One key benefit of installing TensorFlow using conda rather than pip is a result of the conda package management system. When TensorFlow is installed using conda, conda installs all the necessary and compatible dependencies for the packages as well.”

This article will walk you through the process of installing TensorFlow and Keras using the GUI version of Anaconda. I assume you have already downloaded and installed Anaconda Navigator.

Let’s get started!

Launch Anaconda Navigator. Go to the Environments tab and click ‘Create’.


Enter a new environment name (I used ‘tensorflow_env’). Make sure to select Python 3.6 here! Then click ‘Create’; this may take a few minutes.


In your new ‘tensorflow_env’ environment, select ‘Not installed’ and type in ‘tensorflow’. Then tick ‘tensorflow’ and click ‘Apply’. A pop-up window will appear; go ahead and apply. This may take several minutes.

Do the same for ‘keras’.

Check your installation by importing the packages. If everything is okay, the import returns nothing and no error pops up. If the installation was unsuccessful, you will get an error.

You can also try the same imports in Spyder.
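If you prefer to check from code, a sketch like the following (run with the ‘tensorflow_env’ interpreter, for example in Spyder launched from that environment) reports whether Python can find the packages without importing them. check_packages is a hypothetical helper, and find_spec only confirms that a package is installed, so an actual import is still the definitive test.

```python
import importlib.util

def check_packages(names=("tensorflow", "keras")):
    """Map each package name to True if the current interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

print(check_packages())
```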

And…Ta-da! It’s done! You can follow this article to test your newly installed packages :)

Thank you for reading. Please give it a try, and let me know your feedback!

Consider following me on GitHub, Medium, and Twitter to get more articles and tutorials on your feed if you like what I did. :)

Object Detection in Colab with Fizyr Retinanet

freeCodeCamp — Thu, 04 Apr 2019 17:56:59 +0000

By RomRoc

Let’s continue our journey to explore the best machine learning frameworks in computer vision.

In the first article we explored object detection with the official Tensorflow APIs. The second article was dedicated to an excellent framework for instance segmentation, Matterport Mask R-CNN based on Keras.

In this article we examine the Keras implementation of RetinaNet object detection developed by Fizyr. RetinaNet, as described in Focal Loss for Dense Object Detection, is a state-of-the-art object detection model.
The object to detect with the trained model will be my little goat Rosa.

Object detection with Fizyr

The colab notebook and dataset are available in my Github repo.

In this article, we go through all the steps in a single Google Colab notebook to train a model starting from a custom dataset.

We will keep in mind these principles:

illustrate how to make the annotation dataset
describe all the steps in a single Notebook
use free software, Google Colab and Google Drive, so it’s based exclusively on free cloud resources

At the end of the article you will be surprised by the simplicity of use and the good results we will obtain through this object detection framework.

Besides being easy to use, Fizyr is a powerful framework; it was also used by the winner of the Kaggle competition “RSNA Pneumonia Detection Challenge”.

Making the dataset

We start by creating annotations for the training and validation datasets, using the tool LabelImg. This excellent annotation tool lets you quickly annotate the bounding boxes of the objects to train the machine learning model.

LabelImg annotation tool

LabelImg creates annotations in Pascal VOC format, so we need to convert them to the Fizyr format:

Create a zip file containing the training images and their annotations, with each image and its annotation sharing the same filename (check my example dataset on GitHub)

objdet_dataset.zip
|- img1.jpg
|- img1.xml
|- img2.jpg
|- img2.xml
...

Upload the zip file to Google Drive, get the Drive file id, and use it as the DATASET_DRIVEID value
Run the cell that iterates over the XML files and creates the annotations.csv file

Note: you can see my answer on Stackoverflow to get the Drive file id.
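For readers who want to see what the conversion involves, here is a sketch of a Pascal VOC to CSV converter. It is not the notebook's own cell: it assumes the CSV layout described in the keras-retinanet README (annotations as path,x1,y1,x2,y2,class_name, with empty fields for an image without objects, and classes as class_name,id), and the helper names are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET

def voc_to_rows(xml_text, image_path):
    """Turn one Pascal VOC annotation (as written by LabelImg) into
    keras-retinanet CSV rows of the form [path, x1, y1, x2, y2, class_name]."""
    root = ET.fromstring(xml_text)
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(float(box.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        rows.append([image_path, *coords, obj.find("name").text])
    if not rows:
        rows.append([image_path, "", "", "", "", ""])  # image without any object
    return rows

def write_csvs(rows, annotations_path="annotations.csv", classes_path="classes.csv"):
    """Write the annotations file plus a classes file mapping each class name to an id."""
    with open(annotations_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    class_names = sorted({row[5] for row in rows if row[5]})
    with open(classes_path, "w", newline="") as f:
        csv.writer(f).writerows([name, idx] for idx, name in enumerate(class_names))

# Hypothetical usage after unzipping objdet_dataset.zip into ./dataset:
# from pathlib import Path
# rows = [row for xml_file in sorted(Path("dataset").glob("*.xml"))
#         for row in voc_to_rows(xml_file.read_text(), str(xml_file.with_suffix(".jpg")))]
# write_csvs(rows)
```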

Model training

Model training is the core of the notebook. Fizyr offers various parameters, described in Github, to run and optimize this step.

It’s a good idea to start from a pretrained model instead of training a model from scratch. Fizyr released a model based on the ResNet50 architecture, pretrained on the COCO dataset.

URL_MODEL = 'https://github.com/fizyr/keras-retinanet/releases/download/0.5.0/resnet50_coco_best_v2.1.0.h5'

We can also use our own pretrained model and continue training from it. This is particularly useful to train for some epochs, save the model to Google Drive, and later restart training from the saved model. In this way we can work around the 12-hour execution limit in Colab and train the model for many epochs.

From my tests, higher values of batch_size and steps give better results, but they greatly increase the execution time of each epoch.

Tensorboard training charts

We can start training from our custom dataset with:

!keras_retinanet/bin/train.py --freeze-backbone --random-transform --weights {PRETRAINED_MODEL} --batch-size 8 --steps 500 --epochs 10 csv annotations.csv classes.csv

Let’s analyze each argument passed to the script train.py.

freeze-backbone: freeze the backbone layers, particularly useful when we use a small dataset, to avoid overfitting
random-transform: randomly transform the dataset to get data augmentation
weights: initialize the model with a pretrained model (your own model or one released by Fizyr)
batch-size: training batch size; a higher value gives a smoother learning curve
steps: number of steps per epoch
epochs: number of epochs to train
csv: the dataset type, followed by the annotations file generated by the script above and the classes file

The training process output contains a description of the layers and the loss metrics during training, and as you can see, the loss metrics decrease with each epoch:

Using TensorFlow backend.
...
Layer (type)                    Output Shape         Param #     Connected to
input_1 (InputLayer)            (None, None, None, 3 0
padding_conv1 (ZeroPadding2D)   (None, None, None, 3 0           input_1[0][0]
...
Total params: 36,382,957
Trainable params: 12,821,805
Non-trainable params: 23,561,152
None
Epoch 1/10
500/500 [==============================] - 1314s 3s/step - loss: 1.0659 - regression_loss: 0.6996 - classification_loss: 0.3663
Epoch 2/10
500/500 [==============================] - 1296s 3s/step - loss: 0.6747 - regression_loss: 0.5698 - classification_loss: 0.1048
Epoch 3/10
500/500 [==============================] - 1304s 3s/step - loss: 0.5763 - regression_loss: 0.5010 - classification_loss: 0.0753

Epoch 3/10
500/500 [==============================] - 1257s 3s/step - loss: 0.5705 - regression_loss: 0.4974 - classification_loss: 0.0732

Inference

The last step runs inference on test images with the trained model.
The Fizyr framework allows us to perform inference on a CPU, even if the model was trained on a GPU. This feature is important in typical production environments, where people usually opt for less expensive hardware without GPUs for inference.

Let’s examine the following lines in detail:

import os
from keras_retinanet import models

model_path = os.path.join('snapshots', sorted(os.listdir('snapshots'), reverse=True)[0])
print(model_path)

# load retinanet model
model = models.load_model(model_path, backbone_name='resnet50')
model = models.convert_model(model)

The first line sets model_path to the latest model saved by the training process in the snapshots directory (the last file name in sorted order). Then the model is loaded from the filesystem and converted into an inference model.
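Picking the last file name in reverse-sorted order works as long as the snapshot names sort in training order. A hypothetical alternative that does not rely on naming is to take the most recently modified file:

```python
import os

def latest_snapshot(directory="snapshots", suffix=".h5"):
    """Return the most recently modified model file in `directory` (hypothetical helper)."""
    candidates = [os.path.join(directory, name)
                  for name in os.listdir(directory) if name.endswith(suffix)]
    if not candidates:
        raise FileNotFoundError(f"no {suffix} files found in {directory!r}")
    return max(candidates, key=os.path.getmtime)

# Possible drop-in replacement for the first line above:
# model_path = latest_snapshot('snapshots')
```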

You can change the value of THRES_SCORE, the confidence threshold a detection must reach to be shown.
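As a sketch of what such a threshold does, assume the inference model returns, for one image, arrays of boxes (N x 4), scores (N) and labels (N), as in the keras-retinanet examples. The filter_detections helper and the 0.5 value are hypothetical:

```python
import numpy as np

THRES_SCORE = 0.5  # hypothetical value; the notebook's own setting may differ

def filter_detections(boxes, scores, labels, threshold=THRES_SCORE):
    """Keep only detections whose confidence score reaches the threshold.
    boxes: (N, 4) array of x1, y1, x2, y2; scores and labels: (N,) arrays."""
    keep = scores >= threshold
    return boxes[keep], scores[keep], labels[keep]

boxes = np.array([[10, 20, 110, 220], [50, 60, 90, 100], [0, 0, 30, 30]], dtype=float)
scores = np.array([0.92, 0.41, 0.67])
labels = np.array([0, 0, 0])
kept_boxes, kept_scores, kept_labels = filter_detections(boxes, scores, labels)
print(kept_scores)  # [0.92 0.67]
```

Raising the threshold shows fewer but more confident detections; lowering it catches more objects at the cost of more false positives.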

Object detection inference

Conclusions

We went through the complete journey of object detection with Fizyr's implementation of RetinaNet. We created a dataset, trained a model, and ran inference (here is my GitHub repo for the notebook and dataset).

I was impressed by the following aspects of this excellent framework:

this framework is easy to use to get good inference, even without much customization
it was simple to transform annotations to Fizyr’s dataset format, compared to other frameworks.

In general Fizyr is a good choice to start an object detection project, in particular if you need to quickly get good results.

If you enjoyed this article, leave a few claps, it will encourage me to explore further machine learning opportunities :)

How to set up NSFW content detection with Machine Learning

freeCodeCamp — Wed, 20 Mar 2019 16:01:08 +0000

By Gant Laborde

Teaching a machine to recognize indecent content wasn’t difficult in retrospect, but it sure was tough the first time through.

Here are some lessons learned, and some tips and tricks I uncovered while building an NSFW model.

Though there are lots of ways this could have been implemented, the hope of this post is to provide a friendly narrative so that others can understand what this process can look like.

If you’re new to ML, this will inspire you to train a model. If you’re familiar with it, I’d love to hear how you would have gone about building this model and ask you to share your code.

The Plan:

Get lots and lots of data
Label and clean the data
Use Keras and transfer learning
Refine your model

Get lots and lots of data

Fortunately, a really cool set of scraping scripts was released for an NSFW dataset. The code is simple and already comes with labeled data categories. This means that just accepting this data scraper’s defaults will give us 5 categories pulled from hundreds of subreddits.

The instructions are quite simple: you just run the 6 friendly scripts. Pay attention to them, as you may decide to change things up.

If you have more subreddits that you’d like to add, you should edit the source URLs before running step 1.

For example, if you were to add a new source of neutral examples, you’d add to the subreddit list in nsfw_data_scraper/scripts/source_urls/neutral.txt.

Reddit is a great source of content from around the web, since most subreddits are loosely moderated by humans to stay on topic.

Label and clean the data

The data we got from the NSFW data scraper is already labeled! But expect some errors, especially since Reddit isn’t perfectly curated.

Duplication is also quite common, but fixable without slow human comparison.

The first thing I like to run is duplicate-file-finder, a fast tool that finds exact file matches and deletes them. It’s written in Python.

Qarj/duplicate-file-finder

I can generally get a majority of duplicates knocked out with this command.

python dff.py --path train/path --delete
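The idea behind exact-duplicate removal can be sketched in a few lines: files with identical contents have identical cryptographic hashes. This illustrates the technique, not how dff.py is implemented, and find_exact_duplicates is a hypothetical helper.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_exact_duplicates(root):
    """Group files under `root` by the SHA-256 of their contents; groups larger than 1 are duplicates."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]

# Hypothetical usage: keep the first file of each group and delete the rest.
# for group in find_exact_duplicates("train/path"):
#     for duplicate in group[1:]:
#         duplicate.unlink()
```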

Now, this doesn’t catch images that are “essentially” the same. For that, I advocate using a Macpaw tool called “Gemini 2”.

While this looks super simple, don’t forget to dig into the automatic duplicates, and select ALL the duplicates until your Gemini screen declares “Nothing Remaining” like so:

It’s safe to say this can take an extreme amount of time if you have a huge dataset. Personally, I ran it on each class folder before running it on the parent folder, to keep runtimes reasonable.
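The source does not describe how Gemini 2 detects similar images. One common technique for near-duplicates, swapped in here purely for illustration, is perceptual hashing; the sketch below implements a simple average hash on grayscale NumPy arrays. Loading real images (for example with Pillow) and choosing a Hamming-distance cut-off are left as assumptions.

```python
import numpy as np

def average_hash(gray, hash_size=8):
    """Average hash of a 2-D grayscale array (both sides at least hash_size pixels):
    shrink to hash_size x hash_size by block averaging, then mark cells above the mean."""
    h, w = gray.shape
    gray = gray[: h - h % hash_size, : w - w % hash_size]  # crop so blocks divide evenly
    bh, bw = gray.shape[0] // hash_size, gray.shape[1] // hash_size
    small = gray.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def hamming(a, b):
    """Number of differing bits between two hashes; small values suggest near-duplicates."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
img = rng.random((64, 64))        # placeholder grayscale image
edited = img * 0.9 + 0.05         # same image with slightly changed brightness and contrast
other = rng.random((64, 64))      # unrelated placeholder image
print(hamming(average_hash(img), average_hash(edited)))  # small: near-duplicate
print(hamming(average_hash(img), average_hash(other)))   # large: different image
```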

Use Keras and transfer learning

I’ve looked at TensorFlow, PyTorch, and raw Python as ways to build a machine learning model from scratch. But I wasn’t looking to discover something new; I wanted to effectively do something that already exists. So I went pragmatic.

I found Keras to be the most practical API for writing a simple model. Even Tensorflow agrees and is currently working to be more Keras-like. Also, with only one graphics card, I’m going to grab a popular pre-existing model + weights, and simply train on top of it with some transfer learning.

After a little research, I chose Inception v3 with weights pretrained on ImageNet. To me, that's like going to the pre-existing ML store and buying the Aston Martin. We’ll just shave off the top layer so we can adapt the model to our needs.

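from keras.applications import InceptionV3

# height, width, num_channels: your input image size (e.g. 299, 299, 3)
conv_base = InceptionV3(
  weights='imagenet',
  include_top=False,
  input_shape=(height, width, num_channels)
)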

With the model in place, I added 3 more layers: a hidden layer of 256 neurons, followed by a hidden layer of 128 neurons, followed by a final layer of 5 neurons. The final layer uses softmax to produce the classification into the five classes.

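from keras import initializers, regularizers
from keras.layers import Dense, Dropout, GlobalAveragePooling2D

# Assumption: the original snippet does not show how x is built from conv_base;
# global average pooling of its feature maps is one common choice.
x = GlobalAveragePooling2D()(conv_base.output)
# Add 256
x = Dense(256, activation='relu', kernel_initializer=initializers.he_normal(seed=None), kernel_regularizer=regularizers.l2(.0005))(x)
x = Dropout(0.5)(x)
# Add 128
x = Dense(128, activation='relu', kernel_initializer=initializers.he_normal(seed=None))(x)
x = Dropout(0.25)(x)
# Add 5
predictions = Dense(5, kernel_initializer="glorot_uniform", activation='softmax')(x)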

Visually this code turns into this:

Some of the above might seem odd. After all, it’s not every day you say “glorot_uniform”. But strange words aside, my new hidden layers are being regularized to prevent overfitting.

I’m using dropout, which will randomly remove neural pathways so no one feature dominates the model.

Too soon?

Additionally, I’ve added L2 regularization to the first layer as well.

Now that the model is done, I augmented my dataset with some randomly generated variations. I rotated, shifted, cropped, sheared, zoomed, flipped, and channel-shifted my training images. This helps the model cope with common noise and variation in images.

All the above systems are meant to prevent overfitting the model on the training data. Even if it is a ton of data, I want to keep the model as generalizable to new data as possible.

I gotchu model!

After running this for a long time, I got around 87% accuracy on the model! That’s a pretty good version one! Let’s make it great.

Refine your model

Basic fine-tuning

Once the new layers are trained, you can unlock some deeper layers in your Inception model for retraining. The following code unlocks every layer from conv2d_56 onward.

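# Freeze every layer before 'conv2d_56'; unfreeze 'conv2d_56' and all layers after it.
# Auto-generated names like 'conv2d_56' can vary, so check conv_base.summary() first,
# and recompile the model afterwards so the new trainable flags take effect.
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'conv2d_56':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False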

I ran the model for a long time with these newly unlocked layers, and once I added exponential decay (via a scheduled learning rate), the model converged to 91% accuracy on my test data!
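The article does not give the schedule's parameters, so this is only a sketch of exponential decay with hypothetical values; the formula matches the one documented for tf.keras.optimizers.schedules.ExponentialDecay.

```python
def exponential_decay(step, initial_lr=1e-3, decay_rate=0.9, decay_steps=1000, staircase=False):
    """Learning rate after `step` training steps: initial_lr * decay_rate ** (step / decay_steps).
    With staircase=True the exponent is an integer, so the rate drops in discrete jumps."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent

for step in (0, 1000, 5000, 20000):
    print(step, exponential_decay(step))

# Roughly equivalent built-in schedule in tf.keras (same hypothetical values;
# the optimizer choice is also hypothetical):
# lr = tf.keras.optimizers.schedules.ExponentialDecay(
#     initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
# optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
```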

With 300,000 images, finding mistakes in the training data by hand was impossible. But with a model with only 9% error, I could break down the errors by category, and then I only had to look at around 5,400 images per category! Essentially, I could use the model to help me find misclassifications and clean the dataset!

Technically, this would only find false negatives and would do nothing about bias in the false positives. But for something that detects NSFW content, I imagine recall is more important than precision.
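The error-driven cleanup and the recall-versus-precision remark can be sketched with plain Python. The labels below are a hypothetical toy example, not the article's categories or results.

```python
from collections import defaultdict

def misclassified_by_true_class(paths, y_true, y_pred):
    """Group misclassified samples by their (possibly wrong) label for manual review."""
    groups = defaultdict(list)
    for path, t, p in zip(paths, y_true, y_pred):
        if t != p:
            groups[t].append((path, p))
    return dict(groups)

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy labels
paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
y_true = ["nsfw", "nsfw", "neutral", "neutral", "nsfw"]
y_pred = ["nsfw", "neutral", "neutral", "nsfw", "nsfw"]
print(misclassified_by_true_class(paths, y_true, y_pred))
print(precision_recall(y_true, y_pred, positive="nsfw"))
```

Recall answers "how much NSFW content did we catch?", while precision answers "how much of what we flagged was really NSFW?"; favoring recall means accepting more false alarms.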

The most important part of refining

Even if you have a lot of test data, it’s usually pulled from the same well. The best test is to make it easy for others to use and check your model. This works best in open source and simple demos. I released http://nsfwjs.com so the community could help identify bias, and the community did just that!

The community found two interesting indicators of bias fairly quickly. The fun one was that Jeffrey Goldblum kept getting miscategorized, and the not-so-fun one was that the model was overly sensitive to images of women.

Once you start getting into hundreds of thousands of images, it’s hard for one person (like moi) to identify where an issue might be. Even if I looked through a thousand images in detail for bias, I wouldn’t have even scratched the surface of the dataset as a whole.

That’s why it’s important to speak up. Misclassifying Jeff Goldblum is an entertaining data point, but identifying, documenting, and filing a ticket with examples does something powerful and good. I was able to get to work on fixing the bias.

With new images, improved training, and better validation I was able to retrain the model over a few weeks and attain a much better outcome. The resulting model was far more accurate in the wild. Well, unless you laughed as hard as I did about the Jeff Goldblum issue.

If I could manufacture one flaw… I’d keep Jeff. But alas, we have hit 93% accuracy!

In Summary

It might have taken a lot of time, but it wasn’t hard, and it was fun to build a model. I suggest you grab the source code and try it for yourself! I’ll probably even attempt to retrain the model with other frameworks for comparison.

Show me what you’ve got. Contribute, or star/watch the repo if you’d like to see progress: https://github.com/GantMan/nsfw_model

Gant Laborde is Chief Technology Strategist at Infinite Red, a published author, adjunct professor, worldwide public speaker, and mad scientist in training. Clap/follow/tweet or visit him at a conference.

Have a minute? Check out a few more:

Avoid Nightmares: NSFW JS. Client-side indecent content checking for the soul (shift.infinite.red)
5 Things that Suck about Remote Work. The Pitfalls of Remote Work + Proposed Solutions (shift.infinite.red)