fastai - freeCodeCamp.org

How to Use Fast.ai – A Beginner-Friendly Gateway to Deep Learning

Manish Shivanandhan — Fri, 02 Feb 2024 00:10:29 +0000

Fast.ai is a user-friendly library that brings the power of deep learning to your fingertips, regardless of your skill level. Let’s learn how it works.

Have you ever felt curious about deep learning but found the technical complexity overwhelming? Fast.ai is your answer.

Fast.ai simplifies the journey into deep learning. It makes deep learning accessible to you even if you’re not a seasoned data scientist.

In this article, we’ll explore what Fast.ai is, why it stands out, and how you can get started with some basic code examples.

What is Fast.ai?

Fast.ai is a library built on top of PyTorch, one of the leading deep-learning frameworks.

It’s designed to make deep learning more approachable. The library provides high-level components that make it easy to build and train neural networks.

What sets Fast.ai apart is its focus on practicality and its ability to be used by people with varying levels of coding experience.

Why Choose Fast.ai?

User-Friendly

The Fast.ai library simplifies the deep learning process and abstracts away many of the complex details, making it easier for users to create powerful models.

The fastai library sits on top of popular deep-learning frameworks like PyTorch. It provides a high-level API for building and training neural networks.

You can also integrate other powerful models like Hugging Face transformers using Fast.ai.

Practical Approach

Fast.ai emphasizes a practical and hands-on approach to deep learning.

The Fast.ai library focuses on practical usage and real-world applications, helping you learn by doing.

Their courses and resources are designed to help students quickly get up and running with machine learning models. These include building and training neural networks for image recognition, natural language processing, and many others.

Free Courses

Fast.ai offers free online courses that cover a wide range of deep learning topics. Fast.ai courses are among the best available, and their students have gone on to become well-known machine learning researchers.

These courses are known for their practicality, clear explanations, and use of real-world datasets. These courses are designed to be accessible to individuals with varying levels of prior AI knowledge.

Fast.ai also incorporates the latest developments into its courses and resources, ensuring that students have access to state-of-the-art techniques.

How to Get Started with Fast.ai

Now that you understand what Fast.ai is, let's write some code. You can check out the Google Colab notebook if you want to quickly try this example.

Note: It is recommended that you run this code on your own system, since running it in Colab will take a long time (approximately 30 minutes).

Before using the library, you have to set up your environment. Fast.ai runs on Python and requires PyTorch.

You can install Fast.ai using the pip command. Remove the ! if you are installing it from your terminal: the ! prefix is only for notebooks such as Colab, which treat whatever follows it as a shell command.

!pip install fastai

We’ll go through a simple sentiment analysis example in this article, demonstrating how you can implement NLP models using the fast.ai library.

Let’s start with importing the library:

from fastai.text.all import *

This line of code imports specific functionality from the Fast.ai library for natural language processing (NLP), particularly text analysis.

Let me break it down for you:

from fastai.text.all import * imports every name exported by the fastai.text.all module, which bundles the tools and functions for working with text data.

By including this line at the beginning of your code, you make all the text-related functionality from the Fastai library available for your use, making it easier to perform tasks like sentiment analysis, text classification, and others.

Next, we’ll use the IMDB dataset, also available in Fast.ai.

path = untar_data(URLs.IMDB)

This line of code downloads and extracts the IMDB dataset, making it ready for further processing and analysis.

The variable path will contain the local file path to the dataset, allowing you to access and work with the data in your code.

Next, we have to load the data. Data loaders are used to efficiently load and process data during the training of a machine learning model.

TextDataLoaders is a class provided by the Fast.ai library that allows you to create data loaders specifically designed for text data.

dls = TextDataLoaders.from_folder(path, valid='test')

from_folder(path, valid='test') is a function call on the TextDataLoaders class. It is used to create the data loaders.

Here's what each argument means:

path: This is the directory path where your text data is stored. In this case, it's the path variable that you previously defined, which contains the local path to the IMDB dataset.
valid='test': This argument specifies which folder or subset of your data should be used for validation. In the IMDB dataset, there are typically two main subsets: train for training data and test for testing or validation data. By setting valid to test, you're indicating that the 'test' folder within the path directory should be used for validation. This is a common practice in machine learning to have a separate validation set to evaluate the model's performance during training.
The resulting dls variable will contain the text data loaders, which include both training and validation data splits. These data loaders can be used to load and preprocess text data batches during the training of your sentiment analysis model or any other text-based model.
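To make the folder convention concrete, here is a minimal sketch that uses only the standard library. It builds a tiny stand-in for the IMDB layout in a temporary directory (the file names and review text are made up for illustration) and reads both the split and the label from the folder names. It illustrates the idea only and is not how fastai implements from_folder.

```python
import tempfile
from pathlib import Path

# Build a tiny, made-up stand-in for the IMDB folder layout:
# <root>/<split>/<label>/<file>.txt
root = Path(tempfile.mkdtemp())
for split in ("train", "test"):
    for label in ("pos", "neg"):
        folder = root / split / label
        folder.mkdir(parents=True)
        (folder / "0.txt").write_text(f"a {label} review")

def describe(txt_file, root):
    """Return (split, label) taken from the two folder levels above the file."""
    rel = txt_file.relative_to(root)
    return rel.parts[0], rel.parts[1]

items = sorted(describe(f, root) for f in root.rglob("*.txt"))
# With valid='test', everything under test/ plays the role of the validation set.
valid_items = [item for item in items if item[0] == "test"]
print(items)
print(valid_items)
```

The same two pieces of information (which split a file belongs to, and what its label is) are what the data loaders need from the real dataset.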

Now that we have the data for training, let’s train the model.

We will create a text classification model using the Fast.ai library, fine-tune it on the provided text data, and train it for a specified number of epochs (repetitions).

learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

Let’s break down each line:

text_classifier_learner — The text classification learner is used to create a learner object for training and working with text classification models. Let’s look at the arguments.
dls — This is the data loader object you previously created using TextDataLoaders.from_folder(). It contains the training and validation data for your text classification task.
AWD_LSTM — This is a pre-defined architecture for the neural network used in text classification tasks. AWD_LSTM stands for ASGD Weight-Dropped LSTM. It is a type of recurrent neural network (RNN) architecture that is effective for sequential data like text.
drop_mult=0.5 — This argument controls the amount of dropout regularization applied to the neural network. Dropout is a regularization technique used to prevent overfitting (when the model memorizes the training data and performs poorly on new data). drop_mult=0.5 is a multiplier that scales all of the architecture's dropout rates by 0.5.
metrics=accuracy — This specifies that the accuracy metric should be used to evaluate the model’s performance during training. Accuracy is a common metric for classification tasks, measuring the percentage of correctly classified examples.
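As a rough illustration of what the accuracy metric measures, here is a plain-Python sketch. The labels are made up, and the function is a simplified stand-in rather than fastai's own accuracy implementation, which operates on tensors.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Made-up labels for illustration: 3 of the 4 predictions are right.
predicted = ["pos", "neg", "pos", "pos"]
actual = ["pos", "neg", "neg", "pos"]
print(accuracy(predicted, actual))  # 0.75
```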

Now let's fine-tune the model using the loaded data.

learn.fine_tune(1)

learn.fine_tune(1) — This line of code fine-tunes the text classification model.
1 — The parameter 1 is the number of epochs for which the model will be trained. An epoch is one pass through the entire training dataset. Training for multiple epochs lets the model learn from the data multiple times. Here, for simplicity’s sake, we use 1.

In summary, these lines of code load your text data, create a text classification model, fine-tune the model on the data for one epoch, and use accuracy as the metric to evaluate the model’s performance.

The resulting learn object represents your trained text classification model, which can be used to make predictions on new text data.

We are done. Now our model is ready to start predicting the sentiments of the text.

Let’s test the model with a movie review.

learn.predict("I really loved that movie, it was awesome!")

And here is the result.

('pos', tensor(1), tensor([0.4885, 0.5115]))

The pos says that the given sentence is positive. The second value, tensor(1), is the index of the predicted class, and the last array holds the probability the model assigns to each class, which tells you how confident the model is. Here the model is only about 51% confident. This confidence score can usually be improved by increasing the number of epochs (which will take a long time to train, unless you have a powerful computer).
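If you want to work with the result in code, you can unpack the three values. The sketch below uses plain Python values in place of tensors so that it runs without fastai installed. It assumes the probabilities are ordered so that index 1 corresponds to pos, which matches the output shown above.

```python
# Stand-in for the tuple returned above, with plain Python values
# in place of tensors so the snippet runs without fastai installed.
label, label_idx, probs = ("pos", 1, [0.4885, 0.5115])

confidence = probs[label_idx]  # probability assigned to the predicted class
print(f"{label}: {confidence:.2%}")  # pos: 51.15%
```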

I hope this helps you understand how to work with the Fast.ai library. I personally prefer to use Hugging Face for most use cases, but if I have to train models from scratch, Fast.ai would be my first choice.

Conclusion

Fast.ai offers a fantastic starting point for anyone interested in deep learning. Its simplicity and practicality make it a valuable tool for both beginners and experienced practitioners.

Using Fast.ai, you’ll discover that deep learning is not as daunting as it seems. Whether you’re a student, a developer, or a curious learner, Fast.ai can be your gateway to the fascinating world of artificial intelligence. So, get started, experiment, and enjoy the journey into deep learning with Fast.ai.

If you are a student of AI, subscribe to turingtalks.ai to learn practical concepts on general machine learning and NLP. You can also visit my website to get in touch with me.

How to Deploy a TensorFlow Model as a RESTful API Service

freeCodeCamp — Mon, 07 Mar 2022 14:58:44 +0000

By Neil Ruaro

If you're like I am, then you've probably watched and read a number of tutorials on creating machine learning models with TensorFlow, PyTorch, Scikit-Learn or any other framework out there.

But there is one thing that these tutorials tend to miss out on, and that's model deployment.

In this tutorial, I'll discuss how to deploy a CNN TensorFlow model that classifies food images to Heroku using FastAPI and Docker.

Tech We'll Be Using

If you're unfamiliar, FastAPI is a Python web framework for creating fast API applications. And in my opinion, it is the easiest to learn out of all the Python web frameworks out there.

FastAPI also has default integration with swagger documentation and makes it easy to configure and update.

Docker, on the other hand, is an industry staple in software engineering, as it is one of the most popular containerization tools out there. Docker is used for developing, deploying, and managing applications in isolated environments called containers.

The main selling point of using Docker is that it solves the problem "it works on my machine, why not on yours?". Coincidentally, I actually faced this exact issue working on this very project, ultimately fixing it when I decided to use Docker.

Heroku, lastly, is a cloud platform where you can deploy, manage, and scale web applications. It works with back-end applications, front-end applications, or full-stack applications.

Prerequisites

Before we begin, you'll first need the following:

A Docker account
A Heroku account, and the Heroku CLI
A Python installation

The Application We're Building

We're going to be building a RESTful API service for a TensorFlow CNN model that classifies food images.

After building the API service, I'll show you how to dockerize the application, and then deploy it to Heroku.

How to Download the Necessities

You'll first need to clone the GitHub repository at this link.

git clone https://github.com/eRuaro/food-vision-api.git

There are two branches in this repository – you'll use the start-here branch as main is the completed branch.

Once you've gotten the cloned repository, you'll need to download Docker to your local system, and the Heroku CLI as well.

You must also install the following packages on pip:

FastAPI
TensorFlow
Numpy
Uvicorn
Image

To do so, create a requirements.txt file on the start-here branch, and put in the following. Note that you can use any other version of the listed packages below, as long as they still work together.

fastapi==0.73.0
numpy==1.19.5
uvicorn==0.15.0
image==1.5.33
tensorflow-cpu==2.7.0

You can then install the packages using the command pip install -r requirements.txt.

Currently our start-here branch has the saved model file, as well as the Jupyter notebook used in creating the model. The notebook also has the code that implements our API feature. That is, it implements predicting the food class of an image based on its URL link.

Brief introduction to FastAPI

With that in mind, let's start writing the code! In the root directory, create a main.py file. In that file, add the following lines of code:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from uvicorn import run
import os

app = FastAPI()

origins = ["*"]
methods = ["*"]
headers = ["*"]

app.add_middleware(
    CORSMiddleware, 
    allow_origins = origins,
    allow_credentials = True,
    allow_methods = methods,
    allow_headers = headers    
)

@app.get("/")
async def root():
    return {"message": "Welcome to the Food Vision API!"}

if __name__ == "__main__":
    port = int(os.environ.get('PORT', 5000))
    run(app, host="0.0.0.0", port=port)

Running the command python -m uvicorn main:app --reload will run the app, and will listen to changes we make on the server.

Alternatively, you can use python main.py and it will run the app on port 5000, courtesy of the last 3 lines of code. However, this won't let the app listen to changes we make, so you'll have to re-run the app every time you want to see your changes.
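The port lookup in those last lines matters later on, because Heroku tells the app which port to bind to through the PORT environment variable. Here is a small sketch of that lookup in isolation. The helper name resolve_port is made up for illustration, and a plain dictionary stands in for the real environment.

```python
import os

def resolve_port(env=os.environ, default=5000):
    """Read PORT from the environment, falling back to a default."""
    return int(env.get("PORT", default))

print(resolve_port({}))                # no PORT set, so the default 5000 is used
print(resolve_port({"PORT": "8080"}))  # a platform-provided value wins: 8080
```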

We also added the CORSMiddleware which essentially allows us to access the API in a different host. That is, we can extend the app further by creating a front-end interface for it. We won't cover that in this article but I put it here just in case you want to create a front-end to interact with the API as well.

Going to the port where the app is running, you'll get this.

{
    "message": "Welcome to the Food Vision API!"
}

The command python -m uvicorn main:app --reload refers to the following:

main -> The file main.py
app -> The object created inside of main.py with the line app = FastAPI()
--reload -> Make the server restart after code changes

Let's dissect the code we've written so far.

@app.get("/")
async def root():
    return {"message": "Welcome to the Food Vision API!"}

@app refers to the FastAPI instance we created, and using it as a decorator registers the function below it as the handler for a route. The get is an HTTP method, while the "/" is the URL path of that specific API request. Below that we define a function that returns the response. Here we just return a simple JSON message.

That is, we have a template for writing API endpoints with FastAPI.

@app.http_method("url_path")
async def functionName():
    return something
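If the decorator syntax in this template looks like magic, the toy router below may help. It is a hypothetical stand-in written for this explanation (the MiniApp class and its handle method are not part of FastAPI, and this is not how FastAPI is implemented), but it shows the general idea: the decorator records which function should answer which method and path.

```python
class MiniApp:
    """Toy router: a hypothetical stand-in, not FastAPI's implementation."""

    def __init__(self):
        self.routes = {}

    def get(self, path):
        # Calling app.get("/") returns the real decorator, which
        # records the function under (method, path) and hands it back.
        def register(func):
            self.routes[("GET", path)] = func
            return func
        return register

    def handle(self, method, path):
        # Look up the registered function and call it.
        return self.routes[(method, path)]()

app = MiniApp()

@app.get("/")
def root():
    return {"message": "Welcome to the Food Vision API!"}

print(app.handle("GET", "/"))
```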

How to Write the API Functionality

Let's write the main API functionality, that is, taking a food image URL from the internet, and predicting the name of that food.

First, let's extend the code that we wrote earlier, import all the required functions that we'll use, and load the model itself.

from fastapi import FastAPI
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import get_file 
from tensorflow.keras.utils import load_img 
from tensorflow.keras.utils import img_to_array
from tensorflow import expand_dims
from tensorflow.nn import softmax
from numpy import argmax
from numpy import max
from numpy import array
from json import dumps
from uvicorn import run
import os

app = FastAPI()
model_dir = "food-vision-model.h5"
model = load_model(model_dir)

...
...
...

if __name__ == "__main__":
    port = int(os.environ.get('PORT', 5000))
    run(app, host="0.0.0.0", port=port)

After loading in the model, let's add in the food classes that we have, which are based on the Food 101 dataset.

class_predictions = array([
    'apple pie',
    'baby back ribs',
    'baklava',
    'beef carpaccio',
    'beef tartare',
    'beet salad',
    'beignets',
    'bibimbap',
    'bread pudding',
    'breakfast burrito',
    'bruschetta',
    'caesar salad',
    'cannoli',
    'caprese salad',
    'carrot cake',
    'ceviche',
    'cheesecake',
    'cheese plate',
    'chicken curry',
    'chicken quesadilla',
    'chicken wings',
    'chocolate cake',
    'chocolate mousse',
    'churros',
    'clam chowder',
    'club sandwich',
    'crab cakes',
    'creme brulee',
    'croque madame',
    'cup cakes',
    'deviled eggs',
    'donuts',
    'dumplings',
    'edamame',
    'eggs benedict',
    'escargots',
    'falafel',
    'filet mignon',
    'fish and chips',
    'foie gras',
    'french fries',
    'french onion soup',
    'french toast',
    'fried calamari',
    'fried rice',
    'frozen yogurt',
    'garlic bread',
    'gnocchi',
    'greek salad',
    'grilled cheese sandwich',
    'grilled salmon',
    'guacamole',
    'gyoza',
    'hamburger',
    'hot and sour soup',
    'hot dog',
    'huevos rancheros',
    'hummus',
    'ice cream',
    'lasagna',
    'lobster bisque',
    'lobster roll sandwich',
    'macaroni and cheese',
    'macarons',
    'miso soup',
    'mussels',
    'nachos',
    'omelette',
    'onion rings',
    'oysters',
    'pad thai',
    'paella',
    'pancakes',
    'panna cotta',
    'peking duck',
    'pho',
    'pizza',
    'pork chop',
    'poutine',
    'prime rib',
    'pulled pork sandwich',
    'ramen',
    'ravioli',
    'red velvet cake',
    'risotto',
    'samosa',
    'sashimi',
    'scallops',
    'seaweed salad',
    'shrimp and grits',
    'spaghetti bolognese',
    'spaghetti carbonara',
    'spring rolls',
    'steak',
    'strawberry shortcake',
    'sushi',
    'tacos',
    'takoyaki',
    'tiramisu',
    'tuna tartare',
    'waffles'
])

Now that we have the food classes, let's write the main API functionality.

@app.post("/net/image/prediction/")
async def get_net_image_prediction(image_link: str = ""):
    if image_link == "":
        return {"message": "No image link provided"}

    img_path = get_file(
        origin = image_link
    )
    img = load_img(
        img_path, 
        target_size = (224, 224)
    )

    img_array = img_to_array(img)
    img_array = expand_dims(img_array, 0)

    pred = model.predict(img_array)
    score = softmax(pred[0])

    class_prediction = class_predictions[argmax(score)]
    model_score = round(max(score) * 100, 2)

    return {
        "model-prediction": class_prediction,
        "model-prediction-confidence-score": model_score
    }

Here, we make a POST request to the endpoint /net/image/prediction/ and provide the image URL through the image_link query parameter. That is, the full endpoint when posting an image URL link would be /net/image/prediction/?image_link=image-url.

For simplicity's sake, we give the image_link a default value of "" and when there's no link passed to the endpoint, we simply return a message saying that there's no image link provided.
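Because the image address is itself a URL, it needs to be encoded before it is placed in the query string. Here is a small sketch of building the request URL with the standard library. The helper name prediction_url, the host, and the image address are all placeholders chosen for illustration. The parameter name image_link comes from the function signature above.

```python
from urllib.parse import urlencode

def prediction_url(base_url, image_link):
    """Build the request URL with image_link as a query parameter."""
    query = urlencode({"image_link": image_link})
    return f"{base_url}/net/image/prediction/?{query}"

# Both the host and the image address below are placeholders.
url = prediction_url("http://localhost:5000", "https://example.com/cake.jpg")
print(url)
```

The Swagger page at /docs does this encoding for you, so you only need it when calling the API from your own code.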

get_file() downloads the image through the provided URL link, while load_img() loads the image in PIL format, and turns it into the appropriate image size that the model wants.

img_to_array() converts the loaded image to a NumPy array. expand_dims() expands the dimensions of the array by one at the zero'th index.

We then use model.predict() to get the model prediction on the loaded image, and get the model's confidence score on said prediction using softmax(). I used softmax here as that's the activation function used in creating the model.

Next, we get the food type by using argmax() on the model's confidence scores. The result is the index we use to look up the label in the class_predictions array, which contains the various food classes we have.

Lastly, we multiply the model's confidence score by 100 so that the score ranges from 0 to 100.

We then return the model's prediction, and the model's confidence score.
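The post-processing steps described above can be sketched without TensorFlow. The version below uses plain Python with three made-up classes and made-up raw scores (the real model has 101 classes), so treat it as an illustration of softmax, argmax, and the percentage scaling rather than a copy of the endpoint.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest value keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for three classes (the real model has 101).
classes = ["apple pie", "baklava", "pizza"]
raw = [1.0, 3.0, 0.5]

scores = softmax(raw)
best = max(range(len(scores)), key=scores.__getitem__)  # argmax
print(classes[best], round(scores[best] * 100, 2))
```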

Why We Need to Use Docker to Deploy this App

You can actually deploy this app as is on Heroku, using the usual method of defining a Procfile. But when I tried this method, I kept on getting a ValueError: Out of range float values are not JSON compliant error. I also get this error when running the app on Windows Subsystem for Linux (WSL). When I run on Windows, however, the error disappears.

You can actually avoid this error by adding this line of code, after the initial assignment of the model_score variable:

model_score = dumps(model_score.tolist())

This lets the app run on both Heroku and WSL, but it will only return these values when making the POST request.

{
    "model-prediction": "apple pie",
    "model-prediction-confidence-score": NaN,
}

So, it works on my machine (Windows), but not on Heroku (using Procfile), nor on WSL. This is the kind of problem that Docker solves!
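For context on the error message itself, the standard library's json module raises the same kind of ValueError when it is asked to serialize NaN in strict mode. The sketch below reproduces that with a made-up NaN score and shows one possible guard. The safe_score helper is hypothetical, it assumes the response is serialized strictly, and it does not explain why the score becomes NaN on some platforms in the first place.

```python
import json
import math

def safe_score(value):
    """Return None for NaN or infinite scores so they serialize as JSON null."""
    return value if math.isfinite(value) else None

nan_score = float("nan")

try:
    # Strict mode rejects NaN, which is not valid JSON.
    json.dumps({"score": nan_score}, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(strict_error)
print(json.dumps({"score": safe_score(nan_score)}))  # {"score": null}
```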

How to Dockerize the Application

Let's start dockerizing the application. Create a Dockerfile in the project's root directory and put in the following content:

FROM python:3.7.3-stretch

# Maintainer info
LABEL maintainer="your-email-address"

# Make working directories
RUN  mkdir -p  /food-vision-api
WORKDIR  /food-vision-api

# Upgrade pip with no cache
RUN pip install --no-cache-dir -U pip

# Copy application requirements file to the created working directory
COPY requirements.txt .

# Install application dependencies from the requirements file
RUN pip install -r requirements.txt

# Copy every file in the source folder to the created working directory
COPY  . .

# Run the python application
CMD ["python", "main.py"]

This pulls the Python 3.7.3 image, and installs all the necessary packages defined in the requirements.txt file. Then it runs the application by using the command python main.py as defined in the last line of the file.

You can then build and run the application using the following CLI commands:

$ docker image build -t  .
$ docker run -p 5000:5000 -d

Then you can stop the app, and free up system resources by running the following:

$ docker container stop 
$ docker system prune

container-id is returned when running the docker run command above.

How to Deploy to Heroku

With the app now dockerized, we can deploy it to Heroku. I'm assuming you already have the Heroku CLI installed, and have already logged the CLI into your Heroku account.

Let's first create the app in Heroku through the CLI:

$ heroku create

Then we can push and release the app through the Docker container we made earlier with the following commands:

$ heroku container:push web --app 
$ heroku container:release web --app

After this, you can go to your Heroku dashboard and open the app. You should be greeted with the JSON message we have in the "/" directory of the application.

JSON message greeting on "/" directory

When you navigate to /docs, you'll be greeted with the Swagger documentation of the application. Here you can play around with the POST request we created and see if the model predictions are correct. Note that the image links you submit must have jpeg or png in their URL.

Swagger documentation of the application on /docs

Let's try this out by using a picture of a chocolate cake, its URL link is this.

Image from tallypress.com

Paste the link to the text box in the /docs as so, then press Execute.

Demonstration of the app

After pressing the Execute button, it will take a few seconds until we get the model prediction. That's because we're using tensorflow-cpu, since the free tier of Heroku limits both the RAM and the slug size of our application.

After the execution is finished, you should be greeted with this response:

Response of the API after usage

As you can see, the model predicted it correctly, with a confidence score of 2.65%. This confidence score is alright as we're not dealing with model accuracy (which requires the truth value beforehand), and we're dealing with data the model hasn't seen before.

Conclusion

In this article, you learned how to deploy a TensorFlow CNN model to Heroku by serving it as a RESTful API, and by using Docker.

If you find this article helpful, feel free to share it on social media. Let's connect on Twitter! You can also support me by buying me a coffee.

Deep Learning Tutorial – How to Train and Deploy a Deep Learning Model with fast.ai

Harshit Tyagi — Tue, 06 Oct 2020 22:08:19 +0000

Deep learning is bringing revolutionary changes to many disciplines. It is also becoming more accessible to domain experts and AI enthusiasts with the advent of libraries like TensorFlow, PyTorch, and now fast.ai.

fast.ai's mission is to democratize deep learning. It is a research institute dedicated to helping everyone – from a beginner level coder to a proficient deep learning practitioner – achieve world-class results with state-of-the-art models and techniques from the latest research in the field.

This blog post will walk you through the process of developing a dog classifier using fast.ai. The goal is to learn how easy it is to get started with deep learning models and to be able to achieve near-perfect results with a limited amount of data using pre-trained models.

Prerequisite

The only prerequisite to get started is that you know how to code in Python and that you are familiar with high school math.

What You’ll Learn

Importing the libraries and setting up the notebook
Collecting Imagery Data using Microsoft Azure
Converting downloaded data into DataLoader objects
Data Augmentation
Cleaning Data using Model Training
Exporting the Trained Model
Building an Application out of your Jupyter Notebook

So let's get started.

How to Import the Libraries and Set Up the Notebook

Before we get down to building our model, we need to import the required libraries and utility function from the set of notebooks called fastbook. It's been developed to cover the introduction to Deep Learning using fast.ai and PyTorch.

Let’s install the fastbook package to set up the notebook:

!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

Then, let’s import all the functions and classes from the fastbook package and fast.ai vision widgets API:

from fastbook import *
from fastai.vision.widgets import *

How to Collect Imagery Data using Microsoft Azure

For most types of projects, you can find the data online from various data repositories and websites. To develop a Dog Classifier, we need to have images of dogs. There are many images of dogs available on the internet.

To download these images, we’ll use the Bing Image Search API provided by Microsoft Azure. So, sign up for a free account on Microsoft Azure and you’ll get $200 worth of credits.

Go to your portal and create a new Cognitive Service resource using this quickstart. Enable the Bing Image Search API. Then, from the Keys and Endpoint option in the left panel, copy the keys to your resource.

Store the retrieved key in an environment variable named AZURE_SEARCH_KEY, then read it in the notebook as follows:

key = os.environ.get('AZURE_SEARCH_KEY', '')

Now, fastbook comes with utility functions like search_images_bing that returns URLs corresponding to your search query. You can learn about such functions using the help function:

help(fastbook)

You can check the search_images_bing function in this help guide. The function accepts the key to your resource that you’ve defined above and the search query, and we can access the URLs of the search results using the attrgot method:

results = search_images_bing(key, 'german shepherd dogs')
images = results.attrgot('content_url')
len(images)

We have 150 URLs of images of German Shepherd dogs:

Now, we can download these images using the download_url function. But let’s first define the type of dogs that we want.

For this tutorial, I’m going to work with three types of dogs: German Shepherds, black dogs, and Labradors.

So, let’s define a list of dog types:

dog_types = ['german shepherd', 'black', 'labrador']
path = Path('dogs')

You’ll then need to define the path where your images will be downloaded along with the semantic names of the folder for each class of dog.

if not path.exists():
    path.mkdir()
    for t in dog_types:
        dest = (path/t)
        print(dest)
        dest.mkdir(exist_ok=True)
        results = search_images_bing(key, '{} dog'.format(t))
        download_images(dest, urls=results.attrgot('content_url'))

This will create a “dogs” directory which contains three subdirectories, one for each type of dog image.

After that, we pass the search query (which is the dog_type) and the key to the search function, followed by the download function to download all the URLs from the search results in their respective destination (dest) directories.

We can check the images downloaded to a path using the get_image_files function:

files = get_image_files(path)
files

How to Verify Images

You can also check for the number of corrupt files/images in the files:

corrupt = verify_images(files)
corrupt

##output: (#0) []

You can remove all the corrupt files (if any) by mapping the unlink method to the list of corrupt files: corrupt.map(Path.unlink);

That’s it, we have 379 dog images ready with us to train and validate our model.

How to Convert Downloaded Data into DataLoader Objects

Now, we need a mechanism to provide data to our model. fast.ai has this concept of DataLoaders that stores multiple DataLoader objects passed to it and makes them available as a training and validation set.

Now, to convert the downloaded data into a DataLoaders object, we have to provide four things:

What kinds of data we are working with
How to get the list of items
How to label these items
How to create the validation set

Now, to create these DataLoaders objects along with the information mentioned above, fast.ai offers a flexible system called the data block API. We can specify all the details of the DataLoaders creation using the arguments and an array of transformation methods that the API offers:

dogs = DataBlock(
                  blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(valid_pct=0.2, seed=41),
                  get_y=parent_label,
                  item_tfms=Resize(128)
                  )

Here, we have a bunch of arguments that we should understand:

blocks — this specifies the feature variables (images) and the target variable (a category for each image)
get_items — retrieves the underlying items (which are images in our case). We use the get_image_files function, which returns a list of all of the images in that path.
splitter — splits the data as per the provided method. We are using a random split with 20% of the data reserved for the validation set and specified the seed to get the same split on every run.
get_y — the target variable is referred to as y. To create the labels, we are using the parent_label function which gets the name of the folder where the file resides as its label.
item_tfms — we have images of different sizes and this causes a problem because we always send a batch of files to the model instead of a single file. Therefore we need to preprocess these images by resizing them to a standard size and then group them in a tensor to pass through the model. We are using the Resize transformation here.
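To see why the seed makes the split repeatable, here is a simplified sketch of a seeded random split in plain Python. It is an illustration of the idea and not fastai's RandomSplitter implementation. The function name random_split and the file names are made up.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle indices with a seeded generator and cut off a validation slice."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in indices[:cut]]
    train = [items[i] for i in indices[cut:]]
    return train, valid

files = [f"dog_{i}.jpg" for i in range(10)]  # made-up file names
train_a, valid_a = random_split(files, valid_pct=0.2, seed=41)
train_b, valid_b = random_split(files, valid_pct=0.2, seed=41)
print(len(train_a), len(valid_a))  # 8 2
print(valid_a == valid_b)          # True: same seed, same split
```

Keeping the validation set fixed across runs means that a change in the error rate reflects a change in the model, not a change in which images were held out.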

Now, we have the DataBlock object which needs to be converted to a DataLoaders object by providing the path to the dataset:

dls = dogs.dataloaders(path)

We can then check for the images in the dataloader object using the show_batch method:

dls.valid.show_batch()

Data Augmentation

We can add transformations to these images to create random variations of the input images, such that they appear different but still represent the same facts.

We can rotate, warp, flip, or change the brightness/contrast of the images to create these variations. We also have a standard set of augmentations encapsulated in the aug_transforms function that works pretty well for the majority of computer vision datasets.

We can now apply these transformations to an entire batch of images as all the images are of the same size (224 pixels, standard for image classification problems) using the following:

##adding item transformations
dogs = dogs.new(
        item_tfms=RandomResizedCrop(224, min_scale=0.5),
        batch_tfms=aug_transforms(mult=2)
        )
dls = dogs.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)
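If you are wondering what a random resized crop does conceptually, here is a rough sketch in plain Python. It only picks the crop box; the real transform also resizes the crop to the target size (224 here). The function name random_crop_box and the aspect-ratio range are assumptions chosen for illustration, not fastai’s code:

```python
import math
import random


def random_crop_box(width, height, min_scale=0.5, rng=None):
    """Pick a random crop (left, top, crop_w, crop_h) covering roughly min_scale..1 of the image area."""
    rng = rng or random.Random()
    scale = rng.uniform(min_scale, 1.0)   # fraction of the original area to keep
    ratio = rng.uniform(3 / 4, 4 / 3)     # aspect ratio (width / height) of the crop
    crop_w = min(width, max(1, round(math.sqrt(width * height * scale * ratio))))
    crop_h = min(height, max(1, round(math.sqrt(width * height * scale / ratio))))
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return left, top, crop_w, crop_h


# A different box is drawn on every call, so each epoch sees a different view of the image.
box = random_crop_box(640, 480, min_scale=0.5, rng=random.Random(0))
```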

Model Training and Data Cleaning

It’s time to train the model with this limited number of images. fast.ai offers many architectures to use which makes it very easy to use transfer learning.

We can create a convolutional neural network (CNN) model using the pre-trained models that work for most of the applications/datasets.

We are going to use ResNet architecture, as it is both fast and accurate for many datasets and problems. The 18 in the **resnet18** represents the number of layers in the neural network.

We also pass the metric to measure the quality of the model’s predictions using the validation set from the dataloader. We are using error_rate which tells us how frequently the model is making incorrect predictions:

model = cnn_learner(dls, resnet18, metrics=error_rate)
model.fine_tune(4)

The fine_tune method is analogous to the fit() method in other ML libraries. To train the model, we need to specify the number of times (epochs) we want the model to see each image.

Here, we are training for only 4 epochs.
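The error_rate metric is simply the share of validation images the model gets wrong. A minimal sketch of that idea in plain Python, with made-up labels and a hypothetical helper name (fastai computes this on tensors for you):

```python
def error_rate_sketch(predictions, targets):
    """Fraction of predictions that do not match their target label."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    wrong = sum(1 for p, t in zip(predictions, targets) if p != t)
    return wrong / len(targets)


# Made-up labels for illustration only.
preds = ["labrador", "black", "labrador", "black"]
actual = ["labrador", "black", "black", "black"]
rate = error_rate_sketch(preds, actual)  # 1 wrong out of 4
```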

We can also visualize the predictions and compare them with the actual labels using the confusion matrix:

interp = ClassificationInterpretation.from_learner(model)
interp.plot_confusion_matrix()

As you can see, we only have five incorrect predictions. Let’s check the top losses, that is, the images with the highest loss in the dataset:

interp.plot_top_losses(6, nrows=3)
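For intuition, “top losses” just means sorting the items by their individual loss. Here is a small sketch assuming a cross-entropy loss and made-up probabilities; top_losses_sketch is a hypothetical helper, not the fastai method:

```python
import math


def top_losses_sketch(probabilities, target_indices, k=3):
    """Return (item_index, loss) pairs for the k items with the highest cross-entropy loss."""
    losses = []
    for i, (probs, target) in enumerate(zip(probabilities, target_indices)):
        # Cross-entropy for one item is -log of the probability given to the true class.
        losses.append((i, -math.log(probs[target])))
    return sorted(losses, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up predicted probabilities for a two-class problem.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.05, 0.95]]
targets = [0, 0, 1, 1]
worst = top_losses_sketch(probs, targets, k=2)
```

An item lands at the top either because the model was confidently wrong or because its label is wrong, which is why this view is handy for data cleaning.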

You can see that the model got confused between black and labrador. Thus, we can specify these images to be in a particular category using the ImageClassifierCleaner class.

Pass the model to the class and it will open up a widget with an intuitive GUI for data cleaning. We can change the labels of training and validation set images and view the highest-loss images.

After adding each image to their respective correct class, we have to move them to their right directory using:

import shutil
# Move each relabelled image into the folder of its corrected category
for idx,cat in cleaner.change():
    shutil.move(str(cleaner.fns[idx]), str(path/cat))

How to Export the Trained Model

After a couple of rounds of hyperparameter tuning, and once you’re happy with your model, you need to save it so that you can deploy it on a server to be used in production.

While saving a model, we have the model architecture and the trained parameters that are of value to us. fast.ai offers the export() method to save the model in a pickle file with the extension .pkl.

model.export()
path = Path()
path.ls(file_exts='.pkl')

We can then load the model and make inferences by passing an image to the loaded model:

model_inf = load_learner(path/'export.pkl')

Use this loaded model to make inferences:

model_inf.predict('dogs/labrador/00000000.jpg')

We can check the labels from the model’s dataloader vocabulary:

model_inf.dls.vocab
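The vocabulary is what turns the model’s numeric output back into a readable label. A minimal sketch of that decoding step, using a hypothetical two-class vocabulary and made-up probabilities (the real values come from your own model):

```python
def decode_prediction(probabilities, vocab):
    """Map the highest-probability class index back to its label."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return vocab[best_index], best_index, probabilities


# Hypothetical vocabulary and probabilities, for illustration only.
vocab = ["black", "labrador"]
label, index, probs = decode_prediction([0.12, 0.88], vocab)
```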

How to Build an Application out of your Jupyter Notebook

The next step is to create an application that we can share with our friends, colleagues, recruiters, and others.

To create an application, we need to add interactive elements so that we can try and test the application’s features. We also need to make it available on the web as a webpage, which includes deploying it via some framework like Flask or simply using Voila.

You can use Voila to convert this Jupyter Notebook into a standalone app. I have not covered it here but you can go through my blog/video which covers the whole process.

Deploying your Model

I’ve covered deploying an ML model in my post here. But if you want another easy and free way of deploying your Voila application, you can use Binder.

Follow these steps to deploy the application on Binder:

Add your notebook to a GitHub repository.
Insert the URL of that repo into Binder’s URL field.
Change the file drop-down to instead select the URL.
In the “URL to open” field, enter /voila/render/<name>.ipynb (replacing <name> with the name of your notebook).
Click the clipboard button at the bottom right to copy the URL and paste it somewhere safe.
Click Launch.

And there you go, your dog classifier is live!

If you prefer to watch me going through all of these steps, here’s the video version of this blog:

Data Science with Harshit

If this tutorial was helpful, you should check out my data science and machine learning courses on Wiplane Academy. They are comprehensive yet compact and help you build a solid foundation of work to showcase.

How to Train an Image Classifier and Teach Your Computer Japanese

freeCodeCamp — Sun, 21 Jul 2019 16:30:00 +0000

By Ajay Uppili Arasanipalai

Introduction

Hi. Hello. こんにちは

Those squiggly characters you just saw are from a language called Japanese. You’ve probably heard of it if you’ve ever watched Dragon Ball Z.

Here’s the problem though: you know those ancient Japanese scrolls that make you look like you’re going to unleash an ultimate samurai ninja overlord super combo move?

Yeah, those. I can’t exactly read them, and it turns out that very few people can.

Luckily, a bunch of smart people understand how important it is that I master the Bijudama-Rasenshuriken, so they invented this thing called deep learning.

So pack your ramen and get ready. In this article, I’ll show you how to train a neural network that can accurately predict Japanese characters from their images.

To ensure that we get good results, I’m going to make use of an incredible deep learning library called fastai, which is a wrapper around PyTorch that makes it easy to implement best practices from modern research. You can read more about it on their docs.

With that said, let’s get started.

KMNIST

OK, so before we can create anime subtitles, we’re going to need a dataset. Today we’re going to focus on KMNIST.

This dataset takes examples of characters from the Japanese Kuzushiji script and organizes them into 10 labeled classes. The images measure 28x28 pixels, and there are 70,000 images in total, mirroring the structure of MNIST.

But why KMNIST? Well firstly, it has “MNIST” in its name, and we all know how much people in machine learning love MNIST.

So in theory, you could just change a few lines of that Keras code that you copy-pasted from Stack Overflow and BOOM! You now have computer code that can revive an ancient Japanese script.

Of course, in practice, it isn’t that simple. For starters, the cute little model that you trained on MNIST probably won’t do that well. Because, you know, figuring out whether a number is a 2 or a 5 is just a tad easier than deciphering a forgotten cursive script that only a handful of people on earth know how to read.

Apart from that, I guess I should point out that Kuzushiji, which is what the “K” in KMNIST stands for, is not just 10 characters long. Unfortunately, I’m NOT one of the handful of experts who can read the language, so I can’t describe in intricate detail how it works.

But here’s what I do know: There are actually three variants of these Kuzushiji character datasets — KMNIST, Kuzushiji-49, and Kuzushiji-Kanji.

Kuzushiji-49 is a variant with 49 classes instead of 10. Kuzushiji-Kanji is even more insane, with a whopping 3832 classes.

Yep, you read that right. That’s more than three times as many classes as ImageNet.

How to Not Mess Up Your Dataset

To keep things as MNIST-y as possible, it looks like the researchers who put out the KMNIST dataset kept it in the original format (man, they really took that whole MNIST thing to heart, didn’t they).

If you take a look at the KMNIST GitHub repo, you’ll see that the dataset is served in two formats: the original MNIST thing, and as a bunch of NumPy arrays.

Of course, I know you were probably too lazy to click that link. So here you go. You can thank me later.

Personally, I found the NumPy array format easier to work with when using fastai, but the choice is yours. If you’re using PyTorch, KMNIST comes for free as a part of torchvision.datasets.

The next challenge is actually getting those 10,000-year-old brush strokes onto your notebook (or IDE, who am I to judge). Luckily, the GitHub repo mentions that there’s this handy script called download_data.py that’ll do all the work for us. Yay!

From here, it’ll probably start getting awkward if I continue talking about how to pre-process your data without actual code. So check out the notebook if you want to dive deeper.

Moving on…

Should I use a hyper ultra Inception ResNet XXXL?

Short Answer

Probably not. A regular ResNet should be fine.

A Little Less Short Answer

Ok, look. By now, you’re probably thinking, “KMNIST big. KMNIST hard. Me need to use very new, very fancy model.”

Did I overdo the Bizarro voice?

The point is, you DON’T need a shiny new model to do well on these image classification tasks. At best, you’ll probably get a marginal accuracy improvement at the cost of a whole lot of time and money.

Most of the time, you’ll just waste a whole lot of time and money.

So heed my advice: just stick to good old-fashioned ResNets. They work really well, they're relatively fast and lightweight (compared to some of the other memory hogs like Inception and DenseNet), and best of all, people have been using them for a while, so it shouldn’t be too hard to fine-tune them.

If the dataset you’re working with is simple like MNIST, use ResNet18. If it’s medium-difficulty, like CIFAR10, use ResNet34. If it’s really hard, like ImageNet, use ResNet50. If it’s harder than that, you can probably afford to use something better than a ResNet.

Don’t believe me? Check out my leading entry for the Stanford DAWNBench competition from April 2019:

What do you see? ResNets everywhere! Now come on, there’s got to be a reason for that.

Hyperparameters Galore

A few months ago, I wrote an article on how to pick the right hyperparameters. If you’re interested in a more general solution to this herculean task, go check that out. Here, I’m going to walk you through my process of picking good-enough hyperparameters to get good-enough results on KMNIST.

To start off, let’s go over what hyperparameters we need to tune.

We’ve already decided to use a ResNet34, so that’s that. We don’t need to figure out the number of layers, filter size, number of filters, etc. since that comes baked into our model.

See, I told you it would save time.

So what’s remaining is the big three: learning rate, batch size, and the number of epochs (plus stuff like dropout probability for which we can just use the default values).

Let’s go over them one by one.

Number of Epochs

Let’s start with the number of epochs. As you’ll come to see when you play around with the model in the notebook, our training is pretty efficient. We can easily cross 90% accuracy within a few minutes.

So given that our training is so fast in the first place, it seems extremely unlikely that we would use too many epochs and overfit. I’ve seen other KMNIST models train for over 50 epochs without any issues, so staying in the 0-30 range should be absolutely fine.

That means within the scope of the restrictions we’ve put on the model when it comes to epochs, the more, the merrier. In my experiments, I found that 10 epochs strike a good balance between model accuracy and training time.

Learning Rate

What I’m about to say is going to piss a lot of people off. But I’ll say it anyway — We don’t need to pay too much attention to the learning rate.

Yep, you heard me right. But give me a chance to explain.

Instead of going “Hmm… that doesn’t seem to work, let’s try again with lr=3e-3,” we’re going to use a much more systematic and disciplined approach to finding a good learning rate.

We’re going to use the learning rate finder, a revolutionary idea proposed by Leslie Smith in his paper on cyclical learning rates.

Here’s how it works:

First, we set up our model and prepare to train it for one epoch. As the model is training, we’ll gradually increase the learning rate.
Along the way, we’ll keep track of the loss at every iteration.
Finally, we select the learning rate that corresponds to the lowest loss.
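The steps above can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: a one-parameter “model” with loss w ** 2 stands in for the network, and lr_range_test is a hypothetical helper, not the fastai implementation:

```python
def lr_range_test(loss_fn, grad_fn, w0, start_lr=1e-5, end_lr=10.0, num_steps=100):
    """Run SGD while multiplying the learning rate by a constant factor at every step."""
    factor = (end_lr / start_lr) ** (1 / (num_steps - 1))
    w, lr = w0, start_lr
    lrs, losses = [], []
    for _ in range(num_steps):
        lrs.append(lr)
        losses.append(loss_fn(w))
        w = w - lr * grad_fn(w)   # one SGD step at the current learning rate
        lr *= factor              # increase the learning rate exponentially
    return lrs, losses


# Toy one-parameter problem standing in for a real model: loss(w) = w ** 2.
lrs, losses = lr_range_test(lambda w: w * w, lambda w: 2 * w, w0=5.0)
# Step 3 of the list: the learning rate recorded alongside the lowest loss.
lr_at_min_loss = lrs[losses.index(min(losses))]
```

Plotting losses against lrs on a log scale gives the familiar curve: flat at first, then dropping, then blowing up once the learning rate gets too large.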

When all is said and done, and you plot the loss against the learning rate, you should see something like this:

Now, before you get all giddy and pick 1e-01 as the learning rate, I’ll have you know that it’s NOT the best choice.

That’s because fastai implements a smoothening technique called exponentially weighted averages, which is the deep learning researcher version of an Instagram filter. It prevents our plots from looking like the result of giving your neighbors’ kid too much time with a blue crayon.

Since we’re using a form of averaging to make the plot look smooth, the “minimum” point that you’re looking at on the learning rate finder isn’t actually a minimum. It’s an average.

Instead, to actually find the learning rate, a good rule of thumb is to pick the learning rate that’s an order of magnitude lower than the minimum point on the smoothened plot. That tends to work really well in practice.
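Here is a small sketch of both ideas: smoothing the recorded losses with an exponentially weighted average, then stepping back one order of magnitude from the minimum of the smoothed curve. The function names, the beta values, and the numbers are all made up for illustration:

```python
def smooth_losses(losses, beta=0.98):
    """Exponentially weighted average with bias correction."""
    smoothed, avg = [], 0.0
    for step, loss in enumerate(losses, start=1):
        avg = beta * avg + (1 - beta) * loss
        smoothed.append(avg / (1 - beta ** step))  # correct the bias towards zero in early steps
    return smoothed


def suggest_lr(lrs, losses, beta=0.98):
    """Find the minimum of the smoothed curve and step back one order of magnitude."""
    smoothed = smooth_losses(losses, beta)
    min_index = smoothed.index(min(smoothed))
    return lrs[min_index] / 10


# Made-up learning rates and losses, for illustration only.
lrs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.0, 1.5, 0.9, 0.5, 3.0]
lr = suggest_lr(lrs, losses, beta=0.5)
```

In this toy data the smoothed minimum sits at 1e-1, so the rule of thumb suggests a learning rate around 1e-2.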

I understand that all this plotting and averaging might seem weird if you’ve been brute-forcing learning rate values all your life. So I’d advise you to check out Sylvain Gugger’s explanation of the learning rate finder to learn more.

Batch Size

OK, you caught me red-handed here. My initial experiments used a batch size of 128 since that’s what the top submission used.

I know, I know. Not very creative. But it’s what I did. Afterward, I experimented with a few other batch sizes, and I couldn’t get better results. So 128 it is!

In general, batch sizes can be a weird thing to optimize, since it partially depends on the computer you’re using. If you have a GPU with more VRAM, you can train on larger batch sizes.

So if I tell you to use a batch size of 2048, for example, instead of getting that coveted top spot on Kaggle and eternal fame and glory for life, you might just end up with a CUDA: out of memory error.

So it’s hard to recommend a perfect batch size because, in practice, there are clearly computational limits. The best way to pick it is to try out values that work for you.

But how would you pick a random number from the vast sea of positive integers?

Well, you actually don’t. Since GPU memory is organized in bits, it’s a good idea to choose a batch size that’s a power of 2 so that your mini-batches fit snugly in memory.

Here’s what I would do: start off with a moderately large batch size like 512. Then, if you find that your model starts acting weird and the loss is not on a clear downward trend, halve it. Next, repeat the training process with a batch size of 256, and see if it behaves this time.

If it doesn’t, wash, rinse, and repeat.
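That wash-rinse-repeat loop is easy to write down. In this sketch, trains_well is a hypothetical callback that you would implement yourself (for example: run a short training job and check that the loss trends downward and nothing runs out of memory):

```python
def find_batch_size(trains_well, start=512, minimum=1):
    """Halve the batch size until trains_well(batch_size) reports a healthy run."""
    batch_size = start
    while batch_size >= minimum:
        if trains_well(batch_size):
            return batch_size
        batch_size //= 2   # stays a power of 2 when start is a power of 2
    raise RuntimeError("no batch size in the tested range trained well")


# Stand-in check: pretend that anything above 128 runs out of memory or diverges.
chosen = find_batch_size(lambda bs: bs <= 128)
```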

A Few Pretty Pictures

With the optimizations going on here, it’s going to be pretty challenging to keep track of this giant mess of models, metrics, and hyperparameters that we’ve created.

To ensure that we all remain sane human beings while climbing the accuracy mountain, we’re going to use the wandb + fastai integration.

So what does wandb actually do?

It keeps track of a whole lot of statistics about your model and how it’s performing automatically. But what’s really cool is that it also provides instant charts and visualizations to keep track of critical metrics like accuracy and loss, all in real-time!

If that wasn’t enough, it also stores all of those charts, visualizations, and statistics in the cloud, so you can access them anytime anywhere.

Your days of staring at a black terminal screen and fiddling around with matplotlib are over.

The notebook tutorial for this article has a straightforward introduction to how it works seamlessly with fastai. You can also check out the wandb workspace, where you can take a look at all the stuff I mentioned without writing any code.

Conclusion

これで終わりです

That means “this is the end.”

But you didn't need me to tell you that, did you? Not after you went through the trouble of getting a Japanese character dataset, using the learning rate finder, training a ResNet using modern best practices, and watching your model rise to glory using real-time monitoring in the cloud.

Yep, in about 20 minutes, you actually did all of that! Give yourself a pat on the back.

And please, go watch some Dragonball.

How I used Deep Learning to classify medical images with Fast.ai

freeCodeCamp — Wed, 27 Feb 2019 16:53:26 +0000

By James Dietle

Convolutional Neural Networks (CNNs) have advanced rapidly over the last two years, helping with medical image classification. How can we, even as hobbyists, take these recent advances and apply them to new datasets? We are going to walk through the process, and it’s more accessible than you might think.

As our family moved to Omaha, my wife (who is in a fellowship for pediatric gastroenterology) came home and said she wanted to use image classification for her research.

Oh, I was soooo ready.

For over two years, I have been playing around with deep learning as a hobby. I even wrote several articles (here and here). Now I had some direction on a problem. Unfortunately, I had no idea about anything in the gastrointestinal tract, and my wife hadn’t programmed since high school.

Start from the beginning

My entire journey into deep learning has been through the Fast.ai process. It started 2 years ago when I was trying to validate that all the “AI” and “Machine Learning” we were using in the security space wasn’t over-hyped or biased. It was, and we steered clear from those technologies. The most sobering fact was learning that being an expert in the field takes a little experimentation.

Setup

I have used Fast.ai for all the steps, and the newest version is making this more straightforward than ever. The ways to create your learning environment are proliferating rapidly. There are now Docker images, Amazon AMIs, and services (like Crestle) that make it easier than ever to set up.

Whether you are the greenest of coding beginners or an experienced ninja, start here on the Fast.ai website.

I opted to build my machine learning rig during a previous iteration of the course. However, it is not necessary, and I would recommend using another service instead. Choose the easiest route for you and start experimenting.

Changes to Fast.ai with version 3

I have taken the other iterations of Fast.ai, and after reviewing the newest course, I noticed how much more straightforward everything was in the notebook. Documentation and examples are everywhere.

Let’s dive into “lesson1-pets”, and if you have set up Fast.ai, feel free to follow along in your personal Jupyter instance.

lesson1-pets from Fast.ai

I prepared for the first lesson, which typically means distinguishing between 2 classes (cats and dogs), as I had many times before. However, this time I saw that we were doing something much more complex, classifying 33 breeds of cats and dogs, using fewer lines of code.

The CNN was up and learning in 7 lines of code!!

That wasn’t the only significant change. Another huge stride was in showing errors. For example, we could quickly see a set of the top losses (items we confidently predicted wrong) and corresponding pet pictures from our dataset below.

Incorrect cat and dog breed predictions

This function was pretty much a spot check for bad data, ensuring a lion, tiger, or bear didn’t sneak into the set. We could also see if there were glaring errors that were obvious to us.

The confusion matrix was even more beneficial to me. It allowed me to look across the whole set for patterns in misclassification between the 33 breeds.

Of the 33 breeds presented, we could see where our data diverged and ask ourselves if it made sense. A few breeds popped out in particular, and here are examples of the commonly confused images:

Staffordshire terrier and American terrier.

Egyptian Mau and Bengal

Not being a pet owner or enthusiast, I wouldn’t have been able to figure out these subtle details about a breed’s features. The model is doing a much better job than I would’ve been able to do! While I am certainly getting answers, I am also curious to find that missing feature or piece of data to improve the model.

There is an important caveat. We are now at the point where the model is teaching us about the data. Sometimes we can get stuck in a mindset where the output is the end of the process. If we fall into that trap, we might miss a fantastic opportunity to create a positive feedback loop.

30-second powerpoint drawing

Therefore, we are sitting a little wiser and a little more confident in the 4th phase. Given this data, what should I do to improve accuracy?

More training
More images
More powerful architecture

Trick question! I am going to look at a different dataset. Let’s get up close and personal with endoscope images of people’s insides.

Get the dataset, see a whole lot of sh… stuff

For anyone else interested in gastroenterology I recommend looking into The Kvasir Dataset. A good description from their site is:

the dataset containing images from inside the gastrointestinal (GI) tract. The collection of images are classified into three important anatomical landmarks and three clinically significant findings. In addition, it contains two categories of images related to endoscopic polyp removal. Sorting and annotation of the dataset is performed by medical doctors (experienced endoscopists)

There is also a research paper by experts (Pogorelov et al.) describing how they tackled the problem, which includes their findings.

Perfect, this is an excellent dataset to move from pets to people. Although a less cuddly dataset (that also includes stool samples) it is something exciting and complete.

As we download the data, the first thing we notice is that there are 8 classes in this dataset for us to classify instead of the 33 from before. However, it shouldn’t change any of our other operations.

Side Note: Originally, I spent a few hours scripting out how to move folders into validation folders, and spent some good time setting everything up. The scripting effort turned out to be a waste of time because there is already a simple function to create a validation set.

The lesson is “if something is a pain, chances are someone from the Fast.ai community has already coded it for you.”

Diving into the notebook

You can pick up my Jupyter notebook from GitHub here.

Building for speed and experimentation

As we start experimenting, it is crucial to get the framework correct. Try setting up the minimum needed to get it working, in a way that can scale up later. Make sure data is being taken in and processed, and that the outputs make sense.

This means:

Use smaller batches
Use lower numbers of epochs
Limit transforms

If a run is taking longer than 2 minutes, figure out a way to go faster. Once everything is in place, we can get crazy.

Data Handling

Data prioritization, organization, grooming, and handling is the most important aspect of deep learning. Here is a crude picture showing how data handling occurs, or you can read the documentation.

Therefore we need to do the same thing for the endoscope data, and it is one line of code.

Explaining the variables:

Path points to our data (#1)
The validation set at 20% to properly create dataloaders
default transforms
the image size set at 224

That’s it! The data block is all set up and ready for the next phase.

Resnet

We have data and we need to decide on an architecture. Nowadays Resnet is popularly used for image classification. It has a number after it which equates to the number of layers. Many better articles exist about Resnet, therefore, to simplify for this article:

More layers = more accurate (Hooray!)

More layers = more compute and time needed (Boo..)

Therefore Resnet34 has 34 layers of image finding goodness.

Ready? I’m ready!

With the structured data, architecture, and a default error metric we have everything we need for the learner to start fitting.

Let’s look at some code:

We see that after the cycles and 7 minutes we get to 87% accuracy. Not bad. Not bad at all.

Not being a doctor, I have a very untrained eye looking at these. I have no clue what to be looking for, categorization errors, or if the data is any good. So I went straight to the confusion matrix to see where mistakes were being made.

Of the 8 classes, 2 sets of 2 are often confused with each other. As a baseline, I could only see if they are dyed, polyps, or something else. So compared to my personal baseline of 30% accuracy, the machine is getting an amazing 87%.

After looking at the images from these 2 sets side by side, you can see why. (Since they are medical images, they might be NSFW and are present in the Jupyter notebook.)

The dyed sections are being confused with each other. This type of error can be expected. They are both blue and look very similar to each other.
Esophagitis is hard to tell from a normal Z-line. Perhaps esophagitis presents redder than Z-line? I’m not certain.
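Reading a confusion matrix comes down to two things: the diagonal (correct predictions) and the largest off-diagonal cells (the most-confused pairs). Here is a plain-Python sketch with made-up counts for three of the classes. The helper names are hypothetical and the numbers are not my actual results:

```python
def accuracy_from_confusion(matrix):
    """Correct predictions sit on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def most_confused_sketch(matrix, labels, min_count=1):
    """List (actual, predicted, count) for off-diagonal cells, largest count first."""
    pairs = [
        (labels[i], labels[j], matrix[i][j])
        for i in range(len(matrix))
        for j in range(len(matrix))
        if i != j and matrix[i][j] >= min_count
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Made-up counts for three of the classes; rows are actual labels, columns are predictions.
labels = ["esophagitis", "normal-z-line", "polyps"]
matrix = [[70, 28, 2], [20, 79, 1], [0, 0, 100]]
accuracy = accuracy_from_confusion(matrix)
confused = most_confused_sketch(matrix, labels, min_count=5)
```

With these toy numbers, the esophagitis and Z-line pair dominates the list in both directions, which mirrors the pattern described above.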

Regardless, everything seems great, and we need to step up our game.

More layers, more images, more power!

Now that we see our super fast model working, let’s switch over to the powerhouse.

I increased the size of the dataset from v1 to v2. The larger set doubles the number of images available from 4000 to 8000. (Note: All examples in this article show v2.)
Transform everything that makes sense. There are lots of things you can tweak. We are going to go into more of this shortly.
Since the images from the dataset are relatively large, I decided to try making the size bigger. Although this would be slower, I was curious if it would be better able to pick out little details. This hypothesis still requires some experimentation.
More and more epochs.
If you remember from before, Resnet50 would have more layers (be more accurate) but would require more compute time and therefore be slower. So we will change the model from Resnet34 to Resnet50.

Transforms: Getting the most of an image

Image transforms are a great way to improve accuracy. If we make random changes to an image (rotate, change color, flip, etc.) we can make it seem like we have more images to train from and we are less likely to overfit. Is it as good as getting more images? No, but it’s fast and cheap.

When choosing which transforms to use, we want something that makes sense. Here are some examples of normal transforms of the same image if we were looking at dog breeds. If any of these individually came into the dataset, we would think it makes sense. With transforms, we now have 8 images for every 1.

What if in the transformation madness we go too far? We could get the images below that are a little too extreme. We wouldn’t want to use many of these because they are not clear and do not correctly orient in a direction we would expect data to come in. While a dog could be tilted, it would never be upside down.

For the endoscope images, we are not as concerned about it being upside down or over tilted. An endoscope goes all over the place and can have a 360-degree rotation here, so I went wild with rotational transforms. Even a bit with the color as the lighting inside the body would be different. All of these seem to be in the realm of possibility.
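To illustrate why rotations and flips are “free” extra data when orientation carries no meaning, here is a plain-Python sketch that produces the 8 rotation and flip combinations of a tiny grid. It is only an illustration with hypothetical helper names: fastai applies its transforms randomly on real image tensors, and this sketch does not cover arbitrary angles or color changes:

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in image]


def dihedral_variants(image):
    """Return the 8 rotation/flip combinations of a grid."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# A tiny 2x2 "image" with four distinct pixel values.
variants = dihedral_variants([[1, 2], [3, 4]])
```

For a dog photo, half of these variants would be upside down and misleading. For an endoscope image, all of them are views the camera could plausibly produce.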

Example of dyed polyps

(Note: the green box denotes how far the scope traveled. Therefore, this technique might be cutting off the value that it could have provided.)

Reconstructing data and launching

Now we can see how to add transforms and how we would shift other variables for data:

Then we change the learner:

It really is that easy

Then we are ready to fire!

Many epochs later…

Just worry about the number on the right here

93% accurate! Not that bad, let’s look at the confusion matrix again.

It looks like the problem with dyed classification has gone away, but the esophagitis errors remain. In fact, the number of errors gets worse in some of my iterations.

Can this run in production?

Yes, there are instructions to quickly host this information as a web service. As long as the license isn’t up and you don’t mind waiting… you can try it on Render right here!

Conclusion and Follow-up:

As you can see, it is straightforward to transfer the new course from Fast.ai to a different dataset. Much more accessible than ever before.

When going through testing, make sure you start with a fast concept to make sure everything is on the right path, then turn up the power later. Create a positive feedback loop to make sure you are both oriented correctly and as a mechanism to force you to learn more about the dataset. You will have a much richer experience in doing so.

Some observations on this dataset.

I am trying to solve this problem wrong. I am using a single classifier when these slides have multiple classifications. I discovered this later while reading the research paper. Don’t wait until the end to read papers!
As a multi-classification problem, I should be including bounding boxes for essential features.
Classifications can benefit from a feature describing how far the endoscope is in the body. Significant landmarks in the body would help to classify the images. The small green box on the bottom left of the images is a map describing where the endoscope is and might be a useful feature to explore.
If you haven’t seen the new fast.ai course, take a look. It took me more time to write this post than it did to code the program; it was that simple.

Resources

Github Notebook
Kvasir Dataset
KVASIR: A Multi-Class Image Dataset for Computer Aided Gastrointestinal Disease Detection (Pogorelov, Konstantin & Randel, Kristin & Griwodz, Carsten & de Lange, Thomas & Eskeland, Sigrun & Johansen, Dag & Spampinato, Concetto & Dang Nguyen, Duc Tien & Lux, Mathias & Schmidt, Peter & Riegler, Michael & Halvorsen)
FastAI
PyTorch
Youtube video on this topic