image recognition - freeCodeCamp.org

How to Improve the Accuracy of Your Image Recognition Models

Jason — Mon, 29 Nov 2021 17:09:30 +0000

These 7 tricks and tips will take you from 50% to 90% accuracy for your image recognition models in literally minutes.

So, you have gathered a dataset, built a neural network, and trained your model.

But despite the hours (and sometimes days) of work you've invested to create the model, it spits out predictions with an accuracy of 50–70%. Chances are, this is not what you expected.

Here are a few strategies, or hacks, to boost your model’s performance metrics.

1. Get More Data

Deep learning models are only as powerful as the data you bring in. One of the easiest ways to increase validation accuracy is to add more data. This is especially useful if you don’t have many training instances.

If you’re working on image recognition models, you can increase the diversity of your dataset with data augmentation. These techniques range from flipping an image over an axis and adding noise to zooming in on the image. If you are comfortable with generative models, you could also try data augmentation with GANs.
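The sketch below shows the three augmentations mentioned above: a horizontal flip, additive Gaussian noise, and a centre zoom. It uses no framework and assumes a grayscale image stored as a plain Python list of rows, with pixel values from 0 to 255. The function names are hypothetical. In practice you would apply your framework’s own augmentation transforms to arrays or tensors.

```python
import random

# Assumption: a grayscale image is a list of rows of 0-255 intensities.
Image = list[list[int]]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image over its vertical axis (left becomes right)."""
    return [list(reversed(row)) for row in image]


def add_gaussian_noise(image: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel, clipping to [0, 255]."""
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]


def center_zoom(image: Image, factor: int) -> Image:
    """Zoom in by cropping the central 1/factor region, then upscale it back
    to the original size with nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = [row[left:left + cw] for row in image[top:top + ch]]
    return [[crop[y * ch // h][x * cw // w] for x in range(w)] for y in range(h)]


def augment(image: Image, rng: random.Random) -> list[Image]:
    """Return the original image plus three augmented variants."""
    return [
        image,
        flip_horizontal(image),
        add_gaussian_noise(image, sigma=10.0, rng=rng),
        center_zoom(image, factor=2),
    ]
```

Each variant keeps the original image’s label, so one labelled photo becomes four training examples.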

2. Add More Layers

Adding more layers increases your model’s capacity to learn your dataset’s features in depth, so it can recognize subtle differences that you, as a human, might not pick up on.

Whether this hack helps depends entirely on the nature of the task you are trying to solve.

For complex tasks, such as distinguishing between breeds of cats and dogs, adding more layers makes sense because your model can learn the subtle features that differentiate a poodle from a Shih Tzu.

For simple tasks, such as classifying cats and dogs, a simple model with few layers will do.

More layers -> More nuanced model.

Photo by [Alvan Nee](https://unsplash.com/@alvannee?utm_source=medium&utm_medium=referral) on Unsplash

3. Change Your Image Size

When you preprocess your images for training and evaluation, there is a lot of experimentation you can do with regard to image size.

If you choose an image size that is too small, your model will not be able to pick up on the distinctive features that help with image recognition.

Conversely, if your images are too big, they require more computational resources (memory and training time), and your model might not be sophisticated enough to make use of the extra detail.

Common image sizes include 64x64, 128x128, 28x28 (MNIST), and 224x224 (VGG-16).

Keep in mind that many preprocessing routines ignore the image’s aspect ratio, so resized images can look squashed or stretched along one axis.

Shrinking a high-resolution image to a small size, such as 28x28, usually produces heavy pixelation, which tends to hurt your model’s performance. [Source](https://dribbble.com/shots/4829233-Pixelated-Mona-Lisa)
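Letterboxing avoids the squashing described above. You scale the image so its longer side fits the target size, then pad the shorter side. The sketch below only computes the new dimensions and the padding. The function name and the centred-padding choice are illustrative assumptions, and your image library would do the actual resampling.

```python
def letterbox_size(width: int, height: int, target: int) -> tuple[int, int, int, int]:
    """Fit a width x height image into a target x target square without distortion.

    Returns (new_width, new_height, pad_x, pad_y): resize the image to
    new_width x new_height, then pad pad_x pixels on the left and right and
    pad_y pixels on the top and bottom (an odd leftover pixel goes on the
    right/bottom).
    """
    scale = target / max(width, height)  # shrink the longer side to fit exactly
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return new_w, new_h, pad_x, pad_y
```

For example, a naive resize would stretch a 640x480 photo to 224x224 and change its 4:3 aspect ratio to 1:1. Letterboxing instead scales the photo to 224x168 and adds 28 pixels of padding above and below.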

4. Increase Epochs

The number of epochs is how many times you pass the entire dataset through the neural network. Increase it incrementally, for example in steps of 25 or 100, and compare the results.

More epochs pay off mainly when you have a lot of data. With a small dataset, extra epochs tend to make the model memorize the training set (overfit). Either way, your model will eventually reach a point where more epochs no longer improve accuracy.
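Early stopping is a common way to find the point where more epochs stop helping. You track validation accuracy after every epoch and stop once it has failed to improve for a few epochs in a row. The sketch below takes a history of validation accuracies and returns the epoch whose weights you would keep. It is framework-agnostic, and `patience` and `min_delta` are illustrative parameters. Keras offers the same idea as the `tf.keras.callbacks.EarlyStopping` callback.

```python
def best_epoch(val_accuracies: list[float], patience: int = 3,
               min_delta: float = 0.0) -> int:
    """Return the 1-based epoch whose weights to keep (0 if the history is empty).

    Training counts as stalled once `patience` consecutive epochs fail to beat
    the best accuracy seen so far by more than `min_delta`.
    """
    best_acc = float("-inf")
    best_at = 0
    stale = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc + min_delta:
            best_acc, best_at, stale = acc, epoch, 0  # new best: reset the counter
        else:
            stale += 1
            if stale >= patience:
                break  # more epochs are no longer helping
    return best_at
```

With the history `[0.52, 0.61, 0.68, 0.71, 0.70, 0.71, 0.69, 0.72]` and `patience=3`, training stops after epoch 7 and keeps epoch 4. A patience of 5 would have let it reach the small gain at epoch 8.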

At this point, consider experimenting with your model’s learning rate. This small hyperparameter can determine whether your model reaches the global minimum of its loss function (the ultimate goal for neural nets) or gets stuck in a local minimum.

The global minimum is the ultimate goal for neural networks. [Source](https://www.dna-ghost.com/single-post/2018/03/13/Neural-network-Escaping-from-variety-of-non-global-minimum-traps)
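The following toy example shows why the learning rate matters. It is not a neural network: it runs plain gradient descent on f(x) = x², whose gradient is 2x. This function has a single minimum at x = 0, so the sketch only shows how the learning rate decides between slow progress, convergence and divergence. It does not reproduce the local-versus-global-minimum behaviour of real loss surfaces.

```python
def gradient_descent(x0: float, learning_rate: float, steps: int) -> float:
    """Minimise f(x) = x**2 with plain gradient descent and return the final x."""
    x = x0
    for _ in range(steps):
        gradient = 2 * x               # derivative of x**2
        x -= learning_rate * gradient  # step against the gradient
    return x
```

Start from x = 5 and run 50 steps. A learning rate of 0.1 ends very close to 0, and a rate of 0.001 has barely moved. A rate of 1.1 overshoots further on every step and diverges.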

5. Decrease Colour Channels

Colour channels make up one dimension of your image arrays. Most colour (RGB) images are composed of three colour channels, while grayscale images have just one channel.

The more colour channels an image has, the more data each sample carries, and the longer it will take to train the model.

If colour is not a significant cue for your task, you can convert your colour images to grayscale.

You can even consider other colour spaces, like HSV and Lab.
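Converting an image to grayscale collapses its three channels into one. The sketch below uses only the standard library and assumes 8-bit RGB pixels stored as tuples. It applies the ITU-R BT.601 luma weights, which are the weights Pillow documents for conversion to mode "L". It also uses Python’s `colorsys.rgb_to_hsv` for the HSV colour space. The helper names are illustrative.

```python
import colorsys

RGB = tuple[int, int, int]  # assumption: 8-bit channels in the range 0-255


def to_grayscale(pixel: RGB) -> int:
    """Collapse one RGB pixel to a single luma value (ITU-R BT.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_grayscale(image: list[list[RGB]]) -> list[list[int]]:
    """Turn an H x W x 3 image into an H x W single-channel image."""
    return [[to_grayscale(p) for p in row] for row in image]


def to_hsv(pixel: RGB) -> tuple[float, float, float]:
    """Convert one RGB pixel to (hue, saturation, value), each in [0, 1]."""
    r, g, b = (c / 255 for c in pixel)
    return colorsys.rgb_to_hsv(r, g, b)
```

Green carries the largest weight because the human eye is most sensitive to it. As a result, pure green maps to a much brighter gray than pure blue.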

RGB images are composed of three colour channels: red, green, and blue. [Source](https://www.youtube.com/watch?v=ZqUotba3V5Y)

6. Transfer Learning

Transfer learning uses a pre-trained model, such as ResNet or YOLO, as the starting point for a new computer vision task. The same approach is common in natural language processing, where pre-trained language models play this role.

Pre-trained models are state-of-the-art deep learning models that were trained on millions of samples, sometimes for weeks or months. They have a remarkable ability to detect nuances in different images.

These models can be used as a base for your model. Many are so good that you won’t need to add your own convolutional and pooling layers; replacing the final classification layer is often enough.


Transfer learning can greatly improve your model’s accuracy from ~50% to 90%! Source: [Nvidia blog](https://www.nvidia.com/content/dam/en-zz/en_sg/ai-innovation-day-2019/assets/pdf/9_NVIDIA-Transfer-Learning-Toolkit-for-Intelligent-Video-Analytics.pdf)

Final Thoughts

The hacks above offer a base for optimizing a model. To really fine-tune a model, you’ll need to tune its various hyperparameters and functions, such as the learning rate (as discussed above), activation functions, loss functions, and so on.

This hack comes with an “I hope you know what you’re doing” warning, because there are more ways to mess up your model.

Always Save Your Models

Save your model every time you make a change to it. That way, you can go back to a previous configuration if it turns out to be more accurate.

Most deep learning frameworks, like TensorFlow and PyTorch, have a “save model” method.

```python
# In TensorFlow (Keras)
model.save('model.h5')  # Saves the entire model to a single artifact

# In PyTorch (model and PATH are your own model object and file path)
torch.save(model, PATH)
```
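The calls above save a single model. To return to whichever configuration scored best, you also need to record each version’s settings and validation accuracy. The sketch below does this without any framework, using `pickle` and `json`. The `BestModelKeeper` class is hypothetical, and it assumes your model object can be pickled. For real TensorFlow or PyTorch models, you would replace the pickle calls with the framework’s own save and load functions.

```python
import json
import pickle
from pathlib import Path
from typing import Optional


class BestModelKeeper:
    """Save every experiment and remember which one scored best.

    Each call to `save` writes the model with pickle, plus a JSON record of
    its configuration and validation accuracy next to it.
    """

    def __init__(self, directory: Path) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.best_accuracy = float("-inf")
        self.best_path: Optional[Path] = None

    def save(self, model: object, config: dict, val_accuracy: float) -> Path:
        run_id = len(list(self.directory.glob("run_*.pkl")))
        path = self.directory / f"run_{run_id:03d}.pkl"
        with path.open("wb") as f:
            pickle.dump(model, f)
        record = {"config": config, "val_accuracy": val_accuracy}
        path.with_suffix(".json").write_text(json.dumps(record))
        if val_accuracy > self.best_accuracy:  # remember the most accurate run
            self.best_accuracy, self.best_path = val_accuracy, path
        return path

    def load_best(self) -> object:
        if self.best_path is None:
            raise FileNotFoundError("no model has been saved yet")
        with self.best_path.open("rb") as f:
            return pickle.load(f)
```

Only unpickle files you created yourself, because loading a pickle can execute arbitrary code.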

There are countless other ways to further optimize your deep learning models, but the hacks described above are a good base for the optimization part of deep learning.

Tweet at me letting me know what your favourite hack is!

How to use the Google Cloud Vision API and ClickSend to keep tabs on your pets

freeCodeCamp — Fri, 06 Jul 2018 16:17:07 +0000

By Namratha Subramanya

Just like people, dogs are scared by all kinds of things. Most often, it’s a result of having a negative experience or not being handled when their natural fears surface. In this article, we’ll create a way to make sure your dog is safe when you are away.

You can attach a camera to your dog’s collar to capture images, and use the Cloud Vision API to recognize what is in them.

Let’s say your dog is scared of cats, and you want to make sure your little furry friend is safe from cats while playing in the backyard in your absence. You could build an application that sends SMS alerts to your phone whenever the Cloud Vision API recognizes a cat.

In this tutorial, you’ll learn how to recognize an image using Google Cloud Vision API and alert the user with an SMS using ClickSend API. PubNub forms the skeleton of the application and interconnects the features.

The full project GitHub repo is available here.

Let’s Get Building

Assume your laptop’s webcam is the camera fixed to your dog’s collar. The capture code opens your webcam and takes pictures for you, and you can set a time interval so it captures images regularly. The images are drawn into a canvas element and can then be saved to your device.

Cloud Vision API

The Google Cloud Vision API enables developers to understand the content of an image through its powerful machine learning models. To get started with the Vision API, you need to create a new project here. Before you create the project, you need to set up your billing account. After that, enable the Vision API.

For more details, see this quickstart link.

Run the following command in your terminal:

```shell
pip install --upgrade google-cloud-vision
```

To run the client library, you must first set up authentication by creating a service account here and setting an environment variable.

From the Service account drop-down list, select New service account.

Enter a name into the Service account name field.

Do not select a value from the Role drop-down list. No role is required to access this service.

Click Create. A note appears, warning that this service account has no role.

Click Create without role. A JSON file that contains your key will download to your computer.

Now set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the file path of the JSON file that contains your service account key. This can be done as follows:

For Linux/macOS:

```shell
export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
```

For Windows (Command Prompt):

```bat
set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
```

Now you are ready to run the code that recognizes your images. The Python code reads the snapshots from the directory where you saved them (mine is Downloads) and responds with labels.

The result of the image recognition is sent to the user through PubNub Real-time Messaging. You just need to subscribe your device to a channel, say alert_notify, to which your Python script publishes the Vision API’s results.

Web Notification Alert using PubNub

You’ll now have to initialize your PubNub keys. Sign up for a PubNub account and create a project in the Admin Dashboard.

Now you can publish the alert message from your Python code and deliver it to your device as a web push notification. The device, in turn, subscribes to the alert_notify channel and receives the alert message from your camera.
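Here is a minimal sketch of the publishing side. It assumes a recent release of the `pubnub` Python SDK; older releases set `uuid` instead of `user_id`. `build_alert` creates the message published on the alert_notify channel. Its field names and the client ID are hypothetical, so match them to whatever your web page expects.

```python
import time


def build_alert(labels: list[tuple[str, float]], image_name: str) -> dict:
    """Build a JSON-serialisable alert payload (field names are illustrative)."""
    return {
        "title": "Cat spotted!",
        "image": image_name,
        "labels": [{"description": d, "score": round(s, 3)} for d, s in labels],
        "timestamp": int(time.time()),
    }


def publish_alert(message: dict, publish_key: str, subscribe_key: str) -> None:
    """Publish `message` to the alert_notify channel (needs `pip install pubnub`)."""
    from pubnub.pnconfiguration import PNConfiguration
    from pubnub.pubnub import PubNub

    config = PNConfiguration()
    config.publish_key = publish_key
    config.subscribe_key = subscribe_key
    config.user_id = "dog-collar-camera"  # hypothetical client identifier
    PubNub(config).publish().channel("alert_notify").message(message).sync()
```

Because building the payload is kept separate from the network call, you can check the message format without PubNub keys.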

You can design the web push notification using the Notification API in HTML5.

ClickSend API

The ClickSend API lets developers integrate SMS, voice, fax, post, and email into their applications. You can send an SMS to your mobile device alongside the PubNub web push notifications. The ClickSend API is well documented for developers.

You can use ClickSend’s HTTP API so that every time the Vision API recognizes a cat, an SMS is sent to your device.
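Here is a minimal sketch of sending the SMS with only the standard library. It assumes ClickSend’s REST v3 endpoint (`https://rest.clicksend.com/v3/sms/send`), HTTP Basic authentication with your ClickSend username and API key, and a JSON body containing a `messages` list. Check ClickSend’s documentation for the current endpoint and fields before relying on it. The function names are hypothetical. `build_sms_request` only builds the request, so you can inspect it without sending anything.

```python
import base64
import json
import urllib.request

CLICKSEND_SMS_URL = "https://rest.clicksend.com/v3/sms/send"  # assumed REST v3 endpoint


def build_sms_request(username: str, api_key: str, to: str, body: str) -> urllib.request.Request:
    """Build (but do not send) an SMS request authenticated with HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{api_key}".encode()).decode()
    payload = {"messages": [{"to": to, "body": body}]}
    return urllib.request.Request(
        CLICKSEND_SMS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send_sms(request: urllib.request.Request) -> dict:
    """Send the request and return the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

You would call `send_sms(build_sms_request(...))` in the same branch of your script that publishes the PubNub alert.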

Congrats!

Now that you have set up the Cloud Vision API and the ClickSend API to communicate with each other through PubNub’s publish-subscribe messaging, you will receive web notifications and SMS alerts on your device every time your camera captures an image of a cat. This is a good starting point for building applications that combine different APIs and connect them through PubNub.

_Originally published at www.pubnub.com._

How to build an image recognition iOS app with Apple’s CoreML and Vision APIs

freeCodeCamp — Fri, 01 Sep 2017 18:54:19 +0000

By Mark Mansur

With the release of CoreML and new Vision APIs at this year’s Apple Worldwide Developers Conference (WWDC), machine learning has never been easier to get into. Today I’m going to show you how to build a simple image recognition app.

We will learn how to gain access to the iPhone’s camera and how to pass what the camera is seeing into a machine learning model for analysis. We’ll do all this programmatically, without the use of storyboards! Crazy, I know.

Here is a look at what we are going to accomplish today:

```swift
//
//  ViewController.swift
//  cameraTest
//
//  Created by Mark Mansur on 2017-08-01.
//  Copyright © 2017 Mark Mansur. All rights reserved.
//

import UIKit
import AVFoundation
import Vision

class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {

    let label: UILabel = {
        let label = UILabel()
        label.textColor = .white
        label.translatesAutoresizingMaskIntoConstraints = false
        label.text = "Label"
        label.font = label.font.withSize(30)
        return label
    }()

    override func viewDidLoad() {
        super.viewDidLoad()
        setupCaptureSession()
        view.addSubview(label)
        setupLabel()
    }

    func setupCaptureSession() {
        let captureSession = AVCaptureSession()

        // search for available capture devices
        let availableDevices = AVCaptureDevice.DiscoverySession(
            deviceTypes: [.builtInWideAngleCamera],
            mediaType: AVMediaType.video,
            position: .back
        ).devices

        // set up capture device, add input to our capture session
        do {
            if let captureDevice = availableDevices.first {
                let captureDeviceInput = try AVCaptureDeviceInput(device: captureDevice)
                captureSession.addInput(captureDeviceInput)
            }
        } catch {
            print(error.localizedDescription)
        }

        // set up output, add output to our capture session
        let captureOutput = AVCaptureVideoDataOutput()
        captureOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
        captureSession.addOutput(captureOutput)

        let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
        previewLayer.frame = view.frame
        view.layer.addSublayer(previewLayer)

        captureSession.startRunning()
    }

    // called every time a frame is captured
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard let model = try? VNCoreMLModel(for: Resnet50().model) else { return }
        let request = VNCoreMLRequest(model: model) { (finishedRequest, error) in
            guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }
            guard let observation = results.first else { return }
            DispatchQueue.main.async {
                self.label.text = "\(observation.identifier)"
            }
        }
        guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // executes request
        try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    }

    func setupLabel() {
        label.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
        label.bottomAnchor.constraint(equalTo: view.bottomAnchor, constant: -50).isActive = true
    }
}
```

🙌🏻 Step 1: Create a new project.

Fire up Xcode and create a new single view application. Give it a name, perhaps “ImageRecognition.” Choose Swift as the main language and save your new project.

👋 Step 2: Say goodbye to the storyboard.

For this tutorial, we are going to do everything programmatically, without the need for the storyboard. Maybe I’ll explain why in another article.

Delete main.storyboard.

Open your target’s General settings and scroll down to Deployment Info. We need to tell Xcode we are no longer using the storyboard.

Clear the Main Interface field.

Without the storyboard we need to manually create the app window and root view controller.

Add the following to the application(_:didFinishLaunchingWithOptions:) function in AppDelegate.swift:

```swift
// Swift 4.2+ spelling of the key type; Swift 4 used UIApplicationLaunchOptionsKey
func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
    // Override point for customization after application launch.
    window = UIWindow()
    window?.makeKeyAndVisible()
    let vc = ViewController()
    window?.rootViewController = vc
    return true
}
```

We manually create the app window with UIWindow(), create our view controller, and tell the window to use it as its root view controller.

The app should now build and run without the storyboard 😎

⚙️ Step 3: Set up AVCaptureSession.

Before we start, import UIKit, AVFoundation, and Vision at the top of ViewController.swift. The AVCaptureSession object handles capture activity and manages the flow of data between input devices (such as the rear camera) and outputs.

We are going to start by creating a function to set up our capture session.

Create setupCaptureSession() inside ViewController.swift and instantiate a new AVCaptureSession.

```swift
func setupCaptureSession() {
    // creates a new capture session
    let captureSession = AVCaptureSession()
}
```

Don’t forget to call this new function from viewDidLoad().

```swift
override func viewDidLoad() {
    super.viewDidLoad()
    setupCaptureSession()
}
```

Next, we are going to need a reference to the rear camera. We can use a DiscoverySession to query available capture devices based on our search criteria.

Add the following code:

```swift
// search for available capture devices
let availableDevices = AVCaptureDevice.DiscoverySession(
    deviceTypes: [.builtInWideAngleCamera],
    mediaType: AVMediaType.video,
    position: .back
).devices
```

availableDevices now contains a list of the devices matching our search criteria.

We now need to gain access to our captureDevice and add it as an input to our captureSession.

Add an input to the capture session.

```swift
// get capture device, add device input to capture session
do {
    if let captureDevice = availableDevices.first {
        captureSession.addInput(try AVCaptureDeviceInput(device: captureDevice))
    }
} catch {
    print(error.localizedDescription)
}
```

Because we searched only for back-facing wide-angle cameras, the first available device is the rear camera. We create a new AVCaptureDeviceInput from this capture device and add it to the capture session.

Now that our input is set up, we can start outputting what the camera is capturing.

Add a video output to our capture session.

```swift
// set up output, add output to our capture session
let captureOutput = AVCaptureVideoDataOutput()
captureSession.addOutput(captureOutput)
```

AVCaptureVideoDataOutput is an output that captures video. It also provides us access to the frames being captured for processing with a delegate method we will see later.

Next, we need to add the capture session’s output as a sublayer to our view.

Add the capture session output as a sublayer to the view controller’s view.

```swift
let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
previewLayer.frame = view.frame
view.layer.addSublayer(previewLayer)
captureSession.startRunning()
```

We create a layer based on our capture session and add this layer as a sublayer to our view. captureSession.startRunning() starts the flow from the inputs to the outputs that we connected earlier.

📷 Step 4: Permission to use the camera? Permission granted.

Nearly everyone has opened an app for the first time and been prompted to allow it to use the camera. Starting in iOS 10, our app will crash if it tries to access the camera without a camera usage description in its Info.plist.

Open Info.plist and add a new key named NSCameraUsageDescription (shown in Xcode as “Privacy - Camera Usage Description”). In the value column, briefly explain to the user why your app needs camera access.

Now, when the user launches the app for the first time they will be prompted to allow access to the camera.

📊 Step 5: Getting the model.

The heart of this project is most likely the machine learning model. The model must be able to take in an image and give us back a prediction of what the image is. You can find free trained models here. The one I chose is ResNet50.

Once you obtain your model, drag and drop it into Xcode. It will automatically generate the necessary classes, providing you an interface to interact with your model.

🏞 Step 6: Image analysis.

To analyze what the camera is seeing, we need to somehow gain access to the frames being captured by the camera.

Conforming to the AVCaptureVideoDataOutputSampleBufferDelegate protocol lets us be notified every time the camera captures a frame.

Conform ViewController to the AVCaptureVideoDataOutputSampleBufferDelegate.

We need to tell our video output that ViewController is its sample buffer delegate.

Add the following line in setupCaptureSession():

```swift
captureOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
```

Add the following function:

```swift
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let model = try? VNCoreMLModel(for: Resnet50().model) else { return }
    let request = VNCoreMLRequest(model: model) { (finishedRequest, error) in
        guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }
        guard let observation = results.first else { return }
        DispatchQueue.main.async {
            self.label.text = "\(observation.identifier)"
        }
    }
    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // executes request
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}
```

Each time a frame is captured, the delegate is notified by calling captureOutput(). This is a perfect place to do our image analysis with CoreML.

First, we create a VNCoreMLModel, which wraps a CoreML model for use with the Vision framework. We create it from the Resnet50 model.

Next, we create our Vision request. In its completion handler, we update the onscreen UILabel with the identifier of the first result returned by the model. We then extract a CVPixelBuffer from the CMSampleBuffer passed to us, because that is the format our request handler takes for analysis.

Lastly, we perform the Vision request with a VNImageRequestHandler.

🗒 Step 7: Create a label.

The last step is to create a UILabel containing the model’s prediction.

Create a new UILabel and position it using constraints.

```swift
let label: UILabel = {
    let label = UILabel()
    label.textColor = .white
    label.translatesAutoresizingMaskIntoConstraints = false
    label.text = "Label"
    label.font = label.font.withSize(30)
    return label
}()

func setupLabel() {
    label.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
    label.bottomAnchor.constraint(equalTo: view.bottomAnchor, constant: -50).isActive = true
}
```

Don’t forget to add the label as a subview and call setupLabel() from within viewDidLoad().

```swift
view.addSubview(label)
setupLabel()
```

You can download the completed project from GitHub here.

Like what you see? Give this post a thumbs up 👍, follow me on Twitter, GitHub, or check out my personal page.