<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ image recognition - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ image recognition - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 11 Jun 2026 05:19:34 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/image-recognition/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Improve the Accuracy of Your Image Recognition Models ]]>
                </title>
                <description>
                    <![CDATA[ These 7 tricks and tips will take you from 50% to 90% accuracy for your image recognition models in literally minutes. So, you have gathered a dataset, built a neural network, and trained your model. But despite the hours (and sometimes days) of work... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/improve-image-recognition-model-accuracy-with-these-hacks/</link>
                <guid isPermaLink="false">66d45f379208fb118cc6cfc9</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image recognition ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Jason ]]>
                </dc:creator>
                <pubDate>Mon, 29 Nov 2021 17:09:30 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/11/image-recognition-model-image.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>These 7 tricks and tips will take you from 50% to 90% accuracy for your image recognition models in literally minutes.</p>
<p>So, you have gathered a dataset, built a neural network, and trained your model.</p>
<p>But despite the hours (and sometimes days) of work you've invested to create the model, it spits out predictions with an accuracy of 50–70%. Chances are, this is not what you expected.</p>
<p>Here are a few strategies, or hacks, to boost your model’s performance metrics.</p>
<h2 id="heading-1-get-more-data">1. Get More Data</h2>
<p>Deep learning models are only as powerful as the data you bring in. One of the easiest ways to increase validation accuracy is to add more data. This is especially useful if you don’t have many training instances.</p>
<p>If you’re working on image recognition models, you may consider increasing the diversity of your available dataset by employing data augmentation. These techniques include anything from flipping an image over an axis and adding noise to zooming in on the image. If you are a strong machine learning engineer, you could also try data augmentation with GANs.</p>
<p>Read more about <a target="_blank" href="https://bair.berkeley.edu/blog/2019/06/07/data_aug/">data augmentation here</a>.</p>
<p>Keras has an amazing image preprocessing class to perform data augmentation: <a target="_blank" href="https://keras.io/api/preprocessing/image/#imagedatagenerator-class">ImageDataGenerator</a>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-119.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Make sure the augmentation technique you use does not change the class of an image. For example, a 3 flipped over the y-axis no longer looks like a valid digit! <a target="_blank" href="https://bair.berkeley.edu/blog/2019/06/07/data_aug/">Source</a></em></p>
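<p>To make the idea concrete, here is a minimal sketch of two augmentations (a horizontal flip and random noise) written with plain nested lists so it runs without any libraries. The helper names and the <code>flip_safe_labels</code> parameter are made up for illustration; in a real project you would more likely reach for Keras or a similar library.</p>

```python
import random


def flip_horizontal(image):
    """Mirror a 2D image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def add_noise(image, amount=10, seed=None):
    """Add uniform random noise to each pixel, clamped to the 0-255 range."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, pixel + rng.randint(-amount, amount))) for pixel in row]
        for row in image
    ]


def augment(image, label, flip_safe_labels, seed=None):
    """Return the original plus augmented copies that keep the label valid."""
    copies = [image, add_noise(image, seed=seed)]
    # Only flip classes whose meaning survives mirroring (a cat, not a digit).
    if label in flip_safe_labels:
        copies.append(flip_horizontal(image))
    return copies


image = [[0, 50, 100], [150, 200, 250]]
print(flip_horizontal(image))  # [[100, 50, 0], [250, 200, 150]]
print(len(augment(image, "cat", flip_safe_labels={"cat", "dog"})))  # 3
print(len(augment(image, "3", flip_safe_labels={"cat", "dog"})))    # 2
```

<p>The digit image gets one fewer copy because flipping it would produce something that no longer belongs to its class.</p>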
<h2 id="heading-2-add-more-layers">2. Add More Layers</h2>
<p>Adding more layers to your model increases its ability to learn your dataset’s features more deeply. This means that it will be able to recognize subtle differences that you, as a human, might not have picked up on.</p>
<p>This hack entirely relies on the nature of the task you are trying to solve.</p>
<p>For complex tasks, such as differentiating between the breeds of cats and dogs, adding more layers makes sense because your model will be able to learn the subtle features that differentiate a poodle from a Shih Tzu.</p>
<p>For simple tasks, such as classifying cats and dogs, a simple model with few layers will do.</p>
<p>More layers -&gt; More nuanced model.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-120.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Photo by <a target="_blank" href="https://unsplash.com/@alvannee?utm_source=medium&amp;utm_medium=referral">Alvan Nee</a> on <a target="_blank" href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></em></p>
<h2 id="heading-3-change-your-image-size">3. Change Your Image Size</h2>
<p>When you preprocess your images for training and evaluation, there is a lot of experimentation you can do with regards to the image size.</p>
<p>If you choose an image size that is too small, your model will not be able to pick up on the distinctive features that help with image recognition.</p>
<p>Conversely, if your images are too big, it increases the computational resources required by your computer and/or your model might not be sophisticated enough to process them.</p>
<p>Common image sizes include 64x64, 128x128, 28x28 (MNIST), and 224x224 (VGG-16).</p>
<p>Keep in mind that most preprocessing algorithms do not consider the aspect ratio of the image, so smaller-sized images might appear to have shrunk over a certain axis.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-121.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Converting an image from a large resolution to a small size, like 28x28, usually ends up with a lot of pixelation that tends to have negative effects on your model’s performance. <a target="_blank" href="https://dribbble.com/shots/4829233-Pixelated-Mona-Lisa">Source</a></em></p>
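<p>One way to avoid the squashed look is to scale the longer side to the target size and pad the rest. The sketch below only does the arithmetic; <code>letterbox_size</code> is a hypothetical helper, not part of any library, and you would still need an image library to do the actual resizing and padding.</p>

```python
def letterbox_size(width, height, target):
    """Fit (width, height) inside a target x target square, keeping aspect ratio.

    Returns the resized (new_width, new_height) and the total (pad_x, pad_y)
    padding needed along each axis to fill the square.
    """
    scale = target / max(width, height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return (new_width, new_height), (target - new_width, target - new_height)


# A 1920x1080 frame scaled for a 224x224 network input.
print(letterbox_size(1920, 1080, 224))  # ((224, 126), (0, 98))
```

<p>Here the 98 pixels of vertical padding would typically be split between the top and the bottom of the image.</p>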
<h2 id="heading-4-increase-epochs">4. Increase Epochs</h2>
<p><em>Epochs</em> are basically how many times you pass the entire dataset through the neural network. Train your model incrementally, adding epochs in steps of +25, +100, and so on.</p>
<p>Increasing epochs makes sense only if you have a lot of data in your dataset. However, your model will eventually reach a point where increasing epochs will not improve accuracy.</p>
<p>At this point, you should consider playing around with your model’s learning rate. This little hyperparameter dictates whether your model reaches its global minimum (the ultimate goal for neural nets) or gets stuck in a local minimum.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-122.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Global Minimum is the ultimate goal for neural networks. <a target="_blank" href="https://www.dna-ghost.com/single-post/2018/03/13/Neural-network-Escaping-from-variety-of-non-global-minimum-traps">Source</a></em></p>
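<p>To spot the point where extra epochs stop paying off, you can watch the validation accuracy and stop once it has not improved for a few epochs. This is usually called early stopping, a technique the article does not name. The sketch below is a rough illustration with a hypothetical function name and made-up accuracy values.</p>

```python
def early_stopping_point(val_accuracies, patience=3, min_delta=0.001):
    """Return (stop_epoch, best_epoch) as 1-based epoch numbers.

    Training stops once validation accuracy has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_acc = float("-inf")
    best_at = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc + min_delta:
            best_acc, best_at = acc, epoch
        elif epoch - best_at >= patience:
            return epoch, best_at
    # Never plateaued: we ran out of recorded epochs first.
    return len(val_accuracies), best_at


history = [0.52, 0.61, 0.68, 0.71, 0.72, 0.715, 0.718, 0.719, 0.71]
print(early_stopping_point(history))  # (8, 5)
```

<p>With these numbers, accuracy peaks at epoch 5, so training would stop at epoch 8 and you would keep the weights saved at epoch 5.</p>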
<h2 id="heading-5-decrease-colour-channels">5. Decrease Colour Channels</h2>
<p>Colour channels reflect the dimensionality of your image arrays. Most colour (RGB) images are composed of three colour channels, while grayscale images have just one channel.</p>
<p>The more complex the colour channels are, the more complex the dataset is and the longer it will take to train the model.</p>
<p>If colour is not such a significant factor in your model, you can go ahead and convert your colour images to grayscale.</p>
<p>You can even consider other colour spaces, like HSV and L*a*b*.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-123.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>RGB images are composed of three colour channels: red, green, and blue. <a target="_blank" href="https://www.youtube.com/watch?v=ZqUotba3V5Y">Source</a></em></p>
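<p>As a rough illustration, converting RGB to grayscale collapses three values per pixel into one. The sketch below uses the common luminance weights (0.299, 0.587, 0.114) on a tiny image stored as nested lists. The function name is made up, and image libraries offer the same conversion out of the box.</p>

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of single luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


# A 2x2 image: red, green, blue, and white pixels.
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(rgb)
print(gray)  # [[76, 150], [29, 255]]
```

<p>The grayscale version holds one number per pixel instead of three, which is where the savings in training time come from.</p>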
<h2 id="heading-6-transfer-learning">6. Transfer Learning</h2>
<p>Transfer learning involves using a pre-trained model, such as YOLO or ResNet, as the starting point for your own task. It is a common approach in both computer vision and natural language processing.</p>
<p>Pre-trained models are state-of-the-art deep learning models that were trained on millions and millions of samples, and often for months. These models have an astonishingly huge capability of detecting nuances in different images.</p>
<p>These models can be used as a base for your model. Most models are so good that you won’t need to add convolutional and pooling layers.</p>
<p>Read more about <a target="_blank" href="https://machinelearningmastery.com/transfer-learning-for-deep-learning/">using transfer learning</a>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/11/image-124.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Transfer learning can greatly improve your model’s accuracy from ~50% to 90%! Source: <a target="_blank" href="https://www.nvidia.com/content/dam/en-zz/en_sg/ai-innovation-day-2019/assets/pdf/9_NVIDIA-Transfer-Learning-Toolkit-for-Intelligent-Video-Analytics.pdf">Nvidia blog</a></em></p>
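<p>Here is a minimal sketch of what this can look like in Keras, assuming TensorFlow 2.x is installed. It uses ResNet50 with ImageNet weights as a frozen base and adds a single dense layer on top. The function name, the default input size, and the choice of optimizer and loss are all illustrative assumptions.</p>

```python
def build_transfer_model(num_classes, input_shape=(224, 224, 3)):
    """Build a classifier on top of a frozen, pre-trained ResNet50 base."""
    # Imported lazily so this function can be defined without TensorFlow installed.
    import tensorflow as tf

    base = tf.keras.applications.ResNet50(
        weights="imagenet",   # downloads the pre-trained weights on first use
        include_top=False,    # drop the original 1000-class ImageNet head
        pooling="avg",        # base outputs one feature vector per image
        input_shape=input_shape,
    )
    base.trainable = False    # keep the pre-trained features fixed

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )
    return model
```

<p>You would call <code>build_transfer_model(num_classes=2)</code> and then <code>fit</code> as usual. Remember to run your images through <code>tf.keras.applications.resnet50.preprocess_input</code> first, since the base expects inputs prepared the same way as during its original training.</p>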
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>The hacks above offer a base for you to optimize a model. To really fine-tune a model, you’ll need to consider tuning the various hyperparameters and functions involved in your model, such as the learning rate (as discussed above), activation functions, loss functions, and so on.</p>
<p>This hack comes as an “I hope you know what you’re doing” warning because there is a wider scope to mess up your model.</p>
<h3 id="heading-always-save-your-models">Always Save Your Models</h3>
<p>Save your model every time you make a change to it. This will let you go back to a previous configuration if it turns out to be more accurate.</p>
<p>Most deep learning frameworks, like TensorFlow and PyTorch, have a “save model” method.</p>
<pre><code class="lang-python"><span class="hljs-comment"># In TensorFlow (Keras)</span>
model.save(<span class="hljs-string">'model.h5'</span>) <span class="hljs-comment"># Saves the entire model to a single artifact</span>

<span class="hljs-comment"># In PyTorch (PATH is a file path of your choice, e.g. 'model.pt')</span>
torch.save(model, PATH)
</code></pre>
<p>There are countless other ways to further optimize your deep learning models, but the hacks described above are a solid starting point.</p>
<p><a target="_blank" href="http://twitter.com/jasmcaus"><em>Tweet at me</em></a> letting me know what your favourite hack is!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to use the Google Cloud Vision API and ClickSend to keep tabs on your pets ]]>
                </title>
                <description>
                    <![CDATA[ By Namratha Subramanya Just like people, dogs are scared by all kinds of things. Most often, it’s a result of having a negative experience or not being handled when their natural fears surface. In this article, we’ll create a way to make sure your do... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-the-google-cloud-vision-api-and-clicksend-to-keep-tabs-on-your-pets-6024b4daac29/</link>
                <guid isPermaLink="false">66c355e739357f944697658c</guid>
                
                    <category>
                        <![CDATA[ Google ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image recognition ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ technology ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 06 Jul 2018 16:17:07 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*4bblfUcScLKK4bWm3_FE0A.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Namratha Subramanya</p>
<p>Just like people, dogs are scared by all kinds of things. Most often, it’s a result of having a negative experience or not being handled when their natural fears surface. In this article, we’ll create a way to make sure your dog is safe when you are away.</p>
<p>You can attach a camera to your dog’s collar to capture images, and use the Cloud Vision API to recognize what is in them.</p>
<p>Let’s say your dog is scared of cats, and you want to make sure your little furry friend is safe from cats while playing in the backyard in your absence. You could build an application that sends SMS alerts to your device whenever the Cloud Vision API recognizes a cat.</p>
<p>In this tutorial, you’ll learn how to recognize an image using the Google Cloud Vision API and alert the user with an SMS using the ClickSend API. PubNub forms the skeleton of the application and interconnects the features.</p>
<p><a target="_blank" href="https://github.com/namrathasubramanya/PubNub-VisionAPI-ClickSend"><strong>The full project GitHub repo is available here.</strong></a></p>
<h3 id="heading-lets-get-building">Let’s Get Building</h3>
<p>Assume your laptop’s webcam is the camera fixed to your dog’s collar. The application opens your webcam and takes pictures for you, and you could set a time interval to capture images frequently. These images go into a canvas element and can be saved on your device. The code for capturing and saving the images is in the project repo linked above.</p>
<h3 id="heading-cloud-vision-api">Cloud Vision API</h3>
<p>The Google Cloud Vision API enables developers to understand the content of an image through its powerful machine learning models. To get started with implementing the Vision API, you need to create a new project <a target="_blank" href="https://console.cloud.google.com/cloud-resource-manager?_ga=2.203919383.-603090119.1528760418">here</a>. Before you create a new project, you need to set up your billing account. After this, you need to enable the Vision API.</p>
<p>For more details, check this quick start <a target="_blank" href="https://cloud.google.com/vision/docs/quickstart">link</a>.</p>
<p>Run the following command in your terminal:</p>
<pre><code>pip install --upgrade google-cloud-vision
</code></pre><p>To run the client library, you must first set up authentication by creating a service account <a target="_blank" href="https://console.cloud.google.com/apis/credentials">here</a> and setting an environment variable.</p>
<ul>
<li>From the <strong>Service account</strong> drop-down list, select <strong>New service account</strong>.</li>
<li>Enter a name into the <strong>Service account name</strong> field.</li>
<li>Do not select a value from the <strong>Role</strong> drop-down list. No role is required to access this service.</li>
<li>Click <strong>Create</strong>. A note appears, warning that this service account has no role.</li>
<li>Click <strong>Create without role</strong>. A JSON file that contains your key will download to your computer.</li>
</ul>
<p>Now set the environment variable <code>GOOGLE_APPLICATION_CREDENTIALS</code> to the file path of the JSON file that contains your service account key. This can be done as follows:</p>
<p>For Linux/Mac OS:</p>
<pre><code><span class="hljs-keyword">export</span> GOOGLE_APPLICATION_CREDENTIALS=<span class="hljs-string">"[PATH]"</span>
</code></pre><p>For Windows:</p>
<pre><code>set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
</code></pre><p>Now you are all set to run the code that recognizes your images. The Python code takes the snapshots from the directory where you have saved them (mine is Downloads) and responds with labels.</p>
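<p>A minimal sketch of that step might look like the following. It assumes a recent <code>google-cloud-vision</code> release (2.x or later, where <code>vision.Image</code> is available) and the credentials set up as described above. The function names, the <code>target</code> label, and the 0.8 score threshold are illustrative choices, not taken from the original project.</p>

```python
def detect_labels(image_path):
    """Return (description, score) pairs for one image via the Cloud Vision API."""
    # Imported lazily; needs `pip install google-cloud-vision` and the
    # GOOGLE_APPLICATION_CREDENTIALS environment variable to be set.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as image_file:
        image = vision.Image(content=image_file.read())
    response = client.label_detection(image=image)
    return [(label.description, label.score) for label in response.label_annotations]


def should_alert(labels, target="cat", min_score=0.8):
    """Decide whether any confident label matches the animal we worry about."""
    wanted = target.lower()
    return any(
        description.lower() == wanted and score >= min_score
        for description, score in labels
    )


# Hand-written sample labels, so this part runs without calling the API.
sample = [("Dog", 0.97), ("Cat", 0.91), ("Grass", 0.85)]
print(should_alert(sample))  # True
```

<p>In the real flow you would pass the output of <code>detect_labels("snapshot.png")</code> (a hypothetical file name) to <code>should_alert</code> and only notify the user when it returns <code>True</code>.</p>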
<p>The result of the image recognition is sent to the user using <a target="_blank" href="https://www.pubnub.com/docs/tutorials/pubnub-publish-subscribe/?utm_source=Syndication&amp;utm_medium=Medium&amp;utm_campaign=SYN-CY18-Q2-Medium-June-28">PubNub Real-time Messaging</a>. You just need to subscribe your device to a channel, say, <code>alert_notify</code>, to which the results of the image recognition are published.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*u-0bRBK1pv8V8YsA.jpg" alt="Image" width="800" height="246" loading="lazy"></p>
<h3 id="heading-web-notification-alert-using-pubnub">Web Notification Alert using PubNub</h3>
<p>You’ll now have to initialize your PubNub keys. <a target="_blank" href="https://www.pubnub.com/docs/tutorials/pubnub-publish-subscribe/?utm_source=Syndication&amp;utm_medium=Medium&amp;utm_campaign=SYN-CY18-Q2-Medium-June-28">Sign up for a PubNub account</a> and create a project in the <a target="_blank" href="https://admin.pubnub.com/#/login/?utm_source=Syndication&amp;utm_medium=Medium&amp;utm_campaign=SYN-CY18-Q2-Medium-June-28">Admin Dashboard</a>.</p>
<p>Now you can publish the alert message from your Python code and deliver it as a web push notification to your device. The device, in turn, subscribes to the <code>alert_notify</code> channel and receives the alert message from your camera.</p>
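<p>The publishing side could be sketched roughly like this with the PubNub Python SDK. The message shape, the function names, and the <code>user_id</code> value are assumptions for illustration, and older SDK versions use <code>config.uuid</code> instead of <code>config.user_id</code>.</p>

```python
def build_alert(labels, target="cat"):
    """Build the JSON-serialisable message to publish on the channel."""
    return {
        "alert": f"A {target} was spotted near your dog!",
        "labels": [description for description, _score in labels],
    }


def publish_alert(message, publish_key, subscribe_key, channel="alert_notify"):
    """Publish one alert message; needs `pip install pubnub` and your own keys."""
    # Imported lazily so build_alert can be used without the SDK installed.
    from pubnub.pnconfiguration import PNConfiguration
    from pubnub.pubnub import PubNub

    config = PNConfiguration()
    config.publish_key = publish_key
    config.subscribe_key = subscribe_key
    config.user_id = "dog-collar-camera"  # hypothetical client identifier
    pubnub = PubNub(config)
    return pubnub.publish().channel(channel).message(message).sync()


message = build_alert([("Cat", 0.91), ("Grass", 0.85)])
print(message["alert"])  # A cat was spotted near your dog!
```

<p>Any device subscribed to <code>alert_notify</code> would then receive this dictionary as the message payload.</p>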
<p>You can design the web push notification using the <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/notification">Notification API</a> in HTML5.</p>
<h3 id="heading-clicksend-api">ClickSend API</h3>
<p><a target="_blank" href="https://developers.clicksend.com/">ClickSend API</a> allows developers to integrate SMS, voice, fax, post, or email into their applications. You could send an SMS to your mobile device along with web push notifications using PubNub. The ClickSend API is well-documented for developers.</p>
<p>You can use ClickSend’s <a target="_blank" href="https://clicksendhttpapiv2.docs.apiary.io/#">HTTP API</a>. Every time the Vision API recognizes a cat in an image, you get an SMS on your device.</p>
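<p>As a rough sketch, the SMS request could be built with nothing but the standard library. Note that this example targets ClickSend's v3 REST endpoint rather than the v2 HTTP API linked above, and the endpoint URL, payload shape, credentials, and phone number are assumptions you should check against the current ClickSend documentation.</p>

```python
import base64
import json
import urllib.request

# Assumed ClickSend v3 REST endpoint for sending SMS messages.
CLICKSEND_SMS_URL = "https://rest.clicksend.com/v3/sms/send"


def build_sms_request(username, api_key, to_number, body):
    """Build (but do not send) an authenticated SMS request."""
    payload = {"messages": [{"source": "python", "to": to_number, "body": body}]}
    # HTTP Basic auth: base64 of "username:api_key".
    token = base64.b64encode(f"{username}:{api_key}".encode()).decode()
    return urllib.request.Request(
        CLICKSEND_SMS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials and number, for illustration only.
request = build_sms_request("user", "key", "+61411111111", "A cat was spotted near your dog!")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

<p>Keeping the request construction separate from the sending makes it easy to inspect what would go over the wire before you spend any SMS credit.</p>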
<h3 id="heading-congrats">Congrats!</h3>
<p>Now that you have set up Cloud Vision API and ClickSend API to communicate with each other through PubNub’s Publish-Subscribe, you will be able to receive web notifications and SMS alerts sent to your device every time your camera captures an image of a cat. Undoubtedly, this is a great starting point for building applications using different APIs and connecting them through PubNub.</p>
<p><em>Originally published at <a target="_blank" href="https://www.pubnub.com/blog/image-recognition-using-vision-api-and/?utm_source=Syndication&amp;utm_medium=Medium&amp;utm_campaign=SYN-CY18-Q2-Medium-June-28">www.pubnub.com</a>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to build an image recognition iOS app with Apple’s CoreML and Vision APIs ]]>
                </title>
                <description>
                    <![CDATA[ By Mark Mansur With the release of CoreML and new Vision APIs at this year’s Apple World Wide Developers Conference, machine learning has never been easier to get into. Today I’m going to show you how to build a simple image recognition app. We will ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/ios-coreml-vision-image-recognition-3619cf319d0b/</link>
                <guid isPermaLink="false">66c35871cf1314a450f0d6dc</guid>
                
                    <category>
                        <![CDATA[ image recognition ]]>
                    </category>
                
                    <category>
                        <![CDATA[ iOS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ mobile app development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 01 Sep 2017 18:54:19 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*vm51CWzLgOE2mTHwWdQENw.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Mark Mansur</p>
<p>With the release of <a target="_blank" href="https://developer.apple.com/documentation/coreml">CoreML</a> and new Vision APIs at this year’s Apple Worldwide Developers Conference, machine learning has never been easier to get into. Today I’m going to show you how to build a simple image recognition app.</p>
<p>We will learn how to gain access to the iPhone’s camera and how to pass what the camera is seeing into a machine learning model for analysis. We’ll do all this programmatically, without the use of storyboards! Crazy, I know.</p>
<p>Here is a look at what we are going to accomplish today:</p>
<pre><code class="lang-swift"><span class="hljs-comment">//</span>
<span class="hljs-comment">//  ViewController.swift</span>
<span class="hljs-comment">//  cameraTest</span>
<span class="hljs-comment">//</span>
<span class="hljs-comment">//  Created by Mark Mansur on 2017-08-01.</span>
<span class="hljs-comment">//  Copyright © 2017 Mark Mansur. All rights reserved.</span>
<span class="hljs-comment">//</span>
<span class="hljs-keyword">import</span> UIKit
<span class="hljs-keyword">import</span> AVFoundation
<span class="hljs-keyword">import</span> Vision

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ViewController</span>: <span class="hljs-title">UIViewController</span>, <span class="hljs-title">AVCaptureVideoDataOutputSampleBufferDelegate</span> </span>{
    <span class="hljs-keyword">let</span> label: <span class="hljs-type">UILabel</span> = {
        <span class="hljs-keyword">let</span> label = <span class="hljs-type">UILabel</span>()
        label.textColor = .white
        label.translatesAutoresizingMaskIntoConstraints = <span class="hljs-literal">false</span>
        label.text = <span class="hljs-string">"Label"</span>
        label.font = label.font.withSize(<span class="hljs-number">30</span>)
        <span class="hljs-keyword">return</span> label
    }()

    <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">viewDidLoad</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">super</span>.viewDidLoad()

        setupCaptureSession()

        view.addSubview(label)
        setupLabel()
    }

    <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">setupCaptureSession</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">let</span> captureSession = <span class="hljs-type">AVCaptureSession</span>()

        <span class="hljs-comment">// search for available capture devices</span>
        <span class="hljs-keyword">let</span> availableDevices = <span class="hljs-type">AVCaptureDevice</span>.<span class="hljs-type">DiscoverySession</span>(deviceTypes: [.builtInWideAngleCamera], mediaType: <span class="hljs-type">AVMediaType</span>.video, position: .back).devices

        <span class="hljs-comment">// setup capture device, add input to our capture session</span>
        <span class="hljs-keyword">do</span> {
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> captureDevice = availableDevices.first {
                <span class="hljs-keyword">let</span> captureDeviceInput = <span class="hljs-keyword">try</span> <span class="hljs-type">AVCaptureDeviceInput</span>(device: captureDevice)
                captureSession.addInput(captureDeviceInput)
            }
        } <span class="hljs-keyword">catch</span> {
            <span class="hljs-built_in">print</span>(error.localizedDescription)
        }

        <span class="hljs-comment">// setup output, add output to our capture session</span>
        <span class="hljs-keyword">let</span> captureOutput = <span class="hljs-type">AVCaptureVideoDataOutput</span>()
        captureOutput.setSampleBufferDelegate(<span class="hljs-keyword">self</span>, queue: <span class="hljs-type">DispatchQueue</span>(label: <span class="hljs-string">"videoQueue"</span>))
        captureSession.addOutput(captureOutput)

        <span class="hljs-keyword">let</span> previewLayer = <span class="hljs-type">AVCaptureVideoPreviewLayer</span>(session: captureSession)
        previewLayer.frame = view.frame
        view.layer.addSublayer(previewLayer)

        captureSession.startRunning()
    }

    <span class="hljs-comment">// called every time a frame is captured</span>
    <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">captureOutput</span><span class="hljs-params">(<span class="hljs-number">_</span> output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection)</span></span> {
        <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> model = <span class="hljs-keyword">try</span>? <span class="hljs-type">VNCoreMLModel</span>(<span class="hljs-keyword">for</span>: <span class="hljs-type">Resnet50</span>().model) <span class="hljs-keyword">else</span> {<span class="hljs-keyword">return</span>}

        <span class="hljs-keyword">let</span> request = <span class="hljs-type">VNCoreMLRequest</span>(model: model) { (finishedRequest, error) <span class="hljs-keyword">in</span>

            <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> results = finishedRequest.results <span class="hljs-keyword">as</span>? [<span class="hljs-type">VNClassificationObservation</span>] <span class="hljs-keyword">else</span> { <span class="hljs-keyword">return</span> }
            <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> observation = results.first <span class="hljs-keyword">else</span> { <span class="hljs-keyword">return</span> }

            <span class="hljs-type">DispatchQueue</span>.main.async(execute: {
                <span class="hljs-keyword">self</span>.label.text = <span class="hljs-string">"\(observation.identifier)"</span>
            })
        }
        <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> pixelBuffer: <span class="hljs-type">CVPixelBuffer</span> = <span class="hljs-type">CMSampleBufferGetImageBuffer</span>(sampleBuffer) <span class="hljs-keyword">else</span> { <span class="hljs-keyword">return</span> }

        <span class="hljs-comment">// executes request</span>
        <span class="hljs-keyword">try</span>? <span class="hljs-type">VNImageRequestHandler</span>(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    }

    <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">setupLabel</span><span class="hljs-params">()</span></span> {
        label.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = <span class="hljs-literal">true</span>
        label.bottomAnchor.constraint(equalTo: view.bottomAnchor, constant: -<span class="hljs-number">50</span>).isActive = <span class="hljs-literal">true</span>
    }
}
</code></pre>
<h3 id="heading-step-1-create-a-new-project">🙌🏻 Step 1: Create a new project.</h3>
<p>Fire up Xcode and create a new single view application. Give it a name, perhaps “ImageRecognition.” Choose Swift as the main language and save your new project.</p>
<h3 id="heading-step-2-say-goodbye-to-the-storyboard">👋 Step 2: Say goodbye to the storyboard.</h3>
<p>For this tutorial, we are going to do everything programmatically, without the need for the storyboard. Maybe I’ll explain why in another article.</p>
<p>Delete <code>Main.storyboard</code>.</p>
<p>Navigate to <code>Info.plist</code> and scroll down to Deployment Info. We need to tell Xcode we are no longer using the storyboard.</p>
<p>Delete the main interface.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*W-p1_py_aMgNrnBh4ljJOg.png" alt="Image" width="800" height="271" loading="lazy"></p>
<p>Without the storyboard we need to manually create the app window and root view controller.</p>
<p>Add the following to the <code>application()</code> function in <code>AppDelegate.swift</code>:</p>
<pre><code class="lang-swift">
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">application</span><span class="hljs-params">(<span class="hljs-number">_</span> application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: <span class="hljs-keyword">Any</span>]?)</span></span> -&gt; <span class="hljs-type">Bool</span> {
        <span class="hljs-comment">// Override point for customization after application launch.</span>

        window = <span class="hljs-type">UIWindow</span>()
        window?.makeKeyAndVisible()
        <span class="hljs-keyword">let</span> vc = <span class="hljs-type">ViewController</span>()

        window?.rootViewController = vc
        <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>
    }
</code></pre>
<p>We manually create the app window with <code>UIWindow()</code>, create our view controller, and tell the window to use it as its root view controller.</p>
<p>The app should now build and run without the storyboard 😎</p>
<h3 id="heading-step-3-set-up-avcapturesession">⚙️ Step 3: Set up AVCaptureSession.</h3>
<p>Before we start, import UIKit, AVFoundation and Vision. The AVCaptureSession object handles capture activity and manages the flow of data between input devices (such as the rear camera) and outputs.</p>
<p>We are going to start by creating a function to set up our capture session.</p>
<p>Create <code>setupCaptureSession()</code> inside <code>ViewController.swift</code> and instantiate a new <code>AVCaptureSession</code>.</p>
<pre><code class="lang-swift"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">setupCaptureSession</span><span class="hljs-params">()</span></span> {

        <span class="hljs-comment">// creates a new capture session</span>
        <span class="hljs-keyword">let</span> captureSession = <span class="hljs-type">AVCaptureSession</span>()
}
</code></pre>
<p>Don’t forget to call this new function from <code>viewDidLoad()</code>.</p>
<pre><code class="lang-swift"><span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">viewDidLoad</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">super</span>.viewDidLoad()

        setupCaptureSession()
}
</code></pre>
<p>Next, we are going to need a reference to the rear-facing camera. We can use an <code>AVCaptureDevice.DiscoverySession</code> to query available capture devices based on our search criteria.</p>
<p>Add the following code:</p>
<pre><code class="lang-swift"><span class="hljs-comment">// search for available capture devices</span>
<span class="hljs-keyword">let</span> availableDevices = <span class="hljs-type">AVCaptureDevice</span>.<span class="hljs-type">DiscoverySession</span>(deviceTypes: [.builtInWideAngleCamera], mediaType: <span class="hljs-type">AVMediaType</span>.video, position: .back).devices
</code></pre>
<p><code>availableDevices</code> now contains a list of available devices matching our search criteria.</p>
<p>We now need to gain access to our <code>captureDevice</code> and add it as an input to our <code>captureSession</code>.</p>
<p>Add an input to the capture session.</p>
<pre><code class="lang-swift"><span class="hljs-comment">// get capture device, add device input to capture session</span>
<span class="hljs-keyword">do</span> {
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> captureDevice = availableDevices.first {
        captureSession.addInput(<span class="hljs-keyword">try</span> <span class="hljs-type">AVCaptureDeviceInput</span>(device: captureDevice))
    }
} <span class="hljs-keyword">catch</span> {
    <span class="hljs-built_in">print</span>(error.localizedDescription)
}
</code></pre>
<p>The first available device will be the rear-facing camera. We create a new <code>AVCaptureDeviceInput</code> using our capture device and add it to the capture session.</p>
<p>Now that we have our input set up, we can move on to outputting what the camera is capturing.</p>
<p>Add a video output to our capture session.</p>
<pre><code class="lang-swift"><span class="hljs-comment">// setup output, add output to our capture session</span>
<span class="hljs-keyword">let</span> captureOutput = <span class="hljs-type">AVCaptureVideoDataOutput</span>()
captureSession.addOutput(captureOutput)
</code></pre>
<p><code>AVCaptureVideoDataOutput</code> is an output that captures video. It also provides us access to the frames being captured for processing with a delegate method we will see later.</p>
<p>Next, we need to show what the capture session sees by adding a preview layer as a sublayer to our view.</p>
<p>Add a preview layer for the capture session as a sublayer of the view controller’s view.</p>
<pre><code class="lang-swift"><span class="hljs-keyword">let</span> previewLayer = <span class="hljs-type">AVCaptureVideoPreviewLayer</span>(session: captureSession)
previewLayer.frame = view.frame
view.layer.addSublayer(previewLayer)

captureSession.startRunning()
</code></pre>
<p>We create a layer based on our capture session and add this layer as a sublayer to our view. <code>captureSession.startRunning()</code> starts the flow from inputs to the outputs that we connected earlier.</p>
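<p>Putting the pieces of this step together, here is a minimal sketch of the whole setup in one place. It is not the exact code from the finished project: the class name <code>CameraViewController</code> is hypothetical, and the <code>canAddInput</code>/<code>canAddOutput</code> checks, the session stored as a property, and the background call to <code>startRunning()</code> are additions on my part rather than steps from above.</p>
<pre><code class="lang-swift">import AVFoundation
import UIKit

// Hypothetical class name, used only for this sketch.
final class CameraViewController: UIViewController {
    // Stored as a property so other methods can reach the session later.
    private let captureSession = AVCaptureSession()

    override func viewDidLoad() {
        super.viewDidLoad()
        setupCaptureSession()
    }

    private func setupCaptureSession() {
        // Look for the rear wide-angle camera.
        let discovery = AVCaptureDevice.DiscoverySession(
            deviceTypes: [.builtInWideAngleCamera],
            mediaType: .video,
            position: .back)

        // Bail out if there is no camera or the session rejects the input.
        guard let camera = discovery.devices.first,
              let input = try? AVCaptureDeviceInput(device: camera),
              captureSession.canAddInput(input) else {
            print("No usable rear camera found")
            return
        }
        captureSession.addInput(input)

        // The video data output hands us individual frames later on.
        let output = AVCaptureVideoDataOutput()
        guard captureSession.canAddOutput(output) else { return }
        captureSession.addOutput(output)

        // Show the live camera feed behind everything else in the view.
        let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
        previewLayer.frame = view.bounds
        previewLayer.videoGravity = .resizeAspectFill
        view.layer.addSublayer(previewLayer)

        // startRunning() blocks until the session is up, so keep it off the main thread.
        let session = captureSession
        DispatchQueue.global(qos: .userInitiated).async {
            session.startRunning()
        }
    }
}
</code></pre>
<p>The guards mean that on a device without a rear camera (the simulator, for example) the function simply prints a message and returns instead of adding an invalid input.</p>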
<h3 id="heading-step-4-permission-to-use-the-camera-permission-granted">📷 Step 4: Permission to use the camera? Permission granted.</h3>
<p>Nearly everyone has opened an app for the first time and has been prompted to allow the app to use the camera. Starting in iOS 10, our app will crash if it tries to access the camera without first declaring why it needs it.</p>
<p>Navigate to <code>Info.plist</code> and add a new key named <code>NSCameraUsageDescription</code>. In the value column, explain to the user why your app needs camera access.</p>
<p>Now, when the user launches the app for the first time they will be prompted to allow access to the camera.</p>
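<p>The system shows the prompt automatically, but if you want to know the user’s answer before starting the session, you can ask for it explicitly. The sketch below is one way to do that; <code>requestCameraAccess</code> is a hypothetical helper name, not part of the original project.</p>
<pre><code class="lang-swift">import AVFoundation

// Hypothetical helper: reports whether the app may use the camera.
// If the system prompt has to be shown, the answer arrives later on the main queue;
// otherwise the completion runs immediately on the calling thread.
func requestCameraAccess(completion: @escaping (Bool) -&gt; Void) {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        // The user already said yes on an earlier launch.
        completion(true)
    case .notDetermined:
        // First launch: this triggers the system permission prompt.
        AVCaptureDevice.requestAccess(for: .video) { granted in
            DispatchQueue.main.async { completion(granted) }
        }
    default:
        // .denied or .restricted: the camera is not available to us.
        completion(false)
    }
}
</code></pre>
<p>You could then call <code>setupCaptureSession()</code> only inside the completion when the value is <code>true</code>, and show a message otherwise.</p>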
<h3 id="heading-step-5-getting-the-model">📊 Step 5: Getting the model.</h3>
<p>The heart of this project is the machine learning model. The model must be able to take in an image and give us back a prediction of what the image contains. You can find free trained models <a target="_blank" href="https://developer.apple.com/machine-learning/">here</a>. The one I chose is ResNet50.</p>
<p>Once you obtain your model, drag and drop it into Xcode. It will automatically generate the necessary classes, providing you an interface to interact with your model.</p>
<h3 id="heading-step-6-image-analysis">🏞 Step 6: Image analysis.</h3>
<p>To analyze what the camera is seeing, we need to somehow gain access to the frames being captured by the camera.</p>
<p>Conforming to the <code>AVCaptureVideoDataOutputSampleBufferDelegate</code> gives us an interface to interact with and be notified every time a frame is captured by the camera.</p>
<p>Conform <code>ViewController</code> to the <code>AVCaptureVideoDataOutputSampleBufferDelegate</code>.</p>
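<p>In code, conforming is just a matter of listing the protocol after the superclass. A minimal sketch, assuming your class is still named <code>ViewController</code> as in the Xcode template:</p>
<pre><code class="lang-swift">import AVFoundation
import UIKit

class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    // setupCaptureSession(), the delegate method and the label all live in here.
}
</code></pre>
<p>The protocol’s methods are optional, so this compiles even before we add the delegate method.</p>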
<p>We need to tell our video output that <code>ViewController</code> is its sample buffer delegate.</p>
<p>Add the following line in <code>setupCaptureSession()</code>:</p>
<pre><code class="lang-swift">captureOutput.setSampleBufferDelegate(<span class="hljs-keyword">self</span>, queue: <span class="hljs-type">DispatchQueue</span>(label: <span class="hljs-string">"videoQueue"</span>))
</code></pre>
<p>Add the following function:</p>
<pre><code class="lang-swift"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">captureOutput</span><span class="hljs-params">(<span class="hljs-number">_</span> output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection)</span></span> {
        <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> model = <span class="hljs-keyword">try</span>? <span class="hljs-type">VNCoreMLModel</span>(<span class="hljs-keyword">for</span>: <span class="hljs-type">Resnet50</span>().model) <span class="hljs-keyword">else</span> { <span class="hljs-keyword">return</span> }
        <span class="hljs-keyword">let</span> request = <span class="hljs-type">VNCoreMLRequest</span>(model: model) { (finishedRequest, error) <span class="hljs-keyword">in</span>
            <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> results = finishedRequest.results <span class="hljs-keyword">as</span>? [<span class="hljs-type">VNClassificationObservation</span>] <span class="hljs-keyword">else</span> { <span class="hljs-keyword">return</span> }
            <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> observation = results.first <span class="hljs-keyword">else</span> { <span class="hljs-keyword">return</span> }

            <span class="hljs-type">DispatchQueue</span>.main.async(execute: {
                <span class="hljs-keyword">self</span>.label.text = <span class="hljs-string">"\(observation.identifier)"</span>
            })
        }
        <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> pixelBuffer: <span class="hljs-type">CVPixelBuffer</span> = <span class="hljs-type">CMSampleBufferGetImageBuffer</span>(sampleBuffer) <span class="hljs-keyword">else</span> { <span class="hljs-keyword">return</span> }

        <span class="hljs-comment">// executes request</span>
        <span class="hljs-keyword">try</span>? <span class="hljs-type">VNImageRequestHandler</span>(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    }
</code></pre>
<p>Each time a frame is captured, the delegate is notified by calling <code>captureOutput()</code>. This is a perfect place to do our image analysis with CoreML.</p>
<p>First, we create a <code>VNCoreMLModel</code>, which is essentially a CoreML model wrapped for use with the Vision framework. We create it with a Resnet50 model.</p>
<p>Next, we create our Vision request. In the completion handler, we update the onscreen <code>UILabel</code> with the identifier returned by the model. We then convert the frame passed to us from a <code>CMSampleBuffer</code> to a <code>CVPixelBuffer</code>, which is the format our model needs for analysis.</p>
<p>Lastly, we perform the Vision request with a <code>VNImageRequestHandler</code>.</p>
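<p>The function above rebuilds the model and the request for every frame, which works but does the same setup over and over. One possible refinement is to create them once and reuse them. The sketch below assumes the Xcode-generated <code>Resnet50</code> class from Step 5; <code>FrameClassifier</code> and <code>onResult</code> are hypothetical names I made up for illustration.</p>
<pre><code class="lang-swift">import CoreML
import Vision

// Hypothetical wrapper that builds the Vision request once and reuses it.
final class FrameClassifier {
    // Called on the main queue with the top label and its confidence (0...1).
    var onResult: ((String, Float) -&gt; Void)?

    // Built lazily on first use; nil if the model could not be loaded.
    private lazy var request: VNCoreMLRequest? = {
        guard let model = try? VNCoreMLModel(for: Resnet50().model) else { return nil }
        return VNCoreMLRequest(model: model) { [weak self] finishedRequest, _ in
            guard let results = finishedRequest.results as? [VNClassificationObservation],
                  let best = results.first else { return }
            DispatchQueue.main.async {
                self?.onResult?(best.identifier, best.confidence)
            }
        }
    }()

    // Call this from captureOutput(_:didOutput:from:) with the frame's pixel buffer.
    func classify(_ pixelBuffer: CVPixelBuffer) {
        guard let request = request else { return }
        try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    }
}
</code></pre>
<p>Because <code>classify(_:)</code> would only ever be called from the serial <code>videoQueue</code>, the lazy property is created from a single queue. With this in place, the delegate method shrinks to extracting the pixel buffer and passing it along, and the view controller sets <code>onResult</code> to update the label.</p>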
<h3 id="heading-step-7-create-a-label">🗒 Step 7: Create a label.</h3>
<p>The last step is to create a <code>UILabel</code> containing the model’s prediction.</p>
<p>Create a new <code>UILabel</code> and position it using constraints.</p>
<pre><code class="lang-swift"><span class="hljs-keyword">let</span> label: <span class="hljs-type">UILabel</span> = {
        <span class="hljs-keyword">let</span> label = <span class="hljs-type">UILabel</span>()
        label.textColor = .white
        label.translatesAutoresizingMaskIntoConstraints = <span class="hljs-literal">false</span>
        label.text = <span class="hljs-string">"Label"</span>
        label.font = label.font.withSize(<span class="hljs-number">30</span>)
        <span class="hljs-keyword">return</span> label
    }()

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">setupLabel</span><span class="hljs-params">()</span></span> {
        label.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = <span class="hljs-literal">true</span>
        label.bottomAnchor.constraint(equalTo: view.bottomAnchor, constant: -<span class="hljs-number">50</span>).isActive = <span class="hljs-literal">true</span>
}
</code></pre>
<p>Don’t forget to add the label as a subview and call <code>setupLabel()</code> from within <code>viewDidLoad()</code>.</p>
<pre><code class="lang-swift">view.addSubview(label)
setupLabel()
</code></pre>
<p>You can download the completed project from <a target="_blank" href="https://github.com/markmansur/CoreML-Vision-demo">GitHub here</a>.</p>
<p>Like what you see? Give this post a thumbs up 👍, follow me on <a target="_blank" href="https://twitter.com/MarkMansur2">Twitter</a>, <a target="_blank" href="https://github.com/markmansur">GitHub</a>, or check out <a target="_blank" href="http://markmansur.me/">my personal page</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
