<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ image processing - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ image processing - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Fri, 22 May 2026 17:39:53 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/image-processing/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Build and Deploy an Image Hosting Service on Sevalla ]]>
                </title>
                <description>
                    <![CDATA[ When most people think of image hosting, they imagine uploading photos to a cloud service and getting back a simple link. It feels seamless, but behind that experience sits a powerful set of technologies. At the core is something called object storag... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-and-deploy-an-image-hosting-service-on-sevalla/</link>
                <guid isPermaLink="false">68d691dd9aa70c44b703deb0</guid>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Fri, 26 Sep 2025 13:15:09 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758890260515/c4b83d17-c783-425c-ab11-50961e44ea58.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When most people think of image hosting, they imagine uploading photos to a cloud service and getting back a simple link.</p>
<p>It feels seamless, but behind that experience sits a powerful set of technologies. At the core is something called object storage, which is a different way of handling files compared to traditional databases or file systems.</p>
<p>In this article, we’ll build a complete image hosting service using <a target="_blank" href="https://nodejs.org/en">Node.js</a> and Express, connect it to object storage, and finally, deploy the whole project to <a target="_blank" href="https://sevalla.com/">Sevalla</a>.</p>
<p>By the end, you will have a working application that lets users upload images and retrieve them through hosted URLs, all running live on the cloud.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-object-storage">What is Object Storage?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-we-will-be-building">What We Will Be Building</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-the-project">How to Set Up the Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-your-object-storage">How to Create Your Object Storage</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-deploy-your-project-on-sevalla">How to Deploy Your Project on Sevalla</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-this-project-matters">Why This Project Matters</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-is-object-storage"><strong>What is Object Storage?</strong></h2>
<p>To understand why our project is designed the way it is, we need to first understand object storage.</p>
<p>Traditional file storage systems save files in a hierarchy of folders, like your computer’s file explorer. Block storage systems, often used in databases, split data into chunks and manage them for speed and reliability.</p>
<p>Object storage is different. It treats each file, whether an image, video, or document, as a single object. Each object is stored with its metadata and a unique identifier inside a flat structure, usually called a bucket.</p>
<p>This flat architecture makes object storage scalable almost without limit. Instead of worrying about file paths or directories, you simply place an object in a bucket and get back an identifier.</p>
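<p>Here is a toy, in-memory sketch in JavaScript that makes the flat model concrete. It is only an illustration, not how Sevalla or S3 work internally. A bucket is modeled as a map from keys to objects plus metadata, and the names <code>ToyBucket</code>, <code>put</code>, <code>get</code>, and <code>list</code> are hypothetical. Notice that a key such as <code>photos/cat.png</code> contains a slash, yet no folder exists. “Listing a folder” is just matching a key prefix.</p>
<pre><code class="lang-javascript">// Toy in-memory model of a flat object-storage bucket (illustration only).
class ToyBucket {
  constructor() {
    // Every object lives directly under its key: key -> { body, metadata }.
    this.objects = new Map();
  }

  // Store an object; no folders need to exist first.
  put(key, body, metadata = {}) {
    this.objects.set(key, { body, metadata });
    return key; // the identifier used to retrieve the object later
  }

  // Look up an object by its key, or return null if it is absent.
  get(key) {
    return this.objects.get(key) ?? null;
  }

  // "Folders" are only a naming convention: return keys sharing a prefix.
  list(prefix = "") {
    return [...this.objects.keys()].filter((key) => key.startsWith(prefix));
  }
}

const bucket = new ToyBucket();
bucket.put("photos/cat.png", "cat bytes", { contentType: "image/png" });
bucket.put("photos/dog.png", "dog bytes", { contentType: "image/png" });
bucket.put("avatar.jpg", "avatar bytes", { contentType: "image/jpeg" });

console.log(bucket.list("photos/")); // [ 'photos/cat.png', 'photos/dog.png' ]
console.log(bucket.get("avatar.jpg").metadata.contentType); // image/jpeg
</code></pre>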
<p><a target="_blank" href="https://aws.amazon.com/s3/">Amazon S3</a> is the industry standard for object storage, offering massive scale, global replication, and advanced features, but it comes with added complexity and often unpredictable costs. Sevalla’s object storage, on the other hand, is designed for developers who want the same durability and scalability without the steep learning curve.</p>
<p>It provides a simpler setup and is S3-compatible, so you interact with it the same way you would with an S3 bucket, using the same SDKs, without the additional setup and complexity. While S3 is ideal for enterprises with petabytes of data, Sevalla’s solution is well suited to projects like image hosting, blogs, or mobile apps, where ease of use and speed matter most.</p>
<h2 id="heading-what-we-will-be-building"><strong>What We Will Be Building</strong></h2>
<p>We will create a simple yet practical image hosting service. At its core, the service allows a user to send an image through an HTTP request. The server will accept the image, assign it a unique ID, and store it in object storage.</p>
<p>The usefulness of such a project goes far beyond a coding exercise. If you are building a blog, you could use this service to store images for your posts without worrying about file management on your web server.</p>
<p>If you are developing a mobile app that requires profile pictures or image sharing, this backend can serve as your foundation. Even if you simply want to understand how cloud-native applications handle file uploads, this project gives you a clear, hands-on experience.</p>
<p>By the end, you will not just have code running locally. We will deploy the application on Sevalla, meaning your image hosting service will be live, scalable, and accessible to anyone with a link.</p>
<h2 id="heading-how-to-set-up-the-project"><strong>How to Set Up the Project</strong></h2>
<p>Let’s start by setting up a Node.js project. You can <a target="_blank" href="https://github.com/manishmshiva/image-host">clone this repository</a> if you don’t want to set up the project from scratch.</p>
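<p>Create a new project directory, initialize it with npm, and install the required dependencies. Because <code>server.js</code> uses ES module <code>import</code> syntax, also set <code>"type": "module"</code> in <code>package.json</code> so that Node.js treats <code>.js</code> files as ES modules.</p>
<pre><code class="lang-plaintext">npm init -y
npm pkg set type=module
npm i express multer dotenv @aws-sdk/client-s3 @aws-sdk/s3-request-presigner
</code></pre>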
<p>We will use <a target="_blank" href="https://expressjs.com/">Express</a> for our web server, <a target="_blank" href="https://www.npmjs.com/package/multer">Multer</a> for handling file uploads, and the <a target="_blank" href="https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/welcome.html">AWS SDK</a> to connect to object storage. Multer acts as middleware, giving us easy access to uploaded files. The AWS SDK gives us programmatic access to object storage, allowing us to upload files and generate links.</p>
<p>Let’s write a quick <code>index.html</code> and put it inside the <code>public/</code> directory to serve as the UI for file upload.</p>
<pre><code class="lang-xml"><span class="hljs-meta">&lt;!doctype <span class="hljs-meta-keyword">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">html</span> <span class="hljs-attr">lang</span>=<span class="hljs-string">"en"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">charset</span>=<span class="hljs-string">"utf-8"</span> /&gt;</span> <span class="hljs-comment">&lt;!-- Set character encoding --&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"viewport"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"width=device-width,initial-scale=1"</span> /&gt;</span> <span class="hljs-comment">&lt;!-- Mobile-friendly --&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>Pic Host<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>

  <span class="hljs-comment">&lt;!-- Simple CSS styling for layout and form --&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">style</span>&gt;</span><span class="css">
    <span class="hljs-selector-pseudo">:root</span> { <span class="hljs-attribute">color-scheme</span>: light dark; } <span class="hljs-comment">/* Support dark/light themes */</span>
    <span class="hljs-selector-tag">body</span> { 
      <span class="hljs-attribute">font-family</span>: system-ui, sans-serif; 
      <span class="hljs-attribute">max-width</span>: <span class="hljs-number">560px</span>; 
      <span class="hljs-attribute">margin</span>: <span class="hljs-number">4rem</span> auto; 
      <span class="hljs-attribute">padding</span>: <span class="hljs-number">0</span> <span class="hljs-number">1rem</span>; 
    }
    <span class="hljs-selector-tag">h1</span> { <span class="hljs-attribute">font-size</span>: <span class="hljs-number">1.25rem</span>; <span class="hljs-attribute">margin-bottom</span>: <span class="hljs-number">1rem</span>; }
    <span class="hljs-selector-tag">form</span>, <span class="hljs-selector-class">.card</span> { 
      <span class="hljs-attribute">border</span>: <span class="hljs-number">1px</span> solid <span class="hljs-number">#9993</span>; 
      <span class="hljs-attribute">padding</span>: <span class="hljs-number">1rem</span>; 
      <span class="hljs-attribute">border-radius</span>: <span class="hljs-number">12px</span>; 
    }
    <span class="hljs-selector-tag">input</span><span class="hljs-selector-attr">[type=<span class="hljs-string">"file"</span>]</span> { <span class="hljs-attribute">margin</span>: .<span class="hljs-number">5rem</span> <span class="hljs-number">0</span> <span class="hljs-number">1rem</span>; }
    <span class="hljs-selector-tag">button</span> { 
      <span class="hljs-attribute">padding</span>: .<span class="hljs-number">6rem</span> <span class="hljs-number">1rem</span>; 
      <span class="hljs-attribute">border-radius</span>: <span class="hljs-number">10px</span>; 
      <span class="hljs-attribute">border</span>: <span class="hljs-number">1px</span> solid <span class="hljs-number">#9995</span>; 
      <span class="hljs-attribute">background</span>: <span class="hljs-number">#0000FF</span>; 
      <span class="hljs-attribute">cursor</span>: pointer; 
    }
    <span class="hljs-selector-id">#result</span> { <span class="hljs-attribute">margin-top</span>: <span class="hljs-number">1rem</span>; <span class="hljs-attribute">display</span>: none; }
    <span class="hljs-selector-id">#result</span> <span class="hljs-selector-tag">a</span> { <span class="hljs-attribute">word-break</span>: break-all; } <span class="hljs-comment">/* Break long URLs nicely */</span>
  </span><span class="hljs-tag">&lt;/<span class="hljs-name">style</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
  <span class="hljs-comment">&lt;!-- Page heading --&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span>Simple Image Host<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>

  <span class="hljs-comment">&lt;!-- Upload form --&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">form</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"uploadForm"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"card"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">for</span>=<span class="hljs-string">"file"</span>&gt;</span>Choose image<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span><span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"file"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"file"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"file"</span> <span class="hljs-attr">accept</span>=<span class="hljs-string">"image/*"</span> <span class="hljs-attr">required</span> /&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">br</span>/&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"submit"</span>&gt;</span>Upload<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-comment">&lt;!-- Status text (uploading, success, error) --&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"status"</span> <span class="hljs-attr">aria-live</span>=<span class="hljs-string">"polite"</span> <span class="hljs-attr">style</span>=<span class="hljs-string">"margin-top:.75rem;"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">form</span>&gt;</span>

  <span class="hljs-comment">&lt;!-- Result card: hidden until an image is uploaded --&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"result"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"card"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">strong</span>&gt;</span>Share this page:<span class="hljs-tag">&lt;/<span class="hljs-name">strong</span>&gt;</span> 
      <span class="hljs-tag">&lt;<span class="hljs-name">a</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"pageUrl"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"#"</span> <span class="hljs-attr">target</span>=<span class="hljs-string">"_blank"</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"noopener"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">a</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

  <span class="hljs-comment">&lt;!-- Client-side JavaScript --&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">script</span>&gt;</span><span class="javascript">
    <span class="hljs-keyword">const</span> form = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'uploadForm'</span>);   <span class="hljs-comment">// Form element</span>
    <span class="hljs-keyword">const</span> statusEl = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'status'</span>);   <span class="hljs-comment">// Upload status</span>
    <span class="hljs-keyword">const</span> result = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'result'</span>);     <span class="hljs-comment">// Result box</span>
    <span class="hljs-keyword">const</span> pageUrlEl = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'pageUrl'</span>); <span class="hljs-comment">// Share link</span>

    <span class="hljs-comment">// Event listener for form submission</span>
    form.addEventListener(<span class="hljs-string">'submit'</span>, <span class="hljs-keyword">async</span> (e) =&gt; {
      e.preventDefault(); <span class="hljs-comment">// Prevent full-page reload</span>
      statusEl.textContent = <span class="hljs-string">'Uploading...'</span>; 
      result.style.display = <span class="hljs-string">'none'</span>;

      <span class="hljs-keyword">const</span> fd = <span class="hljs-keyword">new</span> FormData(); <span class="hljs-comment">// FormData object for sending file</span>
      <span class="hljs-keyword">const</span> file = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'file'</span>).files[<span class="hljs-number">0</span>];
      <span class="hljs-keyword">if</span> (!file) {
        statusEl.textContent = <span class="hljs-string">'Pick a file first.'</span>;
        <span class="hljs-keyword">return</span>;
      }
      fd.append(<span class="hljs-string">'file'</span>, file); <span class="hljs-comment">// Attach file to request</span>

      <span class="hljs-keyword">try</span> {
        <span class="hljs-comment">// Send file to backend /upload route</span>
        <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">'/upload'</span>, { <span class="hljs-attr">method</span>: <span class="hljs-string">'POST'</span>, <span class="hljs-attr">body</span>: fd });
        <span class="hljs-keyword">if</span> (!res.ok) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Upload failed'</span>);
        <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> res.json();

        <span class="hljs-comment">// Show returned page URL</span>
        pageUrlEl.textContent = data.pageUrl;
        pageUrlEl.href = data.pageUrl;

        <span class="hljs-comment">// Display result card and reset form</span>
        result.style.display = <span class="hljs-string">'block'</span>;
        statusEl.textContent = <span class="hljs-string">'Done!'</span>;
        form.reset();
      } <span class="hljs-keyword">catch</span> (err) {
        <span class="hljs-comment">// Handle error</span>
        statusEl.textContent = <span class="hljs-string">'Error: '</span> + err.message;
      }
    });
  </span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<p>When a user visits the page, they’ll see a simple upload form with a file picker. They can select an image from their computer and click Upload. JavaScript then intercepts the form submission using <code>addEventListener('submit')</code>, prevents the browser from doing a full page refresh, and instead packages the selected file into a <code>FormData</code> object.</p>
<p>That file is then sent to the server with a <code>fetch</code> call to the <code>/upload</code> route. If the server responds successfully, the JSON returned contains a <code>pageUrl</code>. This URL is displayed inside the result card, which was initially hidden. The user can now copy this link and share it with others.</p>
<p>If something goes wrong, like no file being selected, the server erroring out, or the upload failing, the script updates the status message to inform the user.</p>
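<p>The <code>accept="image/*"</code> attribute only filters what the file picker shows. A user can still choose another file type, and a file larger than the server’s 10 MB Multer limit will only fail after it has been sent. One option is a hypothetical <code>checkImageFile</code> helper that runs at the start of the submit handler. This minimal sketch assumes you want to catch both cases before uploading. It relies only on the <code>type</code> and <code>size</code> properties that a browser <code>File</code> object provides.</p>
<pre><code class="lang-javascript">// Hypothetical helper: validate a File before sending it to /upload.
// MAX_UPLOAD_BYTES mirrors the server's Multer limit (10 * 1024 * 1024).
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

function checkImageFile(file) {
  if (!file) return "Pick a file first.";
  // file.type is the browser-reported MIME type, e.g. "image/png".
  if (!file.type.startsWith("image/")) return "Only image files are allowed.";
  if (file.size > MAX_UPLOAD_BYTES) return "Images must be 10 MB or smaller.";
  return null; // null means the file looks fine to upload
}

// Possible use inside the submit handler, before building the FormData:
// const problem = checkImageFile(file);
// if (problem) { statusEl.textContent = problem; return; }
</code></pre>
<p>This check is a convenience for users, not a security boundary. Anyone can call <code>/upload</code> directly, so any real restriction has to live on the server.</p>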
<p>Here’s how it looks to the user.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306506845/aed05c76-954e-4bae-a995-8efc2da89f10.jpeg" alt="Index.html" class="image--center mx-auto" width="1100" height="477" loading="lazy"></p>
<p>Now let’s create the backend in a <code>server.js</code> file at the root of the project.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> path <span class="hljs-keyword">from</span> <span class="hljs-string">"path"</span>; <span class="hljs-comment">// For working with file paths</span>
<span class="hljs-keyword">import</span> express <span class="hljs-keyword">from</span> <span class="hljs-string">"express"</span>; <span class="hljs-comment">// Web framework to handle HTTP routes</span>
<span class="hljs-keyword">import</span> multer <span class="hljs-keyword">from</span> <span class="hljs-string">"multer"</span>; <span class="hljs-comment">// Middleware for handling file uploads</span>
<span class="hljs-keyword">import</span> crypto <span class="hljs-keyword">from</span> <span class="hljs-string">"crypto"</span>; <span class="hljs-comment">// Used to generate random unique IDs</span>
<span class="hljs-keyword">import</span> dotenv <span class="hljs-keyword">from</span> <span class="hljs-string">"dotenv"</span>; <span class="hljs-comment">// Loads environment variables from .env file</span>
<span class="hljs-keyword">import</span> { fileURLToPath } <span class="hljs-keyword">from</span> <span class="hljs-string">"url"</span>; <span class="hljs-comment">// For handling ES module file paths</span>
<span class="hljs-keyword">import</span> {
  S3Client,
  PutObjectCommand,
  HeadObjectCommand,
  GetObjectCommand,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-s3"</span>; <span class="hljs-comment">// AWS SDK commands for S3 operations</span>
<span class="hljs-keyword">import</span> { getSignedUrl } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/s3-request-presigner"</span>; <span class="hljs-comment">// To generate temporary signed URLs</span>

dotenv.config(); <span class="hljs-comment">// Load environment variables</span>

<span class="hljs-comment">// Setup paths for __dirname and __filename in ES modules</span>
<span class="hljs-keyword">const</span> __filename = fileURLToPath(<span class="hljs-keyword">import</span>.meta.url);
<span class="hljs-keyword">const</span> __dirname = path.dirname(__filename);

<span class="hljs-comment">// Bucket name from environment</span>
<span class="hljs-keyword">const</span> S3_BUCKET = process.env.S3_BUCKET;

<span class="hljs-comment">// Create an S3 client (works with Sevalla-compatible storage as well)</span>
<span class="hljs-keyword">const</span> s3 = <span class="hljs-keyword">new</span> S3Client({
  <span class="hljs-attr">region</span>: <span class="hljs-string">"auto"</span>, <span class="hljs-comment">// Auto-region for Sevalla</span>
  <span class="hljs-attr">endpoint</span>: process.env.ENDPOINT, <span class="hljs-comment">// Custom endpoint for object storage</span>
  <span class="hljs-attr">credentials</span>: {
    <span class="hljs-attr">accessKeyId</span>: process.env.AWS_ACCESS_KEY_ID, <span class="hljs-comment">// From .env</span>
    <span class="hljs-attr">secretAccessKey</span>: process.env.AWS_SECRET_ACCESS_KEY, <span class="hljs-comment">// From .env</span>
  },
});

<span class="hljs-comment">// Initialize Express app</span>
<span class="hljs-keyword">const</span> app = express();

<span class="hljs-comment">// Serve static files (like index.html, CSS, JS) from "public" folder</span>
app.use(express.static(path.join(__dirname, <span class="hljs-string">"public"</span>)));

<span class="hljs-comment">// Multer setup: store uploaded files in memory (not on disk)</span>
<span class="hljs-comment">// Limit file size to 10MB</span>
<span class="hljs-keyword">const</span> upload = multer({
  <span class="hljs-attr">storage</span>: multer.memoryStorage(),
  <span class="hljs-attr">limits</span>: { <span class="hljs-attr">fileSize</span>: <span class="hljs-number">10</span> * <span class="hljs-number">1024</span> * <span class="hljs-number">1024</span> },
});

<span class="hljs-comment">// ---------- ROUTE 1: GET / ----------</span>
<span class="hljs-comment">// Serves the main HTML file (upload form)</span>
app.get(<span class="hljs-string">"/"</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> {
  res.sendFile(path.join(__dirname, <span class="hljs-string">"public"</span>, <span class="hljs-string">"index.html"</span>));
});

<span class="hljs-comment">// ---------- ROUTE 2: POST /upload ----------</span>
<span class="hljs-comment">// Handles image uploads and stores them in object storage</span>
app.post(<span class="hljs-string">"/upload"</span>, upload.single(<span class="hljs-string">"file"</span>), <span class="hljs-keyword">async</span> (req, res) =&gt; {
  <span class="hljs-keyword">try</span> {
    <span class="hljs-comment">// Check if file exists</span>
    <span class="hljs-keyword">if</span> (!req.file) <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">400</span>).json({ <span class="hljs-attr">error</span>: <span class="hljs-string">"file is required"</span> });

    <span class="hljs-comment">// Generate a random ID for the file</span>
    <span class="hljs-keyword">const</span> id = crypto.randomUUID().replace(<span class="hljs-regexp">/-/g</span>, <span class="hljs-string">""</span>);
    <span class="hljs-keyword">const</span> key = id;

    <span class="hljs-comment">// Create a PutObjectCommand to upload file to S3/Sevalla</span>
    <span class="hljs-keyword">const</span> put = <span class="hljs-keyword">new</span> PutObjectCommand({
      <span class="hljs-attr">Bucket</span>: S3_BUCKET,
      <span class="hljs-attr">Key</span>: key,
      <span class="hljs-attr">Body</span>: req.file.buffer,
      <span class="hljs-attr">ContentType</span>: req.file.mimetype,
      <span class="hljs-attr">Metadata</span>: {
        <span class="hljs-attr">originalname</span>: req.file.originalname || <span class="hljs-string">""</span>,
      },
    });

    <span class="hljs-comment">// Upload the file</span>
    <span class="hljs-keyword">await</span> s3.send(put);

    <span class="hljs-comment">// Build a page URL for retrieving the image later</span>
    <span class="hljs-keyword">const</span> baseUrl = <span class="hljs-string">`<span class="hljs-subst">${req.protocol}</span>://<span class="hljs-subst">${req.get(<span class="hljs-string">"host"</span>)}</span>`</span>;
    <span class="hljs-keyword">const</span> pageUrl = <span class="hljs-string">`<span class="hljs-subst">${baseUrl}</span>/i/<span class="hljs-subst">${id}</span>`</span>;

    <span class="hljs-comment">// Respond with the page URL</span>
    res.json({ id, pageUrl });
  } <span class="hljs-keyword">catch</span> (err) {
    <span class="hljs-built_in">console</span>.error(err);
    res.status(<span class="hljs-number">500</span>).json({ <span class="hljs-attr">error</span>: <span class="hljs-string">"upload_failed"</span> });
  }
});

<span class="hljs-comment">// ---------- ROUTE 3: GET /i/:id ----------</span>
<span class="hljs-comment">// Redirects to a signed URL for secure access to the uploaded file</span>
app.get(<span class="hljs-string">"/i/:id"</span>, <span class="hljs-keyword">async</span> (req, res) =&gt; {
  <span class="hljs-keyword">const</span> { id } = req.params;
  <span class="hljs-keyword">const</span> key = id;

  <span class="hljs-keyword">try</span> {
    <span class="hljs-comment">// Ensure the object exists in storage</span>
    <span class="hljs-keyword">await</span> s3.send(<span class="hljs-keyword">new</span> HeadObjectCommand({ <span class="hljs-attr">Bucket</span>: S3_BUCKET, <span class="hljs-attr">Key</span>: key }));

    <span class="hljs-comment">// Create a signed URL valid for 1 hour</span>
    <span class="hljs-keyword">const</span> command = <span class="hljs-keyword">new</span> GetObjectCommand({ <span class="hljs-attr">Bucket</span>: S3_BUCKET, <span class="hljs-attr">Key</span>: key });
    <span class="hljs-keyword">const</span> signedUrl = <span class="hljs-keyword">await</span> getSignedUrl(s3, command, { <span class="hljs-attr">expiresIn</span>: <span class="hljs-number">3600</span> });

    <span class="hljs-comment">// Redirect user to the signed URL</span>
    <span class="hljs-keyword">return</span> res.redirect(<span class="hljs-number">302</span>, signedUrl);
  } <span class="hljs-keyword">catch</span> (err) {
    <span class="hljs-built_in">console</span>.error(err);
    <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">404</span>).send(<span class="hljs-string">"Not found"</span>);
  }
});

<span class="hljs-comment">// ---------- Boot the Server ----------</span>
app.listen(process.env.PORT || <span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Image host server listening for requests...`</span>);
});
</code></pre>
<h3 id="heading-route-1-get">Route 1: <code>GET /</code></h3>
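<p>This is the entry point of the app. When you open the root URL in the browser, it serves the <code>index.html</code> file from the <code>public</code> folder. That file contains the upload form where the user can select an image and submit it. Because <code>express.static</code> is registered earlier and serves <code>index.html</code> for the root path by default, this route mainly acts as an explicit fallback.</p>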
<h3 id="heading-route-2-post-upload">Route 2: <code>POST /upload</code></h3>
<p>This is where the magic happens. When a user selects an image and clicks “Upload,” the file is sent to this endpoint. Multer handles the file upload in memory, and then the file is pushed to object storage using the <code>PutObjectCommand</code>. A random unique ID is generated as the key for the file. Once uploaded, the server responds with a <code>pageUrl</code> that can be used to view the uploaded image later.</p>
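<p>As written, <code>/upload</code> accepts any file type up to 10 MB, because <code>accept="image/*"</code> is enforced only in the browser. One option is Multer’s <code>fileFilter</code> setting, sketched below. The sketch assumes you are comfortable trusting the client-reported MIME type as a first-pass check, and the function name <code>imageOnlyFilter</code> is hypothetical.</p>
<pre><code class="lang-javascript">// Sketch of a Multer fileFilter that accepts only image uploads.
// Multer calls fileFilter(req, file, cb) for each incoming file;
// cb(null, true) accepts it and cb(null, false) skips it.
function imageOnlyFilter(req, file, cb) {
  // file.mimetype comes from the client, so this is a first-pass check only.
  const isImage = String(file.mimetype || "").startsWith("image/");
  cb(null, isImage);
}

// Wiring it into the existing Multer setup in server.js:
// const upload = multer({
//   storage: multer.memoryStorage(),
//   limits: { fileSize: 10 * 1024 * 1024 },
//   fileFilter: imageOnlyFilter,
// });
</code></pre>
<p>A skipped file leaves <code>req.file</code> undefined, so the existing <code>if (!req.file)</code> check in the route then responds with a 400. The MIME type is supplied by the client and can be spoofed. Inspecting the file’s actual bytes would be a stricter, but more involved, approach.</p>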
<h3 id="heading-route-3-get-iid">Route 3: <code>GET /i/:id</code></h3>
<p>This route retrieves an uploaded image. Instead of serving the file directly, it uses <code>getSignedUrl</code> to generate a signed URL valid for one hour (<code>expiresIn: 3600</code> seconds). This signed URL gives temporary access to the file stored in object storage, and the server redirects the user to it. If the file doesn’t exist, or the storage request fails for any other reason, the route returns a 404 error.</p>
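<p>The <code>:id</code> parameter is passed straight through as the object key. The upload route always generates keys as UUIDs with the dashes removed (32 lowercase hexadecimal characters). This route could therefore reject anything else before calling object storage. Here is a minimal sketch, in which <code>isValidImageId</code> and <code>IMAGE_ID_PATTERN</code> are hypothetical names:</p>
<pre><code class="lang-javascript">// IDs produced by the upload route: crypto.randomUUID() with dashes removed,
// which leaves exactly 32 lowercase hexadecimal characters.
const IMAGE_ID_PATTERN = /^[0-9a-f]{32}$/;

function isValidImageId(id) {
  // String() makes undefined or other non-strings fail the pattern cleanly.
  return IMAGE_ID_PATTERN.test(String(id));
}

// Possible use at the top of the GET /i/:id handler:
// if (!isValidImageId(id)) return res.status(404).send("Not found");
</code></pre>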
<p>Before you run this code, you need access to object storage, and you need to store its credentials in an environment file. The <code>process.env</code> lookups in the code read these values so the server can authenticate with the object storage and read and write files.</p>
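<p>If any of these variables is missing, the server may still start and only fail once it first talks to object storage. A small fail-fast check makes the problem visible at startup. It is sketched here with hypothetical names (<code>REQUIRED_ENV</code>, <code>missingEnvVars</code>), and the variable names are the ones <code>server.js</code> reads.</p>
<pre><code class="lang-javascript">// Hypothetical fail-fast check: list required variables that are missing or blank.
// The names match the process.env lookups in server.js.
const REQUIRED_ENV = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET", "ENDPOINT"];

function missingEnvVars(env, names = REQUIRED_ENV) {
  return names.filter((name) => !env[name] || !env[name].trim());
}

// Possible use in server.js, right after dotenv.config():
// const missing = missingEnvVars(process.env);
// if (missing.length) {
//   console.error(`Missing environment variables: ${missing.join(", ")}`);
//   process.exit(1);
// }
</code></pre>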
<h2 id="heading-how-to-create-your-object-storage"><strong>How to Create Your Object Storage</strong></h2>
<p><a target="_blank" href="https://app.sevalla.com/login">Log in</a> to Sevalla and click “Object Storage”. Then click “Create Object Storage” and give it a name.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306560384/3e88b143-2fa9-465d-b3d6-e0e54c90a6a3.jpeg" alt="Object Storage Creation" class="image--center mx-auto" width="1100" height="545" loading="lazy"></p>
<p>Once it’s created, open “Settings” to find the access key and secret key. You need these four values:</p>
<ul>
<li><p>Bucket name</p>
</li>
<li><p>Endpoint URL</p>
</li>
<li><p>Access Key</p>
</li>
<li><p>Secret Key</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306618051/90970694-3d7c-486f-b32a-54c83ca88c7f.jpeg" alt="Object Storage Access Keys" class="image--center mx-auto" width="1100" height="354" loading="lazy"></p>
<p>Copy them into a file named <code>.env</code> within your project.</p>
<pre><code class="lang-plaintext">AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID_HERE
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY_HERE
S3_BUCKET=YOUR_BUCKET_NAME_HERE
ENDPOINT=YOUR_ENDPOINT_URL_HERE
</code></pre>
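<p>The server reads these values through <code>process.env</code>. This excerpt doesn't show how the <code>.env</code> file gets loaded (for example, with a package such as dotenv). To make the file format concrete, here is a tiny Python parser for simple <code>KEY=VALUE</code> lines. It is a hypothetical sketch (<code>parse_env</code>), not a replacement for a real dotenv library. It only skips blank lines and <code>#</code> comments and strips one pair of surrounding quotes.</p>

```python
def parse_env(text: str) -> dict:
    """Hypothetical parser for simple KEY=VALUE lines, like the .env file above."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as URLs may contain "="
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "abc" or 'abc'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    print(parse_env("S3_BUCKET=my-bucket\nENDPOINT=https://example.com\n"))
```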
<p>Additionally, enable public access in the settings so that you can push files from your local environment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306660300/7abe369e-f820-4770-82d7-27da03c9b7a9.jpeg" alt="public access enabled" class="image--center mx-auto" width="1100" height="176" loading="lazy"></p>
<h3 id="heading-testing-the-application-locally"><strong>Testing the Application Locally</strong></h3>
<p>Let’s make sure our code works locally.</p>
<pre><code class="lang-bash">node server.js
</code></pre>
<p>Go to <a target="_blank" href="http://localhost:3000/">http://localhost:3000/</a> and try uploading a file. It should give you the URL to view the file after a successful upload.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306699833/b95b69ed-17f6-4fe9-b0a8-22e15876655d.jpeg" alt="File upload success" class="image--center mx-auto" width="1100" height="610" loading="lazy"></p>
<p>You can visit the URL to see your uploaded file. You can also double-check that it was uploaded using the Object Storage UI.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306733665/35944857-71bb-4d1a-9e11-85c35c875465.jpeg" alt="Object Storage UI" class="image--center mx-auto" width="1100" height="308" loading="lazy"></p>
<p>Great. We have built a simple image hosting and sharing service. Now let’s get this into the cloud.</p>
<h2 id="heading-how-to-deploy-your-project-on-sevalla"><strong>How to Deploy Your Project on Sevalla</strong></h2>
<p>First, push your project to GitHub or <a target="_blank" href="https://github.com/manishmshiva/image-host">fork my repository</a>. Then log in to your Sevalla dashboard and create a new application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306768439/3be4f9ac-abd4-4b98-95e3-22b97a3eea1a.jpeg" alt="Create application" class="image--center mx-auto" width="1100" height="767" loading="lazy"></p>
<p>Connect your GitHub account, choose the repository that contains your image hosting service, and select the branch you want to deploy. Sevalla will automatically detect that it is a Node.js project and install dependencies. It will also run the application on the specified port.</p>
<p>To configure the storage credentials and bucket information, go to the environment variables section of your app and add the same variables you defined in your <code>.env</code> file: <code>AWS_ACCESS_KEY_ID</code>, <code>AWS_SECRET_ACCESS_KEY</code>, <code>S3_BUCKET</code>, and <code>ENDPOINT</code>. These values are injected into your application at runtime, so sensitive data is never hardcoded into your source code.</p>
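<p>A missing or misnamed variable usually surfaces later as a confusing storage error, so it can help to check the required names once at startup. The Python sketch below illustrates that fail-fast idea. <code>require_env</code> is a hypothetical helper, and the list of names assumes the ones from the <code>.env</code> file earlier in this article.</p>

```python
import os
from typing import Iterable, Mapping

# Assumed names, taken from the .env file shown earlier in this article
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET", "ENDPOINT")


def require_env(names: Iterable[str] = REQUIRED,
                environ: Mapping[str, str] = os.environ) -> dict:
    """Return the requested settings, or raise one error listing every missing name."""
    names = list(names)
    # Treat empty strings as missing, since an empty key is never valid here
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```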
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306811113/b5e1782d-2bce-4b9e-a654-a131e58a44cd.jpeg" alt="Adding environment variables" class="image--center mx-auto" width="1100" height="500" loading="lazy"></p>
<p>Once environment variables are added, go to “Overview” and click “Deploy”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306855343/fbcfcd74-d74e-43ad-9b99-7f02421cf5df.jpeg" alt="Deploying the application" class="image--center mx-auto" width="1100" height="622" loading="lazy"></p>
<p>Wait for a few minutes. Once the deployment is complete, Sevalla will give you a live URL. Click “Visit APP” to go to your application’s page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757306886378/97705a1f-c625-4282-9ef0-042c8c01b431.jpeg" alt="Live url" class="image--center mx-auto" width="1100" height="332" loading="lazy"></p>
<p>Congratulations! Your app is now live. You can share the URL with others or even add a custom domain to your app to have your own image hosting solution.</p>
<h2 id="heading-why-this-project-matters"><strong>Why This Project Matters</strong></h2>
<p>This project is more than just a coding exercise. It teaches you how modern applications manage files at scale, introduces you to object storage, and shows how to integrate cloud services into your own projects.</p>
<p>With Sevalla, you also learned how to deploy production-ready applications, giving you the full cycle from local prototype to live cloud service.</p>
<p>For developers building blogs, mobile apps, or even internal tools, the ability to host images reliably and at scale is invaluable. With object storage and a simple Node.js service, you can avoid reinventing the wheel and rely on proven cloud infrastructure.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>We began by exploring object storage and why it is ideal for handling files like images. We then built a Node.js application that accepts uploads, stores them in Sevalla’s Object Storage, and returns accessible URLs. Finally, we deployed the application on Sevalla, turning a local project into a live image hosting service. Along the way, you gained not only working code but also a deeper understanding of how to build cloud-native services.</p>
<p>By completing this project, you now have a working image hosting service you can extend and adapt. You could add features like authentication, image resizing, or even a better front-end interface with drag-and-drop UI. Most importantly, you have experienced how development and deployment fit together in modern software.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use Nano Banana for Image Generation - Explained with Code Examples ]]>
                </title>
                <description>
                    <![CDATA[ AI is changing the image generation and editing process into a smooth workflow. Now, with just a single prompt, you can tell your computer to generate or edit an existing image. Google just launched its new model for image generation or editing, "Nan... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/nano-banana-for-image-generation/</link>
                <guid isPermaLink="false">68cd5897ffbf18457f7bb85a</guid>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tarun Singh ]]>
                </dc:creator>
                <pubDate>Fri, 19 Sep 2025 13:20:23 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758287738949/b33b68f4-0e84-46df-a85f-9ff6aacfd72c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>AI is turning image generation and editing into a smooth workflow. With a single prompt, you can tell your computer to generate a new image or edit an existing one. Google has just launched its new model for image generation and editing, <a target="_blank" href="https://gemini.google/overview/image-generation/">"Nano Banana" (Gemini 2.5 Flash Image)</a>. It's a powerful, nimble tool that's changing how we think about image generation and manipulation, and it's worth having in your developer toolkit.</p>
<p>In this article, you will learn how to use “Nano Banana” (Gemini 2.5 Flash Image) to generate and edit images with Python. Let’s get started!</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-nano-banana">What is "Nano Banana"?</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-why-nano-banana">Why "Nano Banana"?</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-your-project">How to Set Up Your Project</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-get-an-api-key-from-google-gemini">Step 1: Get an API key from Google Gemini</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-install-the-sdk-and-other-dependencies">Step 2: Install the SDK and Other Dependencies</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-set-up-your-environment">Step 3: Set Up Your Environment</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-image-generation-amp-editing">Step 4: Image Generation &amp; Editing</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-beyond-the-basics-what-else-can-you-do">Beyond the Basics: What Else Can You Do?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-what-is-nano-banana">What is "Nano Banana"?</h2>
<p>Nano Banana is the latest image generation and editing model from Google DeepMind. Forget the formal jargon for a second. Imagine you have an incredibly talented, lightning-fast artist at your beck and call. You can describe <em>anything</em> to them, such as "an astronaut riding a horse on the Moon," and <em>poof</em>, it appears. Or you hand them a picture of your dog and say, "Make the dog wear a cap on his head," and they do it instantly, keeping your dog looking like <em>your</em> dog.</p>
<p>That's essentially Nano Banana. It's an advanced AI model from the Gemini family, specifically engineered for rapid, intelligent image generation and nuanced editing. It understands your natural language commands, enabling you to bring complex visual ideas to life or make surgical changes to existing images with surprising ease.</p>
<h3 id="heading-why-nano-banana">Why "Nano Banana"?</h3>
<p>Because it's small (flash!), packed with goodness, and leaves you feeling like you just peeled back a new layer of creative possibility. It's fast, efficient, and incredibly versatile.</p>
<p><strong>The Superpowers You Get:</strong></p>
<ul>
<li><p><strong>Prompt-Perfect Editing:</strong> Want to change a background, alter a pose, or add a specific object? Just ask. Nano Banana understands and executes.</p>
</li>
<li><p><strong>Character Consistency:</strong> This is a big one. If you're creating a story or a series of images, maintaining the look of a specific character or object is crucial. Nano Banana excels at this, ensuring your protagonist looks the same whether they're in a forest or on the moon.</p>
</li>
<li><p><strong>Visual Mashups (Multi-Image Fusion):</strong> Got a few different visual elements you want to combine seamlessly? It can blend them into a cohesive new image.</p>
</li>
</ul>
<p>And much more!</p>
<p>Interested? Let's get our hands dirty. But first, note that there are two ways to use “Nano Banana”:</p>
<ol>
<li><p><a target="_blank" href="https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-image-preview">Using Google AI Studio</a>: This is the simplest way to generate or edit images. Google AI Studio is a web-based tool that gives you direct access to the Gemini models. You don't need to install libraries, manage API keys, or write any code. It's the best place to experiment, and it's useful for developers and non-developers alike.</p>
</li>
<li><p><strong>Building with the Gemini API:</strong> This is the better choice if you need a custom solution. For any serious application, whether it's a web app, a mobile app, or a backend service, you'll need to integrate directly with the Gemini API. This is where the real power lies, because it lets you automate tasks and create interactive experiences.</p>
</li>
</ol>
<p>In this tutorial, you will see how we can use this tool in our own applications, using nothing but Python. So, let’s get started.</p>
<h2 id="heading-how-to-set-up-your-project">How to Set Up Your Project</h2>
<h3 id="heading-step-1-get-an-api-key-from-google-gemini">Step 1: Get an API key from Google Gemini</h3>
<p>The very first step in using “Nano Banana” is to get an API key. Head over to <a target="_blank" href="https://aistudio.google.com/apikey">Google AI Studio</a>, click “Create API key”, and generate a new key by selecting one of your existing Google Cloud projects.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757429573699/1c5d1a52-2e63-476b-a957-604542044fc7.png" alt="API key generated from Google Gemini " class="image--center mx-auto" width="1896" height="903" loading="lazy"></p>
<p>Once you have generated an API key, save it securely somewhere.</p>
<h3 id="heading-step-2-install-the-sdk-and-other-dependencies">Step 2: Install the SDK and Other Dependencies</h3>
<p>Open your terminal and run:</p>
<pre><code class="lang-bash">pip install google-generativeai pillow python-dotenv
</code></pre>
<p>We’ll use <code>Pillow</code> for easy image handling and <code>python-dotenv</code> to safely manage our API key.</p>
<h3 id="heading-step-3-set-up-your-environment">Step 3: Set Up Your Environment</h3>
<p>It’s crucial to keep your API key out of your code for security. For this, we usually use environment variables. So, create a file named <code>.env</code> in your project root and add your API key:</p>
<pre><code class="lang-bash">GEMINI_API_KEY=<span class="hljs-string">"YOUR_API_KEY_HERE"</span>
</code></pre>
<h3 id="heading-step-4-image-generation-amp-editing">Step 4: Image Generation &amp; Editing</h3>
<p><strong>Example 1: Text-to-Image Generation</strong></p>
<p>Text-to-Image is like an artist who can draw anything you describe. You write a prompt (a sentence or a description, as detailed as you like), and the model generates a unique, high-quality image that matches it. It’s perfect for bringing your most imaginative ideas to life with just a few words.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> google.generativeai <span class="hljs-keyword">as</span> genai
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> BytesIO
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

<span class="hljs-comment"># Configuration</span>
load_dotenv()
genai.configure(api_key=os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>))
model = genai.GenerativeModel(<span class="hljs-string">'gemini-2.5-flash-image-preview'</span>)

<span class="hljs-comment"># Prompt, Image, and Response Setup</span>
prompt = <span class="hljs-string">"A golden retriever puppy sitting in a field of daisies, bright and cheerful"</span>
output_filename = <span class="hljs-string">"text_to_image_result.png"</span>

<span class="hljs-comment"># Helper function that saves the image from the API response</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save_image_from_response</span>(<span class="hljs-params">response, filename</span>):</span>
    <span class="hljs-string">"""Helper function to save the image from the API response."""</span>
    <span class="hljs-keyword">if</span> response.candidates <span class="hljs-keyword">and</span> response.candidates[<span class="hljs-number">0</span>].content.parts:
        <span class="hljs-keyword">for</span> part <span class="hljs-keyword">in</span> response.candidates[<span class="hljs-number">0</span>].content.parts:
            <span class="hljs-keyword">if</span> part.inline_data:
                image_data = BytesIO(part.inline_data.data)
                img = Image.open(image_data)
                img.save(filename)
                print(<span class="hljs-string">f"Image successfully saved as <span class="hljs-subst">{filename}</span>"</span>)
                <span class="hljs-keyword">return</span> filename
    print(<span class="hljs-string">"No image data found in the response."</span>)
    <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    print(<span class="hljs-string">f"Generating image for prompt: '<span class="hljs-subst">{prompt}</span>'..."</span>)
    response = model.generate_content(prompt)
    save_image_from_response(response, output_filename)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p><strong>Output:</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757485705896/50484418-c53c-4d61-8846-2c8875dc2cbd.png" alt="A golden retriever puppy sitting happily in a sunny meadow filled with white daisies, surrounded by bright green grass and a cheerful, vibrant atmosphere." class="image--center mx-auto" width="1024" height="1024" loading="lazy"></p>
<p>The code used in the example handles everything needed to communicate with the Gemini API and save the image.</p>
<ul>
<li><p>First, we import the required libraries and load the API key from <code>.env</code> using <code>load_dotenv()</code>. This makes the key available so we can connect to Google’s service with <code>genai.configure()</code>.</p>
</li>
<li><p>The model we’re using is <code>gemini-2.5-flash-image-preview</code>, which is designed for fast image generation.</p>
</li>
<li><p>We define a <code>prompt</code> ("A golden retriever puppy...") and a filename for saving the image.</p>
</li>
<li><p>The helper function <code>save_image_from_response(...)</code> looks at the API’s response, extracts the raw image data, and saves it as a PNG file.</p>
</li>
<li><p>In <code>main()</code>, we call the model with the prompt, then pass the response to the helper function to save the result.</p>
</li>
<li><p>The <code>if __name__ == "__main__":</code> block ensures the script runs only when executed directly, not when imported.</p>
</li>
</ul>
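<p>The helper above saves the first image it finds and ignores everything else. Responses from this model can also include text parts, such as a short description alongside the image. A variant that collects both kinds of part can make failures easier to debug. This is a sketch, and <code>split_parts</code> is a hypothetical name. It assumes only the response shape the example already relies on (<code>candidates[0].content.parts</code>, with <code>inline_data.data</code> on image parts), plus a <code>text</code> attribute on text parts. Because it uses <code>getattr</code>, you can try it on stand-in objects without calling the API.</p>

```python
from types import SimpleNamespace


def split_parts(response):
    """Return (images, texts) from the first candidate of a Gemini-style response."""
    images, texts = [], []
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return images, texts
    content = getattr(candidates[0], "content", None)
    for part in getattr(content, "parts", None) or []:
        inline = getattr(part, "inline_data", None)
        data = getattr(inline, "data", None) if inline else None
        if data:
            images.append(data)  # raw bytes: wrap in BytesIO and open with Pillow
        elif getattr(part, "text", None):
            texts.append(part.text)
    return images, texts


if __name__ == "__main__":
    # Stand-in object shaped like a real response, for trying the helper offline
    fake = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
        SimpleNamespace(text="Here is your puppy.", inline_data=None),
        SimpleNamespace(text="", inline_data=SimpleNamespace(data=b"image-bytes")),
    ]))])
    print(split_parts(fake))
```

<p>In <code>main()</code>, you could print <code>texts</code> whenever <code>images</code> comes back empty. That shows what the model said instead of just "No image data found".</p>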
<p><strong>Example 2: Image-to-Image Editing</strong></p>
<p>Image-to-Image is like a photo editor. Instead of starting from scratch, you can upload an existing picture and describe how to change it. For instance, you can request background removal, addition of new objects, or even a complete artistic style change.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> google.generativeai <span class="hljs-keyword">as</span> genai
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> BytesIO
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

<span class="hljs-comment"># Configuration</span>
load_dotenv()
genai.configure(api_key=os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>))
model = genai.GenerativeModel(<span class="hljs-string">'gemini-2.5-flash-image-preview'</span>)

<span class="hljs-comment"># Prompt, Image, and Response Setup</span>
input_image_path = <span class="hljs-string">"input_dog.png"</span>
prompt = <span class="hljs-string">"Make the dog wear a small wizard hat and spectacles."</span>
output_filename = <span class="hljs-string">"edited_image_result.png"</span>

<span class="hljs-comment"># Helper function that saves the image from the API response</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save_image_from_response</span>(<span class="hljs-params">response, filename</span>):</span>
    <span class="hljs-string">"""Helper function to save the image from the API response."""</span>
    <span class="hljs-keyword">if</span> response.candidates <span class="hljs-keyword">and</span> response.candidates[<span class="hljs-number">0</span>].content.parts:
        <span class="hljs-keyword">for</span> part <span class="hljs-keyword">in</span> response.candidates[<span class="hljs-number">0</span>].content.parts:
            <span class="hljs-keyword">if</span> part.inline_data:
                image_data = BytesIO(part.inline_data.data)
                img = Image.open(image_data)
                img.save(filename)
                print(<span class="hljs-string">f"Image successfully saved as <span class="hljs-subst">{filename}</span>"</span>)
                <span class="hljs-keyword">return</span> filename
    print(<span class="hljs-string">"No image data found in the response."</span>)
    <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    print(<span class="hljs-string">f"Editing image '<span class="hljs-subst">{input_image_path}</span>' with prompt: '<span class="hljs-subst">{prompt}</span>'..."</span>)
    <span class="hljs-keyword">try</span>:
        img_to_edit = Image.open(input_image_path)
        response = model.generate_content([prompt, img_to_edit])
        save_image_from_response(response, output_filename)
    <span class="hljs-keyword">except</span> FileNotFoundError:
        print(<span class="hljs-string">f"Error: The file '<span class="hljs-subst">{input_image_path}</span>' was not found."</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p><strong>Output:</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757486336530/84cba4bf-91bd-49b7-8fd3-e94b20eabbfb.png" alt="A before and after image of a playful dog wearing a small pointed wizard hat and round spectacles, sitting upright with a charming and magical look, giving a whimsical, storybook-like feel." class="image--center mx-auto" width="1250" height="627" loading="lazy"></p>
<p>This code is very similar to the first example. The key difference is that it sends an existing image to the model along with the prompt.</p>
<ul>
<li><p><code>input_image_path</code>: This variable now holds the file path to the image you want to edit.</p>
</li>
<li><p><code>Image.open(input_image_path)</code>: This line uses the Pillow library to open your local image file so it can be sent to the model.</p>
</li>
<li><p><code>model.generate_content([prompt, img_to_edit])</code>: This is the most important part. Unlike before, we now pass a list to the <code>generate_content</code> function that contains both the text prompt and the image object. This tells the API to use the provided image as a starting point for its generation.</p>
</li>
<li><p><code>try...except</code> block: This handles errors. The script tries to open the image file. If that fails because the file isn't there, the <code>except</code> clause catches the <code>FileNotFoundError</code> and prints a friendly message instead of crashing.</p>
</li>
</ul>
<p><strong>Example 3: Multi-Image Fusion</strong></p>
<p>Multi-image fusion merges elements from two or more images. You upload several images and instruct the AI to blend them seamlessly into one composite picture. This is useful for creating new scenes, combining people and backgrounds, or building detailed product mockups.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> google.generativeai <span class="hljs-keyword">as</span> genai
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> BytesIO
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

<span class="hljs-comment"># Configuration</span>
load_dotenv()
genai.configure(api_key=os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>))
model = genai.GenerativeModel(<span class="hljs-string">'gemini-2.5-flash-image-preview'</span>)

<span class="hljs-comment"># Prompt, Images, and Response Setup</span>
image1_path = <span class="hljs-string">"dog_image.png"</span>
image2_path = <span class="hljs-string">"cap_image.png"</span>
prompt = <span class="hljs-string">"Make the dog from the first image wear the cap from the second image. The cap should fit realistically on the dog's head."</span>
output_filename = <span class="hljs-string">"dog_with_cap_result.png"</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save_image_from_response</span>(<span class="hljs-params">response, filename</span>):</span>
    <span class="hljs-string">"""Helper function to save the image from the API response."""</span>
    <span class="hljs-keyword">if</span> response.candidates <span class="hljs-keyword">and</span> response.candidates[<span class="hljs-number">0</span>].content.parts:
        <span class="hljs-keyword">for</span> part <span class="hljs-keyword">in</span> response.candidates[<span class="hljs-number">0</span>].content.parts:
            <span class="hljs-keyword">if</span> part.inline_data:
                image_data = BytesIO(part.inline_data.data)
                img = Image.open(image_data)
                img.save(filename)
                print(<span class="hljs-string">f"Image successfully saved as <span class="hljs-subst">{filename}</span>"</span>)
                <span class="hljs-keyword">return</span> filename
    print(<span class="hljs-string">"No image data found in the response."</span>)
    <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    print(<span class="hljs-string">f"Fusing images '<span class="hljs-subst">{image1_path}</span>' and '<span class="hljs-subst">{image2_path}</span>'..."</span>)
    <span class="hljs-keyword">try</span>:
        img1 = Image.open(image1_path)
        img2 = Image.open(image2_path)
        response = model.generate_content([prompt, img1, img2])
        save_image_from_response(response, output_filename)
    <span class="hljs-keyword">except</span> FileNotFoundError:
        print(<span class="hljs-string">"Error: One or both image files were not found."</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p><strong>Output:</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757486318798/2fe3ca32-3053-44cc-9350-b1c47abacdd9.png" alt="Three-part image showing a golden retriever puppy in a daisy field, a red baseball cap with the letter “A,” and the final edited version where the puppy is wearing the red cap while sitting happily among the daisies" class="image--center mx-auto" width="1920" height="1080" loading="lazy"></p>
<p>The logic of the code above is an extension of the Image-to-Image example.</p>
<ul>
<li><p><code>image1_path</code> and <code>image2_path</code>: These variables hold the paths to the two images you want to fuse or merge.</p>
</li>
<li><p><code>model.generate_content([prompt, img1, img2])</code>: Here, the list passed to the <code>generate_content</code> function contains three items: the text prompt and both image objects. This tells the AI to use the prompt to combine the elements from both images into a single output.</p>
</li>
</ul>
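<p>When a file is missing, the fusion script reports "One or both image files were not found" without naming it. A small variation checks every path first, so the message lists each missing file. <code>load_images</code> is a hypothetical helper. In the article's script you would pass Pillow's <code>Image.open</code> as <code>opener</code>. It is left as a parameter here so the check can be tried without Pillow.</p>

```python
from pathlib import Path
from typing import Any, Callable, List, Sequence


def load_images(paths: Sequence[str], opener: Callable[[str], Any]) -> List[Any]:
    """Open every image, or raise one FileNotFoundError naming all missing paths."""
    missing = [str(p) for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError("Image file(s) not found: " + ", ".join(missing))
    # In the article's script, opener would be PIL.Image.open
    return [opener(p) for p in paths]


# Usage with the article's variables (requires Pillow and the model from Example 3):
#     from PIL import Image
#     img1, img2 = load_images([image1_path, image2_path], Image.open)
#     response = model.generate_content([prompt, img1, img2])
```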
<p><strong>Example 4: Image Restoration</strong></p>
<p>This feature can restore old, faded, or damaged photos. Upload a picture and ask Gemini to restore it. It can sharpen low-quality images, colorize old black-and-white photos, and enhance textures, making your memories look new again.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> google.generativeai <span class="hljs-keyword">as</span> genai
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> BytesIO
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

<span class="hljs-comment"># Configuration</span>
load_dotenv()
genai.configure(api_key=os.getenv(<span class="hljs-string">"GEMINI_API_KEY"</span>))
model = genai.GenerativeModel(<span class="hljs-string">'gemini-2.5-flash-image-preview'</span>)

<span class="hljs-comment"># Prompt, Image, and Response Setup</span>
input_image_path = <span class="hljs-string">"old_photo.png"</span>
prompt = <span class="hljs-string">"Restore this old, faded photograph. Sharpen the details, remove any scratches or damage, and enhance the colors to make it look like a new, high-quality photo."</span>
output_filename = <span class="hljs-string">"restored_image_result.png"</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save_image_from_response</span>(<span class="hljs-params">response, filename</span>):</span>
    <span class="hljs-string">"""Helper function to save the image from the API response."""</span>
    <span class="hljs-keyword">if</span> response.candidates <span class="hljs-keyword">and</span> response.candidates[<span class="hljs-number">0</span>].content.parts:
        <span class="hljs-keyword">for</span> part <span class="hljs-keyword">in</span> response.candidates[<span class="hljs-number">0</span>].content.parts:
            <span class="hljs-keyword">if</span> part.inline_data:
                image_data = BytesIO(part.inline_data.data)
                img = Image.open(image_data)
                img.save(filename)
                print(<span class="hljs-string">f"Image successfully saved as <span class="hljs-subst">{filename}</span>"</span>)
                <span class="hljs-keyword">return</span> filename
    print(<span class="hljs-string">"No image data found in the response."</span>)
    <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    print(<span class="hljs-string">f"Attempting to restore image: '<span class="hljs-subst">{input_image_path}</span>'..."</span>)
    <span class="hljs-keyword">try</span>:
        old_photo = Image.open(input_image_path)
        response = model.generate_content([prompt, old_photo])
        save_image_from_response(response, output_filename)
    <span class="hljs-keyword">except</span> FileNotFoundError:
        print(<span class="hljs-string">f"Error: The file '<span class="hljs-subst">{input_image_path}</span>' was not found."</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p><strong>Output:</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757486506412/201bf046-4c63-46eb-a026-f2a432ca8c3d.png" alt="Side-by-side comparison of an old photograph and a restored version. The left shows a scratched, sepia-toned photo of a vintage car on a rural road, while the right shows the same scene digitally restored in color, with a blue classic car under a bright sky in a golden countryside" class="image--center mx-auto" width="1167" height="572" loading="lazy"></p>
<p>The structure here is identical to the Image-to-Image Editing example because, from a technical perspective, image restoration is a form of image-to-image editing.</p>
<ul>
<li>The <code>prompt</code> is where the magic happens. The text prompt explicitly tells the model what to do with the image, outlining restoration steps such as "sharpen the details," "remove scratches," and "enhance the colors." The model interprets these abstract instructions and applies them to the image, giving your old photo a cleaner, more realistic look.</li>
</ul>
<h2 id="heading-beyond-the-basics-what-else-can-you-do">Beyond the Basics: What Else Can You Do?</h2>
<p>This is just the tip of the iceberg! Nano Banana is incredibly versatile. Here are some ideas for where you can take your projects:</p>
<ul>
<li><p><strong>Batch Processing:</strong> Automate the generation of multiple images from a list of prompts.</p>
</li>
<li><p><strong>Creative Assets:</strong> Design icons, backgrounds, or character sprites for games or apps directly from your Python script.</p>
</li>
<li><p><strong>Data Processing:</strong> Integrate Nano Banana into a data pipeline to programmatically edit or generate images based on data inputs.</p>
</li>
<li><p><strong>AI Art Galleries:</strong> Build a backend service that allows users to submit prompts and receive images.</p>
</li>
</ul>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>"Nano Banana" (Gemini 2.5 Flash Image) isn't just a novelty; it's a practical, powerful tool for developers and creatives alike. With just a few lines of code, you can tap into its capabilities and bring your visual ideas to life. This streamlined approach makes it easy to get started, experiment, and integrate image generation and editing into your projects.</p>
<p>If you found this article helpful and want to discuss AI development, LLMs, or software development, feel free to connect with me on <a target="_blank" href="https://x.com/itsTarun24">X/Twitter</a>, <a target="_blank" href="https://www.linkedin.com/in/tarunsingh24">LinkedIn</a>, or check out my portfolio on my <a target="_blank" href="http://tarunportfolio.vercel.app/blog">Blog</a>. I regularly share insights about AI, development, technical writing, and much more.</p>
<p>Happy coding, and may your creations be as vibrant as a field of fresh bananas!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Enhance Images with Neural Networks ]]>
                </title>
                <description>
                    <![CDATA[ Artificial intelligence is changing how we work with images. What once took hours in Photoshop can now happen in seconds with AI-powered tools. You can take a blurry picture, enlarge it without losing sharpness, fix the lighting, remove unwanted nois... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-enhance-images-with-neural-networks/</link>
                <guid isPermaLink="false">68b8e1073c8fb81fc2265eef</guid>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Thu, 04 Sep 2025 00:44:55 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1756858495684/2742e9b0-87f8-47bf-a01d-2e979e4dfb35.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Artificial intelligence is changing how we work with images. What once took hours in Photoshop can now happen in seconds with AI-powered tools. You can take a blurry picture, enlarge it without losing sharpness, fix the lighting, remove unwanted noise, or even bring color to a black-and-white photo, all with a single click.</p>
<p>The magic you see in these tools comes from trained AI models that have learned how images should look and can reconstruct them accordingly. These models have studied millions of examples to learn patterns, textures, and details, so they can “predict” what’s missing and fill it in naturally.</p>
<p>For developers, photographers, and content creators, knowing the basics of these algorithms can help you pick the right tools for your workflow. Even if you never plan to code an AI model yourself, this knowledge will help you make better choices for image processing, web apps, or creative projects.</p>
<p>Let’s look at five of the most important algorithms used in AI image enhancement today. Along the way, you’ll see real-world tools that use these algorithms and how you can try them yourself.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-image-colorization">Image Colorization</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-gan-based-image-enhancement">GAN-Based Image Enhancement</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-noise-reduction-denoising-autoencoders">Noise Reduction (Denoising Autoencoders)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-image-upscaling-using-super-resolution">Image Upscaling using Super-Resolution</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-artifact-removal">Artifact Removal</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-these-algorithms-matter-to-developers">Why These Algorithms Matter to Developers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-image-colorization"><strong>Image Colorization</strong></h2>
<p>Automatic image colorization might be the most visually dramatic AI enhancement of all. It takes a black-and-white image and predicts the colors that should be there, often producing results that look like the photo was taken in full color.</p>
<p>The AI behind this uses <a target="_blank" href="https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns">convolutional neural networks</a> (CNNs) trained on huge datasets of color images. The model sees both the grayscale and the color versions during training, so it learns how certain objects typically appear. For example, it might learn that grass is usually green, the sky is often blue, and human skin falls within a certain range of tones.</p>
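<p>To make the training setup more concrete, here is a minimal Rust sketch of how the grayscale half of a training pair could be derived from a color photo. It assumes images are stored as flat lists of <code>[u8; 3]</code> RGB pixels and uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). The function names are hypothetical, and real colorization models often work in a color space such as Lab rather than raw RGB, so treat this as an illustration of the idea rather than how any particular model prepares its data.</p>

```rust
/// Hypothetical helper: converts RGB pixels to grayscale using the
/// ITU-R BT.601 luma weights. This is the "input" side of a training pair.
fn to_grayscale(rgb: &[[u8; 3]]) -> Vec<u8> {
    rgb.iter()
        .map(|&[r, g, b]| {
            let luma = 0.299 * r as f32 + 0.587 * g as f32 + 0.114 * b as f32;
            // Round to the nearest integer and keep the result in 0..=255.
            luma.round().clamp(0.0, 255.0) as u8
        })
        .collect()
}

/// Hypothetical helper: pairs the grayscale version of an image (what the
/// model sees) with the original colors (what the model should predict).
fn make_training_pair(rgb: &[[u8; 3]]) -> (Vec<u8>, Vec<[u8; 3]>) {
    (to_grayscale(rgb), rgb.to_vec())
}
```

<p>During training, the model's predicted colors for each grayscale input are compared against the original colors, and that difference is what the network learns to shrink.</p>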
<p><img src="https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com/f/3f1ef7e7-b08b-4251-ae26-9c4a8646a85a/de2k3n6-e04b7996-7c6d-437d-bca7-16aee0c061f6.png?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1cm46YXBwOjdlMGQxODg5ODIyNjQzNzNhNWYwZDQxNWVhMGQyNmUwIiwiaXNzIjoidXJuOmFwcDo3ZTBkMTg4OTgyMjY0MzczYTVmMGQ0MTVlYTBkMjZlMCIsIm9iaiI6W1t7InBhdGgiOiJcL2ZcLzNmMWVmN2U3LWIwOGItNDI1MS1hZTI2LTljNGE4NjQ2YTg1YVwvZGUyazNuNi1lMDRiNzk5Ni03YzZkLTQzN2QtYmNhNy0xNmFlZTBjMDYxZjYucG5nIn1dXSwiYXVkIjpbInVybjpzZXJ2aWNlOmZpbGUuZG93bmxvYWQiXX0.UJn-AuEJzCsQtiSanUT9M7j6rac6d_8T-goaCiMY2KA" alt="Image Colorization" width="600" height="400" loading="lazy"></p>
<p>One of the most famous models is DeOldify, which combines CNNs with GANs. The GAN setup helps refine the results, making colors more natural and avoiding strange or overly bright tones.</p>
<p>Colorization has practical uses beyond restoring old family photos. It’s used in film restoration, historical projects, digital storytelling, and even concept art.</p>
<p>See <a target="_blank" href="https://www.canva.com/features/colorize-black-and-white/">Image Colorization</a> in action.</p>
<h2 id="heading-gan-based-image-enhancement"><strong>GAN-Based Image Enhancement</strong></h2>
<p>GANs, or <a target="_blank" href="https://developers.google.com/machine-learning/gan/gan_structure">Generative Adversarial Networks</a>, are one of the most powerful AI techniques in image enhancement. They consist of two neural networks: the generator, which tries to create realistic-looking images, and the discriminator, which evaluates them. Over many iterations, the generator becomes extremely good at producing images that pass as real.</p>
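<p>The adversarial setup boils down to two competing loss functions. The Rust sketch below assumes the discriminator outputs a probability between 0 and 1 that an image is real, and uses the binary cross-entropy losses from the original GAN formulation, with the commonly used "non-saturating" form for the generator. The networks themselves are left out; only the scores they would produce are modeled, and the function names are illustrative.</p>

```rust
/// Discriminator loss (binary cross-entropy): low when real images score
/// near 1 and generated images score near 0.
/// `d_real` and `d_fake` are probabilities strictly between 0 and 1.
fn discriminator_loss(d_real: f64, d_fake: f64) -> f64 {
    -(d_real.ln() + (1.0 - d_fake).ln())
}

/// Non-saturating generator loss: low when the discriminator is fooled,
/// that is, when it scores a generated image close to 1 ("real").
fn generator_loss(d_fake: f64) -> f64 {
    -d_fake.ln()
}
```

<p>During training, the two networks take turns: the discriminator's weights are updated to lower <code>discriminator_loss</code>, then the generator's weights are updated to lower <code>generator_loss</code>, which pushes the two objectives in opposite directions.</p>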
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756865306217/cc30de30-3124-4a5c-bcc5-75827ec92c6d.png" alt="Image Enhancement" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>In image retouching, GANs can handle many tasks at once, like fixing lighting, improving sharpness, enhancing textures, and even subtly changing elements to make the picture more appealing. Because GANs learn from real-world images, the results often feel more natural than traditional editing filters.</p>
<p>GAN-based retouching is used in professional portrait editing, e-commerce product photos, real estate listings, and even game asset creation. It’s also behind many “one-click enhance” buttons you see in modern apps.</p>
<p>See a GAN-powered <a target="_blank" href="https://www.artguru.ai/photo-enhancer/">photo enhancer</a> here.</p>
<h2 id="heading-noise-reduction-denoising-autoencoders"><strong>Noise Reduction (Denoising Autoencoders)</strong></h2>
<p>Noise in images looks like random specks of color or brightness that shouldn’t be there. It often happens in low-light photos or in images taken with high ISO settings. Noise makes photos look grainy and less professional.</p>
<p>Traditional noise removal methods simply blur the image to hide the noise, but this also destroys fine details. AI noise reduction works differently.</p>
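<p>To see why blurring costs detail, consider a 3-pixel mean (box) filter applied to one row of grayscale values. This is a minimal Rust sketch of the traditional approach, not of any particular product. Pixels at the ends of the row are handled by clamping to the nearest valid neighbor, which is one of several common choices.</p>

```rust
/// Applies a 1D box blur of radius 1 (the average of each pixel and its two
/// neighbors) to a row of grayscale values. Out-of-range neighbors are
/// clamped to the first or last pixel of the row.
fn box_blur_row(row: &[u8]) -> Vec<u8> {
    let n = row.len() as isize;
    (0..n)
        .map(|i| {
            let sum: u32 = (-1..=1)
                .map(|d| {
                    // Clamp the neighbor index so edge pixels still get 3 samples.
                    let j = (i + d).clamp(0, n - 1) as usize;
                    row[j] as u32
                })
                .sum();
            // Adding 1 before dividing by 3 rounds to the nearest whole value.
            ((sum + 1) / 3) as u8
        })
        .collect()
}
```

<p>Run on a row containing a sharp edge, <code>[0, 0, 0, 255, 255, 255]</code>, the filter returns <code>[0, 0, 85, 170, 255, 255]</code>: the clean jump from black to white becomes a gradual ramp. A single noisy speck such as <code>[0, 0, 90, 0, 0]</code> is reduced to <code>[0, 30, 30, 30, 0]</code>, so the noise is weaker but smeared across its neighbors.</p>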
<p><a target="_blank" href="https://www.geeksforgeeks.org/machine-learning/denoising-autoencoders-in-machine-learning/">Denoising Autoencoders</a>, one of the most common approaches, learn from pairs of images—one clean and one noisy. The AI studies how noise distorts details, then learns to reverse the process.</p>
<p><img src="https://uk.mathworks.com/discovery/denoising/_jcr_content/mainParsys/columns/e4e497e4-fa5c-49a0-afff-3e840fe0a8ca/image.adapt.full.medium.jpg/1743063756357.jpg" alt="Image denoising" width="600" height="400" loading="lazy"></p>
<p>When you pass a noisy photo through a denoising autoencoder, it removes the noise while preserving edges, textures, and important small details.</p>
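<p>The training data for a denoising autoencoder is usually created by corrupting clean images on purpose, so every noisy input has a known clean target. The Rust sketch below covers only that data side. It adds bounded pseudo-random noise to a clean grayscale image and computes the mean squared error (MSE), a common loss for comparing a network's reconstruction with the clean original. For simplicity it uses uniform noise from a tiny linear congruential generator instead of a random number library (Gaussian noise is a more common choice in practice). The autoencoder network itself is not shown, and all names are illustrative.</p>

```rust
/// Tiny linear congruential generator (Numerical Recipes constants), used
/// only so this example has no external dependencies.
struct Lcg(u32);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        self.0
    }
}

/// Returns a noisy copy of `clean`, shifting each pixel by a pseudo-random
/// amount in -amplitude..=amplitude and clamping to the 0..=255 range.
fn add_noise(clean: &[u8], amplitude: u8, seed: u32) -> Vec<u8> {
    let mut rng = Lcg(seed);
    let range = 2 * amplitude as u32 + 1;
    clean
        .iter()
        .map(|&p| {
            // Use the higher bits, which are more random than an LCG's low bits.
            let offset = ((rng.next_u32() >> 16) % range) as i32 - amplitude as i32;
            (p as i32 + offset).clamp(0, 255) as u8
        })
        .collect()
}

/// Mean squared error between a reconstruction and the clean target.
fn mse(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len(), "images must have the same number of pixels");
    let total: f64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| (x as f64 - y as f64).powi(2))
        .sum();
    total / a.len() as f64
}
```

<p>A training loop would feed the noisy copy to the autoencoder and adjust its weights to lower the MSE between the output and the clean image.</p>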
<p>Noise reduction isn’t just for photography. It’s also used in document scanning (to make text easier to read), in medical imaging (to clarify scans), and for cleaning up screenshots or UI mockups for presentations.</p>
<p>See <a target="_blank" href="https://www.pica-ai.com/resource/denoise-image/">Noise Reduction</a> in action here.</p>
<h2 id="heading-image-upscaling-using-super-resolution"><strong>Image Upscaling using Super-Resolution</strong></h2>
<p>Super-resolution is the process of increasing the resolution of an image to make it sharper and larger without simply stretching the pixels.</p>
<p>In the past, enlarging a small image just made it blurry. AI super-resolution works differently. It studies the image, detects patterns, and then generates new pixels that plausibly match what would have been there in a higher-quality original.</p>
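<p>For contrast, here is what "simply stretching the pixels" looks like in code. This minimal Rust sketch performs nearest-neighbor upscaling by an integer factor on a row-major RGB buffer (an assumed layout): every output pixel copies the closest source pixel, so no new detail is created. Interpolation methods such as bilinear or bicubic smooth these blocks instead, which is where the familiar blur comes from; super-resolution models try to predict the missing detail.</p>

```rust
/// Upscales a row-major RGB image by an integer `factor` using nearest-neighbor
/// sampling. Returns the new pixel buffer along with its width and height.
fn upscale_nearest(
    pixels: &[[u8; 3]],
    width: usize,
    height: usize,
    factor: usize,
) -> (Vec<[u8; 3]>, usize, usize) {
    assert_eq!(pixels.len(), width * height, "buffer size must match dimensions");
    let (out_w, out_h) = (width * factor, height * factor);
    let mut out = Vec::with_capacity(out_w * out_h);
    for y in 0..out_h {
        for x in 0..out_w {
            // Map each output coordinate back to the source pixel that covers it.
            let (src_x, src_y) = (x / factor, y / factor);
            out.push(pixels[src_y * width + src_x]);
        }
    }
    (out, out_w, out_h)
}
```

<p>Doubling a 2x1 image made of one red and one blue pixel yields a 4x2 image where each original pixel simply becomes a 2x2 block of the same color.</p>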
<p>One of the first big breakthroughs was <a target="_blank" href="https://medium.com/coinmonks/review-srcnn-super-resolution-3cb3a4f67a7c">SRCNN</a> (Super-Resolution Convolutional Neural Network). SRCNN works by breaking the image into patches, analyzing them, and then predicting what higher-resolution patches should look like. This early approach was effective but sometimes produced overly smooth images.</p>
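<p>The "convolutional" part of SRCNN refers to sliding small filters (kernels) over the image and computing a weighted sum at every position. The Rust sketch below shows one such pass over a grayscale image stored as row-major <code>f32</code> values, keeping only positions where a 3x3 kernel fits entirely (no padding). In SRCNN the kernel weights are learned during training and many filters are stacked across layers; here the kernel is supplied by hand, the 3x3 size is just for illustration, and, like most deep learning libraries, the code computes cross-correlation (the kernel is not flipped).</p>

```rust
/// Applies a 3x3 kernel to a row-major grayscale image as a "valid"
/// convolution: no padding, so the output is 2 pixels smaller in each
/// dimension, returned in row-major order.
fn conv3x3(image: &[f32], width: usize, height: usize, kernel: &[[f32; 3]; 3]) -> Vec<f32> {
    assert_eq!(image.len(), width * height, "buffer size must match dimensions");
    if width < 3 || height < 3 {
        return Vec::new();
    }
    let mut out = Vec::with_capacity((width - 2) * (height - 2));
    for y in 1..height - 1 {
        for x in 1..width - 1 {
            let mut acc = 0.0;
            // Weighted sum of the 3x3 neighborhood centered on (x, y).
            for (ky, row) in kernel.iter().enumerate() {
                for (kx, &weight) in row.iter().enumerate() {
                    let (sx, sy) = (x + kx - 1, y + ky - 1);
                    acc += weight * image[sy * width + sx];
                }
            }
            out.push(acc);
        }
    }
    out
}
```

<p>A kernel with a single 1 in the center returns the interior pixels unchanged, while a kernel like <code>[[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]</code> responds to left-to-right changes in brightness. A trained network learns which kernels are useful for reconstructing detail.</p>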
<p>Then came <a target="_blank" href="https://esrgan.readthedocs.io/en/latest/">ESRGAN</a> (Enhanced Super-Resolution Generative Adversarial Network), which took things further. ESRGAN uses a GAN architecture: a generator creates enhanced images, while a discriminator judges how real they look. Through this back-and-forth training, the generator learns to produce fine textures like hair strands, fabric weaves, or building details that look realistic to the human eye.</p>
<p><img src="https://www.any-video-converter.com/images2020/article/convert-low-resolution-image-to-high-resolution-online.jpg" alt="Image Upscaling" width="600" height="400" loading="lazy"></p>
<p>Super-resolution is widely used in e-commerce (for clearer product photos), printing (turning web images into high-resolution posters), and web apps (making user-uploaded images look professional).</p>
<p>See a super-resolution-powered <a target="_blank" href="https://www.artguru.ai/image-upscaler/">image upscaler</a> in action.</p>
<h2 id="heading-artifact-removal"><strong>Artifact Removal</strong></h2>
<p>When a JPEG image is heavily compressed, it develops blocky patches, fuzzy edges, and strange halos around lines. These are called compression artifacts, and they appear because JPEG reduces file size by removing fine detail. Traditional fixes blur the image to hide these defects, but that also softens important edges and textures.</p>
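<p>The "removing fine detail" step can be illustrated in miniature. JPEG splits an image into 8x8 blocks, converts each block into frequency coefficients with a discrete cosine transform (DCT), and rounds those coefficients to multiples of a quantization step. The Rust sketch below simplifies this to a single row of 8 pixels and one uniform step size (real JPEG uses a 2D transform and a separate step for each coefficient), but it shows the same effect: with a coarse step, small high-frequency variations are rounded away. Because every block is quantized independently, neighboring blocks can end up with mismatched values along their shared border, which is where the visible blockiness comes from.</p>

```rust
use std::f64::consts::PI;

const N: usize = 8;

/// Normalization factor that makes the 8-point DCT orthonormal.
fn scale(k: usize) -> f64 {
    if k == 0 { (1.0 / N as f64).sqrt() } else { (2.0 / N as f64).sqrt() }
}

/// Cosine basis function `k` evaluated at sample `n`.
fn basis(k: usize, n: usize) -> f64 {
    (PI * k as f64 * (2 * n + 1) as f64 / (2 * N) as f64).cos()
}

/// Forward DCT-II: pixel values to frequency coefficients (index 0 = average level).
fn dct(block: &[f64; N]) -> [f64; N] {
    let mut coeffs = [0.0; N];
    for (k, c) in coeffs.iter_mut().enumerate() {
        let sum: f64 = block.iter().enumerate().map(|(n, &x)| x * basis(k, n)).sum();
        *c = scale(k) * sum;
    }
    coeffs
}

/// Inverse transform (DCT-III): frequency coefficients back to pixel values.
fn idct(coeffs: &[f64; N]) -> [f64; N] {
    let mut block = [0.0; N];
    for (n, x) in block.iter_mut().enumerate() {
        *x = coeffs.iter().enumerate().map(|(k, &c)| scale(k) * c * basis(k, n)).sum();
    }
    block
}

/// Rounds every coefficient to a multiple of `step` (the lossy part), then
/// rebuilds the pixels. Coefficients smaller than step / 2 become zero.
fn compress_block(block: &[f64; N], step: f64) -> [f64; N] {
    let mut coeffs = dct(block);
    for c in coeffs.iter_mut() {
        *c = (*c / step).round() * step;
    }
    idct(&coeffs)
}
```

<p>For a row that alternates between 100 and 104, a step of 1.0 reconstructs every pixel to within about one gray level, while a step of 20.0 erases the alternating pattern entirely and returns a flat row of roughly 99.</p>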
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756465727105/b74f2d5f-c489-4238-a073-72ce86a5a4a7.png" alt="JPEG Artifact Removal" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><a target="_blank" href="https://github.com/jiaxi-jiang/FBCNN">FBCNN</a>, or Flexible Blind Convolutional Neural Network, takes a smarter approach. Instead of needing to know the exact compression level beforehand, FBCNN is trained to handle a wide range of artifact severities without extra input. This is what makes it “blind”: it doesn’t require metadata about how the JPEG was compressed, and it can adapt its restoration process on the fly.</p>
<p>FBCNN works in two main steps. First, it extracts features from the image, analyzing patterns in edges, textures, and flat areas to identify where artifacts are most likely. Then, it applies a learned mapping to reconstruct what those regions should look like without the damage.</p>
<p>Because it can estimate the compression quality itself, FBCNN avoids the common problem of over-smoothing lightly compressed images or under-restoring heavily compressed ones.</p>
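<p>To make "compression quality" concrete: encoders built on the Independent JPEG Group's libjpeg turn a 1 to 100 quality setting into quantization steps by scaling the standard luminance table from the JPEG specification (Annex K). The Rust sketch below reproduces that scaling rule for baseline JPEG; the function names are illustrative, and other encoders may use different tables or rules. This quality setting is exactly the kind of hidden parameter a blind model like FBCNN has to infer from the pixels alone.</p>

```rust
/// Standard JPEG luminance quantization table (JPEG specification, Annex K),
/// in row-major order.
const STD_LUMINANCE_QUANT: [u16; 64] = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
];

/// Converts a 1-100 quality setting into a percentage scale factor,
/// following libjpeg's quality-scaling rule.
fn quality_scale(quality: u32) -> u32 {
    let q = quality.clamp(1, 100);
    if q < 50 { 5000 / q } else { 200 - q * 2 }
}

/// Builds the luminance quantization table for a given quality.
/// Larger entries mean coarser rounding and more discarded detail.
fn luminance_table(quality: u32) -> [u16; 64] {
    let scale = quality_scale(quality);
    let mut table = [0u16; 64];
    for (out, &base) in table.iter_mut().zip(STD_LUMINANCE_QUANT.iter()) {
        // Round to nearest, then keep within the baseline range 1..=255.
        let scaled = (base as u32 * scale + 50) / 100;
        *out = scaled.clamp(1, 255) as u16;
    }
    table
}
```

<p>At quality 50 the table is used as-is, at quality 10 the first step grows from 16 to 80, and at quality 100 every step becomes 1. The same image saved at different qualities therefore loses very different amounts of detail, which is why a restoration model benefits from estimating this setting.</p>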
<p>This flexibility makes FBCNN useful in many scenarios: cleaning up low-quality images from social media, restoring graphics and text in screenshots, or preparing old compressed web images for printing. Modern AI tools often integrate FBCNN-style processing as a first step before applying super-resolution or general enhancement.</p>
<p>FBCNN’s ability to adapt without manual tuning makes it one of the most practical and developer-friendly models for real-world JPEG restoration today.</p>
<p>See <a target="_blank" href="https://huggingface.co/spaces/KenjieDec/FBCNN">artifact removal</a> in action.</p>
<h2 id="heading-why-these-algorithms-matter-to-developers"><strong>Why These Algorithms Matter to Developers</strong></h2>
<p>Even if you have never trained your own AI model, understanding these algorithms gives you a better sense of what’s possible and how to apply it. Many of the tools mentioned here offer APIs, which means developers can build them into their own apps and websites.</p>
<p>If you run a social platform, you can automatically enhance user-uploaded images before they appear in feeds. If you build e-commerce platforms, you can clean and upscale product images for better sales conversions. If you work in media archiving, you can restore and preserve images without spending hours on manual edits.</p>
<p>The real value comes from knowing which algorithm is right for the problem you’re solving. Super-resolution for enlarging, denoising for cleaning, colorization for restoration, artifact removal for fixing compression, and GAN retouching for overall beautification.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>AI image enhancement has moved from research labs to everyday tools, making it possible for anyone to transform low-quality images into something sharp, vibrant, and professional. The algorithms behind these tools, such as super-resolution, denoising, colorization, artifact removal, and GAN retouching, are the building blocks of modern visual AI.</p>
<p>Whether you’re a developer looking to integrate image processing into your app or a creator who wants to improve your visuals, knowing how these algorithms work will help you get the most out of AI. This is only the beginning, and future models will likely be even more precise, faster, and more capable. Developers who understand these foundations will be ready to make the most of the next wave of AI-powered creativity.</p>
<p><em>Hope you enjoyed this article. Sign up for my free AI newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><em>visit my website</em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Blend Images in Rust Using Pixel Math ]]>
                </title>
                <description>
                    <![CDATA[ For anyone looking to learn about image processing as a programming niche, blending images is a very good place to start. It's one of the simplest yet most rewarding techniques when it comes to image processing. To help your intuition, it's best to i... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-blend-images-in-rust-using-pixel-math/</link>
                <guid isPermaLink="false">66cda9b4e220ecb31e3a2239</guid>
                
                    <category>
                        <![CDATA[ Rust ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Mathematics ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Anshul Sanghi ]]>
                </dc:creator>
                <pubDate>Tue, 27 Aug 2024 10:25:56 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1724689572465/f03e4b74-1091-4673-af5b-c8827e74caf0.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>For anyone looking to learn about image processing as a programming niche, blending images is a very good place to start. It's one of the simplest yet most rewarding techniques when it comes to image processing.</p>
<p>To help your intuition, it's best to imagine an image as a mathematical graph of pixel values plotted along the x and y coordinates. The top-left pixel in an image is your origin, which corresponds to an x value of 0 and a y value of 0. Unlike a typical math graph, though, y increases as you move down the image.</p>
<p>Once you imagine this, any pixel in an image can be read or modified using its coordinate in this x-y graph. For example, for a square image of size 5px x 5px, the coordinate of the center pixel is (2, 2). You may have expected it to be (3, 3), but image coordinates in this context work like array indexes and start from 0 on both axes.</p>
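<p>Here is the same idea as a small Rust sketch (the function names are illustrative and not part of the <code>image</code> crate). It computes the center pixel of an image and converts an (x, y) coordinate into a position in a flat, row-major list of pixels, which is the layout this tutorial uses later to store pixel data.</p>

```rust
/// Returns the (x, y) coordinate of the center pixel. Coordinates start at 0,
/// so for odd sizes this is the exact middle; for even sizes there is no single
/// middle pixel and this returns the one just past the halfway point.
fn center(width: u32, height: u32) -> (u32, u32) {
    (width / 2, height / 2)
}

/// Converts an (x, y) coordinate into an index in a row-major 1D buffer:
/// skip `y` complete rows of `width` pixels, then move `x` pixels along the row.
fn linear_index(x: u32, y: u32, width: u32) -> usize {
    // Widen to usize first so large images cannot overflow u32 arithmetic.
    y as usize * width as usize + x as usize
}
```

<p>For the 5x5 example, the center pixel (2, 2) sits at index 2 * 5 + 2 = 12, the 13th entry in the flat list.</p>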
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724421916445/8d27ec1d-43f5-4cc3-b706-b9bd2efb05a4.png" alt="mathematical graph with x and y axis" class="image--center mx-auto" width="2786" height="1435" loading="lazy"></p>
<p>Approaching image processing this way also helps you address each pixel individually, making the process much simpler.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>This article focuses on helping you understand and learn how to blend images using the Rust programming language, without going into the details of the language or its syntax. So you should already be comfortable writing Rust programs.</p>
<p>If you're not familiar with Rust, I highly encourage you to learn the basics. <a target="_blank" href="https://www.freecodecamp.org/news/rust-in-replit/">Here's an interactive Rust course that can get you started.</a></p>
<h2 id="heading-table-of-contents">Table Of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-introduction">Introduction</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-image-blending-works">How Image Blending Works</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-read-pixel-values">How to Read Pixel Values</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-blend-functions">How to Blend Functions</a></p>
<ol>
<li><p><a class="post-section-overview" href="#heading-average-blend">Average Blend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-multiply-blend">Multiply Blend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-lighten-blend">Lighten Blend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-darken-blend">Darken Blend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-screen-blend">Screen Blend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-addition-blend">Addition Blend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-subtraction-blend">Subtraction Blend</a></p>
</li>
</ol>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-apply-blend-functions-to-images">How to Apply Blend Functions To Images</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-putting-it-all-together">Putting It All Together</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-glossary">Glossary</a></p>
</li>
</ol>
<h2 id="heading-introduction">Introduction</h2>
<p>Image blending refers to the technique of merging pixels from multiple images to create a single output image that is derived from all of its inputs. Depending on which blending operation is used, the image output can vary widely given the same inputs.</p>
<p>This technique is the basis for many more complex image processing tools, some of which you may already be familiar with. Removing moving people from a scene using several shots of it, stacking images of the night sky to create star trails, and combining multiple noisy images into a single noise-reduced one are all examples of this technique at play.</p>
<p>To blend images in this tutorial, we will use "pixel math", which, while not a standard term, refers to performing mathematical operations on a pixel or set of pixels to generate an output pixel.</p>
<p>For example, to blend two images using the "average" blend mode, you compute the average of all input pixels at a given location to generate the output pixel at that same location.</p>
<p>Pixel math is not limited to point operations, though. A point operation computes each output pixel only from the input pixel (or pixels, when there are several images) at the same location in the x-y coordinate system. Blending is a point operation, but many other techniques also use the pixels surrounding that location.</p>
<p>In my experience so far, the field of image processing is 99% mathematics and 1% black magic. Mathematical operations on a pixel and its surrounding pixels are the basis of image manipulation techniques such as compression, resizing, blurring and sharpening, noise reduction, and so on.</p>
<h2 id="heading-how-image-blending-works"><strong>How Image Blending Works</strong></h2>
<p>The technique is simple to implement. Let's take the example of a basic average blend. Here's how it works:</p>
<ol>
<li><p>Read the pixel data of both images into memory, usually into an array for each image.</p>
<ul>
<li>The array is usually 2-dimensional. For color images, each entry in the array is itself an array holding the pixel's 3 values, one each for the Red, Green, and Blue color channels.</li>
</ul>
</li>
<li><p>For each pixel location:</p>
<ol>
<li><p>For each channel:<br> a. Take the value of the channel from the 1st image, let's consider it <code>x</code>.<br> b. Take the value of the channel from the 2nd image, let's consider it <code>y</code>.<br> c. Perform the averaging operation <code>x/2 + y/2</code>.<br> d. Save the output value of this operation as the value of the output channel.</p>
</li>
<li><p>Save the results of the previous operations as the value of the output pixel.</p>
</li>
</ol>
</li>
<li><p>Construct the output image with the same dimensions from the computed data.</p>
</li>
</ol>
<p>You'll notice that pixel math is performed on a per-channel basis. This is true for all the blend modes covered in this tutorial, but many other techniques apply blends between the channels themselves, often within the same image.</p>
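<p>As a standalone illustration of these steps, here is a minimal Rust sketch, separate from the project code built later, that averages two equally sized images stored as flat lists of <code>[u8; 3]</code> pixels. It widens each channel to <code>u16</code> before adding, so the sum of two 255 values cannot overflow a <code>u8</code>, and it keeps the half that <code>x/2 + y/2</code> drops when both values are odd. The function name is illustrative.</p>

```rust
/// Averages two images channel by channel. Both slices hold row-major RGB
/// pixels and must describe images of the same size.
fn average_blend(image1: &[[u8; 3]], image2: &[[u8; 3]]) -> Vec<[u8; 3]> {
    assert_eq!(image1.len(), image2.len(), "images must be the same size");
    image1
        .iter()
        .zip(image2)
        .map(|(p1, p2)| {
            let mut out = [0u8; 3];
            for channel in 0..3 {
                // Widen to u16 so 255 + 255 fits, then halve (rounding down).
                let sum = p1[channel] as u16 + p2[channel] as u16;
                out[channel] = (sum / 2) as u8;
            }
            out
        })
        .collect()
}
```

<p>The <code>u8</code> form <code>x/2 + y/2</code> also avoids overflow, but it rounds each half down separately, so two odd values such as 255 and 255 average to 254 instead of 255.</p>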
<h2 id="heading-project-setup"><strong>Project Setup</strong></h2>
<p>Let's get started by setting up a project that gives us a good baseline to work with.</p>
<pre><code class="lang-bash">cargo new --bin image-blender
<span class="hljs-built_in">cd</span> image-blender
</code></pre>
<p>You will also need a single dependency to help you perform these operations:</p>
<pre><code class="lang-bash">cargo add image
</code></pre>
<p><code>image</code> is a Rust library we'll use to work with images of all of the standard formats and encodings. It also helps us convert between various formats, and provides easy access to pixel data as buffers.</p>
<p>For more information on the <code>image</code> crate, you can refer to the <a target="_blank" href="https://docs.rs/image/">official documentation</a>.</p>
<p>To follow along, you can use any two images, the only requirement being that they should be of the same size and in the same format. You can also find the images used in this tutorial, along with complete code, <a target="_blank" href="https://github.com/anshulsanghi-blog/blend-images">in the GitHub repository here</a>.</p>
<h2 id="heading-how-to-read-pixel-values"><strong>How to Read Pixel Values</strong></h2>
<p>The first step is to load the images and read their pixel values into a data structure that suits our operations. For this tutorial, we're going to use a <code>Vec</code> of arrays (<code>Vec&lt;[u8; 3]&gt;</code>). Each entry in the outer <code>Vec</code> represents a pixel (stored row by row, starting from the top-left), and the channel-wise values of each pixel are stored in a <code>[u8; 3]</code> array.</p>
<p>Let's start by creating a new file to hold this code called <strong>io.rs</strong>.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/io.rs</span>

<span class="hljs-keyword">use</span> image::GenericImageView;

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">SourceData</span></span> {
    <span class="hljs-keyword">pub</span> width: <span class="hljs-built_in">usize</span>,
    <span class="hljs-keyword">pub</span> height: <span class="hljs-built_in">usize</span>,
    <span class="hljs-keyword">pub</span> image1: <span class="hljs-built_in">Vec</span>&lt;[<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]&gt;,
    <span class="hljs-keyword">pub</span> image2: <span class="hljs-built_in">Vec</span>&lt;[<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]&gt;,
}

<span class="hljs-keyword">pub</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">read_pixel_data</span></span>(image1_path: <span class="hljs-built_in">String</span>, image2_path: <span class="hljs-built_in">String</span>) -&gt; SourceData {
    <span class="hljs-comment">// Open the images</span>
    <span class="hljs-keyword">let</span> image1 = image::open(image1_path).unwrap();
    <span class="hljs-keyword">let</span> image2 = image::open(image2_path).unwrap();

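    <span class="hljs-comment">// Compute image dimensions. Both images must be the same size, otherwise the</span>
    <span class="hljs-comment">// second loop below could index past the end of image2_data or misplace pixels.</span>
    <span class="hljs-built_in">assert_eq!</span>(image1.dimensions(), image2.dimensions(), <span class="hljs-string">"Both input images must have the same dimensions"</span>);
    <span class="hljs-keyword">let</span> (width, height) = image1.dimensions();
    <span class="hljs-keyword">let</span> (width, height) = (width <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, height <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>);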

    <span class="hljs-comment">// Create arrays to hold input pixel data</span>
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> image1_data: <span class="hljs-built_in">Vec</span>&lt;[<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]&gt; = <span class="hljs-built_in">vec!</span>[[<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>]; width * height];
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> image2_data: <span class="hljs-built_in">Vec</span>&lt;[<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]&gt; = <span class="hljs-built_in">vec!</span>[[<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>]; width * height];

    <span class="hljs-comment">// Iterate over all pixels in the input image, along with their positions in x &amp; y</span>
    <span class="hljs-comment">// coordinates.</span>
    <span class="hljs-keyword">for</span> (x, y, pixel) <span class="hljs-keyword">in</span> image1.to_rgb8().enumerate_pixels() {
        <span class="hljs-comment">// Compute the raw values for each channel in the RGB pixel.</span>
        <span class="hljs-keyword">let</span> [r, g, b] = pixel.<span class="hljs-number">0</span>;

        <span class="hljs-comment">// Compute linear index based on 2D index. This is basically computing index in</span>
        <span class="hljs-comment">// 1D array based on the row and column index of the pixel in the 2D image.</span>
        <span class="hljs-keyword">let</span> index = (y * (width <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>) + x) <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>;

        <span class="hljs-comment">// Save the channel-wise values in the correct index in data arrays.</span>
        image1_data[index] = [r, g, b];
    }

    <span class="hljs-comment">// Iterate over all pixels in the input image, along with their positions in x &amp; y</span>
    <span class="hljs-comment">// coordinates.</span>
    <span class="hljs-keyword">for</span> (x, y, pixel) <span class="hljs-keyword">in</span> image2.to_rgb8().enumerate_pixels() {
        <span class="hljs-comment">// Compute the raw values for each channel in the RGB pixel.</span>
        <span class="hljs-keyword">let</span> [r, g, b] = pixel.<span class="hljs-number">0</span>;

        <span class="hljs-comment">// Compute linear index based on 2D index. This is basically computing index in</span>
        <span class="hljs-comment">// 1D array based on the row and column index of the pixel in the 2D image.</span>
        <span class="hljs-keyword">let</span> index = (y * (width <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>) + x) <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>;

        <span class="hljs-comment">// Save the channel-wise values in the correct index in data arrays.</span>
        image2_data[index] = [r, g, b];
    }

    SourceData {
        width,
        height,
        image1: image1_data,
        image2: image2_data,
    }
}
</code></pre>
<h2 id="heading-how-to-blend-functions">How to Blend Functions</h2>
<p>The next step is to implement the blending functions, which are pure functions that take two pixel values as input and return the output value. This is implemented through the <code>BlendOperation</code> trait defined below. Let's create a new file to host all the operations called <strong>operations.rs</strong>.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/operations.rs</span>

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">trait</span> <span class="hljs-title">BlendOperation</span></span> {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">perform_operation</span></span>(&amp;<span class="hljs-keyword">self</span>, pixel1: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>], pixel2: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]) -&gt; [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>];
}
</code></pre>
<p>Next, we need to implement this trait for each of the blending modes we want to support.</p>
<p>To showcase the result of each blending mode, we'll blend the following two input images together:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724236939605/77d32c76-abf6-4d24-bba7-df40729863b8.jpeg" alt="Source image 1: Fireflies in a dark forest area" class="image--center mx-auto" width="3000" height="1996" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724428339241/3cc70fd2-f6da-4704-8606-97c094a2ff35.jpeg" alt="Source image 2: Fireflies in a bright forest area" class="image--center mx-auto" width="3000" height="1996" loading="lazy"></p>
<h3 id="heading-average-blend">Average Blend</h3>
<p>An average blend takes the channel-wise average of the input pixel values to produce the output pixel.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/operations.rs</span>

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">AverageBlend</span></span>;

<span class="hljs-keyword">impl</span> BlendOperation <span class="hljs-keyword">for</span> AverageBlend {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">perform_operation</span></span>(&amp;<span class="hljs-keyword">self</span>, pixel1: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>], pixel2: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]) -&gt; [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>] {
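        <span class="hljs-comment">// Widen to u16 so the sum can't overflow, then halve once so that</span>
        <span class="hljs-comment">// no precision is lost to rounding each input separately.</span>
        [
            ((pixel1[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span> + pixel2[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>) / <span class="hljs-number">2</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            ((pixel1[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span> + pixel2[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>) / <span class="hljs-number">2</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            ((pixel1[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span> + pixel2[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>) / <span class="hljs-number">2</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
        ]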
    }
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724236691772/291f14f4-2019-4771-8cd2-b9f9b3cf3f86.jpeg" alt="Result of average blending source images" class="image--center mx-auto" width="3000" height="1996" loading="lazy"></p>
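<p>To see why the order of operations matters in an average blend, here is a minimal sketch comparing two ways of averaging a single 8-bit channel. The function names are hypothetical and not part of the project. Halving each value before adding drops the remainder of both divisions, while widening to <code>u16</code> first keeps the sum exact and only truncates once.</p>
<pre><code class="lang-rust">/// Halves each value before adding, so both remainders are lost.
fn average_halve_first(a: u8, b: u8) -> u8 {
    a / 2 + b / 2
}

/// Widens to u16 so the sum (at most 510) can't overflow, then halves once.
fn average_widened(a: u8, b: u8) -> u8 {
    ((a as u16 + b as u16) / 2) as u8
}
</code></pre>
<p>With these definitions, averaging two white values (255 and 255) gives 254 with the first approach and 255 with the second, and averaging 1 and 1 gives 0 versus 1. The difference is at most one intensity level, so it's rarely visible in a single blend.</p>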
<h3 id="heading-multiply-blend">Multiply Blend</h3>
<p>A multiply blend multiplies the input pixel values channel by channel after they've been normalized<a class="post-section-overview" href="#heading-glossary">[¹]</a>. The result is then rescaled back to the original range by multiplying it by 255.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/operations.rs</span>

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">MultiplyBlend</span></span>;

<span class="hljs-keyword">impl</span> BlendOperation <span class="hljs-keyword">for</span> MultiplyBlend {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">perform_operation</span></span>(&amp;<span class="hljs-keyword">self</span>, pixel1: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>], pixel2: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]) -&gt; [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>] {
        [
            ((pixel1[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>. * pixel2[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>.) * <span class="hljs-number">255</span>.) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            ((pixel1[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>. * pixel2[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>.) * <span class="hljs-number">255</span>.) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            ((pixel1[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>. * pixel2[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>.) * <span class="hljs-number">255</span>.) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
        ]
    }
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724236703622/9aff3ffd-9a63-4b76-9675-d7db4ccee89b.jpeg" alt="Result of multiply blending source images" class="image--center mx-auto" width="3000" height="1996" loading="lazy"></p>
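<p>Because <code>(a / 255) * (b / 255) * 255</code> simplifies to <code>a * b / 255</code>, the same multiply blend can also be sketched with integer arithmetic only. This is an alternative formulation, not the one used above, and it should agree with the floating-point version up to small rounding differences. The function name is hypothetical.</p>
<pre><code class="lang-rust">/// Multiply blend for one 8-bit channel using only integer arithmetic.
fn multiply_channel(a: u8, b: u8) -> u8 {
    // 255 * 255 = 65_025 fits in a u16, so widening avoids overflow.
    // Integer division truncates, like the `as u8` cast in the float version.
    ((a as u16 * b as u16) / 255) as u8
}
</code></pre>
<p>Multiplying by white (255) leaves a channel unchanged, multiplying by black (0) always gives black, and two mid-gray values of 128 give 64. The result is never brighter than either input, which is why multiply blends darken an image.</p>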
<h3 id="heading-lighten-blend">Lighten Blend</h3>
<p>A lighten blend compares the input pixel values channel by channel and selects the higher value (intensity) for each channel of the output pixel.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/operations.rs</span>

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">LightenBlend</span></span>;

<span class="hljs-keyword">impl</span> BlendOperation <span class="hljs-keyword">for</span> LightenBlend {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">perform_operation</span></span>(&amp;<span class="hljs-keyword">self</span>, pixel1: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>], pixel2: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]) -&gt; [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>] {
        [
            pixel1[<span class="hljs-number">0</span>].max(pixel2[<span class="hljs-number">0</span>]),
            pixel1[<span class="hljs-number">1</span>].max(pixel2[<span class="hljs-number">1</span>]),
            pixel1[<span class="hljs-number">2</span>].max(pixel2[<span class="hljs-number">2</span>]),
        ]
    }
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724236726111/5d1607fb-2740-46b8-906d-1ffb482a0561.jpeg" alt="Result of lighten blending source images" class="image--center mx-auto" width="3000" height="1996" loading="lazy"></p>
<h3 id="heading-darken-blend">Darken Blend</h3>
<p>A darken blend is the opposite of a lighten blend. It compares the input pixel values channel by channel and selects the lower value (intensity) for each channel of the output pixel.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/operations.rs</span>

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">DarkenBlend</span></span>;

<span class="hljs-keyword">impl</span> BlendOperation <span class="hljs-keyword">for</span> DarkenBlend {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">perform_operation</span></span>(&amp;<span class="hljs-keyword">self</span>, pixel1: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>], pixel2: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]) -&gt; [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>] {
        [
            pixel1[<span class="hljs-number">0</span>].min(pixel2[<span class="hljs-number">0</span>]),
            pixel1[<span class="hljs-number">1</span>].min(pixel2[<span class="hljs-number">1</span>]),
            pixel1[<span class="hljs-number">2</span>].min(pixel2[<span class="hljs-number">2</span>]),
        ]
    }
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724236746972/18307fa1-1a77-4d39-b233-a7a6d87233d0.jpeg" alt="Result of darken blending source images" class="image--center mx-auto" width="3000" height="1996" loading="lazy"></p>
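<p>Lighten and darken differ only in the per-channel function they apply, so one way to reduce the repetition in these implementations is a small helper that applies a function to each pair of matching channels. This is a sketch that assumes every blend mode works independently on each channel; <code>zip_channels</code>, <code>lighten</code>, and <code>darken</code> are hypothetical names, and <code>std::array::from_fn</code> requires Rust 1.63 or later.</p>
<pre><code class="lang-rust">/// Applies `f` to each pair of matching channels from the two pixels.
fn zip_channels(p1: [u8; 3], p2: [u8; 3], f: fn(u8, u8) -> u8) -> [u8; 3] {
    std::array::from_fn(|i| f(p1[i], p2[i]))
}

/// Keeps the brighter value of each channel.
fn lighten(p1: [u8; 3], p2: [u8; 3]) -> [u8; 3] {
    zip_channels(p1, p2, u8::max)
}

/// Keeps the darker value of each channel.
fn darken(p1: [u8; 3], p2: [u8; 3]) -> [u8; 3] {
    zip_channels(p1, p2, u8::min)
}
</code></pre>
<p>Because the comparison happens per channel, the output can mix channels from both inputs: lightening <code>[10, 200, 30]</code> with <code>[50, 100, 30]</code> gives <code>[50, 200, 30]</code>, which matches neither input pixel.</p>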
<h3 id="heading-screen-blend">Screen Blend</h3>
<p>A screen blend multiplies the inverses of the two images and then inverts the result. In our implementation, the pixels first need to be normalized<a class="post-section-overview" href="#heading-glossary">[¹]</a>. The normalized<a class="post-section-overview" href="#heading-glossary">[¹]</a> values are inverted by subtracting them from 1, multiplied together, and then inverted again.</p>
<p>Finally, the output is multiplied by 255 to de-normalize the output pixel value.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/operations.rs</span>

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">ScreenBlend</span></span>;

<span class="hljs-keyword">impl</span> BlendOperation <span class="hljs-keyword">for</span> ScreenBlend {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">perform_operation</span></span>(&amp;<span class="hljs-keyword">self</span>, pixel1: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>], pixel2: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]) -&gt; [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>] {
        [
            ((<span class="hljs-number">1</span>. - ((<span class="hljs-number">1</span>. - (pixel1[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>.)) * (<span class="hljs-number">1</span>. - (pixel2[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>.)))) * <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            ((<span class="hljs-number">1</span>. - ((<span class="hljs-number">1</span>. - (pixel1[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>.)) * (<span class="hljs-number">1</span>. - (pixel2[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>.)))) * <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            ((<span class="hljs-number">1</span>. - ((<span class="hljs-number">1</span>. - (pixel1[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>.)) * (<span class="hljs-number">1</span>. - (pixel2[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-number">255</span>.)))) * <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
        ]
    }
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724236758380/fd531b6e-729c-4db4-987e-f503478ff950.jpeg" alt="Result of screen blending source images" class="image--center mx-auto" width="3000" height="1996" loading="lazy"></p>
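<p>Expanding the product in the screen formula gives an equivalent form on normalized values, <code>a + b - a * b</code>. It equals <code>a + b * (1 - a)</code>, and since the second term is never negative, the result is never darker than either input. A minimal sketch comparing both forms on normalized <code>f32</code> channels, with hypothetical function names:</p>
<pre><code class="lang-rust">/// Screen blend as described above: invert both inputs, multiply, invert again.
fn screen_inverted(a: f32, b: f32) -> f32 {
    1.0 - (1.0 - a) * (1.0 - b)
}

/// The same blend after expanding the product.
fn screen_expanded(a: f32, b: f32) -> f32 {
    a + b - a * b
}
</code></pre>
<p>For two mid-gray inputs of 0.5, both forms give 0.75. In general they agree up to floating-point rounding, so either can be used before de-normalizing with 255.</p>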
<h3 id="heading-addition-blend">Addition Blend</h3>
<p>An addition blend adds the input values and then clamps the result to the range of the color depth we're targeting, which is 0-255 for 8-bit color depth.</p>
<p>We also have to convert the values to u16 before adding them, because the sum of two 8-bit values can exceed 255 and overflow a u8. We could also use normalized<a class="post-section-overview" href="#heading-glossary">[¹]</a> values here to achieve the same result.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/operations.rs</span>

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">AdditionBlend</span></span>;

<span class="hljs-keyword">impl</span> BlendOperation <span class="hljs-keyword">for</span> AdditionBlend {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">perform_operation</span></span>(&amp;<span class="hljs-keyword">self</span>, pixel1: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>], pixel2: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]) -&gt; [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>] {
        [
            (pixel1[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span> + pixel2[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>).clamp(<span class="hljs-number">0</span>, <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            (pixel1[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span> + pixel2[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>).clamp(<span class="hljs-number">0</span>, <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            (pixel1[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span> + pixel2[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>).clamp(<span class="hljs-number">0</span>, <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
        ]
    }
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724236766684/05f01177-024d-4196-a9fa-5274bb56a0f4.jpeg" alt="Result of addition blending source images" class="image--center mx-auto" width="3000" height="1996" loading="lazy"></p>
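<p>Rust's integer types also provide saturating arithmetic, which clamps at the type's limits instead of overflowing. As a sketch of an alternative to widening and clamping (the function name is hypothetical), <code>u8::saturating_add</code> produces the same result without the intermediate <code>u16</code>:</p>
<pre><code class="lang-rust">/// Addition blend: each channel sum is capped at 255 instead of wrapping around.
fn addition_blend(p1: [u8; 3], p2: [u8; 3]) -> [u8; 3] {
    std::array::from_fn(|i| p1[i].saturating_add(p2[i]))
}
</code></pre>
<p>For example, adding <code>[200, 10, 255]</code> and <code>[100, 20, 1]</code> gives <code>[255, 30, 255]</code>.</p>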
<h3 id="heading-subtraction-blend">Subtraction Blend</h3>
<p>A subtraction blend subtracts the second input value from the first and then clamps the result to the range of the color depth we're targeting, which is 0-255 for 8-bit color depth.</p>
<p>We also convert the values to i16 before subtracting them, because the difference can be negative, which a u8 can't represent. We could also use normalized<a class="post-section-overview" href="#heading-glossary">[¹]</a> values here to achieve the same result.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/operations.rs</span>

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">SubtractionBlend</span></span>;

<span class="hljs-keyword">impl</span> BlendOperation <span class="hljs-keyword">for</span> SubtractionBlend {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">perform_operation</span></span>(&amp;<span class="hljs-keyword">self</span>, pixel1: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>], pixel2: [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>]) -&gt; [<span class="hljs-built_in">u8</span>; <span class="hljs-number">3</span>] {
        [
            (pixel1[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">i16</span> - pixel2[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">i16</span>).clamp(<span class="hljs-number">0</span>, <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">i16</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            (pixel1[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">i16</span> - pixel2[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">i16</span>).clamp(<span class="hljs-number">0</span>, <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">i16</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
            (pixel1[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">i16</span> - pixel2[<span class="hljs-number">2</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">i16</span>).clamp(<span class="hljs-number">0</span>, <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">i16</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>,
        ]
    }
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724236775603/507ba176-579d-494f-bb56-25a27ed2317f.jpeg" alt="Result of subtraction blending source images" class="image--center mx-auto" width="3000" height="1996" loading="lazy"></p>
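<p>The subtraction counterpart can be sketched the same way with <code>u8::saturating_sub</code>, which stops at 0 instead of wrapping around. Unlike the other modes in this article, subtraction isn't symmetric, so the order of the inputs matters. The function name is hypothetical:</p>
<pre><code class="lang-rust">/// Subtraction blend: each channel difference is floored at 0.
fn subtraction_blend(p1: [u8; 3], p2: [u8; 3]) -> [u8; 3] {
    std::array::from_fn(|i| p1[i].saturating_sub(p2[i]))
}
</code></pre>
<p>Subtracting <code>[50, 20, 0]</code> from <code>[100, 10, 255]</code> gives <code>[50, 0, 255]</code>, while swapping the arguments gives <code>[0, 10, 0]</code>.</p>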
<h2 id="heading-how-to-apply-blend-functions-to-images">How to Apply Blend Functions To Images</h2>
<p>The final step is to actually use the blending operations we created previously and apply them to pairs of images.</p>
<p>To achieve this, we need a method on the <code>SourceData</code> type we defined previously that takes a blending operation as an argument and returns the final output buffer. Let's start by creating a new file for it called <strong>blend.rs</strong>.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/blend.rs</span>

<span class="hljs-keyword">use</span> image::{ImageBuffer, Rgb};
<span class="hljs-keyword">use</span> crate::{operations::BlendOperation, SourceData};

<span class="hljs-keyword">impl</span> SourceData {
    <span class="hljs-keyword">pub</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">blend_images</span></span>(&amp;<span class="hljs-keyword">self</span>, operation: <span class="hljs-keyword">impl</span> BlendOperation)  -&gt; ImageBuffer&lt;Rgb&lt;<span class="hljs-built_in">u8</span>&gt;, <span class="hljs-built_in">Vec</span>&lt;<span class="hljs-built_in">u8</span>&gt;&gt; {
        <span class="hljs-keyword">let</span> SourceData {
            width,
            height,
            image1,
            image2,
        } = <span class="hljs-keyword">self</span>;

        <span class="hljs-comment">// Create a new buffer that has the same size as input images, which will serve as our output data</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> buffer = ImageBuffer::new(*width <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>, *height <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>);

        <span class="hljs-comment">// Iterate over all pixels in the output buffer, along with their coordinates</span>
        <span class="hljs-keyword">for</span> (x, y, output_pixel) <span class="hljs-keyword">in</span> buffer.enumerate_pixels_mut() {
            <span class="hljs-comment">// Compute the linear index from the x &amp; y coordinates. In other words, you have the</span>
            <span class="hljs-comment">// row and column indexes here, and you want to compute the array index based</span>
            <span class="hljs-comment">// on these two positions.</span>
            <span class="hljs-keyword">let</span> index = (y * *width <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span> + x) <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>;

            <span class="hljs-comment">// Store pixel values in the given position into variables</span>
            <span class="hljs-keyword">let</span> pixel1 = image1[index];
            <span class="hljs-keyword">let</span> pixel2 = image2[index];

            <span class="hljs-comment">// Compute the blended pixel and convert it into the `Rgb` type, which is then</span>
            <span class="hljs-comment">// assigned to the output pixel in the buffer.</span>
            *output_pixel = Rgb::from(operation.perform_operation(pixel1, pixel2));
        }

        buffer
    }
}
</code></pre>
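<p>The index calculation in <code>blend_images</code> assumes the pixels are stored in row-major order: all pixels of row 0, then all pixels of row 1, and so on. A small sketch of that mapping and its inverse, with hypothetical function names. Doing the multiplication in <code>usize</code> rather than <code>u32</code> is a defensive choice that, on 64-bit targets, rules out overflow for very large images:</p>
<pre><code class="lang-rust">/// Row-major linear index of the pixel at column `x` and row `y`.
fn linear_index(x: u32, y: u32, width: u32) -> usize {
    y as usize * width as usize + x as usize
}

/// Inverse mapping from a linear index back to (x, y) coordinates.
fn coordinates(index: usize, width: u32) -> (u32, u32) {
    let width = width as usize;
    ((index % width) as u32, (index / width) as u32)
}
</code></pre>
<p>In an image 4 pixels wide, the pixel at <code>x = 1</code>, <code>y = 2</code> is at index <code>2 * 4 + 1 = 9</code>, and index 9 maps back to <code>(1, 2)</code>.</p>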
<h3 id="heading-putting-it-all-together">Putting It All Together</h3>
<p>It's now time to put everything you've learned so far together in the <strong>main.rs</strong> file.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// src/main.rs</span>

<span class="hljs-keyword">mod</span> blend;
<span class="hljs-keyword">mod</span> io;
<span class="hljs-keyword">mod</span> operations;

<span class="hljs-keyword">use</span> io::*;
<span class="hljs-keyword">use</span> operations::{
    AdditionBlend, AverageBlend, DarkenBlend, LightenBlend, MultiplyBlend, ScreenBlend,
    SubtractionBlend,
};

<span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">main</span></span>() {
    <span class="hljs-keyword">let</span> source_data = read_pixel_data(<span class="hljs-string">"image1.jpg"</span>.to_string(), <span class="hljs-string">"image2.jpg"</span>.to_string());

    <span class="hljs-keyword">let</span> output_buffer = source_data.blend_images(AdditionBlend);
    output_buffer.save(<span class="hljs-string">"addition.jpg"</span>).unwrap();

    <span class="hljs-keyword">let</span> output_buffer = source_data.blend_images(AverageBlend);
    output_buffer.save(<span class="hljs-string">"average.jpg"</span>).unwrap();

    <span class="hljs-keyword">let</span> output_buffer = source_data.blend_images(DarkenBlend);
    output_buffer.save(<span class="hljs-string">"darken.jpg"</span>).unwrap();

    <span class="hljs-keyword">let</span> output_buffer = source_data.blend_images(LightenBlend);
    output_buffer.save(<span class="hljs-string">"lighten.jpg"</span>).unwrap();

    <span class="hljs-keyword">let</span> output_buffer = source_data.blend_images(MultiplyBlend);
    output_buffer.save(<span class="hljs-string">"multiply.jpg"</span>).unwrap();

    <span class="hljs-keyword">let</span> output_buffer = source_data.blend_images(ScreenBlend);
    output_buffer.save(<span class="hljs-string">"screen.jpg"</span>).unwrap();

    <span class="hljs-keyword">let</span> output_buffer = source_data.blend_images(SubtractionBlend);
    output_buffer.save(<span class="hljs-string">"subtraction.jpg"</span>).unwrap();
}
</code></pre>
<p>You can now run the program using the following command, and you should have all the images generated and saved in the project folder:</p>
<pre><code class="lang-bash">cargo run --release
</code></pre>
<p>As you might have guessed already, this implementation only works for 8-bit RGB images. However, the code can be extended to support other color formats, such as 8-bit Luma (monochrome), 16-bit RGB (used for many RAW camera images), and so on.</p>
<p>I highly encourage you to try extending it yourself. You can also reach out to me for help with anything in this tutorial, including extending the code. I'd be happy to answer your questions. Email is the best way to reach me: <a target="_blank" href="mailto:anshul@anshulsanghi.tech">anshul@anshulsanghi.tech</a>.</p>
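<p>As a sketch of what such an extension might look like for 16-bit RGB, the multiply blend only needs a different maximum value (<code>u16::MAX</code> instead of 255) and a wider intermediate type. This assumes the full 0-65535 range is used; some cameras store 12- or 14-bit data in 16-bit containers, in which case the maximum would differ. The function names are hypothetical:</p>
<pre><code class="lang-rust">/// Multiply blend for one 16-bit channel.
fn multiply_channel_u16(a: u16, b: u16) -> u16 {
    // 65_535 * 65_535 fits in a u32, so widen before multiplying.
    ((a as u32 * b as u32) / u16::MAX as u32) as u16
}

/// Applies the 16-bit multiply to every channel of an RGB pixel.
fn multiply_blend_rgb16(p1: [u16; 3], p2: [u16; 3]) -> [u16; 3] {
    std::array::from_fn(|i| multiply_channel_u16(p1[i], p2[i]))
}
</code></pre>
<p>Working with normalized values, as described in the glossary, is another way to share a single implementation across color depths.</p>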
<h3 id="heading-glossary">Glossary</h3>
<p>Normalization refers to rescaling pixel values into floating-point numbers in the range 0-1. For example, in an 8-bit image, black is represented by 0 (0 as a de-normalized value) and white by 1 (255 as a de-normalized value). Intermediate values between 0 &amp; 1 represent different intensities of the pixel between black and white. Normalization is done for many different reasons, such as:</p>
<ul>
<li><p>Preventing overflows during calculations.</p>
</li>
<li><p>Re-scaling images to the same range irrespective of their individual color depth.</p>
</li>
<li><p>Expanding possible dynamic range of the image.</p>
</li>
</ul>
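<p>A minimal sketch of normalization and de-normalization for 8-bit and 16-bit channels, with hypothetical function names. Both depths map to the same 0-1 range, which is what allows images of different color depths to be processed together. Clamping and rounding on the way back are design choices made here; the blend code above truncates with <code>as u8</code> instead.</p>
<pre><code class="lang-rust">/// Maps an 8-bit value (0-255) to the range 0.0-1.0.
fn normalize_u8(value: u8) -> f32 {
    value as f32 / u8::MAX as f32
}

/// Maps a 16-bit value (0-65535) to the same 0.0-1.0 range.
fn normalize_u16(value: u16) -> f32 {
    value as f32 / u16::MAX as f32
}

/// Maps a normalized value back to 8 bits, clamping out-of-range input
/// and rounding to the nearest level.
fn denormalize_u8(value: f32) -> u8 {
    (value.clamp(0.0, 1.0) * u8::MAX as f32).round() as u8
}
</code></pre>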
<h3 id="heading-enjoying-my-work"><strong>Enjoying my work?</strong></h3>
<p>Consider buying me a coffee to support my work!</p>
<p><a target="_blank" href="https://www.buymeacoffee.com/anshulsanghi"><img src="https://img.buymeacoffee.com/button-api/?text=Buy%20me%20a%20coffee&amp;emoji=%E2%98%95&amp;slug=anshulsanghi&amp;button_colour=FFDD00&amp;font_colour=000000&amp;font_family=Cookie&amp;outline_colour=000000&amp;coffee_colour=ffffff" alt="Buy me a coffee" width="235" height="50" loading="lazy"></a></p>
<p>Till next time, happy coding and wishing you clear skies!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Rust Tutorial – How to Build a Naïve Star Detector for Images ]]>
                </title>
                <description>
                    <![CDATA[ Star detection is a crucial step in many of the processing and analysis routines that we perform on astronomical images. It is extremely important for a process called plate-solving, which is the process of figuring out which part of the sky an image... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/rust-tutorial-naive-star-detector-for-images/</link>
                <guid isPermaLink="false">66bb57a029aa951a4c0628bc</guid>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Rust ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Anshul Sanghi ]]>
                </dc:creator>
                <pubDate>Tue, 16 Apr 2024 19:34:07 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/04/cover.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Star detection is a crucial step in many of the processing and analysis routines that we perform on astronomical images. It is extremely important for a process called plate-solving, which is the process of figuring out which part of the sky an image shows, or which part of the sky your telescope is pointed at. </p>
<p>Modern computerized telescope mounts can use plate-solving software to automatically figure out where they're pointed, and in which direction they need to move to reach the correct location.</p>
<p>Star detection is also sometimes used to correct the effect of the atmosphere on the sharpness of targets such as galaxies. It is also crucial for combining astronomical images from multiple nights, telescopes, locations, and so on into a single output image with a very high signal-to-noise ratio.</p>
<p>In this tutorial, I'd like to introduce a very naïve technique for detecting stars in an image.</p>
<h3 id="heading-a-quick-note">A quick note:</h3>
<p>Star detection is a very complex topic, and I've only scratched the surface both in my own understanding and in this article. </p>
<p>The steps I use and describe in this article are derived from public documentation on existing real-world applications (both for star detection and for edge detection), as well as some blog posts from incredibly knowledgeable people (linked at the end of the article, so be sure to check them out).</p>
<p>As such, this implementation is intended for learning purposes only.</p>
<h2 id="heading-before-you-read"><strong>Before You Read</strong></h2>
<h3 id="heading-prerequisites-for-the-first-part-of-the-tutorial"><strong>Prerequisites for the first part of the tutorial</strong></h3>
<p>The process described here builds upon the concept of <a target="_blank" href="https://www.freecodecamp.org/news/multi-scale-analysis-of-images-in-rust/">multi-scale processing of images using the à trous wavelet transform</a>. If you're not familiar with it, I encourage you to read that previous article first and then come back to this one.</p>
<p>This article also assumes that you have a basic understanding of <a target="_blank" href="https://en.wikipedia.org/wiki/Centroid">centroids</a>. Just knowing what they are is enough, as you won't have to calculate them yourself. Since the article focuses on image processing and analysis, a basic understanding of how pixels are represented digitally is helpful, but not mandatory.</p>
<h3 id="heading-prerequisites-for-the-second-part-of-this-tutorial"><strong>Prerequisites for the second part of this tutorial</strong></h3>
<p>Here, we focus on implementing the algorithm using the Rust programming language, without going much into the details of the language itself. So you should be comfortable writing Rust programs and reading crate documentation.</p>
<p>If that's not you, you can still read Part 1 to learn the technique, and then try it out in a language of your choice.</p>
<p>If you're not familiar with Rust, I highly encourage you to learn the basics. <a target="_blank" href="https://www.freecodecamp.org/news/rust-in-replit/">Here's an interactive Rust course</a> that can get you started.</p>
<h2 id="heading-table-of-contents">Table of contents</h2>
<ol>
<li><a class="post-section-overview" href="#heading-how-star-detection-works">How Star Detection Works</a><ol>
<li><a class="post-section-overview" href="#heading-what-is-star-detection">What is Star Detection?</a></li>
<li><a class="post-section-overview" href="#heading-how-star-detection-works-1">How Star Detection Works</a></li>
<li><a class="post-section-overview" href="#heading-an-intermediary-look-at-the-process">An Intermediary Look at the Process</a></li>
<li><a class="post-section-overview" href="#heading-picking-it-apart">Picking It Apart</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#heading-how-to-implement-it-in-rust">How to Implement it in Rust</a><ol>
<li><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></li>
<li><a class="post-section-overview" href="#heading-how-to-read-and-decompose-the-input-image">How to read and decompose the input image</a></li>
<li><a class="post-section-overview" href="#heading-noise-reduction">Noise reduction</a></li>
<li><a class="post-section-overview" href="#heading-how-to-optimize-the-threshold-and-binarization">How to optimize the threshold and binarization</a></li>
<li><a class="post-section-overview" href="#heading-how-to-construct-polygons-around-stars">How to construct polygons around stars</a></li>
<li><a class="post-section-overview" href="#heading-how-to-detect-star-size-and-location-using-contours">How to detect star size and location using contours</a></li>
<li><a class="post-section-overview" href="#heading-how-to-encapsulate-the-process">How to encapsulate the process</a></li>
<li><a class="post-section-overview" href="#heading-how-to-test-the-implementation-on-astronomical-images">How to test the implementation on astronomical images</a></li>
<li><a class="post-section-overview" href="#heading-how-to-optimize-minimum-star-count">How to optimize minimum star count</a></li>
<li><a class="post-section-overview" href="#heading-but-there-is-one-more-thing">But there is one more thing...</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#heading-further-reading">Further Reading</a></li>
<li><a class="post-section-overview" href="#heading-wrapping-up">Wrapping Up</a></li>
</ol>
<h2 id="heading-how-star-detection-works">How Star Detection Works</h2>
<p>Since this process involves many steps, let's look at it with an increasing level of detail, unwrapping the black box a bit more at each level.</p>
<h3 id="heading-what-is-star-detection">What is Star Detection?</h3>
<p>Star detection, in its simplest form, involves isolating the stars from the rest of the image and then performing edge detection on them.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/m42-star-detection-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>1. Input image</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/level-1-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>2. Detected stars visualised using green circles</em></p>
<h3 id="heading-how-star-detection-works-1">How Star Detection Works</h3>
<p>First, you separate the pixels that might be stars from the rest of the pixels in the image. This new image, which contains only the extracted pixels, is then analysed using edge detection techniques to find the star positions in 2D space.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/m42-star-detection-2.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>1. Input image</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/level-1-2-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>2. Extracted pixels that are potentially stars</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/level-1-1-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>3. Detected stars visualised using green circles</em></p>
<h3 id="heading-an-intermediary-look-at-the-process">An Intermediary Look At The Process</h3>
<p>At this level of detail, you first decompose your input image into multiple layers, each containing part of the original data, such that adding all the layers together gives back the original data. </p>
<p>You then keep only the layers that contain small-scale structures, such as noise and stars, and throw away the rest of the data.</p>
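<p>To make the "layers add back up to the original" property concrete, here is a minimal one-dimensional sketch of the à trous idea. It is a simplification, not the <code>image-dwt</code> implementation: it works on a plain <code>f64</code> signal, uses the linear interpolation kernel [1/4, 1/2, 1/4] with taps spaced 2<sup>j</sup> samples apart at level j, and clamps at the borders. The helper names are hypothetical.</p>
<pre><code class="lang-rust">// Hypothetical 1D sketch of à trous decomposition (not the image-dwt API).

/// Smooths `signal` with the kernel [1/4, 1/2, 1/4], taking neighbours
/// `step` samples away (the "holes" of the à trous algorithm).
/// Neighbours outside the signal are clamped to the nearest edge sample.
fn smooth_with_holes(signal: &amp;[f64], step: usize) -&gt; Vec&lt;f64&gt; {
    if signal.is_empty() {
        return Vec::new();
    }
    let last = signal.len() - 1;
    (0..signal.len())
        .map(|i| {
            let left = signal[i.saturating_sub(step)];
            let right = signal[(i + step).min(last)];
            0.25 * left + 0.5 * signal[i] + 0.25 * right
        })
        .collect()
}

/// Splits `signal` into `levels` detail layers plus a residue. Each detail
/// layer is the difference between two successive smoothings, so the
/// finest structures land in the first layers.
fn a_trous_decompose(signal: &amp;[f64], levels: usize) -&gt; (Vec&lt;Vec&lt;f64&gt;&gt;, Vec&lt;f64&gt;) {
    let mut current = signal.to_vec();
    let mut layers = Vec::with_capacity(levels);
    for j in 0..levels {
        // The gap between kernel taps doubles at every level.
        let smoother = smooth_with_holes(&amp;current, 1 &lt;&lt; j);
        let detail: Vec&lt;f64&gt; = current.iter().zip(&amp;smoother).map(|(c, s)| c - s).collect();
        layers.push(detail);
        current = smoother;
    }
    // What remains after the last smoothing is the large-scale residue.
    (layers, current)
}

/// Adds the given layers together, sample by sample.
fn recompose(layers: &amp;[Vec&lt;f64&gt;], len: usize) -&gt; Vec&lt;f64&gt; {
    let mut out = vec![0.0_f64; len];
    for layer in layers {
        for (o, v) in out.iter_mut().zip(layer) {
            *o += v;
        }
    }
    out
}
</code></pre>
<p>Decomposing a flat background of 100 with a single bright sample of 300 at index 8 into three layers, and then summing the layers plus the residue, reproduces the input. Summing only the detail layers (dropping the residue) leaves 0 far from the bright sample and a positive peak where it was, which is the "keep the small scales, throw away the rest" step in miniature.</p>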
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/layers-of-structure.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Different layers of structure in the image that the input is decomposed into. We throw away the final layer and retain the rest in this example</em></p>
<p>With this filtered data, you find the edges in the image using contouring (explained in the next section). Each contour gives you multiple points in 2D space, and you then draw a closed shape through those points. </p>
<p>Once you've done this, all you need is to find the center of this shape and you have the location of the stars.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/m42-star-detection-5.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>1. Input image</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/level-3-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>2. Image after decomposing into layers and throwing away large scale data</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/level-1-2-2.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>3. Image after binarisation</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/polygon.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>4. Detected contours visualised using green outlines</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/level-1-1-2.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>5. Detected stars visualised using green circles</em></p>
<h3 id="heading-picking-it-apart">Picking It Apart</h3>
<p>Using a multi-scale analysis technique based on the à trous wavelet transform, you break the image down into multiple layers, each containing structures of a different scale from the original image. You keep the layers containing small-scale structures and throw away the rest. </p>
<p>You then apply a bilateral denoising filter to the result, so that you're left with stars and not noise that the algorithm might later mistake for stars.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/decomposed-image.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Different layers of structure in the image that the input is decomposed into. We throw away the final layer and retain the rest in this example.</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/m42-star-detection-5-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>1. Input image</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/level-3-1-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>2. Image after decomposing into layers and throwing away large scale data</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/level-4-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>3. Noise reduced image</em></p>
<p>Once you've filtered out the noise, you binarize the image using thresholding. Thresholding and binarization are the process of converting every pixel to either pure black or pure white, so that the image is easier to work with. You do this by selecting an intensity value (the threshold): all pixels with intensity less than or equal to it become black, and all pixels with intensity greater than it become white. </p>
<p>To find the optimal threshold, you define a minimum number of stars that you expect to find in the image. This number usually depends on what you actually need to do with the star locations. </p>
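<p>As a minimal sketch of that rule, here is binarization over a plain byte slice rather than an <code>image</code> buffer. A pixel exactly equal to the threshold becomes black, matching the <code>binarize</code> method implemented later in this article.</p>
<pre><code class="lang-rust">/// Binarizes 8-bit grayscale pixels in place: anything strictly brighter
/// than `threshold` becomes 255 (white), everything else becomes 0 (black).
fn binarize(pixels: &amp;mut [u8], threshold: u8) {
    for pixel in pixels.iter_mut() {
        *pixel = if *pixel &gt; threshold { u8::MAX } else { 0 };
    }
}
</code></pre>
<p>For example, binarizing the row [12, 80, 128, 129, 250] with a threshold of 128 gives [0, 0, 0, 255, 255].</p>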
<p>In our example, we'll start with a minimum of 500 and slowly push it to the limit of the sample image to see what happens.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/level-1-2-3.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Binarizing noise-reduced and wavelet filtered image</em></p>
<p>Binarization makes the next step, edge detection using contouring, much more reliable. </p>
<p>Contouring is the process of figuring out where the structures in your image are and drawing a border along each of them; these borders are known as contours. </p>
<p>It is similar to edge detection, but edge detection looks at differences between individual neighbouring pixels, whereas contours describe the complete boundary of a structure in an image.</p>
<p>The library we'll be using finds the contours in an image using the algorithm proposed by Suzuki and Abe: <a target="_blank" href="https://www.sciencedirect.com/science/article/abs/pii/0734189X85900167">Topological Structural Analysis of Digitized Binary Images by Border Following</a>. Contouring in this manner will give you a collection of points that lie on the border of each contour.</p>
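<p>Implementing Suzuki and Abe's border following is beyond the scope of this article, but the idea it builds on, grouping touching white pixels into separate structures, can be sketched with a simple flood fill. This is a swapped-in, simplified technique: it returns the pixels of each 8-connected blob in the order they were visited, not an ordered border like the contours <code>imageproc</code> returns. The function name and the row-major layout are assumptions for illustration.</p>
<pre><code class="lang-rust">/// Hypothetical helper: groups the white (non-zero) pixels of a binary,
/// row-major image into 8-connected blobs using an iterative flood fill.
/// This is NOT Suzuki-Abe border following; it only finds which pixels
/// belong together, returning each blob's pixels as (x, y) pairs.
fn find_blobs(pixels: &amp;[u8], width: usize) -&gt; Vec&lt;Vec&lt;(usize, usize)&gt;&gt; {
    if width == 0 {
        return Vec::new();
    }
    let height = pixels.len() / width;
    let mut visited = vec![false; pixels.len()];
    let mut blobs = Vec::new();

    for start in 0..width * height {
        if pixels[start] == 0 || visited[start] {
            continue;
        }
        // Depth-first flood fill from this unvisited white pixel.
        let mut blob = Vec::new();
        let mut stack = vec![start];
        visited[start] = true;
        while let Some(idx) = stack.pop() {
            let (x, y) = (idx % width, idx / width);
            blob.push((x, y));
            // Check all 8 neighbours (the pixel itself is already visited).
            for dy in -1i64..=1 {
                for dx in -1i64..=1 {
                    let (nx, ny) = (x as i64 + dx, y as i64 + dy);
                    if nx &lt; 0 || ny &lt; 0 || nx &gt;= width as i64 || ny &gt;= height as i64 {
                        continue;
                    }
                    let n = ny as usize * width + nx as usize;
                    if pixels[n] != 0 &amp;&amp; !visited[n] {
                        visited[n] = true;
                        stack.push(n);
                    }
                }
            }
        }
        blobs.push(blob);
    }
    blobs
}
</code></pre>
<p>On a 5 by 4 image with an L-shaped group of three white pixels in the top-left corner and two diagonally touching white pixels on the right, this returns two blobs, of three and two pixels.</p>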
<p>For each contour it finds, you create a polygon by joining all of the border points in that contour. If the resulting shape is open, you close it by connecting the last point back to the first, since a polygon must be a closed shape. You then apply the centroid formula to this polygon to find its center, which in most cases gives you the center of the star. </p>
<p>You also compute the Euclidean distance between the center and each border point; the longest of these distances becomes the size (radius) of the star.</p>
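<p>The centroid and radius steps can be sketched without any crates. The example below is an illustration with a hypothetical <code>Pt</code> type: it computes the area centroid of a polygon with the shoelace-based centroid formula (treating the vertex list as closed) and the radius as the largest Euclidean distance from that centroid to a vertex. It is not the <code>geo</code> implementation used later.</p>
<pre><code class="lang-rust">/// Hypothetical 2D point type used only in this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pt {
    x: f64,
    y: f64,
}

/// Area centroid of a simple polygon whose vertices are given in order.
/// The last vertex is implicitly connected back to the first, so an
/// explicitly closed ring (first == last) gives the same result.
/// Returns `None` for fewer than three vertices or zero area.
fn polygon_centroid(vertices: &amp;[Pt]) -&gt; Option&lt;Pt&gt; {
    let n = vertices.len();
    if n &lt; 3 {
        return None;
    }
    // Accumulate twice the signed area and the centroid numerators.
    let (mut area2, mut cx, mut cy) = (0.0_f64, 0.0_f64, 0.0_f64);
    for i in 0..n {
        let (p, q) = (vertices[i], vertices[(i + 1) % n]);
        let cross = p.x * q.y - q.x * p.y;
        area2 += cross;
        cx += (p.x + q.x) * cross;
        cy += (p.y + q.y) * cross;
    }
    if area2.abs() &lt; f64::EPSILON {
        return None;
    }
    // The formula divides by 6A, and area2 = 2A, so 6A = 3 * area2.
    Some(Pt { x: cx / (3.0 * area2), y: cy / (3.0 * area2) })
}

/// Radius of a star: the largest Euclidean distance from `center`
/// to any border vertex.
fn max_radius(center: Pt, vertices: &amp;[Pt]) -&gt; f64 {
    vertices
        .iter()
        .map(|v| ((v.x - center.x).powi(2) + (v.y - center.y).powi(2)).sqrt())
        .fold(0.0, f64::max)
}
</code></pre>
<p>For the square with corners (0, 0), (2, 0), (2, 2) and (0, 2), this gives a centroid of (1, 1) and a radius of √2 ≈ 1.414 pixels.</p>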
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/polygon-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Contouring the binarized image to find closed polygons around stars visualised here using green outlines</em></p>
<p>Once you have the star size, you reject any star whose radius is smaller than 1 pixel or larger than 24 pixels. These bounds are educated guesses that seem to give me the best results for sample images, but they are definitely a potential point of improvement. </p>
<p>After all of this, you should have the x and y coordinates of the star, as well as its size in pixels.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/level-1-1-3.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Detected stars visualised using green circles around them</em></p>
<p>We're going to stop there, but there's a lot more that you can do after this step to remove false-positives and fix the centroid/size of stars. </p>
<h2 id="heading-how-to-implement-it-in-rust">How to Implement it in Rust</h2>
<p>Let's create a new library project:</p>
<pre><code class="lang-shell">cargo new --lib stardetector &amp;&amp; cd stardetector
</code></pre>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>You need a few dependencies to get started. Let's add them, and then I'll explain why you need each one:</p>
<pre><code class="lang-shell">cargo add image imageproc image-dwt geo
</code></pre>
<ul>
<li><code>image</code> is a Rust library for working with images in all of the standard formats and encodings. It also converts between formats and provides easy access to pixel data as buffers.</li>
<li><code>imageproc</code> is another library from the creators of the <code>image</code> library. It extends <code>image</code> with image processing functions and algorithms.</li>
<li><code>image-dwt</code> is my own library (shameless plug) that implements the <a target="_blank" href="https://www.freecodecamp.org/news/multi-scale-analysis-of-images-in-rust/">à trous wavelet decomposition algorithm</a> for the <code>image</code> crate. You need it to break the image down into the multiple scales mentioned earlier.</li>
<li><code>geo</code> is a Rust library for working with geometric types (like points in 2D space), shapes (such as polygons), and the algorithms implemented for them. We use it to build a polygon from contour data and to find the centroid of that polygon, as described above. It also computes Euclidean distances between points, which we use to determine star size.</li>
</ul>
<h3 id="heading-how-to-read-and-decompose-the-input-image">How to read and decompose the input image</h3>
<p>You start by reading the input image and decomposing it so that you're only left with stars (and noise).</p>
<p>You need to define a new struct that wraps your input image, and implement <code>From&lt;DynamicImage&gt;</code> for it so you can create an instance from an input image:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// lib.rs</span>
<span class="hljs-keyword">use</span> image::{DynamicImage, GrayImage};

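<span class="hljs-comment">// Clone is required because the threshold search clones the detector.</span>
<span class="hljs-meta">#[derive(Clone)]</span>
<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">StarDetect</span></span> {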
    source: GrayImage,
}

<span class="hljs-keyword">impl</span> <span class="hljs-built_in">From</span>&lt;DynamicImage&gt; <span class="hljs-keyword">for</span> StarDetect {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">from</span></span>(source: DynamicImage) -&gt; <span class="hljs-keyword">Self</span> {
        <span class="hljs-keyword">Self</span> {
            source: source.to_luma8(),
        }
    }
}
</code></pre>
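<p>Because <code>to_luma8</code> collapses the color channels into a single brightness value, a star's color affects how bright it looks to the detector. As an illustration, the sketch below converts one RGB pixel using the Rec. 709 luma weights (0.2126, 0.7152, 0.0722). I'm assuming these are the weights the <code>image</code> crate uses; its exact rounding may differ, so treat this as an approximation rather than a reimplementation.</p>
<pre><code class="lang-rust">/// Approximate RGB-to-luma conversion for one 8-bit pixel.
/// ASSUMPTION: Rec. 709 weights, computed in integer arithmetic
/// (weights scaled by 10 000 and the result truncated).
fn rgb_to_luma(r: u8, g: u8, b: u8) -&gt; u8 {
    let weighted = 2126 * r as u32 + 7152 * g as u32 + 722 * b as u32;
    // The weights sum to 10 000, so the result never exceeds 255.
    (weighted / 10_000) as u8
}
</code></pre>
<p>With these weights, pure green (0, 255, 0) maps to 182, while pure blue (0, 0, 255) maps to only 18, so strongly blue stars end up dimmer in the grayscale image than their color might suggest.</p>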
<p>You then need the ability to drop the residue layer, which holds the large-scale structures, from the wavelet decomposition of your image and keep only the smaller-scale layers:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// lib.rs</span>

<span class="hljs-keyword">use</span> image_dwt::kernels::LinearInterpolationKernel;
<span class="hljs-keyword">use</span> image_dwt::recompose::{OutputLayer, RecomposableWaveletLayers};
<span class="hljs-keyword">use</span> image_dwt::transform::ATrousTransform;

<span class="hljs-keyword">impl</span> StarDetect {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">extract_small_scale_structures</span></span>(&amp;<span class="hljs-keyword">mut</span> <span class="hljs-keyword">self</span>) {
        <span class="hljs-keyword">let</span> (width, height) = <span class="hljs-keyword">self</span>.source.dimensions();

        <span class="hljs-comment">// Decompose the image into 8 layers</span>
        <span class="hljs-keyword">let</span> filtered_image = ATrousTransform::new(
            &amp;DynamicImage::ImageLuma8(<span class="hljs-keyword">self</span>.source.clone()),
            <span class="hljs-number">8</span>,
            LinearInterpolationKernel,
        )
        <span class="hljs-comment">// Filter out the residue image and keep the rest</span>
        .filter(|item| item.pixel_scale.is_some())
        <span class="hljs-comment">// Recompose the remaining layers into a grayscale image.</span>
        .recompose_into_image(width <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, height <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, OutputLayer::Grayscale);

        <span class="hljs-comment">// Update the source image that we will work with</span>
        <span class="hljs-comment">// going forward.</span>
        <span class="hljs-keyword">self</span>.source = filtered_image.to_luma8();
    }
}
</code></pre>
<h3 id="heading-noise-reduction">Noise reduction</h3>
<p>Now that you have the input image (which should only contain noise and stars), let's get rid of the noise:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// lib.rs</span>

<span class="hljs-keyword">impl</span> StarDetect {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">apply_noise_reduction</span></span>(&amp;<span class="hljs-keyword">mut</span> <span class="hljs-keyword">self</span>) {
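        <span class="hljs-comment">// Arguments: a window size of 10, sigma_color of 10.0 (how different</span>
        <span class="hljs-comment">// two intensities can be and still be averaged together) and</span>
        <span class="hljs-comment">// sigma_spatial of 3.0 (how quickly a neighbour's influence fades</span>
        <span class="hljs-comment">// with distance).</span>
        <span class="hljs-keyword">self</span>.source = imageproc::filter::bilateral_filter(&amp;<span class="hljs-keyword">self</span>.source, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>., <span class="hljs-number">3</span>.);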
    }
}
</code></pre>
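<p>If you're curious what a bilateral filter does with such parameters, here is a minimal, unoptimized sketch on a row-major byte buffer. It is an illustration, not <code>imageproc</code>'s implementation: the <code>radius</code> parameter is my own (how <code>imageproc</code> interprets its window size may differ), and each neighbour is weighted by a spatial Gaussian (controlled by <code>sigma_spatial</code>) times an intensity Gaussian (controlled by <code>sigma_color</code>).</p>
<pre><code class="lang-rust">/// Hypothetical bilateral filter for an 8-bit grayscale, row-major image.
/// Each output pixel is a weighted mean of the neighbours within `radius`
/// pixels. A neighbour's weight is the product of a spatial Gaussian on its
/// distance from the centre pixel and a range Gaussian on its intensity
/// difference, so flat-region noise is averaged away while edges survive.
fn bilateral(
    pixels: &amp;[u8],
    width: usize,
    radius: usize,
    sigma_color: f64,
    sigma_spatial: f64,
) -&gt; Vec&lt;u8&gt; {
    if width == 0 {
        return Vec::new();
    }
    let height = pixels.len() / width;
    let mut out = pixels.to_vec();
    for y in 0..height {
        for x in 0..width {
            let center = pixels[y * width + x] as f64;
            let (mut sum, mut weight_sum) = (0.0_f64, 0.0_f64);
            // Clamp the window to the image bounds.
            let (y0, y1) = (y.saturating_sub(radius), (y + radius).min(height - 1));
            let (x0, x1) = (x.saturating_sub(radius), (x + radius).min(width - 1));
            for ny in y0..=y1 {
                for nx in x0..=x1 {
                    let value = pixels[ny * width + nx] as f64;
                    let dist2 = (nx as f64 - x as f64).powi(2) + (ny as f64 - y as f64).powi(2);
                    let spatial = (-dist2 / (2.0 * sigma_spatial * sigma_spatial)).exp();
                    let range = (-(value - center).powi(2) / (2.0 * sigma_color * sigma_color)).exp();
                    let weight = spatial * range;
                    sum += weight * value;
                    weight_sum += weight;
                }
            }
            // The centre pixel always has weight 1, so weight_sum &gt; 0.
            out[y * width + x] = (sum / weight_sum).round().clamp(0.0, 255.0) as u8;
        }
    }
    out
}
</code></pre>
<p>On a row [0, 0, 0, 200, 200, 200] with <code>sigma_color</code> = 10, the step edge comes through unchanged, because pixels on the other side of the edge get a negligible intensity weight. On a row [100, 100, 110, 100, 100] with <code>sigma_color</code> = 50, the lone 110 is pulled toward its neighbours.</p>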
<p>Next, you need to determine the optimum threshold value for a given minimum star count. You find it by starting from the maximum intensity and lowering the threshold iteratively until the detected star count reaches at least the minimum.</p>
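<p>It helps to see which thresholds that search actually tries. The hypothetical helper below mirrors the "multiply by 0.95 and truncate to <code>u8</code>" update used in the implementation that follows, and lists every threshold visited when starting from 255. The steps are large at high intensities, and once the threshold gets small each step lowers it by a single intensity level, so the search always reaches 0 after a bounded number of iterations.</p>
<pre><code class="lang-rust">/// Hypothetical helper: every threshold visited by repeatedly multiplying
/// the previous threshold by 0.95 and truncating to u8, starting at 255
/// and stopping once 0 is reached.
fn threshold_schedule() -&gt; Vec&lt;u8&gt; {
    let mut thresholds = Vec::new();
    let mut threshold = u8::MAX;
    while threshold &gt; 0 {
        // Same update as the optimization loop: truncation always lowers
        // the value by at least 1, so this loop is guaranteed to end.
        threshold = (0.95 * threshold as f32) as u8;
        thresholds.push(threshold);
    }
    thresholds
}
</code></pre>
<p>The schedule starts 242, 229, 217, 206, and its length is an upper bound on how many binarize-and-count passes <code>optimize_threshold_for_star_count</code> can make before it reaches 0 and panics.</p>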
<h3 id="heading-how-to-optimize-the-threshold-and-binarization">How to optimize the threshold and binarization</h3>
<p>Start by creating a new file <code>threshold.rs</code> and defining a trait with necessary methods. You need a method to optimise your threshold value and another for performing the binarization operation:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// threshold.rs</span>

<span class="hljs-keyword">pub</span>(<span class="hljs-keyword">crate</span>) <span class="hljs-class"><span class="hljs-keyword">trait</span> <span class="hljs-title">ThresholdingExtensions</span></span> {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">optimize_threshold_for_star_count</span></span>(&amp;<span class="hljs-keyword">self</span>, min_star_count: <span class="hljs-built_in">usize</span>) -&gt; <span class="hljs-built_in">u8</span>;
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">binarize</span></span>(&amp;<span class="hljs-keyword">mut</span> <span class="hljs-keyword">self</span>, threshold: <span class="hljs-built_in">u8</span>);
}
</code></pre>
<p>Let's implement both of these:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// threshold.rs</span>

<span class="hljs-keyword">use</span> crate::centroid::find_star_centres_and_size;
<span class="hljs-keyword">use</span> crate::StarDetect;

<span class="hljs-keyword">impl</span> ThresholdingExtensions <span class="hljs-keyword">for</span> StarDetect {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">optimize_threshold_for_star_count</span></span>(&amp;<span class="hljs-keyword">self</span>, min_star_count: <span class="hljs-built_in">usize</span>) -&gt; <span class="hljs-built_in">u8</span> {
        <span class="hljs-comment">// Current star count</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> star_count = <span class="hljs-number">0</span>;

        <span class="hljs-comment">// Starting threshold value</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> threshold = <span class="hljs-built_in">u8</span>::MAX;

        <span class="hljs-comment">// Iterate until you've found the best threshold</span>
        <span class="hljs-keyword">while</span> star_count &lt; min_star_count {
            <span class="hljs-comment">// Panic if we reach the 0 intensity value while iterating.</span>
            <span class="hljs-comment">// This means that there are fewer stars than we hoped for.</span>
            <span class="hljs-keyword">if</span> threshold == <span class="hljs-number">0</span> {
                <span class="hljs-built_in">panic!</span>(<span class="hljs-string">"Maximum iteration count reached"</span>);
            }

            <span class="hljs-comment">// Reduce threshold to 95% of its previous value.</span>
            <span class="hljs-comment">// Using this, we check finer and finer differences</span>
            <span class="hljs-comment">// in threshold for each iteration.</span>
            threshold = (<span class="hljs-number">0.95</span> * threshold <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u8</span>;

            <span class="hljs-comment">// Clone the source data since we need to modify it</span>
            <span class="hljs-comment">// without affecting original data.</span>
            <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> source = <span class="hljs-keyword">self</span>.clone();

            <span class="hljs-comment">// Binarize the source data image using current threshold</span>
            ThresholdingExtensions::binarize(&amp;<span class="hljs-keyword">mut</span> source, threshold);

            <span class="hljs-comment">// Find the number of stars detected with the current threshold</span>
            star_count = find_star_centres_and_size(&amp;source.source).len();
        }

        threshold
    }

    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">binarize</span></span>(&amp;<span class="hljs-keyword">mut</span> <span class="hljs-keyword">self</span>, threshold: <span class="hljs-built_in">u8</span>) {
        <span class="hljs-comment">// Iterate over every pixel in source image</span>
        <span class="hljs-keyword">for</span> pixel <span class="hljs-keyword">in</span> <span class="hljs-keyword">self</span>.source.iter_mut() {
            <span class="hljs-keyword">if</span> *pixel &gt; threshold {
                <span class="hljs-comment">// If pixel intensity is greater than threshold</span>
                <span class="hljs-comment">// set it to maximum intensity instead.</span>
                *pixel = <span class="hljs-built_in">u8</span>::MAX;
            } <span class="hljs-keyword">else</span> {
                <span class="hljs-comment">// Otherwise, set it to 0 intensity.</span>
                *pixel = <span class="hljs-number">0</span>;
            }
        }
    }
}
</code></pre>
<p>You might notice that we use the <code>find_star_centres_and_size</code> function when trying to find the optimised threshold value. We'll get to that shortly, as we need to declare some types that will hold the state of our computation before we implement the function.</p>
<p>Create a new file <code>centroid.rs</code>.</p>
<p>Define a new struct that will hold the coordinates and size of the star:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// centroid.rs</span>

<span class="hljs-keyword">use</span> imageproc::point::Point;

<span class="hljs-meta">#[derive(Eq, PartialEq, Copy, Clone, Debug)]</span>
<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">StarCenter</span></span> {
    coord: Point&lt;<span class="hljs-built_in">u32</span>&gt;,
    radius: <span class="hljs-built_in">u32</span>,
}

<span class="hljs-keyword">impl</span> StarCenter {
    <span class="hljs-keyword">pub</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">coord</span></span>(&amp;<span class="hljs-keyword">self</span>) -&gt; &amp;Point&lt;<span class="hljs-built_in">u32</span>&gt; {
        &amp;<span class="hljs-keyword">self</span>.coord
    }
    <span class="hljs-keyword">pub</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">radius</span></span>(&amp;<span class="hljs-keyword">self</span>) -&gt; <span class="hljs-built_in">u32</span> {
        <span class="hljs-keyword">self</span>.radius
    }
}
</code></pre>
<p>We've also defined methods to retrieve these fields. <code>Point</code> is a type provided by the <code>imageproc</code> crate for storing coordinates in an image.</p>
<h3 id="heading-how-to-construct-polygons-around-stars">How to construct polygons around stars</h3>
<p>We're going to implement <code>find_star_centres_and_size</code> from the inside out. First, we need a way to construct a polygon from a contour. Let's implement that:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// centroid.rs</span>

<span class="hljs-keyword">use</span> geo::LineString;
<span class="hljs-keyword">use</span> imageproc::contours::Contour;

<span class="hljs-keyword">pub</span>(<span class="hljs-keyword">crate</span>) <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">construct_closed_polygon</span></span>(contour: &amp;Contour&lt;<span class="hljs-built_in">u32</span>&gt;) -&gt; LineString&lt;<span class="hljs-built_in">f32</span>&gt; {
    <span class="hljs-comment">// Create a new line string that connects all points</span>
    <span class="hljs-comment">// in the contour. This can create either an open</span>
    <span class="hljs-comment">// or a closed shape.</span>
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> line_string = LineString::from_iter(contour.points.iter().map(|point| Coord {
        x: point.x <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span>,
        y: point.y <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span>,
    }));

    <span class="hljs-comment">// If it is an open shape, close the shape to create a</span>
    <span class="hljs-comment">// polygon. This does nothing otherwise.</span>
    line_string.close();

    line_string
}
</code></pre>
<p><code>Contour</code> is a type provided by the <code>imageproc</code> crate; it is what the contouring operation returns for each contour it finds in an image. It contains the list of points that lie on the border of the contour.</p>
<p><code>LineString</code> is a type provided by <code>geo</code>, which defines it as "An ordered collection of two or more <a target="_blank" href="https://docs.rs/geo/latest/geo/geometry/struct.Coord.html"><code>Coord</code></a>s, representing a path between locations." Here, we use this type to construct the polygon.</p>
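<p>According to the <code>geo</code> documentation, <code>LineString::close</code> appends a copy of the first coordinate when the line string is non-empty and its first and last coordinates differ, and does nothing otherwise. The same rule in plain Rust, as a hypothetical helper on (x, y) tuples, looks like this:</p>
<pre><code class="lang-rust">/// Hypothetical helper mirroring the documented behaviour of
/// geo's `LineString::close`: if the ring is non-empty and its last
/// point differs from its first, append a copy of the first point.
fn close_ring(mut points: Vec&lt;(f32, f32)&gt;) -&gt; Vec&lt;(f32, f32)&gt; {
    if let Some(first) = points.first().copied() {
        if points.last().copied() != Some(first) {
            points.push(first);
        }
    }
    points
}
</code></pre>
<p>An open triangle [(0, 0), (2, 0), (2, 2)] gains a fourth point (0, 0), while a ring that is already closed, or empty, is returned unchanged.</p>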
<h3 id="heading-how-to-detect-star-size-and-location-using-contours">How to detect star size and location using contours</h3>
<p>Next, you need a way to compute the <code>StarCenter</code> type we declared previously from contour data:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// centroid.rs</span>

<span class="hljs-keyword">use</span> geo::{Centroid, Coord, EuclideanDistance};

<span class="hljs-keyword">pub</span>(<span class="hljs-keyword">crate</span>) <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">filter_map_contour_to_star_centers</span></span>(contour: &amp;Contour&lt;<span class="hljs-built_in">u32</span>&gt;) -&gt; <span class="hljs-built_in">Option</span>&lt;StarCenter&gt; {
    <span class="hljs-comment">// If there are no points in the contour</span>
    <span class="hljs-comment">// it is not a star.</span>
    <span class="hljs-keyword">if</span> contour.points.is_empty() {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>;
    }

    <span class="hljs-keyword">if</span> contour.points.len() == <span class="hljs-number">1</span> {
        <span class="hljs-comment">// If there's only 1 point in the contour</span>
        <span class="hljs-comment">// consider it to be the center of the star</span>
        <span class="hljs-comment">// of size 1px.</span>
        <span class="hljs-keyword">let</span> center = contour.points.first().unwrap();
        <span class="hljs-keyword">let</span> radius = <span class="hljs-number">1_u32</span>;

        <span class="hljs-keyword">return</span> <span class="hljs-literal">Some</span>(StarCenter {
            coord: *center,
            radius,
        });
    }

    <span class="hljs-comment">// Otherwise, construct a polygon around the star based on</span>
    <span class="hljs-comment">// contour information.</span>
    <span class="hljs-keyword">let</span> polygon = construct_closed_polygon(contour);

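    <span class="hljs-comment">// Find the centroid of the closed line string. geo computes it from</span>
    <span class="hljs-comment">// the length-weighted midpoints of the outline's segments.</span>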
    <span class="hljs-keyword">let</span> center = polygon.centroid().unwrap();

    <span class="hljs-comment">// Find the radius of the star based on maximum distance between</span>
    <span class="hljs-comment">// the centroid and any of the points in contour.</span>
    <span class="hljs-keyword">let</span> radius = polygon.points().fold(<span class="hljs-number">0</span>., |distance, point| {
        point.euclidean_distance(&amp;center).max(distance)
    });

    <span class="hljs-comment">// If the radius is less than 1px or more than 24px</span>
    <span class="hljs-comment">// we reject it as a non-star.</span>
    <span class="hljs-keyword">if</span> !(<span class="hljs-number">1</span>. ..=<span class="hljs-number">24</span>.).contains(&amp;radius) {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>;
    }

    <span class="hljs-comment">// Construct star center based on previously computed information</span>
    <span class="hljs-literal">Some</span>(StarCenter {
        coord: Point {
            x: center.x() <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>,
            y: center.y() <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>,
        },
        radius: radius <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>,
    })
}
</code></pre>
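<p>One subtlety: <code>polygon</code> here is a <code>LineString</code>, and <code>geo</code> documents the centroid of a <code>LineString</code> as the mean of its segments' midpoints weighted by segment length. That is the centroid of the outline, which is not always the same as the area centroid of the enclosed shape. For roughly round stars the two are very close, but they can drift apart for irregular blobs. The hypothetical helper below computes the length-weighted version on a closed ring of (x, y) tuples so you can compare the two.</p>
<pre><code class="lang-rust">/// Hypothetical helper: length-weighted centroid of a closed ring, i.e.
/// the mean of each segment's midpoint weighted by the segment's length.
/// The ring is expected to repeat its first point at the end.
/// Returns `None` when the ring has zero total length.
fn outline_centroid(ring: &amp;[(f64, f64)]) -&gt; Option&lt;(f64, f64)&gt; {
    let (mut total, mut sx, mut sy) = (0.0_f64, 0.0_f64, 0.0_f64);
    for segment in ring.windows(2) {
        let ((x0, y0), (x1, y1)) = (segment[0], segment[1]);
        let length = ((x1 - x0).powi(2) + (y1 - y0).powi(2)).sqrt();
        total += length;
        sx += length * (x0 + x1) / 2.0;
        sy += length * (y0 + y1) / 2.0;
    }
    (total &gt; 0.0).then(|| (sx / total, sy / total))
}
</code></pre>
<p>For the right triangle (0, 0), (4, 0), (0, 3), the outline centroid is (1.5, 1), whereas the area centroid is (4/3, 1). If you want the area centroid instead, one option is to wrap the closed line string in a <code>geo::Polygon</code> (for example, <code>Polygon::new(line_string, vec![])</code>) before calling <code>centroid()</code>.</p>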
<p>This function utilises the <code>construct_closed_polygon</code> function you defined previously to compute the final star centers and sizes. Now for the easy part: let's implement the missing <code>find_star_centres_and_size</code>:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// centroid.rs</span>

<span class="hljs-keyword">use</span> image::GrayImage;

<span class="hljs-keyword">pub</span>(<span class="hljs-keyword">crate</span>) <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">find_star_centres_and_size</span></span>(image: &amp;GrayImage) -&gt; <span class="hljs-built_in">Vec</span>&lt;StarCenter&gt; {
    <span class="hljs-comment">// Compute the contours in source image</span>
    <span class="hljs-keyword">let</span> contours = imageproc::contours::find_contours::&lt;<span class="hljs-built_in">u32</span>&gt;(image);

    contours
        .iter()
        <span class="hljs-comment">// Iterate over all contours and create a list</span>
        <span class="hljs-comment">// of star center and size data.</span>
        .filter_map(filter_map_contour_to_star_centers)
        .collect()
}
</code></pre>
<h3 id="heading-how-to-encapsulate-the-process">How to encapsulate the process</h3>
<p>All you need now is to implement one last method on the <code>StarDetect</code> struct that encapsulates the entire process:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// lib.rs</span>

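<span class="hljs-keyword">mod</span> centroid;
<span class="hljs-keyword">mod</span> threshold;

<span class="hljs-keyword">use</span> crate::centroid::find_star_centres_and_size;
<span class="hljs-comment">// Re-export StarCenter so library users can name the returned type.</span>
<span class="hljs-keyword">pub</span> <span class="hljs-keyword">use</span> crate::centroid::StarCenter;
<span class="hljs-keyword">use</span> crate::threshold::ThresholdingExtensions;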

<span class="hljs-keyword">impl</span> StarDetect {
    <span class="hljs-keyword">pub</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">find_stars</span></span>(&amp;<span class="hljs-keyword">mut</span> <span class="hljs-keyword">self</span>, min_stars: <span class="hljs-built_in">usize</span>) -&gt; <span class="hljs-built_in">Vec</span>&lt;StarCenter&gt; {
        <span class="hljs-keyword">self</span>.extract_small_scale_structures();
        <span class="hljs-keyword">self</span>.apply_noise_reduction();

        <span class="hljs-keyword">let</span> threshold = <span class="hljs-keyword">self</span>.optimize_threshold_for_star_count(min_stars);
        <span class="hljs-keyword">self</span>.binarize(threshold);

        find_star_centres_and_size(&amp;<span class="hljs-keyword">self</span>.source)
    }
}
</code></pre>
<p>This method only calls the functions we've written so far. The user of your library will only need to call this function and nothing else.</p>
<p>You can now use what you've built to find stars in an image. For the rest of this article, I'll demonstrate using the image shown below. If you'd like to follow along, you can download it from <a target="_blank" href="https://anshulsanghi-assets.s3.ap-south-1.amazonaws.com/m42-star-detection.jpg"><strong>here</strong></a>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/m42-star-detection.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>M42 Orion Nebula, The Dark Horse Nebula, The Flaming Star Nebula And The Surrounding H-Alpha Gas</em></p>
<p>As you can see, this image contains a wide range of star shapes, sizes, and colors, but also plenty of noise and large-scale nebula structures.</p>
<h3 id="heading-how-to-test-the-implementation-on-astronomical-images">How to test the implementation on astronomical images</h3>
<p>Create a new file <code>src/main.rs</code> and declare it as a binary target in the <code>Cargo.toml</code> file, which should look like this:</p>
<pre><code class="lang-toml"><span class="hljs-section">[package]</span>
<span class="hljs-attr">name</span> = <span class="hljs-string">"stardetector"</span>
<span class="hljs-attr">version</span> = <span class="hljs-string">"0.1.0"</span>
<span class="hljs-attr">edition</span> = <span class="hljs-string">"2021"</span>

<span class="hljs-section">[[bin]]</span>
<span class="hljs-attr">name</span> = <span class="hljs-string">"stardetector"</span>
<span class="hljs-attr">path</span> = <span class="hljs-string">"src/main.rs"</span>

<span class="hljs-comment"># See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html</span>

<span class="hljs-section">[dependencies]</span>
<span class="hljs-attr">geo</span> = <span class="hljs-string">"0.28.0"</span>
<span class="hljs-attr">image</span> = <span class="hljs-string">"0.25.1"</span>
<span class="hljs-attr">image-dwt</span> = <span class="hljs-string">"0.3.2"</span>
<span class="hljs-attr">imageproc</span> = <span class="hljs-string">"0.24.0"</span>
</code></pre>
<p>You can finally use the library we created to process the sample image. The final code in <code>main.rs</code> should look like this:</p>
<pre><code class="lang-rust"><span class="hljs-keyword">use</span> image::Rgba;
<span class="hljs-keyword">use</span> stardetector::StarDetect;

<span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">main</span></span>() {
    <span class="hljs-comment">// Load the image as mutable. You need mutability so that</span>
    <span class="hljs-comment">// you can draw on this image.</span>
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> image = image::open(<span class="hljs-string">"m42-star-detection.jpg"</span>).unwrap();

    <span class="hljs-comment">// Create a new star detector instance. You clone the image</span>
    <span class="hljs-comment">// here because you need to also draw on the image for</span>
    <span class="hljs-comment">// visualisation purposes in this example.</span>
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> star_detector = StarDetect::from(image.clone());

    <span class="hljs-comment">// Run the star finder function with a minimum star count of</span>
    <span class="hljs-comment">// 500</span>
    <span class="hljs-keyword">let</span> stars = star_detector.find_stars(<span class="hljs-number">500</span>);

    <span class="hljs-comment">// Iterate over all stars you've found</span>
    <span class="hljs-keyword">for</span> star <span class="hljs-keyword">in</span> stars {
        <span class="hljs-comment">// Draw a hollow circle on the image so that you</span>
        <span class="hljs-comment">// can see what the algorithm found</span>
        imageproc::drawing::draw_hollow_circle_mut(
            &amp;<span class="hljs-keyword">mut</span> image,
            (star.coord().x <span class="hljs-keyword">as</span> <span class="hljs-built_in">i32</span>, star.coord().y <span class="hljs-keyword">as</span> <span class="hljs-built_in">i32</span>),
            <span class="hljs-comment">// Extend the radius by 4px so that it's easier to see</span>
            <span class="hljs-comment">// in the visualisation.</span>
            star.radius() <span class="hljs-keyword">as</span> <span class="hljs-built_in">i32</span> + <span class="hljs-number">4</span>,
            <span class="hljs-comment">// Draw the circle with a pure green color</span>
            Rgba([<span class="hljs-number">0</span>, <span class="hljs-built_in">u8</span>::MAX, <span class="hljs-number">0</span>, <span class="hljs-built_in">u8</span>::MAX]),
        );
    }

    <span class="hljs-comment">// Save the image with star positions annotated with</span>
    <span class="hljs-comment">// green circles.</span>
    image.save(<span class="hljs-string">"annotated.jpg"</span>).unwrap();
}
</code></pre>
<p>Ensure that the downloaded image is present at the root of this project folder.</p>
<p>We can finally run the program and see what it gives us:</p>
<pre><code class="lang-shell">cargo run --release
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/annotated.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>A part of the Orion region with detected stars annotated with green circles</em></p>
<p>That looks pretty good! If we zoom in to a small part of the image:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/annotated-zoomed.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>A part of the Orion region with detected stars annotated with green circles</em></p>
<p>We can see that the algorithm has some minor issues. For example, stars that are very close to each other, with overlapping halos, are detected as a single star. It's quite an interesting problem.</p>
<p>There are various techniques to solve this issue, but they're out of the scope of this article.</p>
<h3 id="heading-how-to-optimize-minimum-star-count">How to optimize minimum star count</h3>
<p>Let's crank up the minimum star count to <strong>1000</strong> and see what happens:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/annotated1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>A part of the Orion region with detected stars annotated with green circles</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/annotated1-zoomed.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>A part of the Orion region with detected stars annotated with green circles</em></p>
<p>This time, it picked up many of the fainter stars, since the threshold had to be lower to accommodate the higher minimum star count.</p>
<p>It's time to crank it up further! Let's try <strong>2000</strong>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/annotated2.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>A part of the Orion region with detected stars annotated with green circles</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/annotated2-zoomed.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>A part of the Orion region with detected stars annotated with green circles</em></p>
<p>It picked up even more stars this time, but it has also started hallucinating stars where there are none. This is caused by the lower threshold retaining more noise in the image, which then gets picked up as stars. The noise isn't very visible in the final image unless you really pixel-peep, which is why it looks as though the algorithm is hallucinating stars.</p>
<p><strong>Noise</strong>, in this particular situation, refers not only to noise in the traditional sense, but also to any pixels that don't belong to a star.</p>
<h3 id="heading-but-there-is-one-more-thing">But there is one more thing...</h3>
<p>Let's crank the minimum star count up to the absolute maximum for this particular image, which I found to be <strong>3500</strong>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/annotated3.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>A part of the Orion region with detected stars annotated with green circles</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/annotated3-zoomed.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>A part of the Orion region with detected stars annotated with green circles</em></p>
<p>The algorithm now seems to have failed us miserably, which is expected when the noise is too high. There are too many false positives for this data to be of any use at all.</p>
<p>I wanted to show you this anyway because it exposes the flaws in the algorithm. It also shows what star detection on a noisy signal looks like, and why we need to pre-process an image to remove everything that isn't a star before running star detection.</p>
<p>We're going to stop here for the implementation, but there are many resources below if you're interested in learning more about the topic.</p>
<p>The complete code for everything I talked about today can be found here: <a target="_blank" href="https://github.com/anshulsanghi-blog/stardetector">https://github.com/anshulsanghi-blog/stardetector</a></p>
<h2 id="heading-further-reading"><strong>Further Reading</strong></h2>
<p>These are some of the resources that were very helpful to me when I was trying to figure out how star detection works. The resources are more about plate-solving (the process of figuring out the exact coordinates in the night sky of things in an image), but star detection is a crucial part of that process.</p>
<ul>
<li><a target="_blank" href="https://olegignat.com/how-plate-solving-works/">How astronomic plate-solving works</a> by Oleg Ignat</li>
<li><a target="_blank" href="https://pixinsight.com/doc/tools/StarAlignment/StarAlignment.html#description_002">Star Detection during StarAlignment Process In PixInsight</a></li>
</ul>
<p>Stars have some pretty interesting characteristics, some of them unique, such as their <a target="_blank" href="https://en.wikipedia.org/wiki/Point_spread_function">point-spread function</a> estimates. These characteristics can be used to further improve star detection and to filter out false positives.</p>
<p>In addition, I've created a Rust library that implements this algorithm and already has some additional features, with more robust processes in the works.</p>
<p>For example, it already handles RGB images properly instead of converting them to grayscale, and it can also work with RAW images.</p>
<p>I'm also planning to work on performance improvements soon.</p>
<p>If you want to learn more, or contribute to the library, feel free to do so. The repository can be found here: <a target="_blank" href="https://github.com/anshap1719/stardetect">https://github.com/anshap1719/stardetect</a>  </p>
<h2 id="heading-wrapping-up"><strong>Wrapping Up</strong></h2>
<p>I hope you enjoyed the journey so far. If image processing and analysis techniques or their implementation in Rust is something that interests you, then stay tuned for more as these are the topics I love writing about.</p>
<p>Also, feel free to <strong><a target="_blank" href="mailto:contact@anshulsanghi.tech">contact me</a></strong> if you have any questions or opinions on this topic.</p>
<h3 id="heading-enjoying-my-work"><strong>Enjoying my work?</strong></h3>
<p>Consider buying me a coffee to support my work!</p>


<p>Till next time, happy coding and wishing you clear skies!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Rust Tutorial – Learn Multi-Scale Processing of Astronomical Images ]]>
                </title>
                <description>
                    <![CDATA[ Recently, there's been a massive amount of effort put into developing novel image processing techniques. And many of them are derived from digital signal processing methods such as Fourier and Wavelet transforms.  These techniques have not only enabl... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/multi-scale-analysis-of-images-in-rust/</link>
                <guid isPermaLink="false">66bb5795f55324ca867c88e2</guid>
                
                    <category>
                        <![CDATA[ algorithms ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Rust ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Anshul Sanghi ]]>
                </dc:creator>
                <pubDate>Wed, 10 Apr 2024 15:48:11 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/04/Watermark.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Recently, there's been a massive amount of effort put into developing novel image processing techniques. And many of them are derived from digital signal processing methods such as Fourier and Wavelet transforms. </p>
<p>These methods have enabled not only a wide range of image processing techniques, such as noise reduction, sharpening, and dynamic-range extension, but also many computer vision techniques, such as edge detection and object detection.</p>
<p>Multi-scale analysis is one of the newer techniques (relatively speaking) that has been adopted in a wide range of applications, especially in astronomical image and data processing. This technique, which is based on the wavelet transform, allows us to divide our data into multiple signals that all add up to the original signal. </p>
<p>We can then perform our processing or analysis on these individual sub-signals, which lets us make targeted changes that don't affect the other sub-signals. </p>
<p>In this tutorial, we'll first explore what the technique is all about through the lens of a particular algorithm for multi-scale analysis of images. We'll then look at how to implement that algorithm in the Rust programming language and recreate the examples you see in the first half of the article.</p>
<h2 id="heading-before-you-read">Before You Read:</h2>
<h3 id="heading-prerequisites-for-part-1">Prerequisites for Part 1:</h3>
<p>The technique described is derived from the concept of "Wavelet Transforms". You don't need to know everything about it, but a very basic understanding will help you grasp the material better.</p>
<p>Since the article focuses on image processing and analysis, a basic understanding of how pixels work in digital format is helpful, but not mandatory.</p>
<h3 id="heading-prerequisites-for-part-2">Prerequisites for Part 2:</h3>
<p>Here, we focus on implementing the algorithm in the Rust programming language without going into much detail about the language itself. So you'll need to be comfortable writing Rust programs and reading crate documentation.</p>
<p>If this isn't you, you can still read Part 1 to learn the technique, and then try it out in a language of your choice. If you're not familiar with Rust, I highly encourage you to learn the basics. <a target="_blank" href="https://www.freecodecamp.org/news/rust-in-replit/">Here's an interactive Rust course</a> that can get you started.</p>
<h2 id="heading-table-of-contents">Table Of Contents</h2>
<ol>
<li><a class="post-section-overview" href="#heading-part-1-understanding-the-multi-scale-processing-technique-and-algorithm">Part 1: Understanding Multi-Scale Processing Technique And Algorithm</a><ol>
<li><a class="post-section-overview" href="#heading-what-is-multi-scale-image-processing">What is multi-scale image processing</a></li>
<li><a class="post-section-overview" href="#heading-the-a-trous-wavelet-transform">The <em>À Trous</em> Wavelet Transform</a></li>
<li><a class="post-section-overview" href="#heading-scaling-functions">Scaling Functions</a></li>
<li><a class="post-section-overview" href="#heading-convolution-pixels-at-each-scale">Convolution Pixels At Each Scale</a></li>
<li><a class="post-section-overview" href="#heading-handling-boundary-conditions">Handling Boundary Conditions</a> </li>
<li><a class="post-section-overview" href="#heading-computing-maximum-possible-scales-for-any-given-image">Computing Maximum Possible Scales For Any Given Image</a></li>
<li><a class="post-section-overview" href="#heading-closing-notes">Closing Notes</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#heading-part-2-how-to-implement-a-trous-tranform-in-rust">Part 2: How to Implement <em>À Trous</em> Transform in Rust</a><ol>
<li><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></li>
<li><a class="post-section-overview" href="#heading-the-a-trous-transform">The <em>À Trous</em> Transform</a></li>
<li><a class="post-section-overview" href="#heading-iterators-and-the-a-trous-transform">Iterators And The <em>À Trous</em> Transform</a></li>
<li><a class="post-section-overview" href="#heading-convolution">Convolution</a></li>
<li><a class="post-section-overview" href="#heading-implementing-the-iterator">Implementing the Iterator</a></li>
<li><a class="post-section-overview" href="#heading-recomposition">Recomposition</a></li>
<li><a class="post-section-overview" href="#heading-using-the-a-trous-transform">Using The <em>À Trous</em> Transform</a></li>
<li><a class="post-section-overview" href="#heading-further-reading">Further Reading</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#heading-wrapping-up">Wrapping Up</a></li>
</ol>
<h2 id="heading-part-1-understanding-the-multi-scale-processing-technique-and-algorithm">Part 1: Understanding the Multi-Scale Processing Technique and Algorithm</h2>
<p>So what do we mean when we talk about multi-scale processing or analysis of some data? Well, we usually mean breaking down the input data into multiple signals, each representing a particular scale of information. </p>
<p>Scale, when talking about image analysis, simply refers to the size of structures that we are looking at at any given time. It ignores everything else that's either smaller or larger than the current scale.</p>
<h3 id="heading-what-is-multi-scale-image-processing">What is multi-scale image processing?</h3>
<p>For images, "scales" generally refer to the size in pixels of various structures or details in the image. You'll be able to get an intuitive understanding by looking at the following example:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/Processed.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Messier 33, AKA Triangulum Galaxy</em></p>
<p>Assuming our naïve understanding is correct, we can derive images of at least the following 3 scales:</p>
<ul>
<li>Very small structures, usually the size of a single pixel. This layer, when separated from the rest of the image, will only contain the noise and some sharp stars for the most part.</li>
<li>Small structures, usually a few pixels in size. This layer, when separated, will contain all of the stars and the very fine details in the galaxy arms.</li>
<li>Large and very large scale structures, usually 100s of pixels in size. This layer, when separated, will contain the general size and shape of the galaxy at the center.</li>
</ul>
<p>Now the question becomes, <strong>why do we need to do all of this in the first place?</strong></p>
<p>The answer is simple: it allows us to make targeted enhancements and changes to an image. </p>
<p>For example, noise reduction on the overall image will usually result in a loss of sharpness in the galaxy. But since we have broken our image down into multiple scales, we can easily apply noise reduction to only the first few layers, as most of the random noise that is easy to remove resides only in lower scale layers. </p>
<p>We then re-combine the noise-reduced low-scale layers with unmodified large-scale ones, and we have an output that gives us noise reduction without a loss in quality.</p>
<p>Another peculiar thing about noise is that it's almost always concentrated in just the first one or two of these layers, which makes the noise reduction process both easy and non-destructive.</p>
<p>If you're more of a visual learner, let's see this in practice using the image from above. We're going to work with the following grayscale version of that image, to which I've also added random Gaussian noise:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/m33-noise-lum-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Messier 33 AKA Triangulum Galaxy, Converted to grayscale and with added Gaussian noise</em></p>
<p>Performing scale-based layer separation on this image gives the following results. Note that the results have been rescaled to a viewable range for display purposes. The actual transform produces pixel values that don't make sense when viewed on their own, but all of the techniques and calculations described in this tutorial can still be safely applied without rescaling. The recomposition process automatically gives us back the correct range:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/trous-decomposition..jpg" alt="Image" width="600" height="400" loading="lazy">
<em>9-level À Trous Decomposition. From top-left to bottom-right, we have images at the following pixel scales: 1, 2, 4, 8, 16, 32, 64, 128, 256 (powers of 2)</em></p>
<ol>
<li>The first and second layers contain the noise and stars. In this particular example, noise is mixed in with the stars. But using the first and second layers, we can easily target areas that are not present in the second layer, as we can be sure that those are where the noise is present in the first layer.</li>
<li>In the third layer, we still see residual luminance from the stars. But if you look closely, we can also very faintly see the arms of the galaxy starting to appear.</li>
<li>From the fourth layer onwards, we see the galaxy at varying scales and levels of detail, completely without the stars. We start with the finer (relatively small-scale) details and move on to progressively larger-scale structures. By the end, we only see a vague shape where the galaxy used to be.</li>
</ol>
<p>From here on, we can selectively apply noise reduction to the first two layers. Then we can recombine all of the layers to create the following image that has very little noise while preserving the same amount of details in the stars and the galaxy arms:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/wavelet-processed.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Messier 33 AKA Triangulum Galaxy, result of recombining all layers but with noise reduction applied to the pixel scale 1 &amp; 2 layers</em></p>
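<p>The article doesn't specify which noise reduction was applied to those two layers. As one possible approach, here's a sketch using soft thresholding, a common wavelet denoising technique that shrinks every coefficient towards zero and zeroes out the small ones. The function names <code>soft_threshold</code> and <code>denoise_small_scales</code> are hypothetical, and the threshold is an assumed value you would tune for your own data.</p>
<pre><code class="lang-rust">/// Soft thresholding: shrink each coefficient towards zero by `t`,
/// setting coefficients with magnitude at or below `t` to zero.
fn soft_threshold(layer: &amp;mut [f32], t: f32) {
    for v in layer.iter_mut() {
        *v = v.signum() * (v.abs() - t).max(0.0);
    }
}

/// Hypothetical helper: denoise only the first `count` (smallest-scale)
/// layers and leave the larger-scale layers untouched.
fn denoise_small_scales(layers: &amp;mut [Vec&lt;f32&gt;], count: usize, t: f32) {
    for layer in layers.iter_mut().take(count) {
        soft_threshold(layer, t);
    }
}
</code></pre>
<p>Only the first <code>count</code> layers are modified, so the larger-scale layers that hold the galaxy structure pass through unchanged when everything is recombined.</p>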
<p>In its most basic form, multi-scale analysis involves breaking up your source image, commonly referred to as the "signal", into multiple "signals" – each containing the data for a particular scale in the source signal. </p>
<p>In the context of an image signal, scale refers to the distance between the pixels that we sample when creating a layer from the source image.</p>
<p>In practice, this technique is used as one of the first steps in all kinds of astronomical data analysis and image processing. </p>
<p>As an example, you can use the technique to detect locations of stars while ignoring larger structures much more easily than would be possible otherwise.</p>
<h3 id="heading-the-a-trous-wavelet-transform">The <em>À Trous</em> Wavelet Transform</h3>
<p>Everything I've shown you so far, and everything you're going to see in this tutorial, was achieved with wavelet decomposition and recomposition using the <em>à trous</em> algorithm for discrete wavelet transforms.</p>
<p>This algorithm has been used throughout the years for various applications. But it's become particularly important recently in astronomical image processing applications, where different objects and signals in an image can be completely separated based on structural scales.</p>
<p>Here's how the algorithm works:</p>
<ol>
<li>We start with the source image as the input, and the number of levels to decompose it into (9 in the example above).</li>
<li>For each level n, starting at 0:<ul>
<li>We convolve the image with our scaling function (we'll see what this is in a bit), where adjacent pixels are considered to be <strong>2<sup>n</sup></strong> units apart from each other, giving us the result <strong>result<sub>n</sub></strong>. This is where the "À Trous" name comes from, which literally translates to "with holes".</li>
<li>The layer output <strong>output<sub>n</sub></strong> is then computed using <strong>input</strong> - <strong>result<sub>n</sub></strong>.</li>
<li>We then update <strong>input</strong> to equal <strong>result<sub>n</sub></strong>. This is known as the residue, and it serves as the input for the next layer.</li>
</ul>
</li>
<li>Repeat the above steps for all levels.</li>
<li>In the end, for the 9-level decomposition above, we have 9 wavelet layers and 1 residue layer. All 10 layers are required for recomposition.</li>
</ol>
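<p>To make the steps above more concrete, here's a minimal sketch of the decomposition in Rust. It's a simplification rather than the implementation from Part 2: it works on a single row of pixels (a 1D signal) instead of a 2D image, uses the linear interpolation kernel [1, 2, 1] / 4 as the scaling function, and handles boundaries by mirroring. The names <code>a_trous_1d</code> and <code>mirror</code> are hypothetical.</p>
<pre><code class="lang-rust">/// Mirror an index into `0..len`, repeating the edge sample
/// (so -1 maps to 0 and `len` maps to `len - 1`). Assumes `len &gt; 0`.
fn mirror(i: isize, len: usize) -&gt; usize {
    let n = len as isize;
    let j = i.rem_euclid(2 * n);
    (if j &lt; n { j } else { 2 * n - 1 - j }) as usize
}

/// Hypothetical helper: decompose a 1D signal into `levels` wavelet layers
/// plus a residue, using the linear kernel [1, 2, 1] / 4 with holes.
fn a_trous_1d(signal: &amp;[f32], levels: u32) -&gt; (Vec&lt;Vec&lt;f32&gt;&gt;, Vec&lt;f32&gt;) {
    const KERNEL: [f32; 3] = [0.25, 0.5, 0.25];
    let mut input = signal.to_vec();
    let mut layers = Vec::with_capacity(levels as usize);

    for n in 0..levels {
        // Kernel taps are 2^n samples apart: the "holes" grow each level.
        let step = 1isize &lt;&lt; n;
        let len = input.len();
        // result_n: the current input convolved with the holed kernel.
        let result: Vec&lt;f32&gt; = (0..len as isize)
            .map(|x| {
                KERNEL
                    .iter()
                    .enumerate()
                    .map(|(k, w)| w * input[mirror(x + (k as isize - 1) * step, len)])
                    .sum::&lt;f32&gt;()
            })
            .collect();
        // output_n = input - result_n
        layers.push(input.iter().zip(&amp;result).map(|(a, b)| a - b).collect());
        // result_n is the residue and becomes the input for the next level.
        input = result;
    }
    (layers, input)
}
</code></pre>
<p>Since every layer is <code>input - result</code> and every result becomes the next input, summing all the layers and the final residue telescopes back to the original signal. A 2D version could apply the same 1D convolution along rows and then columns, because these kernels are separable.</p>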
<p>For a more mathematical approach to understanding this algorithm, I encourage you to read about <a target="_blank" href="https://www.eso.org/sci/software/esomidas/doc/user/18NOV/volb/node317.html"><strong>the à trous algorithm here</strong></a><strong>.</strong> </p>
<p>The recomposition process is very straightforward: we just add all 10 layers together. We can choose to apply a positive or negative <strong>bias</strong> to any of the layers, which is a factor by which the layer's pixel values are multiplied during recomposition. You can use it to either enhance or diminish the characteristics of that particular layer.</p>
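<p>Here's a minimal sketch of recomposition with a per-layer bias, assuming the layers and the residue are stored as flat <code>Vec&lt;f32&gt;</code> buffers of equal length. The function name <code>recompose</code> is hypothetical, and bias is treated as a plain multiplier, as described above: 1.0 leaves a layer unchanged, values above 1.0 enhance it, and values between 0.0 and 1.0 diminish it.</p>
<pre><code class="lang-rust">/// Hypothetical helper: add every wavelet layer, scaled by its bias, to the
/// residue. `biases[i]` multiplies layer `i`; layers without an entry use 1.0.
/// Assumes every layer has the same length as `residue`.
fn recompose(layers: &amp;[Vec&lt;f32&gt;], residue: &amp;[f32], biases: &amp;[f32]) -&gt; Vec&lt;f32&gt; {
    let mut output = residue.to_vec();
    for (i, layer) in layers.iter().enumerate() {
        let bias = biases.get(i).copied().unwrap_or(1.0);
        for (out, value) in output.iter_mut().zip(layer) {
            *out += bias * value;
        }
    }
    output
}
</code></pre>
<p>Passing an empty <code>biases</code> slice gives a plain recomposition, while a bias of 0.0 removes a layer from the result entirely.</p>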
<h3 id="heading-scaling-functions">Scaling Functions</h3>
<p>Scaling functions are specific <a target="_blank" href="https://en.wikipedia.org/wiki/Kernel_(image_processing)">convolution kernels</a> that help us better represent data at a particular scale, depending on our use case. The 3 most commonly used scaling functions are shown below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/b3spline-level2.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/linear-level2.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/low-scale-level2.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The images above show the 3 most commonly used scaling functions in the À Trous algorithm, visualised using the 3rd-level decomposition of the Triangulum Galaxy image used previously:</p>
<ul>
<li>B3 Spline is a very smooth kernel. It's mostly used to isolate large-scale structures. If we wanted to sharpen our galaxy, this is the kernel we'd use.</li>
<li>Low-scale is a very sharply peaked kernel, and works best with small-scale structures.</li>
<li>The linear interpolation kernel gives us the best of both worlds, so it's used when we need to work with both small-scale and large-scale structures. This is the kernel used in all of the previous examples.</li>
</ul>
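<p>For reference, here's a sketch of two of these kernels in Rust. The B3 spline and linear interpolation kernels are commonly written as the 1D coefficients [1, 4, 6, 4, 1] / 16 and [1, 2, 1] / 4, and the corresponding 2D kernel is the outer product of the 1D coefficients with themselves. The low-scale kernel is left out because its coefficients aren't given here, and the names <code>B3_SPLINE</code>, <code>LINEAR</code>, and <code>outer_product</code> are hypothetical.</p>
<pre><code class="lang-rust">/// 1D B3 spline coefficients: [1, 4, 6, 4, 1] / 16.
const B3_SPLINE: [f32; 5] = [1.0 / 16.0, 4.0 / 16.0, 6.0 / 16.0, 4.0 / 16.0, 1.0 / 16.0];
/// 1D linear interpolation coefficients: [1, 2, 1] / 4.
const LINEAR: [f32; 3] = [1.0 / 4.0, 2.0 / 4.0, 1.0 / 4.0];

/// Build a 2D kernel as the outer product of a 1D kernel with itself.
fn outer_product(k: &amp;[f32]) -&gt; Vec&lt;Vec&lt;f32&gt;&gt; {
    k.iter()
        .map(|a| k.iter().map(|b| a * b).collect())
        .collect()
}
</code></pre>
<p>Both 2D kernels sum to 1, so convolving a flat region leaves it unchanged. That's why flat regions end up in the residue rather than in the wavelet layers.</p>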
<h3 id="heading-convolution-pixels-at-each-scale">Convolution Pixels At Each Scale</h3>
<p>I mentioned in the algorithm that at each scale, the pixels in the image are considered to be 2<sup>n</sup> units apart. Let's build a better understanding of this using the following visualisation:</p>
<p>Consider the following 8px by 8px image. Each pixel is labeled 1 through 64, which is their index.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/1-32x32mm.png" alt="Image" width="600" height="400" loading="lazy">
<em>A representational pixel grid of an 8x8px image</em></p>
<p>For this example, we're going to focus on the convolution of just one of the center pixels: pixel number 28.</p>
<p><strong>Scale 0:</strong> At scale 0, the value of 2<sup>n</sup> becomes <strong>1</strong>. This means that for convolution, we'll consider pixels that are 1 unit apart from our target center pixel. These pixels are highlighted below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/scale0-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>8x8px grid with pixels that are involved in convolution for pixel number 28 highlighted at scale 0</em></p>
<p><strong>Scale 1:</strong> This is where things get interesting. At scale 1, the value of 2<sup>n</sup> becomes <strong>2</strong>. This means that for convolution, we'll jump directly to pixels that are 2 locations apart from the target pixel:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/scale1-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>8x8px grid with pixels that are involved in convolution for pixel number 28 highlighted at scale 1</em></p>
<p>As you can see, we've created "holes" in our computation of the value of the target pixel by skipping <strong>2<sup>n</sup> - 1</strong> adjacent pixels and selecting the <strong>2<sup>n</sup>th</strong> pixel. This is the basis of the algorithm.</p>
<p>This process is repeated for every pixel in the image, just like a regular convolution process. And each time, we consider increasing distances between pixels for computation of final values at increasing scales. </p>
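<p>The positions sampled at each scale can be computed directly. The sketch below, using the hypothetical name <code>sample_positions</code>, returns the nine (column, row) positions used by a 3x3 kernel at scale <code>n</code>. It assumes 0-based coordinates, so pixel 28 in the grid above is at (3, 3). It doesn't handle boundaries, so some positions can fall outside the image.</p>
<pre><code class="lang-rust">/// Hypothetical helper: positions (column, row) sampled by a 3x3 kernel
/// centred on (x, y) at scale `n`. Neighbours are 2^n pixels away, which
/// skips 2^n - 1 pixels in between. Positions may lie outside the image.
fn sample_positions(x: i64, y: i64, n: u32) -&gt; Vec&lt;(i64, i64)&gt; {
    let step = 1i64 &lt;&lt; n; // 2^n
    let mut positions = Vec::with_capacity(9);
    for dy in [-step, 0, step] {
        for dx in [-step, 0, step] {
            positions.push((x + dx, y + dy));
        }
    }
    positions
}
</code></pre>
<p>For example, <code>sample_positions(3, 3, 1)</code> starts at (1, 1), two pixels up and to the left of pixel 28, which is pixel 10 in the grid's 1-based labels.</p>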
<p>Let's look at just one more scale.</p>
<p><strong>Scale 2</strong>: This is where things get even more interesting. At scale 2 the value of 2<sup>n</sup> becomes <strong>4</strong>. This means that for convolution, we'll jump directly to pixels that are <strong>4</strong> locations apart from the target pixel:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/scale2-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>8x8px grid with pixels that are involved in convolution for pixel number 28 highlighted at scale 2</em></p>
<p>Wait, what? Why are we choosing pixels 1, 4, 8, 25, &amp; 57? 1 &amp; 4 are only 3 locations apart, 25 is only 2 locations apart, and 8 &amp; 57 are not even diagonally aligned with the target pixel. What's going on?</p>
<h3 id="heading-handling-boundary-conditions">Handling Boundary Conditions</h3>
<p>Since this process runs for every pixel in the image, we also need to handle cases where the pixel locations used for convolution lie outside the image.</p>
<p>This is not a concept unique to this algorithm. During convolution, this is referred to as a boundary condition or handling boundary pixels. There are various techniques for dealing with this, and all of them involve virtually extending the image in order to make it seem like we're not encountering the boundary at all.</p>
<p>Some of the techniques are:</p>
<ul>
<li>Extending as much as needed by copying the value of the last row/column</li>
<li>Mirroring the image on all edges and corners</li>
<li>Wrapping the image around the edges.</li>
</ul>
<p>In our example, we're employing the "mirroring" technique. When implementing such an algorithm, we don't need to actually create an extended image. Any boundary handling is implementable using just basic mathematical formulae.</p>
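<p>Here's a sketch of such a formula for mirroring, using 0-based coordinates. It assumes the edge pixel itself is repeated when mirroring (so -1 maps to 0), which is the variant that reproduces the pixels picked in the scale 2 figure above. The names <code>mirror</code> and <code>label</code> are hypothetical; <code>label</code> just converts a coordinate into the 1-based pixel numbers used in the grid figures.</p>
<pre><code class="lang-rust">/// Mirror a coordinate into `0..len`, repeating the edge pixel:
/// -1 maps to 0, -2 to 1, `len` to `len - 1`, and so on. Assumes `len &gt; 0`.
fn mirror(i: i64, len: i64) -&gt; i64 {
    let period = 2 * len;
    let j = i.rem_euclid(period);
    if j &lt; len {
        j
    } else {
        period - 1 - j
    }
}

/// Hypothetical helper: 1-based, row-major pixel label as used in the
/// 8x8 grid figures, after mirroring out-of-range coordinates.
fn label(x: i64, y: i64, width: i64, height: i64) -&gt; i64 {
    mirror(y, height) * width + mirror(x, width) + 1
}
</code></pre>
<p>Applied to pixel 28 at scale 2, the out-of-range positions mirror back onto pixels 1, 4, 8, 25, and 57, while the remaining positions (28, 32, 60, and 64) are already inside the image.</p>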
<p>Our extended image, with the correct pixels selected for scale 2, is as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/scale2-mirrored.png" alt="Image" width="600" height="400" loading="lazy">
<em>Source image extended on all edges and corners using the mirroring technique. All of the faded regions represent extended areas.</em></p>
<p>Again, the extension is only logical and is completely computed using formulae, as opposed to actually extending the source image and then checking. We can easily see that with the mirrored images in place, our basic rule of picking pixels that are 2<sup>n</sup> locations apart is still followed.</p>
<h3 id="heading-computing-maximum-possible-scales-for-any-given-image">Computing Maximum Possible Scales for Any Given Image</h3>
<p>If you think about it carefully, you'll see that the maximum number of layers an image can be decomposed into can be calculated by computing the log<sub>2</sub> of the image width or height (whichever is lower) and discarding the fractional part. </p>
<p>For a 5x5 image, log<sub>2</sub>(5) ≈ <strong>2.32</strong>. Discarding the fractional part leaves us with 2 layers. Similarly, for a 1000x1000px image, log<sub>2</sub>(1000) ≈ <strong>9.97</strong>, which means we can decompose a 1000x1000px image into a maximum of 9 layers. This simply means that our "holes" cannot be larger than the image width or height.</p>
<p>Even with the mirroring extension we used above, if the holes are larger than the width of the image, they'll still end up outside of the extended regions, especially for corner or boundary pixels, making it impossible to perform convolution at that scale.</p>
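<p>In code, this rule is a one-liner. Here's a sketch with a hypothetical <code>max_scales</code> helper that relies on the standard library's <code>u32::ilog2</code> (available since Rust 1.67), which already rounds down, so there's no fractional part to discard.</p>
<pre><code class="lang-rust">/// Hypothetical helper: maximum number of à trous layers for an image,
/// floor(log2(min(width, height))). Returns 0 for an empty image,
/// where `ilog2` would panic.
fn max_scales(width: u32, height: u32) -&gt; u32 {
    let shortest = width.min(height);
    if shortest == 0 {
        0
    } else {
        shortest.ilog2()
    }
}
</code></pre>
<p>For the 5x5 and 1000x1000 examples, it returns 2 and 9 respectively.</p>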
<h3 id="heading-closing-notes">Closing Notes</h3>
<p>Thinking about the examples and visualisations a bit more, you can clearly see how and why this algorithm works, and how it's able to separate out structures in an image based on their sizes. The increasing hole sizes make it so that only structures larger than the hole itself are retained for any given layer.</p>
<p>A big advantage of this algorithm is its computational cost. Since it only needs small convolutions in the spatial domain and doesn't involve Fourier transforms, the computational cost is quite low, relatively speaking. The memory cost, however, is higher, because every layer is stored at the full size of the input. More often than not, that's a good tradeoff.</p>
<p>Another advantage of this algorithm compared to other discrete wavelet transform algorithms is that the size of the source image is preserved throughout the entire process. There's no decimation or upscaling, which makes this algorithm one of the easiest to understand and implement.</p>
<p>The algorithm is used in many astronomical image processing applications, such as <a target="_blank" href="https://pixinsight.com/">PixInsight</a>, <a target="_blank" href="https://siril.org/">Siril</a>, and many others.</p>
<p>This algorithm is also known by other names such as <strong>Stationary Wavelet Transform</strong> and <strong>Starlet Transform</strong>.</p>
<h2 id="heading-part-2-how-to-implement-a-trous-tranform-in-rust">Part 2: How to Implement <em>À Trous</em> Tranform in Rust</h2>
<p>Now I'm going to show you how you can implement this algorithm in Rust. </p>
<p>For the purposes of this tutorial, I'm going to assume that you're familiar with Rust and its basic concepts, such as data types, iterators, and traits, and that you're comfortable writing programs that use these concepts.</p>
<p>I'm also going to assume that you have an understanding of what convolution and convolution kernels mean in this context.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>We're going to need a couple of dependencies. Before we get to that, let's quickly create a new project:</p>
<pre><code class="lang-shell">cargo new --lib atrous-rs
cd atrous-rs
</code></pre>
<p>Now let's add the dependencies we need. There are only two:</p>
<pre><code class="lang-shell">cargo add image ndarray
</code></pre>
<p><strong><code>image</code></strong> is a Rust library we'll use to work with images in all the standard formats and encodings. It also helps us convert between formats and provides easy access to pixel data as buffers.</p>
<p><strong><code>ndarray</code></strong> is a Rust library that helps you create, manipulate, and work with 2D, 3D, or N-dimensional arrays. We could use nested vectors instead, but a library like ndarray is a better fit here because we need to perform a lot of operations on both individual values and their neighbours. Not only is that much easier with <strong>ndarray</strong>, but the library also has performance optimisations built in for many operations and CPU types.</p>
<p>Although I'll cover the basic functions, traits, methods, and data types we use from these crates, I won't go into much detail about them. I encourage you to read their docs instead.</p>
<p>We're actually going to jump straight to algorithm implementation, and come back later to see how we can use it.</p>
<h3 id="heading-the-a-trous-transform">The <em>À Trous</em> Transform</h3>
<p>Create a new file, <code>src/transform.rs</code>, to hold our implementation, and declare it in <code>src/lib.rs</code> with <code>pub mod transform;</code> so the rest of the crate can use it.</p>
<p>Start by adding the following struct, which holds the information we need to perform the transform:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// transform.rs</span>

<span class="hljs-keyword">use</span> ndarray::Array2;

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">ATrousTransform</span></span> {
    input: Array2&lt;<span class="hljs-built_in">f32</span>&gt;, <span class="hljs-comment">// `Array2&lt;f32&gt;` is a 2D array where each value is of type `f32`. This will hold our pixel data for input image.</span>
    levels: <span class="hljs-built_in">usize</span>, <span class="hljs-comment">// The number of levels or scales to decompose the image into</span>
    current_level: <span class="hljs-built_in">usize</span>, <span class="hljs-comment">// Current level that we need to generate. This holds the state of our iterator.</span>
    width: <span class="hljs-built_in">usize</span>, <span class="hljs-comment">// Width of input image</span>
    height: <span class="hljs-built_in">usize</span>, <span class="hljs-comment">// Height of input image</span>
}
</code></pre>
<p>We also need an easy way to create this struct. In our case, we want to create it directly from the input image. The input image can be in any of the supported formats and encodings, but we want a consistent color type for our calculations, so we'll also need to convert the image to the format we expect.</p>
<p>It's helpful to wrap all of this logic in a <code>new</code> associated function, the usual "constructor" pattern in Rust. Let's implement that:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// transform.rs</span>

<span class="hljs-keyword">use</span> image::GenericImageView;

<span class="hljs-keyword">impl</span> ATrousTransform {
    <span class="hljs-keyword">pub</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">new</span></span>(input: &amp;image::DynamicImage, levels: <span class="hljs-built_in">usize</span>) -&gt; <span class="hljs-keyword">Self</span> {
        <span class="hljs-keyword">let</span> (width, height) = input.dimensions();
        <span class="hljs-keyword">let</span> (width, height) = (width <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, height <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>);

        <span class="hljs-comment">// Create a new 2D array with proper size for each dimension to hold all of our input's pixel data. Method `zeros` takes a "shape" parameter, which is a tuple of (rows_count, columns_count).</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> data = Array2::&lt;<span class="hljs-built_in">f32</span>&gt;::zeros((height, width));

        <span class="hljs-comment">// Convert the image to be a grayscale image where each pixel value is of type `f32`. Loop over all pixels in the input image along with its 2D location.</span>
        <span class="hljs-keyword">for</span> (x, y, pixel) <span class="hljs-keyword">in</span> input.to_luma32f().enumerate_pixels() {
            <span class="hljs-comment">// Put the pixel value at appropriate location in our data array. The `[[]]` syntax is used to provide a 2-dimensional index such as `[[row_index, col_index]]`</span>
            data[[y <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, x <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>]] = pixel.<span class="hljs-number">0</span>[<span class="hljs-number">0</span>];
        }

        <span class="hljs-keyword">Self</span> {
            input: data,
            levels,
            current_level: <span class="hljs-number">0</span>,
            width,
            height
        }
    }
}
</code></pre>
<p>This takes care of converting the image to grayscale and converting the pixel values to <code>f32</code>. If you're not already aware, when the <code>image</code> crate converts pixels to floating point (as <code>to_luma32f</code> does here), the values are normalized to lie between 0 and 1, with 0 representing black and 1 representing white.</p>
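<p>To make the normalized range concrete, here's a small sketch of the conversions between integer pixel values and normalized <code>f32</code> values. It assumes a plain divide-by-maximum mapping, the same mapping this article uses later when it turns <code>f32</code> layers back into <code>u16</code> pixels. It isn't a copy of the <code>image</code> crate's internal conversion code, and both function names are hypothetical.</p>
<pre><code class="lang-rust">/// Hypothetical helper: map an 8-bit pixel value into the range 0.0..=1.0.
fn u8_to_unit(value: u8) -&gt; f32 {
    value as f32 / u8::MAX as f32
}

/// Hypothetical helper: map a normalized value back to a 16-bit pixel value.
/// Float-to-integer `as` casts saturate in Rust, so values below 0.0 become 0
/// and values above 1.0 become u16::MAX instead of wrapping around.
fn unit_to_u16(value: f32) -&gt; u16 {
    (value * u16::MAX as f32) as u16
}

fn main() {
    println!("{}", u8_to_unit(0)); // 0
    println!("{}", u8_to_unit(255)); // 1
    println!("{}", unit_to_u16(1.0)); // 65535
    println!("{}", unit_to_u16(-0.25)); // 0 (saturated)
}
</code></pre>
<p>The saturating cast matters for wavelet layers, because a layer is a difference between two smoothed images and can contain negative values.</p>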
<h3 id="heading-iterators-and-the-a-trous-transform">Iterators and the <em>À Trous</em> Transform</h3>
<p>Before we continue, let's think about the algorithm for a second. We need to be able to generate images at increasing scales, until we hit the maximum number of levels we need. </p>
<p>We want the consumer of our library to have access to all of these scales, to be able to manipulate them, and to easily recombine them once they're done. They need to be able to filter layers to ignore structures at certain scales, manipulate or "map" them to change their characteristics, perform other operations on them, or even store each image if they need to.</p>
<p>This sounds an awful lot like Iterators! Iterators give us methods like <code>filter</code>, <code>skip</code>, <code>take</code>, <code>map</code>, <code>for_each</code>, and so on, which are exactly what we need to work with our layers before recomposition.</p>
<p>One added advantage of Iterators is that they're lazy, so each layer is processed all the way through before the next one is generated. If you're unsure why this is, I suggest reading more about <a target="_blank" href="https://doc.rust-lang.org/book/ch13-02-iterators.html">processing a series of items with Iterators in Rust</a>.</p>
<p>We're going to implement the <code>Iterator</code> trait for our <code>ATrousTransform</code> type so that it produces one wavelet layer on each iteration.</p>
<p>We'll implement the innermost parts of the algorithm first and build outward from there. So first, we need a way to convolve an input data buffer with the scaling function while making sure that adjacent pixels are 2<sup>n</sup> locations apart, which is the first step in our loop.</p>
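<p>Before building the 2D version, it may help to see the overall shape in one dimension. The following sketch uses only the standard library: <code>AtrousRow</code> is a hypothetical 1D analogue of <code>ATrousTransform</code> that smooths with the kernel [1/4, 1/2, 1/4] (the 3x3 kernel used below is this kernel's outer product with itself), doubles the hole size at each level, and clamps indexes at the edges like the <code>compute_pixel_index</code> method later in this article. Each call to <code>next</code> yields one detail layer, and the final call yields the residue. It's a simplified illustration, not the implementation that follows.</p>
<pre><code class="lang-rust">/// Hypothetical 1D analogue of `ATrousTransform`, for illustration only.
struct AtrousRow {
    current: Vec&lt;f32&gt;, // smoothed signal from the previous level
    levels: usize,     // number of detail layers to produce
    level: usize,      // next level to generate (the iterator's state)
}

impl AtrousRow {
    fn new(input: Vec&lt;f32&gt;, levels: usize) -&gt; Self {
        Self { current: input, levels, level: 0 }
    }

    /// Smooth `self.current` with the kernel [1/4, 1/2, 1/4], sampling
    /// neighbours `distance` positions away and clamping at the edges.
    fn smooth(&amp;self, distance: usize) -&gt; Vec&lt;f32&gt; {
        let last = self.current.len() as isize - 1;
        let taps = [(-1_isize, 0.25_f32), (0, 0.5), (1, 0.25)];
        (0..self.current.len())
            .map(|i| {
                taps.iter()
                    .map(|&amp;(offset, weight)| {
                        let j = (i as isize + offset * distance as isize).clamp(0, last);
                        weight * self.current[j as usize]
                    })
                    .sum::&lt;f32&gt;()
            })
            .collect()
    }
}

impl Iterator for AtrousRow {
    // Plain vectors here, instead of `(Array2&lt;f32&gt;, Option&lt;usize&gt;)`.
    type Item = Vec&lt;f32&gt;;

    fn next(&amp;mut self) -&gt; Option&lt;Vec&lt;f32&gt;&gt; {
        if self.level &gt; self.levels {
            return None; // all detail layers and the residue were produced
        }
        let level = self.level;
        self.level += 1;
        if level == self.levels {
            return Some(self.current.clone()); // final residue layer
        }
        // The hole size doubles at every level: 1, 2, 4, ...
        let smoothed = self.smooth(1 &lt;&lt; level);
        // Detail layer = previous smoothed signal minus the new one.
        let detail: Vec&lt;f32&gt; = self.current.iter().zip(&amp;smoothed).map(|(a, b)| a - b).collect();
        self.current = smoothed;
        Some(detail)
    }
}

fn main() {
    let input = vec![0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.5, 0.5];
    let layers: Vec&lt;Vec&lt;f32&gt;&gt; = AtrousRow::new(input.clone(), 2).collect();
    assert_eq!(layers.len(), 3); // 2 detail layers + 1 residue

    // Summing every layer reconstructs the input, up to rounding error.
    for (i, &amp;original) in input.iter().enumerate() {
        let sum: f32 = layers.iter().map(|layer| layer[i]).sum();
        assert!((sum - original).abs() &lt; 1e-6);
    }
    println!("{} layers reconstruct the input", layers.len());
}
</code></pre>
<p>Because the iterator is lazy, adapters such as <code>filter</code> or <code>map</code> can inspect or modify each layer as it's produced, without all layers being held in memory at once.</p>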
<h3 id="heading-convolution">Convolution</h3>
<p>We need to define our convolution kernel before we can do anything else. Create a new file, <code>src/kernel.rs</code>, declare it in <code>lib.rs</code> with <code>pub mod kernel;</code>, and add the following contents:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// kernel.rs</span>

<span class="hljs-meta">#[derive(Copy, Clone)]</span>
<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">LinearInterpolationKernel</span></span> {
    values: [[<span class="hljs-built_in">f32</span>; <span class="hljs-number">3</span>]; <span class="hljs-number">3</span>]
}

<span class="hljs-keyword">impl</span> <span class="hljs-built_in">Default</span> <span class="hljs-keyword">for</span> LinearInterpolationKernel {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">default</span></span>() -&gt; <span class="hljs-keyword">Self</span> {
        <span class="hljs-keyword">Self</span> {
            values: [
                [<span class="hljs-number">1</span>. / <span class="hljs-number">16</span>., <span class="hljs-number">1</span>. / <span class="hljs-number">8</span>., <span class="hljs-number">1</span>. / <span class="hljs-number">16</span>.],
                [<span class="hljs-number">1</span>. / <span class="hljs-number">8</span>., <span class="hljs-number">1</span>. / <span class="hljs-number">4</span>., <span class="hljs-number">1</span>. / <span class="hljs-number">8</span>.],
                [<span class="hljs-number">1</span>. / <span class="hljs-number">16</span>., <span class="hljs-number">1</span>. / <span class="hljs-number">8</span>., <span class="hljs-number">1</span>. / <span class="hljs-number">16</span>.],
            ]
        }
    }
}
</code></pre>
<p>We define it using a struct instead of a constant array of arrays because we need to define some tiny helpful methods on it related to index handling. We'll come back to that later.</p>
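<p>If you'd like something that compiles right away, here's one possible shape for that index-handling helper. It's a sketch under an assumption: relative kernel indexes run from -1 to 1 and map to array positions 0 to 2. The method name <code>value_from_relative_index</code> matches how it's called in <code>convolve.rs</code> below, but the body is only illustrative.</p>
<pre><code class="lang-rust">#[derive(Copy, Clone)]
pub struct LinearInterpolationKernel {
    values: [[f32; 3]; 3],
}

impl Default for LinearInterpolationKernel {
    fn default() -&gt; Self {
        Self {
            values: [
                [1. / 16., 1. / 8., 1. / 16.],
                [1. / 8., 1. / 4., 1. / 8.],
                [1. / 16., 1. / 8., 1. / 16.],
            ],
        }
    }
}

impl LinearInterpolationKernel {
    /// Sketch: look up the weight for a position relative to the centre,
    /// where `x` and `y` are each in -1..=1 and (0, 0) is the centre pixel.
    /// Rows are indexed by `y` and columns by `x`; since this kernel is
    /// symmetric, swapping the two would give the same result.
    pub fn value_from_relative_index(&amp;self, x: isize, y: isize) -&gt; f32 {
        // Shift -1..=1 to 0..=2 to index into the 3x3 array.
        self.values[(y + 1) as usize][(x + 1) as usize]
    }
}

fn main() {
    let kernel = LinearInterpolationKernel::default();
    let mut total = 0.0;
    for y in -1..=1 {
        for x in -1..=1 {
            total += kernel.value_from_relative_index(x, y);
        }
    }
    // The weights sum to 1, so a flat region stays flat after smoothing.
    println!("centre = {}, total = {}", kernel.value_from_relative_index(0, 0), total);
}
</code></pre>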
<p>Create another file, <code>src/convolve.rs</code>, and declare it in <code>lib.rs</code> with <code>pub mod convolve;</code>. This is where all of the code for convolving individual pixels will go. We'll define a <code>Convolution</code> trait that declares the methods needed to perform the convolution on every pixel in the current layer.</p>
<pre><code class="lang-rust"><span class="hljs-comment">// convolve.rs</span>

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">trait</span> <span class="hljs-title">Convolution</span></span> {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">compute_pixel_index</span></span>(
        &amp;<span class="hljs-keyword">self</span>,
        distance: <span class="hljs-built_in">usize</span>,
        kernel_index: [<span class="hljs-built_in">isize</span>; <span class="hljs-number">2</span>],
        target_pixel_index: [<span class="hljs-built_in">usize</span>; <span class="hljs-number">2</span>]
    ) -&gt; [<span class="hljs-built_in">usize</span>; <span class="hljs-number">2</span>];

    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">compute_convoluted_pixel</span></span>(
        &amp;<span class="hljs-keyword">self</span>, 
        distance: <span class="hljs-built_in">usize</span>, 
        index: [<span class="hljs-built_in">usize</span>; <span class="hljs-number">2</span>]
    ) -&gt; <span class="hljs-built_in">f32</span>;
}
</code></pre>
<p>You may ask why we need a trait here instead of a simple <code>impl</code> block. We're only working with grayscale images in this article, but a trait makes it easier to implement the same operations for RGB or other color modes later.</p>
<p>Now, you need to implement this trait for your <code>ATrousTransform</code> struct:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// convolve.rs</span>

<span class="hljs-keyword">impl</span> Convolution <span class="hljs-keyword">for</span> ATrousTransform {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">compute_pixel_index</span></span>(
        &amp;<span class="hljs-keyword">self</span>, 
        distance: <span class="hljs-built_in">usize</span>, 
        kernel_index: [<span class="hljs-built_in">isize</span>; <span class="hljs-number">2</span>], 
        target_pixel_index: [<span class="hljs-built_in">usize</span>; <span class="hljs-number">2</span>]
    ) -&gt; [<span class="hljs-built_in">usize</span>; <span class="hljs-number">2</span>] {
        <span class="hljs-keyword">let</span> [kernel_index_x, kernel_index_y] = kernel_index;

        <span class="hljs-comment">// Compute the actual distance of adjacent pixel</span>
        <span class="hljs-comment">// by multiplying their relative position with the</span>
        <span class="hljs-comment">// size of the hole.</span>
        <span class="hljs-keyword">let</span> x_distance = kernel_index_x * distance <span class="hljs-keyword">as</span> <span class="hljs-built_in">isize</span>;
        <span class="hljs-keyword">let</span> y_distance = kernel_index_y * distance <span class="hljs-keyword">as</span> <span class="hljs-built_in">isize</span>;

        <span class="hljs-keyword">let</span> [x, y] = target_pixel_index;

        <span class="hljs-comment">// Compute the index of adjacent pixel in the 2D</span>
        <span class="hljs-comment">// image based on the index of current pixel.</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> x = x <span class="hljs-keyword">as</span> <span class="hljs-built_in">isize</span> + x_distance;
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> y = y <span class="hljs-keyword">as</span> <span class="hljs-built_in">isize</span> + y_distance;

        <span class="hljs-comment">// If x index is out of bounds, consider x to be</span>
        <span class="hljs-comment">// the nearest boundary location</span>
        <span class="hljs-keyword">if</span> x &lt; <span class="hljs-number">0</span> {
            x = <span class="hljs-number">0</span>;
        } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> x &gt; <span class="hljs-keyword">self</span>.width <span class="hljs-keyword">as</span> <span class="hljs-built_in">isize</span> - <span class="hljs-number">1</span> {
            x = <span class="hljs-keyword">self</span>.width <span class="hljs-keyword">as</span> <span class="hljs-built_in">isize</span> - <span class="hljs-number">1</span>;
        }

        <span class="hljs-comment">// If y index is out of bounds, consider y to be</span>
        <span class="hljs-comment">// the nearest boundary location</span>
        <span class="hljs-keyword">if</span> y &lt; <span class="hljs-number">0</span> {
            y = <span class="hljs-number">0</span>;
        } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> y &gt; <span class="hljs-keyword">self</span>.height <span class="hljs-keyword">as</span> <span class="hljs-built_in">isize</span> - <span class="hljs-number">1</span> {
            y = <span class="hljs-keyword">self</span>.height <span class="hljs-keyword">as</span> <span class="hljs-built_in">isize</span> - <span class="hljs-number">1</span>;
        }

        <span class="hljs-comment">// The final 2D index of pixel.</span>
        [y <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, x <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>]
    }

    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">compute_convoluted_pixel</span></span>(
        &amp;<span class="hljs-keyword">self</span>, 
        distance: <span class="hljs-built_in">usize</span>, 
        [x, y]: [<span class="hljs-built_in">usize</span>; <span class="hljs-number">2</span>]
    ) -&gt; <span class="hljs-built_in">f32</span> {
        <span class="hljs-comment">// Create new variable to hold the result of convolution</span>
        <span class="hljs-comment">// for current pixel.</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> pixels_sum = <span class="hljs-number">0.0</span>;

        <span class="hljs-keyword">let</span> kernel = LinearInterpolationKernel::default();

        <span class="hljs-comment">// Iterate over relative position of pixels from the center</span>
        <span class="hljs-comment">// pixel to perform convolution with. In other words, </span>
        <span class="hljs-comment">// these are the indexes of neighbouring pixels from the</span>
        <span class="hljs-comment">// center pixel.</span>
        <span class="hljs-keyword">for</span> kernel_index_x <span class="hljs-keyword">in</span> -<span class="hljs-number">1</span>..=<span class="hljs-number">1</span> {
            <span class="hljs-keyword">for</span> kernel_index_y <span class="hljs-keyword">in</span> -<span class="hljs-number">1</span>..=<span class="hljs-number">1</span> {
                <span class="hljs-comment">// Get the computed pixel location that maps to</span>
                <span class="hljs-comment">// the current position in kernel</span>
                <span class="hljs-keyword">let</span> pixel_index = <span class="hljs-keyword">self</span>.compute_pixel_index(
                    distance,
                    [kernel_index_x, kernel_index_y],
                    [x, y]
                );

                <span class="hljs-comment">// Get the multiplicative factor (kernel value) for </span>
                <span class="hljs-comment">// this relative location from the kernel.</span>
                <span class="hljs-keyword">let</span> kernel_value = kernel.value_from_relative_index(
                    kernel_index_x,
                    kernel_index_y
                );

                <span class="hljs-comment">// Multiply the pixel value with kernel scaling</span>
                <span class="hljs-comment">// factor and add it to the pixel sum.</span>
                pixels_sum += kernel_value * <span class="hljs-keyword">self</span>.input[pixel_index];
            }
        }

        <span class="hljs-comment">// Return the value of computed pixel from convolution process.</span>
        pixels_sum
    }
}
</code></pre>
<p>We need to compute each neighbouring pixel's location from its position in the kernel relative to the center pixel, and scale that offset by the "hole size" to get the final pixel index. We also need to handle boundary conditions when computing indexes. Note that, for simplicity, this implementation clamps out-of-range indexes to the nearest edge pixel instead of mirroring the image as described earlier.</p>
<p>I encourage you to take your time here and go through the code and the comments.</p>
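<p>If you'd like to check the index arithmetic by hand, here's a standalone sketch of the same per-pixel computation on a plain row-major <code>Vec&lt;f32&gt;</code>, with no <code>ndarray</code> dependency. <code>convolve_pixel</code> is a hypothetical free function that follows the same steps as <code>compute_pixel_index</code> and <code>compute_convoluted_pixel</code>, including clamping at the edges. Feeding it an image with a single bright pixel shows which neighbours a given hole size reaches.</p>
<pre><code class="lang-rust">/// The 3x3 linear interpolation kernel used in this article.
const KERNEL: [[f32; 3]; 3] = [
    [1. / 16., 1. / 8., 1. / 16.],
    [1. / 8., 1. / 4., 1. / 8.],
    [1. / 16., 1. / 8., 1. / 16.],
];

/// Hypothetical helper: convolve the pixel at column `x`, row `y` of a
/// row-major image, with kernel taps `distance` pixels apart. Out-of-range
/// taps are clamped to the nearest edge pixel.
fn convolve_pixel(
    image: &amp;[f32],
    width: usize,
    height: usize,
    distance: usize,
    x: usize,
    y: usize,
) -&gt; f32 {
    let mut sum = 0.0;
    let d = distance as isize;
    for ky in -1_isize..=1 {
        for kx in -1_isize..=1 {
            let px = (x as isize + kx * d).clamp(0, width as isize - 1);
            let py = (y as isize + ky * d).clamp(0, height as isize - 1);
            let weight = KERNEL[(ky + 1) as usize][(kx + 1) as usize];
            sum += weight * image[py as usize * width + px as usize];
        }
    }
    sum
}

fn main() {
    // A 5x5 black image with a single white pixel at column 2, row 2.
    let (width, height) = (5, 5);
    let mut image = vec![0.0_f32; width * height];
    image[2 * width + 2] = 1.0;

    // Hole size 1: column 3 reaches the bright pixel through a 1/8 weight.
    println!("{}", convolve_pixel(&amp;image, width, height, 1, 3, 2)); // 0.125
    // Hole size 2: column 3's taps jump over it (columns 1, 3, and 5 -&gt; 4)...
    println!("{}", convolve_pixel(&amp;image, width, height, 2, 3, 2)); // 0
    // ...while column 4 now reaches it, two pixels away.
    println!("{}", convolve_pixel(&amp;image, width, height, 2, 4, 2)); // 0.125
}
</code></pre>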
<h3 id="heading-implementing-the-iterator">Implementing the Iterator</h3>
<p>It's finally time to implement the <code>Iterator</code> trait for your <code>ATrousTransform</code>:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// transform.rs</span>

<span class="hljs-keyword">impl</span> <span class="hljs-built_in">Iterator</span> <span class="hljs-keyword">for</span> ATrousTransform {
    <span class="hljs-comment">// Our output is an image as well as the current level for each</span>
    <span class="hljs-comment">// iteration. The current level is an `Option` to represent the</span>
    <span class="hljs-comment">// final residue layer after the intermediary layers have been</span>
    <span class="hljs-comment">// generated.</span>
    <span class="hljs-class"><span class="hljs-keyword">type</span> <span class="hljs-title">Item</span></span> = (Array2::&lt;<span class="hljs-built_in">f32</span>&gt;, <span class="hljs-built_in">Option</span>&lt;<span class="hljs-built_in">usize</span>&gt;);

    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">next</span></span>(&amp;<span class="hljs-keyword">mut</span> <span class="hljs-keyword">self</span>) -&gt; <span class="hljs-built_in">Option</span>&lt;Self::Item&gt; {
        <span class="hljs-keyword">let</span> pixel_scale = <span class="hljs-keyword">self</span>.current_level;
        <span class="hljs-keyword">self</span>.current_level += <span class="hljs-number">1</span>;

        <span class="hljs-comment">// We've already generated all the layers. Return None to </span>
        <span class="hljs-comment">// exit the iterator.</span>
        <span class="hljs-keyword">if</span> pixel_scale &gt; <span class="hljs-keyword">self</span>.levels {
            <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>;
        }

        <span class="hljs-comment">// We've generated all intermediary layers, return the </span>
        <span class="hljs-comment">// residue layer.</span>
        <span class="hljs-keyword">if</span> pixel_scale == <span class="hljs-keyword">self</span>.levels {
            <span class="hljs-keyword">return</span> <span class="hljs-literal">Some</span>((<span class="hljs-keyword">self</span>.input.clone(), <span class="hljs-literal">None</span>))
        }

        <span class="hljs-keyword">let</span> (width, height) = (<span class="hljs-keyword">self</span>.width, <span class="hljs-keyword">self</span>.height);

        <span class="hljs-comment">// Distance between adjacent pixels for convolution (also </span>
        <span class="hljs-comment">// referred to as size of "hole").</span>
        <span class="hljs-keyword">let</span> distance = <span class="hljs-number">2_usize</span>.pow(pixel_scale <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>);

        <span class="hljs-comment">// Create new buffer to hold the computed data for this layer.</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> current_data = Array2::&lt;<span class="hljs-built_in">f32</span>&gt;::zeros((height, width));

        <span class="hljs-comment">// Iterate over each pixel location in the 2D image</span>
        <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> <span class="hljs-number">0</span>..width {
            <span class="hljs-keyword">for</span> y <span class="hljs-keyword">in</span> <span class="hljs-number">0</span>..height {
                <span class="hljs-comment">// Set the current pixel in current layer to</span>
                <span class="hljs-comment">// the result of convolution on the current</span>
                <span class="hljs-comment">// pixel in input data.</span>
                current_data[[y, x]] = <span class="hljs-keyword">self</span>.compute_convoluted_pixel(
                    distance, 
                    [x, y]
                );
            }
        }

        <span class="hljs-comment">// Create current layer by subtracting currently computed pixels </span>
        <span class="hljs-comment">// from previous layer</span>
        <span class="hljs-keyword">let</span> final_data = <span class="hljs-keyword">self</span>.input.clone() - &amp;current_data;

        <span class="hljs-comment">// Set the input layer to equal the current computed layer so </span>
        <span class="hljs-comment">// that it can be used as the "previous layer" in next iteration.</span>
        <span class="hljs-comment">// This is also our residue data for each layer.</span>
        <span class="hljs-keyword">self</span>.input = current_data;

        <span class="hljs-comment">// Return the current layer data as well as current level information.</span>
        <span class="hljs-literal">Some</span>((final_data, <span class="hljs-literal">Some</span>(<span class="hljs-keyword">self</span>.current_level)))
    }
}
</code></pre>
<p>There's a lot of potential for optimizing performance here, but that's out of the scope of this article.</p>
<p>Finally, let's look at how we can take all of these layers and reconstruct our input image.</p>
<h3 id="heading-recomposition">Recomposition</h3>
<p>As I mentioned previously, reconstructing an image that was decomposed with the <em>À Trous</em> transform is as simple as summing all of the layers together.</p>
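<p>The reason summing works follows from how <code>next</code> builds each layer. Write c<sub>0</sub> for the input, c<sub>j+1</sub> for the result of convolving c<sub>j</sub> with holes of size 2<sup>j</sup>, and J for the number of levels. Each detail layer is w<sub>j</sub> = c<sub>j</sub> - c<sub>j+1</sub>, and the final item is the residue c<sub>J</sub>, so the sum telescopes back to the input:</p>
<pre><code class="lang-latex">\sum_{j=0}^{J-1} w_j + c_J = \sum_{j=0}^{J-1} \left( c_j - c_{j+1} \right) + c_J = c_0
</code></pre>
<p>Because the sum is linear, if you modify or drop layers before summing, the result differs from c<sub>0</sub> by exactly those changes.</p>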
<p>We're going to define a trait for this. Why we need a trait here should be clear once you look at the implementation.</p>
<p>Create a new file, <code>src/recompose.rs</code>, declare it in <code>lib.rs</code> with <code>pub mod recompose;</code>, and add the following contents:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// recompose.rs</span>

<span class="hljs-keyword">use</span> image::{DynamicImage, ImageBuffer, Luma};
<span class="hljs-keyword">use</span> ndarray::Array2;

<span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">trait</span> <span class="hljs-title">RecomposableLayers</span></span>: <span class="hljs-built_in">Iterator</span>&lt;Item = (Array2&lt;<span class="hljs-built_in">f32</span>&gt;, <span class="hljs-built_in">Option</span>&lt;<span class="hljs-built_in">usize</span>&gt;)&gt; {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">recompose_into_image</span></span>(
        <span class="hljs-keyword">self</span>,
        width: <span class="hljs-built_in">usize</span>,
        height: <span class="hljs-built_in">usize</span>,
    ) -&gt; DynamicImage
        <span class="hljs-keyword">where</span>
            <span class="hljs-keyword">Self</span>: <span class="hljs-built_in">Sized</span>,
    {
        <span class="hljs-comment">// Create a result buffer to hold the pixel data for our output image.</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> result = Array2::&lt;<span class="hljs-built_in">f32</span>&gt;::zeros((height, width));

        <span class="hljs-comment">// For each layer, add the layer data to current value of result buffer.</span>
        <span class="hljs-keyword">for</span> layer <span class="hljs-keyword">in</span> <span class="hljs-keyword">self</span> {
            result += &amp;layer.<span class="hljs-number">0</span>;
        }

        <span class="hljs-comment">// Compute min and max pixel intensity values in the final data so that</span>
        <span class="hljs-comment">// we can perform a "rescale", which normalizes all pixel values to be</span>
        <span class="hljs-comment">// between the range of 0 &amp; 1, as is expected by float 32 images.</span>
        <span class="hljs-keyword">let</span> min_pixel = result.iter().copied().reduce(<span class="hljs-built_in">f32</span>::min).unwrap();
        <span class="hljs-keyword">let</span> max_pixel = result.iter().copied().reduce(<span class="hljs-built_in">f32</span>::max).unwrap();

        <span class="hljs-comment">// Create a new `ImageBuffer`, which is a type provided by `image` crate to</span>
        <span class="hljs-comment">// serve as buffer for pixel data of an image. Here, we're creating a new</span>
        <span class="hljs-comment">// `Luma` ImageBuffer with pixel value of type `u16`. Luma just refers to</span>
        <span class="hljs-comment">// grayscale.</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> result_img: ImageBuffer&lt;Luma&lt;<span class="hljs-built_in">u16</span>&gt;, <span class="hljs-built_in">Vec</span>&lt;<span class="hljs-built_in">u16</span>&gt;&gt; =
            ImageBuffer::new(width <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>, height <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>);

        <span class="hljs-comment">// Pre-compute the denominator for scaling computation so that we don't</span>
        <span class="hljs-comment">// repeat this unnecessarily for every iteration.</span>
        <span class="hljs-keyword">let</span> rescale_ratio = max_pixel - min_pixel;

        <span class="hljs-comment">// Iterate over all pixels in the `ImageBuffer` and fill it based on data</span>
        <span class="hljs-comment">// from the `result` buffer after rescaling the value.</span>
        <span class="hljs-keyword">for</span> (x, y, pixel) <span class="hljs-keyword">in</span> result_img.enumerate_pixels_mut() {
            <span class="hljs-keyword">let</span> intensity = result[(y <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, x <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>)];

            *pixel =
                Luma([((intensity - min_pixel) / rescale_ratio * <span class="hljs-built_in">u16</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>]);
        }

        <span class="hljs-comment">// Convert the `ImageBuffer` into `DynamicImage` and return it</span>
        DynamicImage::ImageLuma16(result_img)
    }
}

<span class="hljs-comment">// Implement this trait for anything that implements the Iterator trait</span>
<span class="hljs-comment">// with the given item type</span>
<span class="hljs-keyword">impl</span>&lt;T&gt; RecomposableLayers <span class="hljs-keyword">for</span> T <span class="hljs-keyword">where</span> T: <span class="hljs-built_in">Iterator</span>&lt;Item = (Array2&lt;<span class="hljs-built_in">f32</span>&gt;, <span class="hljs-built_in">Option</span>&lt;<span class="hljs-built_in">usize</span>&gt;)&gt; {}
</code></pre>
<p>Because we implement this trait generically for every type that is an iterator with the right item type, it works with any iterator, including adapters such as <code>Filter</code>, <code>Map</code>, and so on. Without a trait, you'd have to implement the same method again and again for every built-in iterator type, and your code wouldn't work with third-party iterator types.</p>
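<p>To see the blanket implementation at work without the <code>image</code> and <code>ndarray</code> crates, here's a stripped-down sketch of the same pattern using only the standard library. <code>SumLayers</code> is a hypothetical stand-in for <code>RecomposableLayers</code>, with each layer simplified to a <code>Vec&lt;f32&gt;</code>. Because it's implemented for every iterator with the right <code>Item</code> type, the same method is available on a plain iterator, on <code>Skip</code>, and on <code>Map</code>.</p>
<pre><code class="lang-rust">/// Hypothetical stand-in for `RecomposableLayers`: sums layers element-wise.
trait SumLayers: Iterator&lt;Item = Vec&lt;f32&gt;&gt; {
    fn sum_layers(self, len: usize) -&gt; Vec&lt;f32&gt;
    where
        Self: Sized,
    {
        let mut result = vec![0.0; len];
        for layer in self {
            for (total, value) in result.iter_mut().zip(&amp;layer) {
                *total += value;
            }
        }
        result
    }
}

// Blanket implementation: every iterator over `Vec&lt;f32&gt;` gets `sum_layers`.
impl&lt;T&gt; SumLayers for T where T: Iterator&lt;Item = Vec&lt;f32&gt;&gt; {}

fn main() {
    let layers: Vec&lt;Vec&lt;f32&gt;&gt; = vec![vec![0.5, -0.5], vec![0.25, 0.25], vec![0.25, 0.75]];

    // Plain `IntoIter`: keeping every layer gives back the original data.
    let all = layers.clone().into_iter().sum_layers(2);

    // `Skip`: drop the first (finest-scale) layer entirely.
    let skipped = layers.clone().into_iter().skip(1).sum_layers(2);

    // `Map` over `Enumerate`: halve the first layer instead of dropping it.
    let attenuated = layers
        .into_iter()
        .enumerate()
        .map(|(i, layer)| {
            if i == 0 {
                layer.iter().map(|v| v * 0.5).collect()
            } else {
                layer
            }
        })
        .sum_layers(2);

    println!("{all:?} {skipped:?} {attenuated:?}"); // [1.0, 0.5] [0.5, 1.0] [0.75, 0.75]
}
</code></pre>
<p>Without the blanket implementation, you'd need a separate <code>impl</code> for each concrete adapter type, and when an adapter wraps a closure, as <code>Map</code> does here, that type can't even be named in your code.</p>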
<h3 id="heading-using-the-a-trous-transform">Using the <em>À Trous</em> Transform</h3>
<p>After all of that, it's finally time to reproduce the processing I showed you earlier on the very noisy galaxy image. Create a new file, <code>src/main.rs</code>, with the following contents:</p>
<pre><code class="lang-rust"><span class="hljs-keyword">use</span> image::{DynamicImage, ImageBuffer, Luma};
<span class="hljs-keyword">use</span> atrous::recompose::RecomposableLayers;
<span class="hljs-keyword">use</span> atrous::transform::ATrousTransform;

<span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">main</span></span>() {
    <span class="hljs-comment">// Open our noisy image</span>
    <span class="hljs-keyword">let</span> image = image::open(<span class="hljs-string">"m33-noise-lum.jpg"</span>).unwrap();

    <span class="hljs-comment">// Create a new instance of the transform with 9 layers</span>
    <span class="hljs-keyword">let</span> transform = ATrousTransform::new(&amp;image, <span class="hljs-number">9</span>);

    <span class="hljs-comment">// Map over each layer</span>
    transform.map(|(<span class="hljs-keyword">mut</span> buffer, pixel_scale)| {
        <span class="hljs-comment">// Create a new image buffer to hold the pixel data. This</span>
        <span class="hljs-comment">// will be populated from the raw buffer for this layer.</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> new_buffer =
            ImageBuffer::&lt;Luma&lt;<span class="hljs-built_in">u16</span>&gt;, <span class="hljs-built_in">Vec</span>&lt;<span class="hljs-built_in">u16</span>&gt;&gt;::new(buffer.ncols() <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>, buffer.nrows() <span class="hljs-keyword">as</span> <span class="hljs-built_in">u32</span>);

        <span class="hljs-comment">// Iterate over all pixels of the `ImageBuffer` to populate it. We also</span>
        <span class="hljs-comment">// convert from `f32` pixels to `u16` pixels.</span>
        <span class="hljs-keyword">for</span> (x, y, pixel) <span class="hljs-keyword">in</span> new_buffer.enumerate_pixels_mut() {
            *pixel = Luma([(buffer[[y <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, x <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>]] * <span class="hljs-built_in">u16</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span>) <span class="hljs-keyword">as</span> <span class="hljs-built_in">u16</span>]);
        }

        <span class="hljs-comment">// If the present layer is a small scale layer (&lt; 3), </span>
        <span class="hljs-comment">// perform noise reduction</span>
        <span class="hljs-keyword">if</span> pixel_scale.is_some_and(|scale| scale &lt; <span class="hljs-number">3</span>) {
            <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> image = DynamicImage::ImageLuma16(new_buffer).to_luma8();

            <span class="hljs-comment">// Bilateral filter is a de-noising filter. Apply it to the image.</span>
            image = imageproc::filter::bilateral_filter(&amp;image, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>., <span class="hljs-number">3</span>.);

            <span class="hljs-comment">// Modify the raw buffer to contain the updated pixel values after</span>
            <span class="hljs-comment">// filtering.</span>
            <span class="hljs-keyword">for</span> (x, y, pixel) <span class="hljs-keyword">in</span> image.enumerate_pixels() {
                buffer[[y <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, x <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>]] = pixel.<span class="hljs-number">0</span>[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span> / <span class="hljs-built_in">u8</span>::MAX <span class="hljs-keyword">as</span> <span class="hljs-built_in">f32</span>;
            }

            <span class="hljs-comment">// Return the updated buffer.</span>
            (buffer, pixel_scale)
        } <span class="hljs-keyword">else</span> {
            <span class="hljs-comment">// Return the unmodified buffer for larger scale layers.</span>
            (buffer, pixel_scale)
        }
    })
        <span class="hljs-comment">// Call the recomposition method on the iterator</span>
        .recompose_into_image(image.width() <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>, image.height() <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>)
        <span class="hljs-comment">// Convert output to 8-bit grayscale image</span>
        .to_luma8()
        <span class="hljs-comment">// Save it as a JPEG file</span>
        .save(<span class="hljs-string">"noise-reduced.jpg"</span>)
        .unwrap()
}
</code></pre>
<p>You also need to add a new dependency, <code>imageproc</code>, which provides image processing algorithms, including the bilateral filter used above, built on top of the <code>image</code> crate.</p>
<pre><code class="lang-shell">cargo add imageproc
</code></pre>
<p>To make this work, we also need to modify our <code>Cargo.toml</code> to explicitly define both binary and library targets:</p>
<pre><code class="lang-toml"><span class="hljs-comment"># Cargo.toml</span>

<span class="hljs-section">[package]</span>
<span class="hljs-attr">name</span> = <span class="hljs-string">"atrous-rs"</span>
<span class="hljs-attr">version</span> = <span class="hljs-string">"0.1.0"</span>
<span class="hljs-attr">edition</span> = <span class="hljs-string">"2021"</span>

<span class="hljs-section">[[bin]]</span>
<span class="hljs-attr">name</span> = <span class="hljs-string">"atrous"</span>
<span class="hljs-attr">path</span> = <span class="hljs-string">"src/main.rs"</span>

<span class="hljs-section">[lib]</span>
<span class="hljs-attr">name</span> = <span class="hljs-string">"atrous"</span>
<span class="hljs-attr">path</span> = <span class="hljs-string">"src/lib.rs"</span>

<span class="hljs-comment"># See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html</span>

<span class="hljs-section">[dependencies]</span>
<span class="hljs-attr">image</span> = <span class="hljs-string">"0.25.1"</span>
<span class="hljs-attr">imageproc</span> = <span class="hljs-string">"0.24.0"</span>
<span class="hljs-attr">ndarray</span> = <span class="hljs-string">"0.15.6"</span>
</code></pre>
<p>You can download the test image <a target="_blank" href="https://anshulsanghi-assets.s3.ap-south-1.amazonaws.com/m33-noise-lum.jpg">here</a>. Move it to the root directory of your project and run <code>cargo run --release</code>. Once it finishes, you should have a new file, <code>noise-reduced.jpg</code>, containing the noise-reduced image.</p>
<p>And there we have it.</p>
<h2 id="heading-further-reading">Further Reading</h2>
<p>These are some of the resources that were very helpful to me when I was learning about this algorithm and how to use it. I highly encourage anyone who wants a more technical understanding of the algorithm to check these out.</p>
<ul>
<li><a target="_blank" href="https://www.eso.org/sci/software/esomidas/doc/user/18NOV/volb/node317.html">The <em>à trous</em> algorithm</a> </li>
<li><a target="_blank" href="https://www.pixinsight.com/doc/legacy/LE/20_wavelets/a_trous_wavelet_transform/a_trous_wavelet_transform.html">The <em>À Trous</em> Discrete Wavelet Transform In PixInsight</a></li>
<li><a target="_blank" href="https://jstarck.cosmostat.org/publications/books/book2/">Astronomical Image and Data Analysis</a> by Jean-Luc Starck and Fionn Murtagh</li>
<li><a target="_blank" href="https://jstarck.cosmostat.org/publications/books/book-2015/">Sparse Image and Signal Processing: Wavelets and Related Geometric Multiscale Analysis</a> by J.L. Starck, F. Murtagh, and J. Fadili</li>
</ul>
<p>In addition, I've created a Rust library for working with the <em>À Trous</em> transform. It closely matches what I've shown you here, but it already has some additional features, with more to come.</p>
<p>Features such as handling RGB images and support for all three kernels are already implemented. It also handles boundary conditions better, using the image folding technique.</p>
<p>I'll also be working on performance improvements soon.</p>
<p>If you want to learn more or contribute to the library, you can find the repository here: <a target="_blank" href="https://github.com/anshap1719/image-dwt">https://github.com/anshap1719/image-dwt</a></p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>I hope you enjoyed the journey so far. If image processing techniques or their implementation in Rust is something that interests you, then stay tuned for more as these are the topics I love writing about.</p>
<p>Also, feel free to <a target="_blank" href="mailto:nitric-brisk.0s@icloud.com"><strong>contact me</strong></a> if you have any questions or opinions on this topic.</p>
<h3 id="heading-enjoying-my-work">Enjoying my work?</h3>
<p>Consider buying me a coffee to support my work!</p>


<p>Till next time, happy coding and wishing you clear skies!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Next.js Image Tutorial – How to Upload, Crop, and Resize Images in the Browser in Next ]]>
                </title>
                <description>
                    <![CDATA[ Two of the most fundamental image editing functions are resizing and cropping. But you should do these carefully because they have the potential to degrade image quality. Cropping always includes removing a portion of the original image, resulting in... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-upload-crop-resize-images-in-the-browser-in-nextjs/</link>
                <guid isPermaLink="false">66b905da2898aa52dab670af</guid>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Next.js ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Idris Olubisi ]]>
                </dc:creator>
                <pubDate>Mon, 18 Apr 2022 17:19:23 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/04/pexels-cottonbro-5083407.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Two of the most fundamental image editing functions are resizing and cropping. But you should do these carefully because they have the potential to degrade image quality.</p>
<p>Cropping always involves removing a portion of the original image, which means losing some pixels.</p>
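<p>Here's a small sketch in plain JavaScript that shows the difference. The helper names <code>centerCropBox</code> and <code>scaleToWidth</code> are hypothetical and are not part of this project or of Cloudinary's API. Cropping to a square keeps only a centered box and discards the rest. Scaling keeps the whole frame and only changes its dimensions.</p>

```javascript
// Hypothetical helper (illustration only): compute a centered crop box with the
// given aspect ratio (width / height). Pixels outside the box are discarded.
function centerCropBox(srcWidth, srcHeight, targetAspect) {
  let width = srcWidth;
  let height = srcHeight;
  if (srcWidth / srcHeight > targetAspect) {
    // Source is wider than the target: keep the full height, trim the sides.
    width = Math.round(srcHeight * targetAspect);
  } else {
    // Source is taller than (or equal to) the target: keep the full width, trim top and bottom.
    height = Math.round(srcWidth / targetAspect);
  }
  return {
    x: Math.floor((srcWidth - width) / 2),
    y: Math.floor((srcHeight - height) / 2),
    width,
    height,
    discardedPixels: srcWidth * srcHeight - width * height,
  };
}

// Hypothetical helper: scaling keeps the whole frame and preserves the aspect ratio.
function scaleToWidth(srcWidth, srcHeight, targetWidth) {
  return {
    width: targetWidth,
    height: Math.round((srcHeight * targetWidth) / srcWidth),
  };
}

console.log(centerCropBox(400, 300, 1));
// → { x: 50, y: 0, width: 300, height: 300, discardedPixels: 30000 }
console.log(scaleToWidth(400, 300, 200));
// → { width: 200, height: 150 }
```

<p>In the 400×300 example, the square crop discards 30,000 of the original 120,000 pixels. Scaling to 200 pixels wide keeps all of the content, but at a lower resolution.</p>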
<p>This post will teach you how to upload, crop, and resize images in the browser.</p>
<p>I built this project in <a target="_blank" href="https://codesandbox.io/s/serverless-leaf-vc9rls?file=/pages/index.js">CodeSandbox</a>. To get started quickly, fork the <a target="_blank" href="https://codesandbox.io/s/serverless-leaf-vc9rls?file=/pages/index.js">sandbox</a> or run the project there.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along with this tutorial, you should have some JavaScript and React.js experience. Experience with Next.js isn't a requirement, but it's nice to have.</p>
<p>You also need a <a target="_blank" href="https://cloudinary.com/users/register/free">Cloudinary account</a> to store the media files.</p>
<p><a target="_blank" href="https://cloudinary.com/documentation/image_video_and_file_upload#upload_options_overview">Cloudinary</a> offers a safe and complete API for quickly and efficiently uploading media files from the server, browser, or a mobile application.</p>
<p>Finally, you'll need <a target="_blank" href="https://nextjs.org/">Next.js</a>, an open-source, React-based web framework that supports server-side rendering and static site generation.</p>
<h2 id="heading-project-setup-and-installation">Project Setup and Installation</h2>
<p>Use the <code>npx create-next-app</code> command to scaffold a new project in a directory of your choice:</p>
<pre><code>npx create-next-app &lt;project name&gt;
</code></pre><p>To install the dependencies, use these commands:</p>
<pre><code>cd &lt;project name&gt; 
npm install cloudinary-react
</code></pre><p>Once the app is created and the dependencies are installed, you can run the site locally with this command:</p>
<pre><code>npm run dev
</code></pre><p>Next.js will start a hot-reloading development environment accessible by default at <code>http://localhost:3000</code>.</p>
<h2 id="heading-how-to-build-the-user-interface">How to Build the User Interface</h2>
<p>We want the home page to provide the user interface for uploading, cropping, and resizing images. To build it, update the <code>pages/index.js</code> file with the following component:</p>
<pre><code><span class="hljs-keyword">import</span> React, { useState } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> Head <span class="hljs-keyword">from</span> <span class="hljs-string">"next/head"</span>;

<span class="hljs-keyword">const</span> IndexPage = <span class="hljs-function">() =&gt;</span> {

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"main"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"splitdiv"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftdiv"</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">h1</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"main-h1"</span>&gt;</span>
            How to Crop, Resize &amp; Upload Image in the Browser using Cloudinary
            Transformation
          <span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftdivcard"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">h2</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"main-h2"</span>&gt;</span>Resize Options<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
          <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

          <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftbutton"</span>&gt;</span>
            Upload Image
          <span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"splitdiv"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"rightdiv"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span> Image will appear here<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/&gt;</span></span>
  );
};
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> IndexPage;
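</code></pre><p>The current user interface doesn't look great, though. In a Next.js project that uses the <code>pages</code> directory, global stylesheets can only be imported in <code>pages/_app.js</code>. Import <code>style.css</code> there and add the following styles to it:</p>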
<pre><code>@<span class="hljs-keyword">import</span> url(<span class="hljs-string">"https://fonts.googleapis.com/css?family=Acme|Lobster"</span>);

<span class="hljs-comment">/* Use the full width of the page without the browser's default padding/margin */</span>
body,
html {
  <span class="hljs-attr">margin</span>: <span class="hljs-number">0</span>;
  padding: <span class="hljs-number">0</span>;
  height: <span class="hljs-number">100</span>%;
  width: <span class="hljs-number">100</span>%;
  font-family: Acme;
  min-width: <span class="hljs-number">700</span>px;
}

.splitdiv {
  <span class="hljs-attr">height</span>: <span class="hljs-number">100</span>%;
  width: <span class="hljs-number">50</span>%;
}

<span class="hljs-comment">/* This part contains all of the left side of the screen */</span>
<span class="hljs-comment">/* ----------------------------------------- */</span>
#leftdiv {
  <span class="hljs-attr">float</span>: left;
  background-color: #fafafa;
  height: <span class="hljs-number">932</span>px;
}

#leftdivcard {
  <span class="hljs-attr">margin</span>: <span class="hljs-number">0</span> auto;
  width: <span class="hljs-number">50</span>%;
  background-color: white;
  margin-top: <span class="hljs-number">25</span>vh;
  transform: translateY(<span class="hljs-number">-50</span>%);
  box-shadow: <span class="hljs-number">10</span>px <span class="hljs-number">10</span>px <span class="hljs-number">1</span>px <span class="hljs-number">0</span>px rgba(<span class="hljs-number">78</span>, <span class="hljs-number">205</span>, <span class="hljs-number">196</span>, <span class="hljs-number">0.2</span>);
  border-radius: <span class="hljs-number">10</span>px;
}

#leftbutton {
  background-color: #<span class="hljs-number">512</span>cf3;
  border-radius: <span class="hljs-number">5</span>px;
  color: #fafafa;
  margin-left: <span class="hljs-number">350</span>px;
}

<span class="hljs-comment">/* ----------------------------------------- */</span>

<span class="hljs-comment">/* This part contains all of the right side of the screen */</span>
<span class="hljs-comment">/* ----------------------------------------- */</span>
#rightdiv {
  <span class="hljs-attr">float</span>: right;
  background-color: #cbcfcf;
  height: <span class="hljs-number">932</span>px;
}

#rightdivcard {
  <span class="hljs-attr">margin</span>: <span class="hljs-number">0</span> auto;
  width: <span class="hljs-number">50</span>%;
  margin-top: <span class="hljs-number">50</span>vh;
  transform: translateY(<span class="hljs-number">-50</span>%);
  background-position: bottom;
  background-size: <span class="hljs-number">20</span>px <span class="hljs-number">2</span>px;
  background-repeat: repeat-x;
}

<span class="hljs-comment">/* ----------------------------------------- */</span>

<span class="hljs-comment">/* Basic styling */</span>
<span class="hljs-comment">/* ----------------------------------------- */</span>

button {
  <span class="hljs-attr">outline</span>: none !important;
  font-family: Lobster;
  margin-bottom: <span class="hljs-number">15</span>px;
  border: none;
  font-size: <span class="hljs-number">20</span>px;
  padding: <span class="hljs-number">8</span>px;
  padding-left: <span class="hljs-number">20</span>px;
  padding-right: <span class="hljs-number">20</span>px;
  margin-top: <span class="hljs-number">-15</span>px;
  cursor: pointer;
}

h1 {
  font-family: Lobster;
  color: #<span class="hljs-number">512</span>cf3;
  text-align: center;
  font-size: <span class="hljs-number">40</span>px;
}

input {
  font-family: Acme;
  font-size: <span class="hljs-number">16</span>px;
}

input {
  <span class="hljs-attr">width</span>: <span class="hljs-number">30</span>%;
  height: <span class="hljs-number">20</span>px;
  padding: <span class="hljs-number">16</span>px;
  margin-left: <span class="hljs-number">1</span>%;
  margin-right: <span class="hljs-number">2</span>%;
  margin-top: <span class="hljs-number">15</span>px;
  margin-bottom: <span class="hljs-number">10</span>px;
  display: inline-block;
  border: none;
}

<span class="hljs-attr">input</span>:focus {
  <span class="hljs-attr">outline</span>: none !important;
  border: <span class="hljs-number">1</span>px solid #<span class="hljs-number">512</span>cf3;
  box-shadow: <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span>px #<span class="hljs-number">719</span>ece;
}

<span class="hljs-comment">/* ----------------------------------------- */</span>

.main {
  <span class="hljs-attr">height</span>: <span class="hljs-number">100</span>%;
  width: <span class="hljs-number">100</span>%;
  display: inline-block;
}

.main-h2 {
  padding-top: <span class="hljs-number">20</span>px;
  text-align: center;
}

.body-h1 {
  padding-top: <span class="hljs-number">20</span>px;
  text-align: center;
  color: white;
}

.inner-p {
  <span class="hljs-attr">color</span>: white;
  text-align: center;
}

.main-align {
  text-align: center;
}

.form-control {
  margin-left: <span class="hljs-number">15</span>px;
}
</code></pre><p>Our application should now look like this at <code>http://localhost:3000/</code>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1650105687298/eeGTDWFHA.png" alt="How to Upload, Crop, &amp; Resize Image in the Browser in Next.js" width="3360" height="1876" loading="lazy"></p>
<h2 id="heading-how-to-create-the-image-upload-widget">How to Create the Image Upload Widget</h2>
<p>Cloudinary's upload widget lets us upload media assets from multiple sources, including Dropbox, Facebook, Instagram, and images that were taken right from our device's camera. We'll use the upload widget in this project.</p>
<p>Create a free Cloudinary account to obtain your cloud name and upload preset.</p>
<p>An upload preset lets us define a set of upload options centrally instead of passing them in every upload call. A Cloudinary cloud name is a unique identifier associated with your Cloudinary account.</p>
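<p>An upload preset is also what makes unsigned uploads from the browser possible. The client sends only the preset's name, and Cloudinary applies the preset's stored options. For the widget setup in this tutorial to work without a backend, the preset's signing mode needs to be <em>Unsigned</em>.</p>
<p>The sketch below shows roughly what an unsigned upload request looks like if you send it yourself with <code>fetch</code>. The upload widget takes care of this step for you. The sketch assumes Cloudinary's upload endpoint <code>https://api.cloudinary.com/v1_1/{cloudName}/image/upload</code> and a runtime with a global <code>FormData</code>, such as a browser or Node 18+. The helper names <code>buildUnsignedUpload</code> and <code>uploadImage</code> are hypothetical.</p>

```javascript
// Hypothetical helper: build an unsigned upload request for Cloudinary.
// Assumes the REST endpoint https://api.cloudinary.com/v1_1/{cloudName}/image/upload
// and an upload preset whose signing mode is "Unsigned".
function buildUnsignedUpload(cloudName, uploadPreset, file) {
  const body = new FormData();
  body.append("file", file); // a File/Blob in the browser, or a remote image URL string
  body.append("upload_preset", uploadPreset); // the preset's stored options apply server-side
  return {
    url: `https://api.cloudinary.com/v1_1/${cloudName}/image/upload`,
    body,
  };
}

// Hypothetical usage: send the request and return the uploaded asset's public ID.
async function uploadImage(cloudName, uploadPreset, file) {
  const { url, body } = buildUnsignedUpload(cloudName, uploadPreset, file);
  const response = await fetch(url, { method: "POST", body });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.public_id;
}
```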
<p>First, we'll load the Cloudinary widget's JavaScript file from a content delivery network (CDN) in <code>pages/index.js</code>. We'll include it with the <code>Head</code> component from <code>next/head</code>, which lets us add elements such as meta tags and scripts to the <code>&lt;head&gt;</code> of our HTML document from React.</p>
<p>In <code>pages/index.js</code>, import <code>Head</code> from <code>next/head</code> and add the script:</p>
<pre><code><span class="hljs-keyword">import</span> React, { useState } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> Head <span class="hljs-keyword">from</span> <span class="hljs-string">"next/head"</span>;

<span class="hljs-keyword">const</span> IndexPage = <span class="hljs-function">() =&gt;</span> {

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">Head</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>How to Crop and Resize Image in the Browser<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"icon"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"/favicon.ico"</span> /&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">charSet</span>=<span class="hljs-string">"utf-8"</span> /&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">script</span>
          <span class="hljs-attr">src</span>=<span class="hljs-string">"https://widget.cloudinary.com/v2.0/global/all.js"</span>
          <span class="hljs-attr">type</span>=<span class="hljs-string">"text/javascript"</span>
        &gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">Head</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"main"</span>&gt;</span>
          [...]
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/&gt;</span></span>
  );
};
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> IndexPage;
</code></pre><p>Next, in <code>pages/index.js</code>, we'll add an <code>imagePublicId</code> state variable and an <code>openWidget</code> function that creates an instance of the widget when a button is clicked:</p>
<pre><code><span class="hljs-keyword">import</span> React, { useState } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> Head <span class="hljs-keyword">from</span> <span class="hljs-string">"next/head"</span>;

<span class="hljs-keyword">const</span> IndexPage = <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">const</span> [imagePublicId, setImagePublicId] = useState(<span class="hljs-string">""</span>);

  <span class="hljs-keyword">const</span> openWidget = <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-comment">// create the widget</span>
    <span class="hljs-keyword">const</span> widget = <span class="hljs-built_in">window</span>.cloudinary.createUploadWidget(
      {
        <span class="hljs-attr">cloudName</span>: <span class="hljs-string">"olanetsoft"</span>,
        <span class="hljs-attr">uploadPreset</span>: <span class="hljs-string">"w42epls7"</span>
      },
      <span class="hljs-function">(<span class="hljs-params">error, result</span>) =&gt;</span> {
        <span class="hljs-comment">// Skip errors and any event other than a successful image upload</span>
        <span class="hljs-keyword">if</span> (
          !error &amp;&amp;
          result.event === <span class="hljs-string">"success"</span> &amp;&amp;
          result.info.resource_type === <span class="hljs-string">"image"</span>
        ) {
          <span class="hljs-built_in">console</span>.log(result.info);
          setImagePublicId(result.info.public_id);
        }
      }
    );
    widget.open(); <span class="hljs-comment">// open up the widget after creation</span>
  };

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;&gt;</span>
      //...
    <span class="hljs-tag">&lt;/&gt;</span></span>
  );
};
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> IndexPage;
</code></pre><p>The widget requires our Cloudinary <code>cloudName</code> and <code>uploadPreset</code>. The <code>createUploadWidget()</code> function creates a new upload widget. When an image is uploaded successfully, we store the asset's <code>public_id</code> in the <code>imagePublicId</code> state variable.</p>
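<p>You can make the widget callback easier to test by moving its filtering logic into a plain function. The following is a hypothetical sketch: <code>getUploadedImageId</code> is not part of Cloudinary's API. It relies only on the <code>event</code>, <code>info.resource_type</code>, and <code>info.public_id</code> fields that the callback above already reads.</p>

```javascript
// Hypothetical helper: return the public ID of a successfully uploaded image,
// or null for errors, other widget events, and non-image uploads.
function getUploadedImageId(error, result) {
  if (error || !result || result.event !== "success") return null;
  if (!result.info || result.info.resource_type !== "image") return null;
  return result.info.public_id;
}

// Inside the widget callback, you could then write:
// (error, result) => {
//   const publicId = getUploadedImageId(error, result);
//   if (publicId) setImagePublicId(publicId);
// }

console.log(
  getUploadedImageId(null, {
    event: "success",
    info: { resource_type: "image", public_id: "sample" },
  })
);
// → sample
```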
<p>To find your cloud name and upload preset, follow the steps below.</p>
<p>You can get the cloud name from your Cloudinary dashboard, as shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1650106671153/wjBrA3_m0.png" alt="How to Upload, Crop, &amp; Resize Image in the Browser in Next.js" width="3360" height="1368" loading="lazy"></p>
<p>You can find your upload presets in the <code>Upload</code> tab of the Cloudinary settings page, which you can open by clicking the gear icon in the top-right corner of the dashboard.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1650106901391/73lFzuxLQ.png" alt="How to Upload, Crop, &amp; Resize Image in the Browser in Next.js" width="2969" height="232" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1650106814185/GqnIFsNYS.png" alt="How to Upload, Crop, &amp; Resize Image in the Browser in Next.js" width="2653" height="738" loading="lazy"></p>
<p>Scroll down to the bottom of the page to the upload presets section, where you'll see your upload preset or the option to create one if you don't have any.</p>
<p>Next, call the <code>openWidget</code> function from the <code>onClick</code> handler of the image upload button:</p>
<pre><code><span class="hljs-comment">//...</span>

<span class="hljs-keyword">const</span> IndexPage = <span class="hljs-function">() =&gt;</span> {
<span class="hljs-comment">//...</span>
  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;&gt;</span>
     //....
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"main"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"splitdiv"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftdiv"</span>&gt;</span>
          //...
          <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftdivcard"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">h2</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"main-h2"</span>&gt;</span>Resize Options<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
            //...
          <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

          <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftbutton"</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{openWidget}</span>&gt;</span>
            Upload Image
          <span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"splitdiv"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"rightdiv"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span> Image will appear here<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/&gt;</span></span>
  );
};
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> IndexPage;
</code></pre><p>When we open our app in the browser and click the <code>Upload Image</code> button, we should see something like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1650111448538/pglrS-Exs.png" alt="How to Upload, Crop, &amp; Resize Image in the Browser in Next.js" width="3360" height="1881" loading="lazy"></p>
<h2 id="heading-how-to-implement-custom-transformation-functions">How to Implement Custom Transformation Functions</h2>
<p>Next, we need a component that applies a transformation based on the props passed to it. Create a <code>components/</code> directory in the project root and, inside it, a file called <code>image.js</code> with the following content:</p>
<pre><code><span class="hljs-keyword">import</span> { CloudinaryContext, Transformation, Image } <span class="hljs-keyword">from</span> <span class="hljs-string">"cloudinary-react"</span>;

<span class="hljs-keyword">const</span> TransformImage = <span class="hljs-function">(<span class="hljs-params">{ crop, image, width, height }</span>) =&gt;</span> {
  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">CloudinaryContext</span> <span class="hljs-attr">cloudName</span>=<span class="hljs-string">"olanetsoft"</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">Image</span> <span class="hljs-attr">publicId</span>=<span class="hljs-string">{image}</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">Transformation</span> <span class="hljs-attr">width</span>=<span class="hljs-string">{width}</span> <span class="hljs-attr">height</span>=<span class="hljs-string">{height}</span> <span class="hljs-attr">crop</span>=<span class="hljs-string">{crop}</span> /&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">Image</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">CloudinaryContext</span>&gt;</span></span>
  );
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> TransformImage;
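</code></pre><p>In the code snippet above, we imported <code>CloudinaryContext</code>, a wrapper Cloudinary component that manages shared configuration (such as the <code>cloudName</code>) for all of its child Cloudinary components. The <code>TransformImage</code> component receives the image's public ID and the transformation settings (<code>crop</code>, <code>width</code>, and <code>height</code>) as props and passes them to Cloudinary's <code>Image</code> and <code>Transformation</code> components.</p>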
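<p>Under the hood, these props end up in a Cloudinary delivery URL. The Python sketch below illustrates that mapping; it is not the cloudinary-react source. It assumes Cloudinary's documented URL layout <code>https://res.cloudinary.com/{cloud_name}/image/upload/{transformations}/{public_id}</code> and the <code>c_</code> (crop mode), <code>h_</code> (height), and <code>w_</code> (width) parameters. The helper name <code>transformation_url</code> is hypothetical.</p>

```python
from urllib.parse import quote


def transformation_url(cloud_name: str, public_id: str, crop: str,
                       width, height) -> str:
    """Build a Cloudinary-style delivery URL for a crop/resize transformation.

    Hypothetical helper: it mirrors the URL layout the components request,
    but it is not part of any Cloudinary SDK.
    """
    # int() also accepts the numeric strings that number input fields produce.
    # Parameters are comma-separated inside one URL path segment.
    transformation = f"c_{crop},h_{int(height)},w_{int(width)}"
    # Public IDs may include folders ("folder/name"), so keep "/" unescaped.
    return (f"https://res.cloudinary.com/{cloud_name}/image/upload/"
            f"{transformation}/{quote(public_id, safe='/')}")


print(transformation_url("olanetsoft", "samples/dog", "scale", 200, 200))
# https://res.cloudinary.com/olanetsoft/image/upload/c_scale,h_200,w_200/samples/dog
```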
<p>The above code block will render the uploaded image when we import it into <code>pages/index.js</code>:</p>
<pre><code><span class="hljs-comment">//...</span>
<span class="hljs-keyword">import</span> TransformImage <span class="hljs-keyword">from</span> <span class="hljs-string">"../components/image"</span>;

<span class="hljs-keyword">const</span> IndexPage = <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">const</span> [imagePublicId, setImagePublicId] = useState(<span class="hljs-string">""</span>);
  <span class="hljs-keyword">const</span> [alt, setAlt] = useState(<span class="hljs-string">""</span>);
  <span class="hljs-keyword">const</span> [crop, setCrop] = useState(<span class="hljs-string">"scale"</span>);
  <span class="hljs-keyword">const</span> [height, setHeight] = useState(<span class="hljs-number">200</span>);
  <span class="hljs-keyword">const</span> [width, setWidth] = useState(<span class="hljs-number">200</span>);

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;&gt;</span>
     //...
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"main"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"splitdiv"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftdiv"</span>&gt;</span>
          //...
       <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"splitdiv"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"rightdiv"</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span> Image will appear here<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"rightdivcard"</span>&gt;</span>
            {imagePublicId ? (
              <span class="hljs-tag">&lt;<span class="hljs-name">TransformImage</span>
                <span class="hljs-attr">crop</span>=<span class="hljs-string">{crop}</span>
                <span class="hljs-attr">image</span>=<span class="hljs-string">{imagePublicId}</span>
                <span class="hljs-attr">width</span>=<span class="hljs-string">{width}</span>
                <span class="hljs-attr">height</span>=<span class="hljs-string">{height}</span>
              /&gt;</span>
            ) : (
              <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span> Image will appear here<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
            )}
          <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/&gt;</span></span>
  );
};
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> IndexPage;
</code></pre><p>Next, we will add the <code>Resize Options</code> controls (radio buttons for the crop type plus height and width inputs) so that we can select different resize and crop options:</p>
<pre><code><span class="hljs-comment">//...</span>

<span class="hljs-keyword">const</span> IndexPage = <span class="hljs-function">() =&gt;</span> {
<span class="hljs-comment">//...</span>

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;&gt;</span>
    //...
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"main"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"splitdiv"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftdiv"</span>&gt;</span>
          //...
          <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftdivcard"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">h2</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"main-h2"</span>&gt;</span>Resize Options<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>

          <span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"form-control"</span>&gt;</span>Select Crop Type<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
              <span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"form-control"</span>&gt;</span>Scale<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
              <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
                <span class="hljs-attr">type</span>=<span class="hljs-string">"radio"</span>
                <span class="hljs-attr">value</span>=<span class="hljs-string">"scale"</span>
                <span class="hljs-attr">name</span>=<span class="hljs-string">"crop"</span>
                <span class="hljs-attr">onChange</span>=<span class="hljs-string">{(event)</span> =&gt;</span> setCrop(event.target.value)}
              /&gt;
            <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
              <span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"form-control"</span>&gt;</span>Crop<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
              <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
                <span class="hljs-attr">type</span>=<span class="hljs-string">"radio"</span>
                <span class="hljs-attr">value</span>=<span class="hljs-string">"crop"</span>
                <span class="hljs-attr">name</span>=<span class="hljs-string">"crop"</span>
                <span class="hljs-attr">onChange</span>=<span class="hljs-string">{(event)</span> =&gt;</span> setCrop(event.target.value)}
              /&gt;
            <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
              <span class="hljs-attr">type</span>=<span class="hljs-string">"number"</span>
              <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Height"</span>
              <span class="hljs-attr">onChange</span>=<span class="hljs-string">{(event)</span> =&gt;</span> setHeight(event.target.value)}
            /&gt;
            <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
              <span class="hljs-attr">type</span>=<span class="hljs-string">"number"</span>
              <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Width"</span>
              <span class="hljs-attr">onChange</span>=<span class="hljs-string">{(event)</span> =&gt;</span> setWidth(event.target.value)}
            /&gt;
          <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

          <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"leftbutton"</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{openWidget}</span>&gt;</span>
            Upload Image
          <span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"splitdiv"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"rightdiv"</span>&gt;</span>
          //...
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/&gt;</span></span>
  );
};
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> IndexPage;
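</code></pre><p>In the code snippet above, we:</p>
<ul>
<li>Added radio buttons for the crop type (<code>scale</code> or <code>crop</code>), plus number inputs for the height and width</li>
<li>Added an <code>onChange</code> handler to each input that updates the corresponding <code>crop</code>, <code>height</code>, or <code>width</code> state, so the <code>TransformImage</code> preview re-renders with the new values</li>
</ul>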
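<p>The two radio options correspond to different Cloudinary crop modes. According to Cloudinary's transformation reference, <code>scale</code> resizes the whole image to exactly the requested width and height (possibly distorting it), while <code>crop</code> extracts a region of that size from the original without scaling, centered by default. This Python sketch models that geometry so you can predict what the preview will show; the function name and the choice to reject regions larger than the original are assumptions of the sketch, not Cloudinary behavior.</p>

```python
def preview_geometry(crop: str, src_w: int, src_h: int, width, height):
    """Return (source_box, output_size) for the 'scale' and 'crop' modes.

    source_box is (left, top, right, bottom) in original-image pixels.
    width/height may be numeric strings, as number input fields return.
    """
    w, h = int(width), int(height)
    if crop == "scale":
        # The whole original is stretched or shrunk to exactly w x h.
        return (0, 0, src_w, src_h), (w, h)
    if crop == "crop":
        # Assumption of this sketch: refuse regions bigger than the original.
        if w > src_w or h > src_h:
            raise ValueError("crop region is larger than the original image")
        # Center gravity: split the leftover space evenly on both sides.
        left = (src_w - w) // 2
        top = (src_h - h) // 2
        return (left, top, left + w, top + h), (w, h)
    raise ValueError(f"unsupported crop mode: {crop!r}")


print(preview_geometry("crop", 1200, 800, "200", "200"))
# ((500, 300, 700, 500), (200, 200))
```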
<p>Our application's final output should look similar to what we have below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1650112568692/2htjubfOv.png" alt="How to Upload, Crop, &amp; Resize Image in the Browser in Next.js" width="3360" height="1882" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1650112581661/JnEP--CHC.png" alt="How to Upload, Crop, &amp; Resize Image in the Browser in Next.js" width="3360" height="1874" loading="lazy"></p>
<p>Here's the GitHub Repository for the project if you want to have a look at the full code: <a target="_blank" href="https://github.com/Olanetsoft/how-to-upload-crop-and-resize-images-in-the-browser-in-next.js">https://github.com/Olanetsoft/how-to-upload-crop-and-resize-images-in-the-browser-in-next.js</a></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>This post showed how to upload, crop, and resize images in the browser in Next.js using Cloudinary.</p>
<h2 id="heading-resources">Resources</h2>
<p> You may find these resources helpful.</p>
<ul>
<li><a target="_blank" href="https://cloudinary.com/documentation/transformation_reference">Cloudinary transformation URL reference</a></li>
<li><a target="_blank" href="https://cloudinary.com/documentation/image_transformations">Cloudinary Image Transformation</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to generate product images for Amazon, Instagram, Zalando, and Tmall ]]>
                </title>
                <description>
                    <![CDATA[ By Anton Garcia Diaz Millions of people have already shifted from traditional TV to online content, and from traditional malls to online stores. Because of this, e-commerce and marketing teams need to deploy and maintain strong online presences for t... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/generation-of-product-images-for-amazon-zalando-tmall-instagram-asos/</link>
                <guid isPermaLink="false">66d45d9c052ad259f07e4a61</guid>
                
                    <category>
                        <![CDATA[ image ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image optimization  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ responsive images ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Tue, 26 Nov 2019 15:54:54 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2019/11/123brand.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Anton Garcia Diaz</p>
<p>Millions of people have already shifted from traditional TV to online content, and from traditional malls to online stores. Because of this, e-commerce and marketing teams need to deploy and maintain strong online presences for their businesses.</p>
<p>This usually means running the brand's own online store and having a presence in different marketplaces that cover different regions and population segments. The never-ending list of possible marketplaces in which to showcase, promote, and sell products just gets longer and longer.</p>
<p>To make matters worse, different marketplaces have different requirements and restrictions on images, which adds to the burden on DevOps and marketing teams. It's also a source of inconsistency in a brand's public image.</p>
<p>Here, we'll review the main aspects to consider when setting up a clean pipeline for the seamless production of omnichannel images.</p>
<h2 id="heading-a-single-master-image-through-a-single-pipeline">A single master image through a single pipeline</h2>
<p>To simplify workflows and keep them sustainable, a good practice is to apply omnichannel principles to images. In practice, this means setting up a single, easy-to-configure pipeline that creates all the variants from the same master (pristine) images. Under this approach, we can use <strong>the same product image for every channel</strong>.</p>
<p>Our pipeline should receive master images and produce the derivatives needed to feed the marketplaces. At a minimum, it should cope with a workflow like this.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/product-images-for-amazon-zalando-tmall-lamoda-ssg/generation-of-variants/index.html">https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/product-images-for-amazon-zalando-tmall-lamoda-ssg/generation-of-variants/index.html</a></div>
<p>Of course, a front end and cloud storage are not strictly necessary. The pipeline may simply watch a hot folder and create the variants as master images land there. We'll look at this option too.</p>
<h2 id="heading-image-transformation-and-optimization">Image transformation and optimization</h2>
<p>Each web channel has its own design and layout, which for images means different, specific aspect ratios. In addition, each marketplace usually has an image policy that limits the resolution and file size (weight) of images and sets the accepted image formats. Usually, it also specifies other style guidelines.</p>
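<p>A pipeline can check each derivative against a channel's policy before publishing it. Here is a minimal Python sketch of such a check. The <code>ImagePolicy</code> fields and the example numbers are hypothetical placeholders, not any marketplace's actual requirements, so fill them in from each channel's own guidelines.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagePolicy:
    """Hypothetical per-channel image policy."""
    aspect_ratio: tuple[int, int]   # e.g. (1, 1) for square images
    max_width: int                  # pixels
    max_height: int                 # pixels
    max_bytes: int                  # file size ("weight") limit
    formats: frozenset[str]         # accepted formats, lower-case


def policy_violations(policy: ImagePolicy, width: int, height: int,
                      size_bytes: int, fmt: str) -> list[str]:
    """Return human-readable reasons why an image breaks the policy."""
    problems = []
    ar_w, ar_h = policy.aspect_ratio
    # Compare ratios by integer cross-multiplication to avoid float error.
    if width * ar_h != height * ar_w:
        problems.append(f"aspect ratio {width}x{height} is not {ar_w}:{ar_h}")
    if width > policy.max_width or height > policy.max_height:
        problems.append("resolution exceeds the maximum")
    if size_bytes > policy.max_bytes:
        problems.append("file is too heavy")
    if fmt.lower() not in policy.formats:
        problems.append(f"format {fmt} is not accepted")
    return problems


square_jpeg = ImagePolicy((1, 1), 800, 800, 500_000, frozenset({"jpeg"}))
print(policy_violations(square_jpeg, 800, 1000, 350_000, "PNG"))
# ['aspect ratio 800x1000 is not 1:1', 'resolution exceeds the maximum', 'format PNG is not accepted']
```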
<p>Let's look at the main operations we'll want to accomplish with our pipeline.</p>
<h3 id="heading-resizing-cropping-padding">Resizing, cropping, padding</h3>
<p>To change the aspect ratio of an image, we may crop it or pad it. To get a square image from a vertical one, we may cut off the top and bottom parts, or we may fill in the left and right sides with white stripes.</p>
<p>There are open source tools – like ImageMagick – that allow you to perform these operations effectively. Resizing an image with ImageMagick to limit its maximum dimensions to 800 px is as simple as this:</p>
<pre><code>convert input.jpg -resize <span class="hljs-number">800</span>x800 resized.jpg
</code></pre><p>This command preserves the aspect ratio, so if the original image is not square, one dimension of the resized image ends up smaller than 800 px. Let's say the image is vertical and we want it for Tmall, which requires an 800x800 px square image. Then we may pad it with white like this:</p>
<pre><code>convert resized.jpg -background white -gravity center -extent <span class="hljs-number">800</span>x800 padded.jpg
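</code></pre><p>Alternatively, we may crop it to fit the dimensions. Resizing with the <code>^</code> flag first makes the image cover the whole 800x800 area, so the center crop removes only the overflow instead of cutting an 800x800 patch out of a full-resolution image:</p>
<pre><code>convert input.jpg -resize <span class="hljs-number">800</span>x800^ -gravity center -crop <span class="hljs-number">800</span>x800+<span class="hljs-number">0</span>+<span class="hljs-number">0</span> +repage crop.jpg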
</code></pre><p>While some marketplaces like Tmall encourage padding images with white stripes and branding them with logos to use them in category pages, others like Amazon or Lamoda forbid this practice. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/11/crop_pad.png" alt="Image" width="600" height="400" loading="lazy">
<em>Cropping (left), Resizing (center), Resizing and padding (right)</em></p>
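<p>To reason about what the resize and pad commands produce, it helps to compute the geometry by hand. The Python sketch below reproduces the arithmetic of "fit inside a box, then pad to the box" with centered gravity. It is an approximation for planning purposes: ImageMagick's own rounding may differ by a pixel, and the function name is hypothetical. Like a plain <code>-resize 800x800</code>, it also enlarges images that are smaller than the box.</p>

```python
def fit_and_pad(src_w: int, src_h: int, box_w: int, box_h: int):
    """Return the resized size and the padding on each side.

    Mirrors `-resize BOXxBOX` (fit inside, keep aspect ratio) followed by
    `-gravity center -extent BOXxBOX` (pad the canvas out to the box).
    """
    # The limiting side decides the scale factor, so nothing overflows the box.
    scale = min(box_w / src_w, box_h / src_h)
    new_w = max(1, round(src_w * scale))
    new_h = max(1, round(src_h * scale))
    # Center gravity: leftover space is split between the two sides;
    # in this sketch any odd pixel goes to the right/bottom.
    pad_x, pad_y = box_w - new_w, box_h - new_h
    padding = {
        "left": pad_x // 2, "right": pad_x - pad_x // 2,
        "top": pad_y // 2, "bottom": pad_y - pad_y // 2,
    }
    return (new_w, new_h), padding


size, padding = fit_and_pad(1200, 1800, 800, 800)
print(size, padding)
# (533, 800) {'left': 133, 'right': 134, 'top': 0, 'bottom': 0}
```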
<p>When we pad an image to match the target aspect ratio, we don’t risk cutting out important parts: padding changes the shape of the canvas while leaving the product's own proportions and content untouched. When we crop, however, the risk is real.</p>
<p>So it is good practice to make sure, already in the studio, that shots comply with the composition requirements set by each channel. Master images should frame the product so that it stays fully visible in every aspect ratio we'll deliver.</p>
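<p>The cost of cropping can be quantified when planning shots. When an image is scaled to cover the target box and then center-cropped, part of the picture is thrown away. This hedged Python sketch computes the center-crop box and the fraction of the scaled image that is discarded; the function name is hypothetical, and ImageMagick's exact rounding may differ by a pixel.</p>

```python
def cover_and_crop(src_w: int, src_h: int, box_w: int, box_h: int):
    """Scale so the image covers the box, then crop the overflow at center.

    Returns the crop box (left, top, right, bottom) in scaled-image pixels
    and the fraction of the scaled image's area that is cut away.
    """
    # The larger factor makes both sides at least as big as the box.
    scale = max(box_w / src_w, box_h / src_h)
    # max() guards against float rounding leaving a side one pixel short.
    new_w = max(box_w, round(src_w * scale))
    new_h = max(box_h, round(src_h * scale))
    left = (new_w - box_w) // 2
    top = (new_h - box_h) // 2
    lost = 1 - (box_w * box_h) / (new_w * new_h)
    return (left, top, left + box_w, top + box_h), lost


crop_box, lost = cover_and_crop(1200, 1800, 800, 800)
print(crop_box, f"{lost:.0%}")
# (0, 200, 800, 1000) 33%
```

<p>For a 2:3 vertical shot delivered as a square, a third of the frame disappears, which is why the composition has to leave room for it.</p>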
<h3 id="heading-smart-cropping">Smart cropping</h3>
<p>There are algorithms inspired by human attention and aesthetic perception that provide better protection against bad automatic crops. In the next example, smart image cropping (white line) avoids cutting off the face, whereas a simple center crop (red line) would cut through it.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/11/smart-cropping-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Example of smart cropping with a <a target="_blank" href="https://abraia.me/workflows/">cloud service</a> vs center cropping</em></p>
<p>This option is available in some cloud services. If we're going to use it, we should verify that it works properly for us because many solutions only use an attention map and do not consider aesthetic aspects. Usually, choosing a number of representative images, making some tests with them, and finally verifying the results is enough to get a good grasp.</p>
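<p>To make the idea concrete, here is a simplified, attention-only sketch in Python. Given a saliency map (one "interest" score per pixel, assumed to come from some attention model), it slides the crop window over the image and keeps the position with the highest total score, using a summed-area table so each window sum costs four lookups. It deliberately ignores the aesthetic criteria mentioned above, which is exactly the limitation worth testing for; the function name and inputs are hypothetical.</p>

```python
def best_crop(saliency: list[list[float]], crop_w: int, crop_h: int):
    """Return (left, top) of the crop window with the most total saliency."""
    rows, cols = len(saliency), len(saliency[0])
    # Summed-area table with a zero border:
    # sat[y][x] holds the sum of all saliency values above and left of (y, x).
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        for x in range(cols):
            sat[y + 1][x + 1] = (saliency[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])
    best, best_pos = float("-inf"), (0, 0)
    for top in range(rows - crop_h + 1):
        for left in range(cols - crop_w + 1):
            bottom, right = top + crop_h, left + crop_w
            # Sum of the window from four table lookups.
            total = (sat[bottom][right] - sat[top][right]
                     - sat[bottom][left] + sat[top][left])
            if total > best:
                best, best_pos = total, (left, top)
    return best_pos


# A tall 3x6 map whose high-interest region (a face, say) is at the very top.
# A center crop would start at top=1 and cut it; the search keeps it.
saliency = [[1, 9, 1], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(best_crop(saliency, 3, 3))
# (0, 0)
```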
<h3 id="heading-overlaying-logos-and-text">Overlaying logos and text</h3>
<p>We may also need to add our brand logo or a message to the image by overlaying a vector graphic or text. Moreover, in many cases we need a content localization strategy in place, such as tailoring discounts and language to each market region. Sticking to our example, with ImageMagick we can overlay text on a padded image:</p>
<pre><code>convert -fill black -pointsize <span class="hljs-number">70</span> -gravity center -draw <span class="hljs-string">"rotate -90 text 0,-330 'MyBrandHere'"</span> padded.jpg padded-<span class="hljs-keyword">with</span>-brand.jpg
</code></pre><p>Once we configure it for one image, we may apply it to any other with the same dimensions. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/11/brand.png" alt="Image" width="600" height="400" loading="lazy">
<em>Examples of batch image branding using <a target="_blank" href="https://abraia.me/workflows/">Abraia's cloud service</a></em></p>
<p>Otherwise, handling fonts and varying settings can end up being tricky in more complex workflows. In this regard, a <a target="_blank" href="https://abraia.me/workflows/">cloud service</a> usually provides a front end that makes configuration fast, intuitive, and more convenient to handle. It also takes care of other details, such as fonts or preserving quality when images are recompressed.</p>
<h2 id="heading-the-workflows">The workflows</h2>
<p>There are many ways to deploy an image processing pipeline. Depending on the flow rate of images, we may need to support different types of workflows.  </p>
<h3 id="heading-batch-processing">Batch processing</h3>
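<p>In the simplest case, when the flow rate is low, a batch image processing solution may be enough. With ImageMagick, we can use <em>mogrify</em> (instead of <em>convert</em>) to process all the images inside a folder. Note that <em>mogrify</em> overwrites the original files unless we pass <code>-path</code> to write the results to another directory.</p>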
<p>In certain cases, like image versions with text in different languages, we may need to write a script, but that's not a big deal either. To make it even easier, we may use a cloud batch processing tool: we drop images in, and it gives us back all the variants we need, like in the video at the beginning of the post.</p>
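<p>As an illustration of such a script, this Python sketch builds one ImageMagick <code>convert</code> command per market, reusing the text-overlay options from the branding example. The market codes, messages, and output naming are hypothetical. The commands are returned as argument lists (ready for <code>subprocess.run</code>) rather than executed, so you can inspect them first; quotes inside messages are rejected because this sketch does not handle escaping them for <code>-draw</code>.</p>

```python
import shlex
from pathlib import Path


def localized_commands(master: str, messages: dict[str, str],
                       out_dir: str = "out") -> list[list[str]]:
    """Return one `convert` argument list per market for a padded master."""
    stem = Path(master).stem
    commands = []
    for market, text in sorted(messages.items()):
        # The text sits inside '...' in the -draw primitive; keep it simple.
        if "'" in text:
            raise ValueError("quotes in the message are not handled by this sketch")
        output = str(Path(out_dir) / f"{stem}-{market}.jpg")
        commands.append([
            "convert", master,
            "-fill", "black", "-pointsize", "70", "-gravity", "center",
            "-draw", f"rotate -90 text 0,-330 '{text}'",
            output,
        ])
    return commands


for cmd in localized_commands("padded.jpg", {"es": "Nueva temporada", "en": "New season"}):
    print(shlex.join(cmd))
    # To execute instead: subprocess.run(cmd, check=True)
```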
<h3 id="heading-hot-folders">Hot folders</h3>
<p>For in-house deployments that need more than simple batch image processing, hot folders may be a good option. In this case, we set up a worker that watches a folder. Any time an image lands in the folder, the watcher triggers the process that creates all the variants we need.</p>
<p>Gulp comes in very handy for implementing a folder-watching pipeline. <a target="_blank" href="https://github.com/abraia/workflows">This GitHub repository provides a ready-to-use hot folder implementation</a> based on Gulp. It allows us to transform images using Abraia's cloud service or optimize them using Imagemin (an open source solution). Once it's installed, the watcher starts with just one command in the terminal.</p>
<pre><code>$ gulp
</code></pre><p>This video shows the process at work.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/product-images-for-amazon-zalando-tmall-lamoda-ssg/hot-folder-gulp/index.html">https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/product-images-for-amazon-zalando-tmall-lamoda-ssg/hot-folder-gulp/index.html</a></div>
<h3 id="heading-full-cloud">Full cloud</h3>
<p>Cloud services usually offer the most flexible and fastest-to-deploy solution. Still, there are different ways to go full cloud. In the simplest approach from a user's perspective, an image management and optimization service takes charge of the transformation. It also manages delivery to end users (through a CDN) or to other web channels like marketplaces and social networks. The user only needs to upload the master images and configure the transformations, usually through an intuitive graphical interface.</p>
<p>In medium to large companies that manage their own cloud, services from different providers are usually combined. In this case, we are likely to have to manage private and public buckets. We can have a service accessing a bucket, creating the variants, and delivering the resources or just returning them to a different bucket. </p>
<p>Also, a cloud pipeline may be partially implemented in-house. In this case we have endless possibilities. However, such development effort only makes sense when no service complies with the requirements and there is a justified need for a tailored solution.</p>
<h2 id="heading-summary">Summary</h2>
<p>Studio shooting and photo retouching are time-consuming and costly operations. Being able to use the same master material everywhere is key to keeping time and costs under control.</p>
<p>We have reviewed the main aspects of a complete pipeline in charge of creating image variants. On the one hand, we looked at the transformations you need to perform, from resizing, cropping, and padding to overlaying text and graphics. On the other hand, we looked at the workflows you can implement, from batch processing to hot folders or full cloud solutions. We have also reviewed some important open source resources (like ImageMagick and Gulp) that let you build the pipeline yourself.</p>
<p>In the end, there are two main factors to consider when deciding between an in-house solution and a cloud service. First, you must evaluate your willingness to take on the development effort. Second, you need to decide which features you require, from an easy-to-use interface for configuring variants to advanced features like smart cropping.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to build an image type convertor in six lines of Python ]]>
                </title>
                <description>
                    <![CDATA[ By AMR One of the advantages of being a programmer is your ability to build utility tools to improve your life. Unlike a non-programmer, you are probably not spending hours digging through multiple Google search result pages to find a tool that, in th... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-an-image-type-convertor-in-six-lines-of-python-d63c3c33d1db/</link>
                <guid isPermaLink="false">66c350311283974fd2bb0775</guid>
                
                    <category>
                        <![CDATA[ coding ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 13 Mar 2019 16:26:01 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*qzCM-NW3YK5gYx0tXTMJgQ.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By AMR</p>
<p>One of the advantages of being a programmer is your ability to build utility tools that improve your life. Unlike a non-programmer, you are probably not spending hours digging through multiple pages of Google search results to find a tool that was supposed to improve your productivity in the first place (<em>irony wins</em>). Knowing a programming language likely makes you feel more powerful, especially if that language is as versatile and awesome as Python.</p>
<p>One of the aphorisms in <a target="_blank" href="https://www.python.org/dev/peps/pep-0020/#id3">The Zen of Python</a> says:</p>
<blockquote>
<p>Simple is better than complex.</p>
</blockquote>
<p>With this philosophy in mind, a lot of niche tools can be written in Python so succinctly that it makes me wonder whether they’re worth calling tools at all. Sometimes the word <code>script</code> would be more accurate. Either way, we’re setting out here to build one such <code>script</code> that converts images from one file format (image type) to another, in just 6 lines of Python code.</p>
<blockquote>
<p><em>Disclaimer: The number of lines (6) excludes empty lines and comments</em></p>
</blockquote>
<p>In this tutorial, we’re going to build an image type convertor that converts PNG images to JPG images. Before your grey cells rush to judge whether I’m crazy to build this tool, let me say that this is not just for one image, but for all the images inside a folder. That would definitely take more manual effort without coding <em>(I know you can smell <code>bash</code>ing).</em></p>
<h4 id="heading-python-package">Python Package</h4>
<p>We’re going to use the Python package <code>PIL</code> (which stands for Python Imaging Library) for this purpose. The original <code>PIL</code> is no longer maintained and never gained Python 3 support, so some good souls created <a target="_blank" href="https://python-pillow.org/">a friendly fork called <code>Pillow</code></a> that supports Python 3. Pillow is still imported under the name <code>PIL</code>, as you’ll see below.</p>
<p>Install it using <code>pip3 install Pillow</code>.</p>
<h4 id="heading-beginning-script"><strong>Beginning Script</strong></h4>
<p>There are two primary sections in this code. The first section is where we import the required packages, and the second section is where the actual operation happens. The actual operation can be further broken down as follows:</p>
<ul>
<li>Iterate through all the files with the given extension (in our case <code>.png</code>), and for each file:<ul>
<li>Open it as an image</li>
<li>Convert the image to the <code>RGB</code> mode (JPEG can’t store the alpha channel that PNGs often have)</li>
<li>Finally, save it with the new extension <code>.jpg</code></li>
</ul>
</li>
</ul>
<p><strong>Lines 1 and 2:</strong></p>
<pre><code><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image  # Python Image Library - Image Processing
</code></pre><pre><code><span class="hljs-keyword">import</span> glob
</code></pre><p>This section just imports the required packages: <code>PIL</code> for image processing and <code>glob</code> for finding the files in the current working directory that match a pattern.</p>
<p><strong>Lines 3–6:</strong></p>
<pre><code><span class="hljs-comment"># based on SO Answer: https://stackoverflow.com/a/43258974/5086335</span>
<span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> glob.glob(<span class="hljs-string">"*.png"</span>):
    im = Image.open(file)
    rgb_im = im.convert(<span class="hljs-string">'RGB'</span>)  <span class="hljs-comment"># JPEG has no alpha channel</span>
    rgb_im.save(file[:<span class="hljs-number">-4</span>] + <span class="hljs-string">".jpg"</span>, quality=<span class="hljs-number">95</span>)  <span class="hljs-comment"># swap only the ".png" suffix</span>
</code></pre><h4 id="heading-fin">FIN</h4>
<p>So that’s the end of our tool! You can save these 6 lines as a <code>.py</code> file and run it from the folder on your computer that contains the images you want to convert, since <code>glob</code> searches the current working directory.</p>
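<p>One caveat: PNGs often carry transparency, and <code>convert('RGB')</code> simply drops the alpha channel, so transparent areas take whatever colour is stored beneath them (often black). A possible refinement, sketched below with Pillow’s <code>Image.alpha_composite</code>, flattens the image onto a solid background first. The white default and the function name <code>flatten_to_rgb</code> are assumptions of this sketch.</p>

```python
from PIL import Image


def flatten_to_rgb(im: Image.Image, background=(255, 255, 255)) -> Image.Image:
    """Composite any transparency onto a solid background and return RGB."""
    # Convert to RGBA first so palette ("P") and grey+alpha ("LA") images work too.
    rgba = im.convert("RGBA")
    base = Image.new("RGBA", rgba.size, background + (255,))
    # Paint the image over the opaque background, then drop the alpha channel.
    return Image.alpha_composite(base, rgba).convert("RGB")


# Drop-in replacement for the conversion line inside the loop:
# rgb_im = flatten_to_rgb(im)
```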
<h4 id="heading-further-development">Further Development</h4>
<p>If you plan to improve this script further, you could turn it into a command-line interface (CLI) tool. Details like the file format and folder path could then be passed as arguments, extending its power further.</p>
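<p>A minimal sketch of that idea uses the standard library’s <code>argparse</code>. The flag names and defaults below are assumptions, not part of the original script. Pillow is imported inside <code>convert_folder</code> only so that the argument handling can be tried without it installed.</p>

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Convert every image in a folder.")
    parser.add_argument("folder", nargs="?", default=".", help="folder to scan")
    parser.add_argument("--from", dest="src_ext", default="png",
                        help="extension to look for (default: png)")
    parser.add_argument("--to", dest="dst_ext", default="jpg",
                        help="extension to write (default: jpg)")
    parser.add_argument("--quality", type=int, default=95,
                        help="JPEG quality, as in the original script")
    return parser.parse_args(argv)


def target_path(source: Path, dst_ext: str) -> Path:
    """Swap only the final suffix, e.g. png_logo.png becomes png_logo.jpg."""
    return source.with_suffix("." + dst_ext.lstrip("."))


def convert_folder(args: argparse.Namespace) -> list[Path]:
    from PIL import Image  # Pillow; imported here to keep parse_args testable

    written = []
    pattern = f"*.{args.src_ext.lstrip('.')}"
    for source in sorted(Path(args.folder).glob(pattern)):
        with Image.open(source) as im:
            target = target_path(source, args.dst_ext)
            # Output format is inferred from the target's extension.
            im.convert("RGB").save(target, quality=args.quality)
        written.append(target)
    return written


if __name__ == "__main__":
    for path in convert_folder(parse_args()):
        print("wrote", path)
```

<p>For example, <code>python convert.py ./pictures --from png --to jpg</code> would convert every PNG in <code>./pictures</code>; the file name <code>convert.py</code> is just an example.</p>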
<h4 id="heading-references"><strong>References</strong></h4>
<ul>
<li>The complete code used here is available on <a target="_blank" href="https://github.com/amrrs/py_img_convertor">my GitHub</a></li>
<li><a target="_blank" href="https://www.python.org/dev/peps/pep-0020/#id3">Zen of Python</a></li>
<li><a target="_blank" href="https://python-pillow.org/">Pillow</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to use DeepLab in TensorFlow for object segmentation using Deep Learning ]]>
                </title>
                <description>
                    <![CDATA[ By Beeren Sahu Modifying the DeepLab code to train on your own dataset for object segmentation in images. Photo by Nick Karvounis on Unsplash ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-deeplab-in-tensorflow-for-object-segmentation-using-deep-learning-a5777290ab6b/</link>
                <guid isPermaLink="false">66c355a1df235c0b59e2533a</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ TensorFlow ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 24 Sep 2018 20:20:58 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*mfz-HW5TIBU0AvprtApydQ.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Beeren Sahu</p>
<h4 id="heading-modifying-the-deeplab-code-to-train-on-your-own-dataset-for-object-segmentation-in-images">Modifying the DeepLab code to train on your own dataset for object segmentation in images</h4>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*mfz-HW5TIBU0AvprtApydQ.jpeg" alt="Image" width="800" height="533" loading="lazy">
<em>Photo by <a target="_blank" href="https://unsplash.com/photos/FmD8tIkf8bo?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Nick Karvounis</a> on <a target="_blank" href="https://unsplash.com/search/photos/images?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></em></p>
<p>I work as a Research Scientist at <a target="_blank" href="http://www.flixstock.com/">FlixStock</a>, focusing on Deep Learning solutions to generate and/or edit images. We identify coherent regions belonging to various objects in an image using Semantic Segmentation.</p>
<p><a target="_blank" href="https://arxiv.org/abs/1706.05587">DeepLab</a> is well suited to semantic segmentation, and its code is available in TensorFlow.</p>
<p>In this article, I will share how to train a DeepLab semantic segmentation model on your own dataset in TensorFlow. But before we begin…</p>
<h3 id="heading-what-is-deeplab">What is DeepLab?</h3>
<p><a target="_blank" href="https://arxiv.org/abs/1706.05587">DeepLab</a> is one of the most promising techniques for <strong>semantic image segmentation</strong> with Deep Learning. Semantic segmentation means understanding an image at the pixel level: assigning a label to every pixel in the image so that pixels with the same label share certain characteristics.</p>
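<p>To make “a label for every pixel” concrete: a segmentation network typically outputs one score per class for each pixel, and the label map is obtained by taking the highest-scoring class at every position. The NumPy sketch below assumes scores laid out as <code>(height, width, num_classes)</code>; the helper name <code>scores_to_label_map</code> is hypothetical and not part of DeepLab’s API.</p>
<pre><code>import numpy as np


def scores_to_label_map(scores):
    """Turn per-pixel class scores of shape (height, width, num_classes)
    into a (height, width) label map holding one class index per pixel."""
    return np.argmax(scores, axis=-1).astype(np.uint8)


# A 2 x 2 image with two classes: 0 = background, 1 = object.
scores = np.array([
    [[2.0, 0.1], [0.3, 1.5]],
    [[0.2, 0.9], [1.1, 1.0]],
])
label_map = scores_to_label_map(scores)
print(label_map)  # label_map is [[0, 1], [1, 0]]
</code></pre>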
<h3 id="heading-installation">Installation</h3>
<p>The TensorFlow implementation of DeepLab is available on GitHub <a target="_blank" href="https://github.com/tensorflow/models/tree/master/research/deeplab">here</a>.</p>
<h3 id="heading-preparing-dataset">Preparing Dataset</h3>
<p>Before you create your own dataset and train DeepLab, you should be very clear about what you want to do with it. Here are the two scenarios:</p>
<ul>
<li>Training the model from scratch: you are free to have any number of classes of objects (number of labels) for segmentation. This takes a very long time to train.</li>
<li>Using the pre-trained model: you are again free to have any number of classes of objects for segmentation. Start from the pre-trained model and only update the classifier weights with transfer learning. This takes far less training time than training from scratch.</li>
</ul>
<p>Let’s call the new dataset “PQR”. Create a new folder for it: <code>tensorflow/models/research/deeplab/datasets/PQR</code>.</p>
<p>To start, all you need is input images and their pre-segmented images as ground truth for training. The input images need to be color images, and the segmented images need to be color-indexed images (see the PASCAL VOC dataset for reference).</p>
<p>Create a folder named “dataset” inside “PQR”. It should have the following directory structure:</p>
<pre><code>+ dataset
    - JPEGImages
    - SegmentationClass
    - ImageSets
+ tfrecord
</code></pre><h4 id="heading-jpegimages">JPEGImages</h4>
<p>It contains all the input color images in <code>*.jpg</code> format.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*M5PBchudNjWPqxPP.jpg" alt="Image" width="500" height="335" loading="lazy">
<em>A sample input image from PASCAL VOC dataset</em></p>
<h4 id="heading-segmentationclass">SegmentationClass</h4>
<p>This folder contains the semantic segmentation annotation image for each of the color input images. These annotations are the ground truth for the semantic segmentation.</p>
<p>These images should be color-indexed: each index value represents a unique class, shown in a unique color. The mapping from index to color is known as a color map.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*OjQiFBSrKsYnZzGS." alt="Image" width="800" height="103" loading="lazy">
<em>Sample color map (source: <a target="_blank" href="https://github.com/DrSleep/tensorflow-deeplab-resnet">https://github.com/DrSleep/tensorflow-deeplab-resnet</a>)</em></p>
<p><strong>Note:</strong> Each file in the “SegmentationClass” folder should have the same name as its corresponding image in the “JPEGImages” folder, so that every image-segmentation pair can be matched.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*L3AEyxEId0-95rRq.png" alt="Image" width="500" height="335" loading="lazy">
<em>A sample semantic segmentation ground truth image from PASCAL VOC dataset</em></p>
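<p>Before converting anything, it can save time to check that every image has an annotation and vice versa. The sketch below compares file names without their extensions, assuming JPEG inputs in <code>JPEGImages</code> and PNG annotations in <code>SegmentationClass</code> (as in PASCAL VOC); <code>find_unpaired</code> is a hypothetical helper, not part of DeepLab.</p>
<pre><code>from pathlib import Path


def find_unpaired(dataset_dir):
    """Return (images without an annotation, annotations without an image),
    matching files by name without extension."""
    root = Path(dataset_dir)
    images = {p.stem for p in (root / "JPEGImages").glob("*.jpg")}
    annotations = {p.stem for p in (root / "SegmentationClass").glob("*.png")}
    return sorted(images - annotations), sorted(annotations - images)


if __name__ == "__main__":
    # Assumed path, relative to research/deeplab/datasets.
    missing_labels, missing_images = find_unpaired("PQR/dataset")
    print("images without annotations:", missing_labels)
    print("annotations without images:", missing_images)
</code></pre>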
<h4 id="heading-imagesets">ImageSets</h4>
<p>This folder contains:</p>
<ul>
<li>train.txt: list of image names for the training set</li>
<li>val.txt: list of image names for the validation set</li>
<li>trainval.txt: list of image names for training + validation set</li>
</ul>
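<p>These lists can be written by hand, but a short script is less error-prone. The sketch below shuffles the names found in <code>JPEGImages</code> with a fixed seed, holds out a fraction for validation, and writes one name per line without extension. The 20% validation fraction, the seed and the helper name <code>write_image_sets</code> are assumptions for illustration.</p>
<pre><code>import random
from pathlib import Path


def write_image_sets(dataset_dir, val_fraction=0.2, seed=0):
    """Write ImageSets/train.txt, val.txt and trainval.txt, one image name per line,
    and return the number of names in each split."""
    root = Path(dataset_dir)
    names = sorted(p.stem for p in (root / "JPEGImages").glob("*.jpg"))
    random.Random(seed).shuffle(names)  # fixed seed, so the split is reproducible
    n_val = int(len(names) * val_fraction)
    splits = {
        "train": sorted(names[n_val:]),
        "val": sorted(names[:n_val]),
        "trainval": sorted(names),
    }
    out_dir = root / "ImageSets"
    out_dir.mkdir(parents=True, exist_ok=True)
    for split, split_names in splits.items():
        (out_dir / (split + ".txt")).write_text("".join(name + "\n" for name in split_names))
    return {split: len(split_names) for split, split_names in splits.items()}
</code></pre>
<p>The returned counts are also the numbers that belong in <code>splits_to_sizes</code> when defining the dataset description.</p>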
<p>A sample <code>*.txt</code> file looks something like this:</p>
<pre><code>pqr_000032
pqr_000039
pqr_000063
pqr_000068
pqr_000121
</code></pre><h4 id="heading-remove-the-color-map-in-the-ground-truth-annotations">Remove the color-map in the ground truth annotations</h4>
<p>If your segmentation annotation images are RGB images instead of color-indexed images, here is a Python script that will help.</p>
<p>Here, the palette defines the “RGB:LABEL” pairs. In this sample code, (0,0,0):0 is the background and (255,0,0):1 is the foreground class. Note that <code>new_label_dir</code> is the location where the raw segmentation data is stored.</p>
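<p>The conversion itself can be sketched with NumPy: look up each pixel’s RGB color in the palette and write the matching class index into a single-channel label image. This sketch assumes the two-entry palette from the example above and marks any color not in the palette with the ignore label 255, an assumption chosen to match the <code>ignore_label</code> of the dataset description. <code>rgb_to_label</code> is a hypothetical helper.</p>
<pre><code>import numpy as np

# Palette from the example in the text: RGB color to class label.
PALETTE = {
    (0, 0, 0): 0,    # background
    (255, 0, 0): 1,  # foreground class
}


def rgb_to_label(rgb, palette=PALETTE, ignore_label=255):
    """Convert an (H, W, 3) RGB annotation into an (H, W) uint8 array of class labels.
    Colors missing from the palette get ignore_label (an assumption of this sketch)."""
    label = np.full(rgb.shape[:2], ignore_label, dtype=np.uint8)
    for color, class_index in palette.items():
        # True where all three channels equal this palette color.
        matches = np.all(rgb == np.array(color, dtype=rgb.dtype), axis=-1)
        label[matches] = class_index
    return label
</code></pre>
<p>With Pillow, an annotation could be loaded with <code>np.array(Image.open(path).convert("RGB"))</code>, and the result saved as a single-channel PNG with <code>Image.fromarray(label).save(out_path)</code>; the paths are up to you.</p>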
<p>Next, convert the image dataset to a TensorFlow record. Make a new copy of the script file <code>./datasets/download_and_convert_voc2012.sh</code> as <code>./datasets/convert_pqr.sh</code>. Below is the modified script.</p>
<p>The converted dataset will be saved at <code>./deeplab/datasets/PQR/tfrecord</code>.</p>
<h4 id="heading-defining-the-dataset-description">Defining the dataset description</h4>
<p>Open the file <strong>segmentation_dataset.py</strong> present in the <strong>research/deeplab/datasets/</strong> folder. Add the following code segment defining the description for your PQR dataset.</p>
<pre><code>_PQR_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 11111,  # number of images in the train split (lines in train.txt)
        'trainval': 22222,
        'val': 11111,
    },
    num_classes=2,  # number of classes in your dataset
    ignore_label=255,  # label of pixels (e.g. white object edges) to ignore rather than treat as a class
)
</code></pre><p>Then register the dataset by adding it to <code>_DATASETS_INFORMATION</code>, as shown below:</p>
<pre><code>_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'pqr': _PQR_SEG_INFORMATION,
}
</code></pre><h3 id="heading-training">Training</h3>
<p>To train the model on your dataset, you need to run the <code>train.py</code> file in the <strong>research/deeplab/</strong> folder. We have written a script file, <code>train-pqr.sh</code>, to do this for you.</p>
<p>Here, we have used the xception_65 model variant for local training. You can set the number of training iterations with the NUM_ITERATIONS variable, and set <code>--tf_initial_checkpoint</code> to the location of the downloaded or pre-trained model checkpoint (<code>*.ckpt</code>). After training, the final trained model can be found in the TRAIN_LOGDIR directory.</p>
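<p>If you would rather launch training from Python than from a shell script, the sketch below builds an equivalent command line for <code>train.py</code>. The flag names follow the <code>local_test.sh</code> example shipped with DeepLab around 2018, and <code>--initialize_last_layer</code> / <code>--last_layers_contain_logits_only</code> are the flags DeepLab provides for starting from a checkpoint with a different number of classes; treat all of them as assumptions and check them against <code>train.py</code> in your checkout. The function name, paths and values are hypothetical.</p>
<pre><code>import shlex


def deeplab_train_command(dataset_dir, train_logdir, initial_checkpoint, num_iterations):
    """Build the train.py command line for the 'pqr' dataset (hypothetical helper)."""
    return [
        "python", "train.py",
        "--logtostderr",
        "--dataset=pqr",                   # key added to _DATASETS_INFORMATION
        "--train_split=train",
        "--model_variant=xception_65",
        "--atrous_rates=6", "--atrous_rates=12", "--atrous_rates=18",
        "--output_stride=16",
        "--decoder_output_stride=4",
        "--train_batch_size=4",
        "--training_number_of_steps=" + str(num_iterations),  # NUM_ITERATIONS
        "--tf_initial_checkpoint=" + initial_checkpoint,      # downloaded model.ckpt
        "--initialize_last_layer=false",   # do not restore the old classifier weights
        "--last_layers_contain_logits_only=true",
        "--train_logdir=" + train_logdir,  # TRAIN_LOGDIR
        "--dataset_dir=" + dataset_dir,    # the converted tfrecord folder
    ]


if __name__ == "__main__":
    cmd = deeplab_train_command(
        dataset_dir="datasets/PQR/tfrecord",
        train_logdir="datasets/PQR/exp/train",
        initial_checkpoint="path/to/model.ckpt",
        num_iterations=1000,
    )
    print(shlex.join(cmd))
</code></pre>
<p>Once the paths point at real files, passing <code>cmd</code> to <code>subprocess.run(cmd, check=True)</code> from the research/deeplab directory would start training, as the shell script does.</p>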
<p>Finally, run <code>train-pqr.sh</code> from the …/research/deeplab directory:</p>
<pre><code># sh ./train-pqr.sh
</code></pre><p>Voilà! You have successfully trained DeepLab on your dataset.</p>
<p>In the coming months, I will be sharing more of my experiences with Images &amp; Deep Learning. Stay tuned and don’t forget to spare some claps if you like this article. It will encourage me immensely.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Image Augmentation: Make it rain, make it snow. How to modify photos to train self-driving cars ]]>
                </title>
                <description>
                    <![CDATA[ By Ujjwal Saxena Image Augmentation is a technique for taking an image and using it to generate new ones. It’s useful for doing things like training a self-driving car. Think of a person driving a car on a sunny day. If it starts raining, they may ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/image-augmentation-make-it-rain-make-it-snow-how-to-modify-a-photo-with-machine-learning-163c0cb3843f/</link>
                <guid isPermaLink="false">66c357bccf1314a450f0d6b8</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ self-driving cars ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 09 Apr 2018 04:02:55 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*WIFnuUgYya_oEEGrx650DQ.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Ujjwal Saxena</p>
<p>Image Augmentation is a technique for taking an image and using it to generate new ones. It’s useful for doing things like training a self-driving car.</p>
<p>Think of a person driving a car on a sunny day. If it starts raining, they may initially find it difficult to drive in rain. But slowly they get accustomed to it.</p>
<p>An artificial neural network, too, finds it confusing to drive in a new environment unless it has seen that environment before. There are various augmentation techniques, like flipping, translating, adding noise, or changing color channels.</p>
<p>In this article, I’ll explore the weather part of this. I used the <strong>OpenCV</strong> library for processing images. I found it pretty easy after a while, and was able to introduce various weather scenarios into an image.</p>
<p>I’ve pushed a fully implemented <strong>Jupyter Notebook</strong> you can play with on <a target="_blank" href="https://github.com/ujjwalsaxena">GitHub</a>.</p>
<p>Let’s have a look.</p>
<p>I’ll first show you an original test image and will then augment it.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/DPVOfe-5jaoOME91KftyK1dlzvHu2FYMyzrO" alt="Image" width="800" height="456" loading="lazy"></p>
<h3 id="heading-sunny-and-shady"><strong>Sunny and Shady</strong></h3>
<p>Adding a random sunny or shady effect changes the image’s brightness. This is an easy and quick transformation to perform.</p>
<pre><code>import cv2
import numpy as np


def add_brightness(image):
    image_HLS = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)  # conversion to HLS
    image_HLS = np.array(image_HLS, dtype=np.float64)  # float, so scaling cannot wrap around in uint8
    random_brightness_coefficient = np.random.uniform() + 0.5  # value between 0.5 and 1.5
    image_HLS[:, :, 1] = image_HLS[:, :, 1] * random_brightness_coefficient  # scale channel 1 (Lightness) up or down
    image_HLS[:, :, 1][image_HLS[:, :, 1] &gt; 255] = 255  # set all values above 255 to 255
    image_HLS = np.array(image_HLS, dtype=np.uint8)
    image_RGB = cv2.cvtColor(image_HLS, cv2.COLOR_HLS2RGB)  # conversion back to RGB
    return image_RGB
</code></pre><p>The brightness of an image can be changed by scaling the pixel values of the “Lightness” channel (channel 1) of the image in HLS color space. Converting the image back to RGB gives the same image with enhanced or suppressed lighting.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/tny-s9fdRMRzn1zfmy1e3OIK82csRqGJ5Yv1" alt="Image" width="800" height="456" loading="lazy">
<em>Sunny</em></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/D-cHi--aKE1HWME2vjtrkshXS8JIbAvljuOx" alt="Image" width="800" height="456" loading="lazy">
<em>Shady</em></p>
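<p>One detail in <code>add_brightness</code> is worth spelling out: the Lightness channel is converted to <code>float64</code> before scaling because NumPy arithmetic on <code>uint8</code> arrays wraps around instead of saturating, so a bright pixel could suddenly turn dark. The NumPy-only sketch below isolates that step; the helper <code>scale_lightness</code> is hypothetical, and it uses <code>np.clip</code> for the bounds.</p>
<pre><code>import numpy as np


def scale_lightness(lightness, coefficient):
    """Scale a uint8 Lightness channel, clipping to 0..255 instead of wrapping."""
    scaled = lightness.astype(np.float64) * coefficient
    return np.clip(scaled, 0, 255).astype(np.uint8)


lightness = np.array([100, 200], dtype=np.uint8)
wrapped = lightness + np.uint8(100)     # uint8 arithmetic wraps: 200 + 100 becomes 44
safe = scale_lightness(lightness, 1.5)  # 150, and 300 clipped to 255
</code></pre>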
<h3 id="heading-shadows"><strong>Shadows</strong></h3>
<p>To a car, a shadow is nothing but the dark portions of an image, which can also be bright at times. So a self-driving car should learn to drive both with and without shadows. Randomly changing brightness on hills or in woods often confuses a car’s perception if it is not trained properly. This is even more prevalent on sunny days in cities, where buildings of different heights let beams of light peep through.</p>
<p>Brightness is good for perception, but uneven, sudden, or excessive brightness creates perception issues. Let’s generate some fake shadows.</p>
<pre><code>def generate_shadow_coordinates(imshape, no_of_shadows=1):
    vertices_list = []
    for index in range(no_of_shadows):
        vertex = []
        for dimensions in range(np.random.randint(3, 15)):  # number of vertices of the shadow polygon
            vertex.append((imshape[1] * np.random.uniform(), imshape[0] // 3 + imshape[0] * np.random.uniform()))
        vertices = np.array([vertex], dtype=np.int32)  # vertices of a single shadow
        vertices_list.append(vertices)
    return vertices_list  # list of shadow vertices
</code></pre><pre><code>def add_shadow(image, no_of_shadows=1):
    image_HLS = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)  # conversion to HLS
    mask = np.zeros_like(image)
    imshape = image.shape
    vertices_list = generate_shadow_coordinates(imshape, no_of_shadows)  # list of shadow vertices
    for vertices in vertices_list:
        cv2.fillPoly(mask, vertices, 255)  # draw every shadow polygon on the empty mask; a single 255 fills only the red channel
    # Darken once, after all polygons are drawn, so overlapping shadows are not darkened repeatedly:
    # wherever the red channel is hot, halve the "Lightness" channel.
    image_HLS[:, :, 1][mask[:, :, 0] == 255] = image_HLS[:, :, 1][mask[:, :, 0] == 255] * 0.5
    image_RGB = cv2.cvtColor(image_HLS, cv2.COLOR_HLS2RGB)  # conversion to RGB
    return image_RGB
</code></pre><p>OpenCV’s <code>fillPoly()</code> function is really handy in this case. Let’s create some random vertices and impose the polygon on an empty mask using <code>fillPoly()</code>. Having done this, the only thing left to do is to check the mask for hot pixels and reduce the “Lightness” in the HLS image wherever these hot pixels are found.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/uUsWjNO5bi7SPGP6DsfUdmtY-onV4tblz7eG" alt="Image" width="800" height="456" loading="lazy">
<em>Random shadow polygon on the road</em></p>
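<p>The masking step can also be tried without OpenCV. In the NumPy sketch below, a boolean mask plays the role of the polygon that <code>fillPoly()</code> would draw (a rectangle here, purely to keep the example small), and the Lightness values under the mask are halved, as in <code>add_shadow</code>. <code>darken_where</code> is a hypothetical helper.</p>
<pre><code>import numpy as np


def darken_where(lightness, mask, factor=0.5):
    """Multiply the Lightness values selected by a boolean mask by factor."""
    out = lightness.astype(np.float64)
    out[mask] *= factor
    return out.astype(np.uint8)


lightness = np.full((4, 6), 200, dtype=np.uint8)
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:5] = True  # stand-in for a filled shadow polygon
shaded = darken_where(lightness, mask)
</code></pre>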
<h3 id="heading-snow"><strong>Snow</strong></h3>
<p>Well, this is something new. We often wonder how our vehicle would behave on snowy roads. One way to test that is to get pictures of snow-clad roads, or to modify images to get a similar effect. This effect is not a complete substitute for snowy roads, but it’s an approach worth trying.</p>
<pre><code>def add_snow(image):
    image_HLS = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)  # conversion to HLS
    image_HLS = np.array(image_HLS, dtype=np.float64)
    brightness_coefficient = 2.5
    snow_point = 140  # increase this for more snow
    # scale channel 1 (Lightness) up wherever it is below snow_point
    image_HLS[:, :, 1][image_HLS[:, :, 1] &lt; snow_point] = image_HLS[:, :, 1][image_HLS[:, :, 1] &lt; snow_point] * brightness_coefficient
    image_HLS[:, :, 1][image_HLS[:, :, 1] &gt; 255] = 255  # set all values above 255 to 255
    image_HLS = np.array(image_HLS, dtype=np.uint8)
    image_RGB = cv2.cvtColor(image_HLS, cv2.COLOR_HLS2RGB)  # conversion to RGB
    return image_RGB
</code></pre><p>Yup! That’s it. This code generally whitens the darkest parts of the image, which are mostly roads, trees, mountains and other landscape features, using the same HLS “Lightness” increase method used in the other approaches above. This technique doesn’t work well for dark images, but you can modify it to do so. Here’s what you get:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/6ZAXeTp2IK9QmJN8f9hUTqwrVMWNbVuOSANj" alt="Image" width="800" height="456" loading="lazy">
<em>winter is here</em></p>
<p>You can tweak some parameters in the code for more or less snow than this. I have tested this on other images too, and this technique gives me chills.</p>
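<p>As noted above, a fixed <code>snow_point</code> of 140 struggles with dark images: if almost every pixel is below the threshold, the whole frame is simply brightened. One possible modification, sketched below as an assumption rather than the author’s method, is to take the threshold from a percentile of the image’s own Lightness values, so that only its darker fraction is brightened. The helper name and the 50% default are hypothetical.</p>
<pre><code>import numpy as np


def add_snow_lightness(lightness, fraction=0.5, coefficient=2.5):
    """Brighten the darkest fraction of a uint8 Lightness channel by coefficient."""
    values = lightness.astype(np.float64)
    snow_point = np.percentile(values, fraction * 100)  # threshold adapted to this image
    dark = np.less(values, snow_point)                  # pixels below the threshold
    values[dark] *= coefficient
    return np.clip(values, 0, 255).astype(np.uint8)


dark_image = np.arange(100, dtype=np.uint8).reshape(10, 10)  # a very dark Lightness channel
snowy = add_snow_lightness(dark_image)
</code></pre>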
<h3 id="heading-rain"><strong>Rain</strong></h3>
<p>Yes, you heard that right. Why not rain? When humans experience difficulty driving in rain, why should vehicles be spared from that? In fact, this is one of the situations for which I want my self-driving car to be trained the most. Slippery roads and blurred vision are risky, and cars should know how to handle them.</p>
<pre><code>def generate_random_lines(imshape, slant, drop_length):
    drops = []
    for i in range(1500):  # if you want heavy rain, try increasing this
        if slant &lt; 0:
            x = np.random.randint(-slant, imshape[1])  # leave room for the drop to lean left
        else:
            x = np.random.randint(0, imshape[1] - slant)  # leave room for the drop to lean right
        y = np.random.randint(0, imshape[0] - drop_length)
        drops.append((x, y))
    return drops


def add_rain(image):
    image = image.copy()  # cv2.line draws in place, so work on a copy
    imshape = image.shape
    slant_extreme = 10
    slant = np.random.randint(-slant_extreme, slant_extreme)
    drop_length = 20
    drop_width = 2
    drop_color = (200, 200, 200)  # a shade of gray
    rain_drops = generate_random_lines(imshape, slant, drop_length)
    for rain_drop in rain_drops:
        cv2.line(image, (rain_drop[0], rain_drop[1]), (rain_drop[0] + slant, rain_drop[1] + drop_length), drop_color, drop_width)
    image = cv2.blur(image, (7, 7))  # rainy views are blurry
    brightness_coefficient = 0.7  # rainy days are usually shady
    image_HLS = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)  # conversion to HLS
    image_HLS[:, :, 1] = image_HLS[:, :, 1] * brightness_coefficient  # scale channel 1 (Lightness) down
    image_RGB = cv2.cvtColor(image_HLS, cv2.COLOR_HLS2RGB)  # conversion to RGB
    return image_RGB
</code></pre><p>Here, I again generated random points all over the image and then used OpenCV’s <code>line()</code> function to draw short lines starting at those points. I also gave the rain drops a random slant so they look more like actual rain. Finally, I reduced the image’s brightness, because rainy days are usually shady, and blurred it, because rain makes the view blurry. You can change the size of the blur filter and the number of rain drops to get the desired effect.</p>
<p>Here is the result:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/buoiIOE1acHFNb6-nFEqEMGH7Tlq7fE82mEV" alt="Image" width="800" height="456" loading="lazy">
<em>Fake rain but not much blur</em></p>
<h3 id="heading-fog"><strong>Fog</strong></h3>
<p>This is yet another scenario that hampers the vision of a self-driving car a lot. Blurry white fluff in the image makes it very difficult to see beyond a certain stretch and reduces the sharpness in the image.</p>
<p>Fog intensity is an important parameter to train a car for how much throttle it should give. For coding such a function, you can take random patches from all over the image, and increase the image’s lightness within those patches. With a simple blur, this gives a nice hazy effect.</p>
<pre><code>def add_blur(image, x, y, hw):
    # Brighten the Lightness channel of one square patch; adding in a wider type
    # and clipping keeps 255 + 1 from wrapping around to 0 in uint8.
    patch = image[y:y + hw, x:x + hw, 1].astype(np.int16) + 1
    image[y:y + hw, x:x + hw, 1] = np.clip(patch, 0, 255).astype(np.uint8)
    image[y:y + hw, x:x + hw, 1] = cv2.blur(image[y:y + hw, x:x + hw, 1], (10, 10))  # soften the patch
    return image
</code></pre><pre><code>def generate_random_blur_coordinates(imshape, hw):
    blur_points = []
    midx = imshape[1] // 2 - hw - 100
    midy = imshape[0] // 2 - hw - 100
    index = 1
    while midx &gt; -100 or midy &gt; -100:  # generate coordinates radially, from the center outwards
        for i in range(250 * index):
            # clamp the lower bound at 0 so every patch lies inside the image
            x = np.random.randint(max(midx, 0), imshape[1] - max(midx, 0) - hw)
            y = np.random.randint(max(midy, 0), imshape[0] - max(midy, 0) - hw)
            blur_points.append((x, y))
        midx -= 250 * imshape[1] // sum(imshape)
        midy -= 250 * imshape[0] // sum(imshape)
        index += 1
    return blur_points


def add_fog(image):
    image_HLS = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)  # conversion to HLS
    hw = 100  # side length of each haze patch
    image_HLS[:, :, 1] = image_HLS[:, :, 1] * 0.8  # darken slightly before adding haze
    haze_list = generate_random_blur_coordinates(image.shape, hw)
    for haze_points in haze_list:
        image_HLS = add_blur(image_HLS, haze_points[0], haze_points[1], hw)  # brighten and blur one patch
    image_RGB = cv2.cvtColor(image_HLS, cv2.COLOR_HLS2RGB)  # conversion to RGB
    return image_RGB
</code></pre><p>Coding this was the hardest of all the functions above. I used a radial approach to generate the patches, because on a foggy day most of the fog is usually at the far end of the road, and vision clears as you get nearer.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/Wb0JBy40QWvfm65-n0GHn24MOBHXnSrPRRzt" alt="Image" width="800" height="456" loading="lazy">
<em>Foggy Highway</em></p>
<p>It’s a really difficult task for a machine to detect nearby cars and lanes in such foggy conditions, which makes this a good way to train and test the robustness of the driving model.</p>
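<p>If you would like fog intensity to be a single explicit parameter, a simpler alternative to the patch-based approach above (a different technique, not the author’s) is to blend every pixel toward a light-gray haze color: <code>foggy = (1 - t) * image + t * haze</code>, where <code>t</code> between 0 and 1 is the fog intensity. Passing a per-pixel <code>t</code>, for example larger toward the horizon, would mimic the radial idea. The helper name and the haze color are assumptions.</p>
<pre><code>import numpy as np


def add_uniform_fog(image, intensity, haze_color=(230, 230, 230)):
    """Blend an (H, W, 3) uint8 RGB image toward haze_color.

    intensity goes from 0 (no fog) to 1 (only haze); it may also be an
    (H, W, 1) array to vary the fog density across the image.
    """
    image_f = image.astype(np.float64)
    haze = np.array(haze_color, dtype=np.float64)
    foggy = (1.0 - intensity) * image_f + intensity * haze
    return np.clip(np.rint(foggy), 0, 255).astype(np.uint8)
</code></pre>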
<h3 id="heading-torrential-rain">Torrential rain</h3>
<p>I made the rain effect a little more realistic by combining fog and rain, since there is always some haze during rain and it’s good to train the car for that too. No new function is required for this: we can achieve the effect by calling both functions in sequence.</p>
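<p>Chaining effects like this can be wrapped in a small helper that applies augmentation functions in order. The sketch below is generic and assumes each augmentation takes an image array and returns a new one; <code>compose</code> is a hypothetical helper.</p>
<pre><code>from functools import reduce


def compose(*augmentations):
    """Return a function that applies the given augmentations left to right,
    so compose(f, g)(image) equals g(f(image))."""
    def apply(image):
        return reduce(lambda img, augment: augment(img), augmentations, image)
    return apply
</code></pre>
<p>With the functions above, <code>add_torrential_rain = compose(add_fog, add_rain)</code> would produce the combined effect; applying fog before rain is an assumption, since the order is not specified.</p>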
<p><img src="https://cdn-media-1.freecodecamp.org/images/MzAyhI05YhGfg9hN7Adb-40MM2iZ3pCtqvtn" alt="Image" width="800" height="456" loading="lazy"></p>
<p>The car on the right is barely visible in this image, and this is a real world scenario. We can hardly make out anything on the road in heavy rain.</p>
<p>I hope this article will help you train the model in various weather conditions. For my complete code, you can visit my <a target="_blank" href="https://github.com/UjjwalSaxena">GitHub profile</a>. And I’ve written a lot of other articles, which you can read on <a target="_blank" href="https://medium.com/@er.ujjwalsaxena">Medium</a> and on my <a target="_blank" href="https://erujjwalsaxena.wordpress.com/">WordPress site</a>.</p>
<p>Enjoy!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Simple Image Recognition System with TensorFlow (Part 2) ]]>
                </title>
                <description>
                    <![CDATA[ By Wolfgang Beyer This is the second part of my introduction to building an image recognition system with TensorFlow. In the first part we built a softmax classifier to label images from the CIFAR-10 dataset. We achieved an accuracy of around 25–30%.... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-simple-image-recognition-system-with-tensorflow-part-2-c83348b33bce/</link>
                <guid isPermaLink="false">66c34fec9972b7c5c7624eba</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ TensorFlow ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 02 Jan 2017 20:40:16 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*D3S-dXQ28R0S74ERw1OfoA.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Wolfgang Beyer</p>
<p>This is the second part of my introduction to building an image recognition system with TensorFlow. In <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-1/">the first part</a> we built a softmax classifier to label images from the CIFAR-10 dataset. We achieved an accuracy of around 25–30%. Since there are 10 different and equally likely categories, labeling the images randomly we’d expect an accuracy of 10%. So we’re already a lot better than random, but there’s still plenty of room for improvement.</p>
<p>In this post, I’ll describe how to build a neural network that performs the same task. Let’s see by how much we can increase our prediction accuracy!</p>
<h3 id="heading-neural-networks">Neural Networks</h3>
<p>Neural networks are very loosely based on how biological brains work. They consist of a number of artificial neurons which each process multiple incoming signals and return a single output signal. The output signal can then be used as an input signal for other neurons.</p>
<p>Let’s take a look at an individual neuron:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*klAp2u32X2gBoOud.png" alt="Image" width="627" height="462" loading="lazy">
<em>An artificial neuron. Its output is the result of the ReLU function of a weighted sum of its inputs.</em></p>
<p>What happens in a single neuron is very similar to what happens in the softmax classifier. Again we have a vector of input values and a vector of weights. The weights are the neuron’s internal parameters. Both input vector and weights vector contain the same number of values, so we can use them to calculate a weighted sum.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*Pb8y5alv3tV-JfH5zWErJA.png" alt="Image" width="419" height="34" loading="lazy"></p>
<p>So far, we’re doing exactly the same calculation as in the softmax classifier, but now comes a little twist: as long as the result of the weighted sum is a positive value, the neuron’s output is this value. But if the weighted sum is a negative value, we ignore that negative value and the neuron generates an output of 0 instead. This operation is called a Rectified Linear Unit (ReLU).</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*SWKY4_QAAlb1VzfM.png" alt="Image" width="626" height="350" loading="lazy">
<em>Rectified Linear Unit, which is defined by f(x) = max(0, x)</em></p>
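<p>In code, a single neuron with a ReLU is just a weighted sum followed by <code>max(0, x)</code>. The NumPy sketch below mirrors the description above and, like it, leaves out a bias term (which real networks usually add); <code>relu_neuron</code> is a hypothetical name.</p>
<pre><code>import numpy as np


def relu_neuron(inputs, weights):
    """Weighted sum of the inputs, passed through a ReLU."""
    weighted_sum = float(np.dot(inputs, weights))
    return max(0.0, weighted_sum)


x = np.array([1.0, 2.0, 3.0])
print(relu_neuron(x, np.array([0.5, 1.0, 0.25])))   # 0.5 + 2.0 + 0.75 = 3.25
print(relu_neuron(x, np.array([0.5, -1.0, 0.25])))  # 0.5 - 2.0 + 0.75 = -0.75, so the ReLU gives 0.0
</code></pre>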
<p>The reason for using a ReLU is that this creates a nonlinearity. The neuron’s output is now not strictly a linear combination (= weighted sum) of its inputs anymore. We’ll see why this is useful when we stop looking at individual neurons and instead look at the whole network.</p>
<p>The neurons in artificial neural networks are usually not connected randomly to each other. Most of the time they are arranged in layers:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*a9ZvzYEuDwG7VhQaItzkMw.png" alt="Image" width="296" height="356" loading="lazy">
<em>An artificial neural network with 2 layers, a hidden and an output layer. The input is not considered a layer, since it just feeds the data (without transforming it) to the first proper layer. <br>(Image is part of the <a href="https://commons.wikimedia.org/wiki/Main_Page" rel="noopener" target="_blank">Wikimedia Commons</a> and was taken from <a href="https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg" rel="noopener" target="_blank">here</a>.)</em></p>
<p>The input image’s pixel values are the inputs for the network’s first layer of neurons. The output of the neurons in layer 1 is the input for neurons of layer 2 and so forth. This is the reason why having a nonlinearity is so important. Without the ReLU at each layer, we would only have a sequence of weighted sums. And stacked weighted sums can be merged into a single weighted sum, so the multiple layers would give us no improvement over a single layer network. Introducing the ReLU nonlinearity solves this problem as each additional layer really adds something to the network.</p>
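<p>You can check numerically that stacked weighted sums collapse into a single weighted sum. The following plain-Python sketch uses small, arbitrarily chosen weight matrices (hypothetical values, not from the tutorial), with each row holding one neuron’s weights. Without a ReLU, two layers give exactly the same result as one layer whose weight matrix is the product of the two. With a ReLU in between, they don’t.</p>

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows, one row per neuron) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def relu_vec(v):
    """Apply the ReLU element-wise."""
    return [max(0.0, x) for x in v]


W1 = [[1.0, -1.0], [2.0, 1.0]]  # layer 1: 2 inputs -> 2 outputs (hypothetical values)
W2 = [[1.0, 1.0]]               # layer 2: 2 inputs -> 1 output
x = [1.0, 2.0]

# Two stacked weighted sums without a nonlinearity...
two_linear_layers = matvec(W2, matvec(W1, x))
# ...equal one weighted sum using the merged matrix W2 x W1.
one_merged_layer = matvec(matmul(W2, W1), x)
print(two_linear_layers, one_merged_layer)  # [3.0] [3.0]

# With a ReLU between the layers, the merged matrix no longer matches.
with_relu = matvec(W2, relu_vec(matvec(W1, x)))
print(with_relu)  # [4.0]
```

<p>Here the first layer produces <code>[-1.0, 4.0]</code>. The ReLU zeroes out the negative value, so the merged matrix no longer reproduces the network’s output.</p>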
<p>The output of the network’s final layer holds the values we are interested in: the scores for the image categories. In this network architecture, each neuron is connected to all neurons of the previous layer, so this kind of network is called a fully connected network. As we shall see in Part 3 of this tutorial, networks are not always built this way.</p>
<p>And that’s already the end of my very brief part on the theory of neural networks. Let’s get started building one!</p>
<h3 id="heading-the-code">The Code</h3>
<p>The full code for this example is <a target="_blank" href="https://github.com/wolfib/image-classification-CIFAR10-tf">available on Github</a>. It requires TensorFlow and the CIFAR-10 dataset (see <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-1/#prerequisites">Part 1</a> on how to install the prerequisites).</p>
<p>If you’ve made your way through my previous blog post, you’ll see that the code for the neural network classifier is pretty similar to the code for the softmax classifier. But in addition to switching out the part of the code that defines the model, I’ve added a couple of small features to show some of the things TensorFlow can do:</p>
<ul>
<li>Regularization: this is a very common technique to prevent overfitting of a model. It works by applying a counter-force during the optimization process which aims to keep the model simple.</li>
<li>Visualization of the model with TensorBoard: TensorBoard is included with TensorFlow and allows you to generate charts and graphs from your models and from data generated by your models. This helps with analyzing your models and is especially useful for debugging.</li>
<li>Checkpoints: this feature allows you to save the current state of your model for later use. Training a model can take quite a while, so it’s essential to not have to start from scratch each time you want to use it.</li>
</ul>
<p>The code is split into two files this time: there’s <code>two_layer_fc.py</code>, which defines the model, and <code>run_fc_model.py</code>, which runs the model (in case you’re wondering: ‘fc’ stands for fully connected).</p>
<h3 id="heading-2-layer-fully-connected-neural-network">2-Layer Fully Connected Neural Network</h3>
<p>Let’s look at the model itself first and deal with running and training it later. <code>two_layer_fc.py</code> contains the following functions:</p>
<ul>
<li><code>inference()</code> gets us from input data to class scores.</li>
<li><code>loss()</code> calculates the loss value from class scores.</li>
<li><code>training()</code> performs a single training step.</li>
<li><code>evaluation()</code> calculates the accuracy of the network.</li>
</ul>
<h3 id="heading-generating-class-scores-inference">Generating Class Scores: <code>inference()</code></h3>
<p><code>inference()</code> describes the forward pass through the network. How are the class scores calculated, starting from input images?</p>
<p>The <code>images</code> parameter is the TensorFlow placeholder containing the actual image data. The next three parameters describe the shape/size of the network. <code>image_pixels</code> is the number of pixels per input image, <code>classes</code> is the number of different output labels and <code>hidden_units</code> is the number of neurons in the first/hidden layer of our network.</p>
<p>Each neuron takes all values from the previous layer as input and generates a single output value. Each neuron in the hidden layer therefore has <code>image_pixels</code> inputs and the layer as a whole generates <code>hidden_units</code> outputs. These are then fed into the <code>classes</code> neurons of the output layer which generate <code>classes</code> output values, one score per class.</p>
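<p>Counting the values inside the two weight matrices and bias vectors shows how large even this small network is. The sketch below assumes one bias per neuron and uses 120 hidden units, the default <code>hidden1</code> value printed in the Results section. <code>count_parameters</code> is a hypothetical helper, not part of the tutorial’s code.</p>

```python
def count_parameters(image_pixels, hidden_units, classes):
    """Number of trainable values in a 2-layer fully connected network.

    Layer 1: weights [image_pixels, hidden_units] + biases [hidden_units]
    Layer 2: weights [hidden_units, classes] + biases [classes]
    """
    layer1 = image_pixels * hidden_units + hidden_units
    layer2 = hidden_units * classes + classes
    return layer1 + layer2


IMAGE_PIXELS = 32 * 32 * 3  # CIFAR-10: 32 x 32 pixels, 3 color channels
CLASSES = 10
HIDDEN_UNITS = 120          # default hidden1 value from the run in the Results section

print(IMAGE_PIXELS)                                           # 3072
print(count_parameters(IMAGE_PIXELS, HIDDEN_UNITS, CLASSES))  # 369970
```

<p>Almost all of these values sit in the first weight matrix (3072 x 120 = 368,640), because every hidden neuron is connected to every input pixel.</p>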
<p><code>reg_constant</code> is the regularization constant. TensorFlow allows us to add regularization to our network very easily by handling most of the calculations automatically. I’ll go into a bit more detail when we get to the <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-2/#loss_function">loss function</a>.</p>
<p>Since our neural network has 2 similar layers, we’ll define a separate scope for each. This allows us to reuse variable names in each scope. The <code>biases</code> variable is defined in the way we already know, by using <code>tf.Variable()</code>.</p>
<p>The definition of the <code>weights</code> variable is a bit more involved. We use <code>tf.get_variable()</code>, which allows us to add regularization. <code>weights</code> is a matrix with dimensions of <code>image_pixels</code> by <code>hidden_units</code> (input vector size x output vector size). The <code>initializer</code> parameter describes the initial values of the <code>weights</code> variable.</p>
<p>Up to now, we’ve initialized our variables to 0, but that won’t work here. Think about the neurons in a single layer. They all receive exactly the same input values. If they also had the same internal parameters, they would all make the same calculation and all output the same value. To avoid this, we need to randomize their initial weights. We use an initialization scheme that usually works well. The weights start as normally distributed values. Any value more than 2 standard deviations from the mean is dropped and re-drawn. The standard deviation is set to the inverse of the square root of the number of input pixels. Luckily, TensorFlow handles all these details for us. We just need to specify that we want to use a <code>truncated_normal_initializer</code>, which does exactly what we want.</p>
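<p>A minimal plain-Python sketch of this initialization scheme follows. It mimics the behavior described above: a normal distribution, a standard deviation of 1/sqrt(number of inputs), and resampling of anything beyond 2 standard deviations. It is not TensorFlow’s implementation, and <code>truncated_normal</code> is a hypothetical helper name.</p>

```python
import math
import random


def truncated_normal(n_inputs, n_outputs, rng=None):
    """Return an n_inputs x n_outputs weight matrix (list of rows).

    Values are drawn from a normal distribution with mean 0 and
    stddev = 1 / sqrt(n_inputs); any sample more than 2 standard
    deviations from the mean is discarded and drawn again.
    """
    if rng is None:
        rng = random.Random()
    stddev = 1.0 / math.sqrt(n_inputs)

    def sample():
        while True:
            value = rng.gauss(0.0, stddev)
            if abs(value) <= 2.0 * stddev:
                return value

    return [[sample() for _ in range(n_outputs)] for _ in range(n_inputs)]


weights = truncated_normal(3072, 120, rng=random.Random(0))
flat = [w for row in weights for w in row]
limit = 2.0 / math.sqrt(3072)
print(len(weights), len(weights[0]))       # 3072 120
print(all(abs(w) <= limit for w in flat))  # True
print(len(set(flat)) > 1)                  # True: the neurons start out different
```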
<p>The final parameter for the <code>weights</code> variable is the <code>regularizer</code>. All we have to do at this point is to tell TensorFlow we want to use L2-regularization for the <code>weights</code> variable. I’ll cover regularization <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-2/#regularization">here</a>.</p>
<p>To create the first layer’s output, we multiply the <code>images</code> matrix and the <code>weights</code> matrix with each other and add the <code>biases</code> variable. This is exactly the same as in the softmax classifier from the <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-1/">previous blog post</a>. Then we apply the ReLU function, <code>tf.nn.relu()</code>, to arrive at the hidden layer’s output.</p>
<p>Layer 2 is very similar to layer 1. The number of inputs is <code>hidden_units</code>, the number of outputs is <code>classes</code>. Therefore the dimensions of the <code>weights</code> matrix are <code>[hidden_units, classes]</code>. Since this is the final layer of our network, there’s no need for a ReLU anymore. We arrive at the class scores (<code>logits</code>) by multiplying input (<code>hidden</code>) and <code>weights</code> with each other and adding <code>biases</code>.</p>
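<p>The whole forward pass of both layers fits in a few lines. This plain-Python sketch mirrors the structure described above: a matrix product plus biases, with a ReLU on the hidden layer only. It uses tiny hand-picked toy weights. The helper names and all numbers are assumptions for illustration. TensorFlow performs the same matrix operations far more efficiently.</p>

```python
def dense(inputs, weights, biases):
    """Compute inputs x weights + biases for a batch.

    inputs:  list of rows, shape [batch, n_in]
    weights: list of rows, shape [n_in, n_out]
    biases:  list, shape [n_out]
    """
    n_out = len(biases)
    return [
        [sum(x * weights[i][j] for i, x in enumerate(row)) + biases[j] for j in range(n_out)]
        for row in inputs
    ]


def relu_matrix(matrix):
    """Apply the ReLU to every entry of a matrix."""
    return [[max(0.0, v) for v in row] for row in matrix]


def inference(images, w1, b1, w2, b2):
    """Forward pass: hidden = ReLU(images x w1 + b1); logits = hidden x w2 + b2."""
    hidden = relu_matrix(dense(images, w1, b1))
    logits = dense(hidden, w2, b2)  # no ReLU on the final layer
    return logits


# Toy sizes: 3 "pixels", 2 hidden units, 2 classes, a batch of 1 image.
images = [[1.0, 0.0, 2.0]]
w1 = [[1.0, -1.0],
      [0.5, 0.5],
      [1.0, -1.0]]
b1 = [0.0, 1.0]
w2 = [[1.0, 2.0],
      [3.0, -1.0]]
b2 = [0.5, 0.0]
print(inference(images, w1, b1, w2, b2))  # [[3.5, 6.0]]
```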
<p>The summary operation <code>tf.histogram_summary()</code> allows us to record the value of the <code>logits</code> variable for later analysis with TensorBoard. I’ll cover this <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-2/#tensorboard">later</a>.</p>
<p>To sum it up, the <code>inference()</code> function as a whole takes in input images and returns class scores. That’s all a trained classifier needs to do. To arrive at a trained classifier, though, we first need to measure how good those class scores are. That’s the job of the loss function.</p>
<h3 id="heading-calculating-the-loss-loss">Calculating the Loss: <code>loss()</code></h3>
<p>First we calculate the cross-entropy between <code>logits</code> (the model’s output) and <code>labels</code> (the correct labels from the training dataset). That was our whole loss function for the softmax classifier. This time we want to use regularization, so we have to add another term to our loss.</p>
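<p>For reference, here is a plain-Python sketch of the softmax cross-entropy term, averaged over a batch and using integer class labels. It assumes the same formulation as the softmax classifier from Part 1, and the function name is hypothetical. A network that assigns equal scores to all 10 classes has a loss of ln 10, roughly 2.3.</p>

```python
import math


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels.

    logits: list of rows, one row of class scores per image
    labels: list of ints, the correct class index for each image
    """
    total = 0.0
    for scores, label in zip(logits, labels):
        # Subtracting the max score keeps exp() from overflowing; the softmax is unchanged.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        # -log(softmax(scores)[label]) == log_sum_exp - scores[label]
        total += log_sum_exp - scores[label]
    return total / len(labels)


# Equal scores for 10 classes: every class gets probability 0.1.
print(softmax_cross_entropy([[0.0] * 10], [3]))   # approx. 2.3026 (= ln 10)
# A confident, correct prediction gives a loss close to 0.
print(softmax_cross_entropy([[10.0, 0.0]], [0]))  # approx. 0.0000454
```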
<p>Let’s take a step back first and look at what we want to achieve by using regularization.</p>
<h3 id="heading-overfitting-and-regularization">Overfitting and Regularization</h3>
<p>When a statistical model captures the random noise in the data it was trained on instead of the true underlying relationship, this is called overfitting.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*HJ-uiODNvji6bXPd.png" alt="Image" width="480" height="480" loading="lazy">
<em>The red and blue circles represent two different classes. The green line represents an overfitted model whereas the black line represents a model with a good fit. <br>(Image is part of the <a href="https://commons.wikimedia.org/wiki/Main_Page" rel="noopener" target="_blank">Wikimedia Commons</a> and was taken from <a href="https://en.wikipedia.org/wiki/File:Overfitting.svg" rel="noopener" target="_blank">here</a>.)</em></p>
<p>In the above image there are two different classes, represented by the blue and red circles. The green line is an overfitted classifier. It follows the training data perfectly, but it is also heavily dependent on it and is likely to handle unseen data worse than the black line, which represents a regularized model.</p>
<p>So our goal for regularization is to arrive at a simple model without any unnecessary complications. There are different ways to achieve this, and the option we are choosing is called L2-regularization. L2-regularization adds the sum of the squares of all the weights in the network to the loss function. This corresponds to a heavy penalty if the model is using big weights and a small penalty if the model is using small weights.</p>
<p>That’s why we used the <code>regularizer</code> parameter when defining the weights and assigned an <code>l2_regularizer</code> to it. This tells TensorFlow to keep track of the L2-regularization terms for this variable (and to weight them by the parameter <code>reg_constant</code>). All regularization terms are added to a collection called <code>tf.GraphKeys.REGULARIZATION_LOSSES</code>, which the loss function accesses. We then add the sum of all regularization losses to the previously calculated cross-entropy to arrive at the total loss of our model.</p>
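<p>The total loss can be sketched in plain Python as follows. The sketch assumes the cross-entropy has already been computed. As described above, only the weights (not the biases) are regularized. The helper names are hypothetical. Conventions for the constant factor vary: <code>tf.nn.l2_loss</code>, for example, computes half the sum of squares, while this sketch follows the plain description in the text.</p>

```python
def l2_penalty(weight_matrices, reg_constant):
    """reg_constant times the sum of squares of all weights in all layers."""
    return reg_constant * sum(
        w * w for matrix in weight_matrices for row in matrix for w in row
    )


def total_loss(cross_entropy, weight_matrices, reg_constant):
    """Cross-entropy plus the L2 regularization term."""
    return cross_entropy + l2_penalty(weight_matrices, reg_constant)


small = [[[0.1, -0.1], [0.2, 0.0]]]  # one layer with small weights
large = [[[1.0, -1.0], [2.0, 0.0]]]  # the same layer with every weight 10x larger
print(total_loss(2.0, small, reg_constant=0.1))  # approx. 2.006
print(total_loss(2.0, large, reg_constant=0.1))  # approx. 2.6
```

<p>Multiplying every weight by 10 multiplies the penalty by 100. This is why the regularization term pushes the model toward small weights.</p>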
<h3 id="heading-optimizing-the-variables-training">Optimizing the Variables: <code>training()</code></h3>
<p><code>global_step</code> is a scalar variable which keeps track of how many training iterations have already been performed. When repeatedly running the model in our training loop, we already know this value. It’s the iteration variable of the loop. The reason we’re adding this value directly to the TensorFlow graph is that we want to be able to take snapshots of the model. And these snapshots should include information about how many training steps have already been performed.</p>
<p>The definition of the gradient descent optimizer is simple. We provide the learning rate and tell the optimizer which value it is supposed to minimize (the loss). In addition, the optimizer automatically increments <code>global_step</code> with every iteration.</p>
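<p>Conceptually, each training step moves every parameter a small step against its gradient and increments the step counter. The sketch below does this in plain Python for a one-parameter toy loss whose gradient is written out by hand. TensorFlow computes gradients automatically, and <code>gradient_descent</code> is a hypothetical name, not a TensorFlow API.</p>

```python
def gradient_descent(grad_fn, params, learning_rate, steps):
    """Plain gradient descent that also counts steps, like a global_step variable.

    grad_fn: function returning the gradient of the loss w.r.t. params (list of floats)
    """
    global_step = 0
    for _ in range(steps):
        grads = grad_fn(params)
        params = [p - learning_rate * g for p, g in zip(params, grads)]
        global_step += 1  # incremented once per training step
    return params, global_step


# Minimize loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
params, step = gradient_descent(lambda p: [2.0 * (p[0] - 3.0)], [0.0],
                                learning_rate=0.1, steps=100)
print(round(params[0], 4), step)  # 3.0 100
```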
<h3 id="heading-measuring-performance-evaluation">Measuring Performance: <code>evaluation()</code></h3>
<p>The calculation of the model’s accuracy is the same as in the softmax case: we compare the model’s predictions with the true labels and calculate the fraction of predictions that are correct. We’re also interested in how the accuracy evolves over time, so we’re adding a summary operation which keeps track of the value of <code>accuracy</code>. We’ll cover this in the <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-2/#tensorboard">section about TensorBoard</a>.</p>
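<p>The accuracy calculation boils down to taking the highest-scoring class for each image and counting matches. Here is a plain-Python sketch with made-up scores. The function name is hypothetical.</p>

```python
def accuracy(logits, labels):
    """Fraction of images whose highest-scoring class matches the true label."""
    predictions = [max(range(len(scores)), key=scores.__getitem__) for scores in logits]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


logits = [[0.1, 2.0, 0.3],  # predicts class 1
          [1.5, 0.2, 0.1],  # predicts class 0
          [0.0, 0.4, 0.9],  # predicts class 2
          [0.8, 0.1, 0.5]]  # predicts class 0
labels = [1, 0, 1, 2]
print(accuracy(logits, labels))  # 0.5
```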
<p>To summarize, we have defined the behavior of a 2-layer artificial neural network using 4 functions:</p>
<ul>
<li><code>inference()</code> constitutes the forward pass through the network and returns class scores.</li>
<li><code>loss()</code> compares the predicted class scores with the true labels and generates a loss value.</li>
<li><code>training()</code> performs a training step that optimizes the model’s internal parameters.</li>
<li><code>evaluation()</code> measures the performance of our model.</li>
</ul>
<h3 id="heading-running-the-neural-network">Running the Neural Network</h3>
<p>Now that the neural network is defined, let’s look at how <code>run_fc_model.py</code> runs, trains and evaluates the model.</p>
<p>After the obligatory imports we’re defining the model parameters as external flags. TensorFlow has its own module for command line parameters, which is a thin wrapper around <a target="_blank" href="https://docs.python.org/3/library/argparse.html">Python’s <code>argparse</code></a>. We’re using it here for convenience, but you can just as well use <code>argparse</code> directly instead.</p>
<p>In the first couple of lines, the various command line parameters are being defined. The parameters for each flag are the flag’s name, its default value and a short description. Executing the file with the <code>-h</code> flag displays these descriptions.</p>
<p>The second block of lines calls the function which actually parses the command line parameters. Then the values of all parameters are printed to the screen.</p>
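<p>Since <code>argparse</code> can be used directly instead of TensorFlow’s flags module, here is a sketch of equivalent flag definitions. The flag names and default values match the parameters printed in the Results section. The help texts and the function name <code>parse_flags</code> are my own assumptions, not the tutorial’s code.</p>

```python
import argparse


def parse_flags(argv=None):
    """Define the model parameters as command line flags and parse them."""
    parser = argparse.ArgumentParser(
        description="Train a 2-layer fully connected network on CIFAR-10.")
    parser.add_argument("--learning_rate", type=float, default=0.001,
                        help="Learning rate for gradient descent.")
    parser.add_argument("--max_steps", type=int, default=2000,
                        help="Number of training steps.")
    parser.add_argument("--hidden1", type=int, default=120,
                        help="Number of units in the hidden layer.")
    parser.add_argument("--batch_size", type=int, default=400,
                        help="Number of images per training batch.")
    parser.add_argument("--train_dir", type=str, default="tf_logs",
                        help="Directory for the log files.")
    parser.add_argument("--reg_constant", type=float, default=0.1,
                        help="Regularization constant.")
    return parser.parse_args(argv)


flags = parse_flags([])  # no command line arguments: use the defaults
print("Parameters:")
for name, value in sorted(vars(flags).items()):
    print("{} = {}".format(name, value))
```

<p>As with the TensorFlow flags, running such a script with <code>-h</code> prints the help texts. Individual values can be overridden on the command line, e.g. <code>--hidden1 200</code>.</p>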
<p>Here we define constants for the number of pixels per image (32 x 32 x 3) and the number of different image categories. Then we start measuring the runtime by creating a timer.</p>
<p>We want to log some info about the training process and use TensorBoard to display that info. TensorBoard requires the logs for each run to be in a separate directory, so we’re adding date and time info to the name of the log directory.</p>
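<p>In plain Python, the constants, the timer and a per-run log directory could look like this. The timestamp format and the <code>make_logdir</code> helper are hypothetical. Any naming scheme works for TensorBoard as long as each run gets its own directory.</p>

```python
import os
import time
from datetime import datetime

IMAGE_PIXELS = 32 * 32 * 3  # width x height x color channels
CLASSES = 10

begin_time = time.time()  # start measuring the runtime


def make_logdir(train_dir, now=None):
    """Return a per-run log directory such as tf_logs/20170101-120000."""
    if now is None:
        now = datetime.now()
    return os.path.join(train_dir, now.strftime("%Y%m%d-%H%M%S"))


logdir = make_logdir("tf_logs")
print(logdir)

# ... training would happen here ...
print("Total time: {:.2f}s".format(time.time() - begin_time))
```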
<p><code>load_data()</code> loads the CIFAR-10 data and returns a dictionary containing separate training and test datasets.</p>
<h3 id="heading-generate-the-tensorflow-graph">Generate the TensorFlow Graph</h3>
<p>We’re defining TensorFlow placeholders. When performing the actual calculations, these will be filled with training/testing data.</p>
<p>The <code>images_placeholder</code> has dimensions of batch size x pixels per image. A batch size of ‘None’ allows us to run the graph with different batch sizes (the batch size for training the net can be set via a command line parameter, but for testing we’re passing the whole test set as a single batch).</p>
<p>The <code>labels_placeholder</code> is a vector of integer values containing the correct class label, one per image in the batch.</p>
<p>Here we’re referencing the functions we covered earlier in <code>two_layer_fc.py</code>.</p>
<ul>
<li><code>inference()</code> gets us from input data to class scores.</li>
<li><code>loss()</code> calculates a loss value from class scores.</li>
<li><code>training()</code> performs a single training step.</li>
<li><code>evaluation()</code> calculates the accuracy of the network.</li>
</ul>
<p>Next, we define a summary operation for TensorBoard (covered <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-2/#tensorboard">here</a>).</p>
<p>We also generate a <code>saver</code> object to save the model’s state at checkpoints (covered <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-2/#saver">here</a>).</p>
<p>We start the TensorFlow session and immediately initialize all variables. Then we create a summary writer which we will use to periodically save log information to disk.</p>
<p>These lines are responsible for generating batches of input data. Let’s pretend we have 100 training images and a batch size of 10. In the softmax example we just picked 10 random images for each iteration. This means that after 10 iterations, each image will have been picked once on average(!). In practice, though, some images will have been picked multiple times while others won’t have appeared in any batch yet. If you repeat this often enough, it’s not a serious problem that randomness puts some images into training batches somewhat more often than others.</p>
<p>But this time we want to improve the sampling process. What we do is we first shuffle the 100 images of the training dataset. The first 10 images of the shuffled data are our first batch, the next 10 images are our second batch and so forth. After 10 batches we’re at the end of our dataset and the process starts again. We shuffle the data another time and run through it from front to back. This guarantees that no image is being picked more often than any other while still ensuring that the order in which the images are returned is random.</p>
<p>To achieve this, the <code>gen_batch()</code> function in <code>data_helpers.py</code> returns a Python <code>generator</code>, which yields the next batch each time <code>next()</code> is called on it. The details of how generators work are beyond the scope of this post (a good explanation can be found <a target="_blank" href="https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/">here</a>). We use Python’s built-in <code>zip()</code> function to generate a list of tuples of the form <code>[(image1, label1), (image2, label2), ...]</code>, which is then passed to our generator function.</p>
<p><code>next(batches)</code> returns the next batch of data. Since it’s still in the form of <code>[(imageA, labelA), (imageB, labelB), ...]</code>, we need to <a target="_blank" href="https://docs.python.org/2/library/functions.html#zip">unzip</a> it first to separate images from labels. We then fill <code>feed_dict</code>, the dictionary that maps the TensorFlow placeholders to a single batch of training data.</p>
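<p>The following sketch shows the shuffle-then-slice idea, together with <code>zip()</code> and unzipping via <code>zip(*batch)</code>. It is a hypothetical re-implementation, not the actual <code>gen_batch()</code> from the repository. It reshuffles whenever fewer than <code>batch_size</code> unused examples remain, so leftover examples are skipped in that pass. Strings stand in for image arrays.</p>

```python
import random


def gen_batch(data, batch_size, num_iter, rng=None):
    """Yield num_iter batches drawn from data without replacement.

    The data is shuffled, then cut into consecutive batches; once fewer than
    batch_size unused examples remain, it is reshuffled and the next pass begins.
    """
    rng = rng if rng is not None else random.Random()
    data = list(data)
    index = len(data)  # forces a shuffle before the first batch
    for _ in range(num_iter):
        if index + batch_size > len(data):
            rng.shuffle(data)
            index = 0
        yield data[index:index + batch_size]
        index += batch_size


# Hypothetical stand-ins for 100 training images and their labels.
train_images = ["image{}".format(i) for i in range(100)]
train_labels = [i % 10 for i in range(100)]

# zip() pairs them up: [("image0", 0), ("image1", 1), ...]
batches = gen_batch(zip(train_images, train_labels), batch_size=10, num_iter=20)

counts = {}
for _ in range(10):  # 10 batches of 10 = one full pass over the data
    batch = next(batches)
    images_batch, labels_batch = zip(*batch)  # "unzip" into images and labels
    # In the tutorial, these two tuples would fill feed_dict for the placeholders.
    for image in images_batch:
        counts[image] = counts.get(image, 0) + 1

print(len(counts), set(counts.values()))  # 100 {1}
```

<p>After one full pass, each of the 100 images has appeared exactly once, in random order. With sampling with replacement, that is not guaranteed.</p>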
<p>Every 100 iterations the model’s current accuracy is evaluated and printed to the screen. In addition, the <code>summary</code> operation is being run and its results are added to the <code>summary_writer</code> which is responsible for writing the summaries to disk. From there they can be read and displayed by TensorBoard (see <a target="_blank" href="http://www.wolfib.com/Image-Recognition-Intro-Part-2/#tensorboard">this section</a>).</p>
<p>This line runs the <code>train_step</code> operation (defined previously to call <code>two_layer_fc.training()</code>, which contains the actual instructions for the optimization of the variables).</p>
<p>When training a model takes a longer period of time, there is an easy way to save a snapshot of your progress. This allows you to come back later and restore the model in exactly the same state. All you need to do is to create a <code>tf.train.Saver</code> object (we did that earlier) and then call its <code>save()</code> method every time you want to take a snapshot.</p>
<p>Restoring a model is just as easy: call the saver’s <code>restore()</code> method. A working code example that shows how to do this is in the file <a target="_blank" href="https://github.com/wolfib/image-classification-CIFAR10-tf/blob/master/restore_model.py"><code>restore_model.py</code></a> in the GitHub repository.</p>
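<p>The idea behind a checkpoint can be illustrated without TensorFlow: persist the parameters together with the step counter, then load them back unchanged. The sketch below uses Python’s <code>pickle</code> module as a stand-in. This is not how <code>tf.train.Saver</code> stores its files, and the helper names are hypothetical.</p>

```python
import os
import pickle
import tempfile


def save_checkpoint(path, params, global_step):
    """Write the model parameters and the training step counter to disk."""
    with open(path, "wb") as f:
        pickle.dump({"params": params, "global_step": global_step}, f)


def restore_checkpoint(path):
    """Read back exactly the state that save_checkpoint wrote."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    return state["params"], state["global_step"]


params = {"weights1": [[0.1, -0.2]], "biases1": [0.0, 0.0]}  # toy parameters
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.ckpt")
    save_checkpoint(path, params, global_step=1000)
    restored_params, restored_step = restore_checkpoint(path)
print(restored_params == params, restored_step)  # True 1000
```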
<p>After the training is finished, the final model is evaluated on the test set (remember, the test set contains data that the model has not seen so far, allowing us to judge how well the model is able to generalize to new data).</p>
<h3 id="heading-results">Results</h3>
<p>Let’s run the model with the default parameters via “<code>python run_fc_model.py</code>”. My output looks like this:</p>
<pre><code>Parameters:
batch_size = 400
hidden1 = 120
learning_rate = 0.001
max_steps = 2000
reg_constant = 0.1
train_dir = tf_logs

Step 0, training accuracy 0.09
Step 100, training accuracy 0.2675
Step 200, training accuracy 0.3925
Step 300, training accuracy 0.41
Step 400, training accuracy 0.4075
Step 500, training accuracy 0.44
Step 600, training accuracy 0.455
Step 700, training accuracy 0.44
Step 800, training accuracy 0.48
Step 900, training accuracy 0.51
Saved checkpoint
Step 1000, training accuracy 0.4425
Step 1100, training accuracy 0.5075
Step 1200, training accuracy 0.4925
Step 1300, training accuracy 0.5025
Step 1400, training accuracy 0.5775
Step 1500, training accuracy 0.515
Step 1600, training accuracy 0.4925
Step 1700, training accuracy 0.56
Step 1800, training accuracy 0.5375
Step 1900, training accuracy 0.51
Saved checkpoint
Test accuracy 0.4633
Total time: 97.54s
</code></pre><p>The training accuracy starts at the level we would expect from random guessing (10 classes -&gt; 10% chance of picking the correct one). Over roughly the first 1000 iterations, it rises to around 50%, then fluctuates around that value for the next 1000 iterations. The test accuracy of 46% is not much lower than the training accuracy, which indicates that our model is not significantly overfitted. The softmax classifier reached around 30%, so 46% is a relative improvement of about 50%. Not bad!</p>
<h3 id="heading-visualization-with-tensorboard">Visualization with TensorBoard</h3>
<p>TensorBoard allows you to visualize different aspects of your TensorFlow graphs and is very useful for debugging and improving your networks. Let’s look at the TensorBoard-related lines of code spread throughout the codebase.</p>
<p>In <code>two_layer_fc.py</code>, three lines are related to TensorBoard.</p>
<p>Each of these three lines creates a summary operation. By defining a summary operation you tell TensorFlow that you are interested in collecting summary information from certain tensors (<code>logits</code>, <code>loss</code> and <code>accuracy</code> in our case). The other parameter for the summary operation is just a label you want to attach to the summary.</p>
<p>There are different kinds of summary operations. We’re using <code>scalar_summary</code> to record information about scalar (non-vector) values and <code>histogram_summary</code> to collect info about a distribution of multiple values (more info about the various summary operations can be found in the <a target="_blank" href="https://www.tensorflow.org/api_docs/python/summary/">TensorFlow docs</a>).</p>
<p>In <code>run_fc_model.py</code>, a few more lines are relevant for the TensorBoard visualization.</p>
<p>An operation in TensorFlow doesn’t run by itself: you need to either call it directly or call another operation that depends on it. We don’t want to call each summary operation individually every time we collect summary information, so we use <code>tf.merge_all_summaries</code> to create a single operation that runs all our summaries.</p>
<p>During the initialization of the TensorFlow session we’re creating a summary writer. The summary writer is responsible for actually writing summary data to disk. In its constructor we supply <code>logdir</code>, the directory where we want the logs to be written. The optional graph argument tells TensorBoard to render a display of the whole TensorFlow graph.</p>
<p>Every 100 iterations we execute the merged summary operation and feed the results to the summary writer which writes them to disk.</p>
<p>To view the results we run TensorBoard via “<code>tensorboard --logdir=tf_logs</code>” and open <code>localhost:6006</code> in a web browser. In the “Events”-tab we can see how the network’s loss decreases and how its accuracy increases over time.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*pA0aqJZDvp8oH3ss.png" alt="Image" width="371" height="626" loading="lazy">
<em>TensorBoard charts displaying the model’s loss and accuracy over a training run.</em></p>
<p>The “Graphs”-tab shows a visualization of the TensorFlow graph we have defined. You can interactively rearrange it until you’re satisfied with how it looks. I think the following image shows the structure of our network pretty well.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*yCn9F7RKH5LftmxD.png" alt="Image" width="681" height="653" loading="lazy">
<em>TensorBoard displays the TensorFlow graph in an interactive visualization.</em></p>
<p>In the “Distribution”- and “Histograms”-tabs you can explore the results of the <code>tf.histogram_summary</code> operation we attached to <code>logits</code>, but I won’t go into further details here. More info can be found in the <a target="_blank" href="https://www.tensorflow.org/how_tos/summaries_and_tensorboard/">relevant section of the official TensorFlow documentation</a>.</p>
<h3 id="heading-further-improvements">Further Improvements</h3>
<p>Maybe you’re thinking that training the softmax classifier took a lot less computation time than training the neural network. That’s true, but even if we trained the softmax classifier for as long as the neural network, it wouldn’t reach the same performance. The longer you train a model, the smaller the additional gains get, and after a certain point the performance improvement is minuscule. We’ve reached this point with the neural network too: additional training time would no longer improve the accuracy significantly. There’s something else we could do, though:</p>
<p>The default parameter values work reasonably well, but there is still some room for improvement. By varying parameters such as the number of neurons in the hidden layer or the learning rate, we should be able to improve the model’s accuracy further. With some more optimization, a test accuracy above 50% should definitely be possible with this model, although I would be very surprised if it could be tuned to reach 65% or more. But another type of network architecture reaches that level of accuracy easily: convolutional neural networks. These are a class of neural networks which are not fully connected. Instead, they try to make sense of local features in their input, which is very useful for analyzing images. Intuitively, it makes a lot of sense to take spatial information into account when looking at images. In Part 3 of this series, we will look at the principles behind convolutional neural networks and build one ourselves.</p>
<p>Stay tuned for part 3 on convolutional neural networks and thanks a lot for reading! I’m happy about any feedback you might have!</p>
<p>You can also check out other articles I’ve written on <a target="_blank" href="http://www.wolfib.com">my blog</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
