image processing - freeCodeCamp.org

How to Build and Deploy an Image Hosting Service on Sevalla

Manish Shivanandhan — Fri, 26 Sep 2025 13:15:09 +0000

When most people think of image hosting, they imagine uploading photos to a cloud service and getting back a simple link.

It feels seamless, but behind that experience sits a powerful set of technologies. At the core is something called object storage, which is a different way of handling files compared to traditional databases or file systems.

In this article, we’ll build a complete image hosting service using Node.js and Express, connect it to object storage, and finally, deploy the whole project to Sevalla.

By the end, you will have a working application that lets users upload images and retrieve them through hosted URLs, all running live on the cloud.

What is Object Storage?
What We Will Be Building
How to Set Up the Project
How to Create Your Object Storage
How to Deploy Your Project on Sevalla
Why This Project Matters
Conclusion

What is Object Storage?

To understand why our project is designed the way it is, we need to first understand object storage.

Traditional file storage systems save files in a hierarchy of folders, like your computer’s file explorer. Block storage systems, often used in databases, split data into chunks and manage them for speed and reliability.

Object storage is different. It treats each file, whether an image, video, or document, as a single object. Each object is stored with its metadata and a unique identifier inside a flat structure, usually called a bucket.

This flat architecture makes object storage scalable almost without limit. Instead of worrying about file paths or directories, you simply place an object in a bucket and get back an identifier.
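To make the flat model concrete, here is a minimal in-memory sketch in Python. The `Bucket` class and its `put` and `get` methods are hypothetical names invented for this illustration, not part of any storage SDK. Note that in S3-compatible services the caller usually chooses the key, whereas this toy generates one.

```python
import uuid


class Bucket:
    """Toy in-memory stand-in for an object storage bucket (illustration only)."""

    def __init__(self):
        # One flat mapping: identifier -> (data, metadata). There are no folders.
        self._objects = {}

    def put(self, data, metadata=None):
        """Store bytes with optional metadata and return a new unique identifier."""
        key = uuid.uuid4().hex
        self._objects[key] = (data, dict(metadata or {}))
        return key

    def get(self, key):
        """Return the (data, metadata) pair stored under an identifier."""
        return self._objects[key]


bucket = Bucket()
key = bucket.put(b"\x89PNG...", {"content-type": "image/png"})
data, meta = bucket.get(key)
print(key, meta["content-type"])
```

Every object lives at the same level, so looking one up needs only its identifier, never a path.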

Amazon S3 is the industry standard for object storage, offering massive scale, global replication, and advanced features, but it comes with added complexity and often unpredictable costs. Sevalla’s object storage, on the other hand, is designed for developers who want the same durability and scalability without the steep learning curve.

It provides a simpler setup and is S3-compatible, so interacting with it is the same as using an S3 bucket, without the additional setup and complexity. While S3 is ideal for enterprises with petabytes of data, Sevalla’s solution is perfect for projects like image hosting, blogs, or mobile apps where ease of use and speed matter most.

What We Will Be Building

We will create a simple yet practical image hosting service. At its core, the service allows a user to send an image through an HTTP request. The server will accept this image, process it, and store it in object storage.

The usefulness of such a project goes far beyond a coding exercise. If you are building a blog, you could use this service to store images for your posts without worrying about file management on your web server.

If you are developing a mobile app that requires profile pictures or image sharing, this backend can serve as your foundation. Even if you simply want to understand how cloud-native applications handle file uploads, this project gives you a clear, hands-on experience.

By the end, you will not just have code running locally. We will deploy the application on Sevalla, meaning your image hosting service will be live, scalable, and accessible to anyone with a link.

How to Set Up the Project

Let us start by setting up a Node.js project. You can clone this repository if you don’t want to set up the project from scratch.

Create a new project directory, initialize it with npm, and install the required dependencies.

npm init -y
npm i express multer dotenv @aws-sdk/client-s3 @aws-sdk/s3-request-presigner

We will use Express for our web server, Multer for handling file uploads, and the AWS SDK to connect to object storage. Multer acts as middleware, giving us easy access to uploaded files. The AWS SDK gives us programmatic access to object storage, allowing us to upload files and generate links.

Let’s write a quick index.html and put it inside the public/ directory to serve as the UI for file upload.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" /> 
  <meta name="viewport" content="width=device-width,initial-scale=1" /> 
  <title>Pic Host</title>

  
  <style>
    :root { color-scheme: light dark; } /* Support dark/light themes */
    body { 
      font-family: system-ui, sans-serif; 
      max-width: 560px; 
      margin: 4rem auto; 
      padding: 0 1rem; 
    }
    h1 { font-size: 1.25rem; margin-bottom: 1rem; }
    form, .card { 
      border: 1px solid #9993; 
      padding: 1rem; 
      border-radius: 12px; 
    }
    input[type="file"] { margin: .5rem 0 1rem; }
    button { 
      padding: .6rem 1rem; 
      border-radius: 10px; 
      border: 1px solid #9995; 
      background: #0000FF; 
      cursor: pointer; 
    }
    #result { margin-top: 1rem; display: none; }
    #result a { word-break: break-all; } /* Break long URLs nicely */
  </style>
</head>
<body>
  
  <h1>Simple Image Host</h1>

  
  <form id="uploadForm" class="card">
    <label for="file">Choose image</label><br/>
    <input id="file" name="file" type="file" accept="image/*" required />
    <br/>
    <button type="submit">Upload</button>
    
    <div id="status" aria-live="polite" style="margin-top:.75rem;"></div>
  </form>

  
  <div id="result" class="card">
    <div>
      <strong>Share this page:</strong> 
      <a id="pageUrl" href="#" target="_blank" rel="noopener"></a>
    </div>
  </div>

  
  <script>
    const form = document.getElementById('uploadForm');   // Form element
    const statusEl = document.getElementById('status');   // Upload status
    const result = document.getElementById('result');     // Result box
    const pageUrlEl = document.getElementById('pageUrl'); // Share link
    const directUrlEl = document.getElementById('directUrl'); // (unused here)

    // Event listener for form submission
    form.addEventListener('submit', async (e) => {
      e.preventDefault(); // Prevent full-page reload
      statusEl.textContent = 'Uploading...'; 
      result.style.display = 'none';

      const fd = new FormData(); // FormData object for sending file
      const file = document.getElementById('file').files[0];
      if (!file) {
        statusEl.textContent = 'Pick a file first.';
        return;
      }
      fd.append('file', file); // Attach file to request

      try {
        // Send file to backend /upload route
        const res = await fetch('/upload', { method: 'POST', body: fd });
        if (!res.ok) throw new Error('Upload failed');
        const data = await res.json();

        // Show returned page URL
        pageUrlEl.textContent = data.pageUrl;
        pageUrlEl.href = data.pageUrl;

        // Display result card and reset form
        result.style.display = 'block';
        statusEl.textContent = 'Done!';
        form.reset();
      } catch (err) {
        // Handle error
        statusEl.textContent = 'Error: ' + err.message;
      }
    });
  </script>
</body>
</html>

When a user visits the page, they’ll see a simple upload form with a file picker. They can select an image from their computer and click Upload. Then JavaScript intercepts the form submission using addEventListener('submit'), prevents the browser from doing a full page refresh, and instead, packages the selected file into a FormData object.

That file is then sent to the server with a fetch call to the /upload route. If the server responds successfully, the JSON returned contains a pageUrl. This URL is displayed inside the result card, which was initially hidden. The user can now copy this link and share it with others.

If something goes wrong, like no file being selected, the server erroring out, or the upload failing, the script updates the status message to inform the user.

Here’s how it looks to the user.

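Now let’s create the backend in a file named server.js. The code uses ES module import syntax, so add "type": "module" to your package.json (or name the file server.mjs) before running it.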

import path from "path"; // For working with file paths
import express from "express"; // Web framework to handle HTTP routes
import multer from "multer"; // Middleware for handling file uploads
import crypto from "crypto"; // Used to generate random unique IDs
import dotenv from "dotenv"; // Loads environment variables from .env file
import { fileURLToPath } from "url"; // For handling ES module file paths
import {
  S3Client,
  PutObjectCommand,
  HeadObjectCommand,
  GetObjectCommand,
} from "@aws-sdk/client-s3"; // AWS SDK commands for S3 operations
import { getSignedUrl } from "@aws-sdk/s3-request-presigner"; // To generate temporary signed URLs

dotenv.config(); // Load environment variables

// Setup paths for __dirname and __filename in ES modules
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Bucket name from environment
const S3_BUCKET = process.env.S3_BUCKET;

// Create an S3 client (works with Sevalla-compatible storage as well)
const s3 = new S3Client({
  region: "auto", // Auto-region for Sevalla
  endpoint: process.env.ENDPOINT, // Custom endpoint for object storage
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID, // From .env
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY, // From .env
  },
});

// Initialize Express app
const app = express();

// Serve static files (like index.html, CSS, JS) from "public" folder
app.use(express.static(path.join(__dirname, "public")));

// Multer setup: store uploaded files in memory (not on disk)
// Limit file size to 10MB
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 10 * 1024 * 1024 },
});

// ---------- ROUTE 1: GET / ----------
// Serves the main HTML file (upload form)
app.get("/", (req, res) => {
  res.sendFile(path.join(__dirname, "public", "index.html"));
});

// ---------- ROUTE 2: POST /upload ----------
// Handles image uploads and stores them in object storage
app.post("/upload", upload.single("file"), async (req, res) => {
  try {
    // Check if file exists
    if (!req.file) return res.status(400).json({ error: "file is required" });

    // Generate a random ID for the file
    const id = crypto.randomUUID().replace(/-/g, "");
    const key = id;

    // Create a PutObjectCommand to upload file to S3/Sevalla
    const put = new PutObjectCommand({
      Bucket: S3_BUCKET,
      Key: key,
      Body: req.file.buffer,
      ContentType: req.file.mimetype,
      Metadata: {
        originalname: req.file.originalname || "",
      },
    });

    // Upload the file
    await s3.send(put);

    // Build a page URL for retrieving the image later
    const baseUrl = `${req.protocol}://${req.get("host")}`;
    const pageUrl = `${baseUrl}/i/${id}`;

    // Respond with the page URL
    res.json({ id, pageUrl });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: "upload_failed" });
  }
});

// ---------- ROUTE 3: GET /i/:id ----------
// Redirects to a signed URL for secure access to the uploaded file
app.get("/i/:id", async (req, res) => {
  const { id } = req.params;
  const key = id;

  try {
    // Ensure the object exists in storage
    await s3.send(new HeadObjectCommand({ Bucket: S3_BUCKET, Key: key }));

    // Create a signed URL valid for 1 hour
    const command = new GetObjectCommand({ Bucket: S3_BUCKET, Key: key });
    const signedUrl = await getSignedUrl(s3, command, { expiresIn: 3600 });

    // Redirect user to the signed URL
    return res.redirect(302, signedUrl);
  } catch (err) {
    console.error(err);
    return res.status(404).send("Not found");
  }
});

// ---------- Boot the Server ----------
app.listen(process.env.PORT || 3000, () => {
  console.log(`Image host server listening for requests...`);
});

Route 1: `GET /`

This is the entry point of the app. When you open the browser and go to the root URL, it serves the index.html file from the public folder. That file contains the upload form where the user can select an image and submit it.

Route 2: `POST /upload`

This is where the magic happens. When a user selects an image and clicks “Upload,” the file is sent to this endpoint. Multer handles the file upload in memory, and then the file is pushed to object storage using the PutObjectCommand. A random unique ID is generated as the key for the file. Once uploaded, the server responds with a pageUrl that can be used to view the uploaded image later.

Route 3: `GET /i/:id`

This route retrieves an uploaded image. Instead of serving the file directly, it generates a signed URL valid for one hour using getSignedUrl. This signed URL gives temporary access to the file stored in object storage. The server then redirects the user to that signed URL. If the file doesn’t exist, it returns a 404 error.
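The key and page URL handling described in Route 2 can be sketched in a few lines of Python. This is an illustration only, not the server code: `new_object_key` and `build_page_url` are hypothetical helper names, and `uuid.uuid4().hex` is used here as the Python counterpart of a random UUID with its hyphens removed.

```python
import uuid


def new_object_key():
    """Return 32 lowercase hex characters, like a UUID v4 with hyphens removed."""
    return uuid.uuid4().hex


def build_page_url(scheme, host, object_key):
    """Build the shareable page URL that maps to the GET /i/:id route."""
    return f"{scheme}://{host}/i/{object_key}"


key = new_object_key()
print(build_page_url("http", "localhost:3000", key))
```

Because the key is random rather than derived from the file name, two uploads of the same file get different URLs, and nobody can guess a link from the original name.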

Before you run this code, you need access to object storage, and you need to add its connection values to an environment file. Each process.env reference in the code reads one of these values, which the app uses to authenticate with the object storage service and read and write files.

How to Create Your Object Storage

Once your object storage is created, click “Settings” and you will see the access key and secret key. We need these four values:

Bucket name
Endpoint URL
Access Key
Secret Key

Copy them into a file named .env within your project.

AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID_HERE
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY_HERE
S3_BUCKET=YOUR_BUCKET_NAME_HERE
ENDPOINT=YOUR_ENDPOINT_URL_HERE

Additionally, enable public access in the settings so that you can push files from your local environment.
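A missing value in .env usually surfaces later as a confusing authentication or upload error, so it can help to check all four settings up front. The sketch below shows the idea in Python. The `missing_settings` helper and the `REQUIRED_SETTINGS` tuple are hypothetical names for this illustration, and the same check ports directly to a few lines at the top of server.js.

```python
import os

# The four variable names used in the .env file above.
REQUIRED_SETTINGS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET",
    "ENDPOINT",
)


def missing_settings(env):
    """Return the required names that are absent or empty in a mapping."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All object storage settings are present.")
```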

Testing the Application Locally

Let’s make sure our code works locally.

node server.js

Go to http://localhost:3000/ and try uploading a file. It should give you the URL to view the file after a successful upload.

You can visit the URL to see your uploaded file. You can also double check if it has been uploaded using the Object Storage UI.

Great. We have built a simple image hosting and sharing service. Now let’s get this into the cloud.

How to Deploy Your Project on Sevalla

First, push your project to GitHub or fork my repository. Then log in to your Sevalla dashboard and create a new application.

Connect your GitHub account, choose the repository that contains your image hosting service, and select the branch you want to deploy. Sevalla will automatically detect that it is a Node.js project and install dependencies. It will also run the application on the specified port.

To configure the object storage credentials and bucket information, go to the environment variables section in your app and add AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET, and ENDPOINT, the same four variables as in your .env file. These values will be injected into your application at runtime, ensuring that sensitive data is not hardcoded into your source code.

Once environment variables are added, go to “Overview” and click “Deploy”.

Wait for a few minutes. Once the deployment is complete, Sevalla will give you a live URL. Click “Visit APP” to go to your application’s page.

Congratulations! Your app is now live. You can share the URL with others or even add a custom domain to your app to have your own image hosting solution.

Why This Project Matters

This project is more than just a coding exercise. It teaches you how modern applications manage files at scale, introduces you to object storage, and shows how to integrate cloud services into your own projects.

With Sevalla, you also learned how to deploy production-ready applications, giving you the full cycle from local prototype to live cloud service.

For developers building blogs, mobile apps, or even internal tools, the ability to host images reliably and at scale is invaluable. With object storage and a simple Node.js service, you can avoid reinventing the wheel and rely on proven cloud infrastructure.

Conclusion

We began by exploring object storage and why it is ideal for handling files like images. We then built a Node.js application that accepts uploads, stores them in Sevalla’s Object Storage, and returns accessible URLs. Finally, we deployed the application on Sevalla, turning a local project into a live image hosting service. Along the way, you gained not only working code but also a deeper understanding of how to build cloud-native services.

By completing this project, you now have a working image hosting service you can extend and adapt. You could add features like authentication, image resizing, or even a better front-end interface with drag-and-drop UI. Most importantly, you have experienced how development and deployment fit together in modern software.

How to Use Nano Banana for Image Generation - Explained with Code Examples

Tarun Singh — Fri, 19 Sep 2025 13:20:23 +0000

AI is turning image generation and editing into a smooth workflow. Now, with just a single prompt, you can tell your computer to generate a new image or edit an existing one. Google recently launched its new model for image generation and editing, "Nano Banana" (Gemini 2.5 Flash Image). It's a powerful, nimble tool that's changing how we think about image generation and manipulation, and it's something you'll definitely want in your developer toolkit.

In this article, you will learn how to use “Nano Banana” (Gemini 2.5 Flash Image) for image generation and editing. So, let’s get started!

What is "Nano Banana"?
- Why "Nano Banana"?
Setting Up Your Project
Beyond the Basics: What Else Can You Do?
Wrapping Up

What is "Nano Banana"?

Nano Banana is the latest image editing and generation tool from Google DeepMind. Forget the formal jargon for a second. Imagine you have an incredibly talented, lightning-fast artist at your beck and call. You can describe anything to them, such as "an astronaut riding a horse on the Moon", and poof, it appears. Or, you hand them a picture of your dog and say, "Make the dog wear a cap on his head," and they do it instantly, keeping your dog looking like your dog.

That's essentially Nano Banana. It's an advanced AI model from the Gemini family, specifically engineered for rapid, intelligent image generation and nuanced editing. It understands your natural language commands, enabling you to bring complex visual ideas to life or make surgical changes to existing images with surprising ease.

Why "Nano Banana"?

Because it's small (flash!), packed with goodness, and leaves you feeling like you just peeled back a new layer of creative possibility. It's fast, efficient, and incredibly versatile.

The Superpowers You Get:

Prompt-Perfect Editing: Want to change a background, alter a pose, or add a specific object? Just ask. Nano Banana understands and executes.
Character Consistency: This is a big one. If you're creating a story or a series of images, maintaining the look of a specific character or object is crucial. Nano Banana excels at this, ensuring your protagonist looks the same whether they're in a forest or on the moon.
Visual Mashups (Multi-Image Fusion): Got a few different visual elements you want to combine seamlessly? It can blend them into a cohesive new image.

and much more!

Interested? Let's get our hands dirty. But wait! There are two ways to use “Nano Banana”:

Using Google AI Studio: The simplest way to generate or edit images. This is a web-based tool that gives you direct access to the Gemini models without writing a single line of code. It's the best place to start testing, and it's useful for developers and non-developers alike. There's no need to install libraries, manage API keys, or write any code.
Building with the Gemini API: This is the better choice when you want custom solutions for your application. For any serious application, whether it's a web app, a mobile app, or a backend service, you'll need to integrate directly with the Gemini API. This is where the real power lies, as it allows you to automate tasks and create interactive experiences.

In this tutorial, you will see how we can use this tool in our own applications, using nothing but Python. So, let’s get started.

How to Set Up Your Project

Step 1: Get an API key from Google Gemini

The very first step for using “Nano Banana” is to get an API key. Head over to Google AI Studio, click on “Create API key”, and generate a new one by specifying a project from your existing Google Cloud projects.

Once you have generated an API key, save it securely somewhere.

Step 2: Install the SDK and Other Dependencies

Open your terminal and run:

pip install google-generativeai pillow python-dotenv

We’ll use Pillow for easy image handling and python-dotenv to safely manage our API key.

Step 3: Set Up Your Environment

It’s crucial to keep your API key out of your code for security. For this, we usually use environment variables. So, create a file named .env in your project root and add your API key:

GEMINI_API_KEY="YOUR_API_KEY_HERE"

Step 4: Image Generation & Editing

Example 1: Text-to-Image Generation

Text-to-Image is like an artist who can draw anything you describe. You simply write a prompt (a sentence or a description, even a very detailed one), and the AI generates a unique, high-quality image that matches it. It’s perfect for bringing your most imaginative ideas to life with just a few words.

import os
import google.generativeai as genai
from PIL import Image
from io import BytesIO
from dotenv import load_dotenv

# Configuration
load_dotenv()
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel('gemini-2.5-flash-image-preview')

# Prompt, Image, and Response Setup
prompt = "A golden retriever puppy sitting in a field of daisies, bright and cheerful"
output_filename = "text_to_image_result.png"

# saving image helper function from text prompt response
def save_image_from_response(response, filename):
    """Helper function to save the image from the API response."""
    if response.candidates and response.candidates[0].content.parts:
        for part in response.candidates[0].content.parts:
            if part.inline_data:
                image_data = BytesIO(part.inline_data.data)
                img = Image.open(image_data)
                img.save(filename)
                print(f"Image successfully saved as {filename}")
                return filename
    print("No image data found in the response.")
    return None

def main():
    print(f"Generating image for prompt: '{prompt}'...")
    response = model.generate_content(prompt)
    save_image_from_response(response, output_filename)

if __name__ == "__main__":
    main()

Output:

The code used in the example handles everything needed to communicate with the Gemini API and save the image.

First, we import the required libraries and load the API key from .env using load_dotenv(). This makes the key available so we can connect to Google’s service with genai.configure().
The model we’re using is gemini-2.5-flash-image-preview, which is designed for fast image generation.
We define a prompt (“A golden retriever puppy...”) and a filename for saving the image.
The helper function save_image_from_response(...) looks at the API’s response, extracts the raw image data, and saves it as a PNG file.
In main(), we call the model with the prompt, then pass the response to the helper function to save the result.
The if __name__ == "__main__": block ensures the script runs only when executed directly, not when imported.
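The part-walking logic inside the helper can be tried without calling the API or installing Pillow. The sketch below is an offline illustration under stated assumptions: `first_image_bytes` is a hypothetical function, the response is a hand-built stand-in made with `SimpleNamespace` that only mimics the attribute names used in the example (`candidates`, `content.parts`, `inline_data.data`), and unlike the helper above it scans every candidate rather than only the first.

```python
from types import SimpleNamespace


def first_image_bytes(response):
    """Return the bytes of the first part carrying inline data, or None."""
    for candidate in response.candidates or []:
        for part in candidate.content.parts:
            inline = getattr(part, "inline_data", None)
            if inline and inline.data:
                return inline.data
    return None


# Hand-built stand-in for an API response: one text part, then one image part.
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[
                    SimpleNamespace(text="Here is your image", inline_data=None),
                    SimpleNamespace(
                        text=None,
                        inline_data=SimpleNamespace(data=b"\x89PNG"),
                    ),
                ]
            )
        )
    ]
)

print(first_image_bytes(fake_response))
```

A response can mix text parts and image parts, which is why the loop skips parts without inline data instead of assuming the first part is the image.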

Example 2: Image-to-Image Editing

Image-to-Image is like a photo editor. Instead of starting from scratch, you can upload an existing picture and describe how to change it. For instance, you can request background removal, addition of new objects, or even a complete artistic style change.

import os
import google.generativeai as genai
from PIL import Image
from io import BytesIO
from dotenv import load_dotenv

# Configuration
load_dotenv()
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel('gemini-2.5-flash-image-preview')

# Prompt, Image, and Response Setup
input_image_path = "input_dog.png"
prompt = "Make the dog wear a small wizard hat and spectacles."
output_filename = "edited_image_result.png"

# Helper function that saves the image from the API response
def save_image_from_response(response, filename):
    """Helper function to save the image from the API response."""
    if response.candidates and response.candidates[0].content.parts:
        for part in response.candidates[0].content.parts:
            if part.inline_data:
                image_data = BytesIO(part.inline_data.data)
                img = Image.open(image_data)
                img.save(filename)
                print(f"Image successfully saved as {filename}")
                return filename
    print("No image data found in the response.")
    return None

def main():
    print(f"Editing image '{input_image_path}' with prompt: '{prompt}'...")
    try:
        img_to_edit = Image.open(input_image_path)
        response = model.generate_content([prompt, img_to_edit])
        save_image_from_response(response, output_filename)
    except FileNotFoundError:
        print(f"Error: The file '{input_image_path}' was not found.")

if __name__ == "__main__":
    main()

Output:

This code is very similar to the first example, but the key difference is in the core logic.

input_image_path: This variable now holds the file path to the image you want to edit.
Image.open(input_image_path): This line uses the Pillow library to open your local image file.
model.generate_content([prompt, img_to_edit]): This is the most important part. Unlike before, we now pass a list to the generate_content function that contains both the text prompt and the image object. This tells the API to use the provided image as a starting point for its generation.
try...except block: This handles errors. The code tries to open the image file, and if that fails because the file isn't there, it catches the FileNotFoundError and prints a friendly message to the user instead of crashing.

Example 3: Multi-Image Fusion

Multi-image fusion is like merging two or more images or objects. Upload several images and instruct the AI to blend them into one composite picture seamlessly. This is a tool for creating new scenes, combining people and backgrounds, or creating detailed product mockups.

import os
import google.generativeai as genai
from PIL import Image
from io import BytesIO
from dotenv import load_dotenv

# Configuration
load_dotenv()
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel('gemini-2.5-flash-image-preview')

# Prompt, Images, and Response Setup
image1_path = "dog_image.png"
image2_path = "cap_image.png"
prompt = "Make the dog from the first image wear the cap from the second image. The cap should fit realistically on the dog's head."
output_filename = "dog_with_cap_result.png"

def save_image_from_response(response, filename):
    """Helper function to save the image from the API response."""
    if response.candidates and response.candidates[0].content.parts:
        for part in response.candidates[0].content.parts:
            if part.inline_data:
                image_data = BytesIO(part.inline_data.data)
                img = Image.open(image_data)
                img.save(filename)
                print(f"Image successfully saved as {filename}")
                return filename
    print("No image data found in the response.")
    return None

def main():
    print(f"Fusing images '{image1_path}' and '{image2_path}'...")
    try:
        img1 = Image.open(image1_path)
        img2 = Image.open(image2_path)
        response = model.generate_content([prompt, img1, img2])
        save_image_from_response(response, output_filename)
    except FileNotFoundError:
        print("Error: One or both image files were not found.")

if __name__ == "__main__":
    main()

Output:

The logic of the code above is an extension of the Image-to-Image example.

image1_path and image2_path: These variables hold the paths to the two images you want to fuse or merge.
model.generate_content([prompt, img1, img2]): Here, the list passed to the generate_content function contains three items: the text prompt and both image objects. This tells the AI to use the prompt to combine the elements from both images into a single output.

Example 4: Image Restoration

This feature can restore old, faded, or damaged photos. Upload a picture and request Gemini to restore it. This includes sharpening low-quality images, colorizing old black-and-white photos, and enhancing textures, which can make your memories look new again.

import os
import google.generativeai as genai
from PIL import Image
from io import BytesIO
from dotenv import load_dotenv

# Configuration
load_dotenv()
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel('gemini-2.5-flash-image-preview')

# Prompt, Image, and Response Setup
input_image_path = "old_photo.png"
prompt = "Restore this old, faded photograph. Sharpen the details, remove any scratches or damage, and enhance the colors to make it look like a new, high-quality photo."
output_filename = "restored_image_result.png"

def save_image_from_response(response, filename):
    """Helper function to save the image from the API response."""
    if response.candidates and response.candidates[0].content.parts:
        for part in response.candidates[0].content.parts:
            if part.inline_data:
                image_data = BytesIO(part.inline_data.data)
                img = Image.open(image_data)
                img.save(filename)
                print(f"Image successfully saved as {filename}")
                return filename
    print("No image data found in the response.")
    return None

def main():
    print(f"Attempting to restore image: '{input_image_path}'...")
    try:
        old_photo = Image.open(input_image_path)
        response = model.generate_content([prompt, old_photo])
        save_image_from_response(response, output_filename)
    except FileNotFoundError:
        print(f"Error: The file '{input_image_path}' was not found.")

if __name__ == "__main__":
    main()

Output:

The structure here is identical to the Image-to-Image Editing example because, from a technical perspective, image restoration is a form of image-to-image editing.

Now the prompt is where the magic happens. The text prompt explicitly tells the model what to do with the image, outlining the restoration steps like "sharpen the details," "remove scratches," and "enhance the colors." The model's intelligence allows it to understand these abstract instructions and apply them to the visual data to give you a cleaner, more realistic version of your old image.

Beyond the Basics: What Else Can You Do?

This is just the tip of the iceberg! Nano Banana is incredibly versatile. Here are some ideas for where you can take your projects:

Batch Processing: Automate the generation of multiple images from a list of prompts.
Creative Assets: Design icons, backgrounds, or character sprites for games or apps directly from your Python script.
Data Processing: Integrate Nano Banana into a data pipeline to programmatically edit or generate images based on data inputs.
AI Art Galleries: Build a backend service that allows users to submit prompts and receive images.
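As a starting point for the batch processing idea, here is a minimal sketch of the loop structure. It makes no API calls: `run_batch` and `filename_for` are hypothetical helpers, and the `generate_and_save` argument is a placeholder for your own function that would call `model.generate_content(prompt)` and save the result, as in the earlier examples.

```python
import re


def filename_for(prompt, index):
    """Build a safe, numbered PNG filename from a prompt."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")[:40]
    return f"{index:03d}_{slug}.png"


def run_batch(prompts, generate_and_save):
    """Call generate_and_save(prompt, filename) for each prompt; collect results."""
    results = []
    for index, prompt in enumerate(prompts, start=1):
        filename = filename_for(prompt, index)
        try:
            generate_and_save(prompt, filename)
            results.append((prompt, filename, None))
        except Exception as error:  # keep going if one prompt fails
            results.append((prompt, None, str(error)))
    return results


def fake_generate_and_save(prompt, filename):
    """Stand-in that only reports what it would do."""
    print(f"would generate {filename!r} from {prompt!r}")


run_batch(["A red fox in snow", "Neon city at night"], fake_generate_and_save)
```

Catching errors per prompt means one failed generation does not abort the rest of the batch, and the returned list shows which prompts need a retry.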

Wrapping Up

"Nano Banana" (Gemini 2.5 Flash Image) isn't just a cool piece of tech; it's a practical, powerful tool for developers and creatives alike. With just a few lines of code, you can tap into its capabilities and bring your visual ideas to life. This streamlined approach makes it easy to get started, experiment, and integrate this visual magic into your projects.

If you found this article helpful and want to discuss AI development, LLMs, or software development, feel free to connect with me on X/Twitter, LinkedIn, or check out my portfolio on my Blog. I regularly share insights about AI, development, technical writing, and much more.

Happy coding, and may your creations be as vibrant as a field of fresh bananas!

How to Enhance Images with Neural Networks

Manish Shivanandhan — Thu, 04 Sep 2025 00:44:55 +0000

Artificial intelligence is changing how we work with images. What once took hours in Photoshop can now happen in seconds with AI-powered tools. You can take a blurry picture, enlarge it without losing sharpness, fix the lighting, remove unwanted noise, or even bring color to a black-and-white photo, all with a single click.

The magic you see in these tools is powered by algorithms: trained AI models that understand how images should look and then reconstruct them accordingly. These models have studied millions of examples to learn patterns, textures, and details, so they can “predict” what’s missing and fill it in naturally.

For developers, photographers, and content creators, knowing the basics of these algorithms can help you pick the right tools for your workflow. Even if you never plan to code an AI model yourself, this knowledge will help you make better choices for image processing, web apps, or creative projects.

Let’s look at five of the most important algorithms used in AI image enhancement today. Along the way, you’ll see real-world tools that use these algorithms and how you can try them yourself.

Image Colorization
GAN-Based Image Enhancement
Noise Reduction (Denoising Autoencoders)
Image Upscaling using Super-Resolution
Artifact Removal
Why These Algorithms Matter to Developers
Conclusion

Image Colorization

Automatic image colorization might be the most visually dramatic AI enhancement of all. It takes a black-and-white image and predicts the colors that should be there, often producing results that look like the photo was taken in full color.

The AI behind this uses convolutional neural networks (CNNs) trained on huge datasets of color images. The model sees both the grayscale and the color versions during training, so it learns how certain objects typically appear. For example, it might learn that grass is usually green, the sky is often blue, and human skin falls within a certain range of tones.

One of the most famous models is DeOldify, which combines CNNs with GANs. The GAN setup helps refine the results, making colors more natural and avoiding strange or overly bright tones.

Colorization has practical uses beyond restoring old family photos. It’s used in film restoration, historical projects, digital storytelling, and even concept art.

See Image Colorization in action.

GAN-Based Image Enhancement

GANs, or Generative Adversarial Networks, are one of the most powerful AI techniques in image enhancement. They consist of two neural networks: the generator, which tries to create realistic-looking images, and the discriminator, which evaluates them. Over many iterations, the generator becomes extremely good at producing images that pass as real.

In image retouching, GANs can handle many tasks at once, like fixing lighting, improving sharpness, enhancing textures, and even subtly changing elements to make the picture more appealing. Because GANs learn from real-world images, the results often feel more natural than traditional editing filters.

GAN-based retouching is used in professional portrait editing, e-commerce product photos, real estate listings, and even game asset creation. It’s also behind many “one-click enhance” buttons you see in modern apps.

See a GAN powered photo enhancer here.

Noise Reduction (Denoising Autoencoders)

Noise in images looks like random specks of color or brightness that shouldn’t be there. It often happens in low-light photos or in images taken with high ISO settings. Noise makes photos look grainy and less professional.

Traditional noise removal methods simply blur the image to hide the noise, but this also destroys fine details. AI noise reduction works differently.
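To make the trade-off concrete, here is a minimal sketch of one classic blur, a 3-sample box (mean) filter, applied to a single row of grayscale values. The function name `box_blur_1d` and the clamped edge handling are assumptions made for this illustration; real editors use 2D kernels and more careful edge treatment.

```rust
/// Blur one row of grayscale samples with a 3-sample box (mean) filter.
/// Samples beyond either end are treated as copies of the edge sample.
fn box_blur_1d(signal: &[u8]) -> Vec<u8> {
    let n = signal.len();
    (0..n)
        .map(|i| {
            // Clamp neighbour indexes so the first and last samples reuse themselves.
            let left = signal[i.saturating_sub(1)];
            let right = signal[(i + 1).min(n - 1)];
            // Sum in u16 so three u8 values cannot overflow.
            ((left as u16 + signal[i] as u16 + right as u16) / 3) as u8
        })
        .collect()
}
```

A lone noise spike such as `[0, 0, 90, 0, 0]` is flattened to `[0, 30, 30, 30, 0]`, but a genuine hard edge such as `[0, 0, 0, 255, 255, 255]` is smeared into `[0, 0, 85, 170, 255, 255]`. The filter cannot tell noise from detail, which is the weakness the learned approaches try to address.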

Denoising Autoencoders, one of the most common approaches, learn from pairs of images—one clean and one noisy. The AI studies how noise distorts details, then learns to reverse the process.

When you pass a noisy photo through a denoising autoencoder, it removes the noise while preserving edges, textures, and important small details.

Noise reduction isn’t just for photography. It’s also used in document scanning to make text easier to read, in medical imaging to clarify scans, and for cleaning up screenshots or UI mockups for presentations.

See Noise Reduction in action here.

Image Upscaling using Super-Resolution

Super-resolution is the process of increasing the resolution of an image to make it sharper and larger without simply stretching the pixels.

In the past, enlarging a small image just made it blurry. AI super-resolution works differently. It studies the image, detects patterns, and then generates new pixels that match what would have been there in a higher-quality original.
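For contrast, the simplest non-AI way of enlarging an image is nearest-neighbour scaling, which is what "stretching the pixels" amounts to. The sketch below assumes a grayscale image stored as a flat, row-major slice and an integer scale factor; the function name is hypothetical. No new detail is created: every source pixel just becomes a `factor` by `factor` block.

```rust
/// Enlarge a row-major grayscale image by an integer factor using
/// nearest-neighbour sampling (every source pixel becomes a square block).
fn upscale_nearest(pixels: &[u8], width: usize, height: usize, factor: usize) -> Vec<u8> {
    assert_eq!(pixels.len(), width * height, "buffer size must match dimensions");

    let (new_width, new_height) = (width * factor, height * factor);
    let mut output = Vec::with_capacity(new_width * new_height);

    for y in 0..new_height {
        for x in 0..new_width {
            // Map the output coordinate back to the source pixel it came from.
            let source_index = (y / factor) * width + (x / factor);
            output.push(pixels[source_index]);
        }
    }

    output
}
```

Upscaling the 2x2 image `[1, 2, 3, 4]` by a factor of 2 yields a 4x4 image whose rows are `1 1 2 2`, `1 1 2 2`, `3 3 4 4`, `3 3 4 4`. Super-resolution models replace this copying step with predicted pixels.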

One of the first big breakthroughs was SRCNN (Super-Resolution Convolutional Neural Network). SRCNN works by breaking the image into patches, analyzing them, and then predicting what higher-resolution patches should look like. This early approach was effective but sometimes produced overly smooth images.

Then came ESRGAN (Enhanced Super-Resolution Generative Adversarial Network), which took things further. ESRGAN uses a GAN architecture: a generator creates enhanced images, while a discriminator judges how real they look. Through this back-and-forth training, the generator learns to produce fine textures like hair strands, fabric weaves, or building details that look realistic to the human eye.

Super-resolution is widely used in e-commerce (for clearer product photos), printing (turning web images into high-resolution posters), and web apps (making user-uploaded images look professional).

See Super resolution powered image upscaler in action.

Artifact Removal

When a JPEG image is heavily compressed, it develops blocky patches, fuzzy edges, and strange halos around lines. These are called compression artifacts, and they appear because JPEG reduces file size by removing fine detail. Traditional fixes blur the image to hide these defects, but that also softens important edges and textures.

FBCNN, or Flexible Blind Convolutional Neural Network, takes a smarter approach. Instead of needing to know the exact compression level beforehand, FBCNN is trained to handle a wide range of artifact severities without extra input. This is what makes it “blind”: it doesn’t require metadata about how the JPEG was compressed. It can adapt its restoration process on the fly.

FBCNN works in two main steps. First, it extracts features from the image, analyzing patterns in edges, textures, and flat areas to identify where artifacts are most likely. Then, it applies a learned mapping to reconstruct what those regions should look like without the damage.

Because it can estimate the compression quality itself, FBCNN avoids the common problem of over-smoothing lightly compressed images or under-restoring heavily compressed ones.

This flexibility makes FBCNN useful in many scenarios: cleaning up low-quality images from social media, restoring graphics and text in screenshots, or preparing old compressed web images for printing. Modern AI tools often integrate FBCNN-style processing as a first step before applying super-resolution or general enhancement.

FBCNN’s ability to adapt without manual tuning makes it one of the most practical and developer-friendly models for real-world JPEG restoration today.

See artifact removal in action.

Why These Algorithms Matter to Developers

Even if you have never trained your own AI model, understanding these algorithms gives you a better sense of what’s possible and how to apply it. Many of the tools mentioned here offer APIs, which means developers can build them into their own apps and websites.

If you run a social platform, you can automatically enhance user-uploaded images before they appear in feeds. If you build e-commerce platforms, you can clean and upscale product images for better sales conversions. If you work in media archiving, you can restore and preserve images without spending hours on manual edits.

The real value comes from knowing which algorithm is right for the problem you’re solving: super-resolution for enlarging, denoising for cleaning, colorization for restoration, artifact removal for fixing compression, and GAN retouching for overall beautification.

Conclusion

AI image enhancement has moved from research labs to everyday tools, making it possible for anyone to transform low-quality images into something sharp, vibrant, and professional. The algorithms behind these tools, like super-resolution, denoising, colorization, artifact removal, and GAN retouching, are the building blocks of modern visual AI.

Whether you’re a developer looking to integrate image processing into your app or a creator who wants to improve your visuals, knowing how these algorithms work will help you get the most out of AI. This is only the beginning and future models will be even more precise, faster, and capable of things we haven’t yet imagined. Developers who understand these foundations will be ready to make the most of the next wave of AI-powered creativity.

Hope you enjoyed this article. Sign up for my free AI newsletter TuringTalks.ai for more hands-on tutorials on AI. You can also visit my website.

How to Blend Images in Rust Using Pixel Math

Anshul Sanghi — Tue, 27 Aug 2024 10:25:56 +0000

For anyone looking to learn about image processing as a programming niche, blending images is a very good place to start. It's one of the simplest yet most rewarding techniques when it comes to image processing.

To help your intuition, it's best to imagine an image as a mathematical graph of pixel values plotted along the x and y coordinates. The top left pixel in an image is your origin, which corresponds to an x value of 0 and a y value of 0.

Once you imagine this, any pixel in an image can be read or modified using its coordinate in this x-y graph. For example, for a square image of size 5px x 5px, the coordinate of the center pixel is 2, 2. You may have expected it to be 3, 3, but image coordinates in this context work like array indexes and start from 0 on both axes.

Approaching image processing this way also helps you address each pixel individually, making the process much simpler.
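As a small illustration of this coordinate system, the sketch below maps an (x, y) coordinate to a position in a flat, row-major buffer and back. It assumes the pixels are stored row by row starting from the top left origin; the function names are hypothetical and only serve this example.

```rust
/// Convert an (x, y) pixel coordinate into an index in a row-major 1D buffer.
/// Returns `None` when the coordinate lies outside the image.
fn pixel_index(x: usize, y: usize, width: usize, height: usize) -> Option<usize> {
    if x < width && y < height {
        // Skip `y` full rows, then move `x` pixels into the current row.
        Some(y * width + x)
    } else {
        None
    }
}

/// Inverse mapping: recover the (x, y) coordinate from a linear index.
fn pixel_coordinate(index: usize, width: usize) -> (usize, usize) {
    (index % width, index / width)
}
```

For the 5px x 5px image mentioned above, the center pixel at (2, 2) lands at index 12 of 25, exactly in the middle of the buffer.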

Prerequisites

The focus of this article is for you to understand and learn how to blend images using the Rust programming language, without going into the details of the language or its syntax. So being comfortable writing Rust programs is required.

If you're not familiar with Rust, I highly encourage you to learn the basics. Here's an interactive Rust course that can get you started.

Introduction
How Image Blending Works
Project Setup
How to Read Pixel Values
How to Blend Functions
How to Apply Blend Functions To Images
Putting It All Together
Glossary

Introduction

Image blending refers to the technique of merging pixels from multiple images to create a single output image that is derived from all of its inputs. Depending on which blending operation is used, the image output can vary widely given the same inputs.

This technique serves as the basis for many complex image processing tools, some of which you may already be familiar with. Things such as removing moving people from images if you have multiple images, merging images of the night sky to create star trails, and merging multiple noise-heavy images to create a noise reduced image are all examples of this technique at play.

To achieve the blending of images in this tutorial, we will make use of "pixel math", which while not being a truly standard term, refers to the technique of performing mathematical operations on a pixel or set of pixels to generate an output pixel.

For example, to blend two images using the "average" blend mode, you will perform the mathematical average operation on all input pixels at a given location, to generate the output at the same location.

Pixel math is not limited to point operations, which are operations that generate a given output pixel from the input pixels at the same location in the x-y coordinate system, whether those come from a single image or from multiple images.

In my experience so far, the image processing field is 99% mathematics and 1% black magic. Mathematical operations on a pixel and its surrounding pixels are the basis of image manipulation techniques such as compression, resizing, blurring and sharpening, noise reduction, and so on.

How Image Blending Works

The technique is technically simple to implement. Let's take the example of a simple average blend. Here's how it works:

Read the pixel data of both images into memory, usually into an array for each image.
- The array is usually 2-dimensional. For color images, each entry in the array is another array, which holds the 3 values corresponding to the Red, Green, and Blue color channels.
For each pixel location:
1. For each channel:
  a. Take the value of the channel from the 1st image, let's consider it x.
  b. Take the value of the channel from the 2nd image, let's consider it y.
  c. Perform the averaging operation x/2 + y/2.
  d. Save the output value of this operation as the value of the output channel.
2. Save the result of the previous operation as the value of the output pixel.
Construct the output image with the same dimensions from the computed data.
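The steps above can be sketched with nothing but the standard library. This is a simplified illustration rather than the implementation built later in the tutorial: it assumes both images are already in memory as flat lists of RGB pixels, and the function name `average_blend` is made up for this example.

```rust
/// Average-blend two images stored as flat, row-major lists of RGB pixels.
/// Returns `None` if the two images do not contain the same number of pixels.
fn average_blend(image1: &[[u8; 3]], image2: &[[u8; 3]]) -> Option<Vec<[u8; 3]>> {
    if image1.len() != image2.len() {
        return None;
    }

    let mut output = Vec::with_capacity(image1.len());

    // For each pixel location...
    for (pixel1, pixel2) in image1.iter().zip(image2.iter()) {
        let mut output_pixel = [0u8; 3];

        // ...and for each channel (R, G, B) at that location...
        for channel in 0..3 {
            let x = pixel1[channel];
            let y = pixel2[channel];

            // ...halve each input first so the sum always fits in a u8.
            output_pixel[channel] = x / 2 + y / 2;
        }

        output.push(output_pixel);
    }

    Some(output)
}
```

Blending the single-pixel images `[[100, 200, 0]]` and `[[50, 100, 255]]` gives `[[75, 150, 127]]`.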

You'll notice that pixel math is performed on a per-channel basis. This is always true for the blend modes we cover in this tutorial, but other techniques blend the channels with each other, often within a single image.

Project Setup

Let's get started by setting up a project that gives us a good baseline to work with.

cargo new --bin image-blender
cd image-blender

You will also need a single dependency to help you perform these operations:

cargo add image

image is a Rust library we'll use to work with images of all of the standard formats and encodings. It also helps us convert between various formats, and provides easy access to pixel data as buffers.

For more information on the image crate, you can refer to the official documentation.

To follow along, you can use any two images, the only requirement being that they should be of the same size and in the same format. You can also find the images used in this tutorial, along with complete code, in the GitHub repository here.

How to Read Pixel Values

The first step is to load the images and read their pixel values into a data structure that facilitates our operation. For this tutorial, we're going to use a Vec of arrays (Vec<[u8; 3]>). Each entry in the outer Vec represents a pixel, and the channel-wise values of each pixel are stored in a [u8; 3] array.

Let's start by creating a new file to hold this code called io.rs.

// src/io.rs

use image::GenericImageView;

pub struct SourceData {
    pub width: usize,
    pub height: usize,
    pub image1: Vec<[u8; 3]>,
    pub image2: Vec<[u8; 3]>,
}

pub fn read_pixel_data(image1_path: String, image2_path: String) -> SourceData {
    // Open the images
    let image1 = image::open(image1_path).unwrap();
    let image2 = image::open(image2_path).unwrap();

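    // Compute image dimensions
    let (width, height) = image1.dimensions();

    // Both images must have the same dimensions. Otherwise the indexing below
    // could go out of bounds or silently misalign the rows of the second image.
    assert_eq!(
        (width, height),
        image2.dimensions(),
        "both input images must have the same dimensions"
    );

    let (width, height) = (width as usize, height as usize);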

    // Create arrays to hold input pixel data
    let mut image1_data: Vec<[u8; 3]> = vec![[0, 0, 0]; width * height];
    let mut image2_data: Vec<[u8; 3]> = vec![[0, 0, 0]; width * height];

    // Iterate over all pixels in the input image, along with their positions in x & y
    // coordinates.
    for (x, y, pixel) in image1.to_rgb8().enumerate_pixels() {
        // Compute the raw values for each channel in the RGB pixel.
        let [r, g, b] = pixel.0;

        // Compute linear index based on 2D index. This is basically computing index in
        // 1D array based on the row and column index of the pixel in the 2D image.
        let index = (y * (width as u32) + x) as usize;

        // Save the channel-wise values in the correct index in data arrays.
        image1_data[index] = [r, g, b];
    }

    // Iterate over all pixels in the input image, along with their positions in x & y
    // coordinates.
    for (x, y, pixel) in image2.to_rgb8().enumerate_pixels() {
        // Compute the raw values for each channel in the RGB pixel.
        let [r, g, b] = pixel.0;

        // Compute linear index based on 2D index. This is basically computing index in
        // 1D array based on the row and column index of the pixel in the 2D image.
        let index = (y * (width as u32) + x) as usize;

        // Save the channel-wise values in the correct index in data arrays.
        image2_data[index] = [r, g, b];
    }

    SourceData {
        width,
        height,
        image1: image1_data,
        image2: image2_data,
    }
}

How to Blend Functions

The next step is to implement the blending functions, which are pure functions that take two pixel values as input and return the output value. This is implemented through the BlendOperation trait defined below. Let's create a new file to host all the operations called operations.rs.

// src/operations.rs

pub trait BlendOperation {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3];
}

Next, we need to implement this trait for all of the blending methods we want to support.

For showcasing the result of each of the blending modes, the following two input images are blended together

Average Blend

An average blend involves channel-wise averaging the input pixel values to get the output pixel.

// src/operations.rs

pub struct AverageBlend;

impl BlendOperation for AverageBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            pixel1[0] / 2 + pixel2[0] / 2,
            pixel1[1] / 2 + pixel2[1] / 2,
            pixel1[2] / 2 + pixel2[2] / 2,
        ]
    }
}
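One detail worth knowing about `x / 2 + y / 2` with integers: each division truncates, so half a unit is lost for every odd input, and two pure white channels (255 and 255) average to 254. A possible variant, shown here only as a sketch and not as part of the tutorial's code, widens to `u16`, adds, and divides once. The names `average_channel` and `average_pixel` are hypothetical.

```rust
/// Average two channel values with a single division.
/// The sum is computed in u16, so it cannot overflow before being halved.
fn average_channel(x: u8, y: u8) -> u8 {
    ((x as u16 + y as u16) / 2) as u8
}

/// Channel-wise average of two RGB pixels using `average_channel`.
fn average_pixel(pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
    [
        average_channel(pixel1[0], pixel2[0]),
        average_channel(pixel1[1], pixel2[1]),
        average_channel(pixel1[2], pixel2[2]),
    ]
}
```

With this version, white blended with white stays at 255. The difference from the original is at most one intensity level per channel, so either form is fine for visual work.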

Multiply Blend

A multiply blend involves channel-wise multiplication of input pixel values after they've been normalized[¹] to get the output pixel. The output pixel is then rescaled back to the original range by multiplying with 255.

// src/operations.rs

pub struct MultiplyBlend;

impl BlendOperation for MultiplyBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            ((pixel1[0] as f32 / 255. * pixel2[0] as f32 / 255.) * 255.) as u8,
            ((pixel1[1] as f32 / 255. * pixel2[1] as f32 / 255.) * 255.) as u8,
            ((pixel1[2] as f32 / 255. * pixel2[2] as f32 / 255.) * 255.) as u8,
        ]
    }
}

Lighten Blend

Lighten blend involves channel-wise comparison of input pixel values, selecting the higher value (intensity) of each channel for the output pixel.

// src/operations.rs

pub struct LightenBlend;

impl BlendOperation for LightenBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            pixel1[0].max(pixel2[0]),
            pixel1[1].max(pixel2[1]),
            pixel1[2].max(pixel2[2]),
        ]
    }
}

Darken Blend

Darken blend is the opposite operation of lighten blend. It involves channel-wise comparison of input pixel values, selecting the lower value (intensity) of each channel for the output pixel.

// src/operations.rs

pub struct DarkenBlend;

impl BlendOperation for DarkenBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            pixel1[0].min(pixel2[0]),
            pixel1[1].min(pixel2[1]),
            pixel1[2].min(pixel2[2]),
        ]
    }
}

Screen Blend

Screen blend refers to multiplying the inverse of two images, and then inverting the result. In our implementation, the pixels first need to be normalized[¹]. The normalized[¹] values are then inverted by subtracting them from 1, then they're multiplied and inverted again.

Finally, the output is multiplied by 255 to de-normalize the output pixel value.

// src/operations.rs

pub struct ScreenBlend;

impl BlendOperation for ScreenBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            ((1. - ((1. - (pixel1[0] as f32 / 255.)) * (1. - (pixel2[0] as f32 / 255.)))) * u8::MAX as f32) as u8,
            ((1. - ((1. - (pixel1[1] as f32 / 255.)) * (1. - (pixel2[1] as f32 / 255.)))) * u8::MAX as f32) as u8,
            ((1. - ((1. - (pixel1[2] as f32 / 255.)) * (1. - (pixel2[2] as f32 / 255.)))) * u8::MAX as f32) as u8,
        ]
    }
}
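Since every blend mode repeats the same expression for three channels, one way to cut the repetition is to write the formula once per channel and map it over the pixel. The sketch below does this for the screen formula. `map_channels`, `screen_channel`, and `screen_pixel` are hypothetical helpers, not part of the tutorial's code, and unlike the version above this sketch rounds to the nearest integer instead of truncating, so results can differ by one level.

```rust
/// Apply a per-channel function to two RGB pixels (hypothetical helper).
fn map_channels(pixel1: [u8; 3], pixel2: [u8; 3], f: impl Fn(u8, u8) -> u8) -> [u8; 3] {
    std::array::from_fn(|channel| f(pixel1[channel], pixel2[channel]))
}

/// Screen formula for a single channel: invert, multiply, invert again.
fn screen_channel(a: u8, b: u8) -> u8 {
    // Normalize both inputs to the 0-1 range.
    let a = a as f32 / 255.0;
    let b = b as f32 / 255.0;

    let screened = 1.0 - (1.0 - a) * (1.0 - b);

    // De-normalize, rounding to the nearest integer (an assumption of this sketch).
    (screened * 255.0).round() as u8
}

/// Screen-blend two RGB pixels channel by channel.
fn screen_pixel(pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
    map_channels(pixel1, pixel2, screen_channel)
}
```

Two properties make a handy sanity check: screening any value with black (0) leaves it unchanged, and screening with white (255) always gives white.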

Addition Blend

Addition blend involves adding the input values and then clamping the result to the maximum range of the color depth we're targeting. In this case, that would be 0-255 as we're targeting 8-bit color depth.

We also have to convert the values to u16 in order to avoid loss of value due to overflow. We can also use normalized[¹] values here to achieve the same result.

// src/operations.rs

pub struct AdditionBlend;

impl BlendOperation for AdditionBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            (pixel1[0] as u16 + pixel2[0] as u16).clamp(0, u8::MAX as u16) as u8,
            (pixel1[1] as u16 + pixel2[1] as u16).clamp(0, u8::MAX as u16) as u8,
            (pixel1[2] as u16 + pixel2[2] as u16).clamp(0, u8::MAX as u16) as u8,
        ]
    }
}

Subtraction Blend

Subtraction blend involves subtracting the input values and then clamping the result to the maximum range of the color depth we're targeting. In this case, that would be 0-255 as we're targeting 8-bit color depth.

We also convert the values to i16 in order to avoid loss of value due to overflow and lack of sign. We can also use normalized[¹] values here to achieve the same result.

// src/operations.rs

pub struct SubtractionBlend;

impl BlendOperation for SubtractionBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            (pixel1[0] as i16 - pixel2[0] as i16).clamp(0, u8::MAX as i16) as u8,
            (pixel1[1] as i16 - pixel2[1] as i16).clamp(0, u8::MAX as i16) as u8,
            (pixel1[2] as i16 - pixel2[2] as i16).clamp(0, u8::MAX as i16) as u8,
        ]
    }
}
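For 8-bit channels, the widen-then-clamp pattern used in the addition and subtraction blends matches what Rust's built-in saturating arithmetic does. As an alternative sketch (the function names are hypothetical), the same results can be obtained with `u8::saturating_add` and `u8::saturating_sub`, which stop at 255 and 0 instead of overflowing.

```rust
/// Addition blend using saturating arithmetic: sums above 255 stay at 255.
fn add_pixel(pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
    [
        pixel1[0].saturating_add(pixel2[0]),
        pixel1[1].saturating_add(pixel2[1]),
        pixel1[2].saturating_add(pixel2[2]),
    ]
}

/// Subtraction blend using saturating arithmetic: differences below 0 stay at 0.
fn subtract_pixel(pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
    [
        pixel1[0].saturating_sub(pixel2[0]),
        pixel1[1].saturating_sub(pixel2[1]),
        pixel1[2].saturating_sub(pixel2[2]),
    ]
}
```

The explicit conversion to `u16` or `i16` remains the more general approach, since it carries over to formulas that have no saturating equivalent.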

How to Apply Blend Functions To Images

The final step is to actually use the blending operations we created previously and apply them to pairs of images.

To achieve this, we need a function that can take the SourceData type we defined previously as input, along with a blending operation as the arguments, and gives us the final output buffer. Let's start by creating a new file for it called blend.rs.

// src/blend.rs

use image::{ImageBuffer, Rgb};
use crate::{operations::BlendOperation, SourceData};

impl SourceData {
    pub fn blend_images(&self, operation: impl BlendOperation) -> ImageBuffer<Rgb<u8>, Vec<u8>> {
        let SourceData {
            width,
            height,
            image1,
            image2,
        } = self;

        // Create a new buffer that has the same size as input images, which will serve as our output data
        let mut buffer = ImageBuffer::new(*width as u32, *height as u32);

        // Iterate over all pixels in the output buffer, along with their coordinates
        for (x, y, output_pixel) in buffer.enumerate_pixels_mut() {
            // Compute linear index from x & y coordinates. In other words, you have the
            // row and column indexes here, and you want to compute the array index based
            // on these two positions.
            let index = (y * *width as u32 + x) as usize;

            // Store pixel values in the given position into variables
            let pixel1 = image1[index];
            let pixel2 = image2[index];

            // Compute the blended pixel and convert it into the `Rgb` type, which is then
            // assigned to the output pixel in the buffer.
            *output_pixel = Rgb::from(operation.perform_operation(pixel1, pixel2));
        }

        buffer
    }
}

Putting It All Together

It's now time to make use of all the new things you've learnt so far, and put them together in the main.rs file.

// src/main.rs

mod blend;
mod io;
mod operations;

use io::*;
use operations::{
    AdditionBlend, AverageBlend, DarkenBlend, LightenBlend, MultiplyBlend, ScreenBlend,
    SubtractionBlend,
};

fn main() {
    let source_data = read_pixel_data("image1.jpg".to_string(), "image2.jpg".to_string());

    let output_buffer = source_data.blend_images(AdditionBlend);
    output_buffer.save("addition.jpg").unwrap();

    let output_buffer = source_data.blend_images(AverageBlend);
    output_buffer.save("average.jpg").unwrap();

    let output_buffer = source_data.blend_images(DarkenBlend);
    output_buffer.save("darken.jpg").unwrap();

    let output_buffer = source_data.blend_images(LightenBlend);
    output_buffer.save("lighten.jpg").unwrap();

    let output_buffer = source_data.blend_images(MultiplyBlend);
    output_buffer.save("multiply.jpg").unwrap();

    let output_buffer = source_data.blend_images(ScreenBlend);
    output_buffer.save("screen.jpg").unwrap();

    let output_buffer = source_data.blend_images(SubtractionBlend);
    output_buffer.save("subtraction.jpg").unwrap();
}

You can now run the program using the following command, and you should have all the images generated and saved in the project folder:

cargo run --release

As you might have guessed already, this implementation only works for 8-bit RGB images. This code, however, can be extended very easily to support the other color formats such as 8-bit Luma (Monochrome), 16-bit RGB (Many RAW camera images), and so on.
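One possible starting point for such an extension, offered only as a sketch, is to make the trait generic over the channel type and the number of channels per pixel. `GenericBlendOperation` and `GenericLightenBlend` are hypothetical names, and wiring them into the `image` crate's buffer types is left out here.

```rust
/// Hypothetical generalisation of the `BlendOperation` trait:
/// `T` is the channel type and `N` is the number of channels per pixel.
pub trait GenericBlendOperation<T, const N: usize> {
    fn perform_operation(&self, pixel1: [T; N], pixel2: [T; N]) -> [T; N];
}

pub struct GenericLightenBlend;

// Lighten only needs to compare values, so it works for any ordered,
// copyable channel type and any channel count.
impl<T: Ord + Copy, const N: usize> GenericBlendOperation<T, N> for GenericLightenBlend {
    fn perform_operation(&self, pixel1: [T; N], pixel2: [T; N]) -> [T; N] {
        std::array::from_fn(|channel| pixel1[channel].max(pixel2[channel]))
    }
}
```

With this shape, an 8-bit Luma pixel is a `[u8; 1]` and a 16-bit RGB pixel is a `[u16; 3]`, and the same `GenericLightenBlend` handles both. Blend modes that rely on normalization would additionally need to know the maximum value of `T`.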

I highly encourage you to try that out. You can also reach out to me for help with anything in this tutorial or with extending the code in this tutorial. I'd be happy to answer all your queries. Email is the best way to reach me: anshul@anshulsanghi.tech.

Glossary

Normalization refers to the process of rescaling the pixel values so that the values are in floating point format and are in the range of 0-1. For example, for an 8 bit image, the color black is represented by 0 (0 in de-normalized value) and the color white is represented by 1 (255 in de-normalized value). Intermediary decimal values between 0 & 1 represent different intensities of the pixel between black and white. Normalization is done for many different reasons such as:

Preventing overflows during calculations.
Re-scaling images to the same range irrespective of their individual color depth.
Expanding possible dynamic range of the image.
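As a short sketch of the second point, the helpers below normalize 8-bit and 16-bit channel values into the same 0-1 range and convert back. The function names are hypothetical, and rounding to the nearest integer when de-normalizing is a choice made for this example.

```rust
/// Normalize an 8-bit channel value to the 0-1 range.
fn normalize_u8(value: u8) -> f32 {
    value as f32 / u8::MAX as f32
}

/// Normalize a 16-bit channel value to the 0-1 range.
fn normalize_u16(value: u16) -> f32 {
    value as f32 / u16::MAX as f32
}

/// De-normalize to 8 bits, clamping out-of-range values and rounding.
fn denormalize_to_u8(value: f32) -> u8 {
    (value.clamp(0.0, 1.0) * u8::MAX as f32).round() as u8
}

/// De-normalize to 16 bits, clamping out-of-range values and rounding.
fn denormalize_to_u16(value: f32) -> u16 {
    (value.clamp(0.0, 1.0) * u16::MAX as f32).round() as u16
}
```

White is 255 in an 8-bit image and 65535 in a 16-bit image, but both normalize to 1.0, so pixels from the two images can be combined with the same formulas and written out at whichever depth is needed.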

Enjoying my work?

Consider buying me a coffee to support my work!

Till next time, happy coding and wishing you clear skies!

Rust Tutorial – How to Build a Naïve Star Detector for Images

Anshul Sanghi — Tue, 16 Apr 2024 19:34:07 +0000

Star detection is a crucial step in many of the processing and analysis routines that we perform on astronomical images. It is extremely important for a process called plate-solving, which is the process of figuring out which part of the sky an image shows, or which part of the sky your telescope is pointed at.

All modern telescope mounts can make use of plate solving software to automatically figure out where they're pointed at, and in which direction they need to move to point at the correct location.

Star detection is sometimes also used in correcting the effect of atmosphere on the sharpness of targets such as galaxies. It is also crucial for combining astronomical images from multiple nights, telescopes, locations and so on into a single output image that has a very high signal-to-noise ratio.

With this tutorial, I'd like to introduce a very naïve technique for detecting stars in an image.

A quick note:

Star detection is a very complex topic, and I've only scratched the surface both in my own understanding and in this article.

The steps I use and describe in this article are derived from public documentation on existing real world applications (both for star detection and for edge detection), as well as some blog posts from incredibly knowledgeable people (which I link to at the end of the article, be sure to check them out).

As such, this implementation is intended for learning purposes only.

Before You Read

Prerequisites for the first part of the tutorial

The process described builds upon the concept of multi-scale processing of images using the à trous wavelet transform. If you're not aware of what that is, I encourage you to learn more about it using my previous article that I just linked to, and then come back to this one.

This article also assumes that you have a basic understanding of Centroids. Just knowing what they mean is enough, as you don't have to calculate them yourself. Since the article focuses on image processing and analysis, a basic understanding of how pixels work in digital format is helpful, but not mandatory.

Prerequisites for the second part of this tutorial

Here, we focus on implementing the algorithm using the Rust programming language, without going much into the details of the language itself. So being comfortable writing Rust programs and reading crate documentation is required.

If this is not you, you can still read Part 1 and learn the technique, and then maybe try it out in a language of your choice.

If you're not familiar with Rust, I highly encourage you to learn the basics. Here's an interactive Rust course that can get you started.

How Star Detection Works
How to Implement it in Rust
Further Reading
Wrapping Up

How Star Detection Works

Since this process involves a lot of steps, let's see how it works, with an increasing level of detail about what actually happens as we go along. With each increasing level, we'll be unwrapping the black box bit by bit.

What is Star Detection?

Star detection, in its simplest form, involves isolating the stars from the rest of the image, and then performing edge detection on it.

1. Input image

2. Detected stars visualised using green circles

How Star Detection Works

First, you try to extract away the pixels that you think might be stars from the rest of the pixels in the image. This new image, that only contains the extracted pixels, is then analysed using edge detection techniques to find the star positions in 2D space.

1. Input image

2. Extracted pixels that are potentially stars

3. Detected stars visualised using green circles

An Intermediary Look At The Process

First, you decompose your input image into multiple layers, each layer containing a part of the original data such that adding all layers gives us back the original data.

You then isolate the layers that would only contain small sized structures, such as noise and stars, and throw away the rest of the data.
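The "layers that add back up to the original" idea can be illustrated in one dimension. The sketch below is an assumption-laden simplification and not the 2D implementation this article builds on: it smooths a row of samples repeatedly with the B3-spline kernel commonly used for the à trous transform, spacing the kernel taps further apart at each level, and stores the difference between successive smoothings as a layer. Edge samples are handled by clamping, and all function names are hypothetical.

```rust
/// One smoothing pass with a 5-tap B3-spline kernel whose taps are `step` samples apart.
/// Positions beyond either end reuse the nearest edge sample.
fn smooth(signal: &[f32], step: usize) -> Vec<f32> {
    const KERNEL: [f32; 5] = [1.0 / 16.0, 4.0 / 16.0, 6.0 / 16.0, 4.0 / 16.0, 1.0 / 16.0];
    let last = signal.len() as isize - 1;

    (0..signal.len())
        .map(|i| {
            KERNEL
                .iter()
                .enumerate()
                .map(|(k, weight)| {
                    let offset = (k as isize - 2) * step as isize;
                    let j = (i as isize + offset).clamp(0, last) as usize;
                    weight * signal[j]
                })
                .sum::<f32>()
        })
        .collect()
}

/// Split a signal into `levels` detail layers (small structures first) plus a residual.
fn decompose(signal: &[f32], levels: usize) -> (Vec<Vec<f32>>, Vec<f32>) {
    let mut layers = Vec::with_capacity(levels);
    let mut current = signal.to_vec();

    for level in 0..levels {
        // The spacing between kernel taps doubles at every level: 1, 2, 4, ...
        let smoothed = smooth(&current, 1 << level);
        // A layer holds whatever the smoothing removed at this scale.
        let detail: Vec<f32> = current.iter().zip(&smoothed).map(|(c, s)| c - s).collect();
        layers.push(detail);
        current = smoothed;
    }

    (layers, current)
}

/// Add all layers and the residual back together.
fn reconstruct(layers: &[Vec<f32>], residual: &[f32]) -> Vec<f32> {
    let mut output = residual.to_vec();
    for layer in layers {
        for (value, detail) in output.iter_mut().zip(layer) {
            *value += detail;
        }
    }
    output
}
```

Summing every layer and the residual returns the input, up to floating-point rounding. Dropping the residual, which holds the large, smooth structures, keeps only the small-scale features such as stars and noise.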

Different layers of structure in the image that the input is decomposed into. We throw away the final layer and retain the rest in this example

With this filtered data, you find the edges in the image using the contouring technique (which is explained in the next section). Each contour gives us multiple "points" in the 2D space. You then try to draw a closed shape using the points you have.

Once you've done this, all you need is to find the center of this shape and you have the location of the stars.

1. Input image

2. Image after decomposing into layers and throwing away large scale data

3. Image after binarisation

4. Detected contours visualised using green outlines

5. Detected stars visualised using green circles

Picking It Apart

Using a multi-scale analysis technique facilitated by the à trous transform algorithm, you break down the image into multiple layers, each containing different scaled structures from the original image. You take the layers containing smaller scale structures and throw away the rest.

To these layers, you apply a bilateral denoising filter to reduce noise so that you can ensure that you're only left with stars and not noise that the algorithm might pick up as stars later on.

Different layers of structure in the image that the input is decomposed into. We throw away the final layer and retain the rest in this example.

1. Input image

2. Image after decomposing into layers and throwing away large scale data

3. Noise reduced image

Once you've filtered out the noise, you binarize your image using thresholding. Binarization converts every pixel to either pure black or pure white, so that the image is easier to work with. You do this by selecting a threshold intensity value: all pixels with an intensity at or below it become black, and all pixels with an intensity above it become white.

To find the optimum intensity value to binarize the image with, you define a minimum number of stars that you expect to find in the image, which is usually determined based on what you actually need to do with your star locations.

In our example, we'll start with a minimum of 500 and slowly push it to the limit of the sample image to see what happens.

Binarizing noise-reduced and wavelet filtered image

This makes the process of edge detection (which is the next step in our process) using contouring much more reliable.

Contouring is a term that describes the process of figuring out where the structures are in your image, and drawing a border along those structures – these are known as contours.

It is similar to edge-detection, but edge-detection helps you differentiate between individual neighbouring pixels, whereas contours are designed to work with a complete boundary of any structures in an image.

The library we'll be using finds the contours in an image using the algorithm proposed by Suzuki and Abe: Topological Structural Analysis of Digitized Binary Images by Border Following. Contouring in this manner will give you a collection of points that lie on the border of each contour.
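Border following itself is too involved to reproduce here, but the notion of a border point can be sketched with a much simpler check. The snippet below is not the Suzuki and Abe algorithm and does not order points or group them into separate contours. It only marks white pixels that touch at least one black pixel (or the image edge) in the four main directions. The function name and the `Vec<Vec<bool>>` image representation are assumptions made for illustration.

```rust
/// Return the (x, y) coordinates of every foreground pixel that has at
/// least one background pixel among its 4 direct neighbours.
/// `true` means white (foreground), `false` means black (background).
/// Pixels outside the image are treated as background.
fn border_pixels(image: &[Vec<bool>]) -> Vec<(usize, usize)> {
    let height = image.len();
    let mut border = Vec::new();

    for y in 0..height {
        for x in 0..image[y].len() {
            if !image[y][x] {
                continue;
            }

            // Is the pixel at (nx, ny) background or outside the image?
            let is_background = |nx: isize, ny: isize| -> bool {
                if ny < 0 || ny as usize >= height {
                    return true;
                }
                let row = &image[ny as usize];
                if nx < 0 || nx as usize >= row.len() {
                    return true;
                }
                !row[nx as usize]
            };

            let (xi, yi) = (x as isize, y as isize);
            if is_background(xi - 1, yi)
                || is_background(xi + 1, yi)
                || is_background(xi, yi - 1)
                || is_background(xi, yi + 1)
            {
                border.push((x, y));
            }
        }
    }

    border
}
```

For a filled 3x3 white square, this returns the eight outer pixels and skips the centre one. A real contouring algorithm additionally walks these pixels in order and keeps each structure's border separate.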

For each contour it finds, you create a polygon by joining all of the border points within that contour. If this shape is an open shape, then you just extrapolate the final border to create a polygon, which needs to be a closed shape. You then use the centroid formulae on this polygon to find the center of mass of your shape, which gives you the center of your star (in most cases).

You also need to find the Euclidean distances between the center of mass and each border point, the longest of which becomes the size of the star.
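The two computations above can be sketched in plain Rust as follows. This is an illustration under my own assumptions rather than what any geometry library does internally: it uses the area-weighted (shoelace) centroid of a closed polygon, which may differ slightly from the centroid a library computes for a path of line segments, and both function names are hypothetical.

```rust
/// Area-weighted centroid of a closed polygon given by its vertices.
/// The last vertex is implicitly joined back to the first.
/// Returns `None` for fewer than 3 vertices or a zero-area shape.
fn polygon_centroid(vertices: &[(f64, f64)]) -> Option<(f64, f64)> {
    if vertices.len() < 3 {
        return None;
    }

    let mut twice_area = 0.0;
    let mut cx = 0.0;
    let mut cy = 0.0;

    for i in 0..vertices.len() {
        let (x0, y0) = vertices[i];
        let (x1, y1) = vertices[(i + 1) % vertices.len()];

        // Signed area contribution of the edge (shoelace term).
        let cross = x0 * y1 - x1 * y0;
        twice_area += cross;
        cx += (x0 + x1) * cross;
        cy += (y0 + y1) * cross;
    }

    if twice_area.abs() < f64::EPSILON {
        return None;
    }

    Some((cx / (3.0 * twice_area), cy / (3.0 * twice_area)))
}

/// Largest Euclidean distance from `center` to any vertex.
/// This plays the role of the star's size.
fn max_distance(center: (f64, f64), vertices: &[(f64, f64)]) -> f64 {
    vertices
        .iter()
        .map(|&(x, y)| ((x - center.0).powi(2) + (y - center.1).powi(2)).sqrt())
        .fold(0.0, f64::max)
}
```

For a square with corners at (0, 0), (4, 0), (4, 4) and (0, 4), the centroid is (2, 2) and the largest distance is the half-diagonal, the square root of 8.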

Contouring the binarized image to find closed polygons around stars visualised here using green outlines

Once you have your star size, you reject any stars that are either smaller than 1 pixel or larger than 24 pixels. These are educated guesses that I use, and they seem to give me the best results for sample images (but this is definitely a potential point of improvement).

After all of this, you should have the x and y coordinates of the star, as well as its size in pixels.

Detected stars visualised using green circles around them

We're going to stop there, but there's a lot more that you can do after this step to remove false-positives and fix the centroid/size of stars.

How to Implement it in Rust

Let's create a new library project:

cargo new --lib stardetect-rs && cd stardetect-rs

Prerequisites

You need a couple of dependencies to get started. Let's add them and I'll explain why you need them:

cargo add image imageproc image-dwt geo

image is a Rust library we'll use to work with images of all of the standard formats and encodings. It also helps us convert between various formats, and provides easy access to pixel data as buffers.
imageproc is another library from the authors of the image crate. It extends image with image processing functions and algorithms.
image-dwt is my own library (shameless plug) that implements the à trous wavelet decomposition algorithm for the image crate. You need it to break the image down into the multiple scales that I mentioned previously.
geo is a Rust library that allows us to easily work with geometric types (like points in 2D space), shapes (such as polygons), and algorithms implemented for them. We use this library to build our polygon based on contour data, and also to find the centroid of the polygon that I described above. It also helps us compute Euclidean distances between points, which we use for determining star size.

How to read and decompose the input image

You start by reading the input image and decomposing it so that you're only left with stars (and noise).

You need to define a new struct that will act as a wrapper for your input image, and add a constructor for it to create an instance of this struct based on input:

// lib.rs
use image::{DynamicImage, GrayImage};

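// Clone is needed later, when the threshold search works on a copy.
#[derive(Clone)]
pub struct StarDetect {
    source: GrayImage,
}

impl From<DynamicImage> for StarDetect {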
    fn from(source: DynamicImage) -> Self {
        Self {
            source: source.to_luma8(),
        }
    }
}

You then need to add the ability to extract the first n layers from wavelet decomposition of your image:

// lib.rs

use image_dwt::kernels::LinearInterpolationKernel;
use image_dwt::recompose::{OutputLayer, RecomposableWaveletLayers};
use image_dwt::transform::ATrousTransform;

impl StarDetect {
    fn extract_small_scale_structures(&mut self) {
        let (width, height) = self.source.dimensions();

        // Decompose the image into 8 layers
        let filtered_image = ATrousTransform::new(
            &DynamicImage::ImageLuma8(self.source.clone()),
            8,
            LinearInterpolationKernel,
        )
        // Filter out the residue image and keep the rest
        .filter(|item| item.pixel_scale.is_some())
        // Recompose the remaining layers into a grayscale image.
        .recompose_into_image(width as usize, height as usize, OutputLayer::Grayscale);

        // Update the source image that we will work with
        // going forward.
        self.source = filtered_image.to_luma8();
    }
}

Noise reduction

Now that you have the input image (which should only contain noise and stars), let's get rid of the noise:

// lib.rs

impl StarDetect {
    fn apply_noise_reduction(&mut self) {
        self.source = imageproc::filter::bilateral_filter(&self.source, 10, 10., 3.);
    }
}

Next, you need to determine the optimum threshold value for a given minimum star count. You find it by picking a value and iteratively optimising it until you hit a star count that's more than the minimum.

How to optimize the threshold and binarization

Start by creating a new file threshold.rs and defining a trait with necessary methods. You need a method to optimise your threshold value and another for performing the binarization operation:

// threshold.rs

pub(crate) trait ThresholdingExtensions {
    fn optimize_threshold_for_star_count(&self, min_star_count: usize) -> u8;
    fn binarize(&mut self, threshold: u8);
}

Let's implement both of these:

// threshold.rs

use crate::centroid::find_star_centres_and_size;
use crate::StarDetect;

impl ThresholdingExtensions for StarDetect {
    fn optimize_threshold_for_star_count(&self, min_star_count: usize) -> u8 {
        // Current star count
        let mut star_count = 0;

        // Starting threshold value
        let mut threshold = u8::MAX;

        // Iterate until you've found the best threshold
        while star_count < min_star_count {
            // Panic if we reach the 0 intensity value while iterating.
            // This means that there are fewer stars than we hoped for.
            if threshold == 0 {
                panic!("Maximum iteration count reached");
            }

            // Reduce threshold to 95% of its previous value.
            // Using this, we check finer and finer differences
            // in threshold for each iteration.
            threshold = (0.95 * threshold as f32) as u8;

            // Clone the source data since we need to modify it
            // without affecting original data.
            let mut source = self.clone();

            // Binarize the source data image using current threshold
            ThresholdingExtensions::binarize(&mut source, threshold);

            // Find the number of stars detected with the current threshold
            star_count = find_star_centres_and_size(&source.source).len();
        }

        threshold
    }

    fn binarize(&mut self, threshold: u8) {
        // Iterate over every pixel in source image
        for pixel in self.source.iter_mut() {
            if *pixel > threshold {
                // If pixel intensity is greater than threshold
                // set it to maximum intensity instead.
                *pixel = u8::MAX;
            } else {
                // Otherwise, set it to 0 intensity.
                *pixel = 0;
            }
        }
    }
}
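To see which candidate thresholds the 95% rule in the loop above actually visits, you can replay just the arithmetic in isolation. The helper below is a hypothetical illustration that is not part of the library. It starts at `u8::MAX` and applies the same update until it reaches 0.

```rust
/// Replay the threshold update rule on its own and collect every
/// candidate value that would be tested, in order.
fn threshold_schedule() -> Vec<u8> {
    let mut schedule = Vec::new();
    let mut threshold = u8::MAX;

    while threshold > 0 {
        // Same update as in the optimisation loop:
        // truncate 95% of the previous value.
        threshold = (0.95 * threshold as f32) as u8;
        schedule.push(threshold);
    }

    schedule
}
```

The first candidates are 242, 229 and 217, so the step starts at around 13 intensity levels and shrinks towards 1 as the threshold falls. The starting value of 255 is never tested itself, and the sequence always ends at 0 because truncation guarantees each value is strictly smaller than the previous one.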

You might notice that we use the find_star_centres_and_size function when trying to find the optimised threshold value. We'll get to that shortly, as we need to declare some types that will hold the state of our computation before we implement the function.

Create a new file centroid.rs.

Define a new struct that will hold the coordinates and size of the star:

// centroid.rs

use imageproc::point::Point;

#[derive(Eq, PartialEq, Copy, Clone, Debug)]
pub struct StarCenter {
    coord: Point<u32>,
    radius: u32,
}

impl StarCenter {
    pub fn coord(&self) -> &Point<u32> {
        &self.coord
    }
    pub fn radius(&self) -> u32 {
        self.radius
    }
}

We've also defined methods to retrieve these fields. Point is a type provided by the imageproc crate to store coordinates in an image.

How to construct polygons around stars

We're going to implement this function inside out. We first need a way to construct our polygon from contours. Let's implement that:

// centroid.rs

use geo::LineString;
use imageproc::contours::Contour;

pub(crate) fn construct_closed_polygon(contour: &Contour<u32>) -> LineString<f32> {
    // Create a new line string that connects all points
    // in the contour. This can create either an open
    // or a closed shape.
    let mut line_string = LineString::from_iter(contour.points.iter().map(|point| Coord {
        x: point.x as f32,
        y: point.y as f32,
    }));

    // If it is an open shape, close the shape to create a
    // polygon. This does nothing otherwise.
    line_string.close();

    line_string
}

Contour is a type provided by the imageproc crate, and it is what the crate returns as the result of a contouring operation on an image. It contains a list of points that lie on the border of the contour.

LineString is a type provided by geo, which defines it as "An ordered collection of two or more Coords, representing a path between locations." In this case, we use this type to construct the polygon shape.

How to detect star size and location using contours

Next, you need a way to compute the StarCenter type we declared previously from contour data:

// centroid.rs

use geo::{Centroid, Coord, EuclideanDistance};

pub(crate) fn filter_map_contour_to_star_centers(contour: &Contour<u32>) -> Option<StarCenter> {
    // If there are no points in the contour
    // it is not a star.
    if contour.points.is_empty() {
        return None;
    }

    if contour.points.len() == 1 {
        // If there's only 1 point in the contour
        // consider it to be the center of the star
        // of size 1px.
        let center = contour.points.first().unwrap();
        let radius = 1_u32;

        return Some(StarCenter {
            coord: *center,
            radius,
        });
    }

    // Otherwise, construct a polygon around the star based on
    // contour information.
    let polygon = construct_closed_polygon(contour);

    // Find the centre of gravity of this polygon (centroid)
    let center = polygon.centroid().unwrap();

    // Find the radius of the star based on maximum distance between
    // the centroid and any of the points in contour.
    let radius = polygon.points().fold(0., |distance, point| {
        point.euclidean_distance(&center).max(distance)
    });

    // If the radius is less than 1px or more than 24px
    // we reject it as a non-star.
    if !(1. ..=24.).contains(&radius) {
        return None;
    }

    // Construct star center based on previously computed information
    Some(StarCenter {
        coord: Point {
            x: center.x() as u32,
            y: center.y() as u32,
        },
        radius: radius as u32,
    })
}

This function utilises the construct_closed_polygon function you defined previously to compute the final star centers and sizes. Now for the easy part: let's implement the missing find_star_centres_and_size:

// centroid.rs

use image::GrayImage;

pub(crate) fn find_star_centres_and_size(image: &GrayImage) -> Vec<StarCenter> {
    // Compute the contours in source image
    let contours = imageproc::contours::find_contours::<u32>(image);

    contours
        .iter()
        // Iterate over all contours and create a list
        // of star center and size data.
        .filter_map(filter_map_contour_to_star_centers)
        .collect()
}

How to encapsulate the process

All you need now is to implement one last method on the StarDetect struct that encapsulates the entire process:

// lib.rs

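// Declare the modules created earlier so that the compiler
// includes centroid.rs and threshold.rs in the crate.
mod centroid;
mod threshold;

use crate::centroid::{find_star_centres_and_size, StarCenter};
use crate::threshold::ThresholdingExtensions;

impl StarDetect {
    pub fn find_stars(&mut self, min_stars: usize) -> Vec<StarCenter> {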
        self.extract_small_scale_structures();
        self.apply_noise_reduction();

        let threshold = self.optimize_threshold_for_star_count(min_stars);
        self.binarize(threshold);

        find_star_centres_and_size(&self.source)
    }
}

This method only calls the functions we've written so far. The user of your library will only need to call this function and nothing else.

You can now use what you've created to find stars in an image. For this article going forward, the image I'll be using to demonstrate is shown below. If you'd like to follow along, you can download the image I'll be using from here.

M42 Orion Nebula, The Dark Horse Nebula, The Flaming Star Nebula And The Surrounding H-Alpha Gas

As you might notice, we have a wide range of star shapes, sizes and colors in this image, but the same goes for noise and other large-scale nebulae structures too.

How to test the implementation on astronomical images

Create a new file src/main.rs and declare it as a binary target in the Cargo.toml file. The manifest should look like this:

[package]
name = "stardetector"
version = "0.1.0"
edition = "2021"

[[bin]]
name = "stardetector"
path = "src/main.rs"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
geo = "0.28.0"
image = "0.25.1"
image-dwt = "0.3.2"
imageproc = "0.24.0"

You can finally use the lib we created to process the sample image. The final code in main.rs should look like this:

use image::Rgba;
use stardetector::StarDetect;

fn main() {
    // Load the image as mutable. You need mutability so that
    // you can draw on this image.
    let mut image = image::open("m42-star-detection.jpg").unwrap();

    // Create a new star detector instance. You clone the image
    // here because you need to also draw on the image for
    // visualisation purposes in this example.
    let mut star_detector = StarDetect::from(image.clone());

    // Run the star finder function with a minimum star count of
    // 500
    let stars = star_detector.find_stars(500);

    // Iterate over all stars you've found
    for star in stars {
        // Draw a hollow circle on the image so that you
        // can see what the algorithm found
        imageproc::drawing::draw_hollow_circle_mut(
            &mut image,
            (star.coord().x as i32, star.coord().y as i32),
            // Extend the radius by 4px so that it's easier to see
            // in the visualisation.
            star.radius() as i32 + 4,
            // Draw the circle with a pure green, fully opaque color
            Rgba([0, u8::MAX, 0, u8::MAX]),
        );
    }

    // Save the image with star positions annotated with
    // green circles.
    image.save("annotated.jpg").unwrap();
}

Ensure that the downloaded image is present at the root of this project folder.

We can finally run the program and see what it gives us:

cargo run --release