Anshul Sanghi - freeCodeCamp.org

How to Blend Images in Rust Using Pixel Math

Anshul Sanghi — Tue, 27 Aug 2024 10:25:56 +0000

For anyone looking to learn about image processing as a programming niche, blending images is a very good place to start. It's one of the simplest yet most rewarding techniques when it comes to image processing.

To help your intuition, it's best to imagine an image as a mathematical graph of pixel values plotted along the x and y coordinates. The top left pixel in an image is your origin, which corresponds to an x value of 0 and a y value of 0.

Once you imagine this, any pixel in an image can be read or modified using its coordinate in this x-y graph. For example, for a square image of size 5px x 5px, the coordinate of the center pixel is 2, 2. You may have expected it to be 3, 3, but image coordinates in this context work similarly to array indexes and start from 0 for both axes.

Approaching image processing this way also helps you address each pixel individually, making the process much simpler.
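To make the coordinate idea concrete, here is a minimal sketch of how an x-y coordinate can map to a position in a flat array of pixels. It assumes the pixels are stored one row after another, and the function names to_index and to_coords are purely illustrative:

```rust
/// Maps an (x, y) coordinate to an index in a flat, row-by-row pixel array.
fn to_index(x: usize, y: usize, width: usize) -> usize {
    y * width + x
}

/// Recovers the (x, y) coordinate from a flat index.
fn to_coords(index: usize, width: usize) -> (usize, usize) {
    (index % width, index / width)
}

fn main() {
    let width = 5; // a 5px x 5px image
    let center = to_index(2, 2, width);
    println!("center pixel (2, 2) is entry {center} of 25");
    println!("entry {center} maps back to {:?}", to_coords(center, width));
}
```

The same y * width + x arithmetic shows up again later in the tutorial when pixel data is copied into a Vec.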

Prerequisites

The focus of this article is for you to understand and learn how to blend images using the Rust programming language, without going into the details of the language or its syntax. So being comfortable writing Rust programs is required.

If you're not familiar with Rust, I highly encourage you to learn the basics. Here's an interactive Rust course that can get you started.

Introduction
How Image Blending Works
Project Setup
How to Read Pixel Values
How to Blend Functions
How to Apply Blend Functions To Images
Putting It All Together
Glossary

Introduction

Image blending refers to the technique of merging pixels from multiple images to create a single output image that is derived from all of its inputs. Depending on which blending operation is used, the image output can vary widely given the same inputs.

This technique serves as the basis for many complex image processing tools, some of which you may already be familiar with. Things such as removing moving people from images if you have multiple images, merging images of the night sky to create star trails, and merging multiple noise-heavy images to create a noise reduced image are all examples of this technique at play.

To achieve the blending of images in this tutorial, we will make use of "pixel math". While not a truly standard term, it refers to the technique of performing mathematical operations on a pixel or set of pixels to generate an output pixel.

For example, to blend two images using the "average" blend mode, you will perform the mathematical average operation on all input pixels at a given location, to generate the output at the same location.

Pixel math is not limited to point operations. A point operation generates a given output pixel from the input pixel (or pixels, when there are multiple images) at the same location in the x-y coordinate system.

In my experience so far, the entire field of image processing is 99% mathematics and 1% black magic. Mathematical operations on a pixel and its surrounding pixels are the basis of image manipulation techniques such as compression, resizing, blurring and sharpening, noise reduction, and so on.

How Image Blending Works

The technique is technically simple to implement. Let's take the example of a simple average blend. Here's how it works:

Read the pixel data of both images into memory, usually into an array for each image.
- The array is usually 2-dimensional. For color images, each entry in the array is another array that holds the 3 values corresponding to the Red, Green, and Blue color channels of a pixel.
For each pixel location:
1. For each channel:
  a. Take the value of the channel from the 1st image, let's consider it x.
  b. Take the value of the channel from the 2nd image, let's consider it y.
  c. Perform the averaging operation x/2 + y/2.
  d. Save the output value of this operation as the value of the output channel.
2. Save the result of the previous operation as the value of the output pixel.
Construct the output image with the same dimensions from the computed data.
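The steps above can be sketched on two tiny in-memory "images" before any real files are involved. This is only an illustration: the function name average_images is made up here, and both inputs are assumed to hold the same number of pixels.

```rust
/// Averages two images stored as flat lists of RGB pixels.
/// Both slices are assumed to have the same length.
fn average_images(image1: &[[u8; 3]], image2: &[[u8; 3]]) -> Vec<[u8; 3]> {
    let mut output = Vec::with_capacity(image1.len());
    // For each pixel location...
    for (pixel1, pixel2) in image1.iter().zip(image2) {
        let mut out_pixel = [0u8; 3];
        // ...and for each channel (red, green, blue):
        for channel in 0..3 {
            let x = pixel1[channel];
            let y = pixel2[channel];
            out_pixel[channel] = x / 2 + y / 2;
        }
        output.push(out_pixel);
    }
    output
}

fn main() {
    // Two 2x1 images: each has two RGB pixels.
    let image1: [[u8; 3]; 2] = [[200, 100, 0], [10, 20, 30]];
    let image2: [[u8; 3]; 2] = [[100, 100, 100], [30, 20, 10]];
    println!("{:?}", average_images(&image1, &image2));
}
```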

You'll notice that pixel math is performed on a per-channel basis. This is always true for the blend modes we cover in this tutorial, but many techniques involve applying blends between the channels themselves and many times within the same image.

Project Setup

Let's get started by setting up a project that gives us a good baseline to work with.

cargo new --bin image-blender
cd image-blender

You will also need a single dependency to help you perform these operations:

cargo add image

image is a Rust library we'll use to work with images of all of the standard formats and encodings. It also helps us convert between various formats, and provides easy access to pixel data as buffers.

For more information on the image crate, you can refer to the official documentation.

To follow along, you can use any two images, the only requirement being that they should be of the same size and in the same format. You can also find the images used in this tutorial, along with complete code, in the GitHub repository here.

How to Read Pixel Values

The first step is to load the images and read their pixel values into a data structure that facilitates our operation. For this tutorial, we're going to use a Vec of arrays (Vec<[u8; 3]>). Each entry in the outer Vec represents a pixel, and the channel-wise values of each pixel are stored in a [u8; 3] array.

Let's start by creating a new file to hold this code called io.rs.

// src/io.rs

use image::GenericImageView;

pub struct SourceData {
    pub width: usize,
    pub height: usize,
    pub image1: Vec<[u8; 3]>,
    pub image2: Vec<[u8; 3]>,
}

pub fn read_pixel_data(image1_path: String, image2_path: String) -> SourceData {
    // Open the images
    let image1 = image::open(image1_path).unwrap();
    let image2 = image::open(image2_path).unwrap();

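    // Compute image dimensions. Both images must be the same size, since pixels
    // are blended location by location.
    assert_eq!(
        image1.dimensions(),
        image2.dimensions(),
        "input images must have the same dimensions"
    );
    let (width, height) = image1.dimensions();
    let (width, height) = (width as usize, height as usize);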

    // Create arrays to hold input pixel data
    let mut image1_data: Vec<[u8; 3]> = vec![[0, 0, 0]; width * height];
    let mut image2_data: Vec<[u8; 3]> = vec![[0, 0, 0]; width * height];

    // Iterate over all pixels in the input image, along with their positions in x & y
    // coordinates.
    for (x, y, pixel) in image1.to_rgb8().enumerate_pixels() {
        // Compute the raw values for each channel in the RGB pixel.
        let [r, g, b] = pixel.0;

        // Compute linear index based on 2D index. This is basically computing index in
        // 1D array based on the row and column index of the pixel in the 2D image.
        let index = (y * (width as u32) + x) as usize;

        // Save the channel-wise values in the correct index in data arrays.
        image1_data[index] = [r, g, b];
    }

    // Iterate over all pixels in the input image, along with their positions in x & y
    // coordinates.
    for (x, y, pixel) in image2.to_rgb8().enumerate_pixels() {
        // Compute the raw values for each channel in the RGB pixel.
        let [r, g, b] = pixel.0;

        // Compute linear index based on 2D index. This is basically computing index in
        // 1D array based on the row and column index of the pixel in the 2D image.
        let index = (y * (width as u32) + x) as usize;

        // Save the channel-wise values in the correct index in data arrays.
        image2_data[index] = [r, g, b];
    }

    SourceData {
        width,
        height,
        image1: image1_data,
        image2: image2_data,
    }
}
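The copy loops above end up with one [u8; 3] entry per pixel. As a standalone sketch of that regrouping idea, assume you already have a flat buffer of interleaved R, G, B bytes. The helper name group_rgb is hypothetical, and the sketch deliberately avoids the image crate so that it can run on its own:

```rust
/// Groups a flat buffer of interleaved R, G, B bytes into one `[u8; 3]` per pixel.
/// Returns `None` if the buffer length is not a multiple of 3.
fn group_rgb(raw: &[u8]) -> Option<Vec<[u8; 3]>> {
    if raw.len() % 3 != 0 {
        return None;
    }
    Some(
        raw.chunks_exact(3)
            .map(|chunk| [chunk[0], chunk[1], chunk[2]])
            .collect(),
    )
}

fn main() {
    // Two pixels' worth of bytes: (1, 2, 3) and (4, 5, 6).
    let raw = [1u8, 2, 3, 4, 5, 6];
    println!("{:?}", group_rgb(&raw));
}
```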

How to Blend Functions

The next step is to implement the blending functions, which are pure functions that take two pixel values as input and return the output value. This is implemented through the BlendOperation trait defined below. Let's create a new file to host all the operations called operations.rs.

// src/operations.rs

pub trait BlendOperation {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3];
}

Next, we need to implement this trait for all of the blending methods we want to support.

For showcasing the result of each of the blending modes, the following two input images are blended together

Average Blend

An average blend involves channel-wise averaging the input pixel values to get the output pixel.

// src/operations.rs

pub struct AverageBlend;

impl BlendOperation for AverageBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            pixel1[0] / 2 + pixel2[0] / 2,
            pixel1[1] / 2 + pixel2[1] / 2,
            pixel1[2] / 2 + pixel2[2] / 2,
        ]
    }
}
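One detail worth knowing: halving each input before adding them uses integer division, so the result can be one lower than the true average when both inputs are odd. A possible alternative, shown here only as a sketch with made-up function names, is to add in a wider type and halve once:

```rust
/// Average as written above: halves each input first, so it never overflows `u8`.
fn average_halves(x: u8, y: u8) -> u8 {
    x / 2 + y / 2
}

/// Alternative: add in a wider type, then halve once.
fn average_widened(x: u8, y: u8) -> u8 {
    ((x as u16 + y as u16) / 2) as u8
}

fn main() {
    // Two pure white channel values.
    println!("halves:  {}", average_halves(255, 255));
    println!("widened: {}", average_widened(255, 255));
}
```

For most photographs the difference of one intensity level is not visible, so either form is a reasonable choice.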

Multiply Blend

A multiply blend involves channel-wise multiplication of input pixel values after they've been normalized[¹] to get the output pixel. The output pixel is then rescaled back to the original range by multiplying it by 255.

// src/operations.rs

pub struct MultiplyBlend;

impl BlendOperation for MultiplyBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            ((pixel1[0] as f32 / 255. * pixel2[0] as f32 / 255.) * 255.) as u8,
            ((pixel1[1] as f32 / 255. * pixel2[1] as f32 / 255.) * 255.) as u8,
            ((pixel1[2] as f32 / 255. * pixel2[2] as f32 / 255.) * 255.) as u8,
        ]
    }
}
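Note that the as u8 cast in the code above truncates the fractional part instead of rounding it. If that matters for your use case, one option is to do the multiplication with integers and round explicitly. The following is a sketch under that assumption, with illustrative function names, and is not part of the original implementation:

```rust
/// Multiply blend for one channel as written above: float math, then truncation.
fn multiply_truncating(a: u8, b: u8) -> u8 {
    ((a as f32 / 255. * b as f32 / 255.) * 255.) as u8
}

/// Integer-only alternative that rounds to the nearest value.
/// `a * b` is at most 255 * 255 = 65025, so `u16` is wide enough.
fn multiply_rounded(a: u8, b: u8) -> u8 {
    ((a as u16 * b as u16 + 127) / 255) as u8
}

fn main() {
    // 16 * 8 / 255 is roughly 0.502.
    println!("truncating: {}", multiply_truncating(16, 8));
    println!("rounded:    {}", multiply_rounded(16, 8));
}
```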

Lighten Blend

Lighten blend involves channel-wise comparison of input pixel values, selecting the higher value (intensity) of each channel for the output pixel.

// src/operations.rs

pub struct LightenBlend;

impl BlendOperation for LightenBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            pixel1[0].max(pixel2[0]),
            pixel1[1].max(pixel2[1]),
            pixel1[2].max(pixel2[2]),
        ]
    }
}

Darken Blend

Darken blend is the opposite operation of lighten blend. It involves channel-wise comparison of input pixel values, selecting the lower value (intensity) of each channel for the output pixel.

// src/operations.rs

pub struct DarkenBlend;

impl BlendOperation for DarkenBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            pixel1[0].min(pixel2[0]),
            pixel1[1].min(pixel2[1]),
            pixel1[2].min(pixel2[2]),
        ]
    }
}

Screen Blend

Screen blend refers to multiplying the inverse of two images, and then inverting the result. In our implementation, the pixels first need to be normalized[¹]. The normalized[¹] values are then inverted by subtracting them from 1, then they're multiplied and inverted again.

Finally, the output is multiplied by 255 to de-normalize the output pixel value.

// src/operations.rs

pub struct ScreenBlend;

impl BlendOperation for ScreenBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            ((1. - ((1. - (pixel1[0] as f32 / 255.)) * (1. - (pixel2[0] as f32 / 255.)))) * u8::MAX as f32) as u8,
            ((1. - ((1. - (pixel1[1] as f32 / 255.)) * (1. - (pixel2[1] as f32 / 255.)))) * u8::MAX as f32) as u8,
            ((1. - ((1. - (pixel1[2] as f32 / 255.)) * (1. - (pixel2[2] as f32 / 255.)))) * u8::MAX as f32) as u8,
        ]
    }
}
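To see the "invert, multiply, invert" structure more directly, here is an integer-only sketch of the same idea for a single channel. The function names are illustrative, and the rounding behavior is a choice made for this sketch rather than something taken from the code above:

```rust
/// Inverts a channel value: black becomes white and vice versa.
fn invert(value: u8) -> u8 {
    255 - value
}

/// Multiplies two channel values as if they were fractions of 255, with rounding.
fn multiply(a: u8, b: u8) -> u8 {
    ((a as u16 * b as u16 + 127) / 255) as u8
}

/// Screen blend: invert both inputs, multiply them, then invert the result.
fn screen(a: u8, b: u8) -> u8 {
    invert(multiply(invert(a), invert(b)))
}

fn main() {
    println!("screen(128, 128) = {}", screen(128, 128));
    println!("screen(100, 0)   = {}", screen(100, 0));
    println!("screen(100, 255) = {}", screen(100, 255));
}
```

Screening with black leaves the other value unchanged, and screening with white always gives white, which is why this mode only ever brightens an image.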

Addition Blend

Addition blend involves adding the input values and then clamping the result to the maximum range of the color depth we're targeting. In this case, that would be 0-255 as we're targeting 8-bit color depth.

We also have to convert the values to u16 in order to avoid loss of value due to overflow. We can also use normalized[¹] values here to achieve the same result.

// src/operations.rs

pub struct AdditionBlend;

impl BlendOperation for AdditionBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            (pixel1[0] as u16 + pixel2[0] as u16).clamp(0, u8::MAX as u16) as u8,
            (pixel1[1] as u16 + pixel2[1] as u16).clamp(0, u8::MAX as u16) as u8,
            (pixel1[2] as u16 + pixel2[2] as u16).clamp(0, u8::MAX as u16) as u8,
        ]
    }
}

Subtraction Blend

Subtraction blend involves subtracting the input values and then clamping the result to the maximum range of the color depth we're targeting. In this case, that would be 0-255 as we're targeting 8-bit color depth.

We also convert the values to i16 in order to avoid loss of value due to overflow and lack of sign. We can also use normalized[¹] values here to achieve the same result.

// src/operations.rs

pub struct SubtractionBlend;

impl BlendOperation for SubtractionBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            (pixel1[0] as i16 - pixel2[0] as i16).clamp(0, u8::MAX as i16) as u8,
            (pixel1[1] as i16 - pixel2[1] as i16).clamp(0, u8::MAX as i16) as u8,
            (pixel1[2] as i16 - pixel2[2] as i16).clamp(0, u8::MAX as i16) as u8,
        ]
    }
}
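As an aside, Rust's integer types have saturating arithmetic built in, which gives the same clamped results for addition and subtraction without converting to a wider type. A minimal sketch for a single channel, with illustrative function names:

```rust
/// Addition blend for one channel: results above 255 are clamped to 255.
fn add_channel(a: u8, b: u8) -> u8 {
    a.saturating_add(b)
}

/// Subtraction blend for one channel: results below 0 are clamped to 0.
fn subtract_channel(a: u8, b: u8) -> u8 {
    a.saturating_sub(b)
}

fn main() {
    println!("200 + 100 = {}", add_channel(200, 100));
    println!("100 - 200 = {}", subtract_channel(100, 200));
}
```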

How to Apply Blend Functions To Images

The final step is to actually use the blending operations we created previously and apply them to pairs of images.

To achieve this, we need a function that can take the SourceData type we defined previously as input, along with a blending operation as the arguments, and gives us the final output buffer. Let's start by creating a new file for it called blend.rs.

// src/blend.rs

use image::{ImageBuffer, Rgb};
use crate::{operations::BlendOperation, SourceData};

impl SourceData {
    pub fn blend_images(&self, operation: impl BlendOperation) -> ImageBuffer<Rgb<u8>, Vec<u8>> {
        let SourceData {
            width,
            height,
            image1,
            image2,
        } = self;

        // Create a new buffer that has the same size as input images, which will serve as our output data
        let mut buffer = ImageBuffer::new(*width as u32, *height as u32);

        // Iterate over all pixels in the output buffer, along with their coordinates
        for (x, y, output_pixel) in buffer.enumerate_pixels_mut() {
            // Compute linear index from x & y coordinates. In other words, you have the
            // row and column indexes here, and you want to compute the array index based
            // on these two positions.
            let index = (y * *width as u32 + x) as usize;

            // Store pixel values in the given position into variables
            let pixel1 = image1[index];
            let pixel2 = image2[index];

            // Compute the blended pixel and convert it into the `Rgb` type, which is then
            // assigned to the output pixel in the buffer.
            *output_pixel = Rgb::from(operation.perform_operation(pixel1, pixel2));
        }

        buffer
    }
}
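The core of this function is "walk both images in lockstep and call the blend operation on each pair of pixels". Here is a standalone sketch of just that part, using a plain function in place of the BlendOperation trait and a Vec in place of ImageBuffer. The names blend_pixels and lighten are assumptions made for this illustration:

```rust
/// Applies a per-pixel blend function to two equally sized pixel lists.
fn blend_pixels<F>(image1: &[[u8; 3]], image2: &[[u8; 3]], blend: F) -> Vec<[u8; 3]>
where
    F: Fn([u8; 3], [u8; 3]) -> [u8; 3],
{
    assert_eq!(image1.len(), image2.len(), "images must have the same pixel count");
    image1
        .iter()
        .zip(image2)
        .map(|(&pixel1, &pixel2)| blend(pixel1, pixel2))
        .collect()
}

/// Lighten blend expressed as a plain function.
fn lighten(pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
    [
        pixel1[0].max(pixel2[0]),
        pixel1[1].max(pixel2[1]),
        pixel1[2].max(pixel2[2]),
    ]
}

fn main() {
    let image1: [[u8; 3]; 1] = [[1, 200, 3]];
    let image2: [[u8; 3]; 1] = [[100, 2, 3]];
    println!("{:?}", blend_pixels(&image1, &image2, lighten));
}
```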

Putting It All Together

It's now time to make use of all the new things you've learnt so far, and put them together in the main.rs file.

// src/main.rs

mod blend;
mod io;
mod operations;

use io::*;
use operations::{
    AdditionBlend, AverageBlend, DarkenBlend, LightenBlend, MultiplyBlend, ScreenBlend,
    SubtractionBlend,
};

fn main() {
    let source_data = read_pixel_data("image1.jpg".to_string(), "image2.jpg".to_string());

    let output_buffer = source_data.blend_images(AdditionBlend);
    output_buffer.save("addition.jpg").unwrap();

    let output_buffer = source_data.blend_images(AverageBlend);
    output_buffer.save("average.jpg").unwrap();

    let output_buffer = source_data.blend_images(DarkenBlend);
    output_buffer.save("darken.jpg").unwrap();

    let output_buffer = source_data.blend_images(LightenBlend);
    output_buffer.save("lighten.jpg").unwrap();

    let output_buffer = source_data.blend_images(MultiplyBlend);
    output_buffer.save("multiply.jpg").unwrap();

    let output_buffer = source_data.blend_images(ScreenBlend);
    output_buffer.save("screen.jpg").unwrap();

    let output_buffer = source_data.blend_images(SubtractionBlend);
    output_buffer.save("subtraction.jpg").unwrap();
}

You can now run the program using the following command, and you should have all the images generated and saved in the project folder:

cargo run --release

As you might have guessed already, this implementation only works for 8-bit RGB images. This code, however, can be extended very easily to support the other color formats such as 8-bit Luma (Monochrome), 16-bit RGB (Many RAW camera images), and so on.
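As one possible starting point for such an extension, the per-pixel math can be written so that it does not care how many channels a pixel has. This is only a sketch with hypothetical function names. It covers the blend math alone and says nothing about how the image crate would load those formats:

```rust
/// Averages two pixels with any number of 8-bit channels
/// (1 for Luma, 3 for RGB, 4 for RGBA, and so on).
fn average_pixel<const N: usize>(pixel1: [u8; N], pixel2: [u8; N]) -> [u8; N] {
    std::array::from_fn(|channel| pixel1[channel] / 2 + pixel2[channel] / 2)
}

/// The same operation for 16-bit RGB, where each channel ranges from 0 to 65535.
fn average_pixel_16(pixel1: [u16; 3], pixel2: [u16; 3]) -> [u16; 3] {
    std::array::from_fn(|channel| pixel1[channel] / 2 + pixel2[channel] / 2)
}

fn main() {
    println!("luma:   {:?}", average_pixel([100], [50]));
    println!("rgba:   {:?}", average_pixel([10, 20, 30, 40], [30, 20, 10, 0]));
    println!("16-bit: {:?}", average_pixel_16([60000, 0, 1000], [20000, 0, 3000]));
}
```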

I highly encourage you to try that out. You can also reach out to me for help with anything in this tutorial or with extending the code in this tutorial. I'd be happy to answer all your queries. Email is the best way to reach me: anshul@anshulsanghi.tech.

Glossary

Normalization refers to the process of rescaling the pixel values so that the values are in floating point format and are in the range of 0-1. For example, for an 8-bit image, the color black is represented by 0 (0 in de-normalized value) and the color white is represented by 1 (255 in de-normalized value). Intermediary decimal values between 0 and 1 represent different intensities of the pixel between black and white. Normalization is done for many different reasons, such as:

Preventing overflows during calculations.
Re-scaling images to the same range irrespective of their individual color depth.
Expanding possible dynamic range of the image.
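As a rough sketch of normalization for 8-bit values, the two helpers below convert in each direction. The names are illustrative, and rounding on the way back is a choice made for this sketch (the blend code in this tutorial truncates instead):

```rust
/// Rescales an 8-bit value to the 0.0-1.0 range.
fn normalize(value: u8) -> f32 {
    value as f32 / 255.0
}

/// Rescales a 0.0-1.0 value back to 8 bits, clamping out-of-range input
/// and rounding to the nearest integer.
fn denormalize(value: f32) -> u8 {
    (value.clamp(0.0, 1.0) * 255.0).round() as u8
}

fn main() {
    println!("black: {}", normalize(0));
    println!("white: {}", normalize(255));
    println!("half:  {}", denormalize(0.5));
}
```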

Till next time, happy coding and wishing you clear skies!

Rust Tutorial – Learn Advanced Iterators & Pattern Matching by Building a JSON Parser

Anshul Sanghi — Wed, 29 May 2024 10:45:15 +0000

Iterators and match patterns are two of the most used language features in Rust. If you have written any real world application, big or small, chances are that you've already used these, whether knowingly or unknowingly.

In this tutorial, I aim to help you understand how they actually work, the many ways they're usually used, and how powerful they are by writing a JSON parser that heavily uses these features.

Disclaimer

The goal of this tutorial is to create a real-world library that extensively uses match patterns and iterators. The goal is not to write a performant or fully-compliant JSON parser.

If you're very familiar with JSON, you will notice many things that are missing in this code, the biggest one being error handling when invalid tokens are encountered, and giving feedback to the user or helpful suggestions on what's wrong with the JSON.

This program also doesn't handle escape characters and sequences within string literals, for example. The code, for the most part, assumes that you have a valid JSON.

Prerequisites

Although this tutorial can be consumed by Rust programmers of any experience, previous experience or understanding of basic iterators and match patterns in Rust is helpful.

It is also assumed that you're familiar with the most basic Rust concepts such as traits, structs, enums, for loops, impl blocks, and so on. The tutorial does introduce you to iterators and match patterns, so you don't need to be familiar with those to benefit from this tutorial.

What are Iterators in Rust?
1. How to implement iterators in Rust
2. What are peekable iterators in Rust?
What is The Match Statement in Rust?
How to Build a JSON Parser – Stage 1: Reader
How to Build a JSON Parser – Stage 2: Prepare Intermediate Data Types
1. The value type
2. How to add helpful conversion methods
How to Build a JSON Parser – Stage 3: Tokenization
How to Build a JSON Parser – Stage 4: From Tokens To Value
How to Use the JSON parser
Wrapping Up

What are Iterators in Rust?

Iterators are not a new concept, nor are they specific to Rust. An iterator is a pattern, implemented as an object in most programming languages, for working with lists (such as arrays or vectors) or collections (such as HashMaps). It allows you to traverse these data types and act on the individual entries in them.

In Rust, iterators are a very powerful feature. The official Rust book describes them as:

The iterator pattern allows you to perform some task on a sequence of items in turn. An iterator is responsible for the logic of iterating over each item and determining when the sequence has finished. When you use iterators, you don’t have to reimplement that logic yourself.

In Rust, iterators are lazy, meaning they have no effect until you call methods that consume the iterator to use it up.

An iterator is an object that facilitates sequential access to the elements of a collection, like an array or a vector, without exposing the implementation details.

How to implement iterators in Rust

Iterators are implemented in Rust using a collection of traits, the most basic of which is the Iterator trait. The standard library provides iterators for all of its collections, and you can implement the trait for custom types as well.

It requires the implementation of a single method: next(). This method returns an Option<T>, where T is the type of element the iterator is for. When next() is called (the call is implicit in most cases and you generally use higher level methods), the iterator produces Some(value) for the next element in the sequence or None when the iteration is complete. In most cases, whether the value is Some or None is also handled implicitly.

For example, anything that implements the Iterator trait can be used with a for loop directly, which implicitly handles both calling the next method and checking whether the value is Some or None. A None value triggers the loop to end. This is true for the built-in types such as arrays, slices, vectors, and hash maps as well.

For example, let's implement the iterator trait on a simple custom type. You need to store the current state of the iterator in the type. You can also store any additional information you need. Here, we just need to know the max number after which the iteration should end:

use std::iter::Iterator;

struct CustomType {
    current: usize,
    max: usize,
}

impl CustomType {
    fn new(max: usize) -> Self {
        Self {
            current: 0,
            max,
        }
    }
}

impl Iterator for CustomType {
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current >= self.max {
            None
        } else {
            self.current += 1;
            Some(self.current)
        }
    }
}

fn main() {
    let custom = CustomType::new(10);

    for item in custom {
        println!("Item is {item}");
    }
}

# Output

Item is 1
Item is 2
Item is 3
Item is 4
Item is 5
Item is 6
Item is 7
Item is 8
Item is 9
Item is 10

Rust iterators are also lazily evaluated, meaning they do not do anything unless they're used. This means that until you actually want to get the next value and do something with it, it won't even compute what the next value is.

This also means that if you have a chain of operations, such as a map and a filter, each item will go through the entire pipeline first, before the code processes the next item. This is unlike many other languages which support map and filter as methods, where first the entire map will be processed for all items, and then the filter will be performed.
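You can observe this ordering yourself by recording every call made inside a small map and filter chain. The sketch below uses a RefCell so that both closures can append to the same log, and the function name trace_pipeline is made up for this illustration:

```rust
use std::cell::RefCell;

/// Runs a small map + filter pipeline and records the order of the calls.
fn trace_pipeline() -> Vec<String> {
    let log = RefCell::new(Vec::new());

    let _kept: Vec<i32> = (1..=3)
        .map(|n| {
            log.borrow_mut().push(format!("map {n}"));
            n * 2
        })
        .filter(|n| {
            log.borrow_mut().push(format!("filter {n}"));
            *n % 4 == 0
        })
        .collect();

    log.into_inner()
}

fn main() {
    for entry in trace_pipeline() {
        println!("{entry}");
    }
}
```

The log alternates between map and filter entries, showing that each item travels through the whole chain before the next item is produced.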

If you think about it carefully, this item-at-a-time model makes it much easier to write streaming processing pipelines than the eager alternative, because no intermediate collection is built between the steps.

Since Iterator is just a trait, it allows for iterators to be chain-able and transformable to other iterators using various adapter methods (either the ones in standard library, or the ones that you can implement yourself).

What are peekable iterators in Rust?

Many times, you need the ability to know what the next element will be for deciding what to do, without actually modifying the iterator state for it to move to the next element. This is especially necessary when working with an iterator of tokens for parsing, as we'll do later in this tutorial.

This is where the Peekable struct comes in. You can convert any iterator into a peekable iterator by calling the peekable method on it.

Let's take the previous example and see how peekable works in action:

use std::iter::Iterator;

struct CustomType {
    current: usize,
    max: usize,
}

impl CustomType {
    fn new(max: usize) -> Self {
        Self {
            current: 0,
            max,
        }
    }
}

impl Iterator for CustomType {
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current >= self.max {
            None
        } else {
            self.current += 1;
            Some(self.current)
        }
    }
}

fn main() {
    let mut custom = CustomType::new(2).peekable();

    let first = custom.peek();
    println!("{first:?}");

    let second = custom.next();
    println!("{second:?}");

    let third = custom.next();
    println!("{third:?}");

    let fourth = custom.next();
    println!("{fourth:?}");
}

# Output

Some(1)
Some(1)
Some(2)
None

I also wanted to show you how you can use iterators manually without a for loop, which is why you see all the calls to the next method. This also shows that next returns an Option instead of the value directly.

Also notice that the first and second variables are both Some(1). This is because we called peek the first time which returned the first element but without modifying the state of the iterator.
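Peeking is most useful when you need to decide whether the next element belongs to whatever you are currently reading. As a small sketch of that idea, the hypothetical helper below collects the leading digits from a character iterator and leaves the first non-digit character untouched:

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Consumes leading ASCII digits from the iterator and returns them as a string.
/// The first non-digit character is left in the iterator for the caller.
fn take_digits(chars: &mut Peekable<Chars<'_>>) -> String {
    let mut digits = String::new();
    while let Some(&c) = chars.peek() {
        if !c.is_ascii_digit() {
            break;
        }
        digits.push(c);
        chars.next(); // only now advance past the digit
    }
    digits
}

fn main() {
    let mut chars = "123,45".chars().peekable();
    let number = take_digits(&mut chars);
    println!("number: {number}, next char: {:?}", chars.next());
}
```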

What is The Match Statement in Rust?

The match statement is a pattern-matching syntax in Rust that allows you to conditionally run code based on complex conditions in a concise syntax. You can think of it as a switch statement from other languages, but on steroids.

A very simple example of a match statement is:

let value = true;

match value {
    true => {
        println!("Value is true")
    },
    false => {
        println!("Value is false")
    }
}

The various conditions defined above, namely true and false, are known as branches (Rust's documentation calls them arms). Each branch can have a single pattern, multiple patterns separated by the pipe | operator, or a range. Each branch can also have guards and bindings. Let's see what each of these means:

// Multiple conditions per branch

let value = "some_string";

match value {
    "some_string1" | "some_string2" | "some_string3" => {
        println!("Bad match");
    }
    "some_string" => {
        println!("Good match");
    }
    _ => {
        println!("No match");
    }
}

Notice the _ branch in the above example. Match statements require you to cover all possible cases. In the first example, since the value was a boolean, there were only two possible values, true and false. This is why in the first case, we already covered all possible values.

In the 2nd example, however, the value we're matching against is a string (&str to be more precise). A string can be any value. It's impossible to write a match statement that can ever cover all possible cases for this example. The good thing is that Rust has a special matcher _ that matches any value.

If you're experienced with JavaScript or C (or many other languages that have the traditional switch syntax), _ is equivalent to the default case in switch. But you don't need to use _. You can also bind the value to a variable and handle it differently. We'll see how to do this shortly.

How to use ranges in match statements in Rust

A match statement allows you to use range patterns as branches. A successful match occurs if the value being matched falls within the range. For example, say you want to check whether a char is a digit or not. You can write a range pattern that covers all the digit characters and use that as a branch:

let value: char = '5';

match value {
    '0'..='9' => {
        println!("Character is a digit");
    }
    _ => {
        println!("Character is not a digit");
    }
}

The above example will print "Character is a digit". If you're not familiar with the ..= syntax, it creates an inclusive range. In the example above, the range starts at the '0' character and ends at the '9' character, including all of the characters in between.

Outside of patterns, the same syntax creates range values that you can iterate over. For example, 1..5 creates an iterator over the range between 1 and 5 but excluding 5, so that the iterator will yield 1, 2, 3, 4.

Note that you cannot match against a collection or an iterator stored in a variable. A bare identifier in a pattern is a binding that matches any value, so a branch such as list => would always succeed. To check a value against a list, use a match guard (covered in the next section) instead:

let list = ["1, 2", "3, 4"];
let value = "3, 4";

match value {
    candidate if list.contains(&candidate) => {
        println!("Matched");
    }
    _ => {
        println!("No matches");
    }
}

Patterns themselves cannot contain method calls, but guards are ordinary boolean expressions, which is why the call to contains is allowed here.

What are match guards in Rust?

Guards in match statements are additional conditions for a particular branch that the branch must satisfy to consider a successful match. For example, if you want to match over a range of numbers, but also match on whether they're odd or even, match guards can come in handy.

The syntax is also pretty intuitive. It is of the form <pattern> if <condition> => {}.

let value: u8 = 5;

match value {
    0..=9 if value % 2 == 0 => {
        println!("Value is even");
    }
    0..=9 if value % 2 == 1 => {
        println!("Value is odd");
    }
    _ => {
        println!("Unexpected value");
    }
}

The above code will print "Value is odd".

What is binding in Rust?

Binding allows you to store values in variables that can be used within the branch where the binding is present. It's basically assigning variables to certain parts of the pattern.

Pattern Binding

A very simple example is binding the catch-all pattern to a variable instead of ignoring its value using _.

let value: u8 = 5;

match value {
    0..=9 if value % 2 == 0 => {
        println!("Value is even");
    }
    0..=9 if value % 2 == 1 => {
        println!("Value is odd");
    }
    other_value => {
        println!("Unexpected value: {other_value}");
    }
}

Notice that in this example, we used the variable other_value to bind the value of value in the last pattern, which is a catch-all if it doesn't match any of the previous patterns. We can then use the variable in logic for that arm. Here we just print it to the console.

Some other examples of binding are:

let value: Option<i32> = Some(43);

match value {
    Some(matched_value) => println!("The value is {matched_value}"),
    None => println!("There is no value")
}

In this example, we bound the value within the Some pattern for storing the inner value of the option, and use it in our logic.

pub struct Person {
    name: String,
    age: u32,
}

let value: Option<Person> = Some(Person {
    name: "Name".to_string(),
    age: 23,
});

match value {
    Some(Person { name: person_name, age }) => {
        println!("{person_name} is {age} years old");
    },
    None => {
        println!("The value is empty");
    }
}

We see two different types of binding in this example. The first is assigning a different name to a struct field by destructuring it (the name field), and the other is using the same name as the name of the field (the age field).

The `@` Binding

The official Rust book describes it as:

The at operator @ lets us create a variable that holds a value at the same time as we’re testing that value for a pattern match.

In our example of pattern matching against a range of values, we can bind the matched value to a variable using this syntax to use it within that branch:

let value: u8 = 5;

match value {
    digit @ 0..=9 => {
        println!("The matched value is {digit}");
    }
    _ => {
        println!("Unexpected value");
    }
}

Here we are binding the value matched by the range pattern to the digit variable, which we then use within the branch to read the actual value.

How to Build a JSON Parser – Stage 1: Reader

Before we can parse incoming JSON data, we need to be able to read it in a way which facilitates parsing it. To be able to tokenize the incoming JSON, we need to analyse each character as it comes in and, based on whether it represents a literal value, a delimiter, or an invalid value, decide what to do with it as well as with subsequent characters.

This is a really good use case for using a combination of iterators and Rust's match syntax.

Our reader needs to hold two pieces of data: a buffered reader, which we can use to iterate over the input, and a character_buffer, which holds the characters that have been decoded but not yet returned.

At this point, you may ask why we need to hold the character buffer in the reader. The reason is that JSON is UTF-8 encoded.

What is the UTF-8 byte encoding?

A UTF-8 character can be anywhere from 1 to 4 bytes long. We need to be able to parse all valid characters because the JSON spec supports them.

For each iteration, we need to read 4 bytes at a time, decide how many characters those 4 bytes contain (for example, they can contain four 1-byte characters), finish iterating over them, and then move on to reading the next 4 bytes and repeating the process. To store this intermediate piece of information, we need the character buffer.

It is also possible that we only have part of a character in the current 4 bytes. For example, if you consider two 1-byte characters followed by one 3-byte character, like 23€, the first 4 bytes will contain 2 valid characters and only part of the next valid character. You also need to be able to handle this, which will involve rewinding the iterator.
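The 23€ case can be checked directly with the standard library. The sketch below is only an illustration; the helper name `valid_prefix_len` is made up here. It takes the first 4 bytes of a string and reports how many of them form complete UTF-8 characters:

```rust
use std::str::from_utf8;

/// Hypothetical helper: looks at the first 4 bytes of `input` (or fewer
/// if the input is shorter) and returns how many of those bytes form
/// complete, valid UTF-8 characters.
fn valid_prefix_len(input: &str) -> usize {
    let bytes = input.as_bytes();
    let window = &bytes[..bytes.len().min(4)];

    match from_utf8(window) {
        // The whole window is valid UTF-8.
        Ok(text) => text.len(),
        // Only the bytes before `valid_up_to()` are valid. The rest
        // belong to a character that was cut off by the window.
        Err(error) => error.valid_up_to(),
    }
}
```

For 23€, the window holds the bytes of 2 and 3 plus the first two of the three bytes that encode €, so only the first 2 bytes are reported as valid.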

It's possible to handle this in a way where we do not need allocations, and for performance reasons it's in fact better to do so. But I will leave that to you as a reader to think about how to implement it in this case, as it is not the focus of this article.

I hope that it's now clear why iterators are the best tool for the job here.

How to read the data

We are going to support two different readers. One reads directly from a buffered reader (which is most commonly created from a file), and the other reads from a slice of bytes.

These are going to be pretty straightforward. For reading from a file, you need to create a buffered reader over the underlying file data:

let file = File::open("dummy.json").unwrap();
let reader = BufReader::new(file);

Let's start by implementing the JSON Reader struct and these methods on it:

// src/reader.rs

use std::collections::VecDeque;
use std::io::{BufReader, Cursor, Read, Seek};
use std::str::from_utf8;

/// A struct that handles reading input data to be parsed and
/// provides an iterator over said data character-by-character.
pub struct JsonReader<T>
where
    T: Read + Seek,
{
    /// A reference to the input data, which can be anything
    /// that implements [`Read`]
    reader: BufReader<T>,

    /// A character buffer that holds queue of characters to
    /// be used by the iterator.
    ///
    /// This is necessary because UTF-8 can be 1-4 bytes long.
    /// Because of this, the reader always reads 4 bytes at a 
    /// time. We then iterate over "characters", irrespective of 
    /// whether they are 1 byte long, or 4.
    ///
    /// A [`VecDeque`] is used instead of a normal vector 
    /// because characters need to be read out from the start 
    /// of the buffer.
    character_buffer: VecDeque<char>,
}

impl<T> JsonReader<T>
where
    T: Read + Seek,
{
    /// Create a new [`JsonReader`] that reads from a file
    ///
    /// # Examples
    ///
    /// ```
    /// use std::fs::File;
    /// use std::io::BufReader;
    /// use json_parser::reader::JsonReader;
    ///
    /// let file = File::create("dummy.json").unwrap();
    /// let reader = BufReader::new(file);
    ///
    /// let json_reader = JsonReader::new(reader);
    /// ```
    pub fn new(reader: BufReader<T>) -> Self {
        JsonReader {
            reader,
            character_buffer: VecDeque::with_capacity(4),
        }
    }

    /// Create a new [`JsonReader`] that reads from a given byte stream
    ///
    /// # Examples
    ///
    /// ```
    /// use std::io::{BufReader, Cursor};
    /// use json_parser::reader::JsonReader;
    ///
    /// let input_json_string = r#"{"key1":"value1","key2":"value2"}"#;
    ///
    /// let json_reader = JsonReader::<Cursor<&'static [u8]>>::from_bytes(input_json_string.as_bytes());
    /// ```
    #[must_use]
    pub fn from_bytes(bytes: &[u8]) -> JsonReader<Cursor<&[u8]>> {
        JsonReader {
            reader: BufReader::new(Cursor::new(bytes)),
            character_buffer: VecDeque::with_capacity(4),
        }
    }
}


How to implement the iterator for JsonReader

Next, you are going to need to implement the Iterator trait on this JsonReader, which will facilitate parsing.

First, if the character buffer isn't empty already, you can return the first character in the buffer from the iterator:

if !self.character_buffer.is_empty() {
    return self.character_buffer.pop_front();
}

If it is empty, you need to create a new buffer and read into that buffer from the reader:

let mut utf8_buffer = [0, 0, 0, 0];
let _ = self.reader.read(&mut utf8_buffer);

Here, you are creating a new array of size 4, and you'll be reading 4 bytes into it from the reader.

Next, you need to parse it as UTF-8. Rust provides a from_utf8 function that tries to parse the given bytes as UTF-8. If the bytes are valid, it returns a string slice containing the parsed characters.

If they are not, it returns an error that records how many bytes from the start of the input were valid. You can use this to keep only the valid characters, rewind the reader over the bytes that were not consumed, and try again with the next 4 bytes from the point of failure.

If that didn't make too much sense, looking at the code will make things clear:

match from_utf8(&utf8_buffer) {
    Ok(string) => {
        self.character_buffer = string.chars().collect();
        self.character_buffer.pop_front()
    }
    Err(error) => {
        // Read valid bytes, and rewind the buffered reader for 
        // the remaining bytes so that they can be read again in the
        // next iteration.

        let valid_bytes = error.valid_up_to();
        let string = from_utf8(&utf8_buffer[..valid_bytes]).unwrap();

        let remaining_bytes = 4 - valid_bytes;

        let _ = self.reader.seek_relative(-(remaining_bytes as i64));

        // Collect the valid characters into character_buffer
        self.character_buffer = string.chars().collect();

        // Return the first character from character_buffer
        self.character_buffer.pop_front()
    }
}
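The read-decode-rewind cycle can also be tried in isolation. The following is a minimal sketch, not the article's implementation: the function name `read_chunk` is hypothetical, it works on an in-memory Cursor, and, unlike the code above, it uses the byte count returned by `read` so that a short read at the end of the input does not produce padding characters. It does not handle genuinely invalid UTF-8.

```rust
use std::io::{BufReader, Cursor, Read};
use std::str::from_utf8;

/// Hypothetical helper: reads up to 4 bytes, returns the complete UTF-8
/// characters found in them, and rewinds the reader over any trailing
/// bytes that belong to a character that was cut off.
fn read_chunk(reader: &mut BufReader<Cursor<&[u8]>>) -> std::io::Result<String> {
    let mut buffer = [0u8; 4];
    // Assumption of this sketch: only the bytes actually read are decoded.
    let read = reader.read(&mut buffer)?;
    let chunk = &buffer[..read];

    match from_utf8(chunk) {
        Ok(text) => Ok(text.to_string()),
        Err(error) => {
            // Number of leading bytes that form complete characters.
            let valid = error.valid_up_to();
            // Step back over the bytes that were read but not decoded.
            let unread = (read - valid) as i64;
            reader.seek_relative(-unread)?;

            let text = from_utf8(&chunk[..valid]).expect("prefix was already validated");
            Ok(text.to_string())
        }
    }
}
```

With the input 23€, the first call returns 23 and rewinds two bytes, and the second call then sees all three bytes of € and returns it whole.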

Here's the complete implementation of the Iterator trait:

// src/reader.rs

impl<T> Iterator for JsonReader<T>
where
    T: Read + Seek,
{
    type Item = char;

    #[allow(clippy::cast_possible_wrap)]
    fn next(&mut self) -> Option<Self::Item> {
        if !self.character_buffer.is_empty() {
            return self.character_buffer.pop_front();
        }

        let mut utf8_buffer = [0, 0, 0, 0];
        let _ = self.reader.read(&mut utf8_buffer);

        match from_utf8(&utf8_buffer) {
            Ok(string) => {
                self.character_buffer = string.chars().collect();
                self.character_buffer.pop_front()
            }
            Err(error) => {
                // Read valid bytes, and rewind the buffered reader for
                // the remaining bytes so that they can be read again in the
                // next iteration.

                let valid_bytes = error.valid_up_to();
                let string = from_utf8(&utf8_buffer[..valid_bytes]).unwrap();

                let remaining_bytes = 4 - valid_bytes;

                let _ = self.reader.seek_relative(-(remaining_bytes as i64));

                // Collect the valid characters into character_buffer
                self.character_buffer = string.chars().collect();

                // Return the first character from character_buffer
                self.character_buffer.pop_front()
            }
        }
    }
}

And that's all you need to do for reading the input data for parsing. It's time to move on to the next stage in the process.

How to Build a JSON Parser – Stage 2: Prepare Intermediate Data Types

This isn't really a stage in the parsing pipeline, but it is a prerequisite for the next steps. We need to define Rust types that map to all of the possible types that JSON supports.

JSON supports the following data types:

String
Number
Boolean
Array
Object
Null

A number can be either an integer or a floating-point number. While you could use f64 as the Rust type for all JSON numbers, in practice that would litter your code with type casts wherever you use the value.

So in this tutorial, we're going to make that distinction and record it in the type.

The value type

Enums are the ideal way to store state like this, where each variant needs to have some identifier as metadata (in this case the type of JSON value), and optionally some data attached to it. The data you're going to attach to these variants will be the actual value of that type in JSON.

// src/value.rs

use std::collections::HashMap;

#[derive(Debug, Copy, Clone, PartialEq)]
pub enum Number {
    I64(i64),
    F64(f64),
}

#[derive(Debug, PartialEq, Clone)]
pub enum Value {
    String(String),
    Number(Number),
    Boolean(bool),
    Array(Vec<Value>),
    Object(HashMap<String, Value>),
    Null,
}

The first few variants are pretty straightforward: you define the variant, and the data it holds is the corresponding Rust type. The last variant is even simpler, representing the null value, which doesn't need any further data.

The Array and Object variants are a bit more interesting, since they recursively store the enum itself. This makes sense, as arrays in JSON can hold any value type that the JSON spec supports, and objects in JSON always have string keys and any JSON-supported value, including other objects.
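To see the recursion in action, here is a small self-contained sketch that repeats the two enums and builds a nested value by hand. The function name `sample_document` and the sample data are made up for illustration:

```rust
use std::collections::HashMap;

#[derive(Debug, Copy, Clone, PartialEq)]
enum Number {
    I64(i64),
    F64(f64),
}

#[derive(Debug, PartialEq, Clone)]
enum Value {
    String(String),
    Number(Number),
    Boolean(bool),
    Array(Vec<Value>),
    Object(HashMap<String, Value>),
    Null,
}

/// Hypothetical example: builds the value that corresponds to
/// `{"name": "Ferris", "tags": ["crab", 7, null]}`.
fn sample_document() -> Value {
    // An array can mix any of the value types, including nested ones.
    let tags = Value::Array(vec![
        Value::String("crab".to_string()),
        Value::Number(Number::I64(7)),
        Value::Null,
    ]);

    // An object maps string keys to arbitrary values.
    let mut fields = HashMap::new();
    fields.insert("name".to_string(), Value::String("Ferris".to_string()));
    fields.insert("tags".to_string(), tags);

    Value::Object(fields)
}
```

Because Vec and HashMap store their contents on the heap, the enum can refer to itself without needing an explicit Box.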

How to add helpful conversion methods

You will also need a way to convert the enum type into the underlying types, and throw an error if the underlying data isn't what you expected. This is mostly boilerplate code, so I'm just going to put it all together without further explanation:

// src/value.rs

impl TryFrom<&Value> for String {
    type Error = ();

    fn try_from(value: &Value) -> Result<Self, ()> {
        match value {
            Value::String(value) => Ok(value.clone()),
            _ => Err(()),
        }
    }
}

impl TryFrom<&Value> for i64 {
    type Error = ();

    #[allow(clippy::cast_possible_truncation)]
    fn try_from(value: &Value) -> Result<Self, ()> {
        match value {
            Value::Number(value) => match value {
                Number::I64(value) => Ok(*value),
                Number::F64(value) => Ok(*value as i64),
            },
            _ => Err(()),
        }
    }
}

impl TryFrom<&Value> for f64 {
    type Error = ();

    fn try_from(value: &Value) -> Result<Self, ()> {
        match value {
            Value::Number(value) => match value {
                Number::F64(value) => Ok(*value),
                Number::I64(value) => Ok(*value as f64),
            },
            _ => Err(()),
        }
    }
}

impl TryFrom<&Value> for bool {
    type Error = ();

    fn try_from(value: &Value) -> Result<Self, ()> {
        match value {
            Value::Boolean(value) => Ok(*value),
            _ => Err(()),
        }
    }
}

impl<'a> TryFrom<&'a Value> for &'a Vec<Value> {
    type Error = ();

    fn try_from(value: &'a Value) -> Result<Self, ()> {
        match value {
            Value::Array(value) => Ok(value),
            _ => Err(()),
        }
    }
}

#[allow(clippy::implicit_hasher)]
impl<'a> TryFrom<&'a Value> for &'a HashMap<String, Value> {
    type Error = ();

    fn try_from(value: &'a Value) -> Result<Self, ()> {
        match value {
            Value::Object(value) => Ok(value),
            _ => Err(()),
        }
    }
}
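As a quick illustration of how these conversions are meant to be called, the sketch below repeats a trimmed-down version of the enums with a single TryFrom implementation. The wrapper function `as_integer` is hypothetical and only there to show the call site:

```rust
use std::convert::TryFrom;

#[derive(Debug, Copy, Clone, PartialEq)]
enum Number {
    I64(i64),
    F64(f64),
}

// Trimmed-down version of the value enum, just for this example.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(Number),
    Boolean(bool),
    Null,
}

impl TryFrom<&Value> for i64 {
    type Error = ();

    fn try_from(value: &Value) -> Result<Self, ()> {
        match value {
            Value::Number(Number::I64(number)) => Ok(*number),
            // Floats are truncated towards zero by the `as` cast.
            Value::Number(Number::F64(number)) => Ok(*number as i64),
            _ => Err(()),
        }
    }
}

/// Hypothetical helper: returns the integer stored in `value`, if any.
fn as_integer(value: &Value) -> Option<i64> {
    i64::try_from(value).ok()
}
```

The same conversion can also be written as `let number: i64 = (&value).try_into()?;` in a function that returns a compatible Result.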

How to Build a JSON Parser – Stage 3: Tokenization

The next step is to take the input data and tokenize it.

Tokenization is the process of splitting a large chunk of input into smaller, more digestible units that can then be analysed independently. Tokens are much easier to work with than raw byte streams: they represent the incoming data in a standard form and can be mapped to output value types.

The parser can then recursively process all tokens until there's nothing to process, giving us the parsed data once it finishes.

How to define expected valid tokens

There is going to be some duplication here compared to the value type you looked at previously, but that's to be expected, since the token representation of any literal value will be that value itself. There's no way to break it down to smaller units in that case.

Once again, Enum is the right data type for this since we need both metadata (as the token type), and optionally data associated with it.

The tokens representing literal values can be defined in this way:

// src/token.rs

use std::io::{Read, Seek};
use std::iter::Peekable;
use crate::reader::JsonReader;

#[derive(Debug, Copy, Clone, PartialEq)]
pub enum Number {
    I64(i64),
    F64(f64),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    String(String),
    Number(Number),
    Boolean(bool),
    Null,
}

Apart from these, we also have a lot of other tokens in JSON that form the "grammar" of the JSON format. These are:

Curly braces ({ or } ) that represent opening and closing of an object respectively.
Square brackets ([ or ]) that represent opening and closing of an array respectively.
Colon (:) for separating key-value pairs within the object.
Comma (,) for separating values.
Quotes (") that represent opening/closing of the string literal values.

None of these need any data associated with them, so they're going to be unit variants in the enum. Adding these in, the complete enum will be:

// src/token.rs

use std::fs::File;
use std::io::{BufReader, Cursor, Read, Seek};
use std::iter::Peekable;
use crate::reader::JsonReader;
use crate::value::Number;

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    CurlyOpen,
    CurlyClose,
    Quotes,
    Colon,
    String(String),
    Number(Number),
    ArrayOpen,
    ArrayClose,
    Comma,
    Boolean(bool),
    Null,
}

How to implement the tokenizer struct

You are going to need a JsonTokenizer struct that drives the process and is also responsible for holding the state of the tokenizer:

// src/token.rs

pub struct JsonTokenizer<T>
    where
        T: Read + Seek,
{
    tokens: Vec<Token>,
    iterator: Peekable<JsonReader<T>>,
}

impl<T> JsonTokenizer<T>
where
    T: Read + Seek,
{
    pub fn new(reader: File) -> JsonTokenizer<File> {
        let json_reader = JsonReader::<File>::new(BufReader::new(reader));

        JsonTokenizer {
            iterator: json_reader.peekable(),
            tokens: vec![],
        }
    }

    pub fn from_bytes<'a>(input: &'a [u8]) -> JsonTokenizer<Cursor<&'a [u8]>> {
        let json_reader = JsonReader::<Cursor<&'a [u8]>>::from_bytes(input);

        JsonTokenizer {
            iterator: json_reader.peekable(),
            tokens: Vec::with_capacity(input.len()),
        }
    }
}

In this case, we've made it generic over where the input comes from. The type T needs to implement the Read and Seek traits, because the underlying JsonReader both reads from the input and rewinds it when a character is split across two reads.

The iterator also needs to be Peekable, which basically means we should be able to read the next item in the iterator without advancing the iterator itself.
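The difference between peeking and advancing is easy to see on a plain character iterator. This is a minimal sketch with a hypothetical helper, `next_significant`, that skips whitespace and then reports the next character without consuming it:

```rust
use std::iter::Peekable;

/// Hypothetical helper: skips ASCII whitespace, then returns the next
/// character WITHOUT consuming it.
fn next_significant<I>(iterator: &mut Peekable<I>) -> Option<char>
where
    I: Iterator<Item = char>,
{
    while let Some(character) = iterator.peek() {
        if character.is_ascii_whitespace() {
            // `next()` is what actually advances the iterator.
            iterator.next();
        } else {
            break;
        }
    }

    // `peek()` only looks at the upcoming item; it stays in the iterator.
    iterator.peek().copied()
}
```

Calling `next()` right after this helper returns the same character that was just peeked, which is exactly the property the tokenizer relies on.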

How to tokenize an iterator of characters

Once you've defined all of the expected tokens, you need to take your character iterator and convert it into a list of tokens, where each entry is a variant of the Token enum defined in the last section.

We'll start by writing a skeleton function that matches on the incoming character and panics if it encounters an invalid token:

// src/token.rs

impl<T> JsonTokenizer<T>
where
    T: Read + Seek,
{
    pub fn tokenize_json(&mut self) -> Result<&[Token], ()> {
        while let Some(character) = self.iterator.peek() {
            match *character {
                // Parse all other tokens here
                // ...
                character => {
                    if character.is_ascii_whitespace() {
                        // Advance past the whitespace. Without this, `peek()`
                        // would keep returning the same character forever.
                        let _ = self.iterator.next();
                        continue;
                    }

                    panic!("Unexpected character: ;{character};")
                }
            }
        }

        Ok(&self.tokens)
    }
}

There are two noteworthy things here. Let's start with the easy one. If your match block doesn't encounter any known characters (you will implement this shortly), you need to have a "catch-all" arm that matches any character.

Here, we ignore any whitespace character and move on to the next iteration when we encounter one. If the character isn't whitespace, then you need to panic (or return an error) here.

The next noteworthy thing here is self.iterator.peek(). To facilitate parsing of different kinds of tokens from delimiters to literal values, it is important that the iterator is not advanced when reading out the next character. This needs to happen so that you can conditionally advance it based on what character is next.

You also need to delegate parsing of certain sets of tokens to different functions, which will have their own logic of advancing the iterator.

A good example is parsing the null literal value. If the match encounters an n character and is not within a string, object, number, and so on, then you need to ensure that the next three characters are u, l, l respectively to form the literal value null, and then advance the iterator by four so that the next loop starts parsing after the null literal and not in the middle of it.

How to parse string tokens

We're going to start by parsing strings. Let's stop for a second and think what needs to happen step-by-step:

Check if the match encounters a " character. If it does, push Token::Quotes to your list of output tokens.
Advance the iterator by one so the next steps start from after the " character.
Parse all characters as part of the string until you encounter another " character, which indicates the closing of the string value.
Advance the iterator by however many characters are parsed as part of the string, plus one more to also jump over the closing " character.
Push Token::String with the parsed value to your list of output tokens.
Push Token::Quotes to your list of output tokens.

Hopefully, that isn't too confusing. But the code should help you understand it better:

// src/token.rs

impl<T> JsonTokenizer<T>
    where
        T: Read + Seek,
{
    // ...

    pub fn tokenize_json(&mut self) -> Result<&[Token], ()> {
        while let Some(character) = self.iterator.peek() {
            match *character {
                '"' => {
                    // Pushed opening quote to output tokens list.
                    self.tokens.push(Token::Quotes);

                    // Skip quote token since we already added it to the tokens list.
                    let _ = self.iterator.next();

                    // Delegate parsing string value to a separate function.
                    // The function should also take care of advancing the iterator properly.
                    let string = self.parse_string();

                    // Push parsed string to output tokens list.
                    self.tokens.push(Token::String(string));

                    // Pushed closing quote to output tokens list.
                    self.tokens.push(Token::Quotes);
                }
                // ...
            }
        }

        Ok(&self.tokens)
    }

    fn parse_string(&mut self) -> String {
        // Create new vector to hold parsed characters.
        let mut string_characters = Vec::<char>::new();

        // Borrow the iterator mutably with `by_ref()` so that
        // the `for` loop doesn't take ownership of it. The
        // tokenizer can then keep using the iterator after
        // this function returns.
        for character in self.iterator.by_ref() {
            // If it encounters a closing `"`, break
            // out of the loop as the string has ended.
            if character == '"' {
                break;
            }

            // Continue pushing to the vector to build
            // the string.
            string_characters.push(character);
        }

        // Create a string out of character iterator and
        // return it.
        String::from_iter(string_characters)
    }
}

As I've previously mentioned, we're not going to handle escape characters in this tutorial, as they don't add much to the topic at hand. If you're interested, adding them on top of this implementation is a good exercise.
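As a possible starting point for that exercise, here is a sketch of a string parser that understands a few simple escapes. It is not part of the tokenizer built in this article: the function name `parse_string_with_escapes` is hypothetical, it works on any character iterator, it returns an error instead of panicking, and it deliberately leaves out \b, \f and \uXXXX escapes.

```rust
/// Hypothetical sketch: reads characters up to the closing `"` and
/// resolves a small subset of JSON escape sequences along the way.
/// The opening quote is assumed to have been consumed already.
fn parse_string_with_escapes<I>(characters: &mut I) -> Result<String, String>
where
    I: Iterator<Item = char>,
{
    let mut output = String::new();

    while let Some(character) = characters.next() {
        match character {
            // An unescaped quote ends the string.
            '"' => return Ok(output),
            // A backslash changes the meaning of the next character.
            '\\' => match characters.next() {
                Some('"') => output.push('"'),
                Some('\\') => output.push('\\'),
                Some('/') => output.push('/'),
                Some('n') => output.push('\n'),
                Some('t') => output.push('\t'),
                Some('r') => output.push('\r'),
                Some(other) => return Err(format!("unsupported escape: \\{other}")),
                None => return Err("input ended inside an escape".to_string()),
            },
            other => output.push(other),
        }
    }

    Err("input ended before the closing quote".to_string())
}
```

The key difference from parse_string above is that a quote preceded by a backslash no longer terminates the string.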

That takes care of parsing the string. We can move on to a more interesting value type next.

How to parse number tokens

Numbers in the JSON spec have a lot of variation. They can be positive or negative, and integers or decimals. They can also be written in scientific notation (for example, with a negative exponent as in 3.7e-5, or a positive exponent as in 3.7e5). We need to parse all of these variations.

As always, we'll start with the easy bit. If we encounter any character that can start a number, we need to delegate parsing to a parse_number function. A valid number can only start with either a digit or a negative sign. It cannot begin with a decimal point or with the exponent marker (e or E, referred to as the epsilon character in the code below), which makes things easier for us.

// src/token.rs

impl<T> JsonTokenizer<T>
    where
        T: Read + Seek,
{
    // ...

    pub fn tokenize_json(&mut self) -> Result<&[Token], ()> {
        while let Some(character) = self.iterator.peek() {
            match *character {
                // ...

                '-' | '0'..='9' => {
                    let number = self.parse_number()?;
                    self.tokens.push(Token::Number(number));
                }

                // ...
            }
        }

        Ok(&self.tokens)
    }

    // ...
}

Next, we'll implement the parse_number method:

// src/token.rs

impl<T> JsonTokenizer<T>
    where
        T: Read + Seek,
{
    // ...

    fn parse_number(&mut self) -> Result<Number, ()> {
        // Store parsed number characters.
        let mut number_characters = Vec::<char>::new();

        // Stores whether the digit being parsed is after a `.` character
        // making it a decimal.
        let mut is_decimal = false;

        // Stores the characters after an epsilon character `e` or `E`
        // to indicate the exponential value.
        let mut epsilon_characters = Vec::<char>::new();

        // Stores whether the digit being parsed is part of the epsilon
        // characters.
        let mut is_epsilon_characters = false;

        while let Some(character) = self.iterator.peek() {
            match character {
                // Match the negative sign character that indicates whether number is negative
                '-' => {
                    if is_epsilon_characters {
                        // If it's parsing epsilon characters, push it to the epsilon
                        // character set.
                        epsilon_characters.push('-');
                    } else {
                        // Otherwise, push it to normal character set.
                        number_characters.push('-');
                    }

                    // Advance the iterator by 1.
                    let _ = self.iterator.next();
                }
                // Match a positive sign, which can be treated as redundant and ignored since
                // positive is the default.
                '+' => {
                    // Advance the iterator by 1.
                    let _ = self.iterator.next();
                }
                // Match any digit between 0 and 9, and store it into the `digit`
                // variable.
                digit @ '0'..='9' => {
                    if is_epsilon_characters {
                        // If it's parsing epsilon characters, push it to the epsilon
                        // character set.
                        epsilon_characters.push(*digit);
                    } else {
                        // Otherwise, push it to normal character set.
                        number_characters.push(*digit);
                    }
                    // Advance the iterator by 1.
                    let _ = self.iterator.next();
                }
                // Match the period character which indicates start of the fractional
                // part of a decimal number.
                '.' => {
                    // Push the decimal character to numbers character set.
                    number_characters.push('.');

                    // Set the current state of number being decimal to true.
                    is_decimal = true;

                    // Advance the iterator by 1.
                    let _ = self.iterator.next();
                }
                // Match any of the characters that can signify end of the number
                // literal value. This can be a comma which separates key-value pair,
                // closing object character, closing array character, or a `:` which
                // separates a key from its value.
                '}' | ',' | ']' | ':' => {
                    break;
                }
                // Match the epsilon character which indicates that the number is in
                // scientific notation.
                'e' | 'E' => {
                    // Panic if it's already parsing an exponential number since this would
                    // mean there are 2 epsilon characters which is invalid.
                    if is_epsilon_characters {
                        panic!("Unexpected character while parsing number: {character}. Double epsilon characters encountered");
                    }

                    // Set the current state of number being in scientific notation to true.
                    is_epsilon_characters = true;

                    // Advance the iterator by 1.
                    let _ = self.iterator.next();
                }
                // Panic if any other character is encountered.
                other => {
                    if !other.is_ascii_whitespace() {
                        panic!("Unexpected character while parsing number: {character}")
                    } else {
                        self.iterator.next();
                    }
                },
            }
        }

        if is_epsilon_characters {
            // if the number is an exponential, perform the calculations to convert it
            // to a floating point number in rust.

            // Parse base as floating point number.
            let base: f64 = String::from_iter(number_characters).parse().unwrap();

            // Parse exponential as floating point number.
            let exponential: f64 = String::from_iter(epsilon_characters).parse().unwrap();

            // Return the final computed decimal number.
            Ok(Number::F64(base * 10_f64.powf(exponential)))
        } else if is_decimal {
            // if the number is a decimal, parse it as a floating point number in rust.
            Ok(Number::F64(
                String::from_iter(number_characters).parse::<f64>().unwrap(),
            ))
        } else {
            // Parse the number as an integer in rust.
            Ok(Number::I64(
                String::from_iter(number_characters).parse::<i64>().unwrap(),
            ))
        }
    }
}

I recommend reading through the code and its comments to understand this function. It doesn't use any syntax that hasn't either been covered already or been listed as a prerequisite.
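For comparison, Rust's built-in f64 parser already understands scientific notation, so once the characters of a number have been collected, the final conversion can be sketched more compactly. This is an alternative to the manual base-times-power-of-ten calculation above, not the article's method, and the name `parse_number_text` is hypothetical. Results for exponent notation may differ in the last digits from the manual calculation because of floating-point rounding.

```rust
#[derive(Debug, Copy, Clone, PartialEq)]
enum Number {
    I64(i64),
    F64(f64),
}

/// Hypothetical helper: converts the collected text of a JSON number
/// into a `Number`, choosing the variant from the characters present.
fn parse_number_text(text: &str) -> Option<Number> {
    // A decimal point or an exponent marker means the number is a float.
    let is_float = text.contains(|character: char| matches!(character, '.' | 'e' | 'E'));

    if is_float {
        // `str::parse::<f64>` accepts forms such as "3.7e-5" and "1E3".
        text.parse::<f64>().ok().map(Number::F64)
    } else {
        text.parse::<i64>().ok().map(Number::I64)
    }
}
```

Returning None for malformed input also avoids the unwrap calls, which would otherwise panic on text such as 1-2.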

How to parse boolean tokens

Parsing booleans is going to be the simplest one we've looked at so far. All we need to do is match t or f as the first character, and then check the next few characters to ensure they form the literal value true or false.

// src/token.rs

impl<T> JsonTokenizer<T>
    where
        T: Read + Seek,
{
    // ...

    pub fn tokenize_json(&mut self) -> Result<&[Token], ()> {
        while let Some(character) = self.iterator.peek() {
            match *character {
                // ...

                // Match `t` character which indicates beginning of a boolean literal.
                't' => {
                    // Advance iterator by 1.
                    let _ = self.iterator.next();

                    // Assert next character is `r` while advancing the iterator by 1.
                    assert_eq!(Some('r'), self.iterator.next());
                    // Assert next character is `u` while advancing the iterator by 1.
                    assert_eq!(Some('u'), self.iterator.next());
                    // Assert next character is `e` while advancing the iterator by 1.
                    assert_eq!(Some('e'), self.iterator.next());

                    // Push the literal value to token list.
                    self.tokens.push(Token::Boolean(true));
                }
                'f' => {
                    // Advance iterator by 1.
                    let _ = self.iterator.next();

                    // Assert next character is `a` while advancing the iterator by 1.
                    assert_eq!(Some('a'), self.iterator.next());
                    // Assert next character is `l` while advancing the iterator by 1.
                    assert_eq!(Some('l'), self.iterator.next());
                    // Assert next character is `s` while advancing the iterator by 1.
                    assert_eq!(Some('s'), self.iterator.next());
                    // Assert next character is `e` while advancing the iterator by 1.
                    assert_eq!(Some('e'), self.iterator.next());

                    // Push the literal value to token list.
                    self.tokens.push(Token::Boolean(false));
                }

                // ...
            }
        }

        Ok(&self.tokens)
    }
}
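The repeated assert_eq! calls follow one pattern: consume the next character and compare it with the expected one. As a sketch of how that pattern could be factored out, here is a hypothetical helper, `expect_literal`, that checks a whole literal and returns an error instead of panicking. It is an illustration, not part of the tokenizer above:

```rust
use std::iter::Peekable;

/// Hypothetical helper: consumes the characters of `literal` from the
/// iterator, returning an error message on the first mismatch.
fn expect_literal<I>(iterator: &mut Peekable<I>, literal: &str) -> Result<(), String>
where
    I: Iterator<Item = char>,
{
    for expected in literal.chars() {
        match iterator.next() {
            // The character matches, so keep going.
            Some(found) if found == expected => {}
            Some(found) => {
                return Err(format!("expected '{expected}', found '{found}'"));
            }
            None => {
                return Err(format!("expected '{expected}', found end of input"));
            }
        }
    }

    Ok(())
}
```

With a helper like this, the t arm could call it with "true" and the f arm with "false", and the same function would also cover the null literal.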

How to parse the null literal

This is very similar to how we parsed booleans in the previous step:

// src/token.rs

impl<T> JsonTokenizer<T>
    where
        T: Read + Seek,
{
    // ...

    pub fn tokenize_json(&mut self) -> Result<&[Token], ()> {
        while let Some(character) = self.iterator.peek() {
            match *character {
                // ...

                'n' => {
                    // Advance iterator by 1.
                    let _ = self.iterator.next();

                    // Assert next character is `u` while advancing the iterator by 1.
                    assert_eq!(Some('u'), self.iterator.next());
                    // Assert next character is `l` while advancing the iterator by 1.
                    assert_eq!(Some('l'), self.iterator.next());
                    // Assert next character is `l` while advancing the iterator by 1.
                    assert_eq!(Some('l'), self.iterator.next());

                    // Push null literal value to output tokens list.
                    self.tokens.push(Token::Null);
                }

                // ...
            }
        }

        Ok(&self.tokens)
    }
}

How to parse delimiters

Parsing delimiters is very simple. All you need to do is to match on them, and push the respective token into the output token list:

// src/token.rs

impl<T> JsonTokenizer<T>
    where
        T: Read + Seek,
{
    // ...

    pub fn tokenize_json(&mut self) -> Result<&[Token], ()> {
        while let Some(character) = self.iterator.peek() {
            match *character {
                // ...

                '{' => {
                    self.tokens.push(Token::CurlyOpen);
                    let _ = self.iterator.next();
                }
                '}' => {
                    self.tokens.push(Token::CurlyClose);
                    let _ = self.iterator.next();
                }
                '[' => {
                    self.tokens.push(Token::ArrayOpen);
                    let _ = self.iterator.next();
                }
                ']' => {
                    self.tokens.push(Token::ArrayClose);
                    let _ = self.iterator.next();
                }
                ',' => {
                    self.tokens.push(Token::Comma);
                    let _ = self.iterator.next();
                }
                ':' => {
                    self.tokens.push(Token::Colon);
                    let _ = self.iterator.next();
                }

                // ...
            }
        }

        Ok(&self.tokens)
    }
}
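If you'd like to experiment with delimiter matching on its own, here is a minimal standalone sketch. It assumes a trimmed-down, hypothetical Token enum and a hypothetical tokenize_delimiters function that reads from a &str rather than the tokenizer's reader. Instead of pushing and advancing inside every match arm, it maps each character to an optional token and advances once per iteration:

```rust
// Hypothetical, trimmed-down token type used only for this sketch.
#[derive(Debug, PartialEq)]
enum Token {
    CurlyOpen,
    CurlyClose,
    ArrayOpen,
    ArrayClose,
    Comma,
    Colon,
}

/// Collects only the delimiter tokens from `input` and skips every other character.
/// Note: this sketch does not know about strings, so a delimiter inside a
/// JSON string literal would be reported as well.
fn tokenize_delimiters(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut iterator = input.chars().peekable();

    // Peek at the next character first, then advance explicitly.
    while let Some(&character) = iterator.peek() {
        let token = match character {
            '{' => Some(Token::CurlyOpen),
            '}' => Some(Token::CurlyClose),
            '[' => Some(Token::ArrayOpen),
            ']' => Some(Token::ArrayClose),
            ',' => Some(Token::Comma),
            ':' => Some(Token::Colon),
            _ => None,
        };

        if let Some(token) = token {
            tokens.push(token);
        }

        // Advance the iterator by 1, whether or not a token was produced.
        let _ = iterator.next();
    }

    tokens
}
```

Whether you advance in each arm or once at the end of the loop body is a matter of taste here, since every delimiter is exactly one character long.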

How to parse a terminating character

The input can sometimes contain \0 as the last character to indicate that the input has ended. When dealing with files, the end of input is more commonly known as EOF (End Of File). The \0 character itself is called the null character, and \0 is the escape sequence used to write it.

We need to handle it and break out of our parsing loop if we ever encounter this:

// src/token.rs

impl<T> JsonTokenizer<T>
    where
        T: Read + Seek,
{
    // ...

    pub fn tokenize_json(&mut self) -> Result<&[Token], ()> {
        while let Some(character) = self.iterator.peek() {
            match *character {
                // ...

                '\0' => break,
                other => {
                    if !other.is_ascii_whitespace() {
                        panic!("Unexpected token encountered: {other}")
                    } else {
                        self.iterator.next();
                    }
                },

                // ...
            }
        }

        Ok(&self.tokens)
    }
}
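The same loop shape can be tried in isolation. The following is only a sketch under a few assumptions: consume_until_end is a hypothetical helper that reads from a &str, and it reports an unexpected character through Err instead of panicking, which is one possible alternative to the panic! call above:

```rust
/// Skips ASCII whitespace until the input ends or a `\0` is found.
/// Returns how many whitespace characters were skipped, or the first
/// unexpected character as an error.
fn consume_until_end(input: &str) -> Result<usize, char> {
    let mut iterator = input.chars().peekable();
    let mut skipped = 0;

    while let Some(&character) = iterator.peek() {
        match character {
            // The null character marks the end of the input: stop looping.
            '\0' => break,
            // Whitespace carries no meaning, so advance past it.
            other if other.is_ascii_whitespace() => {
                let _ = iterator.next();
                skipped += 1;
            }
            // Anything else is unexpected at this point.
            other => return Err(other),
        }
    }

    Ok(skipped)
}
```

Calling consume_until_end(" \n\t\0ignored") stops at the null character after skipping three whitespace characters, while consume_until_end("  x") returns Err('x').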

How to Build a JSON Parser – Stage 4: From Tokens To Value

Now that you have all the tokens, it's time to move on to the final stage of the process, converting tokens to real values that you can work with in the Rust code.

Start by creating a unit struct, which can be used as the parser. At this stage, we don't need to hold any state for the entirety of the process:

// src/parser.rs

use std::collections::HashMap;
use std::fs::File;
use std::io::{BufReader, Cursor};
use std::iter::Peekable;
use std::slice::Iter;
use crate::token::{JsonTokenizer, Token};
use crate::value::Value;

/// Main parser which is the entrypoint for parsing JSON.
pub struct JsonParser;

We are also going to use this as the public interface for the parser. So let's start by implementing those methods first:

// src/parser.rs

impl JsonParser {
    /// Create a new [`JsonParser`] that parses JSON from bytes.
    pub fn parse_from_bytes<'a>(input: &'a [u8]) -> Result<Value, ()> {
        let mut json_tokenizer = JsonTokenizer::<BufReader<Cursor<&[u8]>>>::from_bytes(input);
        let tokens = json_tokenizer.tokenize_json()?;

        Ok(Self::tokens_to_value(tokens))
    }

    /// Create a new [`JsonParser`] that parses JSON from file.
    pub fn parse(reader: File) -> Result<Value, ()> {
        let mut json_tokenizer = JsonTokenizer::<BufReader<File>>::new(reader);
        let tokens = json_tokenizer.tokenize_json()?;

        Ok(Self::tokens_to_value(tokens))
    }
}

With that out of the way, you first need to implement the tokens_to_value method that these public methods are calling.

How to parse primitives

This method will be responsible for taking an iterator of tokens as input and outputting the Value type you defined previously. This is also pretty straightforward, since the object/array parsing is delegated to separate methods, which we'll look at shortly.

// src/parser.rs

impl JsonParser {
    // ...

    fn tokens_to_value(tokens: &[Token]) -> Value {
        // Create a peekable iterator over tokens
        let mut iterator = tokens.iter().peekable();

        // Initialize final value to null.
        let mut value = Value::Null;

        // Loop while there are tokens in the iterator.
        // Note that you do not need to manually handle advancing the
        // iterator in this case which is why you can directly call
        // `iterator.next()`.
        while let Some(token) = iterator.next() {
            match token {
                Token::CurlyOpen => {
                    value = Value::Object(Self::process_object(&mut iterator));
                }
                Token::String(string) => {
                    value = Value::String(string.clone());
                }
                Token::Number(number) => {
                    value = Value::Number(*number);
                }
                Token::ArrayOpen => {
                    value = Value::Array(Self::process_array(&mut iterator));
                }
                Token::Boolean(boolean) => value = Value::Boolean(*boolean),
                Token::Null => value = Value::Null,
                // Ignore all delimiters as you don't need to explicitly do anything
                // when you encounter them.
                Token::Comma
                | Token::CurlyClose
                | Token::Quotes
                | Token::Colon
                | Token::ArrayClose => {}
            }
        }

        value
    }
}

How to parse arrays

Parsing arrays is almost as straightforward as the parsing logic we looked at above. Since arrays are just collections of other JSON values, there's not much logic involved in parsing them, unlike objects.

// src/parser.rs

impl JsonParser {
    fn process_array(iterator: &mut Peekable<Iter<Token>>) -> Vec<Value> {
        // Initialise a vector of JSON Value type to hold the value of
        // array that's currently being parsed.
        let mut internal_value = Vec::<Value>::new();

        // Iterate over all tokens provided.
        while let Some(token) = iterator.next() {
            match token {
                Token::CurlyOpen => {
                    internal_value.push(Value::Object(Self::process_object(iterator)));
                }
                Token::String(string) => internal_value.push(Value::String(string.clone())),
                Token::Number(number) => internal_value.push(Value::Number(*number)),
                Token::ArrayOpen => {
                    internal_value.push(Value::Array(Self::process_array(iterator)));
                }
                // Break loop if array is closed. Due to recursive nature of process_array,
                // we don't need to explicitly check if the closing token matches the opening
                // one.
                Token::ArrayClose => {
                    break;
                }
                Token::Boolean(boolean) => internal_value.push(Value::Boolean(*boolean)),
                Token::Null => internal_value.push(Value::Null),
                // Ignore delimiters
                Token::Comma | Token::CurlyClose | Token::Quotes | Token::Colon => {}
            }
        }

        internal_value
    }
}
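To see the recursion at work without the rest of the parser, here is a reduced sketch. The Token and Value enums below are hypothetical, stripped-down stand-ins that only know about numbers and arrays, and parse_array is a helper invented for this example:

```rust
use std::iter::Peekable;
use std::slice::Iter;

// Hypothetical, reduced token and value types for this sketch only.
#[derive(Debug, PartialEq)]
enum Token {
    ArrayOpen,
    ArrayClose,
    Comma,
    Number(f64),
}

#[derive(Debug, PartialEq)]
enum Value {
    Number(f64),
    Array(Vec<Value>),
}

/// Parses the contents of an array whose opening bracket was already consumed.
fn process_array(iterator: &mut Peekable<Iter<Token>>) -> Vec<Value> {
    let mut values = Vec::new();

    while let Some(token) = iterator.next() {
        match token {
            Token::Number(number) => values.push(Value::Number(*number)),
            // A nested array: the recursive call consumes tokens up to and
            // including the matching closing bracket.
            Token::ArrayOpen => values.push(Value::Array(process_array(iterator))),
            // The array being parsed at this level has ended.
            Token::ArrayClose => break,
            Token::Comma => {}
        }
    }

    values
}

/// Consumes the leading `ArrayOpen` token and parses the rest.
fn parse_array(tokens: &[Token]) -> Vec<Value> {
    let mut iterator = tokens.iter().peekable();
    let _ = iterator.next();
    process_array(&mut iterator)
}
```

Because every recursive call shares the same iterator, the outer call simply continues with whatever token follows the nested array's closing bracket.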

How to parse objects

Parsing objects is a bit trickier than parsing the previous value types, since objects come with their own key-value syntax. There should be no surprises for you, though, so I encourage you to read through the code and the comments below to understand how it works.

// src/parser.rs

impl JsonParser {
    fn process_object(iterator: &mut Peekable<Iter<Token>>) -> HashMap<String, Value> {
        // Whether the item being parsed is a key or a value. The first element
        // should always be a key so this is initialised to true.
        let mut is_key = true;

        // The current key for which the value is being parsed.
        let mut current_key: Option<&str> = None;

        // The current state of parsed object.
        let mut value = HashMap::<String, Value>::new();

        while let Some(token) = iterator.next() {
            match token {
                // If it is a nested object, recursively parse it and store
                // in the hashmap with current key.
                Token::CurlyOpen => {
                    if let Some(current_key) = current_key {
                        value.insert(
                            current_key.to_string(),
                            Value::Object(Self::process_object(iterator)),
                        );
                    }
                }
                // If this token is encountered, break the loop since it
                // indicates end of an object being parsed.
                Token::CurlyClose => {
                    break;
                }
                Token::Quotes | Token::ArrayClose => {}
                // If the token is a colon, it is the separator between key
                // and value pair. So the item being parsed from this point
                // ahead will not be a key.
                Token::Colon => {
                    is_key = false;
                }
                Token::String(string) => {
                    if is_key {
                        // If the process is presently parsing key, set the value
                        // as current key.
                        current_key = Some(string);
                    } else if let Some(key) = current_key {
                        // If the process already has a key set for present item,
                        // parse string as value instead, and set the current_key to none
                        // once done to prepare for the next key-value pair.
                        value.insert(key.to_string(), Value::String(string.clone()));
                        // Set current_key to None to prepare for the next key-value pair.
                        current_key = None;
                    }
                }
                Token::Number(number) => {
                    if let Some(key) = current_key {
                        value.insert(key.to_string(), Value::Number(*number));
                        // Set current_key to None to prepare for the next key-value pair.
                        current_key = None;
                    }
                }
                Token::ArrayOpen => {
                    if let Some(key) = current_key {
                        value.insert(key.to_string(), Value::Array(Self::process_array(iterator)));
                        // Set current_key to None to prepare for the next key-value pair.
                        current_key = None;
                    }
                }
                // If the token is a comma, it is the separator between multiple key-value pairs
                // in JSON. So the item being parsed from this point ahead will be a key.
                Token::Comma => is_key = true,
                Token::Boolean(boolean) => {
                    if let Some(key) = current_key {
                        value.insert(key.to_string(), Value::Boolean(*boolean));
                        // Set current_key to None to prepare for the next key-value pair.
                        current_key = None;
                    }
                }
                Token::Null => {
                    if let Some(key) = current_key {
                        value.insert(key.to_string(), Value::Null);
                        // Set current_key to None to prepare for the next key-value pair.
                        current_key = None;
                    }
                }
            }
        }

        value
    }
}
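The key/value bookkeeping is the part that's easiest to get lost in, so here is a reduced sketch of just that idea. It assumes a hypothetical Token enum with only the variants a flat, string-only object needs, and a hypothetical process_flat_object function that walks a slice directly:

```rust
use std::collections::HashMap;

// Hypothetical, reduced token type: just enough for a flat object of strings.
#[derive(Debug)]
enum Token {
    CurlyClose,
    Colon,
    Comma,
    String(String),
}

/// Builds a map from the tokens that follow an opening curly brace.
fn process_flat_object(tokens: &[Token]) -> HashMap<String, String> {
    // The first string in an object is always a key.
    let mut is_key = true;
    // The key waiting for its value, if any.
    let mut current_key: Option<&str> = None;
    let mut object = HashMap::new();

    for token in tokens {
        match token {
            // End of the object.
            Token::CurlyClose => break,
            // After a colon, the next string is a value.
            Token::Colon => is_key = false,
            // After a comma, the next string is a key again.
            Token::Comma => is_key = true,
            Token::String(string) => {
                if is_key {
                    current_key = Some(string.as_str());
                } else if let Some(key) = current_key.take() {
                    // `take` leaves `None` behind, ready for the next pair.
                    object.insert(key.to_string(), string.clone());
                }
            }
        }
    }

    object
}
```

Using Option::take here is a small variation on the listing above, which resets current_key to None by hand after each insert.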

And that's it. You should now have everything to start using this to parse a valid JSON file into Rust.

How to Use the JSON parser

Let's create a new example in the project to run our JSON parser:

mkdir examples; touch examples/json.rs

You also need to register it as an example in the Cargo.toml file:

[package]
name = "json-parser"
version = "0.1.0"
edition = "2021"

[dependencies]

[[example]]
path = "examples/json.rs"
name = "json"

Now let's write the code to run for this example. We start by copying over a sample JSON file to the root of the project, which you can find here.

// examples/json.rs

use std::fs::File;
use json_parser::parser::JsonParser;

fn main() {
    let file = File::open("test.json").unwrap();
    let parser = JsonParser::parse(file).unwrap();

    dbg!(parser);
}

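Run this code using the following command, and you should see output similar to the one below (the order of keys within each object may differ, because HashMap does not guarantee iteration order):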

cargo run --example json --release

[examples/json.rs:8:5] parser = Object(
    {
        "pairs": Array(
            [
                Object(
                    {
                        "x1": Number(
                            F64(
                                41.844453001935875,
                            ),
                        ),
                        "y0": Number(
                            F64(
                                -33.78221816487377,
                            ),
                        ),
                        "y1": Number(
                            F64(
                                -78.10213222087448,
                            ),
                        ),
                        "x0": Number(
                            F64(
                                95.26235434764715,
                            ),
                        ),
                    },
                ),
                Object(
                    {
                        "x0": Number(
                            F64(
                                115.42029308864215,
                            ),
                        ),
                        "y0": Number(
                            F64(
                                1.2002187300000001e-5,
                            ),
                        ),
                        "x1": Number(
                            F64(
                                83.39640643072113,
                            ),
                        ),
                        "y1": Number(
                            F64(
                                28.643090267505812,
                            ),
                        ),
                    },
                ),
                Object(
                    {
                        "isWorking": Boolean(
                            true,
                        ),
                        "sample": String(
                            "string sample",
                        ),
                        "nullable": Null,
                        "isNotWorking": Boolean(
                            false,
                        ),
                    },
                ),
            ],
        ),
        "utf8": Object(
            {
                "key2": String(
                    "value2",
                ),
                "key1": String(
                    "ࠄࠀࠆࠄࠀࠁࠃ",
                ),
            },
        ),
    },
)

Congratulations! You've now written your very own JSON parser, while learning some of the advanced use cases of match and iterators in Rust.

Wrapping Up

I hope you can already see interesting ways you can make use of what you've learned today to optimize existing Rust code in your projects, and any future code you write that involves these.

You can find the complete code for everything we looked at in this article in this repository.

Also, feel free to contact me if you have any questions or opinions on this topic.

Enjoying my work?

Consider buying me a coffee to support my work!

☕Buy me a coffee.

'Till next time, happy coding and wishing you clear skies!

Procedural Macros in Rust – A Handbook for Beginners

Anshul Sanghi — Wed, 24 Apr 2024 17:49:17 +0000

In this handbook, you'll learn about procedural macros in Rust, and what purposes they serve. You'll also learn how to write your own procedural macros with both hypothetical and real-world examples.

This guide assumes that you're familiar with Rust and its basic concepts, such as data-types, iterators, and traits. If you need to establish or review your Rust basics, check out this interactive course.

You don't need any prior knowledge of macros, as this article covers everything from the ground up.

What are Macros in Rust?
1. Types of Macros in Rust
2. Types of Procedural Macros
Prerequisites
1. Helpful Dependencies
How to Write a Simple Derive Macro
A More Elaborate Derive macro
A Simple Attribute Macro
A More Elaborate Attribute macro
A Simple Function-like Macro
A More Elaborate Function-like Macro
Beyond Writing Macros
1. Helpful Crates/Tools
Downsides of Macros
Wrapping Up
1. Enjoying my work?

What are Macros in Rust?

Macros are an integral part of the Rust programming language. It doesn’t take long before you start encountering them when first learning the language.

In their simplest form, macros in Rust allow you to execute some code at compile-time. Rust places very few restrictions on what that code can do. The most common use-case of this feature is writing code that generates other code.

Macros are a way to extend functionality of the compiler beyond what's supported as standard. Whether you want to generate code based on existing code, or you want to transform existing code in some form, macros are your go-to tool.

Here's how the official Rust book describes it:

The term macro refers to a family of features in Rust.

Fundamentally, macros are a way of writing code that writes other code, which is known as metaprogramming.

Metaprogramming is useful for reducing the amount of code you have to write and maintain, which is also one of the roles of functions. However, macros have some additional powers that functions don’t.

Using macros, you can also add things that must exist at compilation time, which is not possible using functions since they get called at runtime. One such feature, for example, is implementing traits on types, which has to happen at compilation time.

Another advantage of macros is that they can be very flexible, since they can take a variable number of parameters or inputs, unlike a function.
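As a quick illustration of that flexibility, here is a small sketch using a declarative macro (the simpler kind of macro, shown only because it fits in a few lines). The sum! macro and the totals function are hypothetical names made up for this example:

```rust
// A hypothetical macro that accepts zero, one, or many expressions.
macro_rules! sum {
    // No arguments at all.
    () => { 0 };
    // One expression, followed by any number of comma-separated expressions.
    ($first:expr $(, $rest:expr)*) => { $first $(+ $rest)* };
}

/// Calls the same macro with three different argument counts,
/// which a single ordinary function signature could not accept.
fn totals() -> (i32, i32, i32) {
    (sum!(), sum!(10), sum!(1, 2, 3))
}
```

A regular function would need a fixed parameter list (or a slice) to do the same, while the macro call adapts to however many arguments you write.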

Macros do have their own syntax for both writing and using them, which we'll explore in detail in the coming sections.

Some examples of how macros are being used really help convey just how powerful they are:

The SQLx project uses macros to verify all your SQL queries and statements (as long as you created them using the provided macro) at compile-time by actually executing them against a running instance of DB (yes, at compile time).
typed_html implements a complete HTML parser with compile-time validation, all while using the familiar JSX syntax.

Types of Macros in Rust

In Rust, there are 2 different types of macros: declarative and procedural.

Declarative macros

Declarative macros work based on syntax parsing. While the official docs define them as allowing you to write syntax extensions, I believe it's more intuitive to consider them as an advanced version of the match keyword for the compiler.

You can define one or more patterns to match, and their body should return the output Rust code you'd like the macro to produce.
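To give you a feel for the "match for the compiler" comparison, here is a minimal sketch. The describe! macro is a hypothetical example, not something from the standard library; each arm is a pattern, and its body is the code the macro call is replaced with:

```rust
// A hypothetical declarative macro with three patterns.
macro_rules! describe {
    // Matches a call with no input.
    () => {
        String::from("nothing")
    };
    // Matches a single expression.
    ($single:expr) => {
        format!("one value: {}", $single)
    };
    // Matches custom syntax: two expressions separated by `=>`.
    ($key:expr => $value:expr) => {
        format!("pair: {} -> {}", $key, $value)
    };
}

/// Exercises each pattern once.
fn describe_examples() -> Vec<String> {
    vec![describe!(), describe!(7), describe!("a" => 1)]
}
```

The arms are tried from top to bottom, much like the arms of a match expression, and the first one whose pattern fits the input wins.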

We're not going to be talking about them in this article, but if you'd like to learn more, this is a good place to start.

Procedural macros

These macros, in their most basic use cases, execute any Rust code you want at compile time. The only requirement is that they should take Rust code as input, and return Rust code as output.

There's no special syntax parsing involved for writing these macros (unless you want to do so), which is why I personally find them easier to understand and write.

Procedural macros are further divided into 3 categories: derive macros, attribute macros, and functional macros (the official documentation calls the last category function-like macros).

Types of Procedural Macros

Derive macros

Derive macros are, generally speaking, applied to data types in Rust. They are a way to extend the type declaration to also automatically "derive" functionality for it.

You can use them to generate "derived" types from a type, or as a way to implement methods on the target data type automatically. This should make sense once you look at the example below.

Printing non-primitive data types, such as structs, enums or even errors (which are just structs, but let's assume they're not), for debugging purposes is a very common feature for any language, not just Rust. In Rust, only primitives implicitly have the ability to be printed in "debug" contexts.

If you think about how everything in Rust is just traits (even basic operations like add and equals), this makes sense. You want to be able to debug print your custom data types, but Rust has no way of saying "please apply this trait to every single data type in the code out there, ever".

This is where the Debug derive macro comes in. There's a standard way of debug-printing each type of data structure in Rust that it uses for its internal types. The Debug macro allows you to automatically implement the Debug trait for your custom types, while following the same rules and style guide as the implementation for internal data types.

// Derive macro examples

/// Example for deriving methods on data types
#[derive(Debug)]
pub struct User {
    username: String,
    first_name: String,
    last_name: String,
}

The Debug derive macro will result in the following code (presentational, not exact):

impl core::fmt::Debug for User {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        f.debug_struct(
            "User"
        )
        .field("username", &self.username)
        .field("first_name", &self.first_name)
        .field("last_name", &self.last_name)
        .finish()
    }
}

As you might be able to tell, nobody wants to write this code for all of their custom structs and enums again and again. This simple macro gives you a sense of both the power of macros in Rust, as well as why they're an essential part of the language itself.
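If you'd like to convince yourself that the two are interchangeable, the following sketch puts a derived and a hand-written implementation side by side. The module names derived and manual, the debug_outputs function, and the sample values are all made up for this comparison:

```rust
mod derived {
    // The compiler generates the `Debug` implementation for us.
    #[derive(Debug)]
    pub struct User {
        pub username: String,
        pub first_name: String,
        pub last_name: String,
    }
}

mod manual {
    use std::fmt;

    pub struct User {
        pub username: String,
        pub first_name: String,
        pub last_name: String,
    }

    // The same implementation written by hand.
    impl fmt::Debug for User {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            f.debug_struct("User")
                .field("username", &self.username)
                .field("first_name", &self.first_name)
                .field("last_name", &self.last_name)
                .finish()
        }
    }
}

/// Formats one value of each type with `{:?}`.
fn debug_outputs() -> (String, String) {
    let from_derive = derived::User {
        username: "jdoe".to_string(),
        first_name: "Jane".to_string(),
        last_name: "Doe".to_string(),
    };
    let from_manual = manual::User {
        username: "jdoe".to_string(),
        first_name: "Jane".to_string(),
        last_name: "Doe".to_string(),
    };

    (format!("{:?}", from_derive), format!("{:?}", from_manual))
}
```

Both strings returned by debug_outputs are identical, which is exactly the point: the derive saves you the typing without changing the result.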

During actual compilation, the same code would give the following as the result:

pub struct User {
    username: String,
    first_name: String,
    last_name: String,
}

impl core::fmt::Debug for User {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> ::core::fmt::Result {
        f.debug_struct(
            "User"
        )
        .field("username", &self.username)
        .field("first_name", &self.first_name)
        .field("last_name", &self.last_name)
        .finish()
    }
}

Notice how the original type declaration is preserved in the output code. This is one of the main differences between derive macros and the other kinds. Derive macros preserve the input type without modifications and only add additional code to the output. The other macros do not behave this way: they preserve the target only when the macro's own output includes it.

Attribute macros

Attribute macros, in addition to data types, are usually applied to code blocks such as functions, impl blocks, inline blocks, and so on. They're usually used to either transform the target code in some way, or annotate it with additional information.

The most common use case for these is to modify a function to add additional functionality or logic to it. For example, you can easily write an attribute macro that:

Logs all input and output parameters
Logs the total runtime of the function
Counts the number of times that function is called
Adds pre-determined additional fields to any struct

and so on.

All of the things I mentioned above, and much more, combined form the insanely popular and useful instrumentation macro in Rust provided by the tracing crate. Of course I'm massively simplifying here, but it's good enough as an example.

If you're used to using Clippy, it might have screamed at you a couple of times to add the #[must_use] attribute to your function or method.

That is an example of an attribute used to annotate a function with additional information (strictly speaking, #[must_use] is built into the compiler rather than written as a procedural macro, but you apply it in exactly the same way). It tells the compiler to warn the user if the return value from this function call isn't used. The Result type is already annotated with #[must_use] by default, which is why you see the warning unused Result that must be used when you don't use a return value of Result type.
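Here is a small sketch of what applying #[must_use] to your own function can look like. The checked_divide and use_the_result functions are hypothetical examples. Option, unlike Result, is not marked #[must_use] on its own, so the annotation makes a difference here:

```rust
// The optional message is shown as part of the compiler warning.
#[must_use = "the quotient is the whole point of calling this function"]
fn checked_divide(dividend: i32, divisor: i32) -> Option<i32> {
    // `checked_div` returns `None` instead of panicking on division by zero.
    dividend.checked_div(divisor)
}

/// Uses the return value, so no warning is produced.
fn use_the_result() -> i32 {
    // Writing `checked_divide(10, 2);` as a bare statement instead would
    // trigger the `unused_must_use` warning.
    checked_divide(10, 2).unwrap_or(0)
}
```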

Attributes are also what power conditional compilation in Rust, through the built-in cfg attribute.
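As a brief sketch of conditional compilation, the hypothetical build_profile function below exists in two versions, and the built-in cfg attribute decides which one is compiled. It relies on debug_assertions, a configuration option that is normally on for debug builds and off for release builds:

```rust
// Compiled only when debug assertions are enabled (the default for `cargo build`).
#[cfg(debug_assertions)]
fn build_profile() -> &'static str {
    "debug"
}

// Compiled only when debug assertions are disabled (the default for `--release`).
#[cfg(not(debug_assertions))]
fn build_profile() -> &'static str {
    "release"
}
```

Only one of the two definitions ever reaches the compiler's later stages, which is why they can share a name without conflicting.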

Functional macros

Functional macros are macros disguised as functions. These are the least restrictive type of procedural macros, as they can be used literally anywhere, as long as they output code that's valid in the context that they're used in.

These macros aren't "applied" to anything unlike the 2 others, but rather called just like you'd call a function. As arguments, you can literally pass in anything you want, as long as your macro knows how to parse it. This includes everything all the way from no arguments to valid Rust code to random gibberish that only your macro can make sense of.

They're in a sense the procedural version of declarative macros. If you need to execute Rust code and also be able to parse custom syntax, functional macros are your go-to tool. They're also useful if you need macro-like functionality in places where other macros cannot be used.

After that lengthy description of the basic information regarding macros, it's finally time to dive into actually writing procedural macros.

Prerequisites

There are certain rules around writing your own procedural macros that you'll need to follow. These rules apply to all 3 types of procedural macros. They are:

Procedural macros can only be added to a project that is marked as proc-macro in Cargo.toml.
Projects marked as such cannot export anything other than procedural macros.
The macros themselves have to all be declared in the lib.rs file.

Let’s begin by setting up our project with this code:

cargo new --bin my-app
cd my-app
cargo new --lib my-app-macros

This will create a root project, as well as a sub-project within it that will host our macros. You need some changes in the Cargo.toml files for both these projects.

First, the Cargo.toml file for my-app-macros should have the following contents (notice that you need to declare a lib section that has the proc-macro property):

# my-app/my-app-macros/Cargo.toml

[package]
name = "my-app-macros"
version = "0.1.0"
edition = "2021"

[lib]
name = "my_app_macros"
path = "src/lib.rs"
proc-macro = true

[dependencies]

Next, the Cargo.toml file for my-app should have the following contents:

# my-app/Cargo.toml

workspace = { members = ["my-app-macros"] }

[package]
name = "my-app"
version = "0.1.0"
edition = "2021"
resolver = "2"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
my-app-macros = { path = "./my-app-macros" }

You need to set the dependency resolver version to “2”, and add your macros project as a dependency of the my-app project.

Helpful dependencies

From the compiler’s point of view, this is how macros work:

They take a stream of tokens as input (and optionally a stream of tokens as arguments to the macro itself).
They return a stream of tokens as output.

That’s all that the compiler knows! And as you'll soon see, it's enough for the compiler to know that.

This does create a problem though. You need to be able to interpret this "stream of tokens" correctly, whether as Rust code or custom syntax, modify it, and output it again. Doing so manually is no easy task, and for the purposes of this tutorial, it is out of scope.

We can, however, rely on great open source work done by many developers to ease this for us. You need to add a few dependencies to help with this problem:

syn — A syntax parser for Rust. This helps you to parse the input token stream as Rust AST. AST is a concept that you mostly run into when trying to write your own interpreter or compiler, but a basic understanding is essential for working with macros. Macros, after all, are just extensions that you write for the compiler in a sense. If you’re interested in learning more about what ASTs are, check out this very helpful introduction.
quote — quote is, and this is a huge generalisation, a crate that helps us perform the reverse operation of what syn does. It helps us convert Rust source code into a stream of tokens that we can output from our macro.
proc-macro2 — The standard library provides a proc_macro crate, but the types it provides cannot exist outside of procedural macros. proc-macro2 is a wrapper around proc_macro that makes all of these types usable outside of the context of macros. This, for example, allows both syn and quote to be used not only in procedural macros, but in regular Rust code as well, should you ever have such a need. And we will indeed be using that extensively if we ever want to unit test our macros or their expansions.
darling – It facilitates parsing and working with macro arguments, which is otherwise a tedious process due to having to manually parse it from the syntax tree. darling provides us with serde-like ability to automatically parse input argument tree into our arguments struct. It also helps us in error handling around invalid arguments, required arguments, and so on.

While these projects are contributed to by many developers, I want to give special thanks to David Tolnay. He's a legend in the Rust community and is the creator of most of these projects, and many many more open source crates in Rust.

Let’s quickly add these dependencies to our project and start writing our macro:

// my-app-macros

cargo add syn quote proc-macro2 darling

How to Write a Simple Derive Macro

You are going to learn how to write a Derive macro in this section. By this time, you should already be aware of the different types of macros and what they entail, as we talked about them in the above sections.

The `IntoStringHashMap` Derive Macro

Let's say you have an app where you need to be able to convert structs into hash maps that use the String type for both keys and values. This means that it should work with any struct where all of the fields are convertible to the String type using the Into trait.
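To make the goal concrete, here is a hand-written sketch of the kind of conversion you would otherwise have to write for every struct. The User struct and the user_to_map helper are hypothetical, and the struct only has String fields so that every field converts with String::from:

```rust
use std::collections::HashMap;

// Hypothetical struct for this sketch: every field is already a `String`.
pub struct User {
    username: String,
    first_name: String,
    last_name: String,
}

// The conversion written by hand: one `insert` per field, keyed by the field's name.
impl From<User> for HashMap<String, String> {
    fn from(value: User) -> Self {
        let mut hash_map = HashMap::<String, String>::new();

        hash_map.insert("username".to_string(), String::from(value.username));
        hash_map.insert("first_name".to_string(), String::from(value.first_name));
        hash_map.insert("last_name".to_string(), String::from(value.last_name));

        hash_map
    }
}

/// Small helper that shows the conversion being used through `Into`.
fn user_to_map(user: User) -> HashMap<String, String> {
    user.into()
}
```

Every line inside from follows the same pattern and differs only in the field name, which is what makes this a good candidate for code generation.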

How to Declare a Derive Macro

You declare macros by creating a function, and annotating that function using attribute macros that tell the compiler to consider that function as a macro declaration. Since your lib.rs is empty right now, you also need to declare proc_macro as an extern crate:

// my-app-macros/src/lib.rs
extern crate proc_macro;

use proc_macro::TokenStream;

#[proc_macro_derive(IntoStringHashMap)]
pub fn derive_into_hash_map(item: TokenStream) -> TokenStream {
    todo!()
}

All we’re doing here is declaring our macro as a derive macro with the identifier IntoStringHashMap. Note that the function name is not important here. What's important is the identifier passed to the proc_macro_derive attribute macro.

Let's immediately see how you can use this – we'll come back and finish the implementation later:

// my-app/src/main.rs

use my_app_macros::IntoStringHashMap;

#[derive(IntoStringHashMap)]
pub struct User {
    username: String,
    first_name: String,
    last_name: String,
    age: u32,
}

fn main() {

}

You can just use your macro as any other derive macro, using the identifier you declared for it (in this case it was IntoStringHashMap).

If you try and compile your code at this stage, you should see the following compilation error:

   Compiling my-app v0.1.0 

error: proc-macro derive panicked
 --> src/main.rs:3:10
  |
3 | #[derive(IntoStringHashMap)]
  |          ^^^^^^^^^^^^^^^^^
  |
  = help: message: not yet implemented

error: could not compile `my-app` (bin "my-app") due to 1 previous error

This shows that our macro was executed during compilation. In case you're not familiar with it, the todo!() macro panics with the message not yet implemented when executed, which is exactly what the help: message line in the error reports.

This means that both our macro declaration and its usage work. We can move on to actually implementing this macro now.
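If you want to see the todo!() behaviour outside of a macro, the sketch below triggers it in ordinary code and captures the panic message. The unfinished and panic_message_of_todo functions are hypothetical names for this example, and it assumes the default unwinding panic strategy so that catch_unwind can intercept the panic:

```rust
use std::panic;

/// A function whose body has not been written yet.
fn unfinished() -> u32 {
    todo!()
}

/// Runs `unfinished` and returns the message it panicked with.
fn panic_message_of_todo() -> String {
    // `catch_unwind` turns the panic into an `Err` that carries the payload.
    match panic::catch_unwind(unfinished) {
        Ok(_) => String::from("no panic"),
        Err(payload) => {
            // Panic payloads are usually a `&str` or a `String`.
            if let Some(message) = payload.downcast_ref::<&str>() {
                message.to_string()
            } else if let Some(message) = payload.downcast_ref::<String>() {
                message.clone()
            } else {
                String::from("unknown panic payload")
            }
        }
    }
}
```

The only difference in the macro case is timing: there, the panic happens while the compiler is running your macro, so it surfaces as a compilation error rather than a runtime one.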

How to Parse the Macro's Input

First, you parse the input token stream as a DeriveInput using syn, which is a representation of any target that you can use a derive macro with:

let input = syn::parse_macro_input!(item as syn::DeriveInput);

syn provides us with the parse_macro_input macro that uses a somewhat custom syntax as its arguments. You provide it the name of your input variable, the as keyword, and the data type in syn that it should parse the input token stream as (in our case, a DeriveInput).

If you jump into the source code for DeriveInput, you'll see that it gives us the following information:

attrs: Attributes applied to this type, whether other attribute macros declared by us, or the built-in ones such as must_use.
vis: The visibility specifier for this type declaration.
ident: The identifier (name) of the type.
generics: Information about the generic parameters this type takes, including lifetimes.
data: An enum that describes whether the target is a struct, an enum, or a union, and also provides us with more information for each of these.

These field names and their types (apart from the data field) are pretty standard across targets supported by syn, such as functions, enums, and so on.

If you further jump into the declaration of the Data enum, and into DataStruct in particular, you'll see that it provides you with a field called fields. This is a collection of all the fields of this struct and you can use it to iterate over them. This is exactly what we need to build our hash map!

The complete implementation for this macro looks like this:

// my-app/my-app-macros/src/lib.rs

extern crate proc_macro2;

use proc_macro::TokenStream;
use quote::quote;
use syn::Data;

#[proc_macro_derive(IntoStringHashMap)]
pub fn into_hash_map(item: TokenStream) -> TokenStream {
    let input = syn::parse_macro_input!(item as syn::DeriveInput);

    let struct_identifier = &input.ident;

    match &input.data {
        Data::Struct(syn::DataStruct { fields, .. }) => {
            let mut implementation = quote!{
                let mut hash_map = std::collections::HashMap::<String, String>::new();
            };

            for field in fields {
                let identifier = field.ident.as_ref().unwrap();
                implementation.extend(quote!{
                    hash_map.insert(stringify!(#identifier).to_string(), String::from(value.#identifier));
                });
            }

            quote! {
                #[automatically_derived]
                impl From<#struct_identifier> for std::collections::HashMap<String, String> {
                    fn from(value: #struct_identifier) -> Self {
                        #implementation

                        hash_map
                    }
                }
            }
        }
        _ => unimplemented!()
    }.into()
}

There's a lot going on here, so let's break it down:

How to Ensure a `struct` Target for the Macro

let struct_identifier = &input.ident;: You store the struct identifier into a separate variable, so that you can easily use it later.

match &input.data {
    Data::Struct(syn::DataStruct { fields, .. }) => { ... },
    _ => unimplemented!()
}

You match over the parsed data field from DeriveInput. If it is the Data::Struct variant (which wraps a DataStruct, that is, a Rust struct), you continue. Otherwise you panic, as the macro isn't implemented for other kinds of types.

How to Build the Output Code

Let's take a look at the match arm implementation when you do have a DataStruct:

let mut implementation = quote!{
    let mut hash_map = std::collections::HashMap::<String, String>::new();
};

Here you create a new TokenStream using quote. This TokenStream is the one from the proc_macro2 crate, not the proc_macro::TokenStream that the compiler hands to your macro, so don't confuse the two. The variable needs to be mutable, as we'll be adding more code to this TokenStream soon.

TokenStream is basically the inverse representation of an AST. You provide actual Rust code to the quote macro, and it gives you the "stream of tokens", as we've called it previously, for that source code.

This TokenStream can either be converted to the macro's output type, or be manipulated further using methods such as extend.

Moving on,

for field in fields {
    let identifier = field.ident.as_ref().unwrap();
    implementation.extend(quote!{
        hash_map.insert(
            stringify!(#identifier).to_string(),
            String::from(value.#identifier)
        );
    });
}

You loop over all of the fields. In each iteration, you first create a variable identifier to hold the name of the field for later use. You then use the extend method on our previously created TokenStream to add additional code to it.

The extend method just takes another TokenStream, which can easily be generated using the quote macro. For the extension, you simply write code to insert a new entry into the hash_map that will be created in the macro output.

Let's have a closer look at that:

hash_map.insert(
    stringify!(#identifier).to_string(),
    String::from(value.#identifier)
);

As you know, the insert method takes a key and a value. You've told the compiler that both the key and value are of String type. stringify is a built-in macro in the standard library that turns whatever tokens it is given into a string literal (a &'static str) at compile time. You use it here to convert your field identifiers into actual &str values. You then call the to_string() method on the result to convert it to the String type.
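To see stringify and to_string() working together outside of a macro, here is a tiny standalone sketch. The helper name first_name_key is hypothetical and only exists for this illustration:

```rust
// Hypothetical helper: builds the hash map key for a field called `first_name`.
fn first_name_key() -> String {
    // `stringify!` turns the identifier token into a `&'static str` at compile time...
    let key: &'static str = stringify!(first_name);
    // ...and `to_string()` converts it into an owned `String`.
    key.to_string()
}

fn main() {
    println!("{}", first_name_key());
}
```

Note that first_name doesn't have to exist as a variable anywhere: stringify only looks at the tokens, it never evaluates them.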

But what does the #identifier represent?

quote provides you with the ability to use any variables declared outside of the TokenStream within it using the # prefix. Think of it as {} in format args. #identifier in this case simply gets replaced with the field identifier we declared outside of the extend call. So you basically call the stringify!() macro on the field identifier directly.

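Similarly, you can access the value of a field using the familiar struct_variable.field_name syntax, but with the identifier variable in place of the field name. This is what you do when you pass the value to your insert statement: String::from(value.#identifier). Keep in mind that String::from only accepts types that can be converted into a String, so as written the generated code compiles only when every field is a String (or another type with such a conversion). A u32 field, for example, would be rejected.

If you've looked at the code closely, you'll realise where value comes from. If not, it's simply the name of the input argument of the trait method that is declared further down.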

Once you've built your implementation using the for loop for each field in the struct, you have a TokenStream which, for representational purposes, contains the following code:

let mut hash_map = std::collections::HashMap::<String, String>::new();
hash_map.insert("username".to_string(), String::from(value.username));
hash_map.insert("first_name".to_string(), String::from(value.first_name));
hash_map.insert("last_name".to_string(), String::from(value.last_name));

Moving on to finally generating the output of our macro, you have:

quote! {
    impl From<#struct_identifier> for std::collections::HashMap<String, String> {
        fn from(value: #struct_identifier) -> Self {
            #implementation

            hash_map
        }
    }
}

Here you start by creating another TokenStream using quote. You write your From trait implementation in this block.

The following line again uses the # prefix syntax that we just looked at to declare that the trait implementation should be for your target struct, based on the identifier for the struct. In this case, this identifier will be replaced with User if you apply the derive macro to the User struct.

impl From<#struct_identifier> for std::collections::HashMap<String, String> {}

Finally, you have the actual method body:

fn from(value: #struct_identifier) -> Self {
    #implementation

    hash_map
}

As you can see, you can easily nest TokenStreams into other TokenStreams using the same # syntax that lets you use external variables within the quote macro.

Here, you declare that your hash map implementation should be inserted as the first few lines of the function. And then you simply return the same hash_map. This completes your trait implementation.

As the very last step, you call .into() on the value produced by the match block, which is the output of the quote macro call. This converts the TokenStream type used by quote (from proc_macro2) into the TokenStream type from the built-in proc_macro crate, which is what the compiler expects a macro to return.
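To make the end result concrete, here is a hand-written sketch of the code the macro is expected to produce for a User struct with three String fields. This is an illustration written out by hand rather than actual compiler output, and the sample_map helper is a hypothetical addition used only to try the conversion out:

```rust
use std::collections::HashMap;

pub struct User {
    username: String,
    first_name: String,
    last_name: String,
}

// Hand-written equivalent of what the derive macro is expected to generate.
impl From<User> for HashMap<String, String> {
    fn from(value: User) -> Self {
        let mut hash_map = HashMap::<String, String>::new();
        hash_map.insert(stringify!(username).to_string(), String::from(value.username));
        hash_map.insert(stringify!(first_name).to_string(), String::from(value.first_name));
        hash_map.insert(stringify!(last_name).to_string(), String::from(value.last_name));

        hash_map
    }
}

// Hypothetical helper that converts a sample user into a hash map.
fn sample_map() -> HashMap<String, String> {
    let user = User {
        username: "username".to_string(),
        first_name: "First".to_string(),
        last_name: "Last".to_string(),
    };
    HashMap::<String, String>::from(user)
}

fn main() {
    println!("{:?}", sample_map());
}
```

Writing this by hand for every struct is exactly the repetitive work the derive macro takes off your plate.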

If the line-by-line breakdown was hard to follow, here is the complete code again, this time with comments:

// Tell the compiler that this function is a derive macro, and the identifier for derive is `IntoStringHashMap`.
#[proc_macro_derive(IntoStringHashMap)]
// Declare a function that takes an input `TokenStream` and outputs `TokenStream`.
pub fn into_hash_map(item: TokenStream) -> TokenStream {
    // Parse the input token stream as `DeriveInput` type provided by `syn` crate.
    let input = syn::parse_macro_input!(item as syn::DeriveInput);

    // Store the struct identifier (name) into a variable so that you can insert it in the output code.
    let struct_identifier = &input.ident;

    // Match over the target type to which the derive macro was applied
    match &input.data {
        // Match that the target was a struct, and destructure the `fields` field from its information.
        Data::Struct(syn::DataStruct { fields, .. }) => {
            // Declare a new quote block that will hold the code for your implementation of the hash map.
            // This block will both create a new hash map, and also populate it with all of the fields from
            // the struct.
            let mut implementation = quote!{
                // This is just code that you want to see in the output. In this case, you want to have
                // a new hash map created.
                let mut hash_map = std::collections::HashMap::<String, String>::new();
            };

            // Iterate over all the fields of your target struct
            for field in fields {
                // Create a variable to store the identifier (name) of the field for later use
                let identifier = field.ident.as_ref().unwrap();
                // Extend your `implementation` block to include code in the output that populates
                // the hash map you create with the information from current field.
                implementation.extend(quote!{
                    // Convert the field identifier to a string using `stringify!` macro. This is used
                    // as the key in your new hash map entry. For value of this key, we access the field value
                    // from the struct using `value.#identifier`, where `#identifier` is replaced with the actual
                    // field name in output code.
                    hash_map.insert(stringify!(#identifier).to_string(), String::from(value.#identifier));
                });
            }

            // Create the final output block
            quote! {
                // Implement the `From` trait to allow converting your target struct, identified by
                // `struct_identifier` to a HashMap with both the key and the value as `String`.
                // Just like previously, #struct_identifier is replaced with the actual name of the
                // target struct in output code.
                impl From<#struct_identifier> for std::collections::HashMap<String, String> {
                    // This is just a method that the `From` trait requires you to implement. The
                    // type of the input value is again `#struct_identifier`, which is replaced with
                    // the name of the target struct in output code.
                    fn from(value: #struct_identifier) -> Self {
                        // Include the `implementation` block you created using `quote!` as the body
                        // of this method. `quote` allows you to nest other `quote` blocks freely.
                        #implementation

                        // Return the hash_map.
                        hash_map
                    }
                }
            }
        }
        // If the target is of any other type, panic.
        _ => unimplemented!()
        // Convert the `TokenStream` type used by `quote` to `TokenStream` type used by the
        // standard library and the compiler
    }.into()
}

And that's it. You've written your very first procedural macro in Rust!

It's now time to enjoy the fruits of your labour.

How to Use Your Derive Macro

Coming back to your my-app/src/main.rs, let's debug-print the hash map that you create using the macro you implemented. Your main.rs should look like this:

// my-app/src/main.rs

use std::collections::HashMap;
use my_app_macros::IntoStringHashMap;

#[derive(IntoStringHashMap)]
pub struct User {
    username: String,
    first_name: String,
    last_name: String,
}

fn main() {
    let user = User {
        username: "username".to_string(),
        first_name: "First".to_string(),
        last_name: "Last".to_string(),
    };

    let hash_map = HashMap::<String, String>::from(user);

    dbg!(hash_map);
}

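If you run this using cargo run, you should see output like the following in your terminal (a HashMap doesn't guarantee any particular iteration order, so the entries may appear in a different order):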

[src/main.rs:20:5] hash_map = {
    "last_name": "Last",
    "first_name": "First",
    "username": "username",
}

And there you go!

How to Improve Our Implementation

There is a better way to work with iterators and quote that I skipped over in our original implementation – intentionally so, because it requires us to learn a bit more of the syntax specific to quote.

Let's see what it would have looked like with that, before we dive into how it works:

    let input = syn::parse_macro_input!(item as syn::DeriveInput);
    let struct_identifier = &input.ident;

    match &input.data {
        Data::Struct(syn::DataStruct { fields, .. }) => {
            let field_identifiers = fields.iter().map(|item| item.ident.as_ref().unwrap()).collect::<Vec<_>>();

            quote! {
                impl From<#struct_identifier> for std::collections::HashMap<String, String> {
                    fn from(value: #struct_identifier) -> Self {
                        let mut hash_map = std::collections::HashMap::<String, String>::new();

                        #(
                            hash_map.insert(stringify!(#field_identifiers).to_string(), String::from(value.#field_identifiers));
                        )*

                        hash_map
                    }
                }
            }
        }
        _ => unimplemented!()
    }.into()

That looks so much more concise and easier to understand! Let's look at the special bit of syntax that makes it possible – in particular, the following lines:

#(
    hash_map.insert(stringify!(#field_identifiers).to_string(), String::from(value.#field_identifiers));
)*

Let's break it down. First, you wrap the entire block in #( )*, with your code going inside the parentheses. This syntax lets you use any iterator inside the parentheses: quote repeats that block of code once for each item in the iterator, replacing the variable with the correct item in each iteration.

In this case, you first build field_identifiers, a Vec containing the identifiers of all the fields in your target struct. You then write your hash_map insert statement using that collection directly, as if it were a single item. The #( )* wrapper expands this into the expected output of multiple lines, one for each item.
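If you've written declarative macros before, this repetition may look familiar, since macro_rules! uses a very similar $( ... )* form. The following standalone sketch is only an analogy using the standard library: the string_map macro and the name_map function are hypothetical names, and this is not how quote is implemented. It does show the same "repeat this statement once per item" idea:

```rust
// A declarative macro that repeats one `insert` statement per `key => value` pair.
macro_rules! string_map {
    ($($key:ident => $value:expr),* $(,)?) => {{
        let mut hash_map = std::collections::HashMap::<String, String>::new();
        // Everything inside `$( ... )*` is emitted once for each pair.
        $(
            hash_map.insert(stringify!($key).to_string(), String::from($value));
        )*
        hash_map
    }};
}

// Hypothetical helper that builds a map with two entries.
fn name_map() -> std::collections::HashMap<String, String> {
    string_map!(first_name => "First", last_name => "Last")
}

fn main() {
    println!("{:?}", name_map());
}
```

The difference is that quote's #( ... )* iterates over values you computed in your macro code, while macro_rules! repeats over the tokens captured from the macro invocation.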

A More Elaborate Derive Macro

Now that you're comfortable writing a simple Derive macro, it's time to move on and create something that will actually be useful in the real world – especially if you're working with database models.

The `DeriveCustomModel` Macro

You're going to build a derive macro that helps you generate derived structs from your original struct. This comes up all the time when you're working with databases and only want to load part of the data.

For example, if you have a User struct, which has all of the user information, but you only want to load the name information for the User from the database, you'll need a struct that only contains those fields – unless you want to make all the fields an Option, which isn't the best idea.

We will also need to add an implementation of the From trait to automatically convert from the User struct to the derived struct. Another thing our macro needs is the ability to derive multiple models from the same target struct.
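Before writing any macro code, it can help to write out by hand the kind of code we'd like to avoid typing ourselves. The sketch below is a hand-written illustration of that goal, not macro output. The UserName struct and the sample_user helper are names chosen for this example:

```rust
// The full model, with every piece of user information.
pub struct User {
    username: String,
    first_name: String,
    last_name: String,
    age: u32,
}

// A derived model that only carries the name information.
pub struct UserName {
    first_name: String,
    last_name: String,
}

// Conversion from the full model to the derived one.
impl From<User> for UserName {
    fn from(user: User) -> Self {
        UserName {
            first_name: user.first_name,
            last_name: user.last_name,
        }
    }
}

// Hypothetical helper that builds a sample user.
fn sample_user() -> User {
    User {
        username: "username".to_string(),
        first_name: "First".to_string(),
        last_name: "Last".to_string(),
        age: 27,
    }
}

fn main() {
    let name = UserName::from(sample_user());
    println!("{} {}", name.first_name, name.last_name);
}
```

Every additional partial model means another struct like UserName, which is why generating them from attributes on the original struct is attractive.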

Let's start by declaring it in lib.rs:

// lib.rs

#[proc_macro_derive(DeriveCustomModel, attributes(custom_model))]
pub fn derive_custom_model(item: TokenStream) -> TokenStream {
    todo!()
}

Most of this syntax should be familiar to you by now from our previous example. The only addition here is that we now also define attributes(custom_model) in the call to proc_macro_derive, which basically tells the compiler to treat any attribute that begins with #[custom_model] as an argument for this derive macro on that target.

For example, once you've defined this, you can apply #[custom_model(name = "SomeName")] to the target struct, to define that the derived struct should have the name "SomeName". You need to parse this yourself and handle it too, of course – the definition was only to tell the compiler to pass that through to your macro implementation and not treat it as an unknown attribute.

Let's also create a new file that will contain the implementation details of this macro. Procedural macros have to be declared at the root of the crate (lib.rs), and we've done that. The implementation itself can live anywhere in the project.

Create a new file custom_model.rs:

touch src/custom_model.rs

How to Separate the Implementation from the Declaration

Define a function that implements the DeriveCustomModel macro. We're also going to add all imports right away to avoid confusion later:

// custom_model.rs

use syn::{
    parse_macro_input, Data::Struct, DataStruct, DeriveInput, Field, Fields, Ident, Path,
};
use darling::util::PathList;
use darling::{FromAttributes, FromDeriveInput, FromMeta};
use proc_macro::TokenStream;
use quote::{quote, ToTokens};

pub(crate) fn derive_custom_model_impl(input: TokenStream) -> TokenStream {
    // Parse input token stream as `DeriveInput`
    let original_struct = parse_macro_input!(input as DeriveInput);

    // Destructure data & ident fields from the input
    let DeriveInput { data, ident, .. } = original_struct.clone();

    // Placeholder so that this function compiles; the real body is added later.
    todo!()
}

This is just a Rust function, so there are no special rules here. You can call this from the declaration just like a regular Rust function.

// Make the `custom_model` module part of the crate.
mod custom_model;

#[proc_macro_derive(DeriveCustomModel, attributes(custom_model))]
pub fn derive_custom_model(item: TokenStream) -> TokenStream {
    custom_model::derive_custom_model_impl(item)
}

How to Parse Derive Macro Arguments

To parse the arguments to our derive macro (which are usually provided using attributes applied to either the target or to its fields), we are going to rely on the darling crate to make it as simple as defining the data type for them.

// custom_model.rs

// Derive `FromDeriveInput` for this struct, which is a
// macro provided by darling to automatically add the ability
// to parse argument tokens into the given struct.
#[derive(FromDeriveInput, Clone)]
// We tell darling that we're looking for arguments
// that are defined using the `custom_model` attribute, and
// that we only support named structs for this.
#[darling(attributes(custom_model), supports(struct_named))]
struct CustomModelArgs {
    // Specify parameters for generating a derive model.
    // Multiple models can be generated by repeating
    // this attribute with parameters for each model.
    #[darling(default, multiple, rename = "model")]
    pub models: Vec<CustomModel>,
}

We've told darling that for arguments to the struct, we should expect a list of model arguments, and each one will define parameters for a single derived model. This allows us to use the macro to generate multiple derived structs from a single input struct.

Next, let's define the arguments for each model:

// custom_model.rs

// Derive `FromMeta` for this struct, which is a
// macro provided by darling to automatically add the ability
// to parse metadata into the given struct.
#[derive(FromMeta, Clone)]
struct CustomModel {
    // Name of the generated model.
    name: String,
    // Comma-separated list of field identifiers
    // to be included in the generated model
    fields: PathList,
    // List of additional derives to apply to the
    // resulting struct such as `Eq` or `Hash`.
    #[darling(default)]
    extra_derives: PathList,
}

In this, we have two required arguments, name and fields, and one optional argument extra_derives. It's optional because of the #[darling(default)] annotation on it.

How to Implement `DeriveCustomModel`

Now that we have all of our data types defined, let's get to parsing – which is as simple as calling a method on our argument struct! The complete function implementation should look like this:

// custom_model.rs

pub(crate) fn derive_custom_model_impl(input: TokenStream) -> TokenStream {
    // Parse input token stream as `DeriveInput`
    let original_struct = parse_macro_input!(input as DeriveInput);

    // Destructure data & ident fields from the input
    let DeriveInput { data, ident, .. } = original_struct.clone();

    if let Struct(data_struct) = data {
        // Extract the fields from this data struct
        let DataStruct { fields, .. } = data_struct;

        // `darling` provides this method on the struct
        // to easily parse arguments, and also handles
        // errors for us.
        let args = match CustomModelArgs::from_derive_input(&original_struct) {
            Ok(v) => v,
            Err(e) => {
                // If darling returned an error, generate a
                // token stream from it so that the compiler
                // shows the error in the right location.
                return TokenStream::from(e.write_errors());
            }
        };

        // Destructure `models` field from parsed args.
        let CustomModelArgs { models } = args;

        // Create a new output
        let mut output = quote!();

        // Panic if no models are defined but macro is
        // used.
        if models.is_empty() {
            panic!(
                "Please specify at least 1 model using the `model` attribute"
            )
        }

        // Iterate over all defined models
        for model in models {
            // Generate custom model from target struct's fields and `model` args.
            let generated_model = generate_custom_model(&fields, &model);

            // Extend the output to include the generated model
            output.extend(quote!(#generated_model));
        }

        // Convert output into TokenStream and return
        output.into()
    } else {
        // Panic if target is not a named struct
        panic!("DeriveCustomModel can only be used with named structs")
    }
}

The code that generates tokens for each model has been extracted away to another function that we call generate_custom_model. Let's implement that as well:

How to Generate Each Custom Model

// custom_model.rs

fn generate_custom_model(fields: &Fields, model: &CustomModel) -> proc_macro2::TokenStream {
    let CustomModel {
        name,
        fields: target_fields,
        extra_derives,
    } = model;

    // Create new fields output
    let mut new_fields = quote!();

    // Iterate over all fields in the source struct
    for Field {
        // The identifier for this field
        ident,
        // Any attributes applied to this field
        attrs,
        // The visibility specifier for this field
        vis,
        // The colon token `:`
        colon_token,
        // The type of this field
        ty,
        ..
    } in fields
    {
        // Make sure that field has an identifier, panic otherwise
        let Some(ident) = ident else {
            panic!("Failed to get struct field identifier")
        };

        // Try to convert field identifier to `Path` which is a type provided
        // by `syn`. We do this because `darling`'s PathList type is just a
        // collection of this type with additional methods on it.
        let path = match Path::from_string(&ident.clone().to_string()) {
            Ok(path) => path,
            Err(error) => panic!("Failed to convert field identifier to path: {error:?}"),
        };

        // If the list of target fields doesn't contain this field,
        // skip to the next field
        if !target_fields.contains(&path) {
            continue;
        }

        // If it does contain it, reconstruct the field declaration
        // and add it in `new_fields` output so that we can use it
        // in the output struct.
        new_fields.extend(quote! {
            #(#attrs)*
            #vis #ident #colon_token #ty,
        });
    }

    // Create a new identifier for output struct
    // from the name provided.
    let struct_ident = match Ident::from_string(name) {
        Ok(ident) => ident,
        Err(error) => panic!("{error:?}"),
    };

    // Create a TokenStream to hold the extra derive declarations
    // on new struct.
    let mut extra_derives_output = quote!();

    // If extra_derives is not empty,
    if !extra_derives.is_empty() {
        // This syntax is a bit compact, but you should already
        // know everything you need to understand it by now.
        extra_derives_output.extend(quote! {
            #(#extra_derives,)*
        })
    }

    // Construct the final struct by combining all the
    // TokenStreams generated so far.
    quote! {
        #[derive(#extra_derives_output)]
        pub struct #struct_ident {
            #new_fields
        }
    }
}
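The core of this function is a filter: walk the source struct's fields in order, keep the ones named in the model's field list, and wrap them in a new struct. To make that logic easier to follow in isolation, here is a deliberately simplified, string-based sketch. It doesn't use syn, quote, or darling at all, and the render_model function is a hypothetical stand-in rather than part of the macro:

```rust
// Hypothetical stand-in for `generate_custom_model`: fields are plain
// (name, type) string pairs, and the output is source text, not tokens.
fn render_model(
    name: &str,
    source_fields: &[(&str, &str)],
    wanted: &[&str],
    derives: &[&str],
) -> String {
    let mut body = String::new();

    // Iterate over all fields in the source struct, in declaration order.
    for (ident, ty) in source_fields {
        // Skip fields that the model didn't ask for.
        if !wanted.contains(ident) {
            continue;
        }
        body.push_str(&format!("    {ident}: {ty},\n"));
    }

    // Combine the derives, the struct name, and the selected fields.
    format!(
        "#[derive({})]\npub struct {name} {{\n{body}}}",
        derives.join(", ")
    )
}

fn main() {
    let fields = [
        ("username", "String"),
        ("first_name", "String"),
        ("last_name", "String"),
        ("age", "u32"),
    ];
    println!("{}", render_model("UserInfo", &fields, &["username", "age"], &["Debug"]));
}
```

One consequence of this approach, in the sketch and in the macro code above, is that a requested field that doesn't exist on the source struct is silently skipped rather than reported as an error.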

How to Use Your `DeriveCustomModel` Macro

Coming back to your my-app/src/main.rs, let's debug-print values built from the new structs that the macro generates. Your main.rs should look like this:

// my-app/src/main.rs

use my_app_macros::{DeriveCustomModel, IntoStringHashMap};
use std::collections::HashMap;

#[derive(DeriveCustomModel)]
#[custom_model(model(
    name = "UserName",
    fields(first_name, last_name),
    extra_derives(IntoStringHashMap)
))]
#[custom_model(model(name = "UserInfo", fields(username, age), extra_derives(Debug)))]
pub struct User2 {
    username: String,
    first_name: String,
    last_name: String,
    age: u32,
}

fn main() {
    let user_name = UserName {
        first_name: "first_name".to_string(),
        last_name: "last_name".to_string(),
    };
    let hash_map = HashMap::<String, String>::from(user_name);

    dbg!(hash_map);

    let user_info = UserInfo {
        username: "username".to_string(),
        age: 27,
    };

    dbg!(user_info);
}

As you can see, extra_derives is already useful to us since we need to derive Debug and IntoStringHashMap for the new models.

If you run this using cargo run, you should see the following output in your terminal:

[src/main.rs:32:5] hash_map = {
    "last_name": "last_name",
    "first_name": "first_name",
}
[src/main.rs:39:5] user_info = UserInfo {
    username: "username",
    age: 27,
}

We are going to wrap up the derive macros here.

A Simple Attribute Macro

In this section, you're going to learn how to write an attribute macro.

The `log_duration` Attribute

You are going to write a simple attribute macro that can be applied to any function (or method) that will log the total run time of that function each time the function is called.

How to Declare an Attribute Macro

You declare attribute macros by creating a function and annotating that function using the proc_macro_attribute macro that tells the compiler to consider that function as a macro declaration. Let's see what that looks like:

// my-app-macros/src/lib.rs

#[proc_macro_attribute]
pub fn log_duration(args: TokenStream, item: TokenStream) -> TokenStream {
    log_duration_impl(args, item)
}

For these macros, the function name is important, as it also becomes the name of the macro. As you can see, these functions take two arguments. The first holds the arguments passed to the attribute macro, and the second is the target of the attribute macro.

Let's also implement log_duration_impl. Create a new file log_duration.rs:

touch src/log_duration.rs

How to Implement the `log_duration` Attribute Macro

I'm going to give you the complete implementation first, and then I'll break down the parts that I haven't used so far:

// my-app-macros/src/log_duration.rs

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, ItemFn};

pub(crate) fn log_duration_impl(_args: TokenStream, input: TokenStream) -> TokenStream {
    // Parse the input as `ItemFn` which is a type provided
    // by `syn` to represent a function.
    let input = parse_macro_input!(input as ItemFn);

    let ItemFn {
        // The function signature
        sig,
        // The visibility specifier of this function
        vis,
        // The function block or body
        block,
        // Other attributes applied to this function
        attrs,
    } = input;

    // Extract statements in the body of the functions
    let statements = block.stmts;

    // Store the function identifier for logging
    let function_identifier = sig.ident.clone();

    // Reconstruct the function as output using parsed input
    quote!(
        // Reapply all the other attributes on this function.
        // The compiler doesn't include the macro we are
        // currently working on in this list.
        #(#attrs)*
        // Reconstruct the function declaration
        #vis #sig {
            // At the beginning of the function, create an instance of `Instant`
            let __start = std::time::Instant::now();

            // Create a new block, the body of which is the body of the function.
            // Store the return value of this block as a variable so that we can
            // return it later from the parent function.
            let __result = {
                #(#statements)*
            };

            // Log the duration information for this function
            println!("{} took {}μs", stringify!(#function_identifier), __start.elapsed().as_micros());

            // Return the result (if any)
            return __result;
        }
    )
    .into()
}

The only things that you might not have seen previously are the sig and the block fields you get from parsing the input as ItemFn. sig contains the entire signature of a function while block contains the entire body of the function. This is why, by using the following code, we can basically reconstruct the unmodified function:

// Example code to reconstruct unmodified fn in macro

#vis #sig #block

In this example, you want to modify the function body, which is why you create a new block that encapsulates the original function block.

How to Use Your `log_duration` Macro

Coming back to main.rs, using an attribute macro is simpler than you might think:

// main.rs

use my_app_macros::log_duration;

#[log_duration]
#[must_use]
fn function_to_benchmark() -> u16 {
    let mut counter = 0;
    for _ in 0..u16::MAX {
        counter += 1;
    }

    counter
}

fn main() {
    println!("{}", function_to_benchmark());
}

When you run this, you should get the following output:

function_to_benchmark took 498μs
65535
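To see what the attribute did, here is a hand-written sketch of roughly what function_to_benchmark looks like after the macro has rewritten it. This is an illustration written out by hand, following the structure of the quote! block in the implementation, rather than actual compiler output:

```rust
#[must_use]
fn function_to_benchmark() -> u16 {
    // Inserted by the macro: record the start time.
    let __start = std::time::Instant::now();

    // The original function body, wrapped in a block so that its
    // value can be captured and returned after logging.
    let __result = {
        let mut counter = 0;
        for _ in 0..u16::MAX {
            counter += 1;
        }

        counter
    };

    // Inserted by the macro: log how long the body took.
    println!(
        "{} took {}μs",
        stringify!(function_to_benchmark),
        __start.elapsed().as_micros()
    );

    return __result;
}

fn main() {
    println!("{}", function_to_benchmark());
}
```

One thing to be aware of with this design: because the original statements are pasted into an inner block, an early return (or the ? operator) inside the original body exits the whole function immediately, so the duration wouldn't be logged for that call.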

We are now ready to move on to a more complex use-case.

A More Elaborate Attribute Macro

The `cached_fn` Attribute

You are going to write an attribute macro that will allow you to add caching capability to any function. For the purposes of this example, we're going to assume that our function always has String arguments and also returns a String value.

Some of you might know this better as a "memoized" function.

In addition, you will need to allow the user of this macro to tell the macro how it can generate a dynamic key based on function args.
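If memoization is new to you, the following standalone sketch shows the idea by hand. It makes two simplifying assumptions: it keeps the cache in an in-memory HashMap instead of on disk, and the Greeter type with its greet method is a hypothetical example, not something the macro generates. The key is built from the function arguments, which is the role the user-supplied key expression will play in the macro:

```rust
use std::collections::HashMap;

// Hypothetical example type that owns the cache.
struct Greeter {
    cache: HashMap<String, String>,
    // Counts how often the "expensive" work actually runs.
    computations: usize,
}

impl Greeter {
    fn new() -> Self {
        Greeter {
            cache: HashMap::new(),
            computations: 0,
        }
    }

    fn greet(&mut self, first_name: String, last_name: String) -> String {
        // The dynamic key is derived from the function arguments.
        let key = format!("{first_name} {last_name}");

        // On a cache hit, return the stored value without recomputing.
        if let Some(cached) = self.cache.get(&key) {
            return cached.clone();
        }

        // On a cache miss, do the work and store the result for next time.
        self.computations += 1;
        let result = format!("Hello, {first_name} {last_name}!");
        self.cache.insert(key, result.clone());
        result
    }
}

fn main() {
    let mut greeter = Greeter::new();
    println!("{}", greeter.greet("Ada".to_string(), "Lovelace".to_string()));
    println!("{}", greeter.greet("Ada".to_string(), "Lovelace".to_string()));
    println!("computed {} time(s)", greeter.computations);
}
```

The attribute macro in this section aims to add the same check-then-compute-then-store wrapping around an existing function automatically.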

To handle the caching part without getting sidetracked, we're going to use a dependency called cacache. cacache is a Rust library for managing local key and content caches. It works by writing the cache to the disk.

Let's add it to the project by editing the Cargo.toml file for my-app directly:

# Cargo.toml

workspace = { members = ["my-app-macros"] }

[package]
name = "my-app"
version = "0.1.0"
edition = "2021"
resolver = "2"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
# New dependency
cacache = { version = "13.0.0", default-features = false, features = ["mmap"] }
my-app-macros = { path = "./my-app-macros" }

How to Implement the `cached_fn` Attribute Macro

Let's start by declaring this macro in lib.rs:

// my-app-macros/src/lib.rs

#[proc_macro_attribute]
pub fn cached_fn(args: TokenStream, item: TokenStream) -> TokenStream {
    cached_fn_impl(args, item)
}

Create a new file cached_fn.rs to store the implementation:

touch my-app-macros/src/cached_fn.rs

Let's define how our arguments should look before we go ahead and implement anything:

`cached_fn` Attribute Arguments

// my-app-macros/src/cached_fn.rs

#[derive(FromMeta)]
struct CachedParams {
    // Accept any expression that we should use to compute the
    // key. This can be a constant string, or some computation
    // based on function arguments.
    keygen: Option<Expr>,
}

The only argument is an optional keygen, which is of type Expr. Expr represents any valid Rust expression, so it can be very dynamic. In this example, you'll be passing an expression that generates the key based on function arguments of the target function.

As always, we'll first see the entire implementation and then break down the parts that are new:

// my-app-macros/src/cached_fn.rs

pub fn cached_fn_impl(args: TokenStream, item: TokenStream) -> TokenStream {
    // Parse argument tokens as a list of NestedMeta items
    let attr_args = match NestedMeta::parse_meta_list(args.into()) {
        Ok(v) => v,
        Err(e) => {
            // Write error to output token stream if there is one
            return proc_macro::TokenStream::from(Error::from(e).write_errors());
        }
    };

    // Parse the nested meta list as our `CachedParams` struct
    let CachedParams { keygen } = match CachedParams::from_list(&attr_args) {
        Ok(params) => params,
        Err(error) => {
            // Write error to output token stream if there is one
            return proc_macro::TokenStream::from(Error::from(error).write_errors());
        }
    };

    // Parse the input target item as a function
    let ItemFn {
        // The function signature
        sig,
        // The visibility specifier of this function
        vis,
        // The function block or body
        block,
        // Other attributes applied to this function
        attrs,
    } = parse_macro_input!(item as ItemFn);

    // Generate our key statement based on given param (or lack thereof)
    let key_statement = if let Some(keygen) = keygen {
        // If the user specified a `keygen`, use that as an expression to
        // get the cache key.
        quote! {
            let __cache_key = #keygen;
        }
    } else {
        // If no `keygen` was provided, use the name of the function
        // as cache key.
        let fn_name = sig.ident.clone().to_string();
        quote! {
            let __cache_key = #fn_name;
        }
    };

    // Reconstruct the function as output using parsed input
    quote!(
        // Apply other attributes from the original function to the generated function
        #(#attrs)*
        #vis #sig {
            // Include the key_statement we generated above as the first
            // thing in the function body
            #key_statement

            // Try to read the value from cache
            match cacache::read_sync("./__cache", __cache_key.clone()) {
                // If the value exists, parse it as string and return it
                Ok(value) => {
                    println!("Data is fetched from cached");
                    std::str::from_utf8(&value).unwrap().to_string()
                },
                Err(_) => {
                    println!("Data is not fetched from cached");
                    // Save the output of original function block into
                    // a variable.
                    let output = #block;

                    // Write the output value to cache as bytes
                    cacache::write_sync("./__cache", __cache_key, output.as_bytes()).unwrap();

                    // Return the original output
                    output
                }
            }
        }
    )
    .into()
}

Well, turns out that you've seen everything that we used in this one before.

The only new thing here is the use of the cacache dependency, but that's also pretty straightforward. You just give the location where you want to store the cached data as the first argument to the read_sync and write_sync functions provided by cacache.

We've also added some logging to help us verify that the macro works as expected.

How to Use the `cached_fn` Macro

To make any function memoized or cached, we simply annotate it using the cached_fn attribute:

// src/main.rs

#[cached_fn(keygen = "format!(\"{first_name} {last_name}\")")]
fn test_cache(first_name: String, last_name: String) -> String {
    format!("{first_name} {last_name}")
}

fn main() {
    test_cache("John".to_string(), "Appleseed".to_string());
    test_cache("John".to_string(), "Appleseed".to_string());
    test_cache("John".to_string(), "Doe".to_string());
}

If you run this, you should see the following output:

Data is not fetched from cached
Data is fetched from cached
Data is not fetched from cached

This clearly shows that if the function is called more than once with the same arguments, data is returned from the cache. But if the arguments are different, it doesn't return the value that was cached for a different set of arguments.
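
To make that behaviour a bit more concrete, here's a hand-written sketch of roughly what the annotated test_cache function looks like after expansion. To keep it runnable without any dependencies, it swaps the on-disk cacache store for an in-memory HashMap. The `cache` helper is a hypothetical stand-in and is not part of cacache or of the macro.

```rust
use std::collections::HashMap;
use std::str::from_utf8;
use std::sync::{Mutex, OnceLock};

// Hypothetical in-memory stand-in for the on-disk cacache store.
fn cache() -> &'static Mutex<HashMap<String, Vec<u8>>> {
    static CACHE: OnceLock<Mutex<HashMap<String, Vec<u8>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

// Roughly what the macro turns `test_cache` into, written out by hand.
fn test_cache(first_name: String, last_name: String) -> String {
    // The `keygen` expression is pasted in as the first statement
    let __cache_key = format!("{first_name} {last_name}");

    // Try to read the value from the cache
    let cached = cache().lock().unwrap().get(&__cache_key).cloned();

    match cached {
        // Cache hit: convert the stored bytes back into a string
        Some(value) => {
            println!("Data is fetched from cached");
            from_utf8(&value).unwrap().to_string()
        }
        // Cache miss: run the original body, store its output, return it
        None => {
            println!("Data is not fetched from cached");
            let output = { format!("{first_name} {last_name}") };

            cache()
                .lock()
                .unwrap()
                .insert(__cache_key, output.as_bytes().to_vec());

            output
        }
    }
}

fn main() {
    test_cache("John".to_string(), "Appleseed".to_string());
    test_cache("John".to_string(), "Appleseed".to_string());
    test_cache("John".to_string(), "Doe".to_string());
}
```

Unlike the real macro, this sketch forgets everything when the program exits, since nothing is written to disk.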

We made a lot of assumptions for this example that wouldn't hold in a real-world use case. As such, it is only meant for learning purposes, although it does depict a real-world use case.

For example, I've written attribute macros to cache HTTP handler functions using redis for production servers. Those have a very similar implementation to this, but contain a lot of bells and whistles to work with that particular use case.

A Simple Function-like Macro

It's finally time to have some fun again. We are going to start simple, but the second example is going to include parsing custom syntax. Fun, right?

Disclaimer: If you're familiar with declarative macros (using macro_rules! syntax), you might realize that the following examples can easily be written using that syntax and don't need to be procedural macros. Writing example procedural macros that cannot be written as declarative ones is extremely difficult if you also want to keep things simple, which is why the examples were chosen despite this.

The `constant_string` Macro

We're going to build a very simple macro that takes in a string literal (of type &str) as input and creates a global public constant for it (the name of the variable being the same as the value). Basically, our macro will generate the following:

pub const STRING_LITERAL: &str = "STRING_LITERAL";

How to Declare a Function-like Macro

You declare function-like macros by creating a function and annotating that function with the proc_macro attribute. It tells the compiler to consider that function as a macro declaration. Let's see what that looks like:

// my-app-macros/src/lib.rs

#[proc_macro]
pub fn constant_string(item: TokenStream) -> TokenStream {
    constant_string_impl(item)
}

For these macros, the function name is important, as that also becomes the name of the macro. As you can see, these only take a single argument, which is whatever you pass on to the macro. It can literally be anything, even custom syntax that's not valid Rust code.

How to Implement the `constant_string` Macro

For the implementation, let's create a new file constant_string.rs:

touch my-app-macros/src/constant_string.rs

The implementation is pretty simple:

use darling::FromMeta;
use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, Ident, LitStr};

pub fn constant_string_impl(item: TokenStream) -> TokenStream {
    // Parse input as a string literal
    let constant_value = parse_macro_input!(item as LitStr);

    // Create a new `Ident` (identifier) from the passed string value.
    // This is going to be the name of the constant variable.
    let constant_value_name = Ident::from_string(&constant_value.value()).unwrap();

    // Generate the code for declaring the constant variable.
    quote!(pub const #constant_value_name: &str = #constant_value;).into()
}

All we're doing is parsing the input as a string literal. If you pass something that is not a string literal, the macro will produce a compile error. Then we take the string, create an identifier out of it, and generate the output code. Short and simple.

How to Use the `constant_string` Macro

The usage of this macro is also pretty simple:

// src/main.rs

constant_string!("SOME_CONSTANT_STRING_VALUE");

The above code will expand to this:

pub const SOME_CONSTANT_STRING_VALUE: &str = "SOME_CONSTANT_STRING_VALUE";
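
As the disclaimer earlier mentioned, something similar can be written as a declarative macro. Here's a minimal sketch for comparison, assuming you're fine with passing an identifier instead of a string literal. The name `constant_string_decl` is made up for this example.

```rust
// Hypothetical declarative counterpart of `constant_string!`.
// It takes an identifier rather than a string literal, and uses
// `stringify!` to turn that identifier into the constant's value.
macro_rules! constant_string_decl {
    ($name:ident) => {
        pub const $name: &str = stringify!($name);
    };
}

constant_string_decl!(SOME_CONSTANT_STRING_VALUE);

fn main() {
    println!("{}", SOME_CONSTANT_STRING_VALUE);
}
```

The procedural version is still useful to study, because turning a string value into an identifier is something the declarative syntax cannot do on its own.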

A More Elaborate Function-like Macro

Function-like macros, as the name might suggest, can be used in a similar way to calling a function. You can also use them in any place where you can call a function, and beyond.

The `hash_mapify` Macro

Moving on to the interesting parts: the macro you're going to write now will allow you to generate a HashMap by simply passing in a list of key-value pairs. For example:

let variable = "Some variable";

hash_mapify!(
    &str,
    key = "value", 
    key2 = "value2", 
    key3 = "value3", 
    key4 = variable
);

As you can see, we want the first argument to be the type of the value, and the subsequent arguments to be the key-value pairs. And we'll need to parse all of this ourselves.

To keep things simple, since this can easily get out of hand, we're only going to support primitive values such as strings, integers, floats and booleans. So we're not going to support creating a hash_map with non-string keys or enum and struct as values.
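
For comparison, here's a rough sketch of how a declarative macro could cover part of this, in line with the disclaimer from earlier. It's an assumption-laden stand-in, not the macro we're about to build: `hash_mapify_decl` is a made-up name, and it only accepts identifier keys (not string literal keys).

```rust
// Hypothetical declarative stand-in for `hash_mapify!`.
// Only identifier keys are supported in this sketch.
macro_rules! hash_mapify_decl {
    ($ty:ty, $($key:ident = $value:expr),* $(,)?) => {{
        // Build the map inside a block so no outer variables are shadowed
        let mut hash_map = std::collections::HashMap::<String, $ty>::new();
        $(
            hash_map.insert(String::from(stringify!($key)), $value);
        )*
        hash_map
    }};
}

fn main() {
    let variable = "Some variable";

    let map = hash_mapify_decl!(
        &str,
        key = "value",
        key2 = "value2",
        key4 = variable,
    );

    println!("{:?}", map.get("key4"));
}
```

Writing the procedural version by hand is what lets you control parsing and error messages yourself, which is the part this section focuses on.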

How to Implement the `hash_mapify` Macro

We're going to start as usual by declaring our macro:

// my-app-macros/src/lib.rs

#[proc_macro]
pub fn hash_mapify(item: TokenStream) -> TokenStream {
    hash_mapify_impl(item)
}

Next, you're going to define a data structure to hold your input data. In this case, you need to know the value type passed, as well as a list of key-value pairs.

We are going to extract the implementation to a separate file, which is where you'll also implement the data types and parsing logic.

Create a new file hash_mapify.rs and declare the data type to hold input data:

touch my-app-macros/src/hash_mapify.rs

How to Parse `hash_mapify`'s Input

// my-app-macros/src/hash_mapify.rs

use proc_macro::TokenStream;
use quote::{quote, ToTokens};
use syn::parse::{Parse, ParseStream};
use syn::{parse_macro_input, Lit, LitStr, Token, Type};

pub struct ParsedMapEntry(String, proc_macro2::TokenStream);

pub struct ParsedMap {
    value_type: Type,
    entries: Vec<ParsedMapEntry>,
}

You store the value as TokenStream directly because you need to support both literal values as well as variables, both of which only have 1 common type in this context, TokenStream.

You also might have noticed that we save the value_type as Type. This is a type provided by the syn crate: an enum of the possible types that a Rust value could have. That was a mouthful!

You won't need to handle each variant of this enum, since this type can also directly be converted to TokenStream. You'll better understand what that means shortly.

Next, you need to implement the syn::parse::Parse trait for ParsedMap declared previously, so that it can be computed from the TokenStream passed as arguments to the macro.

// my-app-macros/src/hash_mapify.rs

impl Parse for ParsedMap {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        let mut entries = Vec::<ParsedMapEntry>::new();
    }
}

input, which is of type ParseStream in this case, works similarly to an iterator. You need to parse tokens out of the input using its parse method, which will also advance the stream to the beginning of the next token.

For example, if you have a stream of tokens representing [a, b, c], as soon as you parse [ out of this stream, the stream will be mutated to only contain a, b, c] . This is very similar to iterators, where as soon as you take a value out, the iterator is advanced by one position and only holds the remaining items.
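
The iterator analogy can be sketched with nothing but the standard library. This is only an illustration of the "take one, keep the rest" behaviour using plain string slices, not syn's actual ParseStream. The `take_first` helper is a made-up name.

```rust
/// Takes the first token off the front of the list and returns it
/// together with whatever is left over.
fn take_first<'a>(tokens: Vec<&'a str>) -> (Option<&'a str>, Vec<&'a str>) {
    let mut stream = tokens.into_iter();

    // Taking a value out advances the iterator by one position...
    let first = stream.next();

    // ...so only the remaining items are left to collect.
    let rest = stream.collect();

    (first, rest)
}

fn main() {
    let (first, rest) = take_first(vec!["[", "a", ",", "b", ",", "c", "]"]);

    println!("parsed: {:?}", first);
    println!("remaining: {:?}", rest);
}
```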

Before you parse anything, you need to check if input is empty, and panic if it is:

// my-app-macros/src/hash_mapify.rs

impl Parse for ParsedMap {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        // ...

        // Check if input is empty (no arguments are passed). If
        // so, then panic as we cannot continue further.
        if input.is_empty() {
            panic!("At least a type must be specified for an empty hashmap");
        }

        // ...
    }
}

Since we expect the first argument passed to the macro to be the type of the value in our hashmap, let's parse that out of the token stream:

// my-app-macros/src/hash_mapify.rs

impl Parse for ParsedMap {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        // ...

        // Since the first argument should be of type `Type`, try to
        // parse a `Type` out of the input, returning an error otherwise.
        let ty = input.parse::<Type>()?;

        // ...
    }
}

The parse method takes a single type argument which represents what to parse.

If the first argument cannot be parsed as a valid type, an error will be returned. Do note that this doesn't verify whether the type you passed actually exists. It only validates that the tokens in the first argument form a syntactically valid type, and that's all.

This means that if you pass SomeRandomType where SomeRandomType isn't actually defined, the parsing will still succeed. It will only fail after expanding the macro during compile time.

Moving on, we also expect the user to use , to separate the arguments. Let's parse that as the next token after type:

// my-app-macros/src/hash_mapify.rs

impl Parse for ParsedMap {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        // ...

        // Next, parse the `,` token, which you expect to be used to
        // separate the arguments.
        input.parse::<Token![,]>()?;

        // ...
    }
}

You might notice the usage of the Token! macro when providing the type argument for the parse method. It's a macro provided by syn that maps built-ins such as keywords (type, async, fn and so on) and punctuation marks (,, ., ; and so on) to the syn types that represent them. Delimiters such as {, [ and ( are not covered by Token!; syn provides separate types for those in its token module. This macro takes a single argument, which is the keyword or punctuation literal for which the type is needed.

The official docs define it as:

A type-macro that expands to the name of the Rust type representation of a given token.

Now that you have the type of value as well as the first separator (comma), it's time to start parsing key-value pairs. All of the key-value pairs follow the same structure key = value and are separated by commas.

Do note that white-space isn't important, as that is entirely handled during the tokenization process and isn't something that you need to handle.

Since you won't know how many key-value pairs are passed, you need something to tell you when all of it is parsed:

// my-app-macros/src/hash_mapify.rs

impl Parse for ParsedMap {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        // ...

        // Loop until the input is empty (there is nothing else
        // left to parse).
        while !input.is_empty() {
            // ..
        }

        // ...
    }
}

As I explained previously, tokens are taken out of the stream and it's advanced each time you parse something. This means that when all of the tokens are parsed, the stream will be empty. We utilise this fact here to figure out when to break out of the loop.

Each key-value pair can be parsed in a similar fashion as you parsed the type argument:

// my-app-macros/src/hash_mapify.rs

impl Parse for ParsedMap {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        // ...

        // Loop until the input is empty (there is nothing else
        // left to parse).
        while !input.is_empty() {
            // Try to parse the key as an identifier
            let key = if let Ok(key) = input.parse::<syn::Ident>() {
                key.to_string()
                // If it's not an identifier, try to parse it as
                // a string literal
            } else if let Ok(key) = input.parse::<LitStr>() {
                key.value()
                // If it's neither an identifier nor a string literal,
                // it is not a valid key, so panic with appropriate
                // error.
            } else {
                panic!("Key must be either a string literal or an identifier!");
            };

            // Parse the `=` sign, which should be the next token after
            // a key.
            input.parse::<Token![=]>()?;

            // Next, try to parse the value as an identifier. If it is, it
            // means that it's a variable, so we should convert it to token
            // stream directly.
            let value = if let Ok(value) = input.parse::<syn::Ident>() {
                value.to_token_stream()
                // If the input isn't an identifier, try to parse it as a
                // literal value such as `"string"` for strings, `42`
                // for numbers, `false` for boolean values, etc.
            } else if let Ok(value) = input.parse::<Lit>() {
                value.to_token_stream()
            } else {
                // If the input is neither an identifier nor a literal value
                // panic with appropriate error.
                panic!("Value must be either a literal or an identifier!");
            };

            // Push the parsed key value pair to our list.
            entries.push(ParsedMapEntry(key, value));

            // Check if next token is a comma, without advancing the stream
            if input.peek(Token![,]) {
                // If it is, then parse it out and advance the stream before
                // moving on to the next key-value pair
                input.parse::<Token![,]>()?;
            }
        }

        // ...
    }
}

The only new thing here is the call to the peek method at the end. This is a special method that returns true if the token that is passed to peek is the next token in the stream, and false otherwise.

As the name might suggest, this only performs a check, so it doesn't take that token out of the stream or advance the stream in any form.
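
If it helps, the same loop shape can be sketched with the standard library's Peekable iterator over plain string tokens. This is a simplified stand-in for the syn-based parser above, not a replacement for it. The `parse_pairs` function is a made-up name, and it returns errors instead of panicking.

```rust
/// Parses a pre-split token list such as
/// `["key", "=", "value", ",", "key2", "=", "value2"]` into pairs.
fn parse_pairs<'a, I>(tokens: I) -> Result<Vec<(String, String)>, String>
where
    I: Iterator<Item = &'a str>,
{
    let mut input = tokens.peekable();
    let mut entries = Vec::new();

    // Loop until there is nothing left to parse
    while input.peek().is_some() {
        // Taking the key advances the stream
        let key = input.next().ok_or("expected a key")?;

        // The next token must be `=`
        match input.next() {
            Some("=") => {}
            other => return Err(format!("expected `=` after `{key}`, found {other:?}")),
        }

        // Then comes the value
        let value = input
            .next()
            .ok_or_else(|| format!("expected a value for `{key}`"))?;

        entries.push((key.to_string(), value.to_string()));

        // Peek at the next token without consuming it, and only take
        // it out of the stream if it is a comma
        if input.peek() == Some(&",") {
            input.next();
        }
    }

    Ok(entries)
}

fn main() {
    let tokens = vec!["key", "=", "value", ",", "key2", "=", "value2", ","];
    println!("{:?}", parse_pairs(tokens.into_iter()));
}
```

Because the comma is only consumed when it's actually there, a trailing comma after the last pair is accepted but not required.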

Once all of the parsing is done, you just return the information as part of ParsedMap struct we declared earlier. The complete implementation for this trait is as below if that's easier for you to read through:

// my-app-macros/src/hash_mapify.rs

impl Parse for ParsedMap {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        let mut entries = Vec::<ParsedMapEntry>::new();

        // Check if input is empty (no arguments are passed). If so, then
        // panic as we cannot continue further.
        if input.is_empty() {
            panic!("At least a type must be specified for an empty hashmap");
        }

        // Since the first argument should be of type `Type`, try to
        // parse a `Type` out of the input, returning an error otherwise.
        let ty = input.parse::<Type>()?;

        // Next, parse the `,` token, which you expect to be used to
        // separate the arguments.
        input.parse::<Token![,]>()?;

        // Loop until the input is empty (there is nothing else
        // left to parse).
        while !input.is_empty() {
            // Try to parse the key as an identifier
            let key = if let Ok(key) = input.parse::<syn::Ident>() {
                key.to_string()
                // If it's not an identifier, try to parse it as
                // a string literal
            } else if let Ok(key) = input.parse::<LitStr>() {
                key.value()
                // If it's neither an identifier nor a string literal,
                // it is not a valid key, so panic with appropriate
                // error.
            } else {
                panic!("Key must be either a string literal or an identifier!");
            };

            // Parse the `=` sign, which should be the next token after
            // a key.
            input.parse::<Token![=]>()?;

            // Next, try to parse the value as an identifier. If it is, it
            // means that it's a variable, so we should convert it to token
            // stream directly.
            let value = if let Ok(value) = input.parse::<syn::Ident>() {
                value.to_token_stream()
                // If the input isn't an identifier, try to parse it as a
                // literal value such as `"string"` for strings, `42`
                // for numbers, `false` for boolean values, etc.
            } else if let Ok(value) = input.parse::<Lit>() {
                value.to_token_stream()
            } else {
                // If the input is neither an identifier nor a literal value
                // panic with appropriate error.
                panic!("Value must be either a literal or an identifier!");
            };

            // Push the parsed key value pair to our list.
            entries.push(ParsedMapEntry(key, value));

            // Check if next token is a comma, without advancing the stream
            if input.peek(Token![,]) {
                // If it is, then parse it out and advance the stream before
                // moving on to the next key-value pair
                input.parse::<Token![,]>()?;
            }
        }

        Ok(ParsedMap {
            value_type: ty,
            entries,
        })
    }
}

How to Generate the Output Code

You can now finally write the actual macro implementation, which is going to be pretty straightforward:

// my-app-macros/src/hash_mapify.rs

pub fn hash_mapify_impl(item: TokenStream) -> TokenStream {
    // Parse input token stream as `ParsedMap` defined by us.
    // This will use the logic from parse trait we implemented
    // earlier.
    let input = parse_macro_input!(item as ParsedMap);

    let key_value_pairs = input.entries;
    let ty = input.value_type;

    // Generate the output hashmap inside a code block so that
    // we don't shadow any existing variables. Return the hashmap
    // from the block.
    quote!({
        // Create a new hashmap with `String` for key type and `#ty` for
        // value type, which is parsed from the macro input arguments.
        let mut hash_map = std::collections::HashMap::<String, #ty>::new();

        // Insert all key-value pairs into the hashmap.
        #(
            hash_map.insert(#key_value_pairs);
        )*

        // Return the generated hashmap
        hash_map
    })
    .into()
}

If you're coding along with me, or if you have a keen eye, you might have noticed that there is an error here. The type of variable key_value_pairs is Vec<ParsedMapEntry>. We are trying to use it in the output as:

#(hash_map.insert(#key_value_pairs);)*

which is the correct syntax for working with lists, but the underlying type ParsedMapEntry is a custom type. And neither syn nor quote would know how to convert it to a token stream. So we cannot use it with this syntax.

But if we try to manually write an implementation where we loop over it ourselves, generate a separate token stream in each iteration, and extend the existing one, it's going to be quite tedious. Wouldn't it be great if there was a better solution? Turns out there is: the ToTokens trait.

How to Convert Custom Data Types to Output Tokens

This trait can be implemented for any of our custom types and defines what the type looks like when converted into a token stream.

// my-app-macros/src/hash_mapify.rs

impl ToTokens for ParsedMapEntry {
    fn to_tokens(&self, tokens: &mut proc_macro2::TokenStream) {
        let key = self.0.clone();
        let value = self.1.clone();

        tokens.extend(quote!(String::from(#key), #value));
    }
}

As part of the implementation, you need to mutate the tokens argument and extend it to contain the token stream that we want our type to generate. The syntax I used to do that should all be familiar by now.

Once you've done this, quote can now easily convert the problematic code to token stream. So this: #(hash_map.insert(#key_value_pairs);)* will now work directly.
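
To picture what the macro now produces, here's a hand-written sketch of roughly what the earlier example invocation would expand to. Each ParsedMapEntry contributes the `String::from(key), value` pair inside one insert call. The `build_map` wrapper is made up so that the snippet can stand on its own.

```rust
use std::collections::HashMap;

// Hypothetical wrapper so the sketch is self-contained.
fn build_map() -> HashMap<String, &'static str> {
    let variable = "Some variable";

    // Hand-written equivalent of:
    // hash_mapify!(&str, key = "value", key2 = "value2",
    //              key3 = "value3", key4 = variable);
    {
        let mut hash_map = std::collections::HashMap::<String, &str>::new();

        // One `insert` per parsed key-value pair
        hash_map.insert(String::from("key"), "value");
        hash_map.insert(String::from("key2"), "value2");
        hash_map.insert(String::from("key3"), "value3");
        hash_map.insert(String::from("key4"), variable);

        // The block evaluates to the generated hashmap
        hash_map
    }
}

fn main() {
    let map = build_map();
    println!("{} entries, key4 = {}", map.len(), map["key4"]);
}
```

Literal values are pasted into the output as they are, while a variable such as `variable` is emitted as an identifier and resolved at the call site.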

As usual, here's the complete implementation if that's easier to understand:

// my-app-macros/src/hash_mapify.rs

use proc_macro::TokenStream;
use quote::{quote, ToTokens};
use syn::parse::{Parse, ParseStream};
use syn::{parse_macro_input, Lit, LitStr, Token, Type};

pub struct ParsedMapEntry(String, proc_macro2::TokenStream);

pub struct ParsedMap {
    value_type: Type,
    entries: Vec<ParsedMapEntry>,
}

impl ToTokens for ParsedMapEntry {
    fn to_tokens(&self, tokens: &mut proc_macro2::TokenStream) {
        let key = self.0.clone();
        let value = self.1.clone();

        tokens.extend(quote!(String::from(#key), #value));
    }
}

impl Parse for ParsedMap {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        let mut entries = Vec::<ParsedMapEntry>::new();

        // Check if input is empty (no arguments are passed). If so, then
        // panic as we cannot continue further.
        if input.is_empty() {
            panic!("At least a type must be specified for an empty hashmap");
        }

        // Since the first argument should be of type `Type`, try to
        // parse a `Type` out of the input, returning an error otherwise.
        let ty = input.parse::<Type>()?;

        // Next, parse the `,` token, which you expect to be used to
        // separate the arguments.
        input.parse::<Token![,]>()?;

        // Loop until the input is empty (there is nothing else
        // left to parse).
        while !input.is_empty() {
            // Try to parse the key as an identifier
            let key = if let Ok(key) = input.parse::<syn::Ident>() {
                key.to_string()
                // If it's not an identifier, try to parse it as
                // a string literal
            } else if let Ok(key) = input.parse::<LitStr>() {
                key.value()
                // If it's neither an identifier nor a string literal,
                // it is not a valid key, so panic with appropriate
                // error.
            } else {
                panic!("Key must be either a string literal or an identifier!");
            };

            // Parse the `=` sign, which should be the next token after
            // a key.
            input.parse::<Token![=]>()?;

            // Next, try to parse the value as an identifier. If it is, it
            // means that it's a variable, so we should convert it to token
            // stream directly.
            let value = if let Ok(value) = input.parse::<syn::Ident>() {
                value.to_token_stream()
                // If the input isn't an identifier, try to parse it as a
                // literal value such as `"string"` for strings, `42`
                // for numbers, `false` for boolean values, etc.
            } else if let Ok(value) = input.parse::<Lit>() {
                value.to_token_stream()
            } else {
                // If the input is neither an identifier nor a literal value
                // panic with appropriate error.
                panic!("Value must be either a literal or an identifier!");
            };

            // Push the parsed key value pair to our list.
            entries.push(ParsedMapEntry(key, value));

            // Check if next token is a comma, without advancing the stream
            if input.peek(Token![,]) {
                // If it is, then parse it out and advance the stream before
                // moving on to the next key-value pair
                input.parse::<Token![,]>()?;
            }
        }

        Ok(ParsedMap {
            value_type: ty,
            entries,
        })
    }
}

pub fn hash_mapify_impl(item: TokenStream) -> TokenStream {
    // Parse input token stream as `ParsedMap` defined by us.
    // This will use the logic from parse trait we implemented
    // earlier.
    let input = parse_macro_input!(item as ParsedMap);

    let key_value_pairs = input.entries;
    let ty = input.value_type;

    // Generate the output hashmap inside a code block so that
    // we don't shadow any existing variables. Return the hashmap
    // from the block.
    quote!({
        // Create a new hashmap with `String` for key type and `#ty` for
        // value type, which is parsed from the macro input arguments.
        let mut hash_map = std::collections::HashMap::<String, #ty>::new();

        // Insert all key-value pairs into the hashmap.
        #(
            hash_map.insert(#key_value_pairs);
        )*

        // Return the generated hashmap
        hash_map
    })
    .into()
}

How to Use the `hash_mapify` Macro

We can verify that our macro works by writing a simple usage:

// src/main.rs

fn main() {
    test_hashmap();
}

fn test_hashmap() {
    let some_variable = "Some variable value";

    let hash_map = hash_mapify!(
        &str,
        "first_key" = "first_value",
        "second_variable" = some_variable,
        some_key = "value for variable key",
    );

    let number_hash_map =
        hash_mapify!(usize, "first_key" = 1, "second_variable" = 2, some_key = 3,);

    dbg!(hash_map);
    dbg!(number_hash_map);
}

If you run this code, you should see the following output:

[src/main.rs:62:5] hash_map = {
    "first_key": "first_value",
    "some_key": "value for variable key",
    "second_variable": "Some variable value",
}
[src/main.rs:63:5] number_hash_map = {
    "second_variable": 2,
    "first_key": 1,
    "some_key": 3,
}

which is what we would expect to happen.

And now that we've covered all three types of procedural macros, we're going to wrap up the examples here.

Beyond Writing Macros

Now that you've learned how to write basic procedural macros, I'd like to take some time to quickly introduce some additional tools and techniques that will be helpful when working with macros. I'll also point out some of their drawbacks, and when to avoid them.

Helpful Crates/Tools

cargo-expand

This is a CLI tool that can generate macro expanded code for any file in your project. It's another great project by David Tolnay. You do need the nightly toolchain for Rust to use this, though. Don't worry – it's only required for the tool itself to work. You don't need to make your project use the nightly toolchain as well. Your project can stay in the stable zone.

Install nightly toolchain:

rustup toolchain install nightly

Install cargo-expand:

cargo install cargo-expand

Now that this is done, you can see what the actual expansion of your code in main looks like. Simply run the following in the my-app project directory:

cargo expand

and it will output the expanded code in the terminal output. You will see some unfamiliar stuff as well, such as what the dbg! macro expands to, but you can ignore those.

trybuild & macrotest

These are 2 crates that are extremely useful if you want to unit-test your procedural macros' expanded forms, or assert any expected compilation errors.

Downsides of Macros

Debugging (or lack thereof)

You cannot put a breakpoint into any line of code that is generated by the macro. Nor can you get to it from the stacktrace of an error. This makes debugging generated code very difficult.

In my usual workflow, I either put logging into the generated code, or if that is not enough, I replace the usage of macro with the code given to me by cargo expand temporarily to debug it, make changes, and then update the macro code based on that.

There might be better ways out there, and if you know any, I'd be grateful if you can share them with me.

Compile Time Costs

Macro expansion has a non-zero cost: the compiler needs to run the macro, process its output, and then check that the generated code is valid. This becomes even more expensive when recursive macros are involved.

As a very crude estimation, each macro expansion adds 10ms to the compile time of the project. If you're interested, I encourage you to read through this introduction on how the compiler processes macros internally.

Lack of Auto-complete and Code Checks

Code written as part of a macro output isn't presently supported fully by any IDE, nor is it supported by rust-analyzer. So in most cases, you're writing code without relying on features such as auto-complete, auto-suggestions, and so on.

Where Do We Draw the Line?

Given the insane potential of macros, it's very easy to get carried away with them. It's important to remember all of the drawbacks and make decisions accordingly, ensuring that you are not indulging in premature abstraction.

As a general rule, I personally avoid implementing any "business logic" with macros. I also avoid writing macros that generate code I will need to step through with a debugger time and again, or code that I will need to make small changes to for performance testing and improvement.

Wrapping Up

This was a long journey! But I wanted anyone with basic knowledge and experience with Rust to be able to follow and come out of this able to write macros in their own projects.

Hopefully, I was able to do that for you. I will be writing a lot more about macros in general, so stay tuned for that.

You can find the complete code for everything we looked at in this article in the https://github.com/anshulsanghi-blog/macros-handbook repository.

Also, feel free to contact me if you have any questions or opinions on this topic.

Enjoying my work?

Consider buying me a coffee to support my work!


Till next time, happy coding and wishing you clear skies!

Rust Tutorial – How to Build a Naïve Star Detector for Images

Anshul Sanghi — Tue, 16 Apr 2024 19:34:07 +0000

Star detection is a crucial step in many of the processing and analysis routines that we perform on astronomical images. It is extremely important for a process called plate-solving, which is the process of figuring out which part of the sky an image shows, or which part of the sky your telescope is pointed at.

All modern telescope mounts can make use of plate solving software to automatically figure out where they're pointed at, and in which direction they need to move to point at the correct location.

Star detection, sometimes, is also used in correcting the effect of atmosphere on the sharpness of targets such as galaxies. It is also crucial for combining astronomical images from multiple nights, telescopes, locations and so on into a single output image that has a very high signal-to-noise ratio.

With this tutorial, I'd like to introduce a very naïve technique for detecting stars in an image.

A quick note:

Star detection is a very complex topic, and I've only scratched the surface both in my own understanding and in this article.

The steps I use and describe in this article are derived from public documentation on existing real world applications (both for star detection and for edge detection), as well as some blog posts from incredibly knowledgeable people (which I link to at the end of the article, be sure to check them out).

As such, this implementation is intended for learning purposes only.

Before You Read

Prerequisites for the first part of the tutorial

The process described builds upon the concept of multi-scale processing of images using the à trous wavelet transform. If you're not aware of what that is, I encourage you to learn more about it in my previous article on the topic, and then come back to this one.

This article also assumes that you have a basic understanding of Centroids. Just knowing what they mean is enough, as you don't have to calculate them yourself. Since the article focuses on image processing and analysis, a basic understanding of how pixels work in digital format is helpful, but not mandatory.

Prerequisites for the second part of this tutorial

Here, we focus on implementing the algorithm using the Rust programming language, without going much into the details of the language itself. So being comfortable writing Rust programs and reading crate documentation is required.

If this is not you, you can still read Part 1 and learn the technique, and then maybe try it out in a language of your choice.

If you're not familiar with Rust, I highly encourage you to learn the basics. Here's an interactive Rust course that can get you started.

How Star Detection Works
How to Implement it in Rust
Further Reading
Wrapping Up

How Star Detection Works

Since this process involves a lot of steps, let's see how it works, with an increasing level of detail about what actually happens as we go along. With each increasing level, we'll be unwrapping the black box bit by bit.

What is Star Detection?

Star detection, in its simplest form, involves isolating the stars from the rest of the image, and then performing edge detection on the result.

1. Input image

2. Detected stars visualised using green circles

How Star Detection Works

First, you try to extract away the pixels that you think might be stars from the rest of the pixels in the image. This new image, that only contains the extracted pixels, is then analysed using edge detection techniques to find the star positions in 2D space.

1. Input image

2. Extracted pixels that are potentially stars

3. Detected stars visualised using green circles

An Intermediary Look At The Process

First, you decompose your input image into multiple layers, each layer containing a part of the original data such that adding all layers gives you back the original data.

You then isolate the layers that would only contain small sized structures, such as noise and stars, and throw away the rest of the data.

Different layers of structure in the image that the input is decomposed into. We throw away the final layer and retain the rest in this example

With this filtered data, you find the edges in the image using the contouring technique (which is explained in the next section). Each contour gives us multiple "points" in the 2D space. You then try to draw a closed shape using the points you have.

Once you've done this, all you need is to find the center of this shape and you have the location of the stars.

1. Input image

2. Image after decomposing into layers and throwing away large scale data

3. Image after binarisation

4. Detected contours visualised using green outlines

5. Detected stars visualised using green circles

Picking It Apart

Using a multi-scale analysis technique facilitated by the à trous transform algorithm, you break down the image into multiple layers, each containing different scaled structures from the original image. You take the layers containing smaller scale structures and throw away the rest.

To these layers, you apply a bilateral denoising filter to reduce noise so that you can ensure that you're only left with stars and not noise that the algorithm might pick up as stars later on.

Different layers of structure in the image that the input is decomposed into. We throw away the final layer and retain the rest in this example.

1. Input image

2. Image after decomposing into layers and throwing away large scale data

3. Noise reduced image

Once you've filtered out the noise, you binarize your image using thresholding. Binarization by thresholding is the process of converting every pixel to either pure black or pure white, so that the image is easier to work with. You do this by selecting a certain intensity value: all pixels with an intensity at or below this value become black, and all pixels with an intensity above it become white.

To find the optimum intensity value to binarize the image with, you define a minimum number of stars that you expect to find in the image, which is usually determined based on what you actually need to do with your star locations.

In our example, we'll start with a minimum of 500 and slowly push it to the limit of the sample image to see what happens.

Binarizing noise-reduced and wavelet filtered image

This makes the process of edge detection (which is the next step in our process) using contouring much more reliable.

Contouring is a term that describes the process of figuring out where the structures are in your image, and drawing a border along those structures – these are known as contours.

It is similar to edge-detection, but edge-detection helps you differentiate between individual neighbouring pixels, whereas contours are designed to work with a complete boundary of any structures in an image.

The library we'll be using finds the contours in an image using the algorithm proposed by Suzuki and Abe: Topological Structural Analysis of Digitized Binary Images by Border Following. Contouring in this manner will give you a collection of points that lie on the border of each contour.

For each contour it finds, you create a polygon by joining all of the border points within that contour. If this shape is an open shape, then you just extrapolate the final border to create a polygon, which needs to be a closed shape. You then use the centroid formulae on this polygon to find the center of mass of your shape, which gives you the center of your star (in most cases).

You also need to find the Euclidean distances between the center of mass and each border point, the longest of which becomes the size of the star.
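To make these two steps concrete, here is a minimal, dependency-free sketch. It assumes the area-weighted (shoelace) centroid formula for a closed polygon, which may differ slightly from what a geometry library computes for a line-based shape. The centroid_and_radius name and the tuple-based point type are hypothetical choices made only for this illustration:

```rust
/// Returns the centroid of the closed polygon described by `points`,
/// together with the largest Euclidean distance from that centroid
/// to any border point. Returns `None` for an empty point list.
fn centroid_and_radius(points: &[(f32, f32)]) -> Option<((f32, f32), f32)> {
    if points.is_empty() {
        return None;
    }

    let n = points.len();
    let (mut area2, mut cx, mut cy) = (0.0_f32, 0.0_f32, 0.0_f32);

    for i in 0..n {
        let (x0, y0) = points[i];
        // Wrapping around to the first point closes the shape.
        let (x1, y1) = points[(i + 1) % n];
        let cross = x0 * y1 - x1 * y0;
        area2 += cross; // accumulates twice the signed area
        cx += (x0 + x1) * cross;
        cy += (y0 + y1) * cross;
    }

    let center = if area2.abs() > f32::EPSILON {
        // Centroid = sum / (6 * area) = sum / (3 * area2)
        (cx / (3.0 * area2), cy / (3.0 * area2))
    } else {
        // Degenerate shape (single point or a line): use the plain mean.
        let sum_x: f32 = points.iter().map(|p| p.0).sum();
        let sum_y: f32 = points.iter().map(|p| p.1).sum();
        (sum_x / n as f32, sum_y / n as f32)
    };

    // The star size is the longest centroid-to-border distance.
    let radius = points
        .iter()
        .map(|&(x, y)| ((x - center.0).powi(2) + (y - center.1).powi(2)).sqrt())
        .fold(0.0_f32, f32::max);

    Some((center, radius))
}
```

For a square with corners at (0, 0), (4, 0), (4, 4) and (0, 4), this gives a center of (2, 2) and a radius equal to the distance to a corner, which is the square root of 8.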

Contouring the binarized image to find closed polygons around stars visualised here using green outlines

Once you have your star size, you reject any stars that are either smaller than 1 pixel or larger than 24 pixels. These are educated guesses that I use, and they seem to give me the best results for sample images (but this is definitely a potential point of improvement).

After all of this, you should have the x and y coordinates of the star, as well as its size in pixels.

Detected stars visualised using green circles around them

We're going to stop there, but there's a lot more that you can do after this step to remove false-positives and fix the centroid/size of stars.

How to Implement it in Rust

Let's create a new library project:

cargo new --lib stardetector && cd stardetector

Prerequisites

You need a couple of dependencies to get started. Let's add them and I'll explain why you need them:

cargo add image imageproc image-dwt geo

image is a Rust library we'll use to work with images of all of the standard formats and encodings. It also helps us convert between various formats, and provides easy access to pixel data as buffers.
imageproc is another library from the maintainers of the image library. It extends image with image processing functions and algorithms.
image-dwt is my own library (shameless plug) that implements the à trous wavelet decomposition algorithm for the image crate. You need it to break the image down into the multiple scales that I mentioned previously.
geo is a Rust library that allows us to easily work with geometric types (like points in 2D space), shapes (such as polygons), and algorithms implemented for them. We use this library to build our polygon based on contour data, and to also find the centroid of the polygon that I described above. It also helps us compute Euclidean distances between points, which we use for determining star size.

How to read and decompose the input image

You start by reading the input image and decomposing it so that you're only left with stars (and noise).

You need to define a new struct that will act as a wrapper for your input image, and add a constructor for it to create an instance of this struct based on input:

// lib.rs
use image::{DynamicImage, GrayImage};

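// Clone is needed later, when the threshold optimiser
// works on a copy of the detector.
#[derive(Clone)]
pub struct StarDetect {
    source: GrayImage,
}

impl From<DynamicImage> for StarDetect {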
    fn from(source: DynamicImage) -> Self {
        Self {
            source: source.to_luma8(),
        }
    }
}

You then need to add the ability to extract the first n layers from wavelet decomposition of your image:

// lib.rs

use image_dwt::kernels::LinearInterpolationKernel;
use image_dwt::recompose::{OutputLayer, RecomposableWaveletLayers};
use image_dwt::transform::ATrousTransform;

impl StarDetect {
    fn extract_small_scale_structures(&mut self) {
        let (width, height) = self.source.dimensions();

        // Decompose the image into 8 layers
        let filtered_image = ATrousTransform::new(
            &DynamicImage::ImageLuma8(self.source.clone()),
            8,
            LinearInterpolationKernel,
        )
        // Filter out the residue image and keep the rest
        .filter(|item| item.pixel_scale.is_some())
        // Recompose the remaining layers into a grayscale image.
        .recompose_into_image(width as usize, height as usize, OutputLayer::Grayscale);

        // Update the source image that we will work with
        // going forward.
        self.source = filtered_image.to_luma8();
    }
}

Noise reduction

Now that you have the input image (which should only contain noise and stars), let's get rid of the noise:

// lib.rs

impl StarDetect {
    fn apply_noise_reduction(&mut self) {
        // Arguments after the image: window size (10px),
        // colour sigma (10.0) and spatial sigma (3.0).
        self.source = imageproc::filter::bilateral_filter(&self.source, 10, 10., 3.);
    }
}

Next, you need to determine the optimum threshold value for a given minimum star count. You find it by starting from the highest possible value and iteratively lowering it until the star count is at least the minimum.

How to optimize the threshold and binarization

Start by creating a new file threshold.rs and defining a trait with necessary methods. You need a method to optimise your threshold value and another for performing the binarization operation:

// threshold.rs

pub(crate) trait ThresholdingExtensions {
    fn optimize_threshold_for_star_count(&self, min_star_count: usize) -> u8;
    fn binarize(&mut self, threshold: u8);
}

Let's implement both of these:

// threshold.rs

use crate::centroid::find_star_centres_and_size;
use crate::StarDetect;

impl ThresholdingExtensions for StarDetect {
    fn optimize_threshold_for_star_count(&self, min_star_count: usize) -> u8 {
        // Current star count
        let mut star_count = 0;

        // Starting threshold value
        let mut threshold = u8::MAX;

        // Iterate until you've found the best threshold
        while star_count < min_star_count {
            // Panic if we reach the 0 intensity value while iterating.
            // This means that there are fewer stars than we hoped for.
            if threshold == 0 {
                panic!("Maximum iteration count reached");
            }

            // Reduce threshold to 95% of its previous value.
            // Using this, we check finer and finer differences
            // in threshold for each iteration.
            threshold = (0.95 * threshold as f32) as u8;

            // Clone the source data since we need to modify it
            // without affecting original data.
            let mut source = self.clone();

            // Binarize the source data image using current threshold
            ThresholdingExtensions::binarize(&mut source, threshold);

            // Find the number of stars detected with the current threshold
            star_count = find_star_centres_and_size(&source.source).len();
        }

        threshold
    }

    fn binarize(&mut self, threshold: u8) {
        // Iterate over every pixel in source image
        for pixel in self.source.iter_mut() {
            if *pixel > threshold {
                // If pixel intensity is greater than threshold
                // set it to maximum intensity instead.
                *pixel = u8::MAX;
            } else {
                // Otherwise, set it to 0 intensity.
                *pixel = 0;
            }
        }
    }
}

You might notice that we use the find_star_centres_and_size function when trying to find the optimised threshold value. We'll get to that shortly, as we need to declare some types that will hold the state of our computation before we implement the function.
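To get a feel for how the threshold shrinks, the following standalone sketch lists the values the loop above would try, assuming it never finds enough stars and runs all the way down to 0. The threshold_schedule function is a hypothetical helper written for this illustration and is not part of the library:

```rust
/// Lists every threshold value tried by repeatedly reducing the
/// threshold to 95% of its previous value, starting from u8::MAX.
fn threshold_schedule() -> Vec<u8> {
    let mut threshold = u8::MAX;
    let mut tried = Vec::new();

    while threshold != 0 {
        // Same update rule as in optimize_threshold_for_star_count.
        // The cast truncates towards zero, so the value always drops
        // by at least 1 and the loop is guaranteed to terminate.
        threshold = (0.95 * threshold as f32) as u8;
        tried.push(threshold);
    }

    tried
}
```

The first values are 242, 229, 217 and 206. The steps get smaller as the threshold approaches 0, so the search becomes finer at lower intensities, where fainter stars are found.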

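Create a new file centroid.rs. Both new files also need to be declared as modules in lib.rs (mod centroid; and mod threshold;) so that the crate::centroid and crate::threshold paths resolve.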

Define a new struct that will hold the coordinates and size of the star:

// centroid.rs

use imageproc::point::Point;

#[derive(Eq, PartialEq, Copy, Clone, Debug)]
pub struct StarCenter {
    coord: Point<u32>,
    radius: u32,
}

impl StarCenter {
    pub fn coord(&self) -> &Point<u32> {
        &self.coord
    }
    pub fn radius(&self) -> u32 {
        self.radius
    }
}

We've also defined methods to retrieve these fields. Point is a type provided by the imageproc crate to store coordinates in an image.

How to construct polygons around stars

We're going to implement this function inside out. We first need a way to construct our polygon from contours. Let's implement that:

// centroid.rs

use geo::LineString;
use imageproc::contours::Contour;

pub(crate) fn construct_closed_polygon(contour: &Contour<u32>) -> LineString<f32> {
    // Create a new line string that connects all points
    // in the contour. This can create either an open
    // or a closed shape.
    let mut line_string = LineString::from_iter(contour.points.iter().map(|point| Coord {
        x: point.x as f32,
        y: point.y as f32,
    }));

    // If it is an open shape, close the shape to create a
    // polygon. This does nothing otherwise.
    line_string.close();

    line_string
}

Contour is a type provided by the imageproc crate, and it is what the crate returns as the result of a contouring operation on an image. It contains a list of points that lie on the border of the contour.

LineString is a type provided by geo and is defined by them as "An ordered collection of two or more Coords, representing a path between locations.". In this case, we use this type to construct the polygon shape.

How to detect star size and location using contours

Next, you need a way to compute the StarCenter type we declared previously from contour data:

// centroid.rs

use geo::{Centroid, Coord, EuclideanDistance};

pub(crate) fn filter_map_contour_to_star_centers(contour: &Contour<u32>) -> Option<StarCenter> {
    // If there are no points in the contour
    // it is not a star.
    if contour.points.is_empty() {
        return None;
    }

    if contour.points.len() == 1 {
        // If there's only 1 point in the contour
        // consider it to be the center of the star
        // of size 1px.
        let center = contour.points.first().unwrap();
        let radius = 1_u32;

        return Some(StarCenter {
            coord: *center,
            radius,
        });
    }

    // Otherwise, construct a polygon around the star based on
    // contour information.
    let polygon = construct_closed_polygon(contour);

    // Find the centre of gravity of this polygon (centroid)
    let center = polygon.centroid().unwrap();

    // Find the radius of the star based on maximum distance between
    // the centroid and any of the points in contour.
    let radius = polygon.points().fold(0., |distance, point| {
        point.euclidean_distance(&center).max(distance)
    });

    // If the radius is less than 1px or more than 24px
    // we reject it as a non-star.
    if !(1. ..=24.).contains(&radius) {
        return None;
    }

    // Construct star center based on previously computed information
    Some(StarCenter {
        coord: Point {
            x: center.x() as u32,
            y: center.y() as u32,
        },
        radius: radius as u32,
    })
}

This function utilises the construct_closed_polygon function you defined previously to compute the final star centers and sizes. Now for the easy part: let's implement the missing find_star_centres_and_size:

// centroid.rs

use image::GrayImage;

pub(crate) fn find_star_centres_and_size(image: &GrayImage) -> Vec<StarCenter> {
    // Compute the contours in source image
    let contours = imageproc::contours::find_contours::<u32>(image);

    contours
        .iter()
        // Iterate over all contours and create a list
        // of star center and size data.
        .filter_map(filter_map_contour_to_star_centers)
        .collect()
}

How to encapsulate the process

All you need now is to implement one last method on the StarDetect struct that encapsulates the entire process:

// lib.rs

use crate::centroid::{find_star_centres_and_size, StarCenter};
use crate::threshold::ThresholdingExtensions;

impl StarDetect {
    pub fn find_stars(&mut self, min_stars: usize) -> Vec<StarCenter> {
        self.extract_small_scale_structures();
        self.apply_noise_reduction();

        let threshold = self.optimize_threshold_for_star_count(min_stars);
        self.binarize(threshold);

        find_star_centres_and_size(&self.source)
    }
}

This method only calls the functions we've written so far. The user of your library will only need to call this function and nothing else.

You can now use what you've created to find stars in an image. For this article going forward, the image I'll be using to demonstrate is shown below. If you'd like to follow along, you can download the image I'll be using from here.

M42 Orion Nebula, The Dark Horse Nebula, The Flaming Star Nebula And The Surrounding H-Alpha Gas

As you might notice, we have a wide range of star shapes, sizes and colors in this image, but the same goes for noise and other large-scale nebulae structures too.

How to test the implementation on astronomical images

Create a new file src/main.rs and declare it as a binary target in the Cargo.toml file. The manifest should look like this:

[package]
name = "stardetector"
version = "0.1.0"
edition = "2021"

[[bin]]
name = "stardetector"
path = "src/main.rs"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
geo = "0.28.0"
image = "0.25.1"
image-dwt = "0.3.2"
imageproc = "0.24.0"

You can finally use the lib we created to process the sample image. The final code in main.rs should look like this:

use image::Rgba;
use stardetector::StarDetect;

fn main() {
    // Load the image as mutable. You need mutability so that
    // you can draw on this image.
    let mut image = image::open("m42-star-detection.jpg").unwrap();

    // Create a new star detector instance. You clone the image
    // here because you need to also draw on the image for
    // visualisation purposes in this example.
    let mut star_detector = StarDetect::from(image.clone());

    // Run the star finder function with a minimum star count of
    // 500
    let stars = star_detector.find_stars(500);

    // Iterate over all stars you've found
    for star in stars {
        // Draw a hollow circle on the image so that you
        // can see what the algorithm found
        imageproc::drawing::draw_hollow_circle_mut(
            &mut image,
            (star.coord().x as i32, star.coord().y as i32),
            // Extend the radius by 4px so that it's easier to see
            // in the visualisation.
            star.radius() as i32 + 4,
            // Draw the circle with a pure, fully opaque green color
            // (the alpha channel also ranges from 0 to 255).
            Rgba([0, u8::MAX, 0, u8::MAX]),
        );
    }

    // Save the image with star positions annotated with
    // green circles.
    image.save("annotated.jpg").unwrap();
}

Ensure that the downloaded image is present at the root of this project folder.

We can finally run the program and see what it gives us:

cargo run --release

A part of the orion region with detected stars annotated with green circles

That looks pretty good! If we zoom in to a small part of the image:

A part of the orion region with detected stars annotated with green circles

We can see that there are some minor issues with the algorithm. For example, stars that are very close to each other and whose halos overlap are detected as a single star. The problem is quite an interesting one.

There are various techniques to solve this issue, but they're out of the scope of this article.

How to optimize minimum star count

Let's crank up the minimum star count to 1000 and see what happens:

A part of the orion region with detected stars annotated with green circles

This time, it picked up many of the fainter stars since the threshold had to be lower to accommodate for the higher minimum star count.

It's time to crank it up further! Let's try 2000.

A part of the orion region with detected stars annotated with green circles

It picked up even more stars this time, but it has also started hallucinating some stars where there are none. This is caused by the lower threshold retaining more noise in the image, which is then picked up as a star. But noise isn't as visible in the final image unless you really pixel-peep, which is why it appears that the algorithm is hallucinating stars.

Noise, in this particular situation, refers not only to noise in the traditional sense, but also to any pixels that do not belong to a star.

But there is one more thing...

Let's crank the minimum star count up to the absolute maximum for this particular image, which I found to be 3500.

A part of the orion region with detected stars annotated with green circles

The algorithm now seems to have failed us miserably, which is expected when the noise is too high. There are too many false-positives for this data to be of any use at all.

I wanted to show you this anyway because it shows you the flaws in the algorithm. It also shows you what star detection on a noisy signal looks like, and why we need to pre-process an image to remove everything that isn't a star before we run the star detection.

We're going to stop here for the implementation, but there are many resources you can find below if you're interested in learning more about the topic.

The complete code for everything I talked about today can be found here: https://github.com/anshulsanghi-blog/stardetector

Wrapping Up

I hope you enjoyed the journey so far. If image processing and analysis techniques or their implementation in Rust is something that interests you, then stay tuned for more as these are the topics I love writing about.

Also, feel free to contact me if you have any questions or opinions on this topic.

Enjoying my work?

Consider buying me a coffee to support my work!

Till next time, happy coding and wishing you clear skies!

Rust Tutorial – Learn Multi-Scale Processing of Astronomical Images

Anshul Sanghi — Wed, 10 Apr 2024 15:48:11 +0000

Recently, there's been a massive amount of effort put into developing novel image processing techniques. And many of them are derived from digital signal processing methods such as Fourier and Wavelet transforms.

These techniques have not only enabled a wide range of image processing techniques such as noise reduction, sharpening, and dynamic-range extension, but have also enabled many techniques used in computer vision such as edge detection, object detection, and so on.

Multi-scale analysis is one of the newer techniques (relatively speaking) that has been adopted in a wide range of applications, especially in the astronomical image and data processing applications. This technique, which is based on Wavelet transform, allows us to divide our data into multiple signals, that all add up to make the final signal.

We can then perform our processing or analysis work on these individual sub-signals, allowing us to do targeted operations that do not affect other sub-signals.

In this tutorial, we'll first be exploring what the technique is all about, through the lens of a particular algorithm for performing multi-scale analysis on images. We'll then move on to looking at how we can implement what we discussed in the first part in the Rust programming language and recreate the examples you see in the first half of the article.

Before You Read:

Prerequisites for Part 1:

The technique described is derived from the concept of "Wavelet Transforms". You don't need to know everything about it, but a very basic understanding will help you grasp the material better.

Since the article focuses on image processing and analysis, a basic understanding of how pixels work in digital format is helpful, but not mandatory.

Prerequisites for Part 2:

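Here, we focus on implementing the algorithm using the Rust programming language, without going much into the details of the language itself. So being comfortable writing Rust programs is required.

If this is not you, you can still read Part 1 and learn the technique, and then maybe try it out in a language of your choice. If you're not familiar with Rust, I highly encourage you to learn the basics. Here's an interactive Rust course that can get you started.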

Part 1: Understanding Multi-Scale Processing Technique And Algorithm
Part 2: How to Implement À Trous Transform in Rust
Wrapping Up

Part 1: Understanding the Multi-Scale Processing Technique and Algorithm

So what do we mean when we talk about multi-scale processing or analysis of some data? Well, we usually mean breaking down the input data into multiple signals, each representing a particular scale of information.

Scale, when talking about image analysis, simply refers to the size of structures that we are looking at at any given time. It ignores everything else that's either smaller or larger than the current scale.

What is multi-scale image processing?

For images, "scales" generally refer to the size in pixels of various structures or details in the image. You'll be able to get an intuitive understanding by looking at the following example:

Messier 33, AKA Triangulum Galaxy

Assuming our naïve understanding is correct, we can derive images of at least the following 3 scales:

Very small structures, usually the size of a single pixel. This layer, when separated from the rest of the image, will only contain the noise and some sharp stars for the most part.
Small structures, usually a few pixels in size. This layer, when separated, will contain all of the stars and the very fine details in the galaxy arms.
Large and very large scale structures, usually 100s of pixels in size. This layer, when separated, will contain the general size and shape of the galaxy at the center.

Now the question becomes, why do we need to do all of this in the first place?

The answer is simple: it allows us to make targeted enhancements and changes to an image.

For example, noise reduction on the overall image will usually result in a loss of sharpness in the galaxy. But since we have broken our image down into multiple scales, we can easily apply noise reduction to only the first few layers, as most of the random noise that is easy to remove resides only in lower scale layers.

We then re-combine the noise-reduced low-scale layers with unmodified large-scale ones, and we have an output that gives us noise reduction without a loss in quality.

Another peculiar thing about noise is that it's almost always present in just one of these layers, making noise reduction process both easy and non-destructive.

If you're more of a visual learner, let's see this in practice using the image we used above. We're gonna be working with the following grayscale version of that image, where I've also added random gaussian noise:

Messier 33 AKA Triangulum Galaxy, Converted to grayscale and with added Gaussian noise

Performing scale-based layer separation on this image, we get the following results. Note that the results are rescaled to a range where they can be viewed as an image for representational purpose. The actual transform produces pixel values that don't make sense when looked at independently, but all of the techniques and calculations described in this tutorial can still be safely applied without rescale. The recomposition process automatically gives us back the correct range:

9-level À Trous Decomposition. From top-left to bottom-right, we have images at the following pixel scales: 1, 2, 4, 8, 16, 32, 64, 128, 256 (powers of 2)

The first and second layers contain the noise and stars. In this particular example, noise is mixed in with the stars. But using the first and second layers, we can easily target areas that are not present in the second layer, as we can be sure that those are where the noise is present in the first layer.
With the third layer, we still see the residue luminance from stars. But if you look closely, we also see very faintly the arms of the galaxy starting to appear.
From the fourth layer onwards, we see the galaxy at varying scales and detail levels, completely without the stars. We start with the finer details (relatively small scale details) and increasingly move on to larger and larger scale samples. By the end, we only see a vague shape where the galaxy used to be.
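As for the rescaling mentioned earlier, one simple way to make a layer viewable is min-max normalisation. The sketch below assumes that approach (the images in this article may have been produced differently), and rescale_for_display is a hypothetical helper name used only for this illustration:

```rust
/// Maps arbitrary layer values (which may be negative) linearly onto
/// the 0..=255 range so that the layer can be viewed as an image.
fn rescale_for_display(layer: &[f32]) -> Vec<u8> {
    let min = layer.iter().copied().fold(f32::INFINITY, f32::min);
    let max = layer.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;

    layer
        .iter()
        .map(|&value| {
            if range == 0.0 {
                // A completely flat layer has no contrast to show.
                0
            } else {
                (((value - min) / range) * 255.0).round() as u8
            }
        })
        .collect()
}
```

For example, the values -1.0, 0.0 and 3.0 map to 0, 64 and 255. This is purely for visualisation: the processing itself keeps working on the original, unscaled values.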

From here on, we can selectively apply noise reduction to the first two layers. Then we can recombine all of the layers to create the following image that has very little noise while preserving the same amount of details in the stars and the galaxy arms:

Messier 33 AKA Triangulum Galaxy, result of recombining all layers but with noise reduction applied to the pixel scale 1 & 2 layers

In its most basic form, multi-scale analysis involves breaking up your source image, commonly referred to as the "signal", into multiple "signals" – each containing the data for a particular scale in the source signal.

Scale, when talking about image signal here, refers to the distance between adjacent pixels that we take when creating the layer from the source image.

In practice, this technique is used as one of the first steps in all kinds of astronomical data analysis and image processing.

As an example, you can use the technique to detect locations of stars while ignoring larger structures much more easily than would be possible otherwise.

The À Trous Wavelet Transform

All of what I've shown you previously, and all of what you're going to see in this tutorial, was achieved with wavelet decomposition and recomposition using the à trous algorithm for discrete wavelet transforms.

This algorithm has been used throughout the years for various applications. But it's become particularly important recently in astronomical image processing applications, where different objects and signals in an image can be completely separated based on structural scales.

Here's how the algorithm works:

We start with the source image input and the number of levels to decompose it into.
For each level n, starting from 0:
- We convolve the image with our scaling function (we'll see what this is in a bit), where adjacent pixels are considered to be 2ⁿ units apart from each other, giving us the result result_n. This is where the "À Trous" name comes from, which literally translates to "with holes".
- The layer output output_n is then computed using input - result_n.
- We then update input to equal result_n. This is also known as residue data, which serves as the source data for the next layer.
Repeat the above steps for all levels.
In the end, for a 9-level decomposition like the one shown earlier, we have 9 wavelet layers and 1 residue layer. All 10 layers are required for the recomposition.

For a more mathematical approach to understanding this algorithm, I encourage you to read about the à trous algorithm here.

The recomposition process is very straightforward: we just need to add all 10 layers together. We can choose to apply a positive or negative bias to any of the layers, which is a factor by which to multiply the layer pixel values during recomposition. You can use it either to enhance or diminish the characteristics of that particular layer.
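To make the decomposition and recomposition steps concrete, here is a minimal sketch on a 1D signal instead of a 2D image. The 3-tap kernel [1/4, 1/2, 1/4], the clamping at the edges, and all function names are assumptions chosen for illustration.

```rust
/// Convolve `input` with a 3-tap kernel whose neighbours are `distance`
/// samples away from the centre (the "holes").
fn convolve_with_holes(input: &[f32], distance: usize) -> Vec<f32> {
    let last = input.len() as isize - 1;
    let kernel = [0.25_f32, 0.5, 0.25];

    (0..input.len())
        .map(|i| {
            kernel
                .iter()
                .zip([-1_isize, 0, 1])
                .map(|(weight, offset)| {
                    // Assumption: out-of-range positions use the nearest sample.
                    let j = (i as isize + offset * distance as isize).clamp(0, last);
                    weight * input[j as usize]
                })
                .sum::<f32>()
        })
        .collect()
}

/// Returns `levels` wavelet layers followed by one residue layer.
fn decompose(signal: &[f32], levels: u32) -> Vec<Vec<f32>> {
    let mut input = signal.to_vec();
    let mut layers: Vec<Vec<f32>> = Vec::new();

    for n in 0..levels {
        let result = convolve_with_holes(&input, 2_usize.pow(n));
        // output_n = input - result_n
        layers.push(input.iter().zip(&result).map(|(a, b)| a - b).collect());
        // result_n becomes the input (residue) for the next level.
        input = result;
    }

    layers.push(input);
    layers
}

/// Sum all layers, multiplying each one by its bias factor.
fn recompose(layers: &[Vec<f32>], bias: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0; layers[0].len()];
    for (layer, factor) in layers.iter().zip(bias) {
        for (o, v) in out.iter_mut().zip(layer) {
            *o += factor * v;
        }
    }
    out
}
```

With a bias of 1 for every layer, the subtractions cancel out and the sum gives back the original signal (up to floating-point rounding). A bias below 1 diminishes a layer, and a bias above 1 enhances it.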

Scaling Functions

Scaling functions are specific convolution kernels that help us better represent data at a particular scale based on our use case. There are 3 most commonly used scaling functions, which are shown below:

The images above show the 3 most commonly used scaling functions in the À Trous algorithm, visualised using 3rd level decomposition of the triangulum galaxy image used previously:

B3 Spline is a very smooth kernel. It is mostly used to isolate large scale structures. If we wanted to sharpen our galaxy, we would have used this kernel.
Low-scale is a very sharply peaked kernel, and is best at working with small scale structures.
Linear interpolation kernel gives us the best of both worlds, and hence is used when we need to work with both small scale and large scale structures. This is what we have used in all of our previous examples.

Convolution Pixels At Each Scale

I mentioned in the algorithm that at each scale, the pixels in the image are considered to be 2ⁿ units apart. Let's try to get a better understanding of this using the following visualisation:

Consider the following 8px by 8px image. Each pixel is labeled 1 through 64, which is their index.

A representational pixel grid of a 8x8px image

For this example, we're going to focus on the convolution operation for just one of the center pixels, say pixel number 28.

Scale 0: At scale 0, the value of 2ⁿ becomes 1. This means that for convolution, we'll consider pixels that are 1 unit apart from our target center pixel. These pixels are highlighted below:

8x8px grid with pixels that are involved in convolution for pixel number 28 highlighted at scale 0

Scale 1: This is where things get interesting. At scale 1, the value of 2ⁿ becomes 2. This means that for convolution, we'll jump directly to pixels that are 2 locations apart from the target pixel:

8x8px grid with pixels that are involved in convolution for pixel number 28 highlighted at scale 1

As you can see, we've created "holes" in our computation of the value of the target pixel by skipping 2ⁿ - 1 adjacent pixels and selecting the 2ⁿth pixel. This is the basis of the algorithm.

This process is repeated for every pixel in the image, just like a regular convolution process. And each time, we consider increasing distances between pixels for computation of final values at increasing scales.
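The pixel selection described above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes a 3x3 kernel and the 1-based, row-major labels used in the grid; the function name `sampled_labels` is hypothetical, and positions that fall outside the image are simply skipped here.

```rust
/// Labels (1-based, row-major) of the pixels that a 3x3 kernel samples
/// around `target` in a `size` x `size` grid at the given scale.
/// Positions outside the image are skipped in this sketch.
fn sampled_labels(target: usize, size: usize, scale: u32) -> Vec<usize> {
    // Distance between sampled pixels, i.e. 2^scale.
    let distance = 2_isize.pow(scale);
    let row = ((target - 1) / size) as isize;
    let col = ((target - 1) % size) as isize;

    let mut labels = Vec::new();
    for dy in [-1, 0, 1] {
        for dx in [-1, 0, 1] {
            let (r, c) = (row + dy * distance, col + dx * distance);
            let inside = 0..size as isize;
            if inside.contains(&r) && inside.contains(&c) {
                labels.push(r as usize * size + c as usize + 1);
            }
        }
    }
    labels
}
```

For pixel 28 in the 8x8 grid, scale 0 gives the 9 directly adjacent labels (19 to 21, 27 to 29, 35 to 37), while scale 1 gives 10, 12, 14, 26, 28, 30, 42, 44, and 46, each one pixel further out.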

Let's look at just one more scale.

Scale 2: This is where things get even more interesting. At scale 2 the value of 2ⁿ becomes 4. This means that for convolution, we'll jump directly to pixels that are 4 locations apart from the target pixel:

8x8px grid with pixels that are involved in convolution for pixel number 28 highlighted at scale 2

Wait what? Why are we choosing pixels 1, 4, 8, 25, & 57? 1 & 4 are only 3 locations apart, 25 is only 2 locations apart, and 8 & 57 are not even diagonally aligned with the target pixel. What's going on?

Handling Boundary Conditions

Since this process is executed for every pixel in an image, we also need to consider cases where the pixel locations for convolution lie outside of the image.

This is not a concept unique to this algorithm. During convolution, this is referred to as a boundary condition or handling boundary pixels. There are various techniques for dealing with this, and all of them involve virtually extending the image in order to make it seem like we're not encountering the boundary at all.

Some of the techniques are:

Extending as much as needed by copying the value of the last row/column
Mirroring the image on all edges and corners
Wrapping the image around the edges.

In our example, we're employing the "mirroring" technique. When implementing such an algorithm, we don't need to actually create an extended image. Any boundary handling is implementable using just basic mathematical formulae.

Our extended image, with the correct pixels selected for scale 2, is as follows:

Source image extended on all edges and corners using the mirroring technique. All of the faded regions represent extended areas.

Again, the extension is only logical and is completely computed using formulae, as opposed to actually extending the source image and then checking. We can easily see that with the mirrored images in place, our basic rule of picking pixels that are 2ⁿ locations apart is still followed.
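As an illustration of computing the mirrored location with a formula, here is a minimal sketch. It assumes the variant of mirroring that repeats the edge pixel, since that variant reproduces the pixels 1, 4, 8, 25, and 57 mentioned above. The function names `mirror` and `mirrored_labels` are hypothetical.

```rust
/// Reflect an out-of-range coordinate back into `0..size`, repeating the
/// edge pixel (so -1 maps to 0 and `size` maps to `size - 1`).
/// Assumes the coordinate is at most `size` positions outside the image.
fn mirror(coord: isize, size: usize) -> usize {
    let size = size as isize;
    let reflected = if coord < 0 {
        -coord - 1
    } else if coord >= size {
        2 * size - 1 - coord
    } else {
        coord
    };
    reflected as usize
}

/// Labels (1-based, row-major) sampled by a 3x3 kernel around `target`
/// at the given scale, with out-of-range positions mirrored.
fn mirrored_labels(target: usize, size: usize, scale: u32) -> Vec<usize> {
    let distance = 2_isize.pow(scale);
    let row = ((target - 1) / size) as isize;
    let col = ((target - 1) % size) as isize;

    let mut labels = Vec::new();
    for dy in [-1, 0, 1] {
        for dx in [-1, 0, 1] {
            let r = mirror(row + dy * distance, size);
            let c = mirror(col + dx * distance, size);
            labels.push(r * size + c + 1);
        }
    }
    labels
}
```

For pixel 28 at scale 2, the positions one step up or to the left land at index -1, which mirrors back to index 0. That is why the first row and first column of the grid show up in the selection.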

Computing Maximum Possible Scales for Any Given Image

If you think about it carefully, you'll see that the maximum number of layers an image can be decomposed into can be calculated by computing the log₂ of the image width or height (whichever is lower) and throwing away the fractional part.

For a 5x5 image, log₂(5) ~= 2.32. If we throw away the fractional part, that leaves us with 2 layers. Similarly, for a 1000x1000px image, log₂(1000) ~= 9.96, which means we can decompose a 1000x1000px image into a maximum of 9 layers. This simply means that our "holes" cannot be larger than the width or height.

Even with the mirroring extension we used above, if the holes are larger than the width of the image, they'll still end up outside of the extended regions, especially for corner or boundary pixels, making it impossible to perform convolution at that scale.
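The rule above can be sketched in one line using Rust's integer logarithm, which already discards the fractional part. The function name `max_levels` is hypothetical, and the sketch assumes both dimensions are non-zero (`ilog2` panics on 0).

```rust
/// Maximum number of levels an image of the given size can be decomposed
/// into: floor(log2(min(width, height))).
/// Assumes `width` and `height` are both greater than zero.
fn max_levels(width: u32, height: u32) -> u32 {
    width.min(height).ilog2()
}
```

For example, a 1920x1080 image is limited by its height: log₂(1080) is a little over 10, so it can be decomposed into at most 10 layers.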

Closing Notes

Thinking about the examples and visualisations a bit more, you can clearly see how and why this algorithm works, and how it's able to separate out structures in an image based on their sizes. The increasing hole sizes make it so that only structures larger than the hole itself are retained for any given layer.

A big advantage of using this algorithm is the computational cost. Since this doesn't involve Fourier or Wavelet transforms, the computational cost is quite low, relatively speaking. The memory cost, however, is indeed higher. But more often than not that is a good tradeoff.

Another advantage of this algorithm when comparing it to other discrete wavelet transform algorithms is that the size of the source image is preserved throughout the entire process. There's no decimation or upscaling happening here, making this algorithm one of the easiest ones to understand and implement.

The algorithm is used in almost all astronomical image processing software, such as PixInsight, Siril, and many others.

This algorithm is also known by other names such as Stationary Wavelet Transform and Starlet Transform.

Part 2: How to Implement À Trous Transform in Rust

Now I'm going to show you how you can implement this algorithm in Rust.

For the purposes of this tutorial, I'm going to assume that you're pretty familiar with Rust and its basic concepts, such as data-types, iterators, and traits and are comfortable writing programs that use these concepts.

I'm also going to assume that you have an understanding of what convolution and convolution kernels mean in this context.

Prerequisites

We're going to need a couple of dependencies. Before we get to that, let's quickly create a new project:

cargo new --lib atrous-rs
cd atrous-rs

Now let's add all of the dependencies we need. We actually only need 2:

cargo add image ndarray

image is a Rust library we'll use to work with images of all of the standard formats and encodings. It also helps us convert between various formats, and provides easy access to pixel data as buffers.

ndarray is a Rust library that helps you create, manipulate, and work with 2D, 3D, or N-Dimensional arrays. We could use nested Vectors, but using a project like ndarray is better in this case because we need to perform a lot of operations on both individual values as well as their neighbours. Not only is this much easier to do with ndarray, but it also has performance optimisations built in for many operations and CPU types.

Although I'll be covering the basic functions/traits/methods/data-types we use from these crates, I'm not going to go into too much detail for them. I encourage you to read the docs instead.

We're actually going to jump straight to algorithm implementation, and come back later to see how we can use it.

The À Trous Transform

Create a new file that will hold our implementation. Let's name it transform.rs.

Start with adding the following struct, that will hold the information we need to perform the transform:

// transform.rs

use ndarray::Array2;

pub struct ATrousTransform {
    // Fields are `pub(crate)` so that code in other modules of this crate (such as convolve.rs) can read them.
    pub(crate) input: Array2<f32>, // `Array2` is a 2D array where each value is of type `f32`. This will hold our pixel data for input image.
    pub(crate) levels: usize, // The number of levels or scales to decompose the image into
    pub(crate) current_level: usize, // Current level that we need to generate. This holds the state of our iterator.
    pub(crate) width: usize, // Width of input image
    pub(crate) height: usize, // Height of input image
}

We also need a way to create this struct easily. In our case, we want to be able to create it from the input image directly. The input image can be in any of the supported formats and encodings, but we want a consistent color type to implement the calculations, so we'll also need to convert the image to our expected format.

It's helpful to extract all of this logic away using the "constructor" pattern in Rust. Let's implement that:

// transform.rs

use image::GenericImageView;

impl ATrousTransform {
    pub fn new(input: &image::DynamicImage, levels: usize) -> Self {
        let (width, height) = input.dimensions();
        let (width, height) = (width as usize, height as usize);

        // Create a new 2D array with proper size for each dimension to hold all of our input's pixel data. Method `zeros` takes a "shape" parameter, which is a tuple of (rows_count, columns_count).
        let mut data = Array2::<f32>::zeros((height, width));

        // Convert the image to be a grayscale image where each pixel value is of type `f32`. Loop over all pixels in the input image along with its 2D location.
        for (x, y, pixel) in input.to_luma32f().enumerate_pixels() {
            // Put the pixel value at appropriate location in our data array. The `[[]]` syntax is used to provide a 2-dimensional index such as `[[row_index, col_index]]`
            data[[y as usize, x as usize]] = pixel.0[0];
        }

        Self {
            input: data,
            levels,
            current_level: 0,
            width,
            height
        }
    }
}

This takes care of converting the image to grayscale and converting the pixel values to f32. If you're not already aware, for images with floating-point pixel values, the values are always normalized. This means that they are always between 0 and 1 – 0 representing black and 1 representing white.

Iterators and the À Trous Transform

Before we continue, let's think about the algorithm for a second. We need to be able to generate images at increasing scales, until we hit the maximum number of levels we need.

We want the consumer of our library to have access to all of these scales, and be able to manipulate them and also easily recombine once they're done. They need to be able to filter layers to ignore structures at certain scales, manipulate or "map" them to change their characteristics, perform operations on them, or even store each image if they so need.

This sounds an awful lot like Iterators! Iterators give us methods like filter, skip, take, map, for_each, and so on, which are exactly what we need to work with our layers before recomposition.

One added advantage of Iterators is that they allow you to finish processing each layer all the way through before you move on to the next one. If you're unsure why this is, I suggest reading more about processing a series of items with Iterators in Rust.
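Here is a small standard-library-only sketch of that behaviour. The numbers 1 to 3 stand in for layers, and the function name `process_layers` is hypothetical. It records the order in which each adapter runs.

```rust
use std::cell::RefCell;

/// Runs three stand-in "layers" through a map, a filter, and a consumer,
/// and returns the order in which the steps were executed.
fn process_layers() -> Vec<String> {
    let log = RefCell::new(Vec::new());

    (1..=3)
        .map(|layer| {
            log.borrow_mut().push(format!("generate {layer}"));
            layer
        })
        .filter(|layer| {
            log.borrow_mut().push(format!("filter {layer}"));
            // Keep only odd-numbered layers.
            layer % 2 == 1
        })
        .for_each(|layer| log.borrow_mut().push(format!("consume {layer}")));

    log.into_inner()
}
```

The returned log starts with "generate 1", "filter 1", "consume 1" before anything happens for layer 2. Iterator adapters are lazy, so each item travels through the whole chain before the next one is generated.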

We're going to implement the Iterator trait for our ATrousTransform type, which should produce a wavelet layer as output for each iteration.

We're going to be implementing the inner-most parts of the algorithm first, and build out from there. So we first need a way to convolve an input data buffer with the scaling function while making sure that adjacent pixels are 2ⁿ locations apart, which is the first step in our loop.

Convolution

We need to define our convolution kernel before we can do anything else. Create a new file kernel.rs and add it to lib.rs with the following contents:

// kernel.rs

#[derive(Copy, Clone)]
pub struct LinearInterpolationKernel {
    values: [[f32; 3]; 3]
}

impl Default for LinearInterpolationKernel {
    fn default() -> Self {
        Self {
            values: [
                [1. / 16., 1. / 8., 1. / 16.],
                [1. / 8., 1. / 4., 1. / 8.],
                [1. / 16., 1. / 8., 1. / 16.],
            ]
        }
    }
}

We define it using a struct instead of a constant array of arrays because we need to define some tiny helpful methods on it related to index handling. We'll come back to that later.

Create another file convolve.rs. This is where all of the code for handling convolution for individual pixels will go. We'll define a Convolution trait that will define methods needed to perform the convolution on every pixel in current layer.

// convolve.rs

pub trait Convolution {
    fn compute_pixel_index(
        &self,
        distance: usize,
        kernel_index: [isize; 2],
        target_pixel_index: [usize; 2]
    ) -> [usize; 2];

    fn compute_convoluted_pixel(
        &self, 
        distance: usize, 
        index: [usize; 2]
    ) -> f32;
}

You may ask why we need a trait here instead of a simple impl block. We are only working with Grayscale images in this article, but you may want to extend it to implement it for RGB or other color modes as well.

Now, you need to implement this trait for your ATrousTransform struct:

// convolve.rs

use crate::kernel::LinearInterpolationKernel;
use crate::transform::ATrousTransform;

impl Convolution for ATrousTransform {
    fn compute_pixel_index(
        &self, 
        distance: usize, 
        kernel_index: [isize; 2], 
        target_pixel_index: [usize; 2]
    ) -> [usize; 2] {
        let [kernel_index_x, kernel_index_y] = kernel_index;

        // Compute the actual distance of adjacent pixel
        // by multiplying their relative position with the
        // size of the hole.
        let x_distance = kernel_index_x * distance as isize;
        let y_distance = kernel_index_y * distance as isize;

        let [x, y] = target_pixel_index;

        // Compute the index of adjacent pixel in the 2D
        // image based on the index of current pixel.
        let mut x = x as isize + x_distance;
        let mut y = y as isize + y_distance;

        // If x index is out of bounds, consider x to be
        // the nearest boundary location
        if x < 0 {
            x = 0;
        } else if x > self.width as isize - 1 {
            x = self.width as isize - 1;
        }

        // If y index is out of bounds, consider y to be
        // the nearest boundary location
        if y < 0 {
            y = 0;
        } else if y > self.height as isize - 1 {
            y = self.height as isize - 1;
        }

        // The final 2D index of pixel.
        [y as usize, x as usize]
    }

    fn compute_convoluted_pixel(
        &self, 
        distance: usize, 
        [x, y]: [usize; 2]
    ) -> f32 {
        // Create new variable to hold the result of convolution
        // for current pixel.
        let mut pixels_sum = 0.0;

        let kernel = LinearInterpolationKernel::default();

        // Iterate over relative position of pixels from the center
        // pixel to perform convolution with. In other words, 
        // these are the indexes of neighbouring pixels from the
        // center pixel.
        for kernel_index_x in -1..=1 {
            for kernel_index_y in -1..=1 {
                // Get the computed pixel location that maps to
                // the current position in kernel
                let pixel_index = self.compute_pixel_index(
                    distance,
                    [kernel_index_x, kernel_index_y],
                    [x, y]
                );

                // Get the multiplicative factor (kernel value) for 
                // this relative location from the kernel.
                let kernel_value = kernel.value_from_relative_index(
                    kernel_index_x,
                    kernel_index_y
                );

                // Multiply the pixel value with kernel scaling
                // factor and add it to the pixel sum.
                pixels_sum += kernel_value * self.input[pixel_index];
            }
        }

        // Return the value of computed pixel from convolution process.
        pixels_sum
    }
}

We need to do computations to figure out each pixel's location based on the relative position in the kernel from the center pixel as well as ensure that the "hole size" is also being taken into consideration for the final pixel index. As you might notice, you also want to handle the boundary conditions when computing indexes.

I encourage you to take your time here and go through the code and the comments.
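The convolution code above calls `kernel.value_from_relative_index`, whose body is not listed in this article. Here is a minimal sketch of what such a helper could look like, assuming it only needs to shift the relative positions -1..=1 to the array indexes 0..=2. The struct and its values are repeated from kernel.rs so that the example stands on its own.

```rust
#[derive(Copy, Clone)]
pub struct LinearInterpolationKernel {
    values: [[f32; 3]; 3],
}

impl Default for LinearInterpolationKernel {
    fn default() -> Self {
        Self {
            values: [
                [1. / 16., 1. / 8., 1. / 16.],
                [1. / 8., 1. / 4., 1. / 8.],
                [1. / 16., 1. / 8., 1. / 16.],
            ],
        }
    }
}

impl LinearInterpolationKernel {
    /// Assumed implementation: `x` and `y` are positions relative to the
    /// kernel centre, each in the range -1..=1.
    pub fn value_from_relative_index(&self, x: isize, y: isize) -> f32 {
        // Shift -1..=1 to 0..=2 to index into the 3x3 array.
        self.values[(y + 1) as usize][(x + 1) as usize]
    }
}
```

The center of the kernel (relative position 0, 0) carries the largest weight, 1/4, and all 9 weights add up to 1, so the convolution doesn't change the overall brightness of the image.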

Implementing the Iterator

It's finally time to implement the Iterator trait for your ATrousTransform:

// transform.rs

// The `Convolution` trait must be in scope to call its methods on `self`.
use crate::convolve::Convolution;

impl Iterator for ATrousTransform {
    // Our output is an image as well as the current level for each
    // iteration. The current level is an `Option` to represent the
    // final residue layer after the intermediary layers have been
    // generated.
    type Item = (Array2<f32>, Option<usize>);

    fn next(&mut self) -> Option<Self::Item> {
        let pixel_scale = self.current_level;
        self.current_level += 1;

        // We've already generated all the layers. Return None to 
        // exit the iterator.
        if pixel_scale > self.levels {
            return None;
        }

        // We've generated all intermediary layers, return the 
        // residue layer.
        if pixel_scale == self.levels {
            return Some((self.input.clone(), None))
        }

        let (width, height) = (self.width, self.height);

        // Distance between adjacent pixels for convolution (also 
        // referred to as size of "hole").
        let distance = 2_usize.pow(pixel_scale as u32);

        // Create new buffer to hold the computed data for this layer.
        let mut current_data = Array2::<f32>::zeros((height, width));

        // Iterate over each pixel location in the 2D image
        for x in 0..width {
            for y in 0..height {
                // Set the current pixel in current layer to
                // the result of convolution on the current
                // pixel in input data.
                current_data[[y, x]] = self.compute_convoluted_pixel(
                    distance, 
                    [x, y]
                );
            }
        }

        // Create current layer by subtracting currently computed pixels 
        // from previous layer
        let final_data = self.input.clone() - &current_data;

        // Set the input layer to equal the current computed layer so 
        // that it can be used as the "previous layer" in next iteration.
        // This is also our residue data for each layer.
        self.input = current_data;

        // Return the current layer data as well as current level information.
        Some((final_data, Some(self.current_level)))
    }
}

I'm going to point out that there's a lot of potential for optimizing for performance here, but that's out of the scope of this article.

We'll finally look at how we can take all of these layers and reconstruct our input image.

Recomposition

As I've said previously, reconstructing an image that was decomposed with the À Trous transform is as simple as summing all of the layers together.

We're going to define a trait for this. Why we need a trait here should be clear once you look at the implementation.

Create a new file recompose.rs with the following contents:

// recompose.rs

use image::{DynamicImage, ImageBuffer, Luma};
use ndarray::Array2;

pub trait RecomposableLayers: Iterator<Item = (Array2<f32>, Option<usize>)> {
    fn recompose_into_image(
        self,
        width: usize,
        height: usize,
    ) -> DynamicImage
        where
            Self: Sized,
    {
        // Create a result buffer to hold the pixel data for our output image.
        let mut result = Array2::<f32>::zeros((height, width));

        // For each layer, add the layer data to current value of result buffer.
        for layer in self {
            result += &layer.0;
        }

        // Compute min and max pixel intensity values in the final data so that
        // we can perform a "rescale", which normalizes all pixel values to be
        // between the range of 0 & 1, as is expected by float 32 images.
        let min_pixel = result.iter().copied().reduce(f32::min).unwrap();
        let max_pixel = result.iter().copied().reduce(f32::max).unwrap();

        // Create a new `ImageBuffer`, which is a type provided by `image` crate to
        // serve as buffer for pixel data of an image. Here, we're creating a new
        // `Luma` ImageBuffer with pixel value of type `u16`. Luma just refers to
        // grayscale.
        let mut result_img: ImageBuffer<Luma<u16>, Vec<u16>> =
            ImageBuffer::new(width as u32, height as u32);

        // Pre-compute the denominator for scaling computation so that we don't
        // repeat this unnecessarily for every iteration.
        let rescale_ratio = max_pixel - min_pixel;

        // Iterate over all pixels in the `ImageBuffer` and fill it based on data
        // from the `result` buffer after rescaling the value.
        for (x, y, pixel) in result_img.enumerate_pixels_mut() {
            let intensity = result[(y as usize, x as usize)];

            *pixel =
                Luma([((intensity - min_pixel) / rescale_ratio * u16::MAX as f32) as u16]);
        }

        // Convert the `ImageBuffer` into `DynamicImage` and return it
        DynamicImage::ImageLuma16(result_img)
    }
}

// Implement this trait for anything that implements the Iterator trait
// with the given item type
impl<T> RecomposableLayers for T where T: Iterator<Item = (Array2<f32>, Option<usize>)> {}

If you haven't noticed, since we implement this trait for a generic type, it will work with any iterator that yields our layer type, such as Filter, Map, and so on. If you hadn't used a trait here, you would have had to implement the same thing again and again for every built-in iterator type, and your code wouldn't have worked with 3rd party types.
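The same pattern can be seen in isolation in the following sketch, which uses plain `Vec<f32>` layers instead of `ndarray` buffers. The trait name `SumLayers` and its method are hypothetical and exist only for illustration.

```rust
/// Extension trait for any iterator that yields `Vec<f32>` layers.
trait SumLayers: Iterator<Item = Vec<f32>> {
    /// Add all layers together element by element.
    fn sum_layers(self, len: usize) -> Vec<f32>
    where
        Self: Sized,
    {
        let mut total = vec![0.0; len];
        for layer in self {
            for (t, v) in total.iter_mut().zip(layer) {
                *t += v;
            }
        }
        total
    }
}

// Blanket implementation: every iterator with the right item type gets
// `sum_layers`, including adapters such as `Skip`, `Map`, and `Filter`.
impl<T> SumLayers for T where T: Iterator<Item = Vec<f32>> {}
```

With this in place, a call such as `layers.into_iter().skip(1).sum_layers(2)` compiles without any extra code, because `Skip` is itself an iterator over the same item type.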

Using the À Trous Transform

After all of that, it's finally time to reproduce the processing that I showed you for the galaxy image with lots of noise. Create a new file main.rs with the following contents:

use image::{DynamicImage, ImageBuffer, Luma};
use atrous::recompose::RecomposableLayers;
use atrous::transform::ATrousTransform;

fn main() {
    // Open our noisy image
    let image = image::open("m33-noise-lum.jpg").unwrap();

    // Create a new instance of the transform with 9 layers
    let transform = ATrousTransform::new(&image, 9);

    // Map over each layer
    transform.map(|(mut buffer, pixel_scale)| {
        // Create a new image buffer to hold the pixel data. This
        // will be populated from the raw buffer for this layer.
        let mut new_buffer =
            ImageBuffer::<Luma<u16>, Vec<u16>>::new(buffer.ncols() as u32, buffer.nrows() as u32);

        // Iterate over all pixels of the `ImageBuffer` to populate it. We also
        // convert from `f32` pixels to `u16` pixels.
        for (x, y, pixel) in new_buffer.enumerate_pixels_mut() {
            *pixel = Luma([(buffer[[y as usize, x as usize]] * u16::MAX as f32) as u16])
        }

        // If the present layer is a small scale layer (< 3), 
        // perform noise reduction
        if pixel_scale.is_some_and(|scale| scale < 3) {
            let mut image = DynamicImage::ImageLuma16(new_buffer).to_luma8();

            // Bilateral filter is a de-noising filter. Apply it to the image.
            image = imageproc::filter::bilateral_filter(&image, 10, 10., 3.);

            // Modify the raw buffer to contain the updated pixel values after
            // filtering.
            for (x, y, pixel) in image.enumerate_pixels() {
                buffer[[y as usize, x as usize]] = pixel.0[0] as f32 / u8::MAX as f32;
            }

            // Return the updated buffer.
            (buffer, pixel_scale)
        } else {
            // Return the unmodified buffer for larger scale layers.
            (buffer, pixel_scale)
        }
    })
        // Call the recomposition method on iterator
        .recompose_into_image(image.width() as usize, image.height() as usize)
        // Convert output to 8-bit grayscale image
        .to_luma8()
        // Save it to jpg file
        .save("noise-reduced.jpg")
        .unwrap()
}

You also need to add a new dependency, imageproc, which provides useful image processing implementations on top of the image crate.

cargo add imageproc

To make this work, we also need to modify our Cargo.toml to explicitly define both binary and library targets:

# Cargo.toml

[package]
name = "atrous-rs"
version = "0.1.0"
edition = "2021"

[[bin]]
name = "atrous"
path = "src/main.rs"

[lib]
name = "atrous"
path = "src/lib.rs"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
image = "0.25.1"
imageproc = "0.24.0"
ndarray = "0.15.6"

You may download the test image from here. Move it to the root directory of your project, and run cargo run --release. Once it finishes, you should have a new file noise-reduced.jpg as the output of our process.

And there we have it.

Wrapping Up

I hope you enjoyed the journey so far. If image processing techniques or their implementation in Rust is something that interests you, then stay tuned for more as these are the topics I love writing about.

Also, feel free to contact me if you have any questions or opinions on this topic.

Enjoying my work?

Consider buying me a coffee to support my work!

Till next time, happy coding and wishing you clear skies!
