QuRT - freeCodeCamp.org

How to Use SCons to Build Software Projects [Full Handbook]

Nikheel Vishwas Savant — Thu, 07 May 2026 21:22:30 +0000

If you've ever wrestled with Makefile syntax, fought tab-versus-spaces bugs, or tried to make a build system work across Linux, macOS, and Windows, SCons is worth your attention. It replaces Make, autoconf, and automake with a single tool where every build file is a real Python script.

This handbook walks through SCons from first principles. You'll install it, build a multi-file C++ project with a static library, set up cross-compilation for an embedded target (Qualcomm's QuRT real-time operating system), and learn the internals that make SCons different from Make and CMake.

By the end, you'll have a working build system you can adapt to your own projects.

The full example code is self-contained. You can type it out, run it, and see real output at every step.

Prerequisites
What is SCons and Why Does it Exist
How SCons Compares to Make, CMake, and Meson
A Side-by-Side Look at Make Versus SCons
Installing SCons
Core Concepts You Need Before Writing a Build File
The Three Environments in SCons
Construction Variables Reference
Your First SConstruct File
Building a Multi-File C++ Project Step by Step
Detailed Walkthrough of Every File in the Project
Running the Build and Understanding the Output
What Happens During an Incremental Build
Cross-Compiling for QuRT (Qualcomm Real-Time OS)
Writing QuRT-Specific Application Code
Building Both Native and QuRT From One SConstruct
How SCons Detects Dependencies and Decides What to Rebuild
Writing a Custom Scanner
The Shared Build Cache
Working with Shared Libraries
Adding Command-Line Options with AddOption
Configure Checks for Portability
Custom Builders for Non-Standard File Types
Aliases, Default Targets, and Install Rules
Platform-Specific Configuration
Customizing Build Output
How to Debug SCons Build Files
The SCons Command-Line Reference
Common Mistakes and How to Avoid Them
Summary

Prerequisites

You need Python 3.7 or newer installed on your system. You also need a C++ compiler (GCC, Clang, or MSVC). Familiarity with basic C/C++ compilation (what a compiler and linker do) is assumed. Prior experience with Make or any build system is helpful but not required.

For the QuRT cross-compilation sections, you need the Qualcomm Hexagon SDK installed on your machine. Those sections are self-contained, so you can skip them if you're only interested in native builds.

What is SCons and Why Does it Exist?

SCons is an open-source, cross-platform software construction tool written entirely in Python. Steven Knight created it in 2001 after his design won the Software Carpentry SC Build competition in August 2000.

The competition asked participants to design a better build tool, and Knight's "ScCons" entry beat out the alternatives. The name was later shortened to "SCons" after the project separated from Software Carpentry.

Knight's design drew heavily from Cons, a Perl-based build tool created by Bob Sidebotham in the late 1990s. Cons introduced several ideas that were radical at the time: content-based change detection (using MD5 hashes instead of timestamps), automatic dependency scanning for C/C++ headers, and a single global dependency graph that eliminated the problems with recursive Make.

SCons took all of these ideas and reimplemented them in Python, adding a proper configuration API, cross-platform support, and extensibility through Python's object model.

The project is currently maintained by William Deegan and Gary Oberbrunner, and it's released under the MIT license. The current stable version is 4.10.x. Development happens on GitHub, and the community communicates through a Discord server, IRC (#scons on Libera.Chat), and mailing lists.

How SCons Works

The central idea behind SCons is straightforward: build files should be written in a real programming language, not a domain-specific language with quirky syntax rules.

An SConstruct file is a Python script. You have access to loops, conditionals, functions, classes, and every Python library on your system. There is no separate DSL to memorize, and no rule that recipe lines must start with a literal tab character (a rule that regularly trips up Make users). If you can write Python, you can write SCons build files.

SCons also differs from Make in how it determines what needs to be rebuilt. Make compares file timestamps. If you run touch main.c, Make will recompile it even though nothing actually changed.

SCons computes a content hash (MD5 by default) of every source file. If the content hasn't changed, SCons skips the rebuild. This eliminates an entire class of unnecessary recompilations. It also means you rarely need a full clean rebuild (the equivalent of make clean) just because you're unsure whether the build state is consistent. Because SCons tracks content rather than time, a stray touch or a skewed clock can't trick it into rebuilding, or into skipping a rebuild.
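To make the difference concrete, here is a small standalone Python sketch (not SCons code). It records a file's modification time and its MD5 digest, then bumps the timestamp the way touch does. The signature helper is hypothetical and only models the two rebuild checks side by side.

```python
import hashlib
import os
import tempfile


def signature(path):
    """Return (mtime, MD5 hex digest) for a file. A toy model of the two
    rebuild checks, not SCons' actual implementation (hypothetical helper)."""
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return os.path.getmtime(path), digest


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "main.c")
    with open(src, "w") as f:
        f.write("int main(void) { return 0; }\n")

    before_mtime, before_hash = signature(src)

    # Equivalent of `touch main.c`: move the timestamp, leave the bytes alone
    later = before_mtime + 10
    os.utime(src, (later, later))

    after_mtime, after_hash = signature(src)

timestamp_says_rebuild = after_mtime != before_mtime  # Make-style check
content_says_rebuild = after_hash != before_hash      # content-hash check
print("timestamp check rebuilds:", timestamp_says_rebuild)  # True
print("content check rebuilds:", content_says_rebuild)      # False
```

Only the timestamp check reports a change. That is why a timestamp-based tool rebuilds after touch and a content-based one doesn't.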

Several large projects have used SCons in production. The Godot game engine uses SCons as its build system. MongoDB used SCons for years. PlatformIO, the embedded development ecosystem, uses SCons as its core build engine. National Instruments has used it for projects with over 5,000 source files. NSIS (the Nullsoft Scriptable Install System) and several aerospace projects (including the Aerosonde UAV) have also relied on SCons.

How SCons Compares to Make, CMake, and Meson

Understanding where SCons fits relative to other build tools helps you decide when to reach for it.

SCons versus Make

Make uses a custom DSL that is notoriously finicky. Recipe lines must start with a literal tab. Indent one with spaces instead and GNU Make stops with the cryptic error "missing separator". Variable expansion rules are complex and have multiple flavors (=, :=, ?=, +=). Dependency detection for C/C++ headers requires manual setup or external tools like makedepend or compiler-generated .d files.

Recursive Make (the standard pattern for multi-directory projects) can miss cross-directory dependencies entirely, a problem documented in Peter Miller's famous 1997 paper "Recursive Make Considered Harmful."

SCons solves all of these problems. It scans C/C++ source files automatically, builds a single global dependency graph across all directories in a single pass, and uses content hashing instead of timestamps.

The tradeoff is startup speed. SCons must read every build file and construct the full dependency graph before building anything, which adds overhead that Make doesn't have. On small to medium projects (up to a few thousand source files), this overhead is negligible. On very large projects (tens of thousands of files), it can add several seconds to every invocation.

SCons versus CMake

CMake is not a build tool. It's a meta-build system that generates Makefiles, Ninja files, or Visual Studio project files. You write CMakeLists.txt, run cmake to generate the native build files, then run make or ninja to actually build.

SCons builds directly. There is no generation step. CMake has a much larger ecosystem, better IDE integration (it can generate Xcode projects and Visual Studio solutions, and IDEs such as CLion open CMake projects directly), and a huge library of find_package modules for locating third-party libraries like Boost, OpenSSL, and Qt. SCons has no comparable module library. Finding third-party dependencies usually means writing Configure checks or calling pkg-config through env.ParseConfig().

Where SCons wins is in simplicity and debuggability. Your build files are Python. You can print() variables, set breakpoints with pdb, use list comprehensions, and call any Python function. CMake's custom language is harder to debug, has surprising scoping rules, and requires learning a distinct syntax that's not used anywhere else.

SCons versus Meson

Meson is a newer build tool that generates Ninja files for fast parallel builds. It uses a custom DSL that is intentionally not Turing-complete. You can iterate over arrays with foreach, but you can't define your own functions or write open-ended loops. This sounds limiting, but it keeps build definitions declarative and prevents a class of build-file bugs that arbitrary code makes easy to write (like accidentally depending on host state that doesn't exist on other developers' machines).

Meson is faster than SCons on large projects because Ninja, its backend, is extremely optimized for incremental builds. Meson also has better built-in support for cross-compilation through a dedicated "cross file" format.

SCons gives you more flexibility through Python, but Meson's opinionated approach catches more mistakes at configuration time and produces faster builds.

The short version: use SCons when you want the full power of Python in your build files, when you need content-based rebuild detection, when you're working on a project that already uses it, or when you're doing embedded work where the build system needs to handle unusual toolchains and file types.

Use CMake when IDE integration and ecosystem size matter most. Use Meson when build speed on large projects is the primary concern.

A Side-by-Side Look at Make Versus SCons

Seeing the same build expressed in both Make and SCons makes the differences concrete. Consider a simple project with two C files and a header.

The Makefile looks like this:

CC = gcc
CFLAGS = -Wall -O2
OBJECTS = main.o utils.o

myapp: $(OBJECTS)
	$(CC) $(CFLAGS) -o $@ $^

main.o: main.c utils.h
	$(CC) $(CFLAGS) -c $<

utils.o: utils.c utils.h
	$(CC) $(CFLAGS) -c $<

clean:
	rm -f myapp $(OBJECTS)

This Makefile needs eleven non-blank lines and requires you to list every header dependency by hand. Suppose a source file starts including a new header and you forget to add it to the rule. Editing that header won't trigger a recompile, so the build can succeed while linking stale object files. The recipe lines must begin with literal tab characters, not spaces. The $@, $^, and $< automatic variables are cryptic until you memorize them.

The equivalent SConstruct file looks like this:

env = Environment(CCFLAGS=['-Wall', '-O2'])
env.Program('myapp', ['main.c', 'utils.c'])

Two lines. SCons detects the header dependency on utils.h automatically by scanning the #include directives in the source files. There's no clean target because scons -c handles cleanup. There are also no tab-versus-space rules to trip over, because this is ordinary Python.

The Makefile approach has one advantage: it starts faster on large projects because it doesn't need to scan every source file for includes.

On a two-file project, this difference is unmeasurable. On a 10,000-file project, the SCons overhead might add 2 to 5 seconds. Whether that tradeoff matters depends on your project size and your tolerance for manual dependency management.

Installing SCons

The simplest installation method is pip, since SCons is a pure Python package with no compiled dependencies.

pip install scons

This installs the scons command globally (or in your active virtual environment). The package name on PyPI is SCons. On some systems, you may need to use pip3 instead of pip to target Python 3.

You can also install through system package managers:

# Debian / Ubuntu
sudo apt install scons

# Fedora
sudo dnf install scons

# macOS with Homebrew
brew install scons

# Arch Linux
sudo pacman -S scons

# Conda
conda install -c conda-forge scons

The pip install line pulls the SCons package from PyPI and places the scons executable on your PATH. System package managers do the same thing but integrate with your OS's package database. Either approach works. The pip method tends to give you the latest version, while system packages may lag behind by one or two releases.

Verify the installation by checking the version.

scons --version

You should see output showing the SCons version number and the Python version it's running under. If the command isn't found, make sure your Python scripts directory is on your PATH. On Linux, this is typically ~/.local/bin for user installs. On macOS with Homebrew Python, it's usually /usr/local/bin or /opt/homebrew/bin.

Core Concepts You Need Before Writing a Build File

SCons organizes builds around five core concepts. Understanding them before you write any code saves confusion later.

The SConstruct Build File

This is the top-level build file. When you run scons in a directory, it looks for a file named SConstruct (capital S, capital C, no file extension). SCons also accepts the alternative names Sconstruct and sconstruct, but the capitalized version is the convention.

This file is a Python script. It defines what to build and how. There is exactly one SConstruct per project, and it lives in the project root.

SConscript Build Files

These are subsidiary build files for subdirectories. The top-level SConstruct calls SConscript('src/SConscript') to pull in build definitions from the src directory.

All file paths inside an SConscript are relative to that SConscript's location, not the project root. The # character at the start of a path means "relative to the SConstruct directory," which is useful for referencing shared include directories from any SConscript at any depth.

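For example, the path string '#include' always refers to the include directory at the project root, regardless of which subdirectory's SConscript uses it. Here # is SCons path syntax and has nothing to do with the C preprocessor's #include directive.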
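Here is a rough mental model of the anchoring rule, assuming POSIX-style paths. Paths that start with # resolve against the SConstruct directory, and everything else resolves against the directory of the SConscript being read. The resolve function below is a hypothetical illustration built on the standard posixpath module. It is not part of SCons.

```python
import posixpath


def resolve(path, top_dir, sconscript_dir):
    """Toy model of SCons path anchoring (hypothetical helper)."""
    if path.startswith("#"):
        # '#include' or '#/include' -> relative to the SConstruct directory
        return posixpath.normpath(posixpath.join(top_dir, path[1:].lstrip("/")))
    # Plain relative paths resolve against the current SConscript's directory
    return posixpath.normpath(posixpath.join(sconscript_dir, path))


top = "/home/me/myproject"
print(resolve("#include", top, top + "/src/net"))  # /home/me/myproject/include
print(resolve("include", top, top + "/src/net"))   # /home/me/myproject/src/net/include
```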

Construction Environment

This is a Python object (created with Environment()) that holds all the configuration for a build: which compiler to use, what flags to pass, where to find headers, what libraries to link. You can create multiple environments for different build configurations (debug vs. release, or native vs. cross-compiled).

Every environment has a set of construction variables (like CC, CCFLAGS, CPPPATH, LIBS) and a set of builders (like Program, Library, Object). When you modify an environment with env.Append() or env.Replace(), you change the configuration for all subsequent builder calls on that environment. To isolate changes, clone the environment first with env.Clone().
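You can model the difference between changing an environment in place and cloning it first with plain Python objects. The ToyEnv class below is hypothetical. It mimics only the list behavior of Append, Replace, and Clone described above. The real Environment does far more, such as variable substitution and tool setup.

```python
import copy


class ToyEnv:
    """Hypothetical stand-in for a construction environment."""

    def __init__(self, **variables):
        self.vars = dict(variables)

    def Append(self, **kw):
        # Add to the end of existing list values instead of overwriting them
        for key, value in kw.items():
            self.vars[key] = list(self.vars.get(key, [])) + list(value)

    def Replace(self, **kw):
        # Overwrite values outright
        self.vars.update(kw)

    def Clone(self):
        # Independent deep copy: later changes don't leak between the two
        return copy.deepcopy(self)


base = ToyEnv(CCFLAGS=["-Wall"])
debug = base.Clone()
debug.Append(CCFLAGS=["-g", "-O0"])
base.Replace(CC="clang")

print(base.vars)   # {'CCFLAGS': ['-Wall'], 'CC': 'clang'}
print(debug.vars)  # {'CCFLAGS': ['-Wall', '-g', '-O0']}
```

Because debug was cloned before either change, Replace on base doesn't reach it, and Append on debug doesn't leak back into base.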

Builder Methods

These are methods on the Environment object that know how to produce specific types of output.

env.Program() compiles and links an executable.
env.StaticLibrary() creates a static library (.a on Linux, .lib on Windows).
env.SharedLibrary() creates a shared library (.so on Linux, .dylib on macOS, .dll on Windows).
env.Object() compiles a single source file to an object file.
env.Command() runs an arbitrary shell command.

Every builder returns a list of Node objects representing the files it will produce. You can define your own builders for file types that SCons doesn't know about, such as protocol buffer definitions, shader files, or firmware images.

Nodes

These are SCons' internal representation of files and directories. When you call env.Object('main.cpp'), you get back a Node object, not a string. You can pass Node objects to other builders, concatenate them with the + operator, and use them anywhere SCons expects a file reference.

Working with Nodes instead of raw strings makes your build files portable across platforms because SCons handles platform-specific file extensions and path separators internally.

You can also create Nodes explicitly: File('foo.c') creates a file Node, Dir('src') creates a directory Node, and Entry('ambiguous') creates a Node whose type (file or directory) SCons determines later.

The Three Environments in SCons

SCons distinguishes three types of environments, and confusing them is a common source of bugs. Understanding the distinction upfront prevents a category of hard-to-diagnose build failures.

The External Environment is your shell's environment, accessible through os.environ in Python. It contains variables like PATH, HOME, PKG_CONFIG_PATH, and anything else you have set in your .bashrc or .zshrc.

SCons doesn't automatically import this environment. This is deliberate. If SCons inherited your shell environment, your build would depend on whatever happened to be set in each developer's shell, making builds non-reproducible. A build that works on your machine but fails on a colleague's machine because they have a different PATH is exactly the kind of problem SCons tries to prevent.

The Construction Environment is the Environment() object you create in your SConstruct file. It holds construction variables that control how SCons invokes tools.

CC specifies the C compiler.
CXX specifies the C++ compiler.
CCFLAGS holds flags for both C and C++ compilation.
CPPPATH lists header search directories.
LIBS lists libraries to link.
LIBPATH lists library search directories.

These variables don't come from your shell. SCons populates them with platform-appropriate defaults (for example, CC defaults to gcc on Linux and cl on Windows with MSVC).

The Execution Environment is a dictionary stored at env['ENV'] inside the construction environment. This is the environment that gets passed to child processes (compilers, linkers, archivers) when SCons runs them.

By default, it contains a minimal PATH made of standard system directories (on POSIX systems, locations such as /usr/bin). That is usually enough to find a system-installed compiler. If your build tools need additional environment variables (for example, a cross-compiler that reads HEXAGON_SDK_ROOT), you must add them to env['ENV'] explicitly.

When a build fails because a tool is "not found," the problem is almost always that the tool is on your shell's PATH (external environment) but not on the execution environment's PATH (env['ENV']['PATH']). The fix is to pass it through:

import os
env = Environment()
env['ENV']['PATH'] = os.environ['PATH']

This line copies your shell's PATH into the execution environment so child processes can find the same tools you can find in your terminal.

A broader approach is env = Environment(ENV=os.environ.copy()), which copies everything, but this reduces reproducibility because your build now depends on every variable in your shell.
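The execution environment behaves like the env argument you would pass to any child process in Python: the child sees only the variables you hand it. The sketch below uses the standard subprocess module directly rather than SCons, and it assumes a POSIX system. It uses a made-up HEXAGON_SDK_ROOT value to show a variable that exists in the parent process but is invisible to a child started with a minimal environment.

```python
import os
import subprocess
import sys

# Pretend the developer's shell exported this (value is hypothetical).
os.environ["HEXAGON_SDK_ROOT"] = "/opt/hexagon-sdk"

# A child process that reports what it can see.
probe = [sys.executable, "-c",
         "import os; print(os.environ.get('HEXAGON_SDK_ROOT', '<unset>'))"]


def run_with(env):
    """Run the probe with an explicit environment and return its output."""
    return subprocess.run(probe, env=env, capture_output=True,
                          text=True, check=True).stdout.strip()


# Minimal environment, similar in spirit to the default env['ENV'].
minimal = {"PATH": os.environ.get("PATH", "")}
# Explicitly pass through the one extra variable the tool needs.
passed_through = dict(minimal, HEXAGON_SDK_ROOT=os.environ["HEXAGON_SDK_ROOT"])

print(run_with(minimal))         # <unset>
print(run_with(passed_through))  # /opt/hexagon-sdk
```

Passing through only the variables a tool needs keeps the build closer to reproducible than copying all of os.environ.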

Construction Variables Reference

SCons has dozens of construction variables. The ones you'll use most frequently for C/C++ projects are worth knowing by name.

CC is the C compiler command. Defaults to the platform's default C compiler (gcc on Linux, clang on macOS, cl on Windows with MSVC). Override it to use a different compiler or a cross-compiler.

CXX is the C++ compiler command. Same defaults as CC but for C++.

CCFLAGS holds flags passed to both the C and C++ compilers during compilation. Use this for warnings (-Wall), optimization (-O2), and other flags that apply regardless of language.

CFLAGS holds flags passed only to the C compiler. Use this for C-specific flags like -std=c11.

CXXFLAGS holds flags passed only to the C++ compiler. Use this for C++-specific flags like -std=c++17.

CPPPATH is a list of directories to search for header files. SCons translates each entry into a -I flag. The # prefix means relative to the SConstruct directory.

CPPDEFINES is a list of preprocessor definitions. env.Append(CPPDEFINES=['DEBUG', ('VERSION', '2')]) translates to -DDEBUG -DVERSION=2. Prefer CPPDEFINES to raw -D flags in CCFLAGS. SCons emits the right syntax for each compiler (for example /D with MSVC), and it lets you manipulate the definitions as structured data.

LIBS is a list of libraries to link against. LIBS=['pthread', 'm'] translates to -lpthread -lm. You can also pass Node objects returned by StaticLibrary or SharedLibrary builders.

LIBPATH is a list of directories to search for libraries. Translates to -L flags.

LINKFLAGS holds flags passed to the linker. Use this for linker-specific options like -nostdlib, -Wl,--gc-sections, or -static.

AR is the static library archiver command. Defaults to ar on POSIX systems.

LINK is the linker command. Defaults to the C or C++ compiler (which invokes the linker internally).

PROGSUFFIX is the suffix for executable files. Empty on POSIX, .exe on Windows. You rarely need to set this, as SCons detects it from the platform.

All of these variables can be set in the Environment() constructor, modified with env.Append(), env.Prepend(), or env.Replace(), or overridden per-builder-call by passing them as keyword arguments.
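To make the translations in this reference concrete, the hypothetical helper below turns a few of these variables into GCC-style flags. It is a mental model only. Real SCons also expands # paths, accepts Node objects, and switches syntax for other compilers (for example /I and /D with MSVC). In a real build, the -I and -D flags appear on compile lines and the -L and -l flags on link lines, so the sketch returns them separately.

```python
def gcc_style_flags(cpppath=(), cppdefines=(), libpath=(), libs=()):
    """Hypothetical model of how a few construction variables become
    GCC-style compile and link flags."""
    compile_flags = [f"-I{d}" for d in cpppath]            # CPPPATH
    for entry in cppdefines:                               # CPPDEFINES
        if isinstance(entry, tuple):
            name, value = entry
            compile_flags.append(f"-D{name}={value}")      # ('VERSION', '2')
        else:
            compile_flags.append(f"-D{entry}")             # 'DEBUG'
    link_flags = ([f"-L{d}" for d in libpath]              # LIBPATH
                  + [f"-l{lib}" for lib in libs])          # LIBS
    return compile_flags, link_flags


compile_flags, link_flags = gcc_style_flags(
    cpppath=["include", "lib"],
    cppdefines=["DEBUG", ("VERSION", "2")],
    libpath=["build/release/lib"],
    libs=["pthread", "m"],
)
print(compile_flags)  # ['-Iinclude', '-Ilib', '-DDEBUG', '-DVERSION=2']
print(link_flags)     # ['-Lbuild/release/lib', '-lpthread', '-lm']
```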

Your First SConstruct File

Create a directory for experimentation and put a single C file in it.

// hello.c
#include <stdio.h>

int main() {
    printf("Hello from SCons!\n");
    return 0;
}

This is a minimal C program that prints a message and exits. Nothing complicated. It exists solely to give SCons something to build.

Now create an SConstruct file in the same directory.

Program('hello.c')

This single line is a complete SConstruct file. It tells SCons to compile hello.c and link it into an executable. Program is a default builder that's available without creating an explicit Environment. Behind the scenes, SCons creates a default environment with platform-appropriate compiler settings and uses it for this call.

Run the build.

scons

SCons prints output showing the compilation and linking commands it executes. On Linux with GCC, you'll see something like gcc -o hello.o -c hello.c followed by gcc -o hello hello.o. The resulting executable is named hello (on Linux/macOS) or hello.exe (on Windows). SCons derives the output name from the source file name by stripping the extension.

Run scons again without changing anything. SCons reports that the target is up to date (for example, scons: `.' is up to date.) and does nothing. It computed the content hash of hello.c and compared it with the signature stored from the previous build in the .sconsign.dblite database file. The two matched, so no rebuild was necessary. This is the content-based rebuild detection in action.

Now run touch hello.c and then scons again. SCons still does nothing. The content of hello.c didn't change, so the hash is identical. Make would have recompiled here. SCons does not.

For a slightly more realistic example, create an explicit environment with custom flags.

env = Environment(
    CC='gcc',
    CCFLAGS=['-Wall', '-Wextra', '-O2'],
)
env.Program('hello', 'hello.c')

This version creates a construction environment, sets the compiler to gcc explicitly, enables extra warnings with -Wextra, and optimizes with -O2. The Program call now takes two arguments: the target name 'hello' and the source file 'hello.c'. When you provide both, you control the output name directly.

You can add multiple programs in the same SConstruct:

env = Environment(CCFLAGS=['-Wall', '-O2'])
env.Program('hello', 'hello.c')
env.Program('goodbye', 'goodbye.c')

Running scons builds both executables. Running scons hello builds only the first one. SCons accepts target names on the command line to build selectively.

Building a Multi-File C++ Project Step by Step

A single-file example is useful for verifying your installation, but real projects have multiple source files, libraries, and header directories. This section builds a complete project with all of those elements.

The project structure looks like this:

myproject/
    SConstruct
    include/
        config.h
    lib/
        SConscript
        mathutils.h
        mathutils.cpp
        stringutils.h
        stringutils.cpp
    src/
        SConscript
        main.cpp
        app.h
        app.cpp

This diagram shows a project with three directories beneath the root. The include directory holds a shared configuration header that defines version constants. The lib directory contains two utility modules (math and string operations) that get compiled into a static library called libmyutils.a. The src directory holds the main application code that depends on the library.

Each directory with compilable source files has its own SConscript file. The top-level SConstruct orchestrates everything.

The build system compiles the library first, then the application, and places all build artifacts in a separate build directory to keep the source tree clean. This separation means you can delete the entire build directory and rebuild from scratch without touching any source files.

Create the project directory and all subdirectories first.

mkdir -p myproject/include myproject/lib myproject/src
cd myproject

These commands create the full directory tree. The -p flag on mkdir creates parent directories as needed and does not error if they already exist.

Now create each file. Start with the shared configuration header.

// include/config.h
#ifndef CONFIG_H
#define CONFIG_H
#define APP_VERSION "1.0.0"
#define APP_NAME "SCons Demo"
#endif

This header defines version and name constants that the application code will reference. The include guards (#ifndef / #define / #endif) prevent double-inclusion, which is standard practice in C/C++ headers. Because this header is in the include directory, any source file that wants to use it must have include on its header search path. The SConstruct file handles this through the CPPPATH variable.

Next, the math utility library:

// lib/mathutils.h
#ifndef MATHUTILS_H
#define MATHUTILS_H

int factorial(int n);
double circle_area(double radius);

#endif

// lib/mathutils.cpp
#include "mathutils.h"
#include <cmath>

int factorial(int n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}

double circle_area(double radius) {
    return M_PI * radius * radius;
}

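The mathutils module provides two functions: a recursive factorial calculation and a circle area computation. The header declares the function signatures so that other translation units can call them. The implementation file defines the function bodies. The cmath include brings in M_PI, the mathematical constant for pi. M_PI is a common extension rather than part of standard C++. GCC and Clang toolchains typically provide it, but MSVC defines it only if you define _USE_MATH_DEFINES before including <cmath>.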

When SCons processes mathutils.cpp, it scans the #include directives and discovers that mathutils.cpp depends on mathutils.h. If you later modify mathutils.h, SCons knows to recompile mathutils.cpp without any manual dependency declaration. System headers such as <cmath> don't live in any CPPPATH directory, so SCons doesn't find or track them, which is usually what you want.
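Here is a much-simplified picture of that scanning step: read the #include lines, then keep only the headers that can actually be found in the searched directories. The scan_includes function below is a hypothetical, one-level sketch based on a regular expression. SCons' real C scanner handles many more cases, so don't treat this as its implementation.

```python
import os
import re
import tempfile

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def scan_includes(source_path, search_dirs):
    """Hypothetical, simplified include scanner (one level, no recursion)."""
    with open(source_path) as f:
        names = INCLUDE_RE.findall(f.read())
    found = []
    # Search the source's own directory first, then the configured dirs
    dirs = [os.path.dirname(source_path)] + list(search_dirs)
    for name in names:
        for d in dirs:
            candidate = os.path.join(d, name)
            if os.path.isfile(candidate):
                found.append(candidate)
                break
        # A header found in no searched directory (e.g. <cmath>) is skipped
    return found


with tempfile.TemporaryDirectory() as root:
    lib = os.path.join(root, "lib")
    os.makedirs(lib)
    with open(os.path.join(lib, "mathutils.h"), "w") as f:
        f.write("int factorial(int n);\n")
    src = os.path.join(lib, "mathutils.cpp")
    with open(src, "w") as f:
        f.write('#include "mathutils.h"\n#include <cmath>\n')
    deps = [os.path.basename(p) for p in scan_includes(src, [])]

print(deps)  # ['mathutils.h']
```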

Now the string utility:

// lib/stringutils.h
#ifndef STRINGUTILS_H
#define STRINGUTILS_H
#include <string>

std::string to_upper(const std::string& s);

#endif

// lib/stringutils.cpp
#include "stringutils.h"
#include <algorithm>
#include <cctype>

std::string to_upper(const std::string& s) {
    std::string result = s;
    std::transform(result.begin(), result.end(),
                   result.begin(), ::toupper);
    return result;
}

The stringutils module has a single function that converts a string to uppercase using the standard library's transform algorithm. The ::toupper passed as the transformation function is the C locale version from <cctype>. Together with mathutils, these two modules form a small utility library that the application will link against.

Now the application layer:

// src/app.h
#ifndef APP_H
#define APP_H

void run_app();

#endif

// src/app.cpp
#include "app.h"
#include "config.h"
#include "mathutils.h"
#include "stringutils.h"
#include <iostream>

void run_app() {
    std::cout << "Application: " << APP_NAME << std::endl;
    std::cout << "Version: " << APP_VERSION << std::endl;
    std::cout << "5! = " << factorial(5) << std::endl;
    std::cout << "Circle area (r=3): " << circle_area(3.0) << std::endl;
    std::cout << to_upper("hello scons") << std::endl;
}

// src/main.cpp
#include "app.h"

int main() {
    run_app();
    return 0;
}

The app.cpp file includes headers from all three directories: config.h from include, mathutils.h and stringutils.h from lib, and its own app.h.

This cross-directory dependency pattern is common in real projects and is precisely the scenario where Make's manual dependency tracking becomes error-prone. SCons handles it automatically. The main.cpp file is deliberately thin, delegating all work to run_app(). This pattern (a thin main that calls into application logic) makes the code easier to test because you can link app.cpp against a test harness without pulling in main.

Now the build files. Start with the top-level SConstruct:

# SConstruct
import os

env = Environment(
    CPPPATH=['#include', '#lib'],
    CCFLAGS=['-Wall', '-std=c++17'],
)

debug = ARGUMENTS.get('debug', '0')
if debug == '1':
    env.Append(CCFLAGS=['-g', '-O0', '-DDEBUG'])
    variant = 'build/debug'
else:
    env.Append(CCFLAGS=['-O2', '-DNDEBUG'])
    variant = 'build/release'

Export('env')

lib = SConscript('lib/SConscript',
                 variant_dir=variant + '/lib',
                 duplicate=0)

SConscript('src/SConscript',
           variant_dir=variant + '/src',
           duplicate=0,
           exports={'mylib': lib})

This SConstruct file is the control center of the build. The next section walks through every line in detail.

The library's SConscript file:

# lib/SConscript
Import('env')

lib = env.StaticLibrary('myutils', [
    'mathutils.cpp',
    'stringutils.cpp',
])

Return('lib')

This file imports the shared environment, compiles both library source files into a static library named libmyutils.a (on Linux) or myutils.lib (on Windows), and returns the resulting Node to the caller.

The source file paths mathutils.cpp and stringutils.cpp are relative to this SConscript file's directory, which is lib/. You don't need to write lib/mathutils.cpp because SCons already knows the context.

The application's SConscript file:

# src/SConscript
Import('env')
Import('mylib')

app = env.Program(
    target='myapp',
    source=['main.cpp', 'app.cpp'],
    LIBS=[mylib, 'm'],
    LIBPATH=['#build/release/lib', '#build/debug/lib'],
)

Return('app')

This file imports both the shared environment and the library Node. It compiles the application sources and links them against the myutils library and the math library (-lm). The LIBPATH tells the linker where to find libmyutils.a.

Both the debug and release library paths are listed so the linker finds the library regardless of which build variant is active.

Detailed Walkthrough of Every File in the Project

This section explains the SConstruct and SConscript files line by line. Understanding each line is the difference between cargo-culting a build system and being able to modify it confidently.

The SConstruct File

import os

Standard Python import. You might need os.environ later to pass shell environment variables into the build, os.path.join to construct portable file paths, or os.path.exists to check for optional toolchains. Even if you don't use it immediately, having it available is common practice in SConstruct files.

env = Environment(
    CPPPATH=['#include', '#lib'],
    CCFLAGS=['-Wall', '-std=c++17'],
)

Environment() creates a construction environment. This is the central configuration object that holds everything SCons needs to compile and link your code. CPPPATH sets the header search path. The # prefix means "relative to the directory containing SConstruct." So #include resolves to myproject/include and #lib resolves to myproject/lib, regardless of which SConscript file uses this environment.

When SCons invokes the compiler, it translates CPPPATH entries into -I flags automatically: -Iinclude -Ilib. CCFLAGS holds compiler flags passed to both the C and C++ compilers. -Wall enables a broad set of common warnings (despite the name, not all of them). -std=c++17 selects the C++17 standard. Because -std=c++17 is a language standard flag, it could also go in CXXFLAGS (C++ only). Placing it in CCFLAGS is harmless here because this project has no C files.

debug = ARGUMENTS.get('debug', '0')
if debug == '1':
    env.Append(CCFLAGS=['-g', '-O0', '-DDEBUG'])
    variant = 'build/debug'
else:
    env.Append(CCFLAGS=['-O2', '-DNDEBUG'])
    variant = 'build/release'

ARGUMENTS is a global dictionary that SCons populates from command-line key=value pairs. Running scons debug=1 sets ARGUMENTS['debug'] to the string '1'. The get method provides a default of '0' when the key is absent, so running scons without arguments builds in release mode.

Depending on the value, the code appends either debug or release flags. The debug flags are -g (debug symbols, so GDB can show source lines), -O0 (no optimization, so variable values aren't optimized away), and -DDEBUG (defines a preprocessor macro your code can check with #ifdef DEBUG). The release flags are -O2 (optimization) and -DNDEBUG (disables assert() statements).

The variant variable determines the output directory for build artifacts. env.Append() adds to an existing variable without overwriting what is already there. If CCFLAGS already contains ['-Wall', '-std=c++17'], appending ['-g', '-O0', '-DDEBUG'] produces ['-Wall', '-std=c++17', '-g', '-O0', '-DDEBUG'].
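ARGUMENTS behaves like an ordinary dictionary built from the key=value words on the command line. The parse_arguments function below is a hypothetical, simplified model of that behavior in plain Python. It skips option words that start with - and treats any other words as target names.

```python
def parse_arguments(argv):
    """Hypothetical model of ARGUMENTS: collect key=value words and
    separate out target names; option words are ignored."""
    arguments = {}
    targets = []
    for word in argv:
        if word.startswith("-"):
            continue                     # options such as -j4 or -c
        if "=" in word:
            key, value = word.split("=", 1)
            arguments[key] = value       # values stay strings, e.g. '1'
        else:
            targets.append(word)
    return arguments, targets


args, targets = parse_arguments(["debug=1", "myapp", "-j4"])
debug = args.get("debug", "0")
print(args, targets, debug == "1")  # {'debug': '1'} ['myapp'] True
```

Values arrive as strings, which is why the SConstruct compares debug against '1' rather than the integer 1.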

Export('env')

Export makes the env variable available to SConscript files that call Import('env'). This is SCons' mechanism for sharing data between build files. It works through a global namespace managed by SCons, not through Python's module import system. You can export any Python object: environments, strings, lists, dictionaries, or Node objects. Multiple variables can be exported at once: Export('env', 'version', 'platform').
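Conceptually, Export and Import share objects through one registry keyed by name. The ExportRegistry class below is a hypothetical model, not SCons' implementation, and it takes keyword arguments where the real functions take variable names as strings. It does reproduce the key property: importing hands back the same object that was exported, not a copy.

```python
class ExportRegistry:
    """Hypothetical model of a shared export namespace."""

    def __init__(self):
        self._exports = {}

    def export(self, **objects):
        self._exports.update(objects)

    def import_(self, *names):
        # Return the very same objects that were exported (no copying)
        values = tuple(self._exports[name] for name in names)
        return values[0] if len(values) == 1 else values


registry = ExportRegistry()
env = {"CCFLAGS": ["-Wall"]}                  # stand-in for an Environment
registry.export(env=env, version="1.0.0")

imported_env = registry.import_("env")
imported_env["CCFLAGS"].append("-g")          # modifies the shared object
print(env["CCFLAGS"], imported_env is env)    # ['-Wall', '-g'] True
```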

lib = SConscript('lib/SConscript',
                 variant_dir=variant + '/lib',
                 duplicate=0)

SConscript() reads and executes a subsidiary build file. The first argument is the path to the SConscript file relative to the SConstruct. The variant_dir parameter redirects all build output from lib/ into the variant directory (for example, build/release/lib). This keeps compiled object files and libraries out of your source tree. duplicate=0 tells SCons not to copy (or symlink) source files into the variant directory.

Without this flag, SCons duplicates your source files into build/release/lib (using hard links, symbolic links, or copies, depending on the platform) so that the build tools see sources and outputs in the same directory. This duplication is rarely necessary and can be confusing because you end up with two apparent copies of every source file. With duplicate=0, SCons references the original source files in place. The return value of SConscript() is whatever the subsidiary file passes to Return(). In this case, it's the Node object representing the built static library.

SConscript('src/SConscript',
           variant_dir=variant + '/src',
           duplicate=0,
           exports={'mylib': lib})

This second SConscript call reads the application's build file. The exports parameter is different from the global Export() function. It passes the library Node (returned from the library SConscript) into the application SConscript under the name mylib.

This is a scoped export: only this specific SConscript call receives mylib. The application SConscript retrieves it with Import('mylib'). This is how the application build file knows about the library without hardcoding paths to .a files.

The Library SConscript

Import('env')

Import retrieves a variable from SCons' global export namespace. This pulls in the environment that the SConstruct file exported with Export('env'). After this line, env refers to the same Environment object created in SConstruct. Any modifications you make to env here will affect it everywhere. If you need local modifications, use env.Clone() first.

lib = env.StaticLibrary('myutils', [
    'mathutils.cpp',
    'stringutils.cpp',
])

env.StaticLibrary() is a builder that compiles the listed source files into object files and then archives them into a static library using ar.

The first argument is the library name. SCons automatically adds the platform-appropriate prefix and suffix: libmyutils.a on Linux/macOS, myutils.lib on Windows. You never need to hard-code these. The source file paths are relative to this SConscript file's directory (which is lib/).
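SCons builds that file name from the LIBPREFIX and LIBSUFFIX construction variables. The sketch below models this with a small lookup table keyed by the values SCons stores in env['PLATFORM']. It only covers the toolchains mentioned above, and the helper name static_library_filename is hypothetical.

```python
# Hypothetical model of LIBPREFIX/LIBSUFFIX defaults for static libraries.
LIB_AFFIXES = {
    "posix": ("lib", ".a"),   # GCC/Clang on Linux
    "darwin": ("lib", ".a"),  # Clang on macOS
    "win32": ("", ".lib"),    # MSVC on Windows
}

def static_library_filename(name, platform):
    prefix, suffix = LIB_AFFIXES[platform]
    return f"{prefix}{name}{suffix}"

print(static_library_filename("myutils", "posix"))  # libmyutils.a
print(static_library_filename("myutils", "win32"))  # myutils.lib
```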

SCons also automatically scans these .cpp files for #include directives to establish implicit dependencies on header files. If mathutils.cpp includes mathutils.h, that dependency is tracked without any action from you.

Return('lib')

Return sends the library Node back to the calling SConscript() function in SConstruct. The string 'lib' is the name of the local variable to return, not a file path. This is similar to a Python return statement, but it works across SCons' build file execution model. You can return multiple values: Return('lib', 'headers').

The Application SConscript

Import('env')
Import('mylib')

Two imports: the shared construction environment (from the global Export) and the library Node (from the scoped exports parameter of the SConscript() call in the SConstruct file). These are separate Import calls, but you can also write Import('env', 'mylib') on a single line.

app = env.Program(
    target='myapp',
    source=['main.cpp', 'app.cpp'],
    LIBS=[mylib, 'm'],
    LIBPATH=['#build/release/lib', '#build/debug/lib'],
)

env.Program() compiles source files and links them into an executable. target is the output executable name (SCons adds .exe on Windows automatically). source lists the C++ files to compile. The order of source files doesn't matter for the final result, but convention is to list main.cpp first.

LIBS specifies libraries to link against. Passing the mylib Node directly (instead of a string like 'myutils') is the correct approach because SCons then knows the exact file dependency and will rebuild the executable if the library changes.

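The 'm' string links the system math library (-lm on the command line), needed because mathutils.cpp calls functions from the C math library. LIBPATH tells the linker where to search for libraries given by name, and SCons translates each entry into a -L flag. The mylib Node doesn't need it, because SCons puts a library Node's full path directly on the link line. Listing both the debug and release directories is therefore harmless here. Avoid it for libraries passed by name, though: the linker uses the first matching file it finds, so a debug build could silently link the release copy.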

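These keyword arguments (LIBS, LIBPATH) override the environment's values for this specific builder call only. They don't modify the shared env. An override replaces the environment's value rather than extending it. If env already defines LIBS or LIBPATH (a cross-compilation environment usually does), those entries are ignored for this call unless you merge them in yourself, for example with LIBS=[mylib, 'm'] + env.get('LIBS', []).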
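A rough plain-Python model of this behavior: the builder call sees the environment's variables with each override substituted wholesale, while the environment itself is left alone. The dictionaries and the helper name effective_variables are illustrative stand-ins, not SCons internals.

```python
# Hypothetical model of per-call builder overrides.
def effective_variables(env_vars, overrides):
    """Variables a single builder call sees: overrides replace whole values."""
    merged = dict(env_vars)   # copy, so the shared environment is untouched
    merged.update(overrides)  # a list override replaces the list, it doesn't extend it
    return merged

qurt_like_env = {"LIBS": ["qurt", "qcc", "timer"], "LIBPATH": ["/sdk/qurt/lib"]}

replaced = effective_variables(qurt_like_env, {"LIBS": ["myutils", "m"]})
merged = effective_variables(
    qurt_like_env, {"LIBS": ["myutils", "m"] + qurt_like_env["LIBS"]}
)
print(replaced["LIBS"])  # ['myutils', 'm']
print(merged["LIBS"])    # ['myutils', 'm', 'qurt', 'qcc', 'timer']
```

The first call silently drops the environment's libraries; the second keeps them by concatenating explicitly.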

Return('app')

Returns the application Node to the caller. The SConstruct doesn't use this return value in the current example, but returning it is good practice because it allows future extensions. You might later add env.Install('/usr/local/bin', app) in the SConstruct, or create an env.Alias('run', app, './build/release/src/myapp') to define a scons run command.

Running the Build and Understanding the Output

With all files in place, run the build from the project root.

scons

SCons produces output like this (on Linux with GCC):

scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
g++ -o build/release/lib/mathutils.o -c -Wall -std=c++17 -O2 -DNDEBUG -Iinclude -Ilib lib/mathutils.cpp
g++ -o build/release/lib/stringutils.o -c -Wall -std=c++17 -O2 -DNDEBUG -Iinclude -Ilib lib/stringutils.cpp
ar rc build/release/lib/libmyutils.a build/release/lib/mathutils.o build/release/lib/stringutils.o
ranlib build/release/lib/libmyutils.a
g++ -o build/release/src/main.o -c -Wall -std=c++17 -O2 -DNDEBUG -Iinclude -Ilib src/main.cpp
g++ -o build/release/src/app.o -c -Wall -std=c++17 -O2 -DNDEBUG -Iinclude -Ilib src/app.cpp
g++ -o build/release/src/myapp build/release/src/main.o build/release/src/app.o -Lbuild/release/lib -Lbuild/debug/lib build/release/lib/libmyutils.a -lm
scons: done building targets.

The first two lines show SCons reading all SConstruct and SConscript files. During this phase, it constructs the complete dependency graph in memory. No compilation happens yet.

The "Building targets" section shows the actual commands executed. Each g++ call includes the -I flags derived from CPPPATH (note -Iinclude -Ilib), the flags from CCFLAGS (-Wall -std=c++17 -O2 -DNDEBUG), and the -c flag for compilation (producing an object file, not linking).

The ar rc command creates the static library archive, and ranlib generates the archive index so the linker can find symbols efficiently.

The final g++ line links everything together, with -L flags from LIBPATH pointing the linker to the library directories, the explicit library file path, and -lm for the system math library.
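The link line mixes two kinds of library entries. The following is a simplified model (not SCons's real implementation, which lives in its _LIBDIRFLAGS and _LIBFLAGS variables) of how each is handled: directories become -L flags, library names become -l flags, and a file object such as the mylib Node is placed on the line by its full path. Here pathlib.PurePosixPath stands in for a SCons File node, and link_command is a hypothetical name.

```python
from pathlib import PurePosixPath

def strip_top(path):
    # '#build/release/lib' -> 'build/release/lib'
    return path[1:].lstrip("/") if path.startswith("#") else path

# Hypothetical helper modeling how the link command is assembled.
def link_command(linker, target, objects, libpath, libs):
    cmd = [linker, "-o", target, *objects]
    cmd += ["-L" + strip_top(d) for d in libpath]
    for lib in libs:
        if isinstance(lib, PurePosixPath):
            cmd.append(str(lib))     # file node: exact path, no -l
        else:
            cmd.append("-l" + lib)   # plain name: linker searches LIBPATH
    return " ".join(cmd)

mylib = PurePosixPath("build/release/lib/libmyutils.a")
print(link_command(
    "g++", "build/release/src/myapp",
    ["build/release/src/main.o", "build/release/src/app.o"],
    ["#build/release/lib", "#build/debug/lib"],
    [mylib, "m"],
))
```

Its output reproduces the final g++ line from the build output above.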

Run the resulting executable:

./build/release/src/myapp

The output is:

Application: SCons Demo
Version: 1.0.0
5! = 120
Circle area (r=3): 28.2743
HELLO SCONS

Each line corresponds to a function call in run_app(). The version and name come from config.h. The factorial and circle area come from mathutils. The uppercase string comes from stringutils. All libraries linked correctly and all header paths resolved.

Now build the debug version:

scons debug=1

This creates a parallel set of build artifacts under build/debug/. The release build artifacts under build/release/ remain untouched.

You can switch between debug and release builds without triggering a full recompile of the other variant. Each variant has its own .o files, .a library, and executable. The directory structure under build/debug/ mirrors build/release/.

What Happens During an Incremental Build

Understanding what SCons does on the second and subsequent builds helps you trust the system and diagnose unexpected rebuilds.

Run scons again after a successful build. The output is:

scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.

SCons still reads every SConscript file and constructs the full dependency graph. It then walks the graph and checks every node.

For each source file, it computes the content hash and compares it to the hash stored in .sconsign.dblite. For each target file, it checks whether the source hashes, compiler command, and flags match the values from the previous build. Everything matches, so nothing is rebuilt.
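Conceptually, the check works like the sketch below: build a signature from the sources' content hashes plus the command line, compare it with what was stored after the last build, and skip the target if they match. A plain dict stands in for .sconsign.dblite, and the function names are hypothetical. SCons's real signatures carry more information than this.

```python
import hashlib

# Hypothetical helpers: a simplified model of signature-based up-to-date checks.
def build_signature(source_contents, command):
    """Hash every source's content plus the exact command line."""
    h = hashlib.md5()
    for data in source_contents:
        h.update(hashlib.md5(data).digest())
    h.update(command.encode())
    return h.hexdigest()

def is_up_to_date(target, source_contents, command, db):
    return db.get(target) == build_signature(source_contents, command)

def record_build(target, source_contents, command, db):
    db[target] = build_signature(source_contents, command)

sconsign = {}  # stand-in for .sconsign.dblite
src = [b"int square(int x) { return x * x; }\n"]
cmd = "g++ -o mathutils.o -c -O2 mathutils.cpp"

print(is_up_to_date("mathutils.o", src, cmd, sconsign))  # False: never built
record_build("mathutils.o", src, cmd, sconsign)
print(is_up_to_date("mathutils.o", src, cmd, sconsign))  # True: nothing changed
print(is_up_to_date("mathutils.o", src, cmd.replace("-O2", "-O3"), sconsign))  # False: flags changed
```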

Now modify lib/mathutils.h by adding a new function declaration:

// Add this line to mathutils.h
int fibonacci(int n);

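Run scons again. SCons recompiles mathutils.cpp (because it includes mathutils.h, which changed) and app.cpp (because it also includes mathutils.h).

It doesn't recompile stringutils.cpp (it doesn't include mathutils.h) or main.cpp (it only includes app.h, which didn't change).

Whether SCons then re-archives the static library and re-links the executable depends on whether the recompiled object files actually changed. A declaration that nothing calls usually leaves mathutils.o and app.o byte-for-byte identical in a release build, in which case SCons stops after the compile step (the same short-circuiting described below for comment-only edits). Once you implement fibonacci() and call it, the object files change, and SCons re-archives libmyutils.a and re-links myapp.

This is the dependency graph at work. SCons knows the complete chain from mathutils.h down to myapp: every file that directly or transitively depends on the header is checked, and rebuilding stops as soon as an output comes out unchanged. Files that don't depend on it are untouched. You didn't need to specify any of these dependencies manually.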
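Finding the files to recompile after a header change amounts to a reverse walk of the include graph. The sketch below uses an assumed include graph for this project, based on the includes described in the text, and a hypothetical helper named files_to_recompile. SCons derives the equivalent graph itself through its C/C++ scanner.

```python
from collections import deque

# Assumed include graph: each file maps to the files it #includes directly.
INCLUDES = {
    "mathutils.cpp": ["mathutils.h"],
    "stringutils.cpp": ["stringutils.h"],
    "app.cpp": ["app.h", "config.h", "mathutils.h", "stringutils.h"],
    "main.cpp": ["app.h"],
}

# Hypothetical helper: reverse breadth-first walk from the changed file.
def files_to_recompile(changed, includes):
    """Return the .cpp files that directly or transitively include `changed`."""
    included_by = {}
    for src, deps in includes.items():
        for dep in deps:
            included_by.setdefault(dep, []).append(src)
    seen, queue = {changed}, deque([changed])
    while queue:
        for parent in included_by.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(f for f in seen if f.endswith(".cpp"))

print(files_to_recompile("mathutils.h", INCLUDES))  # ['app.cpp', 'mathutils.cpp']
```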

Now add a comment to stringutils.cpp without changing any actual code:

// This is just a comment
#include "stringutils.h"

Run scons. SCons recompiles stringutils.cpp because its content hash changed (comments are part of the content).

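But here's where SCons gets clever: after recompiling, it computes the hash of the new stringutils.o. If the compiler produced an identical object file, SCons doesn't re-archive the library or re-link the executable. That is common for comment-only changes in a release build, because comments don't affect the generated code. In a debug build (-g) the object file usually does change, since the debug information records line numbers and the new comment line shifts every line below it.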

This "short-circuiting" behavior prevents unnecessary downstream rebuilds. Make can't do this because it only looks at timestamps, not content.
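A toy model of short-circuiting: a hypothetical fake_compile function stands in for the compiler and simply drops // comment lines, so a comment-only edit yields the same "object" bytes. The downstream step runs only when the object's hash changes. SCons compares real object files the same way, by content hash, but whether they come out identical depends on the compiler and flags.

```python
import hashlib

# Hypothetical stand-in compiler: comments don't reach the output.
def fake_compile(source: str) -> bytes:
    code = [ln for ln in source.splitlines() if not ln.strip().startswith("//")]
    return "\n".join(code).encode()

def relink_needed(old_source: str, new_source: str) -> bool:
    old_obj = hashlib.md5(fake_compile(old_source)).hexdigest()
    new_obj = hashlib.md5(fake_compile(new_source)).hexdigest()
    return old_obj != new_obj  # downstream steps run only if the object changed

original = '#include "stringutils.h"\nint n = 1;'
commented = "// This is just a comment\n" + original
edited = original.replace("1", "2")

print(relink_needed(original, commented))  # False: same object, skip archive and link
print(relink_needed(original, edited))     # True: object changed
</test>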

Cross-Compiling for QuRT (Qualcomm Real-Time OS)

One of SCons' strengths is that setting up cross-compilation does not require a separate toolchain file format (like CMake's toolchain files). You configure everything in Python, using the same Environment API you already know.

What is QuRT

QuRT is Qualcomm's proprietary real-time operating system that runs on the Hexagon DSP (Digital Signal Processor) found in Snapdragon processors. The Hexagon DSP is a separate processor core on the Snapdragon SoC (System on Chip), distinct from the ARM application cores that run Android or Linux.

While the ARM cores handle the user interface and general application logic, the Hexagon DSP handles computationally intensive, latency-sensitive tasks: audio processing, sensor fusion, camera image processing, and machine learning inference.

QuRT provides the threading, memory management, and interrupt handling layer on the Hexagon DSP. It's a microkernel RTOS with hard real-time guarantees: interrupt latencies are bounded and predictable, which is essential for applications like audio where a missed deadline produces an audible glitch. QuRT supports POSIX-like threading (with qurt_thread_create instead of pthread_create), mutexes, semaphores, signals, and memory-mapped I/O.

Building code for QuRT requires the Hexagon SDK, which includes the Hexagon compiler (hexagon-clang and hexagon-clang++), linker, assembler, archiver, and QuRT-specific system headers and libraries. The SDK also includes a simulator (hexagon-sim) that can run Hexagon binaries on your development machine for testing without physical hardware.

The Hexagon SDK Directory Structure

The Hexagon SDK follows a specific layout that you need to know to configure your build system. A typical installation looks like this:

$HEXAGON_SDK_ROOT/
    tools/
        HEXAGON_Tools/
            8.8.06/
                Tools/
                    bin/
                        hexagon-clang
                        hexagon-clang++
                        hexagon-ar
                        hexagon-ranlib
                        hexagon-as
                        hexagon-sim
                    include/
                    lib/
    rtos/
        qurt/
            computev66/
                include/
                    qurt.h
                    qurt_thread.h
                    qurt_mutex.h
                    posix/
                lib/
                    libqurt.a
            computev73/
                include/
                lib/
    libs/
        common/

The tools/HEXAGON_Tools directory contains the compiler toolchain. The version number (like 8.8.06) corresponds to the Hexagon Tools release. The rtos/qurt directory contains the QuRT kernel headers and prebuilt libraries, organized by architecture variant. computev66 targets the Hexagon V66 architecture (found in older Snapdragon chips), while computev73 targets the V73 (found in newer ones like Snapdragon 8 Gen 2). Each variant has its own include and lib directories because the kernel is compiled differently for each architecture version.

The Cross-Compilation SConstruct

The following SConstruct file configures a cross-compilation environment for QuRT. It assumes the Hexagon SDK is installed and the HEXAGON_SDK_ROOT environment variable points to it.

# SConstruct for QuRT / Hexagon cross-compilation
import os

hexagon_sdk = os.environ.get('HEXAGON_SDK_ROOT',
                              '/opt/hexagon/sdk')
if not os.path.isdir(hexagon_sdk):
    print('Error: HEXAGON_SDK_ROOT not set or directory does not exist')
    print('Set it with: export HEXAGON_SDK_ROOT=/path/to/hexagon/sdk')
    Exit(1)

hexagon_tools = os.path.join(hexagon_sdk, 'tools', 'HEXAGON_Tools')
hexagon_ver = os.environ.get('HEXAGON_TOOLS_VER', '8.8.06')
tool_base = os.path.join(hexagon_tools, hexagon_ver, 'Tools')
tool_bin = os.path.join(tool_base, 'bin')

hexagon_arch = ARGUMENTS.get('arch', 'v73')
qurt_root = os.path.join(hexagon_sdk, 'rtos', 'qurt')
qurt_variant = 'compute' + hexagon_arch
qurt_inc = os.path.join(qurt_root, qurt_variant, 'include')
qurt_lib = os.path.join(qurt_root, qurt_variant, 'lib')

env = Environment(
    CC=os.path.join(tool_bin, 'hexagon-clang'),
    CXX=os.path.join(tool_bin, 'hexagon-clang++'),
    AR=os.path.join(tool_bin, 'hexagon-ar'),
    RANLIB=os.path.join(tool_bin, 'hexagon-ranlib'),
    AS=os.path.join(tool_bin, 'hexagon-as'),
    LINK=os.path.join(tool_bin, 'hexagon-clang++'),
    CPPPATH=[
        '#include',
        '#lib',
        qurt_inc,
        os.path.join(qurt_inc, 'posix'),
    ],
    CCFLAGS=[
        '-m' + hexagon_arch,
        '-G0',
        '-Wall',
        '-O2',
        '-fPIC',
        '-DQURT',
        '-D__QURT',
    ],
    LINKFLAGS=[
        '-m' + hexagon_arch,
        '-G0',
        '-nostdlib',
    ],
    LIBPATH=[
        '#build/qurt/lib',
        qurt_lib,
    ],
    LIBS=[
        'qurt',
        'qcc',
        'timer',
    ],
    ENV={
        'PATH': tool_bin + os.pathsep + os.environ.get('PATH', ''),
        'HEXAGON_SDK_ROOT': hexagon_sdk,
    },
)

env['CCCOMSTR'] = '  HEX-CC   $TARGET'
env['CXXCOMSTR'] = '  HEX-CXX  $TARGET'
env['LINKCOMSTR'] = '  HEX-LINK $TARGET'
env['ARCOMSTR'] = '  HEX-AR   $TARGET'

Export('env')

lib = SConscript('lib/SConscript',
                 variant_dir='build/qurt/lib',
                 duplicate=0)

SConscript('src/SConscript',
           variant_dir='build/qurt/src',
           duplicate=0,
           exports={'mylib': lib})

This file does a lot, so it's worth going through the key parts in detail.

The first block validates and constructs file paths to the Hexagon toolchain. HEXAGON_SDK_ROOT is the standard environment variable set when you install the Hexagon SDK. If it's not set, the build exits with a clear error message instead of failing later with a cryptic "compiler not found" error. The tool_bin variable points to the directory containing hexagon-clang, hexagon-clang++, hexagon-ar, and other cross-compilation tools.

The architecture is configurable through the command line with scons arch=v66 or scons arch=v73. The hexagon_arch variable defaults to v73 and feeds into both the compiler flags (-mv73) and the QuRT directory path (computev73). This makes it easy to target different Hexagon versions from the same build file.
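Because the path logic is plain Python, it can be factored into a function and tested without the SDK installed. This sketch reproduces the SConstruct's path construction using posixpath so the results are predictable on any host (the SConstruct itself uses os.path). The helper name qurt_paths and the check that arch looks like v66 or v73 are assumptions added for illustration.

```python
import posixpath
import re

# Hypothetical helper reproducing the SConstruct's Hexagon/QuRT path logic.
def qurt_paths(sdk_root, arch="v73", tools_ver="8.8.06"):
    """Derive toolchain and QuRT paths for a Hexagon architecture such as 'v73'."""
    if not re.fullmatch(r"v\d+", arch):  # assumed sanity check, not from the SDK
        raise ValueError(f"unexpected Hexagon arch {arch!r}; expected e.g. 'v73'")
    qurt_dir = posixpath.join(sdk_root, "rtos", "qurt", "compute" + arch)
    return {
        "tool_bin": posixpath.join(sdk_root, "tools", "HEXAGON_Tools",
                                   tools_ver, "Tools", "bin"),
        "qurt_inc": posixpath.join(qurt_dir, "include"),
        "qurt_lib": posixpath.join(qurt_dir, "lib"),
        "arch_flag": "-m" + arch,
    }

paths = qurt_paths("/opt/hexagon/sdk", arch="v66")
print(paths["qurt_inc"])   # /opt/hexagon/sdk/rtos/qurt/computev66/include
print(paths["arch_flag"])  # -mv66
```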

The qurt_root, qurt_inc, and qurt_lib variables locate the QuRT headers and prebuilt libraries. The posix subdirectory inside the include path contains POSIX-compatible wrappers that let you use familiar function signatures (like pthread_mutex_init) that map to QuRT's native API underneath.

The Environment() call overrides every tool. CC, CXX, AR, RANLIB, AS, and LINK all point to the Hexagon cross-compiler tools instead of the host system's native compiler.

This is the fundamental mechanism for cross-compilation in SCons: you swap out the tools in the construction environment. The same SConscript files that work for native builds work for cross-builds because they only interact with the environment through the env variable, never by calling gcc directly.

The CCFLAGS list contains Hexagon-specific flags. -mv73 (assembled from '-m' plus the architecture variable) tells the compiler to generate code for the Hexagon V73 architecture.

-G0 disables the small data section. On the Hexagon DSP, the small data section uses a special register (GP) for faster access to small global variables, but disabling it with -G0 is standard practice for shared libraries and position-independent code where the GP register cannot be relied upon.

-fPIC generates position-independent code, required for shared objects on the DSP. The -DQURT and -D__QURT defines are preprocessor macros that QuRT headers and application code check with #ifdef to detect a QuRT build and enable RTOS-specific code paths.

The LINKFLAGS include -nostdlib because QuRT provides its own C runtime. The standard GNU C library (glibc) is built for Linux and would pull in Linux system calls that don't exist on the Hexagon DSP. QuRT provides its own versions of functions like malloc, printf, and memcpy that are implemented on top of the QuRT kernel.

The LIBS list specifies QuRT-specific libraries: qurt (the RTOS kernel interface, providing threading, mutexes, and memory management), qcc (Qualcomm C compiler runtime, providing low-level arithmetic helpers and compiler intrinsics), and timer (hardware timer access for profiling and delay functions).

The ENV dictionary controls what environment the child processes (compilers, linkers) see when SCons invokes them. The Hexagon tool binary directory is prepended to PATH so that tools can find each other (for example, hexagon-clang may internally invoke hexagon-as for assembly steps). HEXAGON_SDK_ROOT is passed through because some Hexagon tools reference it internally to locate standard headers and runtime libraries.

The CCCOMSTR, CXXCOMSTR, LINKCOMSTR, and ARCOMSTR variables customize the build output. Instead of printing the full compiler command line (which can be hundreds of characters long with all the flags and paths), SCons prints a short summary like HEX-CXX build/qurt/lib/mathutils.o. This makes it easy to see at a glance that you're using the cross-compiler, not the host compiler.

To see the full commands (useful for debugging), remove these four lines, or guard them behind a command-line switch such as verbose=1: read the switch from ARGUMENTS in the SConstruct and set the *COMSTR variables only when it is absent.
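One way to implement such a switch, sketched as a plain function so it can be tested on its own: it returns the *COMSTR settings unless verbose=1 was given, and an empty dict otherwise. The helper name comstr_settings is hypothetical. In an SConstruct you would apply it with env.Replace(**comstr_settings(ARGUMENTS)).

```python
# Hypothetical helper: short output by default, full commands with verbose=1.
HEX_COMSTRS = {
    "CCCOMSTR": "  HEX-CC   $TARGET",
    "CXXCOMSTR": "  HEX-CXX  $TARGET",
    "LINKCOMSTR": "  HEX-LINK $TARGET",
    "ARCOMSTR": "  HEX-AR   $TARGET",
}

def comstr_settings(arguments):
    if arguments.get("verbose", "0") == "1":
        return {}  # leave *COMSTR unset so SCons echoes the real commands
    return dict(HEX_COMSTRS)

print(sorted(comstr_settings({})))           # ['ARCOMSTR', 'CCCOMSTR', 'CXXCOMSTR', 'LINKCOMSTR']
print(comstr_settings({"verbose": "1"}))     # {}
```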

Everything after the environment setup is identical to the native build: Export, SConscript calls with variant directories, and the same library and application SConscript files.

The SConscript files don't know or care whether they're building for the host or for QuRT. They just use whatever environment they receive through Import('env'). This separation is a key design advantage. Your build logic (what files to compile, what libraries to create) stays in the SConscript files. Your toolchain configuration stays in the SConstruct.

To build for QuRT, set the SDK path and run SCons.

export HEXAGON_SDK_ROOT=/path/to/hexagon/sdk
scons

The output shows the Hexagon compiler being invoked instead of GCC.

  HEX-CXX  build/qurt/lib/mathutils.o
  HEX-CXX  build/qurt/lib/stringutils.o
  HEX-AR   build/qurt/lib/libmyutils.a
  HEX-CXX  build/qurt/src/main.o
  HEX-CXX  build/qurt/src/app.o
  HEX-LINK build/qurt/src/myapp

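Each line confirms that the Hexagon tools are running, not the host tools. The resulting myapp binary is a Hexagon executable. You can't run it directly on your development machine (it contains Hexagon instructions, not x86 or ARM). To test it, use the Hexagon simulator, hexagon-sim. Running a QuRT program in the simulator typically needs more than the bare executable path (for example, the SDK's QuRT boot image and simulator options), so check the Hexagon SDK documentation for the exact invocation.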

To target a different Hexagon architecture, pass the arch argument.

scons arch=v66

This changes the compiler flag to -mv66 and selects the computev66 QuRT headers and libraries. Everything else remains the same.

Writing QuRT-Specific Application Code

Real QuRT applications use the RTOS API for threading, synchronization, and hardware interaction. The following example replaces the generic main.cpp with a QuRT-specific version that creates threads and uses a mutex.

// src/main_qurt.cpp
#include "app.h"
#include 
#include 
#include 
#include 

#define STACK_SIZE 4096

static qurt_mutex_t print_mutex;
static char worker_stack[STACK_SIZE];

void worker_thread(void *arg) {
    int id = (int)(long)arg;
    qurt_mutex_lock(&print_mutex);
    printf("Worker thread %d running on QuRT\n", id);
    run_app();
    qurt_mutex_unlock(&print_mutex);
    qurt_thread_exit(0);
}

int main() {
    qurt_thread_t thread_id;
    qurt_thread_attr_t attr;

    qurt_mutex_init(&print_mutex);

    qurt_thread_attr_init(&attr);
    qurt_thread_attr_set_name(&attr, "worker");
    qurt_thread_attr_set_stack_addr(&attr, worker_stack);
    qurt_thread_attr_set_stack_size(&attr, STACK_SIZE);
    qurt_thread_attr_set_priority(&attr, 100);

    qurt_thread_create(&thread_id, &attr,
                       worker_thread, (void *)1);

    int status;
    qurt_thread_join(thread_id, &status);

    qurt_mutex_destroy(&print_mutex);
    return 0;
}

This code demonstrates the core QuRT threading API.

qurt_mutex_init initializes a mutex for synchronizing access to printf (which isn't thread-safe on QuRT without protection).
qurt_thread_attr_init creates a thread attribute structure, and the subsequent calls configure the thread's name (visible in the debugger), stack memory (you provide the buffer, QuRT doesn't allocate it for you), stack size (4096 bytes is typical for lightweight threads), and priority (QuRT uses priority-based preemptive scheduling where lower numbers mean higher priority).
qurt_thread_create spawns the thread, passing a function pointer and an argument.
qurt_thread_join blocks until the thread completes, similar to pthread_join.
qurt_mutex_destroy cleans up the mutex.

Several differences from POSIX threading matter for correctness. On QuRT, you must provide the stack memory yourself as a statically allocated buffer (or dynamically allocated via qurt_malloc). The RTOS doesn't have a general-purpose malloc-like stack allocator the way Linux does. Thread priorities are explicit and mandatory – there's no default priority. And qurt_thread_exit must be called at the end of every thread function: falling off the end of the function without calling it is undefined behavior on QuRT.

To build with this QuRT-specific main instead of the generic one, modify the src/SConscript to select the right file:

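# src/SConscript (QuRT-aware version)
Import('env')
Import('mylib')

# Only the QuRT environment defines -DQURT; compare whole flags, not substrings
is_qurt = '-DQURT' in env.get('CCFLAGS', [])

main_src = 'main_qurt.cpp' if is_qurt else 'main.cpp'

# LIBS=... replaces the environment's LIBS for this call, so append the
# environment's own libraries (qurt, qcc, timer in the QuRT build).
# mylib is a Node linked by its full path, so no LIBPATH override is needed,
# and leaving LIBPATH alone keeps the environment's QuRT library directory.
app = env.Program(
    target='myapp',
    source=[main_src, 'app.cpp'],
    LIBS=[mylib, 'm'] + list(env.get('LIBS', [])),
)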

Return('app')

This SConscript inspects the environment's CCFLAGS to determine whether the QuRT preprocessor define is present. If it is, the build uses main_qurt.cpp. If not, it uses the standard main.cpp.

This is a simple example of using Python logic in a build file to adapt to different targets, something that requires convoluted syntax in Make and a separate toolchain file in CMake.

Building Both Native and QuRT From One SConstruct

If you need both a native build (for running unit tests on your development machine) and a QuRT build (for deployment to the DSP), you can configure both in a single SConstruct.

# SConstruct (dual-target: native + QuRT)
import os

native_env = Environment(
    CPPPATH=['#include', '#lib'],
    CCFLAGS=['-Wall', '-std=c++17', '-O2'],
)

hexagon_sdk = os.environ.get('HEXAGON_SDK_ROOT', '')
build_qurt = os.path.isdir(hexagon_sdk)

if build_qurt:
    hexagon_tools = os.path.join(hexagon_sdk, 'tools', 'HEXAGON_Tools')
    hexagon_ver = os.environ.get('HEXAGON_TOOLS_VER', '8.8.06')
    tool_bin = os.path.join(hexagon_tools, hexagon_ver, 'Tools', 'bin')
    hexagon_arch = ARGUMENTS.get('arch', 'v73')
    qurt_root = os.path.join(hexagon_sdk, 'rtos', 'qurt')
    qurt_variant = 'compute' + hexagon_arch
    qurt_inc = os.path.join(qurt_root, qurt_variant, 'include')
    qurt_lib = os.path.join(qurt_root, qurt_variant, 'lib')

    qurt_env = Environment(
        CC=os.path.join(tool_bin, 'hexagon-clang'),
        CXX=os.path.join(tool_bin, 'hexagon-clang++'),
        AR=os.path.join(tool_bin, 'hexagon-ar'),
        RANLIB=os.path.join(tool_bin, 'hexagon-ranlib'),
        LINK=os.path.join(tool_bin, 'hexagon-clang++'),
        CPPPATH=['#include', '#lib', qurt_inc,
                 os.path.join(qurt_inc, 'posix')],
        CCFLAGS=['-m' + hexagon_arch, '-G0', '-Wall',
                 '-O2', '-fPIC', '-DQURT', '-D__QURT'],
        LINKFLAGS=['-m' + hexagon_arch, '-G0', '-nostdlib'],
        LIBPATH=[qurt_lib],
        LIBS=['qurt', 'qcc', 'timer'],
        ENV={'PATH': tool_bin + os.pathsep + os.environ.get('PATH', ''),
             'HEXAGON_SDK_ROOT': hexagon_sdk},
    )
    qurt_env['CXXCOMSTR'] = '  HEX-CXX  $TARGET'
    qurt_env['LINKCOMSTR'] = '  HEX-LINK $TARGET'
    qurt_env['ARCOMSTR'] = '  HEX-AR   $TARGET'

native_lib = SConscript('lib/SConscript',
                        variant_dir='build/native/lib',
                        duplicate=0,
                        exports={'env': native_env})
SConscript('src/SConscript',
           variant_dir='build/native/src',
           duplicate=0,
           exports={'env': native_env, 'mylib': native_lib})

if build_qurt:
    qurt_lib_node = SConscript('lib/SConscript',
                               variant_dir='build/qurt/lib',
                               duplicate=0,
                               exports={'env': qurt_env})
    SConscript('src/SConscript',
               variant_dir='build/qurt/src',
               duplicate=0,
               exports={'env': qurt_env, 'mylib': qurt_lib_node})

Each SConscript call passes a different environment through the exports parameter. The SConscript files themselves are the same ones the single-target build uses. SCons builds both variants in a single invocation and tracks their dependencies separately, since each variant writes to its own directory under build/. The native build always runs. The QuRT build runs only when HEXAGON_SDK_ROOT points to a valid directory. This means developers who don't have the Hexagon SDK installed can still build and test the native version without errors.

This pattern shows why Python build files are powerful. Conditional logic, environment detection, path validation, and multi-target builds all use standard Python constructs. There's no special cross-compilation syntax to learn, no separate toolchain file format, and no need to run the build tool twice with different arguments.
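The decision about which variants to build can also be isolated and tested. The sketch below uses a hypothetical helper named planned_variants that returns the variant directories the dual-target SConstruct would use for a given set of environment variables, adding QuRT only when HEXAGON_SDK_ROOT names an existing directory.

```python
import os
import tempfile

# Hypothetical helper mirroring the dual-target SConstruct's variant choice.
def planned_variants(environ):
    """Return (name, variant_root) pairs for the variants that would be built."""
    variants = [("native", "build/native")]
    sdk = environ.get("HEXAGON_SDK_ROOT", "")
    if os.path.isdir(sdk):  # os.path.isdir('') is False, so unset means native only
        variants.append(("qurt", "build/qurt"))
    return variants

print(planned_variants({}))  # [('native', 'build/native')]
with tempfile.TemporaryDirectory() as fake_sdk:
    print(planned_variants({"HEXAGON_SDK_ROOT": fake_sdk}))
    # [('native', 'build/native'), ('qurt', 'build/qurt')]
```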

How SCons Detects Dependencies and Decides What to Rebuild

SCons ships with built-in scanners for C/C++ (#include directives), Fortran (INCLUDE and USE statements), Java (import statements), D (import statements), and LaTeX (\include and \input commands).

When SCons compiles app.cpp, it reads the file, finds #include "config.h", #include "mathutils.h", and the other includes, resolves them against the CPPPATH search path, and automatically adds those headers to the dependency graph.

If you change mathutils.h, SCons knows to recompile app.cpp even though you didn't list that dependency anywhere. Make requires you to set this up manually or use a tool like gcc -MM to generate dependency files, and if you forget, your build produces incorrect results silently.

The default rebuild strategy uses content hashing. SCons computes an MD5 hash of every source file and stores it in a database file called .sconsign.dblite in the project root. On the next build, it recomputes hashes and compares. If the hash hasn't changed, the file isn't rebuilt.

This extends to the build outputs themselves: if recompiling a .cpp file produces an identical .o file (for example, because you only changed a comment), SCons won't re-link the final executable.

This "short-circuiting" behavior can save significant time on large projects where a header change triggers recompilation of many files but only a few actually produce different object code.

The .sconsign.dblite file stores more than just content hashes. It records the full build signature for each target: the content hashes of all source files, the compiler command line (including all flags), and the implicit dependencies discovered by scanners. If you change a compiler flag (for example, switching from -O2 to -O3), SCons detects that the build signature has changed and recompiles everything, even though no source files changed. Make can't do this because it only tracks file timestamps.

You can change the rebuild strategy with the Decider function:

Decider('content')            # Default: MD5 hash comparison
Decider('timestamp-newer')    # Make-like: rebuild if source is newer
Decider('timestamp-match')    # Rebuild if timestamp changed at all
Decider('content-timestamp')  # Hybrid: only hash if timestamp changed

'content' is the default and the most correct. It reads every source file on every build to compute hashes, which is thorough but adds I/O overhead.

'timestamp-newer' mimics Make's behavior: rebuild if the source file's modification time is newer than the target's. This is fast but misses cases where a file is restored from backup (older timestamp, different content).

'timestamp-match' rebuilds if the timestamp has changed in either direction, which handles the restore case.

'content-timestamp' is the best hybrid: it only reads file contents (to compute hashes) when the timestamp has changed, skipping the I/O for files that haven't been touched. On projects with thousands of source files, this can cut SCons' startup overhead noticeably.
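The content-timestamp idea can be modeled in a few lines: trust the stored hash while the modification time is unchanged, and only read and hash the file when the timestamp differs. This is a simplified sketch with hypothetical names (changed_since, and a dict holding the previous mtime and hash), not SCons's implementation.

```python
import hashlib
import os
import tempfile

def file_md5(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

# Hypothetical model of a content-timestamp style check.
def changed_since(path, prev):
    """prev holds the 'mtime' and 'csig' recorded after the last build."""
    if os.path.getmtime(path) == prev["mtime"]:
        return False                       # timestamp untouched: don't even read the file
    return file_md5(path) != prev["csig"]  # timestamp moved: fall back to content

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "app.cpp")
    with open(src, "w") as f:
        f.write("int main() { return 0; }\n")
    os.utime(src, (1_000_000, 1_000_000))
    prev = {"mtime": os.path.getmtime(src), "csig": file_md5(src)}

    os.utime(src, (2_000_000, 2_000_000))  # like `touch`: new timestamp, same content
    print(changed_since(src, prev))        # False

    with open(src, "w") as f:
        f.write("int main() { return 1; }\n")
    os.utime(src, (3_000_000, 3_000_000))
    print(changed_since(src, prev))        # True
```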

You can also change the hash algorithm:

SetOption('hash_format', 'sha256')

This switches from MD5 to SHA-256. MD5 is not collision-resistant for adversarial inputs, but for build system purposes (detecting accidental changes to source files), it's perfectly adequate. SHA-256 is an option for environments with strict compliance requirements.

You can write a custom decider function for specialized rebuild logic:

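def my_decider(dependency, target, prev_ni, repo_node=None):
    # prev_ni may lack a timestamp (for example, for a dependency not seen
    # before), so fall back to None, which forces a rebuild
    return dependency.get_timestamp() != getattr(prev_ni, 'timestamp', None)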

env.Decider(my_decider)

The custom decider receives the dependency node, the target node, and the "node info" from the previous build. It returns True to trigger a rebuild or False to skip. This is useful for exotic scenarios like triggering rebuilds based on external state (database versions, API schemas) that aren't captured by file content.

Writing a Custom Scanner

If your project uses a file format that includes other files (similar to C's #include), you can write a custom scanner so SCons tracks those dependencies automatically.

Consider a custom configuration file format where @import filename.cfg includes another file:

import re

import_re = re.compile(r'^@import\s+(\S+)', re.MULTILINE)

def cfg_scan(node, env, path):
    contents = node.get_text_contents()
    includes = import_re.findall(contents)
    return [env.File(f) for f in includes]

cfg_scanner = Scanner(
    function=cfg_scan,
    skeys=['.cfg'],
    recursive=True,
)

env.Append(SCANNERS=cfg_scanner)

The cfg_scan function reads the file contents, finds all @import directives using a regular expression, and returns a list of File nodes representing the imported files.

The skeys parameter tells SCons to apply this scanner to files with the .cfg extension.

The recursive=True parameter tells SCons to scan the imported files as well, so transitive dependencies are tracked. After appending the scanner to the environment, any builder that processes .cfg files will automatically detect and track @import dependencies.
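The recursive=True behavior amounts to following @import chains until no new files appear. The sketch below reuses the same regular expression against a hypothetical in-memory project (the FILES dictionary stands in for files on disk). It shows the difference between what a single scan of one file reports and the transitive set that recursive scanning is meant to capture.

```python
import re

# Same pattern as the scanner above: "@import <name>" at the start of a line
import_re = re.compile(r'^@import\s+(\S+)', re.MULTILINE)

# Hypothetical project contents standing in for .cfg files on disk
FILES = {
    'app.cfg': '@import net.cfg\n@import log.cfg\nname = demo\n',
    'net.cfg': '@import defaults.cfg\nport = 8080\n',
    'log.cfg': 'level = info\n',
    'defaults.cfg': 'timeout = 30\n',
}


def direct_imports(name):
    """What one scan of a single file reports."""
    return import_re.findall(FILES[name])


def transitive_imports(name):
    """Follow @import chains; the seen list also stops import cycles."""
    seen = []
    stack = [name]
    while stack:
        current = stack.pop()
        for dep in direct_imports(current):
            if dep not in seen:
                seen.append(dep)
                stack.append(dep)
    return sorted(seen)


print(direct_imports('app.cfg'))      # ['net.cfg', 'log.cfg']
print(transitive_imports('app.cfg'))  # ['defaults.cfg', 'log.cfg', 'net.cfg']
```

Without recursion, an edit to defaults.cfg would go unnoticed by targets built from app.cfg, because app.cfg never mentions it directly.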

The Shared Build Cache

SCons supports CacheDir, a shared build cache that stores compiled artifacts indexed by their build signature (a hash incorporating the source content, compiler command, and flags). If another developer on your team has already built an identical configuration, you get the cached result instead of recompiling.

CacheDir('/shared/network/build_cache')

This line is all you need to enable caching. When SCons builds a file, it stores a copy in the cache directory, named by the build signature hash. On subsequent builds (by you or anyone else pointing to the same cache), if the build signature matches, the cached file is copied into the build directory instead of running the compiler. This works like ccache but applies to any build artifact, not just compiled objects. Libraries, executables, generated code, and any other builder output can be cached.

The build signature is comprehensive. It incorporates the content hashes of all source files, the full compiler command line (including flags), and the compiler executable itself, which SCons records as an implicit dependency of the command, so upgrading the compiler changes the signature. Different compiler flags produce different cache entries, so debug and release builds don't interfere with each other. If two developers use the same compiler, the same flags, and the same source code, they share cache hits.
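A rough way to picture the build signature is as one hash over everything that can affect the output. The function below is a simplified model, not SCons' actual signature format. It hashes the source contents, the command line, and the bytes of the compiler binary (standing in for the tool version). The file contents and "compiler" bytes are made up for illustration.

```python
import hashlib


def build_signature(sources, command, tool_binary):
    """Toy cache key over sources, command line, and compiler binary."""
    h = hashlib.sha256()
    for name in sorted(sources):            # stable order regardless of dict order
        h.update(name.encode())
        h.update(hashlib.sha256(sources[name]).digest())
    h.update(command.encode())
    h.update(hashlib.sha256(tool_binary).digest())
    return h.hexdigest()


sources = {'main.cpp': b'int main() { return 0; }\n'}
compiler = b'pretend compiler binary v1'

debug_key = build_signature(sources, 'g++ -g -O0 -c main.cpp', compiler)
release_key = build_signature(sources, 'g++ -O2 -c main.cpp', compiler)
teammate_key = build_signature(dict(sources), 'g++ -O2 -c main.cpp', compiler)
upgraded_key = build_signature(sources, 'g++ -O2 -c main.cpp',
                               b'pretend compiler binary v2')

print(debug_key != release_key)     # True: different flags, different entry
print(release_key == teammate_key)  # True: identical inputs share a cache hit
print(release_key != upgraded_key)  # True: a new compiler invalidates the entry
```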

Several command-line flags control cache behavior:

scons --cache-show       # Show what command would have run for cached targets
scons --cache-disable    # Ignore cache for this run
scons --cache-readonly   # Read from cache but do not write new entries
scons --cache-force      # Update cache even if target is up to date

--cache-show is useful for debugging. When a target is retrieved from cache, SCons normally prints only a short "Retrieved ... from cache" message instead of the command. With --cache-show, it prints the command that would have been executed, so you can verify the cached entry matches your expectations.

--cache-readonly is useful for CI systems that should consume cache entries built by developers but not pollute the cache with CI-specific configurations.

Working with Shared Libraries

Building shared libraries (.so on Linux, .dylib on macOS, .dll on Windows) requires different compiler and linker flags than static libraries. SCons handles most of this automatically through the SharedLibrary builder.

env = Environment()
shared_lib = env.SharedLibrary('myutils', [
    'mathutils.cpp',
    'stringutils.cpp',
])

On Linux, this produces libmyutils.so. SCons automatically adds -fPIC to the compilation flags for source files that go into a shared library (it uses SharedObject internally instead of StaticObject). On Windows, it produces myutils.dll plus myutils.lib (the import library).

For versioned shared libraries on POSIX systems, use the SHLIBVERSION parameter:

shared_lib = env.SharedLibrary('myutils', sources,
                                SHLIBVERSION='1.2.3')

This produces three files: libmyutils.so.1.2.3 (the actual library), libmyutils.so.1 (the soname symlink used at runtime), and libmyutils.so (the development symlink used at link time). SCons creates all three and manages the symlinks.
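The naming scheme follows a simple rule: the real file carries the full version, the soname keeps only the major version, and the development name carries none. This helper is hypothetical and only reproduces the Linux-style names described above; SCons computes them for you.

```python
def versioned_shlib_names(name, version):
    """Linux-style names for a versioned shared library (hypothetical helper)."""
    major = version.split('.')[0]
    real = f'lib{name}.so.{version}'   # the actual library file
    soname = f'lib{name}.so.{major}'   # symlink the dynamic loader looks for
    dev = f'lib{name}.so'              # symlink the linker finds via -l<name>
    return real, soname, dev


print(versioned_shlib_names('myutils', '1.2.3'))
# ('libmyutils.so.1.2.3', 'libmyutils.so.1', 'libmyutils.so')
```

Because the soname tracks only the major version, a 1.2.4 release can replace 1.2.3 without relinking programs that depend on libmyutils.so.1.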

Static and shared objects aren't interchangeable. If you compile a file with env.Object() (which creates a static object, without -fPIC), you can't put it into a SharedLibrary. SCons enforces this and produces an error if you try. If you need the same source file compiled both ways, call each builder separately.

static_objs = [env.StaticObject(f) for f in sources]
shared_objs = [env.SharedObject(f) for f in sources]

static_lib = env.StaticLibrary('myutils', static_objs)
shared_lib = env.SharedLibrary('myutils', shared_objs)

Each source file gets compiled twice: once without -fPIC for the static library, once with -fPIC for the shared library. On POSIX toolchains, static objects get the .o suffix and shared objects get .os, so the two sets of object files don't collide.

Adding Command-Line Options with AddOption

The ARGUMENTS dictionary works for simple key=value pairs, but for more complex command-line interfaces (flags like --prefix, --enable-feature, or --with-library), use AddOption.

import os

AddOption('--prefix',
    dest='prefix',
    type='string',
    nargs=1,
    action='store',
    metavar='DIR',
    default='/usr/local',
    help='Installation prefix (default: /usr/local)')

AddOption('--enable-tests',
    dest='enable_tests',
    action='store_true',
    default=False,
    help='Build and run unit tests')

prefix = GetOption('prefix')
build_tests = GetOption('enable_tests')

env = Environment(PREFIX=prefix)

app = env.Program('myapp', sources)

# Install targets outside the project tree are not built by default,
# so expose them through an alias: scons install
install_dir = os.path.join(prefix, 'bin')
env.Install(install_dir, app)
env.Alias('install', install_dir)

if build_tests:
    test_env = env.Clone()
    test_env.Program('test_runner', test_sources)

AddOption uses Python's optparse module under the hood, so the parameter names (dest, type, action, metavar, default, help) follow the same conventions. GetOption retrieves the parsed value. These options appear in scons --help output alongside SCons' built-in options, giving users a clean command-line interface.

Running scons --enable-tests builds the application and the test suite. Running scons install --prefix=/opt/myapp copies myapp into /opt/myapp/bin. Running scons --help shows all available options with their descriptions.

The advantage over ARGUMENTS is discoverability. ARGUMENTS requires the user to know which key=value pairs your build file accepts. AddOption makes them visible in --help output and provides type checking and default values.
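Since AddOption hands these keywords to optparse, you can check how a set of options parses without running SCons at all. The sketch below declares the same two options directly with Python's standard optparse module and parses two sample command lines. Inside an SConstruct, GetOption('prefix') and GetOption('enable_tests') should see the corresponding values.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option('--prefix', dest='prefix', type='string', nargs=1,
                  action='store', metavar='DIR', default='/usr/local',
                  help='Installation prefix (default: /usr/local)')
parser.add_option('--enable-tests', dest='enable_tests',
                  action='store_true', default=False,
                  help='Build and run unit tests')

# Explicit flags override the defaults
opts, _ = parser.parse_args(['--prefix=/opt/myapp', '--enable-tests'])
print(opts.prefix, opts.enable_tests)            # /opt/myapp True

# With no flags, the declared defaults apply
defaults, _ = parser.parse_args([])
print(defaults.prefix, defaults.enable_tests)    # /usr/local False
```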

Configure Checks for Portability

SCons includes an autoconf-like system for probing the build environment. You can check for headers, libraries, functions, and type sizes before building.

env = Environment()
conf = Configure(env)

if not conf.CheckCHeader('math.h'):
    print('Error: math.h not found')
    Exit(1)

if not conf.CheckCXXHeader('iostream'):
    print('Error: C++ standard library headers not found')
    Exit(1)

if not conf.CheckLib('pthread', language='C'):
    print('Error: pthread library not found')
    Exit(1)

if conf.CheckFunc('posix_memalign'):
    conf.env.Append(CPPDEFINES=['HAVE_POSIX_MEMALIGN'])

if conf.CheckFunc('aligned_alloc'):
    conf.env.Append(CPPDEFINES=['HAVE_ALIGNED_ALLOC'])

if conf.CheckTypeSize('long') == 8:
    conf.env.Append(CPPDEFINES=['HAVE_64BIT_LONG'])

env = conf.Finish()

Configure() creates a configuration context that builds small test programs behind the scenes to determine whether headers exist, libraries can be linked, and functions are available. Each Check method writes a tiny C or C++ program, builds it with the current environment settings, and returns True or False depending on whether that step succeeded: header checks only need to compile, while library and function checks must also link. conf.Finish() returns the (possibly modified) environment and cleans up.

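CheckCHeader verifies that a C header can be included. CheckCXXHeader does the same for C++ headers. CheckLib verifies that a library can be linked; the language parameter determines whether to use the C or C++ compiler for the test. CheckFunc checks whether a function is available (it creates a test program that references the function and attempts to link it). CheckTypeSize returns the size of a type in bytes as an integer, or 0 if the check fails. Without an expect argument, it finds the size by building and running a small program that prints sizeof(type), which doesn't work when cross-compiling; passing expect=8 turns it into a compile-only check that succeeds only if the size matches.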

The CPPDEFINES added by the checks (like HAVE_POSIX_MEMALIGN) follow the standard autoconf convention. Your source code can then use these defines:

#ifdef HAVE_POSIX_MEMALIGN
    posix_memalign(&ptr, alignment, size);
#elif defined(HAVE_ALIGNED_ALLOC)
    ptr = aligned_alloc(alignment, size);
#else
    ptr = malloc(size);
#endif

This pattern makes your code portable across systems that may or may not have specific functions, without hardcoding platform assumptions.

Configure checks are cached in .sconf_temp/ and .sconsign.dblite. On subsequent builds, if the environment hasn't changed, SCons skips the checks and uses the cached results. You can force rechecking with scons --config=force.
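When a build has many checks, writing each define by hand gets repetitive. One option is a small helper that derives autoconf-style names from what was checked. have_define() below is hypothetical, not part of SCons, and the results dictionary stands in for the return values of conf.CheckFunc and conf.CheckCHeader.

```python
import re


def have_define(name):
    """Autoconf-style macro: HAVE_ + upper-cased name, non-alphanumerics -> '_'."""
    return 'HAVE_' + re.sub(r'[^A-Za-z0-9]', '_', name).upper()


# Hypothetical check results (True means the check passed)
results = {
    'posix_memalign': True,
    'aligned_alloc': False,
    'sys/mman.h': True,
}

defines = [have_define(name) for name, found in results.items() if found]
print(defines)   # ['HAVE_POSIX_MEMALIGN', 'HAVE_SYS_MMAN_H']
```

Inside a Configure block, you would pass the result to conf.env.Append(CPPDEFINES=[...]) after each successful check, exactly as the hand-written version does.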

Custom Builders for Non-Standard File Types

You can define builders for file types that SCons doesn't know about. A builder wraps a shell command (or a Python function) with source/target suffix handling.

Builder with an External Command

protobuf = Builder(
    action='protoc --cpp_out=$TARGET.dir $SOURCE',
    suffix='.pb.cc',
    src_suffix='.proto',
)
env.Append(BUILDERS={'Protobuf': protobuf})
env.Protobuf('messages.proto')

This creates a Protobuf builder that runs protoc on .proto files and produces .pb.cc files. The action string uses SCons variable substitution: $SOURCE expands to the input file path and $TARGET.dir expands to the directory of the output file. The suffix and src_suffix parameters let SCons infer target and source file names automatically. After appending the builder to the environment, you call env.Protobuf('messages.proto') and SCons produces messages.pb.cc.

The critical detail: use env.Append(BUILDERS={...}) to add your builder. If you set BUILDERS directly in the Environment() constructor, like Environment(BUILDERS={'Protobuf': protobuf}), you overwrite the entire builder dictionary and lose all the default builders (Program, Library, Object, and so on).

Builder with a Python Function

def generate_version_header(target, source, env):
    version = env.get('APP_VERSION', '0.0.0')
    with open(str(target[0]), 'w') as f:
        f.write('#ifndef VERSION_H\n')
        f.write('#define VERSION_H\n')
        f.write('#define VERSION "%s"\n' % version)
        f.write('#endif\n')
    return 0

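version_builder = Builder(
    # varlist adds APP_VERSION to the action's signature, so changing the
    # version string rebuilds version.h even if version.ver is unchanged
    action=Action(generate_version_header, varlist=['APP_VERSION']),
    suffix='.h',
    src_suffix='.ver')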
env.Append(BUILDERS={'VersionHeader': version_builder})
env.VersionHeader('version.h', 'version.ver',
                  APP_VERSION='2.1.0')

The Python function receives three arguments: target (a list of target Node objects), source (a list of source Node objects), and env (the construction environment). Node objects must be converted to strings with str() to get the file path. The function must return 0 for success or a non-zero value for failure.

Using a Python function instead of a shell command is useful when the build step involves logic that is awkward to express in shell (like reading a file, parsing JSON, or generating code with complex structure).
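Because the action is an ordinary Python function, you can also exercise it without running SCons. The sketch below copies generate_version_header from above and calls it with a plain dict as env (the function only uses env.get) and a hypothetical FakeNode class that supports str(), which is all the function needs from a Node. It writes into a temporary directory so nothing is left behind.

```python
import os
import tempfile


def generate_version_header(target, source, env):
    # Same body as the builder action above
    version = env.get('APP_VERSION', '0.0.0')
    with open(str(target[0]), 'w') as f:
        f.write('#ifndef VERSION_H\n')
        f.write('#define VERSION_H\n')
        f.write('#define VERSION "%s"\n' % version)
        f.write('#endif\n')
    return 0


class FakeNode:
    """Hypothetical stand-in for an SCons Node; only str() is needed here."""
    def __init__(self, path):
        self.path = path

    def __str__(self):
        return self.path


with tempfile.TemporaryDirectory() as tmp:
    out = FakeNode(os.path.join(tmp, 'version.h'))
    status = generate_version_header([out], [FakeNode('version.ver')],
                                     {'APP_VERSION': '2.1.0'})
    with open(str(out)) as f:
        header = f.read()

print(status)   # 0
print(header)
```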

The Command Builder for One-Off Rules

For build rules that are used only once, the Command builder avoids the overhead of defining a named builder.

env.Command('config.h', 'config.h.in',
            "sed 's/@VERSION@/1.0.0/g' < $SOURCE > $TARGET")

This runs sed to substitute a version placeholder in config.h.in and writes the result to config.h. The Command builder is the SCons equivalent of a Make rule with a custom recipe. It takes the target, source, and action as arguments. The action can be a shell command string, a Python function, or a list of either.

Aliases, Default Targets, and Install Rules

env.Alias() creates named targets you can invoke from the command line. Default() specifies what gets built when you run scons with no arguments.

app = env.Program('myapp', sources)
tests = env.Program('test_runner', test_sources)

Default(app)
env.Alias('test', tests)
env.Alias('all', [app, tests])

Running scons builds only myapp because it's the default target. Running scons test builds the test executable. Running scons all builds everything. Without the Default call, SCons builds everything in the current directory and below, which includes both the application and the tests.

Install targets copy built files to a destination directory.

env.Install('/usr/local/bin', app)
env.Install('/usr/local/lib', shared_lib)
env.InstallAs('/usr/local/bin/my-application', app)

env.Alias('install', '/usr/local/bin')
env.Alias('install', '/usr/local/lib')

env.Install() copies the specified file to the destination directory. env.InstallAs() copies it with a different name. Install targets aren't built by default because they write outside the project tree. You must invoke them explicitly with scons install (which works because the Alias connects the name "install" to the install directories).

You can combine Alias with a command action to create a "run" target.

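run = env.Alias('run', app, app[0].abspath)
AlwaysBuild(run)

Running scons run builds the application (if needed) and then executes it. The third argument to Alias is an action that runs after the alias's sources are built, and app[0].abspath avoids hardcoding the variant-directory path. AlwaysBuild makes the action run on every invocation; without it, SCons considers the alias up to date once the application stops changing and skips running it.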

Platform-Specific Configuration

Because SConstruct files are Python, platform-specific configuration uses standard Python constructs.

import sys
import os

env = Environment(
    CPPPATH=['#include'],
    CCFLAGS=['-Wall'],
)

if sys.platform == 'win32':
    env.Append(LIBS=['ws2_32', 'advapi32'])
    env.Append(CPPDEFINES=['_WIN32', 'NOMINMAX'])
elif sys.platform == 'darwin':
    env.Append(FRAMEWORKS=['CoreFoundation', 'Security'])
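    # The deployment target must reach the linker too, not just the compiler
    env.Append(CCFLAGS=['-mmacosx-version-min=10.15'])
    env.Append(LINKFLAGS=['-mmacosx-version-min=10.15'])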
elif sys.platform.startswith('linux'):
    env.Append(LIBS=['pthread', 'dl', 'rt'])
    env.Append(CPPDEFINES=['_GNU_SOURCE'])

sys.platform returns 'win32' on Windows, 'darwin' on macOS, and 'linux' on Linux. The FRAMEWORKS variable is macOS-specific and translates to -framework CoreFoundation -framework Security on the linker command line. On Linux, -lrt links the POSIX realtime library (for clock_gettime on older glibc versions), and -ldl links the dynamic loading library (for dlopen).

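For more granular detection, use platform.machine() to check the CPU architecture. The value depends on the operating system as well as the CPU: 64-bit ARM reports 'aarch64' on Linux but 'arm64' on macOS, and 64-bit x86 reports 'x86_64' on Linux and macOS but 'AMD64' on Windows.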

import platform

if platform.machine() == 'aarch64':
    env.Append(CCFLAGS=['-march=armv8-a'])
elif platform.machine() == 'x86_64':
    # The x86-64-v2 level requires GCC 11+ or Clang 12+
    env.Append(CCFLAGS=['-march=x86-64-v2'])

You can also use env['PLATFORM'], which SCons sets to a value such as 'posix', 'win32', or 'darwin'.
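Because platform.machine() spells the same architecture differently across operating systems ('AMD64' on Windows versus 'x86_64' on Linux, 'arm64' on Apple Silicon macOS versus 'aarch64' on Linux), it can help to normalize the value once before branching on it. ARCH_ALIASES and normalized_arch() are a hypothetical helper, and the table covers only the cases named here.

```python
import platform

# Hypothetical mapping from platform.machine() spellings to one name per arch
ARCH_ALIASES = {
    'x86_64': 'x86_64',   # Linux, macOS
    'AMD64': 'x86_64',    # Windows
    'aarch64': 'arm64',   # Linux
    'arm64': 'arm64',     # macOS on Apple Silicon
}


def normalized_arch(machine=None):
    """Normalize an architecture name; unknown names pass through unchanged."""
    machine = machine or platform.machine()
    return ARCH_ALIASES.get(machine, machine)


print(normalized_arch('AMD64'))     # x86_64
print(normalized_arch('aarch64'))   # arm64
print(normalized_arch())            # this machine's architecture, normalized
```

In an SConstruct, you would branch on normalized_arch() rather than on the raw string, so one condition covers every spelling.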

For integrating with system libraries that provide pkg-config metadata, use ParseConfig.

env.ParseConfig('pkg-config --cflags --libs libpng')
env.ParseConfig('pkg-config --cflags --libs zlib')

ParseConfig runs the specified command, captures its output, and parses the flags into the appropriate construction variables: -I flags go into CPPPATH, -D flags into CPPDEFINES, -L flags into LIBPATH, -l flags into LIBS, and most remaining flags go into CCFLAGS. This is the SCons equivalent of $(shell pkg-config --cflags --libs libpng) in a Makefile.
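The sorting rule is easy to picture with a simplified version in plain Python. sort_flags() is a hypothetical sketch, not SCons' ParseFlags: it handles only the common flag forms, and the sample string is shaped like typical pkg-config output rather than copied from a real system.

```python
import shlex


def sort_flags(output):
    """Sort compiler/linker flags into SCons-style variable names (simplified)."""
    result = {'CPPPATH': [], 'CPPDEFINES': [], 'LIBPATH': [],
              'LIBS': [], 'CCFLAGS': []}
    for flag in shlex.split(output):
        if flag.startswith('-I'):
            result['CPPPATH'].append(flag[2:])
        elif flag.startswith('-D'):
            result['CPPDEFINES'].append(flag[2:])
        elif flag.startswith('-L'):
            result['LIBPATH'].append(flag[2:])
        elif flag.startswith('-l'):
            result['LIBS'].append(flag[2:])
        else:
            # Real ParseFlags treats some flags (like -Wl, options) specially
            result['CCFLAGS'].append(flag)
    return result


# Illustrative output shaped like `pkg-config --cflags --libs libpng`
sample = '-I/usr/include/libpng16 -DPNG_DEBUG=0 -L/usr/lib -lpng16 -lz -pthread'
print(sort_flags(sample))
```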

Customizing Build Output

By default, SCons prints the full compiler command line for every file it processes. On projects with long include paths and many flags, this produces walls of text that obscure the build progress. You can customize the output with COMSTR variables:

env = Environment()

env['CCCOMSTR'] = '  CC    $TARGET'
env['CXXCOMSTR'] = '  CXX   $TARGET'
env['LINKCOMSTR'] = '  LINK  $TARGET'
env['ARCOMSTR'] = '  AR    $TARGET'
env['SHCCCOMSTR'] = '  CC    $TARGET (shared)'
env['SHCXXCOMSTR'] = '  CXX   $TARGET (shared)'
env['SHLINKCOMSTR'] = '  LINK  $TARGET (shared)'
env['RANLIBCOMSTR'] = '  INDEX $TARGET'
env['INSTALLSTR'] = '  INST  $TARGET'

With these settings, the build output looks clean and scannable. Each line shows the action type and the target file. The $TARGET variable in the string is expanded by SCons at runtime.

To support both quiet and verbose modes, check a command-line argument.

if ARGUMENTS.get('verbose', '0') != '1':
    env['CCCOMSTR'] = '  CC    $TARGET'
    env['CXXCOMSTR'] = '  CXX   $TARGET'
    env['LINKCOMSTR'] = '  LINK  $TARGET'
    env['ARCOMSTR'] = '  AR    $TARGET'

Running scons shows the short output. Running scons verbose=1 shows the full command lines. This pattern is common in SCons projects and mimics the V=1 convention used by the Linux kernel's build system.
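The key point is that a COMSTR changes only what gets printed; the command that runs is the same either way. The sketch below imitates that with Python's string.Template, which uses a similar $NAME syntax. The templates and values are hypothetical and much shorter than SCons' real CXXCOM.

```python
from string import Template

# Simplified stand-ins for a compile command and its short display string
FULL_CXXCOM = Template('$CXX -o $TARGET -c $CXXFLAGS $SOURCE')
SHORT_CXXCOMSTR = Template('  CXX   $TARGET')


def display_line(verbose, **values):
    """Return the line printed for one compile step in quiet or verbose mode."""
    template = FULL_CXXCOM if verbose else SHORT_CXXCOMSTR
    return template.substitute(values)


vals = dict(CXX='g++', CXXFLAGS='-O2 -Wall',
            TARGET='build/main.o', SOURCE='src/main.cpp')
print(display_line(False, **vals))  # "  CXX   build/main.o"
print(display_line(True, **vals))   # "g++ -o build/main.o -c -O2 -Wall src/main.cpp"
```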

How to Debug SCons Build Files

When a build doesn't do what you expect, SCons provides several debugging tools.

Print Variables

Because SConstruct files are Python, you can print anything.

env = Environment(CCFLAGS=['-Wall', '-O2'])
print('CCFLAGS:', env['CCFLAGS'])
print('CC:', env['CC'])
print('CPPPATH:', env.get('CPPPATH', []))

This prints the current values of construction variables. Use this to verify that your flags are set correctly, especially after Append, Prepend, or Clone calls.

The `--debug` flag

SCons has a --debug option with several modes.

scons --debug=explain

This tells SCons to print the reason for every rebuild. Instead of silently recompiling a file, it prints something like scons: rebuilding 'build/release/lib/mathutils.o' because 'lib/mathutils.h' changed. This is invaluable for understanding unexpected rebuilds.

scons --debug=tree

This prints the full dependency tree for every target, showing which files depend on which other files. The output can be large, so combine it with a specific target: scons --debug=tree build/release/src/myapp.

scons --debug=includes

This prints the include files found by the C/C++ scanner for each source file. Useful for diagnosing "header not found" errors or unexpected include paths.

scons --debug=presub

This prints the un-substituted command line (with $CC, $CCFLAGS, and so on still as variable names) before SCons expands them. It helps you understand which variables contribute to the final command.

The `--dry-run` flag

scons -n shows what SCons would do without actually doing it. Every command that would be executed is printed, but no files are created or modified. This is a safe way to verify your build logic before running it.

The `Dump` method

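env.Dump() returns a formatted string of every construction variable and its value. It produces a lot of output, so redirect it to a file or search it for specific variables. You can also pass a variable name, as in env.Dump('CCFLAGS'), to format just that one value.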

print(env.Dump())

This is the nuclear option for debugging: it shows everything SCons knows about the environment.

The SCons Command-Line Reference

SCons accepts many command-line options. The ones you will use most frequently are listed here.

scons builds the default targets (or everything if no Default() is set).
scons -j N runs up to N build commands in parallel. Set N to the number of CPU cores on your machine for fastest builds. You can also set this in the SConstruct with SetOption('num_jobs', 4).
scons -c cleans (removes) all built targets. This is the equivalent of make clean but doesn't require you to write a clean rule. SCons knows exactly which files it created and removes only those.
scons -n is a dry run. Shows what would be built without building anything.
scons -Q suppresses SCons' status messages ("Reading SConscript files", "Building targets", etc.) and shows only the build commands. Useful for piping build output to other tools.
scons -s is silent mode. Suppresses both status messages and build commands. Only errors are printed.
scons --debug=explain explains why each target is being rebuilt.
scons --debug=tree prints the dependency tree.
scons --config=force forces re-running of all Configure checks, ignoring cached results.
scons target_name builds only the specified target and its dependencies. You can specify multiple targets: scons myapp test_runner.
scons key=value passes a key-value pair accessible through ARGUMENTS.get('key') in the SConstruct.
scons --help shows SCons' built-in options plus any options added with AddOption in the SConstruct.

Common Mistakes and How to Avoid Them

Overwriting default builders: Passing BUILDERS as a keyword argument to Environment() replaces the entire builder dictionary. You lose Program, Library, Object, and everything else. Always add custom builders with env.Append(BUILDERS={'Name': builder}).

Assuming shell environment variables are available: SCons deliberately doesn't import your shell environment. If your build fails because a tool isn't found, you probably need to pass PATH through explicitly.

The safest approach for finding the compiler is env['ENV']['PATH'] = os.environ['PATH']. Importing the entire environment with ENV=os.environ.copy() works but reduces build reproducibility because your build now depends on every variable in your shell.

Modifying a shared environment in a SConscript file: If the SConstruct exports one environment and multiple SConscript files import it, any Append or modification in one SConscript affects all of them because they all hold a reference to the same Python object. Clone the environment first with local_env = env.Clone() and modify the clone. The clone is a deep copy that can be modified independently.

Forgetting Return() in SConscript: If your SConstruct calls lib = SConscript('lib/SConscript') and the SConscript file has no Return() statement, lib is None. You'll get a confusing error later, when you try to link against it, because SCons receives None where it expected a library Node or file name.

Confusing variant_dir with source paths: When you use variant_dir, the source file paths in your SConscript are still relative to the SConscript's original location, not the variant directory.

SCons handles the mapping internally. Don't use paths into the build directory in your SConscript files. Writing Object('build/release/lib/mathutils.cpp') is wrong, while writing Object('mathutils.cpp') inside lib/SConscript is correct.

Forgetting to add .sconsign.dblite to .gitignore: SCons stores its dependency database in this file. It should never be committed to version control because it contains absolute paths and machine-specific data.

Add .sconsign.dblite, the build/ directory, and the .sconf_temp/ directory (created by Configure checks) to your .gitignore.

# .gitignore
.sconsign.dblite
.sconf_temp/
build/

This .gitignore file has three entries.

.sconsign.dblite is the dependency database.
.sconf_temp/ is the directory where Configure check test programs are compiled.
build/ is the variant directory containing all compiled artifacts.

Expecting touch to trigger a rebuild: SCons uses content hashing by default. Running touch on a source file changes its modification time but not its content, so the hash is identical and SCons doesn't rebuild. If you need Make-like timestamp behavior, call Decider('timestamp-newer') in your SConstruct.

Using string file names instead of Nodes: Passing raw strings with platform-specific extensions makes your build files non-portable.

# Fragile: hardcodes the .o extension
Program('myapp', ['main.o', 'utils.o'])

# Portable: let SCons handle extensions
main_obj = env.Object('main.cpp')
utils_obj = env.Object('utils.cpp')
env.Program('myapp', [main_obj, utils_obj])

The first version breaks on Windows where object files use the .obj extension. The second version works everywhere because the Node objects carry platform-specific metadata.

Getting the target/source argument order wrong: Builder methods take the target first, then the source. Program('output_name', 'source.c') is correct. Program('source.c', 'output_name') compiles output_name (which doesn't exist) and tries to create source.c as the executable. The convention mimics assignment: target = source.

Expecting Install targets to build by default: env.Install('/usr/local/bin', app) creates an install target, but SCons does not build it unless you explicitly request it. Targets outside the project directory tree are never default targets. Use env.Alias('install', '/usr/local/bin') and run scons install to trigger the installation.

Using Glob without understanding it returns Nodes: Glob('*.cpp') returns a list of Node objects, not strings. You can concatenate them with other Node lists using +, pass them to builders, and use them in most places that accept source lists. You can't call string methods on them directly. Use [str(n) for n in Glob('*.cpp')] if you need strings, but prefer working with Nodes whenever possible.

Summary

SCons replaces Make with a build system where every configuration file is a Python script.

The Environment object holds your compiler, flags, and paths. Builders like Program, StaticLibrary, and SharedLibrary know how to produce specific output types. SConscript files organize multi-directory projects, and variant_dir keeps build artifacts separate from source code. Content hashing eliminates unnecessary rebuilds, and automatic header scanning removes the need to manually specify implicit dependencies.

Cross-compilation to targets like QuRT requires nothing more than pointing the environment's tool variables (CC, CXX, LINK) at the cross-compiler and adding the target's include paths and libraries. The same SConscript files work for both native and cross-compiled builds because they operate on whatever environment they receive through Import.

QuRT-specific features (threading, mutexes, hardware timers) are accessed through standard C function calls, and the build system's only responsibility is making sure the right compiler, headers, and libraries are in place.

The Configure subsystem replaces autoconf for probing the build environment. Custom builders extend SCons to handle file types it does not know about (protocol buffers, shaders, firmware images).

Aliases and install rules give users a clean command-line interface (scons, scons test, scons install). And the --debug=explain flag tells you exactly why any file is being rebuilt, eliminating the guesswork that plagues Make-based builds.

SCons isn't the fastest build tool for very large codebases, and its ecosystem is smaller than CMake's. But for projects where build file clarity, correctness, cross-compilation flexibility, and the ability to express complex logic in a real programming language matter more than raw speed, it's a strong choice.

The Python foundation means you already know the language, and the content-based rebuild strategy means you can trust that what gets built actually needs to be built.

QuRT: The Real-Time OS Inside Your Phone's Processor [Full Handbook]

Nikheel Vishwas Savant — Wed, 06 May 2026 23:12:45 +0000

The Hexagon DSP in every Qualcomm-powered phone handles wake word detection, sensor processing, noise cancellation, and Bluetooth audio streaming – all while the main ARM CPU runs Android.

The operating system orchestrating that work on the DSP is QuRT (Qualcomm Real-Time Operating System), a POSIX-like, priority-based, preemptive RTOS purpose-built for Qualcomm's Hexagon Digital Signal Processor.

This article is a practical guide to Qualcomm's Real-Time Operating System. It covers QuRT from the ground up: architecture, thread creation, synchronization primitives, memory management, interrupt handling, timers, inter-processor communication through FastRPC, and a complete sensor fusion pipeline. Every concept includes working code and an explanation of what's happening under the hood.

Why QuRT Matters
Setting Up Your Development Environment
The QuRT Programming Model
Creating Your First QuRT Thread
How Thread Creation Works Internally
Working with Multiple Threads
Synchronization Primitives
Memory Management
Timers and Timing
Interrupt Handling
Pipes and Message Queues
QuRT and FastRPC
Building a Sensor Fusion Pipeline
Debugging QuRT Applications
Common Pitfalls
Performance Optimization
API Quick Reference
Next Steps

Why QuRT Matters

Consider what happens during a phone call. The device is simultaneously running noise cancellation on the microphone audio, executing a neural network for wake word detection, reading accelerometer data 400 times per second, and managing Bluetooth audio streaming.

None of this runs on the main ARM CPU. It all happens on Qualcomm's Hexagon DSP, and the operating system coordinating it is QuRT.

QuRT (Qualcomm Real-Time Operating System) is a POSIX-like, priority-based, preemptive RTOS that runs on Qualcomm's Hexagon Digital Signal Processor. Where Linux is a general-purpose operating system designed for flexibility, QuRT is a precision instrument designed for deterministic, microsecond-level scheduling.

Where QuRT Fits in the System

This diagram shows the two-processor architecture inside a Qualcomm SoC. The ARM CPU on the left runs Android or Linux and handles general application logic. The Hexagon DSP on the right runs QuRT and handles latency-sensitive workloads: audio processing, sensor fusion, ML inference, and compute offload.

The two processors communicate through a framework called FastRPC. You write code for the DSP side using the Hexagon SDK, and QuRT is the OS that executes your code on the Hexagon processor.

Setting Up Your Development Environment

Before writing any QuRT code, you need the toolchain and either a simulator or physical hardware.

Prerequisites

You will need the Hexagon SDK (version 3.5+ or 4.x), which is Qualcomm's official SDK and includes the Hexagon Tools compiler toolchain.

For running your code, you can use either a Qualcomm development board (such as the Robotics RB5 or an SM8250 HDK) or the SDK's built-in simulator. A Linux host machine running Ubuntu 18.04 or 20.04 works best for development.

Installing the Hexagon SDK

# Download the Hexagon SDK from Qualcomm's developer portal
# https://developer.qualcomm.com/software/hexagon-dsp-sdk

# Extract and run the installer
chmod +x qualcomm_hexagon_sdk_4_x_x_x.bin
./qualcomm_hexagon_sdk_4_x_x_x.bin

# Set up environment variables
export HEXAGON_SDK_ROOT=~/Qualcomm/Hexagon_SDK/4.x.x.x
export HEXAGON_TOOLS_ROOT=~/Qualcomm/Hexagon_SDK/4.x.x.x/tools
source $HEXAGON_SDK_ROOT/setup_sdk_env.source

This installs the SDK to your home directory and sets up the environment variables that the build system and simulator need. The setup_sdk_env.source script configures your shell with paths to the compiler, simulator, and libraries.

Verifying Your Setup

# Check the Hexagon compiler
hexagon-clang --version

# You should see something like:
# Qualcomm Hexagon Clang version 8.x.xx

# Run the QuRT simulator to make sure it works
$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.x.xx/Tools/bin/hexagon-sim \
    --simulated_returnval --cosim_file \
    $HEXAGON_SDK_ROOT/libs/common/qurt/computev66/sdksim_bin/osam.cfg \
    -- $HEXAGON_SDK_ROOT/libs/common/qurt/computev66/sdksim_bin/bootimg.pbn

The first command confirms that the Hexagon Clang compiler is installed and accessible. The second command launches the QuRT simulator, which is analogous to an Android emulator: it lets you test QuRT programs without physical hardware. Timing won't match real hardware, but the simulator is valuable for validating correctness during development.

Project Structure

The Hexagon SDK uses SCons as its underlying build system. Projects live inside the SDK tree and are configured through .min files, which are declarative build descriptors that the SDK's SCons infrastructure parses.

A minimal project looks like this:

$HEXAGON_SDK_ROOT/examples/my_qurt_project/
├── src/
│   └── main.c              # Your QuRT application code
├── inc/
│   └── my_module.h         # Header files
├── hexagon.min              # SCons build config for Hexagon DSP side
└── android.min              # SCons build config for ARM side (if using FastRPC)

The hexagon.min file configures the DSP-side build, while android.min handles the ARM side when using FastRPC for cross-processor communication. Both are read by the SDK's top-level SConstruct file, which lives at $HEXAGON_SDK_ROOT/SConstruct. You don't need a separate Makefile or SConscript for projects inside the SDK tree.

Build Configuration with SCons

A minimal hexagon.min build file looks like this:

# hexagon.min - SCons build descriptor for the DSP side

BUILD_LIBS = libmy_qurt_app

# Source files
libmy_qurt_app_C_SRCS = src/main.c

# Libraries to link against
libmy_qurt_app_LIBS = atomic rpcmem

# Compiler flags
libmy_qurt_app_HEXAGON_CFLAGS = -O2 -Wall

# Shared library (DLL) output name; _skel is the FastRPC DSP-side convention
libmy_qurt_app_DLLS = libmy_qurt_app_skel

The .min file format is specific to the Hexagon SDK's SCons build system. BUILD_LIBS names the library target, and the remaining variables are prefixed with that target's name: C_SRCS lists source files, LIBS specifies libraries to link against, HEXAGON_CFLAGS sets compiler flags, and DLLS defines the shared library output name, where the _skel suffix is a FastRPC convention for DSP-side implementations.

Under the hood, the SDK's SConstruct walks the project tree, reads each .min file, and translates its declarations into SCons build targets. The V (variant) parameter you pass at build time selects the target architecture, build type, and toolchain version. For example, V=hexagon_Release_dynamic_toolv84_v66 means: build for Hexagon, release mode, dynamic linking, using the v84 toolchain targeting the v66 DSP architecture.
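To make the naming scheme concrete, here is a small C sketch that splits a variant string into the five fields described above. It is illustrative only: the SDK does its own parsing inside its SCons scripts, and the struct and function names (build_variant, parse_variant) are hypothetical.

```c
#include <string.h>

/* Hypothetical breakdown of a variant string such as
 * "hexagon_Release_dynamic_toolv84_v66". Field names are our own labels. */
typedef struct {
    char target[16];     /* "hexagon"                  */
    char build_type[16]; /* "Release" or "Debug"       */
    char linkage[16];    /* "dynamic" or "static"      */
    char toolchain[16];  /* "toolv84" -> v84 toolchain */
    char arch[16];       /* "v66"     -> DSP arch      */
} build_variant;

/* Splits on '_'. Returns 0 on success, -1 unless there are exactly
 * five non-empty fields that each fit in 15 characters. */
int parse_variant(const char *v, build_variant *out)
{
    char *fields[5] = { out->target, out->build_type, out->linkage,
                        out->toolchain, out->arch };
    int n = 0;
    const char *start = v;

    for (const char *p = v; ; p++) {
        if (*p == '_' || *p == '\0') {
            size_t len = (size_t)(p - start);
            if (n >= 5 || len == 0 || len >= 16)
                return -1;
            memcpy(fields[n], start, len);
            fields[n][len] = '\0';
            n++;
            if (*p == '\0')
                break;
            start = p + 1;   /* next field begins after the underscore */
        }
    }
    return n == 5 ? 0 : -1;
}
```

For example, parsing "hexagon_Release_dynamic_toolv84_v66" leaves "toolv84" in toolchain and "v66" in arch.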

For projects that need more control than the .min format provides, you can write a standalone SConscript file:

# SConscript - Standalone SCons build for a QuRT project

Import('env')

env = env.Clone()

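# Add include paths
env.Append(CPPPATH = ['inc'])

# Compiler flags
env.Append(CCFLAGS = ['-O2', '-Wall'])

# Libraries: Append keeps anything the SDK environment already put in LIBS
# (passing LIBS= to the builder call would override it for this target)
env.Append(LIBS = ['atomic', 'rpcmem'])

# Build the shared library
sources = ['src/main.c']

env.SharedLibrary(
    target = 'libmy_qurt_app_skel',
    source = sources
)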

The SConscript approach gives you full access to SCons features: conditional compilation, custom build steps, dependency scanning, and variant builds. The Import('env') call pulls in the build environment configured by the SDK's top-level SConstruct, which already knows about Hexagon compiler paths, QuRT headers, and system libraries. env.Clone() creates a copy so your modifications do not affect other projects in the tree.

The QuRT Programming Model

The core mental model for QuRT programming is straightforward:

QuRT is a priority-based, preemptive RTOS. Everything runs in a thread (there is no bare-metal main loop). A higher-priority thread always preempts a lower-priority one, immediately and without negotiation. Threads at the same priority level are scheduled round-robin.

The scheduler is tickless, meaning it doesn't wake up periodically. It runs only when something changes, such as a thread blocking, a signal being set, or a higher-priority thread becoming ready.

Priority Levels (0-255, lower number = higher priority)

 000  ┃ ████ Interrupt handlers (do not touch this)
 001  ┃ ████ Critical system tasks
 ...  ┃
 064  ┃ ████ Your high-priority audio processing
 ...  ┃
 128  ┃ ████ Your medium-priority sensor fusion
 ...  ┃
 192  ┃ ████ Your low-priority logging/reporting
 ...  ┃
 255  ┃ ████ Idle thread (QuRT's built-in background)

This priority map shows how QuRT's 256 priority levels are typically allocated. Priority 0 is the highest priority and 255 is the lowest. This is the opposite of FreeRTOS, where higher numbers mean higher priority.

Interrupt handlers occupy the top priority levels, system tasks sit just below, and user threads occupy the middle range. The idle thread at priority 255 runs only when nothing else is ready.
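Because QuRT numbers priorities in the opposite direction from FreeRTOS, it can help to keep the comparison in one helper rather than scattering < and > through your code. A minimal C sketch, assuming plain int priorities in the 0-255 range (prio_outranks and highest_priority_index are hypothetical names, not QuRT APIs):

```c
#include <stdbool.h>
#include <stddef.h>

/* QuRT convention: 0 is the highest priority, 255 the lowest.
 * True if priority a outranks priority b (the smaller number wins). */
bool prio_outranks(int a, int b)
{
    return a < b;
}

/* Index of the highest-priority entry in prios[0..n-1], or -1 if n == 0.
 * On a tie the earlier entry is kept. */
int highest_priority_index(const int *prios, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || prio_outranks(prios[i], prios[best]))
            best = (int)i;
    }
    return best;
}
```

With priorities {192, 64, 128}, highest_priority_index returns 1, the thread at priority 64.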

Creating Your First QuRT Thread

The simplest QuRT program creates a single thread that prints a message and exits.

/* main.c - First QuRT program */

#include <stdio.h>
#include <stdint.h>   /* uintptr_t */
#include <qurt.h>

#define STACK_SIZE 8192   /* generous for printf; tune after profiling */

/* Thread stack must be 8-byte aligned */
static char thread_stack[STACK_SIZE] __attribute__((aligned(8)));

void my_thread_func(void *arg)
{
    int thread_num = (int)(uintptr_t)arg;

    printf("Hello from QuRT thread %d!\n", thread_num);
    /* Cast so the argument matches %lu whatever qurt_thread_t's width is */
    printf("My thread ID: %lu\n", (unsigned long)qurt_thread_get_id());

    /* Thread must explicitly exit */
    qurt_thread_exit(QURT_EOK);
}

int main(void)
{
    qurt_thread_t      thread_id;
    qurt_thread_attr_t attr;

    printf("Main thread starting on QuRT!\n");

    /* Initialize thread attributes */
    qurt_thread_attr_init(&attr);

    /* Configure the thread */
    qurt_thread_attr_set_name(&attr, "my_first_thread");
    qurt_thread_attr_set_stack_addr(&attr, thread_stack);
    qurt_thread_attr_set_stack_size(&attr, STACK_SIZE);
    qurt_thread_attr_set_priority(&attr, 128);  /* Medium priority */

    /* Create and start the thread */
    int result = qurt_thread_create(&thread_id, &attr,
                                     my_thread_func,
                                     (void *)42);

    if (result != QURT_EOK) {
        printf("Thread creation failed with error: %d\n", result);
        return -1;
    }

    printf("Thread created successfully! ID: %lu\n", (unsigned long)thread_id);

    /* Wait for the thread to finish */
    int status;
    qurt_thread_join(thread_id, &status);

    printf("Thread finished with status: %d\n", status);
    return 0;
}

This program demonstrates the four-step thread creation process in QuRT. First, qurt_thread_attr_init() initializes a thread attributes structure. Second, the program configures the thread with a debug name (which shows up in crash dumps), a stack address, a stack size, and a priority. Third, qurt_thread_create() creates the thread and makes it ready to run, passing a function pointer and an argument. Fourth, qurt_thread_join() blocks the calling thread until the new thread calls qurt_thread_exit().

Two details are critical. QuRT doesn't allocate stack memory for you: you must provide a statically allocated, 8-byte-aligned buffer. And every thread must call qurt_thread_exit() before returning. If a thread function simply returns without calling exit, the behavior is undefined.

Thread Creation Flow

     qurt_thread_attr_init()
              │
              ▼
    ┌─────────────────────┐
    │  Set name           │
    │  Set stack address  │
    │  Set stack size     │
    │  Set priority       │
    └─────────────────────┘
              │
              ▼
     qurt_thread_create()
              │
              ▼
    Thread starts running ──► my_thread_func()
              │                      │
              ▼                      ▼
     qurt_thread_join()       qurt_thread_exit()
     (waits for exit)         (signals "I'm done")

This flow shows the lifecycle of a single thread. The attributes structure acts as a configuration object: you set all the thread parameters, then pass it to qurt_thread_create(). Once created, the thread runs its entry function. When the entry function calls qurt_thread_exit(), the thread terminates and any thread blocked in qurt_thread_join() is unblocked and receives the exit status code.

How Thread Creation Works Internally

Most tutorials skip what happens inside qurt_thread_create(). Understanding the internals makes debugging and priority design decisions much clearer.

What the Kernel Does During Thread Creation

When you call qurt_thread_create(), you're making a system call into the QuRT kernel. The kernel performs five steps in sequence:

  Your code calls qurt_thread_create()
         │
         ▼
  ┌──────────────────────────────────────────────────────────┐
  │  1. VALIDATE                                             │
  │     • Is the stack pointer non-NULL and aligned?         │
  │     • Is the stack size >= minimum (typ. 2KB)?           │
  │     • Is the priority in range 0-255?                    │
  │     • Is the entry function pointer non-NULL?            │
  │     (If any check fails → return QURT_EINVALID)          │
  ├──────────────────────────────────────────────────────────┤
  │  2. ALLOCATE THREAD CONTROL BLOCK (TCB)                  │
  │     • QuRT allocates a kernel-side data structure        │
  │     • This holds: thread ID, priority, state, saved      │
  │       registers, signal masks, mutex wait list, etc.     │
  ├──────────────────────────────────────────────────────────┤
  │  3. INITIALIZE THE STACK FRAME                           │
  │     • The kernel sets up a synthetic stack frame at the  │
  │       top of YOUR stack memory                           │
  │     • It writes the initial register values:             │
  │       ┌──────────────────────────────────────┐           │
  │       │  Stack Top (high address)            │           │
  │       │  ┌──────────────────────────────────┐│           │
  │       │  │ PC  = my_thread_func (entry)     ││           │
  │       │  │ SP  = stack_addr + stack_size    ││           │
  │       │  │ R0  = arg (your void* argument)  ││           │
  │       │  │ LR  = qurt_thread_exit           ││           │
  │       │  │ SR  = default status register    ││           │
  │       │  │ R1-R31 = 0                       ││           │
  │       │  └──────────────────────────────────┘│           │
  │       │  ... (rest of stack is untouched) ...│           │
  │       │  Stack Bottom (low address)          │           │
  │       └──────────────────────────────────────┘           │
  ├──────────────────────────────────────────────────────────┤
  │  4. INSERT INTO READY QUEUE                              │
  │     • The TCB is added to the scheduler's ready queue    │
  │       at the appropriate priority level                  │
  │     • The thread's state is set to READY                 │
  ├──────────────────────────────────────────────────────────┤
  │  5. TRIGGER A RESCHEDULE                                 │
  │     • The scheduler checks: "Is this new thread's        │
  │       priority higher than the currently running         │
  │       thread?"                                           │
  │     • If YES: context switch happens RIGHT NOW           │
  │       (the calling thread is preempted)                  │
  │     • If NO: the new thread waits in the ready queue     │
  │       until it's the highest priority runnable thread    │
  └──────────────────────────────────────────────────────────┘
         │
         ▼
  qurt_thread_create() returns to the caller
  (but the new thread may already be running!)

The most surprising aspect of this flow is step 5. If the new thread has higher priority than the thread that created it, the new thread starts running before qurt_thread_create() returns to the caller. The creating thread is preempted mid-call. This is what "preemptive" means in practice: the scheduler doesn't wait for a convenient moment. It enforces priority ordering immediately.
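Steps 1 and 5 boil down to simple checks. The sketch below models them in portable C so the rules can be tested on a host machine. The 2KB minimum comes from the diagram's "typ." value and is an assumption, MODEL_EOK and MODEL_EINVALID stand in for QuRT's real QURT_EOK and QURT_EINVALID, and none of these functions exist in QuRT.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this model only. */
enum { MODEL_EOK = 0, MODEL_EINVALID = 1 };

#define MODEL_MIN_STACK 2048u   /* assumed minimum, per "typ. 2KB" */

typedef void (*entry_fn)(void *);

/* Placeholder entry point, used only to exercise the checks. */
void placeholder_entry(void *arg) { (void)arg; }

/* Step 1: reject bad parameters before any kernel state is touched. */
int model_validate(const void *stack, size_t stack_size, int prio,
                   entry_fn entry)
{
    if (stack == NULL || ((uintptr_t)stack % 8u) != 0)
        return MODEL_EINVALID;          /* missing or misaligned stack */
    if (stack_size < MODEL_MIN_STACK)
        return MODEL_EINVALID;          /* stack too small */
    if (prio < 0 || prio > 255)
        return MODEL_EINVALID;          /* priority out of range */
    if (entry == NULL)
        return MODEL_EINVALID;          /* nothing to run */
    return MODEL_EOK;
}

/* Step 5: the new thread preempts its creator only if it strictly
 * outranks it (lower number). Equal priority does not preempt. */
int model_should_preempt(int new_prio, int creator_prio)
{
    return new_prio < creator_prio;
}
```

In the walkthrough that follows, model_should_preempt(64, 128) is true, which is exactly why thread A is suspended inside qurt_thread_create().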

How the Stack Frame Launches Your Function

When the scheduler context-switches to a brand-new thread for the first time, it does exactly what it does for any context switch: it restores the saved registers from the TCB and jumps to the saved Program Counter.

For a new thread, those registers were set up synthetically by the kernel during step 3. The PC (Program Counter) was set to my_thread_func, so the processor jumps to your function. R0 was set to your arg parameter, so your function receives it as the first argument (following the Hexagon calling convention). The SP (Stack Pointer) was set to the top of your stack, so your function has a working stack. And the LR (Link Register) was set to qurt_thread_exit, so if your function returns normally (which you should not rely on), it falls through to qurt_thread_exit.

The illusion:
──────────────
To your thread function, it looks like someone
"called" it normally with the argument you passed.

The reality:
──────────────
The scheduler restored a set of synthetic registers
that make the processor THINK it is returning from
a function call into your entry point.

It's like waking up in a room you have never been in,
but someone arranged everything so perfectly that
you do not realize you did not walk in through the door.

This diagram contrasts the programmer's mental model (a normal function call) with what actually happens at the hardware level (a register restore that simulates a function call). The thread function has no way to distinguish between these two scenarios, which is exactly the point. The kernel creates a seamless illusion.
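The trick can be imitated in ordinary C with function pointers. This is a toy model, not how the kernel is written: fake_context, fake_init_context, and fake_first_switch are hypothetical names, and the status handed to the LR target is assumed to be 0. The point is that "restoring" a pre-filled PC and R0 looks, from the entry function's side, exactly like a normal call.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*entry_fn)(void *);
typedef void (*exit_fn)(int);

/* Toy "saved context" prepared before the thread ever runs. */
typedef struct {
    entry_fn  pc;   /* where execution resumes: the entry function       */
    void     *r0;   /* first argument register                           */
    uintptr_t sp;   /* top of the caller-supplied stack (grows downward) */
    exit_fn   lr;   /* where a plain return from the entry would land    */
} fake_context;

void fake_init_context(fake_context *ctx, entry_fn entry, void *arg,
                       void *stack, size_t size, exit_fn on_return)
{
    ctx->pc = entry;
    ctx->r0 = arg;
    /* SP starts at the high end of the buffer, rounded down to 8 bytes */
    ctx->sp = ((uintptr_t)stack + size) & ~(uintptr_t)7;
    ctx->lr = on_return;
}

/* "Restore" the context: jump to PC with R0 as the argument; if the entry
 * function returns, continue at LR (assumed status 0 in this model). */
void fake_first_switch(const fake_context *ctx)
{
    ctx->pc(ctx->r0);
    ctx->lr(0);
}

/* Demo entry and exit functions that record what happened. */
static int g_seen_arg = -1;
static int g_exit_called = 0;
void demo_entry(void *arg) { g_seen_arg = *(int *)arg; }
void demo_exit(int status) { (void)status; g_exit_called = 1; }
```

Calling fake_first_switch() on a context built with demo_entry and a pointer to 42 runs demo_entry with that argument and then lands in demo_exit, mirroring the PC/R0/LR roles described above.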

Context Switch Walkthrough

Consider a concrete example: thread A (priority 128) creates thread B (priority 64, which is higher priority). The following timeline shows what happens at each step:

Time ──────────────────────────────────────────────►

Thread A (pri 128)          Kernel/Scheduler         Thread B (pri 64)
────────────────           ────────────────           ────────────────
Calls                      
qurt_thread_create()       
   │                       
   ├─► System call ──────►  Validates params
                            Allocates TCB
                            Sets up stack frame
                            Inserts B into ready queue

                            "B (64) outranks A (128)? YES."
                            
                            SAVE A's registers   ──┐
                            to A's TCB             │
                                                   │
                            LOAD B's registers   ◄─┘
                            from B's TCB (the
                            synthetic ones)
                            
                            Jump to PC ─────────► my_thread_func(arg)
                                                   │
                                                   │ does work...
                                                   │ calls qurt_thread_exit()
                                                   │
                            B is removed ◄─────── Exit system call
                            from ready queue
                            
                            "Who's next? A."
                            
                            LOAD A's registers
   │                        Jump to A's PC
   │◄──────────────────────
   │
   ├─► qurt_thread_create()
   │   returns QURT_EOK
   │
   ▼ continues...

From thread A's perspective, qurt_thread_create() is just a function call that takes a while to return. Thread A has no idea it was suspended. It doesn't know thread B already ran to completion during that pause.

The scheduler makes preemption invisible to the preempted thread. This is a fundamental property of preemptive scheduling: threads don't need to cooperate or even be aware of each other's existence.

Thread Control Block Contents

The TCB is the kernel's internal data structure for tracking each thread. You never access it directly, but understanding its contents explains a lot of QuRT behavior:

/* Conceptual TCB layout (simplified, not actual QuRT source) */
struct qurt_tcb {
    /* Identity */
    qurt_thread_t   thread_id;
    char            name[16];
    
    /* Scheduling */
    uint8_t         base_priority;
    uint8_t         effective_priority; /* May differ due to priority inheritance */
    uint8_t         state;             /* READY, RUNNING, BLOCKED, DEAD */
    
    /* Saved CPU context (filled during context switch) */
    uint32_t        saved_regs[32];
    uint32_t        saved_pc;
    uint32_t        saved_sp;
    uint32_t        saved_sr;
    
    /* Stack info (for debugging and overflow detection) */
    void           *stack_base;
    size_t          stack_size;
    
    /* Blocking info */
    void           *wait_object;  /* Mutex/signal/pipe being waited on */
    uint32_t        wait_mask;    /* Signal bits being waited for */
    
    /* Linked list pointers */
    struct qurt_tcb *next_ready;
    struct qurt_tcb *next_waiting;
    
    /* Join support */
    int             exit_status;  /* Value passed to qurt_thread_exit() */
    qurt_thread_t   joiner;      /* Thread waiting in qurt_thread_join() */
};

The TCB stores everything the scheduler needs: identity information (thread ID and debug name), scheduling state (base and effective priority, current state), saved CPU context (all 32 general-purpose registers plus PC, SP, and status register), stack bounds, blocking information (what the thread is waiting on), linked list pointers for the ready and wait queues, and join support fields.

The effective_priority field may differ from base_priority when priority inheritance is active, which is covered in the synchronization section.

Thread State Machine

A QuRT thread is always in one of four states:

                   qurt_thread_create()
                           │
                           ▼
                     ┌──────────┐
                     │  READY   │◄─────────────────────────┐
                     └──────────┘                          │
                        │    ▲                             │
           Scheduler    │    │  Preempted by               │
           picks this   │    │  higher-priority            │
           thread       │    │  thread                     │
                        ▼    │                             │
                     ┌──────────┐   Signal/mutex/timer     │
                     │ RUNNING  │   event unblocks the     │
                     └──────────┘   BLOCKED thread         │
                        │    │                             │
   qurt_thread_exit()   │    │  Thread calls a blocking    │
                        │    │  API: mutex_lock,           │
                        │    │  signal_wait, pipe_receive  │
                        ▼    ▼                             │
              ┌──────────┐ ┌──────────┐                    │
              │   DEAD   │ │ BLOCKED  │────────────────────┘
              └──────────┘ └──────────┘

READY means the thread can run and is waiting for a hardware thread slot.
RUNNING means the thread is currently executing on a hardware thread (only one thread per hardware thread slot is in this state at a time).
BLOCKED means the thread is waiting for an external event: a mutex to be released, a signal to be set, or a timer to expire.
DEAD means the thread called qurt_thread_exit(). If another thread called qurt_thread_join() on it, that thread receives the exit status.
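The legal moves between these four states can be written down as a transition check, which is handy for assertions in your own debug bookkeeping. A sketch with hypothetical enum names; note that a thread can only reach DEAD from RUNNING, since it has to be executing to call qurt_thread_exit().

```c
#include <stdbool.h>

/* The four states described above (names are our own). */
typedef enum { ST_READY, ST_RUNNING, ST_BLOCKED, ST_DEAD } thread_state;

/* True if the state machine allows moving from `from` to `to`. */
bool is_valid_transition(thread_state from, thread_state to)
{
    switch (from) {
    case ST_READY:
        return to == ST_RUNNING;            /* picked by the scheduler    */
    case ST_RUNNING:
        return to == ST_READY               /* preempted                  */
            || to == ST_BLOCKED             /* mutex, signal, pipe wait   */
            || to == ST_DEAD;               /* called qurt_thread_exit()  */
    case ST_BLOCKED:
        return to == ST_READY;              /* event fired; not RUNNING   */
    case ST_DEAD:
        return false;                       /* terminal state             */
    }
    return false;
}
```

Note that an unblocked thread goes back to READY rather than straight to RUNNING: it still has to win a hardware thread slot on priority.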

Hardware Thread Slots

The Hexagon DSP is a hardware-multithreaded processor with multiple hardware thread slots per core (typically 2 to 4). This means QuRT can run multiple threads truly simultaneously on a single core, not just time-sliced.

┌─────────────────────────────────────────┐
│          Hexagon DSP Core               │
│                                         │
│  ┌───────────┐  ┌───────────┐           │
│  │ HW Thread │  │ HW Thread │           │
│  │ Slot 0    │  │ Slot 1    │  ...      │
│  │           │  │           │           │
│  │ Thread A  │  │ Thread B  │           │
│  │ (running) │  │ (running) │           │
│  └───────────┘  └───────────┘           │
│                                         │
│  Ready Queue: [C, D, E, F, ...]         │
│  The scheduler fills HW slots with      │
│  the highest-priority READY threads     │
└─────────────────────────────────────────┘

This diagram shows a single Hexagon core with two hardware thread slots. Each slot can execute a thread independently and simultaneously. The scheduler fills the hardware slots with the highest-priority ready threads. When there are more software threads than hardware slots, the scheduler time-slices the lower-priority threads. But the highest-priority threads get dedicated hardware slots and run without context switching at all.

On a typical Hexagon v66 with 4 hardware threads, the top 4 priority threads each have their own execution pipeline. Context switches only happen when a thread blocks or a higher-priority thread wakes up and displaces one from a hardware slot. This is why QuRT achieves such low scheduling latency.
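The slot-filling rule ("fill HW slots with the highest-priority READY threads") can be modelled as picking the N smallest priority numbers. This hypothetical helper is a simplification that ignores everything else a real scheduler has to weigh; it only shows which threads would get a slot.

```c
#include <stddef.h>

/* Chooses which READY threads occupy `slots` hardware threads.
 * Lower number = higher priority; ties keep queue order.
 * Writes up to `slots` indices into `chosen` and returns how many. */
size_t fill_hw_slots(const int *ready_prios, size_t n_ready,
                     size_t slots, size_t *chosen)
{
    size_t used = 0;
    /* Repeatedly take the best thread not chosen yet. O(slots * n) is
     * fine for a handful of hardware threads. */
    while (used < slots && used < n_ready) {
        size_t best = n_ready;   /* sentinel: nothing picked yet */
        for (size_t i = 0; i < n_ready; i++) {
            int taken = 0;
            for (size_t k = 0; k < used; k++) {
                if (chosen[k] == i) { taken = 1; break; }
            }
            if (taken)
                continue;
            if (best == n_ready || ready_prios[i] < ready_prios[best])
                best = i;
        }
        chosen[used++] = best;
    }
    return used;
}
```

With ready priorities {128, 64, 192, 100, 64} and four slots, the helper picks indices 1, 4, 3, and 0; the priority-192 thread stays in the ready queue.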

Full Thread Lifecycle

The following code shows a complete thread lifecycle with annotations for what QuRT does at each step:

#include <qurt.h>

static char stack[8192] __attribute__((aligned(8)));

/* Shared state; the mutex must be initialized before any thread uses it */
static qurt_mutex_t some_mutex;
static int shared_data;

void my_func(void *arg)
{
    /* State: RUNNING. Stack is fresh, R0 contains arg. */
    int val = *(int *)arg;

    qurt_mutex_lock(&some_mutex);
    /* If mutex is held: state becomes BLOCKED until holder unlocks */

    shared_data = val;
    qurt_mutex_unlock(&some_mutex);

    qurt_thread_exit(QURT_EOK);
    /* State becomes DEAD. Joiner (if any) is unblocked. */
}

int main(void)
{
    qurt_thread_t tid;
    qurt_thread_attr_t attr;
    int my_arg = 42;   /* stays valid: main joins before returning */

    qurt_mutex_init(&some_mutex);

    qurt_thread_attr_init(&attr);
    qurt_thread_attr_set_stack_addr(&attr, stack);
    qurt_thread_attr_set_stack_size(&attr, sizeof(stack));
    qurt_thread_attr_set_priority(&attr, 100);

    if (qurt_thread_create(&tid, &attr, my_func, &my_arg) != QURT_EOK)
        return -1;
    /* If priority 100 outranks main's (lower number = higher priority),
       main is preempted here */

    int status;
    qurt_thread_join(tid, &status);
    /* Blocks until my_func exits; returns immediately if already exited */

    qurt_mutex_destroy(&some_mutex);
    return 0;
}

When my_func starts running, the kernel has already set up its registers so that arg contains the pointer to my_arg. The thread's state is RUNNING.

When it calls qurt_mutex_lock(), one of two things happens: if the mutex is available, the thread acquires it and continues. If the mutex is held by another thread, the calling thread's state changes to BLOCKED, its registers are saved to its TCB, and the scheduler picks the next highest-priority ready thread.

When the mutex holder calls qurt_mutex_unlock(), the blocked thread moves back to READY and the scheduler re-evaluates priorities.

On the main side, qurt_thread_create() may or may not return before my_func finishes. If my_func has higher priority than main, the scheduler preempts main immediately, and qurt_thread_create() doesn't return until my_func completes (or blocks). qurt_thread_join() either blocks main until my_func exits, or returns immediately if my_func has already exited.

One important note about stack sizing: if you set STACK_SIZE to something too small (say, 256 bytes) and your thread calls printf, the result is a stack overflow. QuRT doesn't detect stack overflows for you. The crash will be silent and difficult to diagnose. Always give your threads at least 8192 bytes of stack and optimize later after profiling.
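One common way to do that profiling is stack painting, a generic embedded technique rather than a QuRT feature: fill the stack buffer with a known byte before passing it to qurt_thread_attr_set_stack_addr(), let the thread run, then count how many bytes at the low end still hold the pattern. A sketch in portable C, assuming a downward-growing stack as in the frame diagram earlier; the helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define STACK_PAINT 0xA5u   /* arbitrary marker byte */

/* Call before the thread starts using the buffer. */
void stack_paint(unsigned char *stack, size_t size)
{
    memset(stack, STACK_PAINT, size);
}

/* The stack grows downward from stack + size, so untouched bytes remain
 * at the low end. Returns the number of bytes used (high-water mark). */
size_t stack_high_water(const unsigned char *stack, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_PAINT)
        untouched++;
    return size - untouched;
}
```

The measurement under-reports if the thread happens to write the marker byte itself, and it only reflects the code paths that actually ran during the test, so keep a safety margin when shrinking stacks.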

Building and Running on the Simulator

The Hexagon SDK provides a make wrapper that invokes SCons underneath. Both of the following commands produce the same result:

# Option 1: Use the make wrapper (invokes SCons internally)
cd $HEXAGON_SDK_ROOT
make V=hexagon_Release_dynamic_toolv84_v66 \
     tree=my_qurt_project

# Option 2: Invoke SCons directly
cd $HEXAGON_SDK_ROOT
python tools/build/scons/scons.py \
    V=hexagon_Release_dynamic_toolv84_v66 \
    my_qurt_project

Both commands build the project for the Hexagon v66 architecture using the v84 toolchain in release mode. The make wrapper is a convenience layer: it parses the V= and tree= arguments and forwards them to SCons. Invoking SCons directly lets you pass SCons's own options, such as --jobs=N (or -j N) for parallel builds. Note that SCons has no built-in --verbose option; a flag like that works only if the SDK's SConstruct defines it with AddOption.

# Run on the simulator
hexagon-sim --simulated_returnval \
    --cosim_file osam.cfg \
    -- bootimg.pbn \
    -- my_qurt_app.so

The hexagon-sim command launches the QuRT simulator with your compiled application. The --simulated_returnval flag captures the return value from your main function, and --cosim_file points to the QuRT OS configuration.

Working with Multiple Threads

Real QuRT applications have multiple threads running simultaneously. The producer-consumer pattern is one of the most common in DSP programming: one thread reads from hardware, another processes the data.

#include <stdio.h>
#include <qurt.h>

#define STACK_SIZE    8192
#define BUFFER_SIZE   16
#define NUM_ITEMS     100

/* Thread stacks */
static char producer_stack[STACK_SIZE] __attribute__((aligned(8)));
static char consumer_stack[STACK_SIZE] __attribute__((aligned(8)));

/* Shared buffer */
static int buffer[BUFFER_SIZE];
static int head = 0;
static int tail = 0;
static int count = 0;

/* Synchronization primitives */
qurt_mutex_t buffer_mutex;
qurt_cond_t  not_full;
qurt_cond_t  not_empty;

void producer_thread(void *arg)
{
    for (int i = 0; i < NUM_ITEMS; i++) {
        qurt_mutex_lock(&buffer_mutex);

        /* Wait until there is space in the buffer */
        while (count == BUFFER_SIZE) {
            qurt_cond_wait(&not_full, &buffer_mutex);
        }

        /* Produce an item */
        buffer[head] = i;
        head = (head + 1) % BUFFER_SIZE;
        count++;

        printf("[Producer] Put item %d (buffer count: %d)\n", i, count);

        /* Signal the consumer that data is available */
        qurt_cond_signal(&not_empty);
        qurt_mutex_unlock(&buffer_mutex);
    }

    qurt_thread_exit(QURT_EOK);
}

void consumer_thread(void *arg)
{
    for (int i = 0; i < NUM_ITEMS; i++) {
        qurt_mutex_lock(&buffer_mutex);

        /* Wait until there is data in the buffer */
        while (count == 0) {
            qurt_cond_wait(&not_empty, &buffer_mutex);
        }

        /* Consume an item */
        int item = buffer[tail];
        tail = (tail + 1) % BUFFER_SIZE;
        count--;

        printf("[Consumer] Got item %d (buffer count: %d)\n", item, count);

        /* Signal the producer that space is available */
        qurt_cond_signal(&not_full);
        qurt_mutex_unlock(&buffer_mutex);
    }

    qurt_thread_exit(QURT_EOK);
}

int main(void)
{
    qurt_thread_t producer, consumer;
    qurt_thread_attr_t attr;

    /* Initialize sync primitives BEFORE creating threads */
    qurt_mutex_init(&buffer_mutex);
    qurt_cond_init(&not_full);
    qurt_cond_init(&not_empty);

    /* Create producer (higher priority) */
    qurt_thread_attr_init(&attr);
    qurt_thread_attr_set_name(&attr, "producer");
    qurt_thread_attr_set_stack_addr(&attr, producer_stack);
    qurt_thread_attr_set_stack_size(&attr, STACK_SIZE);
    qurt_thread_attr_set_priority(&attr, 100);
    qurt_thread_create(&producer, &attr, producer_thread, NULL);

    /* Create consumer (lower priority) */
    qurt_thread_attr_init(&attr);
    qurt_thread_attr_set_name(&attr, "consumer");
    qurt_thread_attr_set_stack_addr(&attr, consumer_stack);
    qurt_thread_attr_set_stack_size(&attr, STACK_SIZE);
    qurt_thread_attr_set_priority(&attr, 110);
    qurt_thread_create(&consumer, &attr, consumer_thread, NULL);

    /* Wait for both threads to finish */
    int status;
    qurt_thread_join(producer, &status);
    qurt_thread_join(consumer, &status);

    /* Clean up */
    qurt_mutex_destroy(&buffer_mutex);
    qurt_cond_destroy(&not_full);
    qurt_cond_destroy(&not_empty);

    printf("All done! Produced and consumed %d items.\n", NUM_ITEMS);
    return 0;
}

This code implements a classic bounded-buffer producer-consumer pattern. The shared buffer is a circular array of 16 integers protected by a mutex. The producer writes items into the buffer and the consumer reads them out.

When the buffer is full, the producer blocks on the not_full condition variable. When the buffer is empty, the consumer blocks on not_empty. Each side signals the other after modifying the buffer.

The producer has higher priority (100) than the consumer (110) for a deliberate reason. In a real DSP scenario, the producer is typically reading from hardware (a microphone, a sensor). If the producer misses a hardware sample, that data is lost forever. The consumer can always process data later. This is a general RTOS design principle: never starve your hardware-facing threads.
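The index arithmetic is the part most worth checking in isolation. The sketch below keeps the same head/tail/count bookkeeping as the example, without the mutex and condition variables, and returns false where the real threads would wait; the ring type and function names are hypothetical.

```c
#include <stdbool.h>

#define RB_SIZE 16   /* same capacity as BUFFER_SIZE in the example */

typedef struct {
    int buf[RB_SIZE];
    int head;    /* next slot to write     */
    int tail;    /* next slot to read      */
    int count;   /* items currently stored */
} ring;

bool ring_put(ring *r, int v)
{
    if (r->count == RB_SIZE)
        return false;                  /* producer would wait on not_full */
    r->buf[r->head] = v;
    r->head = (r->head + 1) % RB_SIZE; /* wrap around at the end */
    r->count++;
    return true;
}

bool ring_get(ring *r, int *out)
{
    if (r->count == 0)
        return false;                  /* consumer would wait on not_empty */
    *out = r->buf[r->tail];
    r->tail = (r->tail + 1) % RB_SIZE;
    r->count--;
    return true;
}
```

Tracking count separately is what lets head == tail mean either "empty" or "full" without ambiguity.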

Synchronization Primitives

QuRT provides five main synchronization mechanisms: mutexes, condition variables, signals, barriers, and semaphores.

┌──────────────┬────────────────────────────────────────────────────┐
│ Primitive    │ When to Use                                        │
├──────────────┼────────────────────────────────────────────────────┤
│ Mutex        │ Protecting shared data from concurrent access      │
│ Condition Var│ "Wait until X is true" (always paired with mutex)  │
│ Signal       │ One thread notifying another (like poking someone) │
│ Barrier      │ "Everyone wait here until all threads arrive"      │
│ Semaphore    │ Controlling access to a limited resource pool      │
│              │ (for example, 4 DMA channels shared by 10 threads) │
└──────────────┴────────────────────────────────────────────────────┘

This table summarizes each primitive and its primary use case. Mutexes enforce exclusive access to shared data. Condition variables let a thread sleep until a specific data condition becomes true, and are always used in combination with a mutex. Signals provide lightweight one-to-one notifications between threads. Barriers synchronize a group of threads at a common point. Semaphores control access to a pool of N identical resources.
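The semaphore row is essentially a guarded counter. This single-threaded C model (hypothetical names, not QuRT's semaphore API) shows the accounting for the 4-DMA-channel example; a real semaphore would put the calling thread to sleep where pool_try_acquire returns false.

```c
#include <stdbool.h>

/* Counting-semaphore model: `available` is the number of free resources. */
typedef struct { int available; } pool_sem;

void pool_init(pool_sem *s, int n) { s->available = n; }

bool pool_try_acquire(pool_sem *s)
{
    if (s->available == 0)
        return false;   /* a blocking acquire would sleep here */
    s->available--;
    return true;
}

void pool_release(pool_sem *s) { s->available++; }
```

If 10 threads each try to grab a channel from a pool initialized with 4, exactly 4 succeed; every release lets one more in.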

Mutexes

A mutex ensures that only one thread accesses a critical section at a time. QuRT mutexes also support non-blocking acquisition through qurt_mutex_try_lock().

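#include <stdio.h>
#include <qurt.h>

qurt_mutex_t my_mutex;

/* Shared data protected by my_mutex */
static int shared_counter;
static int shared_buffer[16];

void init_example(void)
{
    /* Always initialize before use */
    qurt_mutex_init(&my_mutex);
}

void critical_section_example(int index, int new_value)
{
    qurt_mutex_lock(&my_mutex);

    /* Only one thread can be here at a time */
    shared_counter++;
    shared_buffer[index] = new_value;

    qurt_mutex_unlock(&my_mutex);
}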

/* Non-blocking version */
void try_lock_example(void)
{
    int result = qurt_mutex_try_lock(&my_mutex);

    if (result == QURT_EOK) {
        shared_counter++;
        qurt_mutex_unlock(&my_mutex);
    } else {
        printf("Busy, will try later\n");
    }
}

void cleanup_example(void)
{
    qurt_mutex_destroy(&my_mutex);
}

The qurt_mutex_lock() call blocks the calling thread until the mutex is available, then acquires it. qurt_mutex_try_lock() attempts to acquire the mutex and returns immediately with QURT_EOK on success or an error code if the mutex is held. Always call qurt_mutex_destroy() when you're done with a mutex.

QuRT mutexes implement priority inheritance. If a high-priority thread is waiting for a mutex held by a low-priority thread, the low-priority thread temporarily gets boosted to the high-priority level. This prevents priority inversion, the classic bug that caused the Mars Pathfinder spacecraft to repeatedly reset during its mission.

QuRT handles priority inheritance automatically, but you should be aware it's happening so you don't get confused by unexpected priority behavior during debugging.
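The boosting rule itself is small. This sketch computes a mutex holder's effective priority from its base priority and the priorities of the threads blocked on that mutex, mirroring the base_priority and effective_priority fields of the conceptual TCB. It is not QuRT source, and it ignores chains where the holder is itself blocked on another mutex.

```c
#include <stddef.h>

/* Lower number = higher priority, so the holder runs at the minimum of
 * its own base priority and every waiter's priority. */
int effective_priority(int base_prio, const int *waiter_prios,
                       size_t n_waiters)
{
    int eff = base_prio;
    for (size_t i = 0; i < n_waiters; i++) {
        if (waiter_prios[i] < eff)
            eff = waiter_prios[i];
    }
    return eff;
}
```

If a priority-192 thread holds the mutex while threads at 150 and 64 wait for it, the holder runs at 64 until it unlocks, so a medium-priority thread at 128 can no longer starve it.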

Signals

Signals in QuRT are a lightweight notification mechanism. A thread waits for specific signal bits, and another thread (or an ISR) sets those bits to wake it up.

#include <stdio.h>
#include <qurt.h>

#define SIGNAL_DATA_READY   0x01
#define SIGNAL_STOP         0x02
#define SIGNAL_ERROR        0x04

/* Application-defined helpers (not part of QuRT) */
void process_data(void);
void handle_error(void);
void prepare_data(void);

qurt_signal_t my_signal;

void signal_init(void)
{
    qurt_signal_init(&my_signal);
}

/* Waiting thread */
void waiter_thread(void *arg)
{
    unsigned int received_signals;

    while (1) {
        /* Wait for ANY of these signals */
        received_signals = qurt_signal_wait(
            &my_signal,
            SIGNAL_DATA_READY | SIGNAL_STOP | SIGNAL_ERROR,
            QURT_SIGNAL_ATTR_WAIT_ANY
        );

        if (received_signals & SIGNAL_DATA_READY) {
            /* Clear before processing so a new DATA_READY that arrives
               while we work is not wiped out */
            qurt_signal_clear(&my_signal, SIGNAL_DATA_READY);
            printf("Data is ready! Processing...\n");
            process_data();
        }

        if (received_signals & SIGNAL_ERROR) {
            qurt_signal_clear(&my_signal, SIGNAL_ERROR);
            printf("Error occurred! Handling...\n");
            handle_error();
        }

        /* Check STOP last: the sender may set DATA_READY and STOP before
           this thread runs, and pending data should not be dropped */
        if (received_signals & SIGNAL_STOP) {
            printf("Received stop signal. Exiting.\n");
            break;
        }
    }

    qurt_signal_destroy(&my_signal);
    qurt_thread_exit(QURT_EOK);
}

/* Signaling thread (or ISR) */
void sender_thread(void *arg)
{
    prepare_data();
    qurt_signal_set(&my_signal, SIGNAL_DATA_READY);

    /* Later, tell it to stop */
    qurt_signal_set(&my_signal, SIGNAL_STOP);

    qurt_thread_exit(QURT_EOK);
}

The waiting thread calls qurt_signal_wait() with a bitmask of the signals it cares about. QURT_SIGNAL_ATTR_WAIT_ANY means the thread wakes up when any of the specified bits are set. The sender thread calls qurt_signal_set() to set one or more bits. After handling a signal, the waiter must call qurt_signal_clear() to reset the bit. If you forget to clear a signal, the next call to qurt_signal_wait() returns immediately, and your thread processes the same event again.

The choice between signals and condition variables depends on the use case. Signals are best for notifications between unrelated threads, or from an ISR, because they're simpler and lighter weight. Condition variables are better when the notification is tied to a specific data condition (buffer full, queue empty) and you need mutex protection for the data check.
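You can't run QuRT on a laptop, but the signal semantics described above fit in a few lines of portable C. The sketch below models a signal object with a pthread mutex, a condition variable, and a bitmask. The names (sig_obj_t, sig_set, sig_clear, sig_wait_any) are hypothetical and this is not how QuRT implements signals; it only demonstrates the two behaviors that matter: a wait-any returns as soon as any requested bit is set, and bits stay set until someone clears them.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a signal object: a bitmask protected by a
   mutex, plus a condition variable that waiters sleep on. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    unsigned int    bits;
} sig_obj_t;

void sig_init(sig_obj_t *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->changed, NULL);
    s->bits = 0;
}

void sig_destroy(sig_obj_t *s)
{
    pthread_cond_destroy(&s->changed);
    pthread_mutex_destroy(&s->lock);
}

/* Set bits and wake every waiter; each re-checks its own mask. */
void sig_set(sig_obj_t *s, unsigned int mask)
{
    pthread_mutex_lock(&s->lock);
    s->bits |= mask;
    pthread_cond_broadcast(&s->changed);
    pthread_mutex_unlock(&s->lock);
}

void sig_clear(sig_obj_t *s, unsigned int mask)
{
    pthread_mutex_lock(&s->lock);
    s->bits &= ~mask;
    pthread_mutex_unlock(&s->lock);
}

/* Wait-any: block until at least one bit in mask is set, then return
   the requested bits that are set. Nothing is cleared here. */
unsigned int sig_wait_any(sig_obj_t *s, unsigned int mask)
{
    pthread_mutex_lock(&s->lock);
    while ((s->bits & mask) == 0)
        pthread_cond_wait(&s->changed, &s->lock);
    unsigned int seen = s->bits & mask;
    pthread_mutex_unlock(&s->lock);
    return seen;
}
```

A bit set before anyone waits is not lost, because it simply stays set. That same persistence is what makes the forgotten-clear bug possible: calling sig_wait_any() twice without sig_clear() returns the same bit both times.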

Barriers

A barrier blocks all participating threads until every one of them has reached the barrier point. This is useful when a computation is split into phases and each phase depends on the results of the previous one.

#include <stdint.h>
#include <stdio.h>
#include <qurt.h>

#define NUM_WORKER_THREADS  4

qurt_barrier_t sync_barrier;

void worker_thread(void *arg)
{
    int thread_num = (int)(uintptr_t)arg;

    /* Phase 1: Each thread computes its portion */
    printf("Thread %d: Computing phase 1...\n", thread_num);
    compute_partial_result(thread_num);

    /* All threads wait here until everyone finishes phase 1 */
    qurt_barrier_wait(&sync_barrier);

    /* Phase 2: All partial results are ready, combine them */
    printf("Thread %d: Computing phase 2...\n", thread_num);
    combine_results(thread_num);

    qurt_thread_exit(QURT_EOK);
}

int main(void)
{
    qurt_barrier_init(&sync_barrier, NUM_WORKER_THREADS);

    /* Create worker threads */
    for (int i = 0; i < NUM_WORKER_THREADS; i++) {
        create_worker(i);
    }

    join_all_workers();

    qurt_barrier_destroy(&sync_barrier);
    return 0;
}

The barrier is initialized with the number of participating threads. Each thread calls qurt_barrier_wait() when it reaches the synchronization point. The call blocks until all threads have arrived. Once the last thread calls qurt_barrier_wait(), all threads are released simultaneously and continue to phase 2.
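If you want to see how a barrier works internally, or you need one on a host machine for testing, the usual construction is a counter plus a condition variable. The sketch below is portable C with pthreads, not QuRT's implementation; phase_barrier_t, run_two_phase_demo(), and the other names are hypothetical. The demo mirrors the two-phase example above: each worker writes its own partial result, waits at the barrier, then reads everyone's.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Reusable barrier from a mutex and condition variable. The generation
   counter keeps a late-waking thread from confusing one phase's release
   with the next. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  released;
    unsigned int    total;
    unsigned int    arrived;
    unsigned long   generation;
} phase_barrier_t;

void phase_barrier_init(phase_barrier_t *b, unsigned int total)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->released, NULL);
    b->total = total;
    b->arrived = 0;
    b->generation = 0;
}

void phase_barrier_destroy(phase_barrier_t *b)
{
    pthread_cond_destroy(&b->released);
    pthread_mutex_destroy(&b->lock);
}

void phase_barrier_wait(phase_barrier_t *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned long my_generation = b->generation;
    if (++b->arrived == b->total) {
        /* Last arrival: reset the count, open a new generation, wake all */
        b->arrived = 0;
        b->generation++;
        pthread_cond_broadcast(&b->released);
    } else {
        /* The loop also guards against spurious wake-ups */
        while (my_generation == b->generation)
            pthread_cond_wait(&b->released, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Demo: phase 1 writes each worker's own slot, phase 2 reads all slots */
#define DEMO_WORKERS 4
static phase_barrier_t demo_barrier;
static int partial[DEMO_WORKERS];
static int combined[DEMO_WORKERS];

static void *demo_worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    partial[id] = (id + 1) * 10;            /* phase 1: own slot only */
    phase_barrier_wait(&demo_barrier);      /* all partials now written */
    int sum = 0;
    for (int i = 0; i < DEMO_WORKERS; i++)  /* phase 2: read every slot */
        sum += partial[i];
    combined[id] = sum;
    return NULL;
}

/* Returns the total every worker computed, or -1 if any disagree. */
int run_two_phase_demo(void)
{
    pthread_t threads[DEMO_WORKERS];
    phase_barrier_init(&demo_barrier, DEMO_WORKERS);
    for (int i = 0; i < DEMO_WORKERS; i++)
        pthread_create(&threads[i], NULL, demo_worker, (void *)(intptr_t)i);
    for (int i = 0; i < DEMO_WORKERS; i++)
        pthread_join(threads[i], NULL);
    phase_barrier_destroy(&demo_barrier);
    for (int i = 1; i < DEMO_WORKERS; i++)
        if (combined[i] != combined[0])
            return -1;
    return combined[0];
}
```

Without the barrier, a fast worker could start phase 2 while a slow one still hadn't written its partial, and the sums would disagree.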

Semaphores

A semaphore controls access to a pool of N identical resources. A mutex admits one holder at a time and must be unlocked by the thread that locked it; a counting semaphore has no owner and allows up to N threads to hold it simultaneously.

#define MAX_DMA_CHANNELS 4

qurt_sem_t dma_semaphore;

void init_dma_pool(void)
{
    /* 4 DMA channels available */
    qurt_sem_init_val(&dma_semaphore, MAX_DMA_CHANNELS);
}

void thread_needing_dma(void *arg)
{
    /* Acquire a DMA channel (blocks if all 4 are in use) */
    qurt_sem_down(&dma_semaphore);

    int channel = allocate_dma_channel();
    perform_dma_transfer(channel);
    release_dma_channel(channel);

    /* Release the semaphore slot */
    qurt_sem_up(&dma_semaphore);

    qurt_thread_exit(QURT_EOK);
}

The semaphore starts with a count of 4, matching the number of DMA channels. Each qurt_sem_down() decrements the count and blocks if the count reaches zero. Each qurt_sem_up() increments the count and unblocks one waiting thread if any are queued. This guarantees that no more than 4 threads use DMA channels simultaneously.
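The down/up behavior can be modeled with a counter, a mutex, and a condition variable. The sketch below is portable pthreads code, not QuRT's implementation, and the names (count_sem_t, count_sem_down, count_sem_up, run_semaphore_demo) are hypothetical. The demo runs 10 threads against 4 slots and records the peak number of threads holding a slot at once, which can never exceed 4.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Counting semaphore: count is the number of free slots. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  available;
    unsigned int    count;
} count_sem_t;

void count_sem_init(count_sem_t *s, unsigned int initial)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->available, NULL);
    s->count = initial;
}

void count_sem_destroy(count_sem_t *s)
{
    pthread_cond_destroy(&s->available);
    pthread_mutex_destroy(&s->lock);
}

/* "down": take a slot, blocking while none are free. */
void count_sem_down(count_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->available, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* "up": return a slot and wake one blocked thread, if any. */
void count_sem_up(count_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->available);
    pthread_mutex_unlock(&s->lock);
}

#define DEMO_SLOTS 4
#define DEMO_USERS 10
static count_sem_t slots;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_use, peak;

static void *slot_user(void *arg)
{
    (void)arg;
    count_sem_down(&slots);
    pthread_mutex_lock(&stats_lock);
    if (++in_use > peak)
        peak = in_use;
    pthread_mutex_unlock(&stats_lock);

    sched_yield();   /* hold the slot briefly so threads overlap */

    pthread_mutex_lock(&stats_lock);
    in_use--;
    pthread_mutex_unlock(&stats_lock);
    count_sem_up(&slots);
    return NULL;
}

/* Returns the largest number of threads that held a slot at once. */
int run_semaphore_demo(void)
{
    pthread_t threads[DEMO_USERS];
    in_use = 0;
    peak = 0;
    count_sem_init(&slots, DEMO_SLOTS);
    for (int i = 0; i < DEMO_USERS; i++)
        pthread_create(&threads[i], NULL, slot_user, NULL);
    for (int i = 0; i < DEMO_USERS; i++)
        pthread_join(threads[i], NULL);
    count_sem_destroy(&slots);
    return peak;
}
```

The exact peak depends on scheduling, but the semaphore guarantees it stays between 1 and 4.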

Memory Management

Memory on a DSP is limited. A typical Hexagon DSP has between 256 KB and 2 MB of tightly-coupled memory (TCM) plus access to DDR. QuRT provides tools to manage both effectively.

The Memory Map

┌───────────────────────────────────┐  High Address
│         DDR (Shared with ARM)     │
│   - Large buffers                 │
│   - Neural network weights        │
│   - Audio/video frames            │
├───────────────────────────────────┤
│         QuRT Virtual Memory       │
│   - User heap                     │
│   - Thread stacks                 │
├───────────────────────────────────┤
│         L2 Cache (TCM Mode)       │
│   - Frequently accessed buffers   │
│   - Lookup tables                 │
├───────────────────────────────────┤
│         QuRT Kernel               │
│   - Scheduler, ISR handlers       │
│   - System data structures        │
└───────────────────────────────────┘  Low Address

This diagram shows the Hexagon DSP memory layout from low to high addresses. The QuRT kernel occupies the lowest addresses and is off-limits to user code. Above that, L2 cache configured in TCM mode provides fast storage for hot data. The virtual memory region holds the user heap and thread stacks. At the top, DDR is shared with the ARM CPU and is used for large data buffers, ML model weights, and media frames. DDR has higher latency than TCM but much more capacity.

Dynamic Memory Allocation

#include <stdio.h>
#include <stdlib.h>

void memory_examples(void)
{
    /* Standard malloc/free works (QuRT provides a heap) */
    int *data = (int *)malloc(1024 * sizeof(int));
    if (!data) {
        printf("malloc failed! Out of heap memory.\n");
        return;
    }

    for (int i = 0; i < 1024; i++) {
        data[i] = i * 2;
    }

    free(data);
}

QuRT provides a standard C heap, so malloc and free work as expected. But malloc has unpredictable execution time because it may need to search the free list, split blocks, or coalesce adjacent free regions. This makes it unsuitable for real-time hot paths, where execution time must be deterministic. Use malloc for setup and teardown, not for per-frame or per-sample allocation.

Cache Management

On the Hexagon DSP, explicit cache management is essential when sharing memory with the ARM CPU.

#include <stdio.h>
#include <qurt.h>

void cache_management_example(void)
{
    void *buffer;
    size_t buffer_size = 4096;

    /* Allocate physically contiguous, cache-aligned memory */
    int result = qurt_mem_region_create(
        &buffer,
        buffer_size,
        qurt_mem_default_pool,
        QURT_MEM_REGION_SHARED
    );

    if (result != QURT_EOK) {
        printf("Memory region creation failed\n");
        return;
    }

    /* BEFORE reading data written by another processor (e.g., ARM): */
    qurt_mem_cache_clean(buffer, buffer_size,
                          QURT_MEM_CACHE_INVALIDATE);

    /* Read data from the buffer... */

    /* AFTER writing data that another processor will read: */
    fill_buffer_with_results(buffer, buffer_size);
    qurt_mem_cache_clean(buffer, buffer_size,
                          QURT_MEM_CACHE_FLUSH);
}

The qurt_mem_region_create() call allocates a physically contiguous memory region suitable for sharing with other processors. The QURT_MEM_REGION_SHARED flag marks it for cross-processor use.

The cache rules for shared memory are simple but critical:

Invalidate before you read, so you see the latest data written by the ARM CPU rather than stale cache entries.
Flush after you write, so the ARM CPU sees your changes rather than the old contents of main memory.

Forgetting these operations causes bugs where your code is logically correct but operates on stale data.
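Cache maintenance operates on whole cache lines, not individual bytes. If a shared buffer doesn't start and end on line boundaries, invalidating it can also discard dirty data in neighboring variables that share its first or last line, and flushing it writes those neighbors back too. That's one reason the example above asks for cache-aligned memory. The helper below (hypothetical, portable C) computes the line-aligned range that an operation on an arbitrary buffer actually touches. The line size is a parameter because it differs between cache levels and Hexagon versions; the 64 bytes used in the usage note is an assumption, so check your target's documentation.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t start;   /* first byte of the first touched line */
    size_t    len;     /* total bytes across all touched lines */
} line_range_t;

/* Expand [addr, addr + len) outward to whole cache lines.
   line_size must be a power of two. */
line_range_t cache_line_range(uintptr_t addr, size_t len, size_t line_size)
{
    uintptr_t mask = (uintptr_t)line_size - 1;
    uintptr_t start = addr & ~mask;                 /* round down */
    uintptr_t end = (addr + len + mask) & ~mask;    /* round up */
    line_range_t r = { start, (size_t)(end - start) };
    return r;
}

/* True when a buffer shares no cache line with anything else. */
int is_line_aligned(uintptr_t addr, size_t len, size_t line_size)
{
    return ((addr | len) & (line_size - 1)) == 0;
}
```

For example, with 64-byte lines a 10-byte buffer at 0x1005 still touches the whole line starting at 0x1000, and an 8-byte buffer at 0x103C straddles two lines (128 bytes). A buffer that passes is_line_aligned() is safe to invalidate without collateral damage.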

Memory Pools for Predictable Allocation

Memory pools provide O(1) allocation time, making them suitable for real-time hot paths.

#include <stdio.h>
#include <qurt.h>

#define BLOCK_SIZE    256
#define NUM_BLOCKS    32

/* Pool memory is statically allocated for determinism */
static char pool_memory[BLOCK_SIZE * NUM_BLOCKS] __attribute__((aligned(8)));
static qurt_mem_pool_t my_pool;

void pool_init(void)
{
    qurt_mem_pool_create(&my_pool, pool_memory,
                          BLOCK_SIZE * NUM_BLOCKS,
                          BLOCK_SIZE);
}

void *pool_alloc(void)
{
    void *block = qurt_mem_pool_alloc(&my_pool);
    if (!block) {
        printf("Pool exhausted!\n");
    }
    return block;
}

void pool_free(void *block)
{
    qurt_mem_pool_free(&my_pool, block);
}

This code creates a pool of 32 blocks, each 256 bytes. The pool memory is statically allocated to avoid any dependency on malloc at runtime.

qurt_mem_pool_alloc() returns a block in constant time, and qurt_mem_pool_free() returns it in constant time. If the pool is exhausted, the allocation returns NULL rather than blocking or searching for memory elsewhere.

This determinism makes memory pools the right choice for audio processing loops, sensor data handlers, and any other code that runs on a strict deadline.
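A fixed-block pool gets its O(1) behavior from a free list threaded through the unused blocks themselves: each free block stores a pointer to the next free block, so allocating and freeing are a couple of pointer moves with no searching. The sketch below is portable C, not the QuRT pool API; the names (block_pool_t, block_pool_alloc, block_pool_free) are hypothetical. It's also not thread-safe, so a pool shared between threads needs a lock or one pool per thread.

```c
#include <stddef.h>

#define BLOCK_SIZE  256
#define NUM_BLOCKS  32

/* A free block's first bytes hold the link to the next free block. */
typedef struct free_block {
    struct free_block *next;
} free_block_t;

typedef struct {
    free_block_t *head;   /* first free block, or NULL when exhausted */
} block_pool_t;

/* Statically allocated backing store: no malloc at runtime. */
static _Alignas(max_align_t) unsigned char pool_storage[BLOCK_SIZE * NUM_BLOCKS];

void block_pool_init(block_pool_t *p, unsigned char *mem,
                     size_t block_size, size_t count)
{
    p->head = NULL;
    /* Push blocks back to front so the first alloc returns block 0. */
    for (size_t i = count; i-- > 0; ) {
        free_block_t *b = (free_block_t *)(mem + i * block_size);
        b->next = p->head;
        p->head = b;
    }
}

/* O(1): pop the head. Returns NULL when empty; never blocks or searches. */
void *block_pool_alloc(block_pool_t *p)
{
    free_block_t *b = p->head;
    if (b)
        p->head = b->next;
    return b;
}

/* O(1): push the block back onto the head. */
void block_pool_free(block_pool_t *p, void *ptr)
{
    free_block_t *b = ptr;
    b->next = p->head;
    p->head = b;
}
```

Because freed blocks go back on the head of the list, the most recently freed block is the next one handed out, which also tends to be the one still warm in cache.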

Timers and Timing

QuRT provides hardware-backed timers for precise timing. This is critical for DSP work: if you're processing audio at 48 kHz in 512-sample buffers, a new buffer is due every 10.67 milliseconds (512 / 48,000 s), with no exceptions.
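The deadline depends on the buffer size as well as the sample rate: the period is simply samples per buffer divided by samples per second. A small helper (the name buffer_period_us is hypothetical) makes the arithmetic explicit.

```c
#include <stdint.h>

/* Time between buffers, in microseconds: samples / rate seconds.
   Integer division rounds down to a whole microsecond. */
uint64_t buffer_period_us(uint32_t samples_per_buffer, uint32_t sample_rate_hz)
{
    return (uint64_t)samples_per_buffer * 1000000u / sample_rate_hz;
}
```

At 48 kHz, 512 samples give 10666 µs (about 10.67 ms), 480 samples give exactly 10000 µs, and 48 samples give 1000 µs, which is the 1 ms period used in the periodic timer example.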

One-Shot Timer

#include <stdio.h>
#include <qurt.h>

qurt_timer_t my_timer;
qurt_signal_t timer_signal;

#define TIMER_EXPIRED_SIGNAL  0x01

void timer_example(void)
{
    qurt_signal_init(&timer_signal);

    qurt_timer_attr_t attr;
    qurt_timer_attr_init(&attr);

    /* Set timer duration: 10 milliseconds */
    qurt_timer_attr_set_duration(&attr,
        qurt_timer_convert_time_to_ticks(10000,  /* microseconds */
                                          QURT_TIME_USEC));

    /* Set the signal to fire when timer expires */
    qurt_timer_attr_set_signal(&attr, &timer_signal);
    qurt_timer_attr_set_signal_mask(&attr, TIMER_EXPIRED_SIGNAL);

    /* One-shot: fires once */
    qurt_timer_attr_set_type(&attr, QURT_TIMER_ONESHOT);

    /* Create and start the timer */
    qurt_timer_create(&my_timer, &attr);

    /* Wait for the timer to expire */
    qurt_signal_wait(&timer_signal,
                      TIMER_EXPIRED_SIGNAL,
                      QURT_SIGNAL_ATTR_WAIT_ANY);

    printf("Timer expired! 10ms have passed.\n");
    qurt_signal_clear(&timer_signal, TIMER_EXPIRED_SIGNAL);

    /* Clean up */
    qurt_timer_delete(my_timer);
    qurt_signal_destroy(&timer_signal);
}

This creates a one-shot timer that fires after 10 milliseconds. The timer is configured with an attributes structure that specifies the duration, the signal object to notify, the signal bitmask to set, and the timer type (QURT_TIMER_ONESHOT). When the timer expires, it sets the specified signal bit, which wakes up the thread blocked in qurt_signal_wait(). After handling the event, the thread clears the signal and cleans up the timer.

Periodic Timer

void periodic_timer_thread(void *arg)
{
    qurt_timer_t periodic_timer;
    qurt_signal_t periodic_signal;
    qurt_timer_attr_t attr;

    qurt_signal_init(&periodic_signal);
    qurt_timer_attr_init(&attr);

    /* Fire every 1 millisecond */
    qurt_timer_attr_set_duration(&attr,
        qurt_timer_convert_time_to_ticks(1000, QURT_TIME_USEC));
    qurt_timer_attr_set_signal(&attr, &periodic_signal);
    qurt_timer_attr_set_signal_mask(&attr, 0x01);
    qurt_timer_attr_set_type(&attr, QURT_TIMER_PERIODIC);

    qurt_timer_create(&periodic_timer, &attr);

    int iteration = 0;
    while (iteration < 1000) {
        qurt_signal_wait(&periodic_signal, 0x01,
                          QURT_SIGNAL_ATTR_WAIT_ANY);
        qurt_signal_clear(&periodic_signal, 0x01);

        /* This runs every 1ms */
        process_audio_frame(iteration);
        iteration++;
    }

    qurt_timer_delete(periodic_timer);
    qurt_signal_destroy(&periodic_signal);
    qurt_thread_exit(QURT_EOK);
}

The periodic timer uses QURT_TIMER_PERIODIC instead of QURT_TIMER_ONESHOT. It fires repeatedly at the specified interval. This example runs 1000 iterations at 1 ms intervals, processing one audio frame per tick. The signal must be cleared after each iteration, or the next qurt_signal_wait() will return immediately.
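The practical difference between a periodic timer and a loop that calls qurt_timer_sleep() after its work is drift. A periodic timer's schedule is fixed, while a sleep loop wakes every period plus however long the work took, and that error accumulates. The sketch below shows the fixed-schedule idea on a POSIX host using clock_nanosleep() with absolute deadlines. It's an illustration of the technique, not QuRT code, and run_periodic() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute timestamp by ns nanoseconds, normalizing tv_nsec. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Call work(i) once per period for `iterations` periods (work may be
   NULL). Each deadline is the previous deadline plus period_ns, not
   "now" plus period_ns, so time spent in work() and late wake-ups do
   not accumulate. Returns 0 on success, -1 if a clock call fails. */
int run_periodic(int iterations, long period_ns, void (*work)(int))
{
    struct timespec next;
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        timespec_add_ns(&next, period_ns);
        int rc;
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);   /* restart if interrupted by a signal */
        if (rc != 0)
            return -1;
        if (work)
            work(i);
    }
    return 0;
}
```

If one iteration overruns, the next deadline is already in the past, clock_nanosleep() returns at once, and the loop catches up instead of shifting every later tick.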

Reading the Current Time

void timing_example(void)
{
    unsigned long long start_ticks = qurt_sysclock_get_hw_ticks();

    heavy_computation();

    unsigned long long end_ticks = qurt_sysclock_get_hw_ticks();
    unsigned long long elapsed_ticks = end_ticks - start_ticks;

    unsigned long long elapsed_us =
        qurt_timer_convert_ticks_to_time(elapsed_ticks, QURT_TIME_USEC);

    printf("Computation took %llu microseconds\n", elapsed_us);
}

qurt_sysclock_get_hw_ticks() reads a free-running 64-bit hardware tick counter, which gives you a high-resolution timestamp that is cheap to read. qurt_timer_convert_ticks_to_time() converts raw ticks to human-readable units (microseconds in this case). Use this pattern to profile individual functions and identify performance bottlenecks.
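The conversion itself is just a division by the counter's tick rate. The sketch below assumes a 19.2 MHz tick rate (one tick is about 52 ns), which is an assumption for illustration; confirm the rate for your target. The function names are hypothetical, and the code is useful for sanity-checking profile numbers on the host.

```c
#include <stdint.h>

/* ASSUMPTION: the hardware tick counter runs at 19.2 MHz. */
#define HW_TICKS_PER_SEC 19200000ULL

/* Convert ticks to microseconds. Whole seconds and the remainder are
   handled separately so ticks * 1,000,000 cannot overflow 64 bits. */
uint64_t hw_ticks_to_us(uint64_t ticks)
{
    return (ticks / HW_TICKS_PER_SEC) * 1000000ULL
         + (ticks % HW_TICKS_PER_SEC) * 1000000ULL / HW_TICKS_PER_SEC;
}

/* Elapsed time between two readings. Unsigned subtraction stays
   correct across a counter wrap, as long as the interval is shorter
   than one full period of the 64-bit counter. */
uint64_t elapsed_us(uint64_t start_ticks, uint64_t end_ticks)
{
    return hw_ticks_to_us(end_ticks - start_ticks);
}
```

Under this assumption, 19,200 ticks are 1 ms, and a reading taken just before the counter wraps still produces the right elapsed time.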

Interrupt Handling

On a DSP, interrupts are how hardware signals that it needs attention. QuRT provides a thread-based interrupt model that's more structured than bare-metal ISR handlers.

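#include <stdio.h>
#include <qurt.h>

#define MY_SENSOR_IRQ      42
#define IRQ_SIGNAL         0x01
#define DATA_READY         0x01

static qurt_signal_t irq_signal;

/* Signal the processing thread waits on; defined and initialized
   alongside that thread */
extern qurt_signal_t processing_signal;

void sensor_isr_thread(void *arg)
{
    int irq = MY_SENSOR_IRQ;

    /* The signal object must be initialized before registration */
    qurt_signal_init(&irq_signal);

    /* Register this thread as the handler for IRQ 42 */
    qurt_interrupt_register(irq, &irq_signal, IRQ_SIGNAL);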

    printf("Sensor ISR thread ready, waiting for interrupts...\n");

    while (1) {
        /* Block until the hardware interrupt fires */
        unsigned int sigs = qurt_signal_wait(
            &irq_signal, IRQ_SIGNAL, QURT_SIGNAL_ATTR_WAIT_ANY);

        if (sigs & IRQ_SIGNAL) {
            qurt_signal_clear(&irq_signal, IRQ_SIGNAL);

            /* Read sensor data quickly */
            int sensor_value = read_sensor_register();

            /* Put data in a queue for the processing thread */
            enqueue_sensor_data(sensor_value);

            /* Signal the processing thread */
            qurt_signal_set(&processing_signal, DATA_READY);

            /* Re-enable the interrupt */
            qurt_interrupt_acknowledge(irq);
        }
    }
}

QuRT ISRs are different from bare-metal ISRs. They run in a dedicated thread context, which means you can use mutexes and signals inside them. But the ISR thread should still do minimal work: read the hardware register, enqueue the data, signal a processing thread, and acknowledge the interrupt. All expensive computation should happen in a separate, lower-priority processing thread.

Hardware IRQ
     │
     ▼
ISR Thread (high priority)     Processing Thread (medium priority)
┌──────────────────┐          ┌──────────────────────────┐
│ Read HW register │          │ Wait for DATA_READY      │
│ Enqueue data     │ ──────►  │ Dequeue data             │
│ Signal "ready"   │          │ Run FFT / filter / etc.  │
│ ACK interrupt    │          │ Write results            │
└──────────────────┘          └──────────────────────────┘

This diagram shows the ISR offloading pattern. The ISR thread on the left handles the hardware interrupt with minimal latency: it reads the sensor register, enqueues the raw data, signals the processing thread, and acknowledges the interrupt so it can fire again. The processing thread on the right does the expensive work (FFT, filtering, ML inference) at a lower priority.

This design ensures that the ISR thread is always available to service the next hardware interrupt, even if the processing thread is still working on the previous sample.
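The ISR example leaves enqueue_sensor_data() undefined. One common way to implement that hand-off is a single-producer, single-consumer ring buffer: the ISR thread is the only writer of tail and the processing thread the only writer of head, so neither side takes a lock or blocks. The sketch below uses C11 atomics and hypothetical names (spsc_ring_t, ring_push, ring_pop), and it assumes exactly one producer and one consumer.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_CAPACITY 8   /* must be a power of two */

typedef struct {
    int         buf[RING_CAPACITY];
    atomic_uint head;     /* next slot to read; written only by consumer */
    atomic_uint tail;     /* next slot to write; written only by producer */
} spsc_ring_t;

void ring_init(spsc_ring_t *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer (ISR thread). Never blocks: returns false when full. */
bool ring_push(spsc_ring_t *r, int value)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY)   /* unsigned wrap-around is fine */
        return false;
    r->buf[tail & (RING_CAPACITY - 1)] = value;
    /* Release: the data write above is visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer (processing thread). Returns false when empty. */
bool ring_pop(spsc_ring_t *r, int *out)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->buf[head & (RING_CAPACITY - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

When the ring is full, ring_push() returns false instead of waiting, which keeps the ISR thread's latency bounded; the caller decides whether to drop the sample or count an overrun.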

Pipes and Message Queues

QuRT provides built-in pipe support for safe, structured inter-thread communication. Pipes are fixed-size message queues with blocking send and receive operations.

#include <stdio.h>
#include <qurt.h>

#define PIPE_ELEMENTS   16
#define ELEMENT_SIZE    sizeof(sensor_msg_t)

typedef struct {
    int sensor_id;
    int value;
    unsigned long long timestamp;
} sensor_msg_t;

/* Pipe buffer must be allocated by you */
static char pipe_buffer[PIPE_ELEMENTS * ELEMENT_SIZE]
    __attribute__((aligned(8)));

qurt_pipe_t sensor_pipe;

void pipe_init(void)
{
    qurt_pipe_attr_t attr;
    qurt_pipe_attr_init(&attr);
    qurt_pipe_attr_set_buffer(&attr, pipe_buffer);
    qurt_pipe_attr_set_buffer_partition(&attr, PIPE_ELEMENTS);
    qurt_pipe_attr_set_elements(&attr, PIPE_ELEMENTS);
    qurt_pipe_attr_set_element_size(&attr, ELEMENT_SIZE);

    qurt_pipe_create(&sensor_pipe, &attr);
}

/* Producer: send sensor data into the pipe */
void sensor_reader_thread(void *arg)
{
    while (1) {
        sensor_msg_t msg;
        msg.sensor_id = 1;
        msg.value = read_accelerometer();
        msg.timestamp = qurt_sysclock_get_hw_ticks();

        /* Blocking send: waits if pipe is full */
        qurt_pipe_send(&sensor_pipe, (char *)&msg, ELEMENT_SIZE);
    }
}

/* Consumer: receive sensor data from the pipe */
void data_processor_thread(void *arg)
{
    sensor_msg_t msg;

    while (1) {
        /* Blocking receive: waits if pipe is empty */
        qurt_pipe_receive(&sensor_pipe, (char *)&msg, ELEMENT_SIZE);

        printf("Sensor %d: value=%d at tick=%llu\n",
               msg.sensor_id, msg.value, msg.timestamp);

        process_sensor_reading(&msg);
    }
}

A QuRT pipe is configured with a statically allocated buffer, a number of elements, and an element size. Like stacks, the buffer memory is your responsibility. qurt_pipe_send() copies a message into the pipe and blocks if the pipe is full. qurt_pipe_receive() copies a message out and blocks if the pipe is empty. The pipe handles all internal synchronization, so you don't need a separate mutex.

Pipes are a natural fit for the sensor data pattern shown here: the reader thread samples hardware at a fixed rate and pushes messages into the pipe, while the processor thread pulls messages out and handles them. The pipe provides buffering and backpressure automatically.
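To see what blocking send and receive with backpressure mean mechanically, here is a portable model of the same behavior built from a mutex and two condition variables. It is not QuRT's pipe implementation; the names (msg_queue_t, mq_send, mq_receive, mq_run_demo) are hypothetical, and the depth of 4 slots is deliberately small so the demo producer regularly fills the queue and has to wait.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MQ_SLOTS     4    /* queue depth */
#define MQ_MAX_ELEM  32   /* largest message, in bytes */

/* Fixed-size, copy-in/copy-out message queue. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  not_full;
    pthread_cond_t  not_empty;
    unsigned char   slots[MQ_SLOTS][MQ_MAX_ELEM];
    size_t          elem_size;
    unsigned int    head;    /* index of the oldest message */
    unsigned int    count;   /* messages currently queued */
} msg_queue_t;

int mq_init(msg_queue_t *q, size_t elem_size)
{
    if (elem_size == 0 || elem_size > MQ_MAX_ELEM)
        return -1;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->elem_size = elem_size;
    q->head = 0;
    q->count = 0;
    return 0;
}

void mq_destroy(msg_queue_t *q)
{
    pthread_cond_destroy(&q->not_empty);
    pthread_cond_destroy(&q->not_full);
    pthread_mutex_destroy(&q->lock);
}

/* Blocks while the queue is full: this is the backpressure. */
void mq_send(msg_queue_t *q, const void *msg)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == MQ_SLOTS)
        pthread_cond_wait(&q->not_full, &q->lock);
    memcpy(q->slots[(q->head + q->count) % MQ_SLOTS], msg, q->elem_size);
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks while the queue is empty. */
void mq_receive(msg_queue_t *q, void *msg)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    memcpy(msg, q->slots[q->head], q->elem_size);
    q->head = (q->head + 1) % MQ_SLOTS;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
}

/* Demo producer: pushes 1..100 through the 4-slot queue. */
static void *mq_demo_producer(void *arg)
{
    msg_queue_t *q = arg;
    for (int i = 1; i <= 100; i++)
        mq_send(q, &i);
    return NULL;
}

/* Returns the sum of received values, or -1 if any arrived out of order. */
long mq_run_demo(void)
{
    msg_queue_t q;
    pthread_t producer;
    long sum = 0;
    int in_order = 1;

    if (mq_init(&q, sizeof(int)) != 0)
        return -1;
    pthread_create(&producer, NULL, mq_demo_producer, &q);
    for (int i = 1; i <= 100; i++) {
        int v;
        mq_receive(&q, &v);
        if (v != i)
            in_order = 0;
        sum += v;
    }
    pthread_join(producer, NULL);
    mq_destroy(&q);
    return in_order ? sum : -1;
}
```

All 100 messages pass through only 4 slots, in order, because the producer sleeps whenever it gets 4 messages ahead of the consumer.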

QuRT and FastRPC

In real Qualcomm devices, you rarely use QuRT alone. Your Android or Linux application on the ARM CPU offloads compute-intensive work to the DSP using FastRPC (Fast Remote Procedure Call). The following diagram shows the full pipeline:

┌───────────────────────────────────────────────────────────────┐
│                         ARM CPU Side                          │
│                                                               │
│   your_app.c                                                  │
│   ┌───────────────────────────────────────────────────┐       │
│   │  #include "my_dsp_module.h"  // auto-generated    │       │
│   │                                                   │       │
│   │  // This looks like a normal function call,       │       │
│   │  // but it actually executes on the DSP!          │       │
│   │  result = my_dsp_module_process_audio(            │       │
│   │      input_buffer, output_buffer, num_samples);   │       │
│   └───────────────────┬───────────────────────────────┘       │
│                       │ FastRPC                               │
└───────────────────────┼───────────────────────────────────────┘
            (crosses processor boundary)          
┌───────────────────────┼───────────────────────────────────────┐
│                       ▼                                       │
│                  DSP Side (QuRT)                              │
│   my_dsp_module_skel.c  // auto-generated skeleton            │
│   ┌───────────────────────────────────────────────────┐       │
│   │  int my_dsp_module_process_audio(                 │       │
│   │      const int16_t *input,                        │       │
│   │      int16_t *output,                             │       │
│   │      int num_samples)                             │       │
│   │  {                                                │       │
│   │      // This runs on the Hexagon DSP under QuRT   │       │
│   │      apply_noise_reduction(input, output,         │       │
│   │                             num_samples);         │       │
│   │      return 0;                                    │       │
│   │  }                                                │       │
│   └───────────────────────────────────────────────────┘       │
└───────────────────────────────────────────────────────────────┘

This diagram shows the FastRPC architecture. On the ARM CPU side, your application calls a function that appears to be a normal C function. Under the hood, FastRPC serializes the arguments, sends them across the processor boundary to the Hexagon DSP, executes the function under QuRT, and returns the result. To the programmer, the whole round trip looks like an ordinary function call.

Step 1: Define the Interface (IDL File)

Create a .idl file that describes the functions the ARM can call on the DSP:

/* my_dsp_module.idl */
#include "remote.idl"
#include "AEEStdDef.idl"

interface my_dsp_module {

    /* Simple computation */
    long process_audio(
        in sequence<short> input,
        rout sequence<short> output,
        in long num_samples
    );

    /* Matrix multiply offload */
    long matrix_multiply(
        in sequence mat_a,
        in sequence mat_b,
        rout sequence result,
        in long rows_a,
        in long cols_a,
        in long cols_b
    );
};

The IDL (Interface Definition Language) file defines the cross-processor API. Each function specifies its parameters with direction qualifiers: in for data flowing from ARM to DSP, rout for data flowing from DSP back to ARM. The sequence syntax specifies a variable-length array. For each sequence parameter, the generated C prototype takes a pointer followed by a length, which is why the DSP implementation below receives input_len and output_len. The Hexagon SDK's IDL compiler generates stub code for the ARM side and skeleton code for the DSP side from this definition.

Step 2: Implement the DSP Side

/* my_dsp_module_imp.c - DSP implementation */

#include "my_dsp_module.h"
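#include <stdint.h>
#include <stdlib.h>
#include <qurt.h>

int my_dsp_module_process_audio(
    const int16_t *input, int input_len,
    int16_t *output, int output_len,
    int num_samples)
{
    /* Reject bad pointers and any count larger than either buffer */
    if (!input || !output || num_samples <= 0 ||
        num_samples > input_len || num_samples > output_len) {
        return -1;
    }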

    /* Invalidate cache: ARM wrote this data */
    qurt_mem_cache_clean((void *)input,
                          num_samples * sizeof(int16_t),
                          QURT_MEM_CACHE_INVALIDATE);

    /* Process on the DSP */
    for (int i = 0; i < num_samples; i++) {
        /* Simple noise gate */
        if (abs(input[i]) < 100) {
            output[i] = 0;
        } else {
            output[i] = input[i];
        }
    }

    /* Flush cache: ARM will read this data */
    qurt_mem_cache_clean(output,
                          num_samples * sizeof(int16_t),
                          QURT_MEM_CACHE_FLUSH);

    return 0;
}

The DSP implementation receives the input buffer that the ARM CPU wrote. Before reading it, the code invalidates the cache so the DSP sees the latest data from main memory rather than stale cache entries. After writing the output, the code flushes the cache so the ARM CPU sees the DSP's results. The actual processing (a simple noise gate in this example) runs between the cache operations.
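Because the processing loop itself has no QuRT dependency, you can move it into a plain function and unit-test it on your development machine before it ever runs on the DSP. The sketch below is a hypothetical refactoring: noise_gate() takes the threshold (hard-coded as 100 in the example above) as a parameter.

```c
#include <stdint.h>
#include <stdlib.h>

/* Samples whose magnitude is below threshold become silence;
   everything else passes through unchanged. */
void noise_gate(const int16_t *in, int16_t *out, int n, int threshold)
{
    for (int i = 0; i < n; i++)
        out[i] = (abs(in[i]) < threshold) ? 0 : in[i];
}
```

The DSP entry point then shrinks to argument checks, cache maintenance, and a single call to noise_gate(input, output, num_samples, 100), and the gating logic gets the same tests on both sides.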

Step 3: Implement the ARM Side

/* main_arm.c - ARM/Android application */

#include <stdint.h>
#include <stdio.h>
#include "rpcmem.h"
#include "my_dsp_module.h"

int main(void)
{
    int num_samples = 1024;

    /* Use ION memory for zero-copy sharing with DSP */
    rpcmem_init();

    int16_t *input = (int16_t *)rpcmem_alloc(
        RPCMEM_HEAP_ID_SYSTEM,
        RPCMEM_DEFAULT_FLAGS,
        num_samples * sizeof(int16_t));

    int16_t *output = (int16_t *)rpcmem_alloc(
        RPCMEM_HEAP_ID_SYSTEM,
        RPCMEM_DEFAULT_FLAGS,
        num_samples * sizeof(int16_t));

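    if (!input || !output) {
        printf("rpcmem_alloc failed!\n");
        /* Release whichever allocation did succeed */
        if (input) rpcmem_free(input);
        if (output) rpcmem_free(output);
        rpcmem_deinit();
        return -1;
    }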

    /* Fill input with audio data */
    for (int i = 0; i < num_samples; i++) {
        input[i] = (int16_t)(i % 256);
    }

    /* This call goes to the DSP via FastRPC */
    int result = my_dsp_module_process_audio(
        input, num_samples,
        output, num_samples,
        num_samples);

    if (result != 0) {
        printf("DSP processing failed: %d\n", result);
    } else {
        printf("DSP processing succeeded!\n");
        printf("First 10 output samples: ");
        for (int i = 0; i < 10; i++) {
            printf("%d ", output[i]);
        }
        printf("\n");
    }

    rpcmem_free(input);
    rpcmem_free(output);
    rpcmem_deinit();

    return 0;
}

The ARM side uses rpcmem_alloc() to allocate ION memory, which is a shared memory region accessible by both the ARM CPU and the Hexagon DSP without copying. The call to my_dsp_module_process_audio() looks like a normal function call, but FastRPC transparently routes it to the DSP. When the call returns, the output buffer contains the DSP's results.

Building the Complete Project

A FastRPC project requires two SCons builds: one for the ARM CPU side and one for the Hexagon DSP side. Each side has its own .min file (android.min and hexagon.min), and both are processed by the SDK's SConstruct.

cd $HEXAGON_SDK_ROOT

# Build for ARM target (Android) via make wrapper
make V=android_Release tree=my_dsp_module

# Build for Hexagon DSP via make wrapper
make V=hexagon_Release_dynamic_toolv84_v66 tree=my_dsp_module

# Or invoke SCons directly for both variants
python tools/build/scons/scons.py \
    V=android_Release \
    V=hexagon_Release_dynamic_toolv84_v66 \
    my_dsp_module

# Push to device
adb push android_Release/ship/my_dsp_module /data/local/tmp/
adb push hexagon_Release_dynamic_toolv84_v66/ship/libmy_dsp_module_skel.so \
    /data/local/tmp/

# Run it
adb shell "cd /data/local/tmp && ./my_dsp_module"

The build produces two outputs: an ARM executable (compiled from the stub and your main_arm.c) and a Hexagon shared library (the _skel.so file, compiled from your DSP implementation). SCons handles the IDL compilation step automatically: it detects the .idl file, generates the stub and skeleton C source files, and includes them in the appropriate variant build. Both outputs are pushed to the device.

When the ARM executable runs and calls a FastRPC function, the system loads the skeleton library onto the DSP and routes the call through.

Building a Sensor Fusion Pipeline

This section brings together threads, synchronization, timers, and signals into a complete, realistic QuRT application. The pipeline reads from three simulated sensors (accelerometer, gyroscope, magnetometer), fuses the data with a complementary filter 100 times per second, and prints the resulting orientation once per second.
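The compute_orientation() function in the listing that follows is a complementary filter. For a single angle, each update blends the gyro-integrated estimate, which is smooth but drifts, with the accelerometer's estimate, which is noisy but absolute: angle = α(angle + rate·dt) + (1 − α)·accel_angle. With α = 0.98 and dt = 10 ms, as in the listing, the accelerometer correction pulls the estimate in with a time constant of roughly α·dt/(1 − α) ≈ 0.5 s. The single-axis sketch below (the name complementary_step is hypothetical) takes the rate in degrees per second; the listing converts from radians with the 57.2958 factor.

```c
/* One complementary-filter update for a single angle, in degrees.
   alpha close to 1 trusts the integrated gyro rate over short spans;
   the (1 - alpha) share pulls slowly toward the accelerometer angle,
   which cancels the gyro's long-term drift. */
float complementary_step(float angle_deg, float gyro_rate_dps,
                         float accel_angle_deg, float dt_s, float alpha)
{
    return alpha * (angle_deg + gyro_rate_dps * dt_s)
         + (1.0f - alpha) * accel_angle_deg;
}
```

Starting from 0° with no rotation and a steady 10° accelerometer reading, the estimate after n updates is 10·(1 − 0.98ⁿ): about 8.7° after one second (100 updates) and within a thousandth of a degree of 10° after five.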

/*
 * sensor_fusion.c - Multi-sensor fusion pipeline on QuRT
 *
 * Architecture:
 *   [Accel thread] ──► [Fusion thread] ──► [Report thread]
 *   [Gyro thread]  ──►        ▲
 *   [Mag thread]   ──►        │
 *                     [Periodic timer]
 *                     (signals fusion every 10ms)
 */

#include <math.h>
#include <stdio.h>
#include <string.h>
#include <qurt.h>

/* Configuration */
#define STACK_SIZE          8192
#define FUSION_PERIOD_US    10000   /* 10ms = 100Hz fusion rate */
#define QUEUE_DEPTH         32

/* Data types */
typedef struct {
    float x, y, z;
    unsigned long long timestamp;
} vec3_sample_t;

typedef struct {
    vec3_sample_t accel;
    vec3_sample_t gyro;
    vec3_sample_t mag;
    float roll, pitch, yaw;
} fused_state_t;

/* Thread stacks */
static char accel_stack[STACK_SIZE]  __attribute__((aligned(8)));
static char gyro_stack[STACK_SIZE]   __attribute__((aligned(8)));
static char mag_stack[STACK_SIZE]    __attribute__((aligned(8)));
static char fusion_stack[STACK_SIZE] __attribute__((aligned(8)));
static char report_stack[STACK_SIZE] __attribute__((aligned(8)));

/* Shared state */
static vec3_sample_t latest_accel;
static vec3_sample_t latest_gyro;
static vec3_sample_t latest_mag;
static fused_state_t latest_fused;

static qurt_mutex_t sensor_mutex;
static qurt_mutex_t fused_mutex;
static qurt_signal_t fusion_signal;
static qurt_signal_t report_signal;

#define SIG_FUSION_TICK    0x01
#define SIG_NEW_FUSED_DATA 0x01
#define SIG_SHUTDOWN       0x80

static volatile int running = 1;

/* Simulated sensor reads */
static void read_accelerometer(vec3_sample_t *sample)
{
    sample->x = 0.01f;
    sample->y = 0.02f;
    sample->z = 9.81f;
    sample->timestamp = qurt_sysclock_get_hw_ticks();
}

static void read_gyroscope(vec3_sample_t *sample)
{
    sample->x = 0.001f;
    sample->y = -0.002f;
    sample->z = 0.0005f;
    sample->timestamp = qurt_sysclock_get_hw_ticks();
}

static void read_magnetometer(vec3_sample_t *sample)
{
    sample->x = 25.0f;
    sample->y = -5.0f;
    sample->z = 40.0f;
    sample->timestamp = qurt_sysclock_get_hw_ticks();
}

/* Accelerometer thread */
void accel_thread(void *arg)
{
    printf("[Accel] Thread started\n");

    while (running) {
        vec3_sample_t sample;
        read_accelerometer(&sample);

        qurt_mutex_lock(&sensor_mutex);
        latest_accel = sample;
        qurt_mutex_unlock(&sensor_mutex);

        /* ~400Hz sample rate */
        qurt_timer_sleep(2500);
    }

    printf("[Accel] Thread exiting\n");
    qurt_thread_exit(QURT_EOK);
}

/* Gyroscope thread */
void gyro_thread(void *arg)
{
    printf("[Gyro] Thread started\n");

    while (running) {
        vec3_sample_t sample;
        read_gyroscope(&sample);

        qurt_mutex_lock(&sensor_mutex);
        latest_gyro = sample;
        qurt_mutex_unlock(&sensor_mutex);

        /* 1kHz sample rate */
        qurt_timer_sleep(1000);
    }

    printf("[Gyro] Thread exiting\n");
    qurt_thread_exit(QURT_EOK);
}

/* Magnetometer thread */
void mag_thread(void *arg)
{
    printf("[Mag] Thread started\n");

    while (running) {
        vec3_sample_t sample;
        read_magnetometer(&sample);

        qurt_mutex_lock(&sensor_mutex);
        latest_mag = sample;
        qurt_mutex_unlock(&sensor_mutex);

        /* 100Hz sample rate */
        qurt_timer_sleep(10000);
    }

    printf("[Mag] Thread exiting\n");
    qurt_thread_exit(QURT_EOK);
}

/* Simplified complementary filter */
static void compute_orientation(
    const vec3_sample_t *accel,
    const vec3_sample_t *gyro,
    const vec3_sample_t *mag,
    fused_state_t *state)
{
    float dt = 0.01f;

    float accel_roll = atan2f(accel->y, accel->z) * 57.2958f;
    float accel_pitch = atan2f(-accel->x,
        sqrtf(accel->y * accel->y + accel->z * accel->z)) * 57.2958f;

    /* Trust gyro short-term, accel long-term */
    state->roll = 0.98f * (state->roll + gyro->x * dt * 57.2958f)
                + 0.02f * accel_roll;
    state->pitch = 0.98f * (state->pitch + gyro->y * dt * 57.2958f)
                 + 0.02f * accel_pitch;

    state->yaw = atan2f(mag->y, mag->x) * 57.2958f;

    state->accel = *accel;
    state->gyro = *gyro;
    state->mag = *mag;
}

/* Fusion thread (runs every 10ms) */
void fusion_thread(void *arg)
{
    qurt_timer_t fusion_timer;
    qurt_timer_attr_t timer_attr;

    printf("[Fusion] Thread started\n");

    qurt_timer_attr_init(&timer_attr);
    qurt_timer_attr_set_duration(&timer_attr,
        qurt_timer_convert_time_to_ticks(FUSION_PERIOD_US,
                                          QURT_TIME_USEC));
    qurt_timer_attr_set_signal(&timer_attr, &fusion_signal);
    qurt_timer_attr_set_signal_mask(&timer_attr, SIG_FUSION_TICK);
    qurt_timer_attr_set_type(&timer_attr, QURT_TIMER_PERIODIC);

    qurt_timer_create(&fusion_timer, &timer_attr);

    while (running) {
        unsigned int sigs = qurt_signal_wait(
            &fusion_signal,
            SIG_FUSION_TICK | SIG_SHUTDOWN,
            QURT_SIGNAL_ATTR_WAIT_ANY);

        if (sigs & SIG_SHUTDOWN) break;

        qurt_signal_clear(&fusion_signal, SIG_FUSION_TICK);

        /* Snapshot sensor data under lock */
        vec3_sample_t a, g, m;
        qurt_mutex_lock(&sensor_mutex);
        a = latest_accel;
        g = latest_gyro;
        m = latest_mag;
        qurt_mutex_unlock(&sensor_mutex);

        /* Load the previous fused state: the filter integrates from it */
        fused_state_t state;
        qurt_mutex_lock(&fused_mutex);
        state = latest_fused;
        qurt_mutex_unlock(&fused_mutex);

        /* Run the fusion algorithm on local copies (no lock held) */
        compute_orientation(&a, &g, &m, &state);

        /* Publish fused result */
        qurt_mutex_lock(&fused_mutex);
        latest_fused = state;
        qurt_mutex_unlock(&fused_mutex);

        /* Notify reporter */
        qurt_signal_set(&report_signal, SIG_NEW_FUSED_DATA);
    }

    qurt_timer_delete(fusion_timer);
    printf("[Fusion] Thread exiting\n");
    qurt_thread_exit(QURT_EOK);
}

/* Reporting thread */
void report_thread(void *arg)
{
    int report_count = 0;

    printf("[Report] Thread started\n");

    while (running) {
        unsigned int sigs = qurt_signal_wait(
            &report_signal,
            SIG_NEW_FUSED_DATA | SIG_SHUTDOWN,
            QURT_SIGNAL_ATTR_WAIT_ANY);

        if (sigs & SIG_SHUTDOWN) break;

        qurt_signal_clear(&report_signal, SIG_NEW_FUSED_DATA);

        fused_state_t state;
        qurt_mutex_lock(&fused_mutex);
        state = latest_fused;
        qurt_mutex_unlock(&fused_mutex);

        /* Report every 100th update (once per second at 100Hz) */
        if (++report_count % 100 == 0) {
            printf("[Report] Orientation - Roll: %.2f  Pitch: %.2f  "
                   "Yaw: %.2f  (update #%d)\n",
                   state.roll, state.pitch, state.yaw, report_count);
        }
    }

    printf("[Report] Thread exiting\n");
    qurt_thread_exit(QURT_EOK);
}

/* Main */
int main(void)
{
    qurt_thread_t threads[5];
    qurt_thread_attr_t attr;
    int status;

    printf("=== Sensor Fusion Pipeline Starting ===\n");

    /* Initialize synchronization primitives */
    qurt_mutex_init(&sensor_mutex);
    qurt_mutex_init(&fused_mutex);
    qurt_signal_init(&fusion_signal);
    qurt_signal_init(&report_signal);
    memset(&latest_fused, 0, sizeof(latest_fused));

    struct {
        const char *name;
        char *stack;
        int priority;
        void (*func)(void *);
    } thread_configs[] = {
        {"accel_reader", accel_stack,  60, accel_thread},
        {"gyro_reader",  gyro_stack,   60, gyro_thread},
        {"mag_reader",   mag_stack,    70, mag_thread},
        {"fusion",       fusion_stack, 80, fusion_thread},
        {"reporter",     report_stack, 120, report_thread},
    };

    /* Create all threads */
    for (int i = 0; i < 5; i++) {
        qurt_thread_attr_init(&attr);
        qurt_thread_attr_set_name(&attr, thread_configs[i].name);
        qurt_thread_attr_set_stack_addr(&attr, thread_configs[i].stack);
        qurt_thread_attr_set_stack_size(&attr, STACK_SIZE);
        qurt_thread_attr_set_priority(&attr, thread_configs[i].priority);

        int result = qurt_thread_create(&threads[i], &attr,
                                         thread_configs[i].func, NULL);
        if (result != QURT_EOK) {
            printf("Failed to create thread '%s': %d\n",
                   thread_configs[i].name, result);
            return -1;
        }
        printf("Created thread '%s' (priority %d)\n",
               thread_configs[i].name, thread_configs[i].priority);
    }

    /* Let it run for 10 seconds */
    printf("Pipeline running for 10 seconds...\n");
    qurt_timer_sleep(10000000);

    /* Shutdown */
    printf("Shutting down...\n");
    running = 0;
    qurt_signal_set(&fusion_signal, SIG_SHUTDOWN);
    qurt_signal_set(&report_signal, SIG_SHUTDOWN);

    /* Wait for all threads to finish */
    for (int i = 0; i < 5; i++) {
        qurt_thread_join(threads[i], &status);
    }

    /* Clean up */
    qurt_mutex_destroy(&sensor_mutex);
    qurt_mutex_destroy(&fused_mutex);
    qurt_signal_destroy(&fusion_signal);
    qurt_signal_destroy(&report_signal);

    printf("=== Sensor Fusion Pipeline Complete ===\n");
    return 0;
}

This pipeline demonstrates several QuRT patterns working together.

Three sensor reader threads run at the highest priorities in the system (60 for accel and gyro, 70 for the slower magnetometer; in QuRT, a lower number means a higher priority). They continuously write the latest samples into shared state under a mutex.

A fusion thread, triggered by a periodic timer every 10 ms, snapshots all three sensor readings, runs a complementary filter to compute roll, pitch, and yaw, and publishes the fused result.

A reporting thread at the lowest priority (120) receives a signal each time new fused data is available and logs orientation once per second.
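The complementary filter inside compute_orientation() blends two estimates on every tick. Here is a minimal single-axis sketch of that idea in portable C. The function name complementary_step, the 0.98 blend factor, and the 10 ms step (matching the 100 Hz fusion rate) are assumptions for illustration, not values taken from the pipeline above:

```c
/* Hypothetical tuning values for illustration only. */
#define CF_ALPHA 0.98f   /* weight given to the gyro-integrated estimate */
#define CF_DT    0.01f   /* fusion period in seconds (10 ms, i.e. 100 Hz) */

/* One complementary-filter step for a single axis (for example, roll).
 *   prev_deg  : previous fused angle estimate, in degrees
 *   rate_dps  : gyro angular rate about this axis, in degrees per second
 *   accel_deg : angle implied by the accelerometer's gravity vector,
 *               e.g. atan2f(ay, az) * 57.2958f for roll */
float complementary_step(float prev_deg, float rate_dps, float accel_deg)
{
    /* Short-term estimate: integrate the gyro rate over one period */
    float gyro_deg = prev_deg + rate_dps * CF_DT;

    /* Blend: mostly gyro, slowly pulled toward the accelerometer angle */
    return CF_ALPHA * gyro_deg + (1.0f - CF_ALPHA) * accel_deg;
}
```

The gyro term reacts immediately but drifts over time. The accelerometer term is noisy but does not drift. A larger alpha trusts the gyro for longer before the accelerometer pulls the estimate back.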

Priority Assignment

Priority 60:  Accel and gyro readers (highest priority, never miss hardware data)
Priority 70:  Magnetometer reader (slower sensor, still above fusion)
Priority 80:  Fusion engine (runs every 10ms, must finish quickly)
Priority 120: Reporter (lowest priority, only logging)

The priority assignments follow a strict rule: threads closer to hardware get higher priority. If the fusion thread takes too long, the reporter waits. That's acceptable because a delayed log message has no real-time consequence. A delayed sensor read is different: the fusion algorithm then operates on stale data.

In a real application controlling a drone or robot, stale IMU data means incorrect orientation estimates, which can lead to physical failures.

Debugging QuRT Applications

QuRT debugging is more limited than Linux debugging. There's no gdb with a TUI, and error messages from crashes are often unhelpful. The following techniques form a practical debugging toolkit.

Printf Debugging

#include <stdio.h>

void debug_example(int some_var)
{
    /* __func__ and __LINE__ tag each message with where it came from */
    printf("[%s:%d] value = %d\n", __func__, __LINE__, some_var);
}

QuRT supports printf through a semi-hosting mechanism. On the simulator, output goes to stdout. On hardware, it goes to a DIAG buffer (similar to Android's logcat). This is the most common debugging technique in QuRT development.

QuRT Error Codes

#include <stdio.h>
#include "qurt.h"

/* Print a readable message for a QuRT return code */
void print_qurt_error(int result)
{
    switch (result) {
        case QURT_EOK:
            break;
        case QURT_EINVALID:
            printf("Invalid argument\n");
            break;
        case QURT_EFAILED:
            printf("General failure\n");
            break;
        case QURT_EMEM:
            printf("Out of memory\n");
            break;
        case QURT_ENOTALLOWED:
            printf("Operation not allowed (check permissions)\n");
            break;
        case QURT_ETIMEOUT:
            printf("Operation timed out\n");
            break;
        default:
            printf("Unknown error: %d\n", result);
            break;
    }
}

Always check return values from QuRT API calls. These are the error codes you'll encounter most frequently.

QURT_EINVALID usually means a bad parameter (unaligned stack, null pointer, out-of-range priority). QURT_EMEM means the kernel ran out of memory for internal structures. QURT_ENOTALLOWED often indicates a permissions issue on hardware.
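Checking every return value by hand gets repetitive. A common C pattern is a macro that evaluates the call, logs the failing expression with its location, and returns the error code to the caller. The CHECK_OK macro, CHECK_SUCCESS, and the fake_* stand-in functions below are hypothetical. The sketch assumes success is reported as a single integer value; set CHECK_SUCCESS to QURT_EOK when building against the QuRT headers:

```c
#include <stdio.h>

/* Assumed success value; replace with QURT_EOK in real QuRT code. */
#define CHECK_SUCCESS 0

/* Evaluate an int-returning call once; on failure, log the expression
 * and its location, then return the status from the enclosing function. */
#define CHECK_OK(call)                                          \
    do {                                                        \
        int check_status_ = (call);                             \
        if (check_status_ != CHECK_SUCCESS) {                   \
            printf("[%s:%d] %s failed: %d\n",                   \
                   __func__, __LINE__, #call, check_status_);   \
            return check_status_;                               \
        }                                                       \
    } while (0)

/* Stand-ins for real API calls, for illustration only */
static int fake_ok(void)   { return CHECK_SUCCESS; }
static int fake_fail(void) { return 4; }  /* arbitrary nonzero error code */

/* Example init sequence: stops at the first failing call */
static int init_sequence(int fail)
{
    CHECK_OK(fake_ok());
    CHECK_OK(fail ? fake_fail() : fake_ok());
    return CHECK_SUCCESS;
}
```

The do/while(0) wrapper makes the macro behave like a single statement, so it is safe inside an unbraced if/else.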

Thread State Inspection

void dump_thread_info(void)
{
    qurt_thread_t tid = qurt_thread_get_id();
    char name[QURT_THREAD_ATTR_NAME_MAXLEN];

    qurt_thread_get_name(name, sizeof(name));

    printf("Thread: %s (ID: %u)\n", name, (unsigned int)tid);
}

This function prints the current thread's name and ID, which is useful when you have multiple threads writing to the same log output and need to distinguish which thread produced each message.

Stack Overflow Detection

#define STACK_CANARY 0xDEADBEEF

static char my_stack[STACK_SIZE] __attribute__((aligned(8)));

void init_stack_canary(void)
{
    /* Write canary at the bottom of the stack */
    ((unsigned int *)my_stack)[0] = STACK_CANARY;
    ((unsigned int *)my_stack)[1] = STACK_CANARY;
}

void check_stack_canary(void)
{
    if (((unsigned int *)my_stack)[0] != STACK_CANARY ||
        ((unsigned int *)my_stack)[1] != STACK_CANARY) {
        printf("STACK OVERFLOW DETECTED!\n");
    }
}

QuRT doesn't detect stack overflows. This canary pattern writes a known value into the lowest words of the stack buffer before the thread starts. Because the stack grows downward from the top of the buffer, a thread that overruns its stack overwrites the canary. Periodically checking the canary (or checking it on thread exit) catches overflows that would otherwise show up as mysterious, unrelated crashes.
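A canary only tells you whether the stack overflowed. A related technique, often called stack painting, also tells you how close a thread came: fill the entire buffer with a pattern before the thread starts, then count how much of the pattern survives at the low end. A minimal sketch in portable C; the 0xA5 fill byte and the function names are hypothetical, and it assumes a downward-growing stack as described above:

```c
#include <stddef.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5  /* hypothetical fill pattern */

/* Paint the whole stack buffer before passing it to qurt_thread_create(). */
void stack_fill(unsigned char *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count untouched bytes at the low end of the buffer. The stack grows
 * downward from stack + size, so unused space stays at low addresses.
 * A result of 0 means the thread reached the very bottom (likely overflow). */
size_t stack_unused_bytes(const unsigned char *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE)
        n++;
    return n;
}
```

STACK_SIZE minus the unused count gives a high-water mark you can log during testing to size each stack with a safety margin. A stack value that happens to equal the fill byte can make usage look a few bytes smaller than it really was.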

Using the Hexagon Simulator

# Run with cycle-accurate timing and write PMU statistics
hexagon-sim --timing --pmu_statsfile stats.txt \
    --cosim_file osam.cfg \
    -- bootimg.pbn -- my_app.so

# The stats file gives you:
# - Total cycles
# - Cache hit/miss rates
# - Stall cycles
# - Instructions per cycle (IPC)

The --timing flag enables cycle-accurate simulation, and --pmu_statsfile writes the performance counter data listed in the comments above to a file. Together, these numbers show whether your bottleneck is compute-bound, memory-bound, or stall-bound.

Common Pitfalls

Pitfall 1: Forgetting to Exit Threads

/* BAD: thread function returns without exit */
void bad_thread(void *arg) {
    do_work();
    return;  /* CRASH or undefined behavior */
}

/* GOOD */
void good_thread(void *arg) {
    do_work();
    qurt_thread_exit(QURT_EOK);
}

A QuRT thread that returns from its entry function without calling qurt_thread_exit() depends on behavior the API doesn't guarantee. The kernel sets the link register to qurt_thread_exit during thread creation as a safety net, but that is an implementation detail you shouldn't rely on. Always call qurt_thread_exit() explicitly.

Pitfall 2: Stack Allocated in Wrong Scope

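/* BAD: stack is on the calling thread's stack */
void create_thread_bad(void (*func)(void *)) {
    char stack[4096];
    qurt_thread_attr_t attr;
    qurt_thread_t tid;
    qurt_thread_attr_init(&attr);
    qurt_thread_attr_set_stack_addr(&attr, stack);
    qurt_thread_attr_set_stack_size(&attr, sizeof(stack));
    qurt_thread_create(&tid, &attr, func, NULL);
}   /* stack disappears here, new thread crashes */

/* GOOD: use static or heap allocation */
static char stack[4096] __attribute__((aligned(8)));
void create_thread_good(void (*func)(void *)) {
    qurt_thread_attr_t attr;
    qurt_thread_t tid;
    qurt_thread_attr_init(&attr);
    qurt_thread_attr_set_stack_addr(&attr, stack);
    qurt_thread_attr_set_stack_size(&attr, sizeof(stack));
    qurt_thread_create(&tid, &attr, func, NULL);
}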

The stack memory must outlive the thread that uses it. If you allocate the stack as a local variable in a function, it's freed when that function returns, but the thread may still be running. Use static allocation (as shown) or heap allocation with careful lifetime management.
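For the heap option, one way to make the lifetime explicit is to bundle the stack pointer with its size and free it only after the thread has been joined. A sketch under assumptions: the thread_stack_t type and helper names are hypothetical, and it assumes your C library provides C11 aligned_alloc (check this for your Hexagon toolchain):

```c
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical owner for a heap-allocated thread stack. The memory must
 * stay allocated until the thread using it has been joined. */
typedef struct {
    void  *base;
    size_t size;
} thread_stack_t;

/* Allocate an 8-byte-aligned stack. The size is rounded up to a multiple
 * of 8 because C11 aligned_alloc expects size to be a multiple of the
 * alignment. Returns 0 on success, -1 on allocation failure. */
int thread_stack_alloc(thread_stack_t *s, size_t size)
{
    size_t rounded = (size + 7u) & ~(size_t)7u;
    s->base = aligned_alloc(8, rounded);
    s->size = s->base ? rounded : 0;
    return s->base ? 0 : -1;
}

/* Call only after qurt_thread_join() has returned for the owning thread. */
void thread_stack_free(thread_stack_t *s)
{
    free(s->base);
    s->base = NULL;
    s->size = 0;
}
```

The intended order is: thread_stack_alloc(), pass s.base and s.size to qurt_thread_attr_set_stack_addr() and qurt_thread_attr_set_stack_size(), call qurt_thread_create(), later qurt_thread_join(), and only then thread_stack_free().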

Pitfall 3: Priority Inversion Without Awareness

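/* BAD: manual spinlock, no priority inheritance */
volatile int lock = 0;
while (__sync_lock_test_and_set(&lock, 1)) { /* spin */ }
/* ... critical section ... */
__sync_lock_release(&lock);

/* GOOD: QuRT mutex with priority inheritance */
qurt_mutex_lock(&my_mutex);
/* ... critical section ... */
qurt_mutex_unlock(&my_mutex);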

If a high-priority thread spins on a manual spinlock held by a low-priority thread, and a medium-priority thread preempts the lock holder, the high-priority thread is effectively blocked by the medium-priority thread.

QuRT mutexes solve this with automatic priority inheritance: the lock holder is temporarily boosted to the priority of the highest-priority waiter. Manual spinlocks don't get this treatment.

Pitfall 4: Unaligned Memory

/* BAD */
char stack[4096];

/* GOOD */
char stack[4096] __attribute__((aligned(8)));

/* For DMA buffers, you often need 256-byte alignment */
char dma_buffer[1024] __attribute__((aligned(256)));

Thread stacks must be 8-byte aligned. DMA buffers typically require 256-byte alignment. Unaligned memory causes hard faults on the Hexagon architecture that produce minimal diagnostic output.
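Because misalignment fails so quietly on Hexagon, it can be worth checking alignment explicitly before handing a buffer to the kernel or a DMA engine. A small sketch of helper functions; the names and DMA_ALIGN are hypothetical, and it uses the GCC/Clang aligned attribute from the snippets above plus C11 _Static_assert:

```c
#include <stdint.h>
#include <stddef.h>

#define DMA_ALIGN 256  /* hypothetical DMA alignment requirement */

/* True if ptr is a multiple of align (align must be a power of two). */
static inline int is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t)ptr & (uintptr_t)(align - 1)) == 0;
}

/* Round n up to the next multiple of align (align must be a power of two). */
static inline size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

static char dma_buffer[1024] __attribute__((aligned(DMA_ALIGN)));

/* Compile-time check, for example if your DMA engine also requires
 * transfers to be whole multiples of the alignment block. */
_Static_assert(sizeof(dma_buffer) % DMA_ALIGN == 0,
               "DMA buffer size must be a multiple of DMA_ALIGN");
```

A runtime check such as is_aligned(stack, 8) right before qurt_thread_attr_set_stack_addr() turns a silent hard fault into a clear error message.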

Pitfall 5: Blocking in ISR Context

/* BAD: mutex_lock may block indefinitely */
void isr_handler_bad(void *arg) {
    qurt_mutex_lock(&some_mutex);
    qurt_mutex_unlock(&some_mutex);
}

/* GOOD: non-blocking try_lock with fallback */
void isr_handler_good(void *arg) {
    if (qurt_mutex_try_lock(&some_mutex) == QURT_EOK) {
        /* Quick update */
        qurt_mutex_unlock(&some_mutex);
    } else {
        /* Defer to processing thread */
        qurt_signal_set(&deferred_signal, DEFERRED_WORK);
    }
}

Although QuRT ISR threads can technically call blocking APIs, doing so in a high-priority interrupt handler freezes interrupt processing until the blocking condition is resolved. Use qurt_mutex_try_lock() for non-blocking attempts, and defer work to a lower-priority thread using signals if the lock is unavailable.
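The deferral path works because the ISR does only a constant-time, non-blocking step. One portable way to picture that step, independent of QuRT: the ISR ORs a bit into an atomic mask, and the worker atomically swaps the mask to zero to collect everything pending. This sketch uses C11 <stdatomic.h>, not a QuRT API, and assumes your toolchain supports C11 atomics. The names are hypothetical, and in a QuRT program you would still call qurt_signal_set() to wake the worker:

```c
#include <stdatomic.h>

/* Hypothetical pending-work mask shared by an ISR and a worker thread. */
static atomic_uint pending_work = 0;

#define WORK_SENSOR_READY 0x1u
#define WORK_BUFFER_FULL  0x2u

/* ISR side: record work without ever blocking. */
void defer_work(unsigned int bits)
{
    atomic_fetch_or(&pending_work, bits);
}

/* Worker side: read and clear all pending bits in one atomic step,
 * so no request raised between the read and the clear is lost. */
unsigned int take_pending_work(void)
{
    return atomic_exchange(&pending_work, 0u);
}
```

Repeated requests of the same kind collapse into one bit, which matches how signal bits behave: the worker learns that something is pending, not how many times it was raised.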

Performance Optimization

Using HVX (Hexagon Vector Extensions)

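#include <hexagon_types.h>
#include <hexagon_protos.h>
#include <stdint.h>

/* Process 128 bytes (64 samples) at once with HVX.
 * Assumes audio is 128-byte aligned and num_samples is a multiple of 64;
 * any leftover samples are not processed by this loop. */
void vectorized_gain(int16_t *audio, int num_samples, int16_t gain)
{
    HVX_Vector *vptr = (HVX_Vector *)audio;
    HVX_Vector vgain = Q6_Vh_vsplat_R(gain);  /* broadcast gain to all lanes */
    int num_vectors = num_samples * sizeof(int16_t) / sizeof(HVX_Vector);

    for (int i = 0; i < num_vectors; i++) {
        /* Saturating half-word multiply, one full vector per iteration */
        vptr[i] = Q6_Vh_vmpy_VhVh_sat(vptr[i], vgain);
    }
}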

HVX provides 128-byte SIMD operations on the Hexagon DSP. The Q6_Vh_vsplat_R intrinsic broadcasts a scalar value across all lanes of a vector register. Q6_Vh_vmpy_VhVh_sat performs a saturating multiply of two half-word vectors. A single HVX instruction processes 64 16-bit samples, which can yield an order-of-magnitude speedup over scalar code for audio and signal processing workloads.

Locking L2 Cache for Hot Data

void lock_cache_example(void)
{
    extern float fft_twiddle_factors[];
    size_t twiddle_size = 1024 * sizeof(float);

    /* Pin data in L2 to prevent eviction */
    qurt_mem_l2cache_lock((unsigned int)fft_twiddle_factors,
                           twiddle_size);

    /* When done: */
    qurt_mem_l2cache_unlock((unsigned int)fft_twiddle_factors,
                             twiddle_size);
}

qurt_mem_l2cache_lock() pins a memory region in the L2 cache, preventing it from being evicted by other cache traffic. This is useful for lookup tables and constant data that are accessed frequently in hot loops (such as FFT twiddle factors).

Locking too much data in L2 reduces the cache available for other threads, so use this technique selectively.

Avoiding Dynamic Memory in Hot Paths

/* BAD: malloc in the audio processing loop */
void process_audio_bad(void) {
    while (1) {
        float *temp = malloc(1024 * sizeof(float));
        process(temp);
        free(temp);
    }
}

/* GOOD: pre-allocate everything */
static float temp_buffer[1024];
void process_audio_good(void) {
    while (1) {
        process(temp_buffer);
    }
}

malloc and free have non-deterministic execution time because they may traverse free lists, split or coalesce blocks, and in the worst case, request additional memory from the kernel.

In a real-time audio processing loop running at 48 kHz, a single slow allocation can cause an audible glitch. Pre-allocate all buffers during initialization and reuse them.
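When a loop needs a varying number of buffers, a fixed-size block pool keeps the pre-allocation discipline while still allowing get and release. A minimal single-threaded sketch; the pool dimensions and function names are hypothetical, and sharing the pool between threads would need a mutex taken outside any ISR:

```c
#include <stddef.h>

/* Hypothetical sizes: 8 blocks of 1024 floats, reserved at startup. */
#define POOL_BLOCKS       8
#define POOL_BLOCK_FLOATS 1024

static float  pool_storage[POOL_BLOCKS][POOL_BLOCK_FLOATS];
static float *pool_free_list[POOL_BLOCKS];
static int    pool_free_count;

/* Call once during initialization: every block starts out free. */
void pool_init(void)
{
    for (int i = 0; i < POOL_BLOCKS; i++)
        pool_free_list[i] = pool_storage[i];
    pool_free_count = POOL_BLOCKS;
}

/* Constant time: pop a free block, or NULL if the pool is exhausted. */
float *pool_get(void)
{
    return pool_free_count > 0 ? pool_free_list[--pool_free_count] : NULL;
}

/* Constant time: push a block back (ignores NULL and overfilling). */
void pool_put(float *block)
{
    if (block != NULL && pool_free_count < POOL_BLOCKS)
        pool_free_list[pool_free_count++] = block;
}
```

Unlike malloc, both operations take the same few instructions every time. The worst case is known in advance, and running out of blocks is an explicit NULL rather than a slow search.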

API Quick Reference

┌─────────────────────────────────────────────────────────────────┐
│                    QuRT API Quick Reference                     │
├─────────────────┬───────────────────────────────────────────────┤
│ THREADS         │                                               │
│  create         │ qurt_thread_create(&id, &attr, func, arg)     │
│  exit           │ qurt_thread_exit(status)                      │
│  join           │ qurt_thread_join(id, &status)                 │
│  get id         │ qurt_thread_get_id()                          │
│  sleep          │ qurt_timer_sleep(usec)                        │
├─────────────────┼───────────────────────────────────────────────┤
│ MUTEX           │                                               │
│  init           │ qurt_mutex_init(&mutex)                       │
│  lock           │ qurt_mutex_lock(&mutex)                       │
│  try lock       │ qurt_mutex_try_lock(&mutex)                   │
│  unlock         │ qurt_mutex_unlock(&mutex)                     │
│  destroy        │ qurt_mutex_destroy(&mutex)                    │
├─────────────────┼───────────────────────────────────────────────┤
│ SIGNALS         │                                               │
│  init           │ qurt_signal_init(&signal)                     │
│  wait           │ qurt_signal_wait(&sig, mask, attr)            │
│  set            │ qurt_signal_set(&signal, mask)                │
│  clear          │ qurt_signal_clear(&signal, mask)              │
│  destroy        │ qurt_signal_destroy(&signal)                  │
├─────────────────┼───────────────────────────────────────────────┤
│ TIMERS          │                                               │
│  create         │ qurt_timer_create(&timer, &attr)              │
│  delete         │ qurt_timer_delete(timer)                      │
│  sleep          │ qurt_timer_sleep(usec)                        │
│  ticks          │ qurt_sysclock_get_hw_ticks()                  │
├─────────────────┼───────────────────────────────────────────────┤
│ MEMORY          │                                               │
│  cache flush    │ qurt_mem_cache_clean(addr, sz, FLUSH)         │
│  cache inval    │ qurt_mem_cache_clean(addr, sz, INVALIDATE)    │
│  l2 lock        │ qurt_mem_l2cache_lock(addr, size)             │
│  l2 unlock      │ qurt_mem_l2cache_unlock(addr, size)           │
├─────────────────┼───────────────────────────────────────────────┤
│ SEMAPHORE       │                                               │
│  init           │ qurt_sem_init_val(&sem, count)                │
│  down (wait)    │ qurt_sem_down(&sem)                           │
│  up (post)      │ qurt_sem_up(&sem)                             │
│  destroy        │ qurt_sem_destroy(&sem)                        │
├─────────────────┼───────────────────────────────────────────────┤
│ BARRIER         │                                               │
│  init           │ qurt_barrier_init(&barrier, count)            │
│  wait           │ qurt_barrier_wait(&barrier)                   │
│  destroy        │ qurt_barrier_destroy(&barrier)                │
└─────────────────┴───────────────────────────────────────────────┘

This table lists the most commonly used QuRT API functions by category. The left column names the operation and the right column shows the call form.

Thread operations cover creation, termination, joining, ID lookup, and sleeping.
Mutex operations cover initialization, lock, try-lock, unlock, and destruction.
Signal operations support wait, set, and clear with bitmask-based notifications.
Timer operations handle creation, deletion, and sleeping, plus reading the hardware tick counter.
Memory operations cover cache flush and invalidate (essential for cross-processor buffers) and L2 cache locking for performance-critical data.
Semaphore and barrier operations round out the synchronization primitives.

Next Steps

This handbook covered the fundamentals of QuRT programming: thread management, synchronization, memory, timers, interrupts, pipes, FastRPC, and a multi-sensor fusion pipeline. The next steps for deeper learning follow a natural progression.

Start by downloading the Hexagon SDK and running the included example projects on the simulator. The examples in $HEXAGON_SDK_ROOT/examples/ demonstrate real ARM-DSP communication patterns through FastRPC and are the best way to see complete, working projects.

Read the QuRT User Guide in $HEXAGON_SDK_ROOT/docs/. It covers every API discussed in this article in full detail, plus many that weren't covered (such as QuRT's TLB management and power management interfaces).

Experiment with HVX, the Hexagon Vector Extensions. HVX is where the real performance of the Hexagon DSP lives, and learning to write vectorized DSP code is the single largest performance lever available to you.

Finally, get a development board (such as the Qualcomm RB5) and run your code on real hardware. The simulator validates correctness, but only real hardware reveals timing behavior, cache effects, and the interaction between your code and other software running on the DSP.

