freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More

"Relaxation and its Role in Vision": The 1977 PhD Thesis That Helped Shape Modern AI Research

Mohammed Fahd Abrah — Tue, 21 Jul 2026 17:39:01 +0000

When people think of Geoffrey Hinton, they usually think of backpropagation, Boltzmann Machines, Deep Belief Networks, or the deep learning revolution that transformed artificial intelligence.

But few people look further back to the beginning of his research career.

In 1977, nearly a decade before the famous backpropagation paper, Hinton completed his PhD thesis at the University of Edinburgh titled "Relaxation and its Role in Vision." At first glance, it seems to be a thesis about computer vision and relaxation methods. That was exactly what I expected when I began reading it.

As I worked through the thesis, however, I realized that it was about much more than a vision algorithm. Many of the ideas that would later define Hinton's research were already taking shape. The terminology was different, the math was simpler, and neural networks hadn't yet become the focus of his work. But the same way of thinking was already there.

This review isn't a chapter-by-chapter summary of the thesis. Instead, it focuses on the ideas that stood out to me while reading it and explores how many of them reappeared in Hinton's later work. Some of these ideas became central to modern AI, while others remain surprisingly overlooked despite being discussed nearly fifty years ago.

Looking back, what impressed me most was not that the thesis predicted specific algorithms. It was that it introduced a consistent way of thinking about intelligence, perception, and computation that would continue to shape Hinton's research for decades.

I hope this review encourages more people to read this remarkable thesis, not simply as a historical document, but as the starting point of one of the most influential research journeys in artificial intelligence.

Thesis Overview

In this review, we'll explore Geoffrey Hinton's 1977 PhD thesis, "Relaxation and its Role in Vision", completed at the University of Edinburgh.

We'll begin by looking at the central problem Hinton set out to solve and the ideas that motivated his relaxation approach. From there, we'll explore how the thesis represents uncertainty, reasons about competing hypotheses, and searches for globally consistent interpretations.

Next, we'll examine the puppet program, the relaxation operator, the role of schemas and stored knowledge, the SETTLE system, and Hinton's comparisons with other approaches of the time. We'll also discuss the limitations he identified in his own method and why they mattered.

Finally, we'll look at how many of the ideas introduced in this thesis reappeared throughout Hinton's later work and helped shape the development of modern AI.

If you'd like to follow along, you can also read the original thesis:

Geoffrey Hinton. Relaxation and its Role in Vision. PhD thesis, University of Edinburgh, 1977.

Here is an infographic that gives a quick overview of Geoffrey Hinton's 1977 PhD thesis. It summarizes the main ideas, how the relaxation method works, its applications, its limitations, and why many of these ideas still matter today.

The Core Challenge: Why Visual Systems Can't Afford to Guess Too Soon
The First Appearance of Thinking as Optimization
Vision Is Inference, Not Pattern Matching
Why Perception Requires Hypotheses
From Binary Decisions to Degrees of Belief
Distributed Computation Before Neural Networks
Parallelism as the Natural Way to Compute
Constraint Propagation
Local Rules Can Produce Global Intelligence
Why Local Consistency Is Not Enough
Relaxation as a Way of Reasoning
The Importance of Equilibrium
From Symbolic Decisions to Numerical Reasoning
Why Perception Is a Search Problem
Beyond Pattern Recognition: Why Internal Representations Matter More Than the Final Output
The Importance of Intermediate and Hierarchical Representations
Schemas and Stored Knowledge
The SETTLE System
Uncertainty and Ambiguity as the Foundation of Reasoning
The Whole Picture
A Consistent Philosophy Across Five Decades
Permission to Publish
Further Reading

The Core Challenge: Why Visual Systems Can't Afford to Guess Too Soon

Before exploring the ideas in Hinton's thesis, it helps to understand the problem he set out to solve. The opening chapter asks a deceptively simple question: How can a visual system choose the correct interpretation when a single image may support many plausible explanations?

This is the central challenge of visual perception. Real-world scenes are often ambiguous or partially hidden, so a system can't afford to commit to one interpretation too early. A premature decision can introduce errors that spread through the rest of the reasoning process and lead to an incorrect understanding of the entire scene.

The real challenge is to keep multiple plausible interpretations alive until there is enough evidence to determine which one is most consistent.

Hinton argues that the common approaches of the 1970s didn't solve this problem. One approach, known as the principle of least commitment, delayed decisions by leaving information unspecified. According to Hinton, this simply postponed the real issue because it offered no way to compare competing hypotheses or determine how they should become consistent with one another.

Another approach assigned fixed meanings to low-level visual features. But since the meaning of a feature depends on its surrounding context, these rigid definitions often failed when objects were partially hidden or appeared in different situations.

The infographic below summarizes the central challenge Hinton identifies at the beginning of his thesis. Rather than committing to the first plausible interpretation of a visual scene, he argues that a vision system should maintain many competing hypotheses simultaneously and allow them to interact until they converge on a single, globally consistent explanation.

It also highlights two contemporary approaches that Hinton rejects, the principle of least commitment and rigid feature semantics, because, in his view, they avoid the core problem instead of solving it.

This framing establishes the motivation for the relaxation framework developed throughout the rest of the thesis.

The First Appearance of Thinking as Optimization

One of the most interesting ideas in Hinton's thesis is that perception isn't a matter of instantly recognizing an object. Instead, he treats it as a process of finding the best explanation for what the eyes are seeing.

Rather than committing to a single interpretation from the start, the system considers many possible hypotheses at the same time. Some support each other, others compete, and their confidence changes as they interact. Through repeated updates, weak explanations gradually disappear while the strongest and most consistent interpretation emerges.

Although Hinton applies this idea to visual perception, the underlying principle reaches far beyond computer vision. It introduces a way of thinking about intelligence as an optimization problem: many possible explanations compete until the system settles on the one that best fits the available evidence.

Looking back, this idea feels surprisingly familiar. The same general philosophy later appeared in probabilistic inference, energy-based models, Conditional Random Fields (CRFs), Boltzmann Machines, and many other approaches where intelligence emerges by searching for the most consistent solution rather than making a single immediate decision.

Vision Is Inference, Not Pattern Matching

One idea that stands out throughout the thesis is Hinton's view of what it actually means to see. He argues that vision is not simply recognizing patterns or assigning an image to a category. Instead, perception is the process of building an internal explanation of the scene.

A visual system doesn't immediately know what it's looking at. It must decide which objects are present, how they relate to one another, and which interpretation best explains the available evidence. In other words, seeing is a process of inference, not just recognition.

Hinton also rejects the idea that perception works by simply comparing an input with a collection of stored templates. He argues that this view is too limited to explain how we understand complex and unfamiliar scenes.

Instead, perception is presented as a constructive process. The system builds an interpretation by combining evidence, relationships, and prior knowledge until a coherent explanation emerges. It's not retrieving an answer from memory but actively constructing one.

Reading this today is striking because it closely resembles ideas that became popular decades later. Modern generative models and latent variable methods are also built around the idea of explaining observations by inferring the hidden structure that produced them.

These ideas also feel remarkably close to modern representation learning, where the goal isn't to memorize examples but to learn meaningful internal representations that can explain new observations.

Hinton was exploring these ways of thinking in 1977, long before they became a central theme in modern AI.

Why Perception Requires Hypotheses

Hinton argues that perception can't be a purely reactive process. A visual system often receives incomplete, ambiguous, or even misleading information, so it can't simply accept the first interpretation that comes to mind.

Instead, it must begin with several possible explanations. As more evidence is considered, some hypotheses become more convincing while others are weakened or rejected. The final interpretation is reached only after this process of evaluation and refinement.

Although Hinton doesn't describe it using modern Bayesian terminology, the underlying idea is remarkably similar. Rather than making an immediate decision, the system continuously updates its beliefs as evidence accumulates until the most consistent explanation remains.

From Binary Decisions to Degrees of Belief

Another idea that feels remarkably modern is Hinton's decision to avoid treating hypotheses as simply true or false. Instead, every hypothesis is assigned a value between 0 and 1 that reflects how strongly the system currently believes it. As the relaxation process unfolds, these values are updated repeatedly until the most consistent interpretation stands out while the others gradually fade away.

Today, we use different terms for similar concepts, including probabilities, belief values, confidence scores, activations, and logits. The terminology has evolved over the years, but the underlying idea remains the same: intelligence often depends on representing uncertainty instead of making immediate, irreversible decisions.

The infographic below illustrates how Hinton's relaxation process operates after hypotheses have been assigned continuous belief values.

Rather than selecting a single answer immediately, the system repeatedly updates all competing hypotheses in parallel, using both numerical constraints and individual preferences until one coherent interpretation gradually emerges.

By replacing rigid yes-or-no decisions with continuous optimization, the relaxation framework makes it possible to search efficiently for a globally consistent solution.
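To make the idea of degrees of belief concrete, here is a minimal sketch of a parallel belief update. It is not Hinton's actual operator: the function name `update_beliefs`, the `compat` matrix, and the `rate` parameter are all illustrative assumptions. Each hypothesis holds a value between 0 and 1, and every value is nudged up or down by the support it receives from the others.

```python
def update_beliefs(beliefs, compat, rate=0.2):
    """One parallel update: each belief moves by the weighted support of the others."""
    new = []
    for i, b in enumerate(beliefs):
        support = sum(
            compat[i][j] * beliefs[j] for j in range(len(beliefs)) if j != i
        )
        # Clip so that every value stays a legal degree of belief in [0, 1].
        new.append(min(1.0, max(0.0, b + rate * support)))
    return new


# Hypothetical example: A and B support each other, and both conflict with C.
compat = [
    [0.0, 1.0, -1.0],
    [1.0, 0.0, -1.0],
    [-1.0, -1.0, 0.0],
]
beliefs = [0.6, 0.5, 0.5]
for _ in range(20):
    beliefs = update_beliefs(beliefs, compat)
print(beliefs)
```

In this toy setup, A and B reinforce each other until both saturate near 1, while C fades toward 0. No hypothesis is ever declared true or false outright; the outcome is simply where the values end up.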

Distributed Computation Before Neural Networks

One of the most forward-looking ideas in the thesis is that intelligence shouldn't depend on a single central controller making every decision. Instead, Hinton describes a system made up of many local hypotheses that interact with one another at the same time. Each contributes a small part of the final solution, and together they produce a coherent interpretation.

Rather than focusing on the individual hypotheses, Hinton emphasizes the connections between them, which allow information to flow through the system until a consistent interpretation emerges.

This way of thinking feels surprisingly familiar today. Modern neural networks are also built on the idea that complex behavior can emerge from the combined activity of many simple units rather than from one component directing the entire process.

The terminology is different from modern deep learning, but the emphasis on networks, interactions, and distributed computation is already clearly visible.

Parallelism as the Natural Way to Compute

Another idea that stands out is Hinton's emphasis on parallel computation. At a time when most computers were designed to execute instructions one after another, he argued that perception is better viewed as many processes working simultaneously and influencing one another.

Looking back, this was an unusually forward-looking perspective. Decades before massively parallel hardware became common, Hinton was already describing computation in a way that closely resembles how modern neural networks run today, with many simple operations happening at the same time rather than one step after another.

Constraint Propagation

A recurring idea throughout the thesis is that no hypothesis should be evaluated in isolation. Instead, each one influences the others through a network of constraints. When the confidence of one hypothesis changes, that change spreads across the network, strengthening compatible explanations and weakening conflicting ones.

This idea later became a common theme in several areas of AI. Graphical models, factor graphs, message passing, and belief propagation all rely on the same basic intuition: local interactions can gradually lead to a globally consistent solution.

Although these methods were developed later and use different mathematical frameworks, it's not difficult to see the conceptual connection.
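The way a change spreads through a network of local links can be sketched in a few lines. This is a simplified illustration under my own assumptions, not code from the thesis: the `links` dictionary, the averaging rule, and the `rate` parameter are hypothetical, and only supportive links are modeled.

```python
def propagate(beliefs, links, rate=0.5):
    """One parallel sweep: each belief moves toward the average of its neighbours."""
    new = list(beliefs)
    for i, neighbours in links.items():
        target = sum(beliefs[j] for j in neighbours) / len(neighbours)
        new[i] = beliefs[i] + rate * (target - beliefs[i])
    return new


# Hypothetical chain 0 - 1 - 2 - 3. Hypothesis 0 is fixed by direct evidence,
# so it has no entry in `links` and never changes.
links = {1: [0, 2], 2: [1, 3], 3: [2]}
beliefs = [1.0, 0.0, 0.0, 0.0]
history = [beliefs]
for _ in range(3):
    beliefs = propagate(beliefs, links)
    history.append(beliefs)

for step, state in enumerate(history):
    print(step, [round(b, 3) for b in state])
```

Because each hypothesis only reads its direct neighbors, the evidence at hypothesis 0 needs one sweep per link to travel down the chain. Hypothesis 3 stays at zero until the third sweep.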

To demonstrate how constraints interact during relaxation, Hinton chose a deliberately simplified vision problem instead of real photographs.

A user first drew several transparent, overlapping rectangles on a graphics terminal. Some rectangles represented genuine parts of a stick-figure puppet, such as the torso, arms, or legs, while others acted as irrelevant distractors.

Every overlap between rectangles became a candidate joint, and the system generated competing hypotheses about which rectangles belonged to the puppet and which overlaps represented real connections. Its goal was to identify the interpretation with the greatest number of mutually consistent instantiated joints, while remaining robust to missing body parts and irrelevant clutter.

By removing the complexity of natural images, Hinton isolated the combinatorial challenge of visual interpretation while keeping the problem mathematically manageable.
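The first step of the puppet task, turning overlaps into candidate joints, can be sketched as follows. This is an illustration under simplifying assumptions: the rectangles here are axis-aligned, and the identifiers and coordinates are made up for the example rather than taken from the thesis.

```python
from itertools import combinations

# Hypothetical axis-aligned rectangles as (x_min, y_min, x_max, y_max).
rectangles = {
    "r0": (4, 4, 6, 10),
    "r1": (5, 8, 9, 9),
    "r2": (4, 1, 5, 5),
    "r3": (12, 12, 14, 14),  # isolated rectangle, a likely distractor
}


def overlaps(a, b):
    """True when two axis-aligned rectangles share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Every overlapping pair becomes a candidate joint hypothesis.
candidate_joints = [
    (name_a, name_b)
    for (name_a, rect_a), (name_b, rect_b) in combinations(rectangles.items(), 2)
    if overlaps(rect_a, rect_b)
]
print(candidate_joints)
```

At this stage nothing has been decided. Each candidate joint is only a hypothesis, and it is the later relaxation process that determines which ones survive.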

The infographic below illustrates how Hinton used this simplified puppet domain to evaluate the relaxation framework. By reducing vision to identifying consistent body parts and joints, the example isolates the core challenge of combining many competing local hypotheses into a single globally consistent interpretation.

Although intentionally simple, the puppet experiment captures the essential reasoning problem of computer vision: many local hypotheses compete simultaneously, constraints propagate between them, and only the globally most consistent interpretation survives.

Hinton presents the domain as a controlled laboratory for studying these interactions before extending the same relaxation principles to more realistic vision problems.

Local Rules Can Produce Global Intelligence

One of the ideas I enjoyed most in this thesis is how a complex solution emerges from simple local interactions. Each hypothesis only needs to communicate with the hypotheses directly connected to it. There's no central component that knows the correct answer or controls the entire process.

As information flows through the network, the system gradually settles on a consistent interpretation. The final result emerges from cooperation rather than command.

This same principle continues to appear throughout AI research. Neural networks, swarm intelligence, graph neural networks, and belief propagation all demonstrate how complex behavior can arise from many simple components following local rules.

The puppet task wasn't just a toy example. It was a complete program with its own processing pipeline. Starting from a drawing of overlapping rectangles, the system generated hypotheses, applied constraints, repeatedly updated their confidence, and finally selected the most consistent interpretation.

The infographic below illustrates how Hinton's entire relaxation framework fits together as a complete computational pipeline. It shows how hypothesis generation, constraint construction, iterative relaxation, and final selection work as successive stages of a single reasoning process.

Rather than relying on a central controller or modern end-to-end training, the system reaches a coherent solution through repeated local interactions among competing hypotheses. This illustrates how simple local rules can produce globally consistent behavior.

Why Local Consistency Is Not Enough

One important point Hinton makes is that solving small local conflicts doesn't necessarily produce the best overall interpretation. A hypothesis may fit well with its immediate neighbors while still contributing to an incorrect explanation of the entire scene.

For that reason, the system must evaluate how all the hypotheses work together rather than judging each one independently.

This shift from local agreement to finding the best overall solution is a key theme throughout the thesis. It also reflects a broader direction that AI would later take, where many problems are formulated as global optimization rather than a collection of isolated local decisions.
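A tiny example shows why locally attractive choices can miss the best overall interpretation. The scores and conflicts below are hypothetical, and the two strategies are generic stand-ins rather than anything described in the thesis: `greedy` commits to the strongest hypothesis first, while `exhaustive` checks every consistent combination.

```python
from itertools import product

# Hypothetical preference scores and pairwise conflicts.
scores = {"A": 3, "B": 2, "C": 2}
conflicts = {("A", "B"), ("A", "C")}


def consistent(chosen):
    """A set is consistent when it contains no conflicting pair."""
    return not any(a in chosen and b in chosen for a, b in conflicts)


def total(chosen):
    return sum(scores[name] for name in chosen)


def greedy():
    """Commit to the highest-scoring hypothesis first, then keep what still fits."""
    chosen = set()
    for name in sorted(scores, key=scores.get, reverse=True):
        if consistent(chosen | {name}):
            chosen.add(name)
    return chosen


def exhaustive():
    """Check every subset and keep the best consistent one."""
    best = set()
    for mask in product([False, True], repeat=len(scores)):
        chosen = {name for name, keep in zip(scores, mask) if keep}
        if consistent(chosen) and total(chosen) > total(best):
            best = chosen
    return best


print(greedy(), total(greedy()))
print(exhaustive(), total(exhaustive()))
```

The greedy strategy picks A because it looks best in isolation, which rules out both B and C and yields a total of 3. The globally best interpretation keeps B and C together for a total of 4. Exhaustive search does not scale, which is exactly why a method that searches for global consistency without enumerating every combination is attractive.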

Relaxation as a Way of Reasoning

As I read the thesis, I began to see relaxation as more than just an algorithm. It is a way of approaching difficult problems. Instead of trying to reach the correct answer in a single step, the system starts with tentative beliefs, refines them through repeated interactions, and continues until the solution becomes stable.

This idea feels surprisingly familiar today. Although the mathematics is different, many modern methods follow the same pattern. Gradient descent improves parameters step by step, the Expectation-Maximization (EM) algorithm alternates between an expectation step and a maximization step, belief propagation repeatedly exchanges messages between neighboring variables, and diffusion models generate samples through a sequence of gradual updates.

The methods are different, but the underlying philosophy is remarkably similar: good solutions often emerge through many small improvements rather than one decisive computation.

So how does relaxation actually work? Hinton's answer is surprisingly simple. During each update, the system balances two forces: one keeps the hypotheses consistent with the constraints, while the other gently pushes them toward better explanations. Repeating this process eventually leads to a stable solution.

The infographic below illustrates how Hinton translates this reasoning process into a concrete computational procedure. Rather than making a single decision, the relaxation operator repeatedly updates all hypotheses in parallel, applying the same two-force rule until the entire network settles into a stable, globally consistent state.

The process shows how simple local updates can collectively produce a coherent global interpretation.
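The two-force rule can be sketched as a simple iterative update. This is a rough illustration, assuming that constraints are linear inequalities over belief values and that preferences are per-hypothesis weights. The function `relax_step`, its parameters, and the projection used here are my own simplifications, not the operator defined in the thesis.

```python
def relax_step(x, prefs, constraints, step=0.05):
    """One update that applies both forces to every hypothesis at once."""
    # Force 1: nudge each value in the direction of its preference.
    x = [xi + step * p for xi, p in zip(x, prefs)]
    # Force 2: pull the state back inside each violated constraint a . x <= b.
    for a, b in constraints:
        excess = sum(ai * xi for ai, xi in zip(a, x)) - b
        if excess > 0:
            norm_sq = sum(ai * ai for ai in a)
            x = [xi - excess * ai / norm_sq for ai, xi in zip(a, x)]
    # Keep every value a legal degree of belief in [0, 1].
    return [min(1.0, max(0.0, xi)) for xi in x]


# Hypothetical problem: two incompatible hypotheses (x0 + x1 <= 1),
# with the first one preferred more strongly than the second.
prefs = [1.0, 0.4]
constraints = [([1.0, 1.0], 1.0)]
x = [0.5, 0.5]
for _ in range(200):
    x = relax_step(x, prefs, constraints)
print(x)
```

Both hypotheses start equally believed. The preference force pushes both upward, the constraint force pushes them back down together, and the net effect is a slow drift toward the preferred hypothesis until the state stops changing.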

The Importance of Equilibrium

Another idea that appears throughout the thesis is the importance of allowing the system to reach a stable state. The final interpretation isn't imposed by a central controller or chosen by a fixed rule. Instead, it emerges naturally as the hypotheses interact until no further changes are needed.

This idea became a recurring theme in later neural network research, including Hinton's own. Hopfield networks, Boltzmann Machines, and energy-based models all rely on systems evolving toward stable configurations through their own internal dynamics. Although the models are different, the underlying intuition is much the same: a good solution is one the system naturally settles into.

Hinton didn't just describe what worked. He also analyzed the situations where relaxation could break down, discussing both the possibility of converging to ambiguous intermediate solutions and the architectural limitations of the overall framework.

The infographic below illustrates these two limitations. During relaxation, the system may converge to a stable solution that lies between discrete interpretations rather than fully committing to a single one.

It also highlights a broader architectural weakness: hypothesis generation and hypothesis selection remain separate stages, preventing later reasoning from influencing which candidate hypotheses are created in the first place.

Hinton openly presents these limitations, which later AI systems addressed through more integrated and end-to-end learning approaches.
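The first limitation is easy to reproduce in a toy setting. The example below is my own construction, not one taken from the thesis: three hypotheses are pairwise incompatible and equally preferred, and `feasible` and `score` are illustrative helpers.

```python
from itertools import product

# Hypothetical constraints: each pair of hypotheses satisfies x_i + x_j <= 1.
pairs = [(0, 1), (0, 2), (1, 2)]


def feasible(x):
    return all(x[i] + x[j] <= 1 for i, j in pairs)


def score(x):
    """Equal preference for every hypothesis."""
    return sum(x)


# All-or-nothing interpretations that satisfy the constraints.
corners = [c for c in product([0, 1], repeat=3) if feasible(c)]
best_corner = max(corners, key=score)

# A state halfway between the discrete interpretations.
midpoint = (0.5, 0.5, 0.5)

print(best_corner, score(best_corner))
print(midpoint, feasible(midpoint), score(midpoint))
```

Every clean interpretation can accept at most one hypothesis, for a score of 1. The half-believed state satisfies every constraint and scores 1.5, so a continuous search that only follows the score has no reason to commit to any single interpretation.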

From Symbolic Decisions to Numerical Reasoning

One of the subtle but important shifts in the thesis is the move away from treating knowledge as simply true or false. Instead of relying on rigid symbolic decisions, Hinton represents beliefs with numerical values that can increase or decrease as new evidence is considered.

This may seem like a small design choice, but it reflects a much broader change in how intelligent systems can reason. Rather than forcing early decisions, the system keeps track of uncertainty and adjusts its beliefs over time. Looking back, this is the same direction that much of modern machine learning would eventually follow.

Hinton didn't develop these ideas in isolation. In the thesis, he evaluates his relaxation framework alongside other prominent approaches of the time, highlighting their different ways of representing uncertainty and reasoning about visual scenes.

The infographic below compares three approaches side by side. Using the same line-labeling problem as a common benchmark, it shows how Waltz's filtering algorithm, fuzzy-weight models, and Hinton's relaxation framework represent uncertainty, update hypotheses, and enforce consistency.

Hinton argues that his supposition-value framework provides a more principled way to reason under uncertainty by combining continuous confidence values with explicit numerical constraints, allowing competing interpretations to evolve toward a globally consistent solution.
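For contrast, here is a minimal sketch of discrete filtering in the spirit of Waltz's algorithm. It is a generic illustration, not Waltz's actual junction catalogue: the line names, the label sets, and the `allowed` pairs are all hypothetical.

```python
def waltz_filter(domains, allowed):
    """Repeatedly delete any label that has no compatible partner next door."""
    changed = True
    while changed:
        changed = False
        for (a, b), pairs in allowed.items():
            for label in list(domains[a]):
                if not any((label, other) in pairs for other in domains[b]):
                    domains[a].remove(label)
                    changed = True
            for label in list(domains[b]):
                if not any((other, label) in pairs for other in domains[a]):
                    domains[b].remove(label)
                    changed = True
    return domains


# Hypothetical line labels: "+" convex, "-" concave, ">" occluding.
domains = {"L1": {"+"}, "L2": {"+", "-"}, "L3": {"+", "-", ">"}}
allowed = {
    ("L1", "L2"): {("+", "+"), ("-", "-")},
    ("L2", "L3"): {("+", "+"), ("+", ">"), ("-", "-")},
}
result = waltz_filter(domains, allowed)
print(result)
```

Filtering removes labels that cannot be part of any consistent pairing, but each surviving label is simply present or absent. In this example, L3 keeps two labels with nothing to rank them, which is the kind of gap that continuous confidence values are meant to fill.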

Why Perception Is a Search Problem

One of the ideas that repeatedly appears throughout the thesis is that perception is fundamentally a search process. A visual system isn't simply recognizing an object from what it sees. Instead, it's searching through many possible explanations to find the one that best fits the available evidence.

This distinction is more important than it might first appear. Recognition suggests that the answer is already obvious and only needs to be retrieved. Search assumes that the correct interpretation must be discovered by exploring alternatives and resolving uncertainty.

Even today, that way of thinking continues to shape many approaches to artificial intelligence.

If perception is a search problem, the next question becomes what the system is actually searching through. Hinton answers this by introducing a geometric view of the search process, where every possible interpretation occupies a position within a structured space of feasible solutions.

The infographic below illustrates this geometric perspective. It represents all valid hypothesis assignments as points inside a feasible search space, where the corners correspond to clean all-or-nothing interpretations and intermediate points represent uncertain or partial beliefs.

Rather than searching directly among discrete solutions, the relaxation process moves continuously through this space, gradually improving the current state until it converges on the highest-scoring feasible interpretation.

This geometric perspective provides an intuitive way to understand how relaxation transforms a difficult combinatorial search into a continuous optimization problem.
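One way to write this geometric picture down, assuming linear preferences and linear constraints, is as a constrained maximization. The symbols here are illustrative and not taken from the thesis: $x_i$ is the degree of belief in hypothesis $i$, $p_i$ is its preference weight, and the rows of $A x \le b$ encode the constraints between hypotheses.

```latex
\max_{x \in \mathbb{R}^n} \; \sum_{i=1}^{n} p_i x_i
\quad \text{subject to} \quad
A x \le b, \qquad 0 \le x_i \le 1 \quad (i = 1, \dots, n)
```

In this notation, the clean all-or-nothing interpretations are the feasible points with every $x_i \in \{0, 1\}$, and the relaxation process moves through the points in between.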

Beyond Pattern Recognition: Why Internal Representations Matter More Than the Final Output

One of the most thought-provoking parts of the thesis is Hinton's criticism of viewing perception as nothing more than pattern recognition. He argues that recognizing visual features alone can't explain how we understand a scene. A vision system must also determine how objects are related, how their parts fit together, and how those relationships combine into a coherent interpretation of the world.

This emphasis on relationships is central to Hinton's view of perception. Understanding a scene requires more than identifying individual objects. The system must represent objects, their constituent parts, and the structural relationships that connect them into larger wholes. In other words, perception is fundamentally about building a structured representation of the scene rather than recognizing isolated patterns.

As I read the thesis, one theme kept resurfacing: Hinton is less interested in the final answer than in the internal representation the system builds while interpreting a scene. The goal isn't simply to assign a label to an image, but to construct a structured description that captures the relationships between its components. The final decision is simply the outcome of this richer reasoning process.

Looking back, this perspective was remarkably forward-looking. Many modern AI systems have moved beyond simple classification toward learning internal representations that capture structure, relationships, and context. Although today's models use very different mathematical tools, the underlying intuition is strikingly similar.

Throughout his later career, Hinton consistently emphasized that the quality of an intelligent system depends less on its final output than on the representations it learns along the way.

The Importance of Intermediate and Hierarchical Representations

One idea that caught my attention is Hinton's discussion of intermediate-level hypotheses. Rather than moving directly from visual input to a final interpretation, he argues that perception benefits from intermediate representations that bridge the gap between raw observations and complete understanding.

Understanding a scene happens at multiple levels. Simple elements combine to form larger structures, and those structures become part of an even richer interpretation. Perception is built gradually through a hierarchy rather than all at once.

Looking back, these ideas feel strikingly familiar. Modern deep learning rests on a closely related principle: each layer learns an increasingly abstract intermediate representation before the network reaches a final prediction.

And the idea of hierarchical perception would continue to appear throughout Hinton's later research. Whether in Deep Belief Nets, Capsule Networks, or hierarchical generative models, the same principle remains: meaningful representations are built layer by layer, with each level capturing patterns that the previous one could not.

The terminology has once again changed, but the intuition is much the same: complex understanding is achieved through a hierarchy of intermediate representations, not in a single step.

Schemas and Stored Knowledge

In the later chapters, Hinton introduces the idea of schemas as a way to organize knowledge and connect it to perception. Rather than treating perception and stored knowledge as separate processes, he shows how they can work together to interpret what the system observes.

What makes his treatment of schemas interesting is what they store. Instead of storing exact examples, he argues that knowledge should capture the rules, relationships, and constraints that define a category. This allows the system to interpret new situations by reasoning about their underlying structure rather than simply matching what it has already seen.

Reading this today, it's easy to see why the idea remains relevant. Although the terminology has changed, many modern AI systems also rely on learned internal representations that support generalization instead of memorization. In that sense, Hinton's discussion of schemas can be seen as an early step toward concepts that later evolved into latent representations and internal world models.

The infographic below illustrates Hinton's contrast between schema-based reasoning and template matching. Instead of relying on stored examples, schemas represent structural knowledge through roles, relationships, and constraints that guide interpretation. New observations are understood by satisfying these structural relationships rather than by finding an exact match to a memorized template.

This perspective foreshadows later developments in representation learning, where successful generalization depends on learning underlying structure instead of memorizing individual examples.
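The contrast between a template and a schema can be sketched in a few lines. Everything in this example is hypothetical: the part sizes, the two structural constraints, and the function `fits_schema` are illustrative choices, not definitions from the thesis.

```python
# A template only matches one exact stored example.
# Each part is described by a hypothetical (width, height) pair.
template = {"torso": (2, 6), "arm": (4, 1)}


def matches_template(parts):
    return parts == template


def fits_schema(parts):
    """A schema stores roles and the relationships they must satisfy."""
    torso, arm = parts["torso"], parts["arm"]
    torso_is_upright = torso[1] > torso[0]   # taller than it is wide
    arm_is_shorter = max(arm) < max(torso)   # arm is shorter than the torso
    return torso_is_upright and arm_is_shorter


seen = {"torso": (2, 6), "arm": (4, 1)}
novel = {"torso": (3, 9), "arm": (5, 1)}

print(matches_template(seen), fits_schema(seen))
print(matches_template(novel), fits_schema(novel))
```

The novel figure fails the template because its sizes differ from the stored example, yet it satisfies the schema because the relationships between its parts still hold.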

The SETTLE System

One part of the thesis that deserves far more attention is SETTLE, an experimental reasoning system Hinton developed to combine schemas, inference rules, and relaxation into a single computational framework. It's often overshadowed by the earlier chapters on relaxation, but it reveals how Hinton was already thinking about integrating multiple forms of reasoning rather than treating them as separate processes.

Instead of applying rules independently or storing knowledge in isolation, SETTLE allows schemas, inference rules, relaxation, and dynamic network construction to cooperate while the system gradually builds the most consistent interpretation from uncertain evidence.

Looking back, SETTLE is interesting not because it resembles modern AI systems in detail, but because it reflects Hinton's early effort to integrate knowledge, reasoning, and inference into a unified computational process.

The infographic below illustrates how these components interact within SETTLE. Inference rules generate candidate conclusions while the relaxation process continuously evaluates their consistency. This allows evidence, rules, and competing hypotheses to influence one another until they converge on the most coherent interpretation.

Uncertainty and Ambiguity as the Foundation of Reasoning

One theme that runs throughout the thesis is that uncertainty isn't a problem to avoid but a natural starting point for perception. A visual system doesn't begin with complete knowledge or immediate confidence. Instead, it starts with tentative assumptions that are gradually strengthened, weakened, or discarded as more evidence is taken into account.

Closely related to this is Hinton's treatment of ambiguity. Rather than viewing multiple possible interpretations as a failure of perception, he accepts them as an unavoidable consequence of incomplete information.

A visual scene can support several plausible explanations, and the system should allow those possibilities to coexist until enough evidence is available to distinguish between them. Instead of forcing an early decision, it gradually moves toward the most consistent interpretation.

Although Hinton's thesis predates modern probabilistic graphical models and Bayesian inference methods, its underlying perspective is remarkably close to probabilistic thinking.

The mathematical tools would evolve considerably over the following decades, but the central idea remained the same: intelligent systems should reason under uncertainty rather than expect complete and perfect information from the start. Looking back, it's not difficult to see why this perspective became one of the defining principles of modern AI.

The Whole Picture

The infographic below provides an overview of Hinton's thesis by bringing together its main contributions in a single view. It compares his relaxation framework with other approaches available in the late 1970s, outlines the application domains explored throughout the thesis, from simplified vision tasks to schema-based reasoning and the SETTLE system, and concludes with the key limitations Hinton openly identifies.

Taken together, these elements show that the thesis is not only a proposal for a relaxation algorithm, but also a broader research program for reasoning under uncertainty that influenced many ideas developed in later AI systems.

A Consistent Philosophy Across Five Decades

After finishing the thesis, what impressed me most was not a single algorithm or experiment. It was the continuity of Hinton's thinking. Many of the ideas introduced in 1977 reappear throughout the rest of his career, even though the mathematical tools and models changed dramatically.

The thesis begins with relaxation, competing hypotheses, distributed constraints, optimization, and stable solutions. Later came Boltzmann Machines, backpropagation, Deep Belief Networks, AlexNet, the Forward-Forward algorithm, and more recently, his ideas on mortal computation.

At first glance, these contributions seem very different. Yet they're connected by a remarkably consistent research philosophy.

Throughout his career, Hinton has viewed intelligence as something that emerges from the interaction of many simple computational elements rather than from a central controller. He has consistently emphasized distributed knowledge over isolated symbols, inference over simple recognition, rich internal representations over final outputs, optimization as the mechanism for intelligent behavior, and uncertainty as something to be represented and refined rather than ignored.

Looking back from today, Hinton's 1977 thesis feels less like an isolated piece of early research and more like the beginning of an intellectual journey that would shape nearly five decades of artificial intelligence research.

This final infographic illustrates these conceptual connections. Rather than presenting Hinton's later work as direct implementations of his thesis, it shows how many of its central ideas, including continuous confidence values, optimization-based perception, structured knowledge representations, and integrated reasoning, continued to reappear in later developments such as Boltzmann Machines, backpropagation, Deep Belief Networks, and energy-based models.

The emphasis isn't on a single line of technical development, but on the remarkable continuity of the research philosophy that connects Hinton's earliest work to many of his later contributions.

Permission to Publish

Before writing this review, I contacted Professor Geoffrey Hinton to request permission to publish an educational review of his thesis. Professor Hinton kindly granted permission for the publication of this review.

The article is written entirely in my own words and reflects my own interpretation of the thesis, with full acknowledgment of the original work.

How to Serve a Multi-User AI Agent with FastAPI and Streamlit

Darsh Shah — Mon, 20 Jul 2026 22:07:49 +0000

In this tutorial, I’ll show you how to serve a multi-user local AI agent as a REST API using FastAPI, then add a lightweight Streamlit UI on top.

Instead of interacting with the agent through a terminal, we’ll expose it over HTTP so multiple users can access it through a chat-style frontend interface. Each session will maintain its own conversation history and streamed responses.

The local AI agent will be built with LangChain v1, Ollama, Qwen, and Python, running on your own machine and ready to plug into larger applications without any per-call model API charges.

Background
What is FastAPI?
What is Streamlit?
What Is Multi-User Support?
Motivation and Architecture
Step 1: Install Ollama and Pull the Model
Step 2: Install Python Dependencies
Step 3: Build the Agent and API Layer with FastAPI
Step 4: Build Streamlit UI
Step 5: Run the Backend App
Step 6: Run the Frontend App
Sample Output
What to Improve Before Production
Conclusion

Background

Many AI agents start out as simple Python scripts that run in a command-line terminal. You type a message, the agent responds, and everything happens in a single local session.

That setup is great for development and testing, but it becomes limiting when you want other people or applications to interact with the agent.

To make an AI agent truly useful, we need to expose it through an interface that other users can access. A REST API is a practical way to do that.

To follow this tutorial, you'll need Ollama installed on your machine. The tutorial works on macOS, Windows, and Linux. I'm using a MacBook Pro with 32 GB of RAM, but you can run this on a lower-memory machine by choosing a smaller Qwen model from Ollama.

What is FastAPI?

FastAPI is a Python web framework for building APIs. In this tutorial, it gives us a simple way to expose the agent over HTTP so other apps, scripts, or services can call it.

FastAPI is a good fit for AI apps because it gives us a clean boundary around the system. We define the request and response models in Python, FastAPI validates them automatically, and it turns HTTP requests into Python objects and Python objects back into JSON. It also generates interactive API docs for free and supports async endpoints, which is useful for AI workloads that may take longer to respond.

What is Streamlit?

Streamlit is a Python framework for building lightweight web interfaces with minimal frontend work. It lets us create interactive browser-based apps using normal Python code instead of HTML, CSS, and JavaScript.

In this tutorial, Streamlit sits on top of the FastAPI backend as a thin client. FastAPI exposes the AI agent over HTTP, and Streamlit gives us a simple UI for calling that API and displaying the results. That separation keeps the backend reusable while still making the agent easy to use in the browser.

What Is Multi-User Support?

Multi-user support means the AI agent can handle requests from more than one user while keeping each user’s session separate.

For example, User 1 asks the agent one question and User 2 asks a different question. The agent should remember the correct context for each user independently. Without multi-user support, all users may end up sharing the same conversation state, which can lead to mixed responses, incorrect memory, or overwritten context.
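
As a rough illustration of that isolation, the sketch below keeps one message list per user ID using only the standard library. `ConversationStore` is a hypothetical helper, not part of LangChain. The tutorial itself delegates this job to a LangGraph checkpointer.

```python
from collections import defaultdict
from uuid import UUID, uuid4


class ConversationStore:
    """Keeps one message list per user ID (hypothetical helper)."""

    def __init__(self):
        self._histories = defaultdict(list)

    def append(self, user_id: UUID, role: str, content: str) -> None:
        self._histories[user_id].append({"role": role, "content": content})

    def history(self, user_id: UUID) -> list:
        # Return a copy so callers cannot mutate another user's state.
        return list(self._histories.get(user_id, []))


store = ConversationStore()
alice, bob = uuid4(), uuid4()

store.append(alice, "user", "My name is Alice.")
store.append(bob, "user", "My name is Bob.")
store.append(alice, "user", "Who am I?")

print(len(store.history(alice)), len(store.history(bob)))
```

Because every read and write is keyed by the user ID, a question from one user can only ever be answered from that user's own history.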

Motivation and Architecture

Turning an AI agent into an API is the natural next step after building it locally. A Python script is great for experimenting, but an API makes the agent reusable, and multi-user support lets several people use it at once without sharing conversation state.

To keep things simple, we’ll use a small local agent powered by Ollama and Qwen. The agent has two tools: one for checking the current time and another for counting words.

FastAPI provides the HTTP layer by exposing one endpoint called /chat/stream. When a request comes in with a user message, Pydantic validates the request, LangChain handles the agent loop and tool calling, and the final answer is returned as a stream. Streamlit sits on top of that API and acts as a frontend that sends requests to the API and displays the results.

Example request:

{
  "message": "How many words are in: LangChain makes tool calling easier",
  "user_id": "123e4567-e89b-12d3-a456-426614174000"
}

Example response (streamed as plain text, not JSON):

There are **5** words in LangChain makes tool calling easier.

The model runs locally through Ollama, so there are no per-call model API charges.

Step 1: Install Ollama and Pull the Model

To get started, install the Ollama application for your platform.

We’ll use Qwen as the chat model. I’m using qwen3.5:4b. If your machine has less RAM, you can use qwen3.5:0.8b instead.

ollama pull qwen3.5:4b

Step 2: Install Python Dependencies

Create a virtual environment and install the required packages:

python3 -m venv venv
source venv/bin/activate

pip install fastapi uvicorn streamlit requests langchain langchain-core langchain-ollama langgraph

This tutorial requires LangChain >= 1.0.0.

Step 3: Build the Agent and API Layer with FastAPI

This application has three main responsibilities. FastAPI exposes the HTTP endpoint, Pydantic validates the incoming request data, and LangChain runs the agent, including tool calling and short-term memory.

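The user_id sent with each request is used as the thread identifier, allowing the checkpointer to keep each user’s conversation history separate. This memory is scoped to the session, so every new session starts with its own empty history. Because InMemorySaver keeps everything in process memory, the history is also lost when the server restarts.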
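
The mapping from a user ID to a thread identifier can be sketched with the standard library alone. In the tutorial, Pydantic parses the UUID and the endpoint builds the config inline. `thread_config` below is a hypothetical helper that only approximates those two steps.

```python
from uuid import UUID


def thread_config(raw_user_id: str) -> dict:
    """Validate a user ID and build a per-user config (hypothetical helper)."""
    # UUID() raises ValueError for malformed input, so bad IDs never
    # become thread identifiers.
    user_id = UUID(raw_user_id)
    return {"configurable": {"thread_id": str(user_id)}}


config = thread_config("123E4567-E89B-12D3-A456-426614174000")
print(config["configurable"]["thread_id"])
```

Parsing before use also normalizes the ID: the uppercase input above comes back in lowercase canonical form, so the same user always lands on the same thread.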

Another important detail is that the agent is created only once at startup with agent = build_agent(). Reusing the same agent instance avoids rebuilding the model and tool list for every request, which reduces overhead and improves response times while still supporting multiple users.

Inside the /chat/stream endpoint, the backend uses LangChain’s stream_events(..., version="v3") to generate the response as a stream instead of waiting for the full answer all at once. FastAPI then wraps that stream in a StreamingResponse, so the frontend can receive the output gradually as it's produced. This makes the app feel much more interactive, because users can start reading the answer immediately while the rest is still being generated.

Put together, this gives you a lightweight backend that validates input, preserves separate memory for each user, and streams responses to the UI in real time.
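
The streaming behavior rests on an ordinary Python generator: the consumer receives each piece as soon as it is yielded, before the rest exists. The sketch below uses a stand-in `generate_tokens` function (an assumption, not the agent's real output) to show that laziness without any web framework.

```python
import time


def generate_tokens(text, delay=0.0):
    """Yield one word at a time, like a model emitting tokens (stand-in)."""
    for word in text.split(" "):
        time.sleep(delay)  # pretend the model is still working
        yield word + " "


stream = generate_tokens("streamed answers arrive piece by piece")
first = next(stream)    # available before the rest has been produced
rest = "".join(stream)  # drain the remaining chunks
print(first + rest)
```

Wrapping a generator like this in a streaming response lets the server forward chunks as the generator yields them instead of buffering the whole answer first.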

Save the following code as app.py:

from datetime import datetime
from uuid import UUID

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse

from pydantic import BaseModel

from langchain.agents import create_agent
from langchain_core.tools import tool
from langchain_ollama import ChatOllama
from langgraph.checkpoint.memory import InMemorySaver

CHAT_MODEL = "qwen3.5:4b"

SYSTEM_PROMPT = (
    "You are a helpful assistant with access to tools for getting the current time "
    "and counting words in text. "
    "Use tools when needed. If the question does not need a tool, answer directly."
)

# -----------------------------
# Request model
# -----------------------------

class ChatRequest(BaseModel):
    user_id: UUID
    message: str

# -----------------------------
# Tools
# -----------------------------

@tool
def current_time() -> str:
    """Return the current local date and time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


@tool
def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())


# -----------------------------
# Agent + checkpoint memory
# -----------------------------

# Store conversation history in short term memory
checkpointer = InMemorySaver()

def build_agent():
    model = ChatOllama(model=CHAT_MODEL, temperature=0)
    return create_agent(
        model=model,
        tools=[current_time, word_count],
        system_prompt=SYSTEM_PROMPT,
        checkpointer=checkpointer,
    )


agent = build_agent()

# -----------------------------
# Streaming endpoint
# -----------------------------

app = FastAPI()

@app.post("/chat/stream")
def chat_stream(req: ChatRequest):
    def generate():
        run = agent.stream_events(
            {
                "messages": [{"role": "user", "content": req.message}],
            },
            config={
                "configurable": {
                    # Keep each user's short-term memory isolated
                    # by using their user_id as the thread ID.
                    "thread_id": str(req.user_id),
                }
            },
            version="v3",
        )

        for message in run.messages:
            for token in message.text:
                yield token

    return StreamingResponse(generate(), media_type="text/plain")

Step 4: Build Streamlit UI

The Streamlit code creates a simple chat interface for the AI agent and keeps each browser session tied to a unique user_id.

When the app first loads, it generates and stores a UUID in st.session_state, which is later sent to the backend so the agent can keep that user’s conversation history separate from other users. It also creates a chat_history list in session state so previous messages remain visible every time Streamlit reruns the script. The app then loops through that saved history and displays each message in a chat-style format using st.chat_message().

When the user enters a new message through st.chat_input(), the app immediately saves and displays it, then sends it to the backend API with a POST request to http://127.0.0.1:8001/chat/stream along with the session’s user_id.

The request is made with stream=True, which allows the response to arrive gradually instead of all at once. As each chunk of text is received from the backend, the code appends it to full_answer and updates a placeholder on the page, creating a live streaming effect. Once the response is complete, the final assistant message is stored in chat_history so it remains part of the conversation on the page.
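
The accumulate-and-redraw loop can be sketched without Streamlit or requests. The Streamlit code leaves decoding to requests (decode_unicode=True). The hypothetical `accumulate_text` helper below decodes explicitly, to show why incremental decoding matters when a multi-byte character is split across two chunks.

```python
import codecs


def accumulate_text(byte_chunks):
    """Yield the answer-so-far after each chunk, as a UI would redraw it."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    full_answer = ""
    for chunk in byte_chunks:
        # The incremental decoder buffers a partial multi-byte character
        # until the remaining bytes arrive in a later chunk.
        full_answer += decoder.decode(chunk)
        yield full_answer
    # Flush the decoder; this raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield full_answer + tail


# "café" encoded as UTF-8, deliberately split in the middle of "é".
data = "café".encode("utf-8")
chunks = [data[:4], data[4:]]
snapshots = list(accumulate_text(chunks))
print(snapshots)
```

Each snapshot is what the placeholder would display at that moment: the incomplete character is held back rather than rendered as garbage.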

Save the following code as streamlit_app.py:

import uuid
import requests
import streamlit as st

API_URL = "http://127.0.0.1:8001/chat/stream"

st.title("Local AI Agent")

if "user_id" not in st.session_state:
    st.session_state.user_id = str(uuid.uuid4())

if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

# Show previous messages
for item in st.session_state.chat_history:
    with st.chat_message(item["role"]):
        st.markdown(item["content"])

message = st.chat_input("Enter a message")

if message:
    # Save and show user message
    st.session_state.chat_history.append({"role": "user", "content": message})
    with st.chat_message("user"):
        st.markdown(message)

    # Stream assistant response
    full_answer = ""
    with st.chat_message("assistant"):
        placeholder = st.empty()

        # Send the request to the backend API via a POST request
        with requests.post(
            API_URL,
            json={
                "message": message,
                "user_id": st.session_state.user_id,
            },
            stream=True,
        ) as response:
            response.raise_for_status()

            for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
                if chunk:
                    full_answer += chunk
                    placeholder.markdown(full_answer)

    # Save final assistant response
    st.session_state.chat_history.append(
        {"role": "assistant", "content": full_answer}
    )

Step 5: Run the Backend App

Start the server with Uvicorn:

uvicorn app:app --reload --port 8001

Once the application starts, open the interactive docs:

http://127.0.0.1:8001/docs

This app defines no route for /, so http://127.0.0.1:8001/ returns a 404 Not Found response.

The /docs endpoint is automatically generated by FastAPI using your Pydantic models. It provides an interactive interface where you can test the API without writing any client code.

You can send requests directly from curl. In your terminal, run these commands to invoke the API for the AI agent and check the output:

curl -X POST http://127.0.0.1:8001/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message":"What time is it?","user_id":"123e4567-e89b-12d3-a456-426614174000"}'

curl -X POST http://127.0.0.1:8001/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message":"How many words are in: LangChain makes tool calling easier","user_id":"123e4567-e89b-12d3-a456-426614174000"}'

curl -X POST http://127.0.0.1:8001/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message":"What is the capital of France?","user_id":"123e4567-e89b-12d3-a456-426614174000"}'
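
If you would rather call the endpoint from a script, the same request can be built with Python's standard library. This is a sketch, not part of the tutorial's code: `build_chat_request` and `stream_answer` are hypothetical helper names, and the streaming loop is left commented out because it needs the backend to be running.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8001/chat/stream"


def build_chat_request(message: str, user_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request the curl commands make."""
    body = json.dumps({"message": message, "user_id": user_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_answer(request: urllib.request.Request, chunk_size: int = 64):
    """Yield raw byte chunks as the server produces them."""
    with urllib.request.urlopen(request) as response:
        # read1() returns whatever is available instead of waiting
        # for a full chunk_size bytes.
        while chunk := response.read1(chunk_size):
            yield chunk


request = build_chat_request(
    "What time is it?", "123e4567-e89b-12d3-a456-426614174000"
)
# With the backend running, print the reply as it streams in:
# for chunk in stream_answer(request):
#     print(chunk.decode("utf-8", errors="replace"), end="", flush=True)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything goes over the wire.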

To stop the server, press Ctrl+C in the terminal.

Step 6: Run the Frontend App

In another terminal, go to the project directory, activate the virtual environment, and start Streamlit:

source venv/bin/activate
streamlit run streamlit_app.py

That opens the frontend in your browser at http://localhost:8501/. Try an example prompt such as "What is the capital of France?". You should see the answer in a chat-style interface.

The UI is calling the FastAPI endpoint and invoking the AI agent. You now have a working end-to-end application for your local AI agent that you can play with.

To stop the server, press Ctrl+C in the terminal.

Sample Output

The image below shows two browser sessions of the app running side by side on the same endpoint. Each session is assigned a unique ID, which allows the backend to maintain a separate conversation history for each user.

Even though both users ask the same question, “Who am I?”, the responses are different because each session’s answer is based on its own prior messages.

What to Improve Before Production

Although this application is fully functional, it's still intentionally minimal. It already supports a reusable FastAPI backend, a Streamlit chat interface, per-user conversation history, and streaming responses.

If you wanted to take it further, the next steps would be adding authentication, persistent storage, structured logging, monitoring, and a more robust deployment setup.
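
To give a feel for the persistent storage step, here is a minimal sketch of a per-user message log backed by SQLite from the standard library. `SqliteHistory` is a hypothetical helper and does not implement LangGraph's checkpointer interface. For the agent itself, swapping InMemorySaver for a persistent checkpointer would be the more direct route.

```python
import sqlite3


class SqliteHistory:
    """Per-user message log backed by SQLite (hypothetical helper)."""

    def __init__(self, path: str = ":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "user_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def append(self, user_id: str, role: str, content: str) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO messages (user_id, role, content) VALUES (?, ?, ?)",
                (user_id, role, content),
            )

    def history(self, user_id: str) -> list:
        rows = self._db.execute(
            "SELECT role, content FROM messages WHERE user_id = ? ORDER BY id",
            (user_id,),
        )
        return [{"role": role, "content": content} for role, content in rows]


log = SqliteHistory()  # pass a file path instead of ":memory:" to persist
log.append("user-a", "user", "Remember that I like tea.")
log.append("user-b", "user", "Remember that I like coffee.")
log.append("user-a", "assistant", "Noted: you like tea.")
print(log.history("user-a"))
```

With a file path in place of ":memory:", the same history survives a server restart, which the in-memory setup cannot offer.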

It's also worth noting that if your goal is simply to get a polished self-hosted chat UI up and running quickly, you may not need to build the frontend yourself. Projects like LibreChat and Open WebUI already provide richer interfaces and broader features out of the box.

This tutorial takes a different approach: instead of adopting a full platform, it shows how to build a lightweight custom stack yourself so you can better understand the architecture and have more control over how the agent is exposed.

Conclusion

In this tutorial, we took a local AI agent, wrapped it in a FastAPI app, and added a Streamlit UI on top of it.

This transforms the AI agent from a standalone script into a reusable service. Instead of only working in a terminal, it can now be accessed through a simple HTTP endpoint by other apps, scripts, or internal tools.

By assigning each session a unique ID, the service can also maintain separate conversation history for multiple users, making it possible to support a chat-style interface with isolated memory per session.

From here, you can continue extending the same service by adding authentication or production-ready features. Happy tinkering!

If you enjoyed this tutorial, you can find more of my writing on my blog (recent posts include a system design paper series), my work on my personal website, and updates on LinkedIn.

How to Use Apple’s Foundation Models in a Web App with a macOS Companion

Balogun Wahab — Mon, 20 Jul 2026 21:27:11 +0000

Not every AI feature needs a cloud model, with its per-token bills, network round-trips, and private data leaving your machine. If you're on a modern Mac, a capable language model is already on your disk.

Foundation Models is Apple's Swift framework for working with large language models. It can sit in front of the on-device model behind Apple Intelligence, Apple's Private Cloud Compute, or another provider's server model.

This tutorial targets the on-device model: you send it a prompt and it runs entirely on the Mac's own hardware. It's local, free per call, and offline-friendly.

Paired with Apple Vision for reading images on device, that's enough to build real AI features like summaries, classification, and structured extraction without the data ever leaving your machine.

What You Will Build
Prerequisites
Why a macOS Companion App?
Foundation Models Can't Read Images Directly
Project Structure
Build the React App
Build the macOS Companion App
Check Foundation Models Availability
Extract Text with Apple Vision
Ask Foundation Models to Explain the Vision Output
Return JSON to the Browser
Run the App
Conclusion
Resources

What You Will Build

You'll build Vision Bridge, a web app that sends an image to a local macOS companion. The companion reads the image with Apple Vision, reasons about it with Foundation Models, and returns structured JSON to the browser: private, on-device AI behind a plain web interface.

You can find the complete source code in this GitHub repository: github.com/03balogun/vision-bridge.

The goal isn't to build a giant product but rather to understand the architecture behind how this works.

Vision Bridge has two parts:

A React app with a split-screen interface.
A macOS companion app that exposes a local API.

The React app has:

An image upload area
An image preview
Automatic analysis after upload
A JSON output viewer
A companion health status indicator

The macOS companion app has:

GET /v1/health
POST /v1/analyze-image
Apple Vision OCR
Foundation Models availability checks
Foundation Models reasoning over Vision output

The final response looks like this:

{
  "support": {
    "visionAvailable": true,
    "foundationModelAvailable": true,
    "foundationModelStatus": "available"
  },
  "image": {
    "filename": "screenshot.png",
    "contentType": "image/png",
    "byteCount": 1048576,
    "width": 1440,
    "height": 900
  },
  "vision": {
    "detectedText": [
      {
        "text": "Build failed",
        "confidence": 0.96,
        "boundingBox": {
          "x": 0.12,
          "y": 0.31,
          "width": 0.45,
          "height": 0.08
        }
      }
    ]
  },
  "model": {
    "summary": "The image appears to show a software build failure.",
    "description": "A developer tool window is showing an error state with diagnostic text.",
    "suggestedTags": ["screenshot", "developer-tool", "error"],
    "possibleUses": [
      "Generate alt text",
      "Summarize screenshots",
      "Extract document data"
    ]
  }
}

Prerequisites

To follow along, you need:

macOS 26 or newer
Xcode with the macOS 26 SDK
Node.js 20 or newer
Basic React knowledge
Basic Swift knowledge
A Mac that supports Apple Intelligence

Foundation Models availability depends on the Mac, the OS version, and Apple Intelligence settings. The companion checks this at runtime, which we'll cover below.

Why a macOS Companion App?

You can't write this in a regular React app:

import FoundationModels from "apple-frameworks";

That API doesn't exist in the browser. A native macOS app, however, can use any Apple framework, so the companion acts as a local bridge. The same pattern works for any native capability the web platform doesn't expose.

Foundation Models Can't Read Images Directly

The public Foundation Models framework is a language model interface. It doesn't currently expose direct image input the way a multimodal cloud model might, so this tutorial never sends the image to the model. Instead, the companion feeds the Vision OCR observations and image metadata into the prompt. The model reasons over structured text, never the original pixels.

That split plays to each framework's strength: Vision is excellent at pulling machine-readable information out of images, and Foundation Models turns that information into summaries, labels, explanations, and structured output.

The above diagram shows the round trip that the rest of this tutorial builds. The browser sends the uploaded image as base64 JSON over localhost to the Swift companion. Inside the companion, Apple Vision runs OCR on the image and produces text observations: the recognized strings, their confidence scores, and their bounding boxes.

Those observations, not the image itself, are formatted into a prompt for Foundation Models, which generates a summary, description, and tags. The companion then bundles the Vision output and the model output into one JSON response and returns it to the browser.
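
The "observations into a prompt" step is easy to picture in isolation. The sketch below is in Python rather than the companion's Swift, and everything in it is an assumption made for illustration: the `build_prompt` name, the confidence threshold, and the prompt wording. It only shows the general shape of turning OCR output into text a language model can read.

```python
def build_prompt(filename, width, height, detected_text, min_confidence=0.5):
    """Turn OCR observations into plain text a language model can read."""
    lines = [
        f"Image: {filename} ({width}x{height} pixels)",
        "Text found by OCR:",
    ]
    # Drop low-confidence strings so noise does not mislead the model.
    kept = [item for item in detected_text if item["confidence"] >= min_confidence]
    if not kept:
        lines.append("- (no readable text)")
    for item in kept:
        lines.append(f'- "{item["text"]}" (confidence {item["confidence"]:.2f})')
    lines.append("Summarize what this image most likely shows.")
    return "\n".join(lines)


observations = [
    {"text": "Build failed", "confidence": 0.96},
    {"text": "~~|", "confidence": 0.21},
]
prompt = build_prompt("screenshot.png", 1440, 900, observations)
print(prompt)
```

The model never sees pixels here, only the filename, dimensions, and recognized strings, which matches the split between Vision and Foundation Models described above.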

Project Structure

Create a project with this structure:

vision-bridge/
  apps/
    web/
      src/
        main.tsx
        styles.css
      package.json
      vite.config.ts
    macos-companion/
      Package.swift
      Sources/
        VisionBridgeCompanion/
          main.swift
  package.json
  README.md

The root package.json gives us a few convenient commands:

{
  "scripts": {
    "dev": "npm --workspace apps/web run dev",
    "build": "npm --workspace apps/web run build",
    "companion": "swift run --package-path apps/macos-companion VisionBridgeCompanion"
  },
  "workspaces": ["apps/web"]
}

Build the React App

The web app is intentionally simple. It has one job: let the user pick an image and show the JSON returned by the companion.

The web app uses Vite, React, Lucide icons, and a JSON viewer:

{
  "dependencies": {
    "@vitejs/plugin-react": "^6.0.3",
    "lucide-react": "^0.468.0",
    "react": "^18.3.1",
    "react-dom": "^18.3.1",
    "react-json-view-lite": "^2.5.0",
    "vite": "^8.1.3"
  }
}

After defining the dependencies, install them:

npm install

The API base URL points to the local companion:

const API_BASE_URL = "http://127.0.0.1:43119";

Check Companion Health

The web app pings the companion so the UI can show whether the native bridge is online:

async function checkHealth() {
  setHealthError(null);

  try {
    const response = await fetch(`${API_BASE_URL}/v1/health`);
    if (!response.ok) {
      throw new Error(`Health check failed with ${response.status}`);
    }

    const payload = await response.json();
    setHealth(payload);
  } catch (error) {
    setHealth(null);
    setHealthError(error instanceof Error ? error.message : "Companion unavailable");
  }
}

Convert the Image to Base64

When the user selects a file, the app converts it to base64 so it can be sent as JSON:

function readFileAsBase64(file: File): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      const result = String(reader.result);
      resolve(result.includes(",") ? result.split(",")[1] : result);
    };
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

This isn't the only way to upload files. You could also use multipart/form-data, but JSON keeps the demo easy to inspect.
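
To see what that payload looks like on the receiving end, here is a small Python sketch of the same prefix-stripping logic plus a strict decode. The companion does this work in Swift. The helper names below are hypothetical, and the eight PNG signature bytes stand in for a real image.

```python
import base64


def strip_data_url_prefix(value: str) -> str:
    """Return only the base64 payload from a data URL, or the input as-is."""
    # Mirrors the browser code: keep what follows the first comma, if any.
    return value.split(",")[1] if "," in value else value


def decode_image(value: str) -> bytes:
    """Decode the payload, rejecting characters outside the base64 alphabet."""
    return base64.b64decode(strip_data_url_prefix(value), validate=True)


png_magic = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file
data_url = "data:image/png;base64," + base64.b64encode(png_magic).decode("ascii")

payload = strip_data_url_prefix(data_url)
print(len(png_magic), "bytes ->", len(payload), "base64 characters")
```

Base64 encodes every three bytes as four characters, so the JSON body is roughly a third larger than the original file. That is worth remembering if you add a size limit on the server side.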

Analyze Immediately After Upload

The app starts analysis as soon as an image is uploaded:

async function handleFile(file: File) {
  if (!file.type.startsWith("image/")) {
    setError("Choose a PNG, JPEG, HEIC, or another browser-readable image.");
    return;
  }

  const base64 = await readFileAsBase64(file);
  const nextImage = {
    file,
    previewUrl: URL.createObjectURL(file),
    base64,
  };

  setSelectedImage(nextImage);
  setAnalysis(null);
  setError(null);
  setCopied(false);

  analyzeImage(nextImage);
}

handleFile does the preparation work for every new image. It rejects anything that isn't a browser-readable image, converts the file to base64, and builds a single object holding everything the rest of the flow needs: the original File (for its name and MIME type), an object URL for the preview, and the base64 payload for the API call.

It then clears out the previous run (the old analysis, any error message, and the "copied" indicator) so the UI never shows results from the last image next to a new one. Finally, it kicks off analyzeImage(nextImage) immediately.

Note that it passes the fresh object directly instead of relying on the selectedImage state: React state updates don't apply until the next render, so reading the state here would still give you the previous image.

The Analyze button still exists in the UI, but it works as a manual rerun button.

Send the Image to the Companion

Here's the core request:

const analysisRequestId = useRef(0);

async function analyzeImage(image = selectedImage) {
  if (!image) {
    setError("Choose an image first.");
    return;
  }

  const requestId = analysisRequestId.current + 1;
  analysisRequestId.current = requestId;

  setRequestState("loading");
  setError(null);
  setCopied(false);

  try {
    const response = await fetch(`${API_BASE_URL}/v1/analyze-image`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        filename: image.file.name,
        mimeType: image.file.type || "application/octet-stream",
        base64: image.base64,
      }),
    });

    const payload = await response.json();

    if (requestId !== analysisRequestId.current) {
      return;
    }

    if (!response.ok) {
      throw new Error(payload.error?.message ?? `Analysis failed with ${response.status}`);
    }

    setAnalysis(payload);
    setRequestState("success");
  } catch (error) {
    if (requestId !== analysisRequestId.current) {
      return;
    }

    setRequestState("error");
    setError(error instanceof Error ? error.message : "Could not analyze image");
  }
}

This function is the entire client side of the bridge. It flips requestState to loading (which drives the spinner and disables the button), then sends a POST to /v1/analyze-image with a JSON body containing three fields: the filename, the MIME type, and the base64 image data. That body maps one-to-one onto the AnalyzeImageRequest struct the Swift companion decodes later.

Notice that the response is parsed as JSON before checking response.ok. That's deliberate: when the companion rejects a request (bad base64, oversized image), it still returns a JSON body with an error.message field, so the UI can show the companion's own explanation instead of a generic status code. On success, the payload goes straight into state, and the JSON viewer re-renders with the result.

The requestId bookkeeping guards against stale responses. If a user uploads a second image while the first is still analyzing, whichever request finishes last would win, and OCR plus model generation takes long enough that responses can genuinely arrive out of order. So every call increments a counter stored in a ref and remembers its own ID.

After the await, it checks whether it's still the newest request; if a newer upload started in the meantime, the older response is silently discarded instead of overwriting the latest image's result. The same check runs in the catch block, so an old failure can't clobber a newer success either. If you also want to cancel the in-flight HTTP request rather than just ignore its result, an AbortController is the natural next step.
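
The stale-response guard is not specific to React. The sketch below reproduces the same counter pattern in Python with asyncio, using a hypothetical `LatestOnly` class and sleep calls in place of real network requests, so you can watch an older, slower request lose to a newer one.

```python
import asyncio


class LatestOnly:
    """Accept a result only if it belongs to the newest request."""

    def __init__(self):
        self.current_id = 0
        self.result = None

    async def analyze(self, name, delay):
        self.current_id += 1
        request_id = self.current_id  # remember which request this call is
        await asyncio.sleep(delay)  # stand-in for the network round trip
        if request_id != self.current_id:
            return  # a newer request started; discard this stale response
        self.result = f"analysis of {name}"


async def main():
    state = LatestOnly()
    # The first upload is slow and the second is fast,
    # so they finish out of order.
    await asyncio.gather(
        state.analyze("first.png", 0.05),
        state.analyze("second.png", 0.01),
    )
    return state.result


print(asyncio.run(main()))
```

Without the ID check, the slow first request would finish last and overwrite the result for the second image.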

Render the JSON Output

The output pane renders the response with react-json-view-lite.

Build the macOS Companion App

The companion is a Swift command-line app. It exposes a small local HTTP API.

If you come from the web side, the mapping is simple: Swift Package Manager is Swift's npm, Package.swift is its package.json, and swift run is its npm start. It ships with Xcode, so there's nothing extra to install.

The Package.swift file looks like this:

// swift-tools-version: 6.0

import PackageDescription

let package = Package(
    name: "VisionBridgeCompanion",
    platforms: [
        .macOS("26.0")
    ],
    products: [
        .executable(
            name: "VisionBridgeCompanion",
            targets: ["VisionBridgeCompanion"]
        )
    ],
    targets: [
        .executableTarget(
            name: "VisionBridgeCompanion"
        )
    ]
)

The companion imports the Apple frameworks it needs:

import Foundation
import FoundationModels
import ImageIO
import Network
import Vision

It listens on 127.0.0.1:43119:

private let defaultPort: UInt16 = 43119

The app exposes two routes:

switch (request.method, request.path) {
case ("GET", "/v1/health"):
    let health = HealthResponse(support: ModelSupport.current)
    return try json(health)

case ("POST", "/v1/analyze-image"):
    let payload = try JSONDecoder().decode(AnalyzeImageRequest.self, from: request.body)
    let response = try await service.analyze(payload)
    return try json(response)

default:
    return try json(
        ErrorResponse(error: APIErrorPayload(message: "Route not found")),
        status: .notFound
    )
}

This switch is the companion's entire routing layer — no web framework, just pattern matching on the method and path.

The two routes split the work cleanly:

GET /v1/health is the cheap, read-only route. It runs no analysis, it just reports whether Vision and Foundation Models are usable on this Mac via ModelSupport.current (covered in the next section). The React app calls it on load to render the online/offline status pill, so the user knows the bridge is up before they upload anything.
POST /v1/analyze-image is where the real work happens. It decodes the request body into an AnalyzeImageRequest (with the same filename, mimeType, and base64 fields the browser sent) and hands it to the analysis service. This validates the image, runs Vision OCR, prompts Foundation Models, and returns the combined result. The try await matters here: analysis is asynchronous, and the route simply waits for it before serializing the response.

Anything else falls through to a JSON 404, so even unknown routes respond in the same format the browser already knows how to parse.

Errors work the same way: thrown errors are caught in one place and converted into JSON error responses with an appropriate status code, which is exactly what the web app's payload.error?.message check reads.

One practical detail: because the browser calls the companion from a different origin (the Vite dev server), every response also carries CORS headers, and the router answers preflight OPTIONS requests with an empty 204. Without that, the browser would block the fetch before it ever reached these routes.

Check Foundation Models Availability

The companion shouldn't assume that the model is available. Check it first:

private struct ModelSupport: Encodable {
    let visionAvailable: Bool
    let foundationModelAvailable: Bool
    let foundationModelStatus: String

    static var current: ModelSupport {
        let model = SystemLanguageModel.default

        switch model.availability {
        case .available:
            return ModelSupport(
                visionAvailable: true,
                foundationModelAvailable: true,
                foundationModelStatus: "available"
            )

        case .unavailable(let reason):
            return ModelSupport(
                visionAvailable: true,
                foundationModelAvailable: false,
                foundationModelStatus: "unavailable.\(reason.description)"
            )

        @unknown default:
            return ModelSupport(
                visionAvailable: true,
                foundationModelAvailable: false,
                foundationModelStatus: "unavailable.unknown"
            )
        }
    }
}

A user might have an unsupported Mac, Apple Intelligence might be disabled, or the model might not be ready yet. The response tells the browser which case it's dealing with.

Extract Text with Apple Vision

The companion decodes the base64 image, checks its metadata, then runs Vision OCR.

Here's the text recognition flow:

private func recognizeText(in imageData: Data) async throws -> [DetectedText] {
    var request = RecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.automaticallyDetectsLanguage = true
    request.usesLanguageCorrection = true

    let observations = try await request.perform(on: imageData)

    var detectedText: [DetectedText] = []

    for observation in observations {
        guard let candidate = observation.topCandidates(1).first else {
            continue
        }

        let bounds = NormalizedBox.from(points: [
            observation.topLeft,
            observation.topRight,
            observation.bottomRight,
            observation.bottomLeft
        ])

        detectedText.append(DetectedText(
            text: candidate.string,
            confidence: Double(candidate.confidence),
            boundingBox: bounds
        ))
    }

    return detectedText
}

Vision gives us structured observations:

recognized text
confidence scores
normalized bounding boxes

Those observations become the model’s context.

Ask Foundation Models to Explain the Vision Output

Now the companion creates a prompt from the image metadata and OCR results.

Notice the instruction:

You cannot see the original image. Use only the metadata and OCR observations below.

That keeps the model honest. It shouldn't pretend to see pixels it never received.

Here's the prompt shape:

let textPreview = detectedText
    .prefix(30)
    .map { "- \($0.text) (confidence: \(String(format: "%.2f", $0.confidence)))" }
    .joined(separator: "\n")

let prompt = """
You are summarizing Apple Vision OCR output for a developer tool named Vision Bridge.
You cannot see the original image. Use only the metadata and OCR observations below.

Image:
- filename: \(image.filename)
- content type: \(image.contentType)
- size: \(image.width ?? 0)x\(image.height ?? 0)

OCR observations:
\(textPreview.isEmpty ? "- No text detected." : textPreview)

Return a compact JSON object with these exact keys:
summary: one sentence
description: one short paragraph
suggestedTags: 3 to 6 short tags
possibleUses: 3 to 5 practical use cases for this kind of image analysis
"""

Then call the model:

let session = LanguageModelSession(
    model: .default,
    instructions: "Return valid JSON only. Do not include Markdown fences."
)

let response = try await session.respond(to: prompt)
let raw = response.content.trimmingCharacters(in: .whitespacesAndNewlines)

Even when you ask for JSON, always validate the output. Models can still return Markdown fences or malformed text. The sample app strips simple Markdown code fences and falls back to a raw response if parsing fails.

Return JSON to the Browser

The companion combines the support state, image metadata, Vision results, and model output:

return AnalyzeImageResponse(
    support: support,
    image: metadata,
    vision: VisionPayload(detectedText: detectedText),
    model: modelInsight
)

The browser doesn't need to know how Vision or Foundation Models work. It just receives JSON. The native app owns the native capabilities, while the web app owns the interface.

It's worth pausing on what each of the four blocks actually gives you, because they're not all the same kind of data:

support tells you what was possible on this Mac. If foundationModelAvailable is false, the model block still exists but contains a fallback message rather than real analysis, and the foundationModelStatus string (for example, unavailable.appleIntelligenceNotEnabled) tells the UI why, so it can explain rather than silently degrade.
image echoes back the file's metadata plus the measured pixel dimensions. It's useful as a sanity check, and you need the width and height to do anything spatial with the Vision results.
vision is the ground truth. Each entry in detectedText is a string Vision actually found, with a confidence score between 0 and 1 and a normalized bounding box: coordinates expressed as fractions of the image size, so x: 0.12, width: 0.45 means "starts 12% from the left and spans 45% of the width." Because the boxes are normalized, you can draw highlight overlays on the preview at any display size by multiplying by the rendered dimensions. Low-confidence entries are worth filtering or flagging before you trust them.
model is interpretation, not observation. The summary, description, suggestedTags, and possibleUses fields are generated by the language model from the OCR text. They're useful as alt text, captions, or tag suggestions, but they inherit whatever the OCR missed and should be treated as a draft, not a fact. When the model's output can't be parsed as JSON, rawResponse carries the unparsed text so nothing is lost.

For a screenshot of a failed build, the model block might come back like this:

{
  "model": {
    "summary": "The image appears to show a software build failure.",
    "description": "A developer tool window is showing an error state with diagnostic text.",
    "suggestedTags": ["screenshot", "developer-tool", "error"],
    "possibleUses": [
      "Generate alt text",
      "Summarize screenshots",
      "Extract document data"
    ]
  }
}

That combination (exact text with positions from Vision, plus a human-readable interpretation from the model) is enough to build real features on: a searchable screenshot library indexed by detectedText and suggestedTags, automatic alt text for uploaded images, or click-to-highlight overlays powered by the bounding boxes.

And because the prompt lives in the companion, changing what comes back (say, extracting line items from receipts instead of tagging screenshots) is a prompt edit, not an architecture change.

Run the App

Start the companion:

npm run companion

In another terminal, start the web app:

npm run dev

Open the Vite URL:

http://127.0.0.1:5173

If that port is busy, Vite will choose another one.

The companion should be available at:

http://127.0.0.1:43119

You can test it directly:

curl http://127.0.0.1:43119/v1/health

Expected response:

{
  "app": "Vision Bridge Companion",
  "ok": true,
  "support": {
    "foundationModelAvailable": true,
    "foundationModelStatus": "available",
    "visionAvailable": true
  },
  "version": "0.1.0"
}

Conclusion

You now have a React interface that uploads an image, a Swift companion that analyzes it with Apple-native frameworks, and structured JSON flowing between them.

Vision Bridge is intentionally small, but the bridge itself is reusable. Once you have a trusted native companion, a web app can do more than send prompts to a remote model: it can ask the Mac to work with local context, use any Apple framework, and return structured data the browser can render, store, or sync.


How to Optimize Enterprise Application Performance with T-SQL Query Tuning and Indexing Strategies

Gopinath Karunanithi — Mon, 20 Jul 2026 20:49:29 +0000

In this article, you'll learn how to optimize SQL Server performance using T-SQL query tuning, indexing strategies, execution plans, and real-world optimization techniques for enterprise applications.

Slow SQL queries are one of the biggest bottlenecks in enterprise applications. This guide demonstrates how to analyze execution plans, design effective indexes, rewrite inefficient T-SQL queries, optimize joins and aggregations, and monitor performance using SQL Server tools.

By working through several practical examples, you'll learn how to build faster, scalable, and more maintainable SQL Server workloads.

Introduction
Prerequisites
Why Query Performance Matters in Enterprise Applications
How SQL Server Executes Queries
Understanding Execution Plans
Common Execution Plan Operators
Finding Slow Queries
Writing Efficient WHERE Clauses
Optimizing JOIN Operations
Optimizing Aggregations
Common Table Expressions vs. Temporary Tables
Avoiding Common T-SQL Performance Anti-Patterns
Measuring Before and After Optimization
Monitoring Query Performance
Real-World Example: Optimizing a Reporting Query
When NOT to Optimize Prematurely
Best Practices for Enterprise T-SQL Optimization
Future Trends in SQL Performance Optimization
Conclusion

Introduction

Enterprise application performance often depends more on the database than the application itself. Whether you're building with ASP.NET Core, Java Spring Boot, or Node.js, inefficient database queries can lead to slow API responses, page load delays, timeout errors, and increased infrastructure costs.

While adding CPU, memory, or database replicas may temporarily improve performance, the root cause is often inefficient T-SQL queries, poorly designed indexes, outdated statistics, or suboptimal execution plans. Since the same queries may execute thousands of times per minute, even small optimizations can significantly reduce latency and resource consumption.

In enterprise environments, where databases often contain millions of records and support highly concurrent workloads, query tuning becomes essential for maintaining scalability and responsiveness.

In this article, you'll learn how SQL Server executes queries, how to analyze execution plans, optimize T-SQL, design effective indexing strategies, and apply practical techniques to improve database performance in real-world applications.

Prerequisites

To get the most from this tutorial, you should be familiar with:

Basic SQL and T-SQL syntax
Microsoft SQL Server fundamentals
Primary keys and foreign keys
Basic understanding of indexes
SQL Server Management Studio (SSMS) or Azure Data Studio
Basic knowledge of relational database concepts

Why Query Performance Matters in Enterprise Applications

Database performance directly affects every layer of an enterprise application. Even if the frontend is highly optimized and the application servers are properly scaled, slow database operations quickly become the limiting factor.

Consider a typical enterprise architecture:

Figure 1. High-Level Architecture Showing Client, ASP.NET Core API, Business Services, and SQL Server Database

Figure 1 illustrates the high-level architecture of a typical ASP.NET Core application. Client requests are received by the ASP.NET Core API, which serves as the application's entry point. The API forwards these requests to the Business Services layer, where the core business logic is executed. The Business Services layer then interacts with the SQL Server Database to retrieve or persist data. The sequential flow of requests through these components is represented by the arrows in the diagram.

Although this architecture separates responsibilities and improves maintainability, its overall performance is often constrained by the database. Every request that requires data eventually reaches the SQL Server database. If the database responds slowly, every upstream component (including the Business Services layer, the API, and ultimately the client) must wait for the query to complete.

Consider an order management system in which a dashboard displays customer information, recent orders, invoices, inventory levels, and shipment status. Loading this dashboard may require several independent database queries. Even if these queries run concurrently, the dashboard can't finish rendering until the slowest one returns. Consequently, even a small number of poorly optimized queries can significantly increase page load times and degrade the overall user experience.

As database size and application usage grow, performance issues often become increasingly apparent.

Common symptoms include:

APIs that gradually become slower as data grows
High CPU utilization on the SQL Server
Excessive disk I/O
Blocking between concurrent transactions
Deadlocks during peak usage
Timeout exceptions in application logs

Many of these problems originate from inefficient SQL rather than insufficient hardware.

For example, suppose a customer table contains ten million records. Searching for customers by email without an appropriate index forces SQL Server to examine every row.

SELECT *
FROM Customers
WHERE Email = 'john@example.com';

Without an index on the Email column, SQL Server performs a table scan, reading every page before locating the desired row.

Adding a properly designed index transforms the same query into an index seek, allowing SQL Server to locate the record almost immediately.
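As a minimal sketch of that fix, the statement below creates a nonclustered index on Email. The index name IX_Customers_Email is an assumption chosen for illustration, and the query assumes Customers has the CustomerID, Name, and Email columns used elsewhere in this article.

```sql
-- Hypothetical index name; adjust to your own naming convention.
CREATE NONCLUSTERED INDEX IX_Customers_Email
ON Customers (Email);

-- With the index in place, the optimizer can seek on Email
-- instead of reading every row.
SELECT CustomerID, Name, Email
FROM Customers
WHERE Email = 'john@example.com';
```

Because Name isn't part of this index, the plan will probably still fetch it from the base table, which is cheap when only one row matches.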

As enterprise datasets continue growing, these differences become increasingly significant.

How SQL Server Executes Queries

Understanding SQL Server's execution process is essential before attempting optimization.

Every query passes through several stages before data is returned.

Step 1: Parsing

SQL Server first validates the syntax.

SELECT Name
FROM Customers;

If the statement contains syntax errors, execution stops immediately.

Step 2: Binding

Next, SQL Server verifies that referenced tables, columns, functions, and objects exist.

For example,

SELECT CustomerName
FROM Customers;

If CustomerName doesn't exist, SQL Server reports an error before optimization begins.

Step 3: Query Optimization

The SQL Server Query Optimizer evaluates multiple possible execution strategies.

It estimates the cost of various approaches, including table scans, index seeks, different join algorithms, parallel execution, and sorting methods.

The optimizer chooses the plan with the lowest estimated cost based on available statistics.

Importantly, developers don't tell SQL Server how to execute a query. They specify what data they need.

Step 4: Execution Plan Generation

The optimizer then generates an execution plan.

The execution plan acts as a blueprint describing every operation required to satisfy the query.

For example:

SELECT *
FROM Orders
WHERE CustomerID = 1250;

Depending on available indexes, SQL Server may choose a Clustered Index Seek, a Nonclustered Index Seek, an Index Scan, or a Table Scan. Understanding these operators is the foundation of effective tuning.

Understanding Execution Plans

Execution plans reveal how SQL Server actually processes a query. Rather than guessing why a query performs poorly, execution plans identify the most expensive operations directly.

SQL Server provides two primary plan types:

Estimated Execution Plan: Generated without executing the query. It predicts the optimizer's chosen strategy using available statistics.
Actual Execution Plan: Generated after the query runs, showing the real execution path along with runtime statistics such as actual row counts. The cost percentages it displays are still the optimizer's estimates.

For performance tuning, the actual execution plan is generally more valuable because it exposes differences between estimated and actual behavior.

In SQL Server Management Studio, you can enable the actual execution plan by selecting Include Actual Execution Plan before running your query.
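If you prefer to capture plans from a script, both plan types can also be requested with SET options. The sketch below reuses the Orders query from earlier. GO is the batch separator understood by SSMS and sqlcmd, not a T-SQL statement, and SET SHOWPLAN_XML must be the only statement in its batch.

```sql
-- Estimated plan: the query is compiled but NOT executed.
SET SHOWPLAN_XML ON;
GO
SELECT * FROM Orders WHERE CustomerID = 1250;
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the query runs, and the plan is returned
-- together with runtime row counts.
SET STATISTICS XML ON;
GO
SELECT * FROM Orders WHERE CustomerID = 1250;
GO
SET STATISTICS XML OFF;
GO
```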

Common Execution Plan Operators

Understanding a handful of common operators makes execution plans much easier to interpret.

Table Scan

A table scan reads every row in a table.

Customers ──► Table Scan

This is acceptable for small lookup tables but becomes increasingly expensive as tables grow.

Index Scan

An index scan reads every entry within an index.

A nonclustered index is usually narrower than the table, so scanning it is often cheaper than a full table scan, but the operator still processes every index page.

Index Seek

An index seek navigates directly to matching rows.

CustomerID Index
│
▼
Index Seek

This is generally the most efficient access method for selective queries.

Nested Loop Join

Nested Loop joins perform well when one input contains relatively few rows.

Customers
│
▼
Nested Loop
▲
│
Orders

They're commonly used for OLTP workloads.

Hash Match

Hash joins excel when processing large datasets with no useful indexes. But keep in mind that they consume more memory and may spill to disk if insufficient memory is available.

Merge Join

Merge joins require sorted inputs but can process large result sets efficiently. They're often selected when both datasets are already indexed appropriately.

Key Lookup

One operator that frequently surprises developers is the Key Lookup.

Suppose an index contains only the CustomerID column, but the query also requests Address and PhoneNumber.

SQL Server first performs an Index Seek to locate matching rows, then executes additional lookups against the clustered index to retrieve missing columns.

Although acceptable for a few rows, thousands of key lookups can significantly degrade performance.

In many cases, creating a covering index eliminates these extra lookups entirely. This is a topic we'll explore later in the article.
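As a quick preview, a covering index for the example above might look like the following sketch. The index name is hypothetical, and the statement assumes Customers really has the Address and PhoneNumber columns described in the example.

```sql
-- Hypothetical index name. Address and PhoneNumber are stored only at the
-- leaf level of the index, so they don't widen the index key.
CREATE NONCLUSTERED INDEX IX_Customers_CustomerID_Contact
ON Customers (CustomerID)
INCLUDE (Address, PhoneNumber);

-- This query can now be answered from the index alone, with no Key Lookup.
SELECT CustomerID, Address, PhoneNumber
FROM Customers
WHERE CustomerID = 1250;
```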

Finding Slow Queries

SQL Server provides several built-in tools that help locate performance bottlenecks in production environments.

Using Query Store

Query Store records query history, execution plans, runtime statistics, and performance trends over time. Rather than relying on temporary monitoring sessions, it continuously captures valuable performance information, making it one of the most useful features for enterprise SQL Server deployments.

For example, if an application suddenly becomes slower after a deployment, Query Store can compare execution plans before and after the change to determine whether the optimizer selected a less efficient plan.

Typical metrics available include:

Average execution time
CPU consumption
Logical reads
Execution count
Query plan history

This historical view helps identify regressions that may otherwise be difficult to reproduce.

Using Dynamic Management Views (DMVs)

Dynamic Management Views expose internal SQL Server performance information while the server is running.

One commonly used DMV is sys.dm_exec_query_stats:

SELECT TOP 10
    qs.execution_count,
    qs.total_worker_time,
    qs.total_elapsed_time,
    -- Offsets are zero-based byte positions in Unicode text,
    -- while SUBSTRING is one-based and counts characters.
    SUBSTRING(
        qt.text,
        (qs.statement_start_offset / 2) + 1,
        (
            CASE
                WHEN qs.statement_end_offset = -1
                THEN DATALENGTH(qt.text)
                ELSE qs.statement_end_offset
            END - qs.statement_start_offset
        ) / 2 + 1
    ) AS QueryText
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY qs.total_worker_time DESC;

This query identifies statements consuming the most CPU time, helping prioritize optimization efforts.

Measuring I/O and Execution Time

SQL Server also provides lightweight commands for measuring query performance.

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

After enabling these options, executing a query displays additional information such as:

Logical reads
Physical reads
CPU time
Total elapsed time

Consider the following query:

SELECT *
FROM Orders
WHERE CustomerID = 1025;

The output might resemble:

Table 'Orders'.

Logical reads: 4832

SQL Server Execution Times:
CPU time = 215 ms

Elapsed time = 287 ms

After adding an appropriate index, the same query could produce:

Logical reads: 6

CPU time = 3 ms

Elapsed time = 5 ms

These measurements provide objective evidence that an optimization has improved performance.
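Put together, a before-and-after measurement could be scripted roughly as follows. This is a sketch for a test environment: the index name IX_Orders_CustomerID is an assumption, and the numbers you see will depend on your data.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Baseline measurement.
SELECT * FROM Orders WHERE CustomerID = 1025;

-- Hypothetical index name.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON Orders (CustomerID);

-- Same query, measured again. Compare the logical reads and
-- CPU time reported in the Messages tab.
SELECT * FROM Orders WHERE CustomerID = 1025;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```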

Writing Efficient WHERE Clauses

One of the simplest ways to improve query performance is to write SARGable predicates. A predicate is considered SARGable (Search ARGument-able) when SQL Server can use an index seek to locate the matching rows.

Many developers unintentionally prevent index usage by applying functions directly to indexed columns.

Consider this example:

SELECT *
FROM Orders
WHERE YEAR(OrderDate) = 2025;

Although the logic is correct, SQL Server must evaluate the YEAR() function for every row before performing the comparison. As a result, it can't efficiently seek into an index on OrderDate.

A better approach is to compare the column directly.

SELECT *
FROM Orders
WHERE OrderDate >= '2025-01-01'
AND OrderDate < '2026-01-01';

This version allows SQL Server to perform an index seek rather than scanning the entire table.
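When the year arrives as a parameter, the same range can be computed once, up front, rather than applied to the column. The variable names in this sketch are illustrative only.

```sql
DECLARE @Year INT = 2025;

-- Compute the boundaries once instead of calling YEAR() on every row.
DECLARE @StartDate DATE = DATEFROMPARTS(@Year, 1, 1);
DECLARE @EndDate DATE = DATEADD(YEAR, 1, @StartDate);

SELECT *
FROM Orders
WHERE OrderDate >= @StartDate
  AND OrderDate < @EndDate;
```

The half-open range (>= start, < end) also avoids missing rows timestamped late on December 31 when OrderDate stores a time component.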

Similarly, avoid implicit data type conversions.

Instead of:

WHERE CustomerID = '100'

prefer:

WHERE CustomerID = 100

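Matching the column's data type eliminates unnecessary conversions during query execution. The costly case is when SQL Server has to convert the column rather than the literal, for example when a VARCHAR column is compared with an integer: the conversion then runs for every row and can turn a seek into a scan.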
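To illustrate a conversion that lands on the column side, the sketch below assumes a hypothetical AccountNumber column declared as VARCHAR(20) and indexed. Neither the column nor its type comes from the schema used in this article.

```sql
-- Assumption: Customers.AccountNumber is VARCHAR(20) and indexed.

-- INT has higher data type precedence than VARCHAR, so SQL Server
-- converts AccountNumber for every row (shown as CONVERT_IMPLICIT in the plan).
SELECT CustomerID
FROM Customers
WHERE AccountNumber = 1001;

-- Comparing against a VARCHAR literal leaves the column untouched,
-- so an index seek remains possible.
SELECT CustomerID
FROM Customers
WHERE AccountNumber = '1001';
```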

Optimizing JOIN Operations

Enterprise applications rarely query a single table. Most business operations involve combining data from multiple related tables, making joins one of the most important optimization areas.

Consider an order management system:

SELECT
    c.Name,
    o.OrderDate,
    o.TotalAmount
FROM Customers c
INNER JOIN Orders o
    ON c.CustomerID = o.CustomerID;

When both CustomerID columns are indexed, SQL Server can efficiently join the tables.

But poor indexing often forces SQL Server to scan one or both tables, dramatically increasing execution time.

`EXISTS` vs. `IN`

Another common rewrite replaces IN with EXISTS for large subqueries.

Using IN:

SELECT *
FROM Customers
WHERE CustomerID IN (
    SELECT CustomerID
    FROM Orders
);

Using EXISTS:

SELECT *
FROM Customers c
WHERE EXISTS (
    SELECT 1
    FROM Orders o
    WHERE o.CustomerID = c.CustomerID
);

SQL Server frequently compiles both forms into the same semi-join plan, so compare the execution plans before assuming a gain. Where the plans do differ on large datasets, the correlated EXISTS form is often the more efficient one.

Eliminate Unnecessary Joins

Sometimes queries include tables whose data is never used.

For example:

SELECT
    o.OrderID,
    c.Name
FROM Orders o
INNER JOIN Customers c
    ON o.CustomerID = c.CustomerID
INNER JOIN Regions r
    ON c.RegionID = r.RegionID;

If no columns from Regions are selected or filtered, and the join isn't needed to exclude customers that have no matching region, removing it reduces unnecessary work and simplifies the execution plan.

Optimizing Aggregations

Aggregations become increasingly expensive as datasets grow. Reporting systems frequently summarize millions of rows using functions such as SUM(), COUNT(), AVG(), and MAX().

A straightforward aggregation might look like this:

SELECT
    CustomerID,
    SUM(TotalAmount)
FROM Orders
GROUP BY CustomerID;

Although simple, performance depends heavily on indexing and data distribution.

If the query repeatedly scans millions of rows, consider creating an index on CustomerID.
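One possible shape for that index is sketched below. The index name is hypothetical, and including TotalAmount is a design choice that lets the aggregation read a narrow index instead of the full table.

```sql
-- Hypothetical index name.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_TotalAmount
ON Orders (CustomerID)
INCLUDE (TotalAmount);

SELECT
    CustomerID,
    SUM(TotalAmount) AS TotalSpent
FROM Orders
GROUP BY CustomerID;
```

Because the index is ordered by CustomerID, the optimizer may be able to aggregate the rows as it reads them without a separate sort. Check the actual plan to confirm.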

Window functions often provide a cleaner alternative to complex subqueries.

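For example, identifying each customer's most recent order means numbering the orders per customer and keeping only the first row:

WITH RankedOrders AS
(
    SELECT
        CustomerID,
        OrderDate,
        ROW_NUMBER() OVER (
            PARTITION BY CustomerID
            ORDER BY OrderDate DESC
        ) AS RowNum
    FROM Orders
)
SELECT
    CustomerID,
    OrderDate
FROM RankedOrders
WHERE RowNum = 1;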

Window functions allow SQL Server to calculate rankings and running totals without complicated self-joins.
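A running total per customer can be sketched with SUM() as a window function. It uses only the Orders columns that already appear in this article.

```sql
SELECT
    CustomerID,
    OrderDate,
    TotalAmount,
    SUM(TotalAmount) OVER (
        PARTITION BY CustomerID
        ORDER BY OrderDate
        -- Explicit ROWS frame: accumulate row by row up to the current row.
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS RunningTotal
FROM Orders;
```

Spelling out the ROWS frame is deliberate. When ORDER BY is present without a frame, the default is a RANGE frame, which groups rows sharing the same OrderDate together and is typically slower.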

Whenever possible, avoid unnecessary sorting operations, since sorting large result sets consumes considerable CPU and memory.

Common Table Expressions vs. Temporary Tables

Both Common Table Expressions (CTEs) and temporary tables help simplify complex queries, but they serve different purposes.

A CTE provides a readable way to structure intermediate query logic.

WITH RecentOrders AS
(
    SELECT *
    FROM Orders
    WHERE OrderDate >= DATEADD(DAY, -30, GETDATE())
)
SELECT *
FROM RecentOrders;

CTEs improve readability and maintainability but aren't materialized automatically. SQL Server may execute the underlying logic multiple times depending on the execution plan.

Temporary tables, on the other hand, physically store intermediate results.

SELECT *
INTO #RecentOrders
FROM Orders
WHERE OrderDate >= DATEADD(DAY, -30, GETDATE());

SELECT *
FROM #RecentOrders;

Temporary tables become particularly useful when:

Intermediate results are reused multiple times
Large datasets need additional indexing
Complex joins benefit from breaking queries into stages

Choosing between the two depends on workload characteristics rather than personal preference.
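The sketch below shows the situation where a temporary table tends to pay off: the intermediate result is indexed once and then reused by two queries. The index name is hypothetical, and the column list assumes the Orders and Customers columns used elsewhere in this article.

```sql
SELECT OrderID, CustomerID, OrderDate, TotalAmount
INTO #RecentOrders
FROM Orders
WHERE OrderDate >= DATEADD(DAY, -30, GETDATE());

-- Hypothetical index name; supports the grouping and the join below.
CREATE CLUSTERED INDEX IX_RecentOrders_CustomerID
ON #RecentOrders (CustomerID);

-- First reuse: totals per customer.
SELECT CustomerID, SUM(TotalAmount) AS RecentTotal
FROM #RecentOrders
GROUP BY CustomerID;

-- Second reuse: join back to Customers without re-reading Orders.
SELECT c.Name, r.OrderDate, r.TotalAmount
FROM Customers c
INNER JOIN #RecentOrders r
    ON r.CustomerID = c.CustomerID;

DROP TABLE #RecentOrders;
```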

Avoiding Common T-SQL Performance Anti-Patterns

Many performance issues stem from common coding habits rather than complex database problems.

Avoid `SELECT *`

Fetching every column increases network traffic, memory consumption, and I/O.

Instead of this:

SELECT *
FROM Customers;

Retrieve only the required columns:

SELECT
    CustomerID,
    Name,
    Email
FROM Customers;

This reduces both data transfer and execution costs.

Avoid Scalar Functions in `WHERE` Clauses

Wrapping an indexed column in a scalar function forces SQL Server to evaluate the function for every row, which prevents an index seek on that column.

Instead of:

WHERE UPPER(Name) = 'JOHN'

store normalized values or use case-insensitive collations where appropriate.
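One way to store a normalized value is a persisted computed column with its own index, as in this sketch. The column name NameUpper and the index name are assumptions introduced for illustration.

```sql
-- Hypothetical column: SQL Server maintains it automatically from Name.
ALTER TABLE Customers
ADD NameUpper AS UPPER(Name) PERSISTED;

-- Hypothetical index name.
CREATE NONCLUSTERED INDEX IX_Customers_NameUpper
ON Customers (NameUpper);

-- The predicate now compares an indexed column directly.
SELECT CustomerID, Name
FROM Customers
WHERE NameUpper = 'JOHN';
```

If the Name column already uses a case-insensitive collation, none of this is needed: a plain WHERE Name = 'john' matches regardless of case and stays SARGable.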

Avoid Cursors for Row-by-Row Processing

Cursors process records sequentially.

DECLARE CustomerCursor CURSOR
FOR
SELECT CustomerID
FROM Customers;

Although sometimes necessary, cursor-based solutions rarely scale well for enterprise workloads.

Most cursor logic can be rewritten using set-based operations.

For example, instead of updating rows one at a time inside a cursor loop, a single UPDATE statement handles every matching row:

UPDATE Customers
SET Status = 'Active'
WHERE LastLogin >= DATEADD(DAY, -30, GETDATE());

SQL Server processes the entire set efficiently rather than iterating row by row.
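For contrast, here is a sketch of the row-by-row pattern that the set-based UPDATE replaces. It produces the same end result but issues one UPDATE per customer. The cursor options and variable name are illustrative choices.

```sql
DECLARE @CustomerID INT;

DECLARE CustomerCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT CustomerID
    FROM Customers
    WHERE LastLogin >= DATEADD(DAY, -30, GETDATE());

OPEN CustomerCursor;
FETCH NEXT FROM CustomerCursor INTO @CustomerID;

-- One UPDATE statement per row: this is the pattern to avoid.
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Customers
    SET Status = 'Active'
    WHERE CustomerID = @CustomerID;

    FETCH NEXT FROM CustomerCursor INTO @CustomerID;
END;

CLOSE CustomerCursor;
DEALLOCATE CustomerCursor;
```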

Reduce Correlated Subqueries

Correlated subqueries are logically evaluated once for each outer row, and the optimizer can't always flatten them into a join.

For example:

SELECT
    CustomerID,
    (
        SELECT COUNT(*)
        FROM Orders o
        WHERE o.CustomerID = c.CustomerID
    ) AS OrderCount
FROM Customers c;

Rewriting this using joins and aggregation often produces more efficient execution plans.

SELECT
    c.CustomerID,
    COUNT(o.OrderID) AS OrderCount
FROM Customers c
LEFT JOIN Orders o
    ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID;

The rewritten version allows SQL Server to process the data in a single pass rather than executing thousands of nested queries.

Measuring Before and After Optimization

Effective tuning always follows the same cycle:

Measure the original query using Query Store or SET STATISTICS.
Analyze the execution plan.
Identify expensive operators such as scans, sorts, or key lookups.
Apply one targeted optimization, such as rewriting the query or adding an index.
Measure again using the same workload.

This iterative approach ensures that every optimization is evidence-based rather than relying on assumptions. In enterprise environments, even small improvements to frequently executed queries can significantly reduce CPU usage, disk I/O, and response times.

Monitoring Query Performance

Query tuning is an ongoing process rather than a one-time optimization effort. As enterprise databases grow, data distributions change, indexes fragment, and application workloads evolve. Queries that once performed well may gradually become inefficient.

SQL Server provides several built-in tools for identifying performance issues.

Query Store

Query Store records query history, execution statistics, execution plans, and runtime information.

It helps answer questions such as:

Which queries consume the most CPU?
Which execution plans changed recently?
Which query became slower after deployment?
Which queries have several plans with very different runtimes?

Enable Query Store:

ALTER DATABASE SalesDB
SET QUERY_STORE = ON;
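Beyond simply switching it on, Query Store accepts configuration options in the same statement. The values below are illustrative starting points rather than recommendations, so size them for your own workload.

```sql
ALTER DATABASE SalesDB
SET QUERY_STORE (
    OPERATION_MODE = READ_WRITE,          -- collect data, not just expose it
    QUERY_CAPTURE_MODE = AUTO,            -- skip insignificant ad hoc queries
    MAX_STORAGE_SIZE_MB = 1024,           -- example size limit
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30)
);

-- Confirm the current state and how much space is in use.
SELECT
    actual_state_desc,
    current_storage_size_mb,
    max_storage_size_mb
FROM sys.database_query_store_options;
```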

View the top resource-consuming queries (durations and CPU times are reported in microseconds):

SELECT TOP (10)
    qt.query_sql_text,
    rs.avg_duration,
    rs.avg_cpu_time
FROM sys.query_store_query_text qt
JOIN sys.query_store_query q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs
    ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;

Instead of relying on user complaints, administrators can proactively detect regressions before they affect production workloads.

Dynamic Management Views (DMVs)

SQL Server exposes runtime statistics through Dynamic Management Views.

Example:

SELECT TOP 10
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count AS AvgTime,
    st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY AvgTime DESC;

This query highlights expensive SQL statements currently cached by SQL Server.

Actual Execution Plans

Execution plans remain one of the most valuable tuning tools.

When reviewing plans, look for:

Table scans
Index scans on large tables
Key lookups
Sort operators
Hash Match operations
Missing index recommendations
Large memory grants

The graphical execution plan often pinpoints the exact operator responsible for poor performance.
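The missing index recommendations shown in plans are also collected server-wide in a set of DMVs. The query below is a rough sketch for listing the suggestions the optimizer has recorded since the last restart. The ordering expression is just one simple way to rank them.

```sql
SELECT TOP (10)
    d.statement AS TableName,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    gs.user_seeks,
    gs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats gs
JOIN sys.dm_db_missing_index_groups g
    ON gs.group_handle = g.index_group_handle
JOIN sys.dm_db_missing_index_details d
    ON g.index_handle = d.index_handle
-- Rough priority: how often the index would have been used
-- multiplied by the estimated percentage improvement.
ORDER BY gs.user_seeks * gs.avg_user_impact DESC;
```

Treat the output as hints, not instructions. The suggestions ignore the write cost of each index and often overlap with one another.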

Real-World Example: Optimizing a Reporting Query

Consider an enterprise reporting system that generates monthly sales summaries.

Original query:

SELECT
    CustomerName,
    SUM(TotalAmount)
FROM Orders
WHERE YEAR(OrderDate) = 2025
GROUP BY CustomerName;

Although simple, this query performs poorly because YEAR() prevents index seeks and the entire Orders table must be scanned.

Rewrite it like this:

SELECT
    CustomerName,
    SUM(TotalAmount)
FROM Orders
WHERE OrderDate >= '2025-01-01'
AND OrderDate < '2026-01-01'
GROUP BY CustomerName;

Then create an appropriate index:

CREATE INDEX IX_Orders_OrderDate
ON Orders(OrderDate)
INCLUDE (CustomerName, TotalAmount);

Performance improvements may include:

Index Seek instead of Table Scan
Lower logical reads
Reduced CPU utilization
Faster execution time
Better scalability under concurrent reporting workloads

This illustrates that query rewriting and indexing can produce larger gains than simply adding hardware, because they reduce the amount of work each execution has to perform.

When NOT to Optimize Prematurely

Performance optimization should be driven by evidence, not assumptions. Premature or unnecessary tuning can increase complexity, make queries harder to maintain, and sometimes even reduce overall system performance.

Before making changes, use tools such as Query Store, execution plans, and SQL Server DMVs to identify the actual bottlenecks.

Avoid Optimizing Without Profiling

Don't rewrite queries simply because they look inefficient. Measure execution time, logical reads, CPU usage, and execution plans first so that optimization efforts target real performance problems rather than perceived ones.

Don't Create Indexes for Every Query

While indexes can dramatically improve read performance, every additional index increases storage requirements and slows INSERT, UPDATE, and DELETE operations. Create indexes only for frequently executed queries that demonstrate a measurable benefit.

Don't Force Query Hints Unnecessarily

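Query hints such as OPTION (FORCE ORDER) or OPTION (RECOMPILE) override choices that SQL Server's optimizer would otherwise make. Use them only after careful testing, because they may solve one problem while causing performance regressions elsewhere. OPTION (RECOMPILE), for example, compiles a fresh plan on every execution, which can help a parameter-sensitive query but adds compilation cost each time it runs:

SELECT
    CustomerName,
    TotalAmount
FROM Orders
WHERE CustomerID = @CustomerID
OPTION (RECOMPILE);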

Don't Over-Normalize or De-Normalize Without Evidence

Highly normalized schemas may require expensive joins, while excessive denormalization can introduce redundant data and update anomalies. Choose the appropriate design based on actual workload characteristics rather than assumptions.

Balance Read and Write Performance

An optimization that accelerates reporting queries may slow transactional workloads due to additional index maintenance. Always evaluate how tuning changes affect both read-heavy and write-heavy operations before deploying them to production.

Best Practices for Enterprise T-SQL Optimization

Successful tuning is about applying consistent engineering practices rather than isolated optimizations. We've already discussed some of these best practices, but I'll list them all here for review and completeness (and as a quick reference):

Design Indexes Around Queries

Indexes should reflect actual application workloads.

Instead of indexing every column, identify common WHERE clauses, frequently joined columns, ORDER BY columns, and GROUP BY columns.

Build indexes that support these operations efficiently.

Avoid Over-Indexing

More indexes are not always better.

Every INSERT, UPDATE, and DELETE operation must maintain every index.

Too many indexes increase storage, write latency, maintenance time, and fragmentation. Keep only indexes that provide measurable value.

Keep Statistics Updated

Outdated statistics lead to poor execution plans.

Update statistics regularly:

UPDATE STATISTICS Orders;

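Or refresh every statistic in the current database that has had row modifications since its last update:

EXEC sp_updatestats;

Plan-quality problems caused by stale row estimates often clear up once SQL Server has accurate distribution statistics.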

Monitor Index Fragmentation

Indexes become fragmented as data changes.

Check fragmentation:

SELECT
    index_id,
    index_type_desc,
    avg_fragmentation_in_percent,
    page_count
FROM sys.dm_db_index_physical_stats
(
    DB_ID(),
    OBJECT_ID('Orders'),
    NULL,
    NULL,
    'LIMITED'
);

Avoid SELECT *

Retrieve only required columns.

Instead of:

SELECT *
FROM Customers;

Use this:

SELECT CustomerID,
       CustomerName,
       Email
FROM Customers;

Benefits include smaller network payloads, better covering index usage, lower memory consumption, and reduced I/O.

Test with Production-Like Data

Queries that perform well on development databases containing thousands of rows may behave very differently against production systems with hundreds of millions of records.

Always validate execution plans, memory grants, CPU usage, parallelism, and logical reads using realistic datasets.

Future Trends in SQL Performance Optimization

1. Intelligent Query Processing

Modern versions of SQL Server include features such as adaptive query processing, memory grant feedback, and automatic plan correction. These capabilities allow the query optimizer to adjust execution strategies based on actual workload patterns, improving performance without manual tuning.

2. Cloud-Native Database Optimization

Cloud database platforms provide built-in capabilities such as automatic indexing recommendations, continuous performance monitoring, and self-tuning features. These services reduce administrative overhead while helping maintain consistent query performance as workloads grow.

3. AI-Assisted Performance Tuning

Artificial intelligence is becoming a valuable assistant for database optimization. AI-powered tools can analyze execution plans, recommend indexes, identify inefficient queries, and even suggest T-SQL rewrites, enabling developers to resolve performance issues earlier in the development lifecycle.

4. Performance Engineering by Default

Database optimization is shifting from reactive troubleshooting to proactive performance engineering. By incorporating query analysis, indexing reviews, and performance testing into CI/CD pipelines, teams can detect regressions before they reach production.

Strong Fundamentals Still Matter

Despite advances in automation, understanding execution plans, indexing strategies, query design, and statistics remains essential. Automated tools provide recommendations, but experienced developers and DBAs are still needed to validate trade-offs and ensure optimizations align with business requirements.

Conclusion

Effective T-SQL performance optimization isn't about applying isolated tricks or adding indexes indiscriminately. It requires understanding how SQL Server executes queries, accesses data, and chooses execution plans.

By combining efficient query design with well-planned indexing strategies, accurate statistics, and continuous monitoring, you can dramatically reduce latency, lower resource consumption, and improve scalability across enterprise applications.

Rather than waiting for performance problems to appear in production, teams should make query tuning a routine part of the development lifecycle. Regularly reviewing execution plans, monitoring workload patterns through Query Store, validating indexes against real application behavior, and testing with production-scale datasets creates a foundation for predictable and reliable database performance.

As enterprise systems continue to grow in complexity and data volume, organizations that treat performance optimization as an ongoing engineering discipline will be better equipped to deliver responsive, scalable, and cost-effective applications.

How to Build a Browser-Based PDF Signature Tool Using JavaScript

Bhavin Sheth — Mon, 20 Jul 2026 20:47:13 +0000

PDF documents are commonly used for agreements, forms, approvals, invoices, reports, applications, and other documents that may need a signature or additional text before they are shared.

A traditional workflow often involves printing the document, signing it by hand, scanning it again, and sending the new file. For a simple electronic signature, that process adds unnecessary steps.

In this tutorial, you'll build a browser-based PDF Signature Tool using JavaScript. Users will be able to upload a PDF, preview and navigate its pages, and add content directly to the document.

The application will support two main element types: Signature and Text/Stamp.

For signatures, users can draw directly in the browser, type their name and choose a signature style, or upload an existing signature image. For text-based elements, they can enter custom text or use preset stamps such as APPROVED, CONFIDENTIAL, DRAFT, and PAID.

After creating an element, users can position it on the PDF preview and adjust properties such as scale, rotation, opacity, font size, and color. The element can then be applied to the current page, every page, or a specific set of pages.

Once processing is complete, the application generates a new PDF for review. Users can preview the result, rename the output file, check its page count and file size, and download it directly from the browser.

The project uses PDF.js for document rendering and PDF-lib for modifying and generating the final PDF.

By the end of this tutorial, you'll understand how to build an interactive PDF editing workflow that combines canvas-based input, image embedding, text placement, coordinate conversion, page selection, and client-side file generation.

What We'll Cover:

What This PDF Signature Tool Can Do
Electronic Signatures vs Digital Signatures
How the Browser-Based Workflow Works
Project Setup
What Libraries Are We Using?
Uploading and Previewing the PDF
Choosing an Element to Add
Creating a Signature
Drawing a Signature
Typing a Signature
Uploading a Signature Image
Adding Text and Preset Stamps
Positioning and Styling the Element
Applying the Element to Selected Pages
Applying and Finalizing the PDF
Generating the Signed PDF
Previewing the Final PDF
Renaming and Downloading the Final PDF
Demo: How the PDF Signature Tool Works
Handling Signature Transparency
Important Notes and Common Mistakes
Conclusion

What This PDF Signature Tool Can Do

The application provides a single editing workflow for adding signatures, text, and common document stamps to PDF pages.

When Signature is selected, users can create the signature in three different ways.

The Draw option provides a canvas where the user can write a signature using a mouse, trackpad, stylus, or touch input.
The Type option converts entered text into a signature-style element. Users can type their name, adjust the size, and choose from the available signature styles.
The Upload option accepts an existing signature image. This is useful for someone who already has a transparent PNG or another supported image of their handwritten signature.

The second element type is Text/Stamp. Users can enter custom text such as:

Signed on: 08-09-2025

They can also quickly choose a predefined stamp:

APPROVED
CONFIDENTIAL
DRAFT
PAID

After an element has been created, the application provides controls for its placement and appearance. Users can move it to the required location and adjust its scale, rotation, and opacity.

Text and stamp elements can additionally use configurable font sizes and colors.

The page controls determine where the selected element will be applied. A signature may belong only on the final page of a contract, while a CONFIDENTIAL stamp may need to appear on every page.

The application therefore supports:

Current page only
All pages
Specific pages

The goal is to provide one consistent workflow for several common PDF editing tasks without requiring separate tools for each element type.
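For the Specific pages option, the tool has to turn whatever the user types into a list of page numbers. Here is a minimal sketch, assuming a comma-separated input such as 1,3,5-7. The input format and the helper name parsePageSelection are assumptions for illustration, not part of the original project.

```javascript
// Hypothetical helper: converts "1,3,5-7" into sorted, unique, 1-based page numbers.
// Malformed entries and pages outside 1..totalPages are ignored.
function parsePageSelection(input, totalPages) {
    const pages = new Set();

    for (const part of input.split(",")) {
        const match = part.trim().match(/^(\d+)(?:\s*-\s*(\d+))?$/);
        if (!match) {
            continue;
        }

        const start = Number(match[1]);
        const end = match[2] === undefined ? start : Number(match[2]);

        // Accept reversed ranges such as "7-5".
        const low = Math.max(Math.min(start, end), 1);
        const high = Math.min(Math.max(start, end), totalPages);

        for (let page = low; page <= high; page++) {
            pages.add(page);
        }
    }

    return [...pages].sort((a, b) => a - b);
}
```

The result is 1-based to match what the user sees in the page indicator. PDF-lib's getPages() returns a zero-based array, so subtract 1 from each number before indexing into it.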

Electronic Signatures vs Digital Signatures

Before building the application, it's important to distinguish between an electronic signature and a digital signature.

The tool in this tutorial creates an electronic signature workflow.

A drawn signature, typed signature, or uploaded signature image is placed visually onto the PDF page. This is similar to signing a document by hand and inserting a visible representation of that signature into the file.

For example, a user might draw a signature on a canvas:

const signatureImage =
    signatureCanvas.toDataURL("image/png");

The generated image can then be embedded into the PDF.

A digital signature is technically different.

Certificate-based digital signatures use cryptographic methods to help verify document integrity and the identity associated with a signing certificate. They may involve digital certificates, private keys, signature validation, and trust chains.

Simply placing a handwritten signature image on a PDF doesn't create that type of cryptographic verification.

This distinction matters because the terms are sometimes used interchangeably in everyday conversation even though the underlying technologies are different.

The project we're building focuses on visual electronic signatures and document elements. It doesn't create certificate-based cryptographic digital signatures.

Keeping that distinction clear makes it easier to understand exactly what the application does and what would require a more advanced signing system.

How the Browser-Based Workflow Works

The process begins when a user selects a PDF file.

PDF.js loads the document and renders the current page into a browser canvas. Previous and next buttons allow the user to navigate through the PDF before choosing where to place an element.

The user then selects one of two element types:

Signature
Text/Stamp

If Signature is selected, the application provides three creation methods:

Draw
Type
Upload

The selected signature is converted into an element that can be displayed over the PDF preview.

If Text/Stamp is selected, the application instead creates a text element using either custom content or one of the predefined stamp values.

The complete workflow looks like this:

Upload PDF
    ↓
Render and Navigate Pages
    ↓
Choose Signature or Text/Stamp
    ↓
Create the Element
    ↓
Position and Style It
    ↓
Choose Target Pages
    ↓
Apply & Finalize
    ↓
Generate the New PDF
    ↓
Preview the Result
    ↓
Rename and Download

During editing, the element displayed over the PDF preview is only a browser-side representation. Its position must later be translated into coordinates that match the actual PDF page.

For example, the application may store an element like this:

const element = {
    type: "signature",
    x: 622,
    y: 496,
    scale: 1.14,
    rotation: 0,
    opacity: 1
};

When the user clicks Apply & Finalize, those values are used to calculate the final placement inside the PDF.
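That calculation can be sketched as follows. The preview measures positions in pixels from the top-left corner of a canvas rendered at some scale, while PDF pages measure in points from the bottom-left corner. The helper name previewToPdfRect is hypothetical, and the sketch assumes an unrotated page whose visible area starts at (0, 0).

```javascript
// Hypothetical helper: maps an overlay rectangle (CSS pixels, origin top-left)
// to PDF page coordinates (points, origin bottom-left).
// Assumes an unrotated page whose visible box starts at (0, 0).
function previewToPdfRect(rect, previewScale, pageHeight) {
    const width = rect.width / previewScale;
    const height = rect.height / previewScale;

    return {
        x: rect.x / previewScale,
        // Flip the y axis: PDF measures upward from the bottom edge of the page.
        y: pageHeight - rect.y / previewScale - height,
        width,
        height
    };
}

const placement = previewToPdfRect(
    { x: 150, y: 300, width: 180, height: 60 },
    1.5, // preview render scale
    792  // page height in points (US Letter)
);
```

The returned x and y describe the bottom-left corner of the element, which is the corner PDF-lib's drawImage() expects. Rotated pages or pages with an offset crop box would need extra handling that this sketch leaves out.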

This separation between the interactive preview and the final PDF generation is the foundation of the project. It allows users to visually prepare the document first and create the modified PDF only after the placement is ready.

Project Setup

To keep the project easy to understand, we'll use three main files:

pdf-signature-tool/
│
├── index.html
├── style.css
└── script.js

The HTML file contains the upload interface, PDF preview, editing controls, final preview, and download section.

The CSS file handles the layout and visual states.

The JavaScript file manages PDF loading, page rendering, signature creation, text and stamp elements, positioning, final PDF generation, and downloading.

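Start with the basic HTML structure:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>PDF Signature Tool</title>
    <link rel="stylesheet" href="style.css">
</head>
<body>
    <!-- The IDs match the lookups in script.js; button labels are placeholders. -->
    <main>
        <section id="uploadSection">
            <h1>PDF Signature Tool</h1>
            <p>
                Upload your PDF to add your
                electronic signature.
            </p>
            <div id="dropZone">
                <h2>Drag &amp; Drop PDF Here</h2>
                <p>Or click to browse file</p>
                <input type="file" id="pdfInput" accept="application/pdf" hidden>
                <button type="button" id="selectPdfButton">Select PDF</button>
            </div>
        </section>

        <section id="editorSection" hidden>
            <div id="previewContainer">
                <canvas id="pdfCanvas"></canvas>
                <div id="elementLayer"></div>
            </div>
            <div>
                <button type="button" id="previousPage">Previous</button>
                <button type="button" id="nextPage">Next</button>
                <span id="pageInfo">Page 1 of 1</span>
            </div>
        </section>
    </main>

    <script src="script.js"></script>
</body>
</html>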

The previewContainer is especially important.

It contains two layers:

PDF Canvas
    +
Interactive Element Layer

The PDF page is rendered onto the canvas, while signatures, text, and stamps are displayed in a separate overlay.

This allows users to move and style an element without modifying the original PDF every time they make a small adjustment.

The overlay should match the dimensions and position of the PDF canvas.

#previewContainer {
    position: relative;
    display: inline-block;
}

#pdfCanvas {
    display: block;
}

#elementLayer {
    position: absolute;
    inset: 0;
    pointer-events: none;
}

Individual signature and text elements can later enable their own pointer interactions.

.pdf-element {
    position: absolute;
    cursor: move;
    pointer-events: auto;
    transform-origin: center;
}

This layered structure becomes the foundation of the interactive editor.

What Libraries Are We Using?

This project uses two JavaScript libraries for different parts of the PDF workflow.

PDF.js for Rendering and Previewing

PDF.js is responsible for reading the uploaded document and rendering its pages inside the browser.

A page can be loaded like this:

const page =
    await pdfDocument.getPage(
        currentPage
    );

The page is then rendered to a canvas:

const viewport =
    page.getViewport({
        scale: 1.5
    });

const context =
    pdfCanvas.getContext("2d");

pdfCanvas.width =
    viewport.width;

pdfCanvas.height =
    viewport.height;

await page.render({

    canvasContext: context,

    viewport

}).promise;

PDF.js handles the visual preview.

PDF-lib for Modifying the PDF

PDF-lib is used later when the user clicks Apply & Finalize.

It allows us to load the original PDF bytes and add content to its pages.

For example:

const pdfDoc =
    await PDFLib.PDFDocument.load(
        originalPdfBytes
    );

An uploaded PNG signature can then be embedded:

const signatureImage =
    await pdfDoc.embedPng(
        signatureBytes
    );
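Drawn and typed signatures reach this step as data URLs rather than raw bytes, and an uploaded image may be a JPEG rather than a PNG. The sketch below shows one way to handle both cases. The helper names parseDataUrl and embedSignature are hypothetical; embedPng() and embedJpg() are the PDF-lib methods for the two formats.

```javascript
// Hypothetical helper: splits a base64 data URL into its MIME type and raw bytes.
function parseDataUrl(dataUrl) {
    const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
    if (!match) {
        throw new Error("Expected a base64-encoded data URL.");
    }

    // atob() returns a binary string with one character per byte.
    const binary = atob(match[2]);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) {
        bytes[i] = binary.charCodeAt(i);
    }

    return { mimeType: match[1], bytes };
}

// Hypothetical helper: picks the PDF-lib embed call that matches the image type.
async function embedSignature(pdfDoc, dataUrl) {
    const { mimeType, bytes } = parseDataUrl(dataUrl);

    return mimeType === "image/jpeg"
        ? pdfDoc.embedJpg(bytes)
        : pdfDoc.embedPng(bytes);
}
```

Checking the MIME type first matters because passing JPEG bytes to embedPng() fails instead of falling back automatically.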

Text can also be drawn directly onto a PDF page:

page.drawText(
    "APPROVED",
    {
        x: 100,
        y: 100,
        size: 18
    }
);

The two libraries therefore have separate responsibilities:

PDF.js
→ Load and visually render PDF pages

PDF-lib
→ Modify pages and generate the final PDF

Separating these responsibilities keeps the editor easier to manage.

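Include both libraries in the project before script.js:

<!-- The PDF-lib URL and version are an assumption; use the build you have tested. -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.min.js"></script>
<script src="https://unpkg.com/pdf-lib@1.17.1/dist/pdf-lib.min.js"></script>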

Configure the PDF.js worker as well:

pdfjsLib.GlobalWorkerOptions.workerSrc =
    "https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.worker.min.js";

For a production project, pin and test the exact library versions you use rather than automatically loading an unspecified latest release.

Uploading and Previewing the PDF

The first interactive step is accepting the user's PDF.

Get references to the required elements:

const pdfInput =
    document.getElementById(
        "pdfInput"
    );

const selectPdfButton =
    document.getElementById(
        "selectPdfButton"
    );

const dropZone =
    document.getElementById(
        "dropZone"
    );

const uploadSection =
    document.getElementById(
        "uploadSection"
    );

const editorSection =
    document.getElementById(
        "editorSection"
    );

const pdfCanvas =
    document.getElementById(
        "pdfCanvas"
    );

We also need a few variables to store the current document state.

let pdfDocument = null;

let originalPdfBytes = null;

let currentPage = 1;

let totalPages = 0;

Clicking the custom button opens the hidden file input.

selectPdfButton.addEventListener(
    "click",
    () => {

        pdfInput.click();

    }
);

When a file is selected, pass it to the PDF loading function.

pdfInput.addEventListener(
    "change",
    event => {

        const file =
            event.target.files[0];

        if (file) {

            loadPdf(file);

        }

    }
);

Before processing the file, validate its type.

async function loadPdf(file) {

    if (
        file.type !==
        "application/pdf"
    ) {

        alert(
            "Please select a valid PDF file."
        );

        return;

    }

}

Read the file as an ArrayBuffer.

const arrayBuffer =
    await file.arrayBuffer();

Keep a copy of the original bytes because PDF.js and PDF-lib will use the document at different stages.

originalPdfBytes =
    new Uint8Array(
        arrayBuffer
    );

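Now load the document with PDF.js. Passing originalPdfBytes.slice() gives PDF.js its own copy: depending on the version, PDF.js may transfer the buffer it receives to its worker thread, which would leave originalPdfBytes unusable when PDF-lib needs it later.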

pdfDocument =
    await pdfjsLib
        .getDocument({
            data:
                originalPdfBytes.slice()
        })
        .promise;

Store the number of pages.

totalPages =
    pdfDocument.numPages;

currentPage = 1;

Switch from the upload interface to the editor.

uploadSection.hidden = true;

editorSection.hidden = false;

Finally, render the first page.

await renderPage(currentPage);

The complete loading function becomes:

async function loadPdf(file) {

    if (
        file.type !==
        "application/pdf"
    ) {

        alert(
            "Please select a valid PDF file."
        );

        return;

    }

    const arrayBuffer =
        await file.arrayBuffer();

    originalPdfBytes =
        new Uint8Array(
            arrayBuffer
        );

    pdfDocument =
        await pdfjsLib
            .getDocument({
                data:
                    originalPdfBytes.slice()
            })
            .promise;

    totalPages =
        pdfDocument.numPages;

    currentPage = 1;

    uploadSection.hidden = true;

    editorSection.hidden = false;

    await renderPage(currentPage);

}
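The MIME type check in loadPdf() depends on what the browser reports, and File.type can be an empty string when the platform can't determine the type. A slightly more tolerant check might look like the sketch below. Both helper names are hypothetical, and the header check is deliberately strict: it only accepts files that begin with the %PDF- marker.

```javascript
// Hypothetical helper: accepts a file when the browser reports the PDF MIME type,
// or when no type is reported but the name ends in .pdf.
function looksLikePdf(file) {
    if (file.type === "application/pdf") {
        return true;
    }

    return file.type === "" && /\.pdf$/i.test(file.name);
}

// Hypothetical helper: checks for the "%PDF-" marker at the start of the file bytes.
function hasPdfHeader(bytes) {
    const marker = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

    return marker.every((value, index) => bytes[index] === value);
}
```

In loadPdf(), looksLikePdf(file) could replace the direct file.type comparison, and hasPdfHeader(originalPdfBytes) could run right after the bytes are read, before they are handed to PDF.js.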

For drag-and-drop support, prevent the browser's default behavior.

dropZone.addEventListener(
    "dragover",
    event => {

        event.preventDefault();

        dropZone.classList.add(
            "drag-active"
        );

    }
);

Remove the active state when the file leaves the drop area.

dropZone.addEventListener(
    "dragleave",
    () => {

        dropZone.classList.remove(
            "drag-active"
        );

    }
);

Handle the dropped file:

dropZone.addEventListener(
    "drop",
    event => {

        event.preventDefault();

        dropZone.classList.remove(
            "drag-active"
        );

        const file =
            event.dataTransfer.files[0];

        if (file) {

            loadPdf(file);

        }

    }
);

Rendering the Current PDF Page

The renderPage() function loads one page from the PDF and displays it on the canvas.

async function renderPage(
    pageNumber
) {

    const page =
        await pdfDocument.getPage(
            pageNumber
        );

    const viewport =
        page.getViewport({
            scale: 1.5
        });

    const context =
        pdfCanvas.getContext("2d");

    pdfCanvas.width =
        viewport.width;

    pdfCanvas.height =
        viewport.height;

    await page.render({

        canvasContext: context,

        viewport

    }).promise;

    updatePageInfo();

}

Because the interactive element layer sits above the canvas, it must use the same dimensions.

const elementLayer =
    document.getElementById(
        "elementLayer"
    );

elementLayer.style.width =
    `${viewport.width}px`;

elementLayer.style.height =
    `${viewport.height}px`;

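Add the two style assignments inside renderPage() after setting the canvas dimensions. Keep the elementLayer lookup at the top of script.js with the other element references, because later functions such as createSignatureElement() also use it.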

The page information can then be updated:

function updatePageInfo() {

    pageInfo.textContent =
        `Page ${currentPage} of ${totalPages}`;

}

At this point, the uploaded PDF page is visible, but users still need a way to move through multi-page documents.

Get the navigation controls:

const previousPage =
    document.getElementById(
        "previousPage"
    );

const nextPage =
    document.getElementById(
        "nextPage"
    );

const pageInfo =
    document.getElementById(
        "pageInfo"
    );

The previous button decreases the page number.

previousPage.addEventListener(
    "click",
    async () => {

        if (currentPage <= 1) {
            return;
        }

        currentPage--;

        await renderPage(
            currentPage
        );

    }
);

The next button moves forward.

nextPage.addEventListener(
    "click",
    async () => {

        if (
            currentPage >=
            totalPages
        ) {
            return;
        }

        currentPage++;

        await renderPage(
            currentPage
        );

    }
);

The boundary checks prevent navigation outside the document.

For a 12-page PDF, the interface may display:

Page 12 of 12

The previous button remains available, while the next action can be disabled because the user is already on the final page.

function updateNavigationState() {

    previousPage.disabled =
        currentPage === 1;

    nextPage.disabled =
        currentPage ===
        totalPages;

}

Call this function whenever a new page is rendered. The simplest place is inside updatePageInfo(), which renderPage() already calls:

function updatePageInfo() {

    pageInfo.textContent =
        `Page ${currentPage} of ${totalPages}`;

    updateNavigationState();

}

The user can now upload a PDF, preview its pages, and navigate to the exact page where a signature, custom text, or document stamp needs to be placed.

Choosing an Element to Add

Once the PDF is loaded and the correct page is visible, the user can choose what type of element to place on the document.

The editor provides two options:

Signature
Text/Stamp

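Create the element selector:

<section>
    <h3>1. Choose Element</h3>

    <label>
        <input type="radio" name="elementType" value="signature" checked>
        Signature
    </label>

    <label>
        <input type="radio" name="elementType" value="text">
        Text/Stamp
    </label>
</section>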

Get the controls in JavaScript:

const elementTypeInputs =
    document.querySelectorAll(
        'input[name="elementType"]'
    );

const signatureControls =
    document.getElementById(
        "signatureControls"
    );

const textControls =
    document.getElementById(
        "textControls"
    );

Listen for changes:

elementTypeInputs.forEach(
    input => {

        input.addEventListener(
            "change",
            event => {

                const type =
                    event.target.value;

                if (
                    type ===
                    "signature"
                ) {

                    signatureControls.hidden =
                        false;

                    textControls.hidden =
                        true;

                } else {

                    signatureControls.hidden =
                        true;

                    textControls.hidden =
                        false;

                }

            }
        );

    }
);

This keeps the interface focused. Signature-specific controls appear only when the user is creating a signature, while text and stamp controls appear when that element type is selected.

Creating a Signature

The signature workflow supports three methods:

Draw
Type
Upload

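Create the method selector:

<section id="signatureControls">
    <h3>2. Create Signature</h3>

    <div class="signature-tabs">
        <button type="button" data-method="draw">Draw</button>
        <button type="button" data-method="type">Type</button>
        <button type="button" data-method="upload">Upload</button>
    </div>

    <div id="drawPanel"><!-- drawing canvas --></div>
    <div id="typePanel" hidden><!-- typed signature controls --></div>
    <div id="uploadPanel" hidden><!-- image upload controls --></div>
</section>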

Track the currently selected method:

let signatureMethod =
    "draw";

Switch between the three panels:

const signatureTabs =
    document.querySelectorAll(
        ".signature-tabs button"
    );

signatureTabs.forEach(
    button => {

        button.addEventListener(
            "click",
            () => {

                signatureMethod =
                    button.dataset.method;

                showSignatureMethod(
                    signatureMethod
                );

            }
        );

    }
);

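The panel switching function hides the inactive methods. Here drawPanel, typePanel, and uploadPanel are references to the three panel elements, looked up by ID in the same way as the earlier element references: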

function showSignatureMethod(
    method
) {

    drawPanel.hidden =
        method !== "draw";

    typePanel.hidden =
        method !== "type";

    uploadPanel.hidden =
        method !== "upload";

}

Each method creates the same type of final element (a signature) but the source of that signature is different.

Drawing a Signature

The Draw option allows users to create a handwritten signature directly in the browser.

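Add a canvas and a Clear button to the Draw panel:

<canvas id="signatureCanvas"></canvas>
<button type="button" id="clearSignature">Clear</button>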

Get the drawing context:

const signatureCanvas =
    document.getElementById(
        "signatureCanvas"
    );

const signatureContext =
    signatureCanvas.getContext(
        "2d"
    );

let isDrawing = false;

Begin drawing when the pointer touches the canvas:

signatureCanvas.addEventListener(
    "pointerdown",
    event => {

        isDrawing = true;

        const rect =
            signatureCanvas
                .getBoundingClientRect();

        signatureContext.beginPath();

        signatureContext.moveTo(

            event.clientX -
                rect.left,

            event.clientY -
                rect.top

        );

    }
);

Continue the line while the pointer moves:

signatureCanvas.addEventListener(
    "pointermove",
    event => {

        if (!isDrawing) {
            return;
        }

        const rect =
            signatureCanvas
                .getBoundingClientRect();

        signatureContext.lineTo(

            event.clientX -
                rect.left,

            event.clientY -
                rect.top

        );

        signatureContext.stroke();

    }
);

Stop drawing when the pointer is released:

signatureCanvas.addEventListener(
    "pointerup",
    () => {

        isDrawing = false;

    }
);

signatureCanvas.addEventListener(
    "pointerleave",
    () => {

        isDrawing = false;

    }
);
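The handlers above subtract rect.left and rect.top directly, which works as long as the canvas is displayed at the same size as its internal bitmap. If CSS stretches or shrinks the canvas, the strokes drift away from the pointer. One way to correct for that is sketched below, with toCanvasPoint as a hypothetical helper name.

```javascript
// Hypothetical helper: converts viewport pointer coordinates to canvas bitmap coordinates.
// rect is the value returned by canvas.getBoundingClientRect().
function toCanvasPoint(clientX, clientY, rect, canvasWidth, canvasHeight) {
    return {
        // Scale by the ratio between the bitmap size and the displayed size.
        x: (clientX - rect.left) * (canvasWidth / rect.width),
        y: (clientY - rect.top) * (canvasHeight / rect.height)
    };
}
```

Inside the pointerdown and pointermove handlers, the result of toCanvasPoint(event.clientX, event.clientY, rect, signatureCanvas.width, signatureCanvas.height) would replace the manual subtraction before calling moveTo() or lineTo().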

Set a few drawing properties:

signatureContext.lineWidth = 2;

signatureContext.lineCap =
    "round";

signatureContext.lineJoin =
    "round";

For touch devices, prevent the browser from interpreting drawing gestures as page scrolling:

#signatureCanvas {
    touch-action: none;
    cursor: crosshair;
}

The Clear button resets the drawing canvas:

clearSignature.addEventListener(
    "click",
    () => {

        signatureContext.clearRect(

            0,
            0,

            signatureCanvas.width,
            signatureCanvas.height

        );

    }
);

Once the signature is ready, convert the canvas into a PNG data URL:

const drawnSignature =
    signatureCanvas.toDataURL(
        "image/png"
    );

Because the canvas can preserve transparency, the resulting signature can be placed over the PDF without adding an unwanted rectangular background.

The generated image can now be displayed inside the interactive element layer.

function useDrawnSignature() {

    const image =
        new Image();

    // Attach the handler before assigning src so the load event is never missed.
    image.onload =
        () => {

            createSignatureElement(
                image.src
            );

        };

    image.src =
        signatureCanvas.toDataURL(
            "image/png"
        );

}

This gives the user a visual signature element that can later be positioned over the PDF page.
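Before accepting a drawn signature, it can help to confirm that the canvas actually contains ink and to find the area the strokes occupy. The sketch below scans the alpha channel of the pixel data. The helper name findInkBounds is hypothetical, and the approach assumes the canvas background stays transparent rather than being filled with a color.

```javascript
// Hypothetical helper: scans RGBA pixel data and returns the bounding box of every
// pixel that is not fully transparent, or null when the canvas is blank.
function findInkBounds(data, width, height) {
    let minX = width;
    let minY = height;
    let maxX = -1;
    let maxY = -1;

    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            // Each pixel uses four bytes (R, G, B, A); alpha is the fourth.
            const alpha = data[(y * width + x) * 4 + 3];
            if (alpha === 0) {
                continue;
            }

            minX = Math.min(minX, x);
            minY = Math.min(minY, y);
            maxX = Math.max(maxX, x);
            maxY = Math.max(maxY, y);
        }
    }

    if (maxX < 0) {
        return null;
    }

    return {
        x: minX,
        y: minY,
        width: maxX - minX + 1,
        height: maxY - minY + 1
    };
}
```

The pixel data would come from signatureContext.getImageData(0, 0, signatureCanvas.width, signatureCanvas.height).data. A null result means there is nothing to place yet, and a non-null result could be used to crop away the empty margin around the strokes.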

Typing a Signature

Not every user has a touchscreen, stylus, or existing signature image.

The Type option allows users to enter their name and choose a signature-style appearance.

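Add the input controls:

<div id="typePanel" hidden>
    <!-- The default font size is a placeholder value. -->
    <input type="text" id="typedSignature">

    <label>
        Font Size
        <input type="number" id="signatureFontSize" value="48">
    </label>

    <div id="signatureStyles"></div>
</div>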

Listen for text changes:

typedSignature.addEventListener(
    "input",
    updateTypedSignatures
);

Create several style previews:

const signatureFonts = [

    "cursive",

    "'Brush Script MT', cursive",

    "'Segoe Script', cursive"

];

Render the available options:

function updateTypedSignatures() {

    const value =
        typedSignature.value.trim();

    signatureStyles.innerHTML = "";

    if (!value) {
        return;
    }

    signatureFonts.forEach(
        font => {

            const option =
                document.createElement(
                    "button"
                );

            // Keep the click from submitting a surrounding form.
            option.type =
                "button";

            option.textContent =
                value;

            option.style.fontFamily =
                font;

            option.style.fontSize =
                `${signatureFontSize.value}px`;

            option.addEventListener(
                "click",
                () => {

                    createTypedSignature(
                        value,
                        font
                    );

                }
            );

            signatureStyles.appendChild(
                option
            );

        }
    );

}

A typed signature can be converted to an image using another canvas.

function createTypedSignature(
    text,
    fontFamily
) {

    const canvas =
        document.createElement(
            "canvas"
        );

    const context =
        canvas.getContext("2d");

    const fontSize =
        Number(
            signatureFontSize.value
        );

    context.font =
        `${fontSize}px ${fontFamily}`;

    const width =
        context.measureText(
            text
        ).width;

    canvas.width =
        Math.ceil(width + 40);

    canvas.height =
        Math.ceil(fontSize * 2);

    // Resizing a canvas resets its context state, so set the font again.
    context.font =
        `${fontSize}px ${fontFamily}`;

    context.textBaseline =
        "middle";

    context.fillText(
        text,
        20,
        canvas.height / 2
    );

    const imageUrl =
        canvas.toDataURL(
            "image/png"
        );

    createSignatureElement(
        imageUrl
    );

}

The typed signature is now treated like the drawn signature: it becomes an image element that can be positioned and later embedded into the PDF.

Uploading a Signature Image

The third option allows users to upload an existing signature image.

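Add a file input:

<div id="uploadPanel" hidden>
    <input type="file" id="signatureUpload" accept="image/png, image/jpeg">

    <p id="selectedSignatureFile">
        No file chosen
    </p>
</div>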

Listen for file selection:

signatureUpload.addEventListener(
    "change",
    event => {

        const file =
            event.target.files[0];

        if (!file) {
            return;
        }

        loadSignatureImage(file);

    }
);

Validate the image:

function loadSignatureImage(
    file
) {

    const allowedTypes = [

        "image/png",

        "image/jpeg"

    ];

    if (
        !allowedTypes.includes(
            file.type
        )
    ) {

        alert(
            "Please upload a PNG or JPEG image."
        );

        return;

    }

}

Read the selected image:

const reader =
    new FileReader();

reader.onload =
    event => {

        createSignatureElement(
            event.target.result
        );

};

reader.readAsDataURL(file);

Display the selected filename:

selectedSignatureFile.textContent =
    `Selected: ${file.name}`;

The complete function becomes:

function loadSignatureImage(
    file
) {

    const allowedTypes = [

        "image/png",

        "image/jpeg"

    ];

    if (
        !allowedTypes.includes(
            file.type
        )
    ) {

        alert(
            "Please upload a PNG or JPEG image."
        );

        return;

    }

    selectedSignatureFile.textContent =
        `Selected: ${file.name}`;

    const reader =
        new FileReader();

    reader.onload =
        event => {

            createSignatureElement(
                event.target.result
            );

        };

    reader.readAsDataURL(file);

}

A transparent PNG usually works particularly well because only the signature strokes remain visible over the document.

Creating the Signature Preview Element

All three signature methods eventually call the same function:

createSignatureElement(imageUrl);

This means the rest of the editor doesn't need separate positioning logic for drawn, typed, and uploaded signatures.

Create the preview element:

let activeElement = null;

function createSignatureElement(
    imageUrl
) {

    elementLayer.innerHTML = "";

    const image =
        document.createElement(
            "img"
        );

    image.src =
        imageUrl;

    image.className =
        "pdf-element signature-element";

    image.style.left =
        "100px";

    image.style.top =
        "100px";

    image.style.width =
        "180px";

    elementLayer.appendChild(
        image
    );

    activeElement = {

        type: "signature",

        source: imageUrl,

        element: image,

        x: 100,

        y: 100,

        scale: 1,

        rotation: 0,

        opacity: 1

    };

}

The preview now represents the signature that will eventually be written into the PDF.

The same state object can later be updated when the user changes the signature's position, scale, rotation, or opacity.

Adding Text and Preset Stamps

The second main element type is Text/Stamp.

This mode is useful when a document needs a short label, status, date, or other text rather than a handwritten signature.

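Create the controls under the heading:

2. Add Text or Stamp

The panel contains a text input (customText in the code below), a set of preset stamp buttons that each carry a data-stamp attribute (for example APPROVED, CONFIDENTIAL, DRAFT, and PAID), a Size input (textSize), and a Color input (textColor).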

Custom text can be displayed as the user types:

customText.addEventListener(
    "input",
    () => {

        createTextElement(
            customText.value
        );

    }
);

Preset stamps can update the same text input:

const stampButtons =
    document.querySelectorAll(
        "[data-stamp]"
    );

stampButtons.forEach(
    button => {

        button.addEventListener(
            "click",
            () => {

                const stamp =
                    button.dataset.stamp;

                customText.value =
                    stamp;

                createTextElement(
                    stamp
                );

            }
        );

    }
);

Create the text preview:

function createTextElement(
    text
) {

    if (!text.trim()) {

        elementLayer.innerHTML = "";

        activeElement = null;

        return;

    }

    elementLayer.innerHTML = "";

    const textElement =
        document.createElement(
            "div"
        );

    textElement.className =
        "pdf-element text-element";

    textElement.textContent =
        text;

    textElement.style.left =
        "100px";

    textElement.style.top =
        "100px";

    textElement.style.fontSize =
        `${textSize.value}px`;

    textElement.style.color =
        textColor.value;

    elementLayer.appendChild(
        textElement
    );

    activeElement = {

        type: "text",

        text,

        element:
            textElement,

        x: 100,

        y: 100,

        fontSize:
            Number(
                textSize.value
            ),

        color:
            textColor.value,

        rotation: 0,

        opacity: 1

    };

}

When the size changes, update the current element:

textSize.addEventListener(
    "input",
    () => {

        if (
            activeElement?.type !==
            "text"
        ) {
            return;
        }

        activeElement.fontSize =
            Number(
                textSize.value
            );

        activeElement
            .element
            .style
            .fontSize =
                `${textSize.value}px`;

    }
);

Do the same for the color:

textColor.addEventListener(
    "input",
    () => {

        if (
            activeElement?.type !==
            "text"
        ) {
            return;
        }

        activeElement.color =
            textColor.value;

        activeElement
            .element
            .style
            .color =
                textColor.value;

    }
);

The user can now enter custom content such as:

Signed on: 08-09-2025

or quickly select a predefined document stamp.

At this point, the application can create content using all of the available input methods: a drawn signature, typed signature, uploaded signature image, custom text, or preset document stamp.

Positioning and Styling the Element

After creating a signature, text label, or preset stamp, the next step is positioning it correctly on the PDF page.

The preview element sits inside the elementLayer created earlier. Because this layer matches the PDF canvas dimensions, users can move the element visually before anything is written into the final PDF.

The editor also provides controls for:

Scale
Rotation
Opacity
X position
Y position

The exact controls can vary depending on the active element. For example, scale is particularly useful for signatures, while text size and color are handled by the Text/Stamp controls from the previous section.

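Create the placement controls under the heading:

3. Placement & Style

The panel contains four inputs: Rotation (°) with the id rotationInput, Opacity with the id opacityInput, X Position with the id xPosition, and Y Position with the id yPosition.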

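For signature elements, add a scale slider with the id scaleInput. Its label reads Scale (100%), with the percentage placed inside an element with the id scaleValue so it can be updated as the slider moves.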

Get the controls in JavaScript:

const scaleInput =
    document.getElementById(
        "scaleInput"
    );

const scaleValue =
    document.getElementById(
        "scaleValue"
    );

const rotationInput =
    document.getElementById(
        "rotationInput"
    );

const opacityInput =
    document.getElementById(
        "opacityInput"
    );

const xPosition =
    document.getElementById(
        "xPosition"
    );

const yPosition =
    document.getElementById(
        "yPosition"
    );

We'll use a single function to update the visual transformation.

function updateElementTransform() {

    if (!activeElement) {
        return;
    }

    activeElement.element.style.transform =
        `
            scale(${activeElement.scale})
            rotate(${activeElement.rotation}deg)
        `;

    activeElement.element.style.opacity =
        activeElement.opacity;

}

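For text elements, initialize scale as 1 so the same transformation function can still be used. Keep the fontSize and color properties as well, because the size and color handlers and the final PDF step read them.

activeElement = {

    type: "text",

    text,

    element: textElement,

    x: 100,

    y: 100,

    fontSize:
        Number(
            textSize.value
        ),

    color:
        textColor.value,

    scale: 1,

    rotation: 0,

    opacity: 1

};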

Changing the Element Scale

When the user moves the scale slider, convert the percentage into a decimal value.

scaleInput.addEventListener(
    "input",
    () => {

        if (!activeElement) {
            return;
        }

        const percentage =
            Number(
                scaleInput.value
            );

        activeElement.scale =
            percentage / 100;

        scaleValue.textContent =
            `${percentage}%`;

        updateElementTransform();

    }
);

A value of 100% represents the original preview size.

50%  → 0.5
100% → 1
114% → 1.14
200% → 2

This makes it easy to enlarge or reduce an uploaded, drawn, or typed signature without creating a new image.

Rotating the Element

The rotation input stores the angle in degrees.

rotationInput.addEventListener(
    "input",
    () => {

        if (!activeElement) {
            return;
        }

        activeElement.rotation =
            Number(
                rotationInput.value
            );

        updateElementTransform();

    }
);

A rotation of 0 keeps the element horizontal, while positive or negative values rotate it around its center.

Adjusting Opacity

Opacity can be useful for stamps, watermarks, and other document labels.

Convert the percentage slider to a value between 0 and 1.

opacityInput.addEventListener(
    "input",
    () => {

        if (!activeElement) {
            return;
        }

        activeElement.opacity =
            Number(
                opacityInput.value
            ) / 100;

        updateElementTransform();

    }
);

For example:

100% → 1
75%  → 0.75
50%  → 0.5

The same opacity value will later be used when generating the final PDF.

Dragging an Element Across the PDF Preview

Typing X and Y coordinates manually is useful for precise adjustments, but most users will prefer to drag the element directly to the required location.

Track the dragging state:

let isDragging = false;

let dragOffsetX = 0;

let dragOffsetY = 0;

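When a signature or text element is created, attach the dragging behavior. The function also turns off native image dragging and touch panning so the pointer events aren't interrupted.

function enableDragging(
    element
) {

    // Without these two settings, the browser may start its own
    // image drag or page scroll and cancel the pointer events
    element.draggable = false;

    element.style.touchAction =
        "none";

    element.addEventListener(
        "pointerdown",
        event => {

            isDragging = true;

            const elementRect =
                element
                    .getBoundingClientRect();

            // Remember where inside the element the pointer grabbed it
            dragOffsetX =
                event.clientX -
                elementRect.left;

            dragOffsetY =
                event.clientY -
                elementRect.top;

            element.setPointerCapture(
                event.pointerId
            );

        }
    );

}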

Call this function when creating an element.

enableDragging(image);

or:

enableDragging(textElement);

Next, listen for pointer movement.

elementLayer.addEventListener(
    "pointermove",
    event => {

        if (
            !isDragging ||
            !activeElement
        ) {
            return;
        }

        const layerRect =
            elementLayer
                .getBoundingClientRect();

        const x =
            event.clientX -
            layerRect.left -
            dragOffsetX;

        const y =
            event.clientY -
            layerRect.top -
            dragOffsetY;

        moveActiveElement(
            x,
            y
        );

    }
);

Create a reusable movement function:

function moveActiveElement(
    x,
    y
) {

    if (!activeElement) {
        return;
    }

    activeElement.x = x;

    activeElement.y = y;

    activeElement.element.style.left =
        `${x}px`;

    activeElement.element.style.top =
        `${y}px`;

    xPosition.value =
        Math.round(x);

    yPosition.value =
        Math.round(y);

}

Stop dragging when the pointer is released.

elementLayer.addEventListener(
    "pointerup",
    () => {

        isDragging = false;

    }
);

elementLayer.addEventListener(
    "pointercancel",
    () => {

        isDragging = false;

    }
);

Now the signature or text element can be moved directly over the document.

For example, an uploaded signature may be positioned near the bottom-right corner of the final page.

Updating the Position Manually

The X and Y fields provide another way to position the element.

Listen for changes to the X coordinate:

xPosition.addEventListener(
    "input",
    () => {

        if (!activeElement) {
            return;
        }

        const x =
            Number(
                xPosition.value
            );

        moveActiveElement(
            x,
            activeElement.y
        );

    }
);

Do the same for Y:

yPosition.addEventListener(
    "input",
    () => {

        if (!activeElement) {
            return;
        }

        const y =
            Number(
                yPosition.value
            );

        moveActiveElement(
            activeElement.x,
            y
        );

    }
);

Dragging and manual coordinate entry remain synchronized. Moving the element updates the fields, while changing the fields moves the preview element.

Keeping the Element Inside the Page

Without boundaries, users could accidentally drag an element completely outside the PDF preview.

We can limit the position before saving it.

function clampPosition(
    x,
    y
) {

    const element =
        activeElement.element;

    const maxX =
        elementLayer.clientWidth -
        element.offsetWidth;

    const maxY =
        elementLayer.clientHeight -
        element.offsetHeight;

    return {

        x:
            Math.max(
                0,
                Math.min(x, maxX)
            ),

        y:
            Math.max(
                0,
                Math.min(y, maxY)
            )

    };

}

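Use it inside moveActiveElement(), and use the clamped values for the style and input updates as well:

const position =
    clampPosition(
        x,
        y
    );

activeElement.x =
    position.x;

activeElement.y =
    position.y;

activeElement.element.style.left =
    `${position.x}px`;

activeElement.element.style.top =
    `${position.y}px`;

xPosition.value =
    Math.round(position.x);

yPosition.value =
    Math.round(position.y);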

When scale or rotation is applied, the element's transformed visual bounds can extend beyond its original box. A production editor can use getBoundingClientRect() for more precise transformed-boundary calculations.

The basic clamp shown here is sufficient to demonstrate the positioning workflow.
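
The transformed-boundary idea mentioned above can also be sketched without touching the DOM. The helper names transformedSize and clampTransformedPosition, and the plain box and layer objects they take, are assumptions made for this illustration and are not part of the editor's code. The sketch assumes the default transform-origin, which is the element's center.

```javascript
// Axis-aligned size of a width x height box after
// scale(scale) rotate(rotationDeg) around its center.
function transformedSize(width, height, scale, rotationDeg) {
    const radians = (rotationDeg * Math.PI) / 180;
    const cos = Math.abs(Math.cos(radians));
    const sin = Math.abs(Math.sin(radians));

    return {
        width: scale * (width * cos + height * sin),
        height: scale * (width * sin + height * cos)
    };
}

// Clamp the untransformed left/top so that the transformed
// (visual) box stays inside the layer.
// box:   { width, height, scale, rotation } in CSS pixels and degrees
// layer: { width, height } in CSS pixels
function clampTransformedPosition(x, y, box, layer) {
    const visual = transformedSize(
        box.width,
        box.height,
        box.scale,
        box.rotation
    );

    // The transform keeps the center fixed, so the visual box
    // overhangs the layout box by half the size difference per side.
    const overhangX = (visual.width - box.width) / 2;
    const overhangY = (visual.height - box.height) / 2;

    const maxX = layer.width - box.width - overhangX;
    const maxY = layer.height - box.height - overhangY;

    return {
        x: Math.max(overhangX, Math.min(x, maxX)),
        y: Math.max(overhangY, Math.min(y, maxY))
    };
}

// Hypothetical usage: a 180 x 60 signature scaled to 200% on a 600 x 800 layer
console.log(
    clampTransformedPosition(
        -50,
        790,
        { width: 180, height: 60, scale: 2, rotation: 0 },
        { width: 600, height: 800 }
    )
);
```

If the transformed element is larger than the layer, the minimum wins and the element is pinned to the top-left limit. Whether that is the right fallback depends on the editor.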

Applying the Element to Selected Pages

After positioning the element, the user decides which PDF pages should receive it.

The interface provides three options:

Current page only
All pages
Specific pages

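Create the controls under the heading:

4. Apply to Pages

The panel contains three radio inputs that share the name applyMode, with the values current, all, and specific:

Current page only
All pages
Specific pages

The Specific pages option is paired with a text input (specificPages in the code below) for the page list.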

Read the selected mode:

function getTargetPages() {

    const mode =
        document.querySelector(
            'input[name="applyMode"]:checked'
        ).value;

    if (
        mode ===
        "current"
    ) {

        return [
            currentPage
        ];

    }

    if (
        mode ===
        "all"
    ) {

        return Array.from(

            {
                length:
                    totalPages
            },

            (_, index) =>
                index + 1

        );

    }

    return parsePageRange(
        specificPages.value
    );

}

Parse custom values such as:

1, 3-5, 10

with:

function parsePageRange(
    value
) {

    const pages =
        new Set();

    value
        .split(",")
        .forEach(part => {

            const item =
                part.trim();

            if (!item) {
                return;
            }

            if (
                item.includes("-")
            ) {

                const [
                    start,
                    end
                ] =
                    item
                        .split("-")
                        .map(Number);

                for (
                    let page = start;
                    page <= end;
                    page++
                ) {

                    if (
                        page >= 1 &&
                        page <= totalPages
                    ) {

                        pages.add(page);

                    }

                }

            } else {

                const page =
                    Number(item);

                if (
                    page >= 1 &&
                    page <= totalPages
                ) {

                    pages.add(page);

                }

            }

        });

    return [...pages];

}

For a document with at least ten pages, the result becomes:

[1, 3, 4, 5, 10]

This allows a signature or stamp to be placed once and then applied to multiple target pages.

Just keep in mind that page dimensions may differ within the same PDF. Applying the same coordinates across pages works best when those pages use a consistent size and layout.
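
The parser above accepts anything Number() can convert, so inputs such as 1.5 or -5 can produce unexpected pages. A stricter variant might look like the sketch below. The name parsePageList and the explicit totalPages parameter are assumptions for this illustration. Treating a reversed range such as 5-3 as 3-5, and returning the pages sorted, are design choices rather than requirements.

```javascript
// Stricter page-list parser: whole numbers only, bounded loops, sorted output.
function parsePageList(value, totalPages) {
    const pages = new Set();

    for (const part of value.split(",")) {
        const item = part.trim();

        // Accept "7" or "3-5" (digits only); ignore everything else
        const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(item);

        if (!match) {
            continue;
        }

        const start = Number(match[1]);
        const end = match[2] === undefined
            ? start
            : Number(match[2]);

        // Clamp to the document first so "1-999999999" stays cheap
        const first = Math.max(1, Math.min(start, end));
        const last = Math.min(totalPages, Math.max(start, end));

        for (let page = first; page <= last; page++) {
            pages.add(page);
        }
    }

    return [...pages].sort((a, b) => a - b);
}

console.log(parsePageList("1, 3-5, 10", 12)); // [ 1, 3, 4, 5, 10 ]
console.log(parsePageList("10, 1.5, -5, abc, 2-2", 12)); // [ 2, 10 ]
```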

Applying and Finalizing the PDF

Once the element is created, positioned, styled, and assigned to the correct pages, the user can click Apply & Finalize.

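Create the action buttons: an Apply & Finalize button with the id applyButton and a Start Over button with the id startOverButton.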

Get the buttons:

const applyButton =
    document.getElementById(
        "applyButton"
    );

const startOverButton =
    document.getElementById(
        "startOverButton"
    );

Before generating the final PDF, make sure an element exists.

applyButton.addEventListener(
    "click",
    async () => {

        if (!activeElement) {

            alert(
                "Please add a signature, text, or stamp first."
            );

            return;

        }

        const targetPages =
            getTargetPages();

        if (
            targetPages.length === 0
        ) {

            alert(
                "Please select at least one valid page."
            );

            return;

        }

        await generateFinalPdf(
            targetPages
        );

    }
);

The generateFinalPdf() function will handle the actual PDF modification in the next section.

The Start Over button clears the current document and resets the application.

startOverButton.addEventListener(
    "click",
    resetTool
);

Create the reset function:

function resetTool() {

    pdfDocument = null;

    originalPdfBytes = null;

    currentPage = 1;

    totalPages = 0;

    activeElement = null;

    pdfInput.value = "";

    elementLayer.innerHTML = "";

    editorSection.hidden = true;

    resultSection.hidden = true;

    uploadSection.hidden = false;

}

This returns the application to its original upload state.

The interactive editing stage is now complete. Users can create a signature or text element, position it directly over the PDF, adjust its appearance, choose the target pages, and prepare the document for final processing.

Generating the Signed PDF

The element shown over the browser preview hasn't yet been added to the actual PDF. When users click Apply & Finalize, the application loads the original document with PDF-lib and writes the selected signature, text, or stamp onto the target pages.

Start by loading the original PDF bytes:

async function generateFinalPdf(
    targetPages
) {

    const pdfDoc =
        await PDFLib.PDFDocument.load(
            originalPdfBytes.slice()
        );

    const pages =
        pdfDoc.getPages();

}

Before placing the element, we need to convert its browser coordinates into PDF coordinates.

The preview canvas may be displayed at a different size from the actual PDF page. The coordinate systems also use different Y-axis origins.

For each target page, calculate the scale:

const {
    width: pdfWidth,
    height: pdfHeight
} = page.getSize();

const scaleX =
    pdfWidth /
    pdfCanvas.width;

const scaleY =
    pdfHeight /
    pdfCanvas.height;

Convert the preview position:

const pdfX =
    activeElement.x *
    scaleX;

const pdfY =
    pdfHeight -
    (
        activeElement.y +
        activeElement.element.offsetHeight
    ) * scaleY;

This conversion maps the element from the browser's top-left coordinate system to the PDF page's coordinate system.
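
The two steps above can be combined into one small function that is easy to test in isolation. This is a minimal sketch: the name previewToPdfRect and the plain objects it takes are assumptions, and it assumes the page's origin is its bottom-left corner with no page rotation applied.

```javascript
// Convert a box from preview pixels (top-left origin, y down)
// to PDF units (bottom-left origin, y up).
// preview:    { x, y, width, height } in preview pixels
// canvasSize: { width, height } of the preview canvas
// pageSize:   { width, height } of the PDF page
function previewToPdfRect(preview, canvasSize, pageSize) {
    const scaleX = pageSize.width / canvasSize.width;
    const scaleY = pageSize.height / canvasSize.height;

    return {
        x: preview.x * scaleX,

        // The returned y is the bottom edge of the box
        y: pageSize.height -
            (preview.y + preview.height) * scaleY,

        width: preview.width * scaleX,
        height: preview.height * scaleY
    };
}

// Hypothetical numbers: a 1200 x 1600 preview of a 600 x 800 page
console.log(
    previewToPdfRect(
        { x: 100, y: 100, width: 180, height: 60 },
        { width: 1200, height: 1600 },
        { width: 600, height: 800 }
    )
); // { x: 50, y: 720, width: 90, height: 30 }
```

The element's top edge sits 100 preview pixels (50 PDF units) below the top of the page, so its bottom edge ends up 80 units from the top, which is y = 720 measured from the bottom.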

Embedding a Signature

Drawn, typed, and uploaded signatures are all represented as images by the time they reach the final processing stage.

Convert the signature data URL into bytes:

async function dataUrlToBytes(
    dataUrl
) {

    const response =
        await fetch(dataUrl);

    return await response.arrayBuffer();

}

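Embed the signature image. PDF-lib uses separate methods for PNG and JPEG data, and the upload step accepts both formats, so choose the method from the data URL's MIME type:

const signatureBytes =
    await dataUrlToBytes(
        activeElement.source
    );

// Uploaded JPEG files arrive as "data:image/jpeg;..." URLs
const isJpeg =
    activeElement.source.startsWith(
        "data:image/jpeg"
    );

const signatureImage =
    isJpeg
        ? await pdfDoc.embedJpg(
            signatureBytes
        )
        : await pdfDoc.embedPng(
            signatureBytes
        );

Drawn and typed signatures are always exported as PNG, so they take the embedPng() branch.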

Calculate the final dimensions:

const previewWidth =
    activeElement
        .element
        .offsetWidth *
    activeElement.scale;

const previewHeight =
    activeElement
        .element
        .offsetHeight *
    activeElement.scale;

const finalWidth =
    previewWidth *
    scaleX;

const finalHeight =
    previewHeight *
    scaleY;

Then draw the signature:

page.drawImage(
    signatureImage,
    {
        x: pdfX,

        y:
            pdfHeight -
            (
                activeElement.y *
                scaleY
            ) -
            finalHeight,

        width:
            finalWidth,

        height:
            finalHeight,

        // CSS rotates clockwise, PDF rotation is counterclockwise
        rotate:
            PDFLib.degrees(
                -activeElement.rotation
            ),

        opacity:
            activeElement.opacity
    }
);

The same processing logic works whether the signature was drawn, typed, or uploaded because all three methods produce an image element before finalization.
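
Rotation needs extra care. The CSS preview rotates the element clockwise around its center, while PDF rotation is counterclockwise and, as far as we can tell from PDF-lib's behavior, pivots on the x/y point passed to drawImage() (the image's bottom-left corner). The sketch below compensates for both differences. The function name rotatedImagePlacement and its input shape are assumptions for this illustration, so verify the output in the final preview.

```javascript
// rect: the unrotated box in PDF units, where x/y is the bottom-left corner
//       and width/height already include the preview scale.
// cssRotationDeg: the clockwise angle used in the CSS preview.
// Returns the x/y to pass to drawImage() so that the rotated image
// keeps the same center as the preview element.
function rotatedImagePlacement(rect, cssRotationDeg) {
    // Flip the sign: clockwise in CSS is negative in PDF space
    const pdfDegrees = -cssRotationDeg;
    const radians = (pdfDegrees * Math.PI) / 180;

    const cos = Math.cos(radians);
    const sin = Math.sin(radians);

    const halfWidth = rect.width / 2;
    const halfHeight = rect.height / 2;

    const centerX = rect.x + halfWidth;
    const centerY = rect.y + halfHeight;

    // Offset from the pivot (bottom-left corner) to the center,
    // rotated by the PDF angle, then subtracted from the center
    return {
        x: centerX - (halfWidth * cos - halfHeight * sin),
        y: centerY - (halfWidth * sin + halfHeight * cos),
        rotateDegrees: pdfDegrees
    };
}

// Hypothetical usage: a 90 x 30 box rotated 90 degrees clockwise in the preview
const placement = rotatedImagePlacement(
    { x: 50, y: 320, width: 90, height: 30 },
    90
);

console.log(placement);
```

The returned x and y replace the unrotated values in the drawImage() options, and rotateDegrees is the value to wrap in PDFLib.degrees(). With no rotation, the function returns the original corner unchanged.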

Adding Text or a Stamp

Text and preset stamps are written directly onto the PDF page.

First, convert the selected color from hexadecimal to RGB values.

function hexToRgb(
    hex
) {

    const value =
        hex.replace(
            "#",
            ""
        );

    return {

        r:
            parseInt(
                value.substring(0, 2),
                16
            ) / 255,

        g:
            parseInt(
                value.substring(2, 4),
                16
            ) / 255,

        b:
            parseInt(
                value.substring(4, 6),
                16
            ) / 255

    };

}

Apply the text:

const color =
    hexToRgb(
        activeElement.color
    );

page.drawText(
    activeElement.text,
    {
        x:
            activeElement.x *
            scaleX,

        // Scale the font-size offset too, so it matches "size" below
        y:
            pdfHeight -
            (
                activeElement.y +
                activeElement.fontSize
            ) * scaleY,

        size:
            activeElement.fontSize *
            scaleY,

        color:
            PDFLib.rgb(
                color.r,
                color.g,
                color.b
            ),

        // CSS rotates clockwise, PDF rotation is counterclockwise
        rotate:
            PDFLib.degrees(
                -activeElement.rotation
            ),

        opacity:
            activeElement.opacity
    }
);

After processing every target page, save the modified document:

const finalPdfBytes =
    await pdfDoc.save();

const finalPdfBlob =
    new Blob(
        [finalPdfBytes],
        {
            type:
                "application/pdf"
        }
    );

await showFinalPreview(
    finalPdfBlob
);

At this point, the selected signature, text, or stamp has been added to the generated PDF.

Previewing the Final PDF

Before downloading the document, the application displays the completed PDF in a separate preview area.

This allows users to confirm that the element appears on the correct page and in the expected position.

Load the generated file with PDF.js:

let finalPdfDocument = null;

let finalPage = 1;

async function showFinalPreview(
    blob
) {

    const bytes =
        await blob.arrayBuffer();

    finalPdfDocument =
        await pdfjsLib
            .getDocument({
                data: bytes
            })
            .promise;

    finalPage = 1;

    editorSection.hidden =
        true;

    resultSection.hidden =
        false;

    await renderFinalPage(
        finalPage
    );

}

Render the current result page:

async function renderFinalPage(
    pageNumber
) {

    const page =
        await finalPdfDocument
            .getPage(
                pageNumber
            );

    const viewport =
        page.getViewport({
            scale: 1.4
        });

    finalCanvas.width =
        viewport.width;

    finalCanvas.height =
        viewport.height;

    await page.render({

        canvasContext:
            finalCanvas
                .getContext("2d"),

        viewport

    }).promise;

    finalPageInfo.textContent =
        `Page ${pageNumber} of ${finalPdfDocument.numPages}`;

}

Previous and next controls can use the same navigation pattern as the original PDF preview.

Renaming and Downloading the Final PDF

After reviewing the processed document, users can rename the file before downloading it.

For example:

document_signed.pdf

Create the filename input (outputFilename in the code below).

Make sure the filename has the correct extension:

function getOutputFilename() {

    let filename =
        outputFilename
            .value
            .trim();

    if (!filename) {

        filename =
            "document_signed.pdf";

    }

    if (
        !filename
            .toLowerCase()
            .endsWith(".pdf")
    ) {

        filename += ".pdf";

    }

    return filename;

}
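
User-entered filenames can also contain characters that some file systems reject. A slightly stricter version might look like this sketch. The name sanitizeOutputFilename, the choice of replacement character, and the list of rejected characters are assumptions for this illustration. Browsers may apply their own cleanup to the download attribute as well.

```javascript
// Clean a user-entered filename and make sure it ends with ".pdf".
function sanitizeOutputFilename(
    input,
    fallback = "document_signed.pdf"
) {
    // Replace path separators, reserved characters, and control characters
    let name = input
        .replace(/[\\\/:*?"<>|\u0000-\u001f]/g, "_")
        .trim();

    // Drop trailing dots and spaces, which Windows does not allow
    name = name.replace(/[. ]+$/, "");

    if (!name) {
        return fallback;
    }

    return name.toLowerCase().endsWith(".pdf")
        ? name
        : `${name}.pdf`;
}

console.log(sanitizeOutputFilename("  report: final/v2  "));
console.log(sanitizeOutputFilename(""));
```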

The result section can also display the total page count and generated file size.

function formatFileSize(
    bytes
) {

    if (
        bytes <
        1024 * 1024
    ) {

        return (
            bytes / 1024
        ).toFixed(2) + " KB";

    }

    return (
        bytes /
        1024 /
        1024
    ).toFixed(2) + " MB";

}

Update the file information:

filePageCount.textContent =
    `Total Pages: ${finalPdfDocument.numPages}`;

fileSize.textContent =
    `File Size: ${
        formatFileSize(
            finalPdfBlob.size
        )
    }`;

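The click handler below reads finalPdfBlob, so the generated blob must be kept in a variable declared outside generateFinalPdf(). Download the file using a temporary object URL: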

downloadButton.addEventListener(
    "click",
    () => {

        const url =
            URL.createObjectURL(
                finalPdfBlob
            );

        const link =
            document.createElement(
                "a"
            );

        link.href =
            url;

        link.download =
            getOutputFilename();

        link.click();

        URL.revokeObjectURL(
            url
        );

    }
);

After downloading, the Start Over button resets the application so another PDF can be processed.

Demo: How the PDF Signature Tool Works

Let's walk through the complete workflow from upload to download.

Step 1: Upload the PDF

Users begin by dragging a PDF into the upload area or clicking Select PDF.

The browser reads the document and prepares it for local processing.

Step 2: Preview and Navigate the Document

After upload, the current page appears in the PDF preview.

Previous and next controls allow users to navigate through multi-page documents and find the page where an element needs to be added.

Step 3: Choose What to Add

The user chooses between Signature and Text/Stamp.

This determines which creation controls appear in the editor.

Step 4: Create the Signature

If Signature is selected, users can choose Draw, Type, or Upload.

Drawing works directly inside the signature canvas. The Type option creates a signature-style element from entered text, while Upload accepts an existing signature image.

Step 5: Add Custom Text or a Preset Stamp

Instead of a signature, users can select Text/Stamp.

They can enter custom content or choose a preset such as APPROVED, CONFIDENTIAL, DRAFT, or PAID.

Step 6: Position and Style the Element

The created element appears over the PDF preview.

Users can drag it to the required position and adjust properties such as scale, rotation, opacity, X position, and Y position.

Text elements also support configurable font size and color.

Step 7: Choose the Target Pages

The element can be applied to the current page, every page, or a specific page selection.

For example:

1, 3-5, 10

This is useful when the same stamp or document label needs to appear on several pages.

Step 8: Apply and Finalize

After checking the element and target pages, users click Apply & Finalize.

The browser converts the preview position into PDF coordinates and generates the modified document.

Step 9: Preview the Completed PDF

The generated document appears in a final preview.

Users can navigate through the pages and verify that the signature, text, or stamp appears correctly before downloading.

Step 10: Rename and Download

The final section allows users to change the output filename and review the total number of pages and file size.

Clicking Download saves the generated PDF locally.

Afterward, Start Over clears the session and returns to the upload interface.

Handling Signature Transparency

Uploaded signatures often look best when the background is transparent.

A transparent PNG contains only the visible signature strokes, allowing the original PDF content to remain visible around the signature.

A JPEG image, by comparison, usually includes a solid background. If the image was scanned from white paper, placing it on a colored PDF area may create a visible white rectangle.

For uploaded signatures, transparent PNG files are therefore usually the better option.

The same principle applies to drawn and typed signatures. When converting a canvas to PNG, avoid filling the canvas with a background color unless that background is intentionally required.

const signatureDataUrl =
    signatureCanvas.toDataURL(
        "image/png"
    );

The resulting transparent PNG can then be embedded directly into the PDF.

Important Notes and Common Mistakes

One common mistake is assuming that the browser preview and the actual PDF use identical coordinates.

Always calculate the relationship between the canvas dimensions and the target PDF page before placing the final element.

const scaleX =
    pdfWidth /
    pdfCanvas.width;

const scaleY =
    pdfHeight /
    pdfCanvas.height;

Another issue occurs when the same element is applied to pages with different dimensions. A position that looks correct on an A4 page may not appear in the same visual location on a landscape or differently sized page.

Uploaded signature images should also be validated before processing.

const allowedTypes = [
    "image/png",
    "image/jpeg"
];

if (
    !allowedTypes.includes(
        file.type
    )
) {

    alert(
        "Please upload a PNG or JPEG image."
    );

    return;

}

Very large image files should be resized before embedding to avoid unnecessarily increasing the final PDF size.
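
The target size for that resize can be computed separately from the drawing step. The sketch below keeps the aspect ratio and caps the longer side. The function name fitWithin and the 800-pixel limit in the example are assumptions, not values taken from the tool.

```javascript
// Scale width x height down so the longer side is at most maxSide.
// Images that already fit are returned unchanged.
function fitWithin(width, height, maxSide) {
    const longest = Math.max(width, height);

    if (longest <= maxSide) {
        return { width, height };
    }

    const ratio = maxSide / longest;

    return {
        // Never round a side down to zero pixels
        width: Math.max(1, Math.round(width * ratio)),
        height: Math.max(1, Math.round(height * ratio))
    };
}

console.log(fitWithin(4000, 1000, 800)); // { width: 800, height: 200 }
console.log(fitWithin(300, 100, 800)); // { width: 300, height: 100 }
```

In the browser, the returned dimensions can be used to size a temporary canvas, draw the uploaded image onto it, and export the smaller version with toDataURL() before calling createSignatureElement().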

Users should also review the completed document before downloading it. Rotation, scaling, or coordinate conversion errors are much easier to identify in the final preview than after the file has already been shared.

Finally, remember that this project adds a visual electronic signature to a PDF. It does not create a certificate-based cryptographic digital signature or provide automatic identity verification.

Conclusion

In this tutorial, you built a browser-based PDF Signature Tool using JavaScript.

You learned how to upload and preview PDF documents, navigate between pages, create signatures by drawing, typing, or uploading an image, add custom text and preset stamps, position elements directly over a PDF preview, adjust their appearance, choose target pages, and generate the completed document with PDF-lib.

You also learned how browser coordinates are converted into PDF coordinates and why signature transparency matters when embedding images into a document.

The final workflow allows users to preview the completed PDF, rename the output file, review its page count and size, and download it directly from the browser.

You can explore the complete workflow using the PDF Signature Tool.

The project can be extended further with multiple elements per page, reusable signature profiles, date fields, initials, custom fonts, signature removal before finalization, or certificate-based digital signing through a dedicated signing infrastructure.

How to Build a Multi-Tenant SaaS API with Node.js, RBAC, and Audit Logging

Zia Ullah — Mon, 20 Jul 2026 20:46:12 +0000

A colleague asked me to help debug what looked like a permissions issue in their SaaS project management tool. Users were seeing resources they hadn't created.

I pulled up the query logs expecting something subtle. It was not. The list endpoint had no tenant_id filter at all. Every tenant in the database could read every other tenant's projects. The application never threw an error. It just returned whatever was there.

Missing tenant filters don't throw errors. They return the wrong data without any complaint, and nothing in your logs will flag it. I've seen this run in production for weeks before a support ticket pointed anyone at the query logs.

When it does surface, who finds it first matters a lot. A customer noticing it is bad. A compliance auditor noticing it during a SOC 2 review is a different kind of problem.

Isolation built in from the start is a day of work. The time I spent helping a team retrofit it after a compliance review was considerably longer than that, and involved more customer emails than anyone wanted to write.

The stack is Node.js with PostgreSQL. CRUD is the easy part. Tenant isolation, RBAC, and audit logging take more care, and where those checks run in the stack matters. I put all three in middleware, before any route handler fires. A handler that never calls the isolation logic directly can't accidentally skip it.

Prerequisites

Node.js 18+
PostgreSQL 14+
Basic knowledge of Express.js and JWT

What We Will Build

A multi-tenant Express REST API that enforces:

Tenant isolation: every database query scopes to the tenant_id from the verified JWT. The client can't influence which tenant the query runs against.
RBAC: four roles, each with a numeric level (SuperAdmin is highest, Viewer lowest). Middleware checks the level before the handler runs.
Audit logging: any write or sensitive read appends a row to the audit table. The app can't modify those rows afterward. The database enforces this directly. If a bug in the app tries to UPDATE an audit row, the database refuses it. Application-level enforcement alone can't give you that guarantee.
Per-tenant rate limiting: request counts in Redis, keyed to the tenant. I've seen IP-based limiting break an enterprise rollout when fifty users came through a single corporate proxy.
Tenant isolation tests: a dedicated test file that proves cross-tenant data can't leak. Wire it into CI and it catches broken isolation before it ships.

How Multi-Tenancy Works
Architecture Overview
Database Schema Design
Project Setup
JWT Design for Multi-Tenancy
Auth and RBAC Middleware
The Tenant-Safe Repository Layer
Audit Logging Service
Per-Tenant Rate Limiting
Building the Routes
Testing Tenant Isolation
Troubleshooting
Wrapping Up

How Multi-Tenancy Works

This tutorial uses a shared database with row-level isolation: a tenant_id column on every table, a filter on every query. The database holds everyone's data together. The application decides what each tenant can see.

Two other approaches exist: schema-per-tenant and database-per-tenant. I've talked to teams on schema-per-tenant who ended up spending more engineering time on migration tooling than on their actual product. Database-per-tenant gives stronger guarantees, but the connection pool balloons with every new customer signup.

Neither scales cheaply. Row-level isolation scales further than most teams expect. The ones I know who moved off it did so years in, usually under specific regulatory pressure, not because the approach stopped working.

The one thing in this design that can't be optional: tenant_id must always come from the verified JWT. Not from the request body, not from the URL. Users control what they put in both of those. They don't control what gets signed into a JWT on your server.

Architecture Overview

HTTP Request
     │
     ▼
┌─────────────────────────────────────────┐
│           Express Middleware Stack       │
│                                         │
│  1. Auth Middleware (verify JWT)        │
│     └─► Extracts: userId, tenantId,    │
│          email, role                    │
│  2. Rate Limiter (per tenant_id)        │
│  3. RBAC Middleware (check role)        │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│           Route Handler                  │
│                                         │
│  1. Call Repository (tenant-safe query) │
│  2. Call Audit Service (fire & forget)  │
│  3. Return response                     │
└──────────────┬──────────────────────────┘
               │
     ┌─────────┴──────────┐
     ▼                    ▼
┌─────────┐        ┌────────────┐
│ Projects│        │ Audit Logs │
│  Table  │        │   Table    │
│(+tenant)│        │(append only│
└─────────┘        └────────────┘

Auth, rate limiting, and RBAC all run before any handler sees the request; auth goes first because the limiter keys on req.user.tenantId. Writes pass through the audit service. The handler passes req.user.tenantId to the repository and never reads a tenant identifier from the request body or URL, so there's no path around it.

Database Schema Design

-- Tenants table
CREATE TABLE tenants (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name        VARCHAR(255) NOT NULL,
  plan        VARCHAR(50) NOT NULL DEFAULT 'free', -- 'free', 'pro', 'enterprise'
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Users table
CREATE TABLE users (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id   UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
  email       VARCHAR(255) NOT NULL,
  role        VARCHAR(50) NOT NULL DEFAULT 'Member', -- 'SuperAdmin','TenantAdmin','Member','Viewer'
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(tenant_id, email)
);

CREATE INDEX idx_users_tenant ON users(tenant_id);

-- Projects table (example resource — replace with your domain entity)
CREATE TABLE projects (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id   UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
  name        VARCHAR(255) NOT NULL,
  description TEXT,
  created_by  UUID NOT NULL REFERENCES users(id),
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_projects_tenant ON projects(tenant_id);

-- Audit log table (append-only — never UPDATE or DELETE rows here)
CREATE TABLE audit_logs (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id   UUID NOT NULL,
  user_id     UUID NOT NULL,
  user_email  TEXT NOT NULL,
  user_role   TEXT NOT NULL,        -- role at time of action
  action      TEXT NOT NULL,        -- 'CREATE', 'UPDATE', 'DELETE', 'VIEW'
  resource    TEXT NOT NULL,        -- table name
  resource_id TEXT,
  old_values  JSONB,
  new_values  JSONB,
  ip_address  INET,
  user_agent  TEXT,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_audit_tenant ON audit_logs(tenant_id);
CREATE INDEX idx_audit_created ON audit_logs(created_at DESC);

-- Protect audit log at database level
-- Use a DO block so this runs safely in Docker where app_user is the superuser
DO $$
BEGIN
  IF current_user <> 'app_user' THEN
    REVOKE DELETE, UPDATE ON audit_logs FROM app_user;
  END IF;
END $$;

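The REVOKE matters. Application bugs happen. If something in your codebase accidentally tries to UPDATE an audit row, you want the database to refuse it outright, not silently comply. One caveat: the guarantee only holds when the application connects as a non-superuser role. When app_user is itself the superuser, which is the Docker case the DO block's comment anticipates, the REVOKE is skipped, and a superuser bypasses privilege checks anyway.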

Project Setup

mkdir nodejs-multitenant-saas-api
cd nodejs-multitenant-saas-api
npm init -y
npm install express pg jsonwebtoken bcryptjs express-rate-limit rate-limit-redis ioredis dotenv
npm install --save-dev jest supertest

Starting PostgreSQL and Redis with Docker

Skip the local installs. One docker-compose.yml in the project root brings up both PostgreSQL and Redis:

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: saas_api
      POSTGRES_USER: app_user
      POSTGRES_PASSWORD: app_password
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./schema.sql:/docker-entrypoint-initdb.d/01_schema.sql

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  postgres_data:

That schema.sql mount runs your SQL automatically when the container first starts. No psql required.

docker compose up -d

.env in the project root:

DATABASE_URL=postgresql://app_user:app_password@localhost:5432/saas_api
REDIS_URL=redis://localhost:6379
JWT_SECRET=your_random_secret_here
PORT=3000
NODE_ENV=development

Don't type a JWT_SECRET by hand. Run this to generate one:

node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"

File structure:

nodejs-multitenant-saas-api/
├── src/
│   ├── middleware/
│   │   ├── auth.js          # JWT verification + tenant extraction
│   │   ├── rbac.js          # Role enforcement
│   │   └── rateLimiter.js   # Per-tenant rate limiting
│   ├── services/
│   │   └── auditService.js  # Append-only audit logger
│   ├── repositories/
│   │   └── projectRepo.js   # Tenant-safe DB queries
│   ├── routes/
│   │   └── projects.js      # Route handlers
│   └── utils/
│       └── token.js         # JWT token generation
├── db/
│   ├── index.js             # PostgreSQL pool
│   └── redis.js             # Redis client
├── docker-compose.yml
├── app.js
├── server.js
└── tests/
    └── tenantIsolation.test.js

Boilerplate Files

There are four files the tutorial doesn't cover in detail, but the test file needs all of them to run:

// db/index.js
const { Pool } = require('pg');

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

pool.on('error', (err) => console.error('PostgreSQL error:', err.message));

module.exports = { pool };

// db/redis.js
const Redis = require('ioredis');

const redisClient = new Redis(process.env.REDIS_URL);

redisClient.on('error', (err) => console.error('Redis error:', err.message));

module.exports = { redisClient };

// app.js
require('dotenv').config();
const express = require('express');
const projectsRouter = require('./src/routes/projects');

const app = express();
app.use(express.json());

app.use('/api/projects', projectsRouter);

// Global error handler — must have 4 parameters to be recognised by Express
app.use((err, req, res, next) => {
  console.error(err.stack);
  res.status(500).json({ error: 'Internal server error' });
});

module.exports = app;

// server.js
const app = require('./app');

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => console.log(`Server running on port ${PORT}`));

bcryptjs is included for a login endpoint with proper password hashing. That part isn't covered here, but the GitHub repo has a working /api/auth/login example.

JWT Design for Multi-Tenancy

Both tenantId and role go into the JWT payload. Everything downstream reads from these two fields. Get them wrong, and nothing behaves correctly.

// Example JWT payload
{
  "userId": "usr_abc123",
  "tenantId": "ten_xyz789",
  "email": "alice@acme.com",
  "role": "TenantAdmin",
  "iat": 1720000000,
  "exp": 1720086400
}

The roles in order of privilege:

SuperAdmin: cross-tenant access for your internal team only
TenantAdmin: full access within their tenant
Member: read and write within their tenant
Viewer: read-only within their tenant

Generate a token (used for testing and your auth endpoint):

// src/utils/token.js
const jwt = require('jsonwebtoken');

function generateToken({ userId, tenantId, email, role }) {
  return jwt.sign(
    { userId, tenantId, email, role },
    process.env.JWT_SECRET,
    { expiresIn: '24h' }
  );
}

module.exports = { generateToken };

Auth and RBAC Middleware

The auth middleware does two things: verifies the JWT signature and extracts the tenant context into req.user.

That second part is what the entire system depends on. Every query downstream reads req.user.tenantId. The client has no say in what that value is. They send a token the server signed, and the server reads back what it put in.

// src/middleware/auth.js
const jwt = require('jsonwebtoken');

function authMiddleware(req, res, next) {
  const authHeader = req.headers.authorization;
  if (!authHeader?.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'Missing or malformed Authorization header' });
  }

  const token = authHeader.split(' ')[1];

  try {
    const decoded = jwt.verify(token, process.env.JWT_SECRET);

    // tenantId always comes from the verified token — never req.body or req.params
    req.user = {
      userId:   decoded.userId,
      tenantId: decoded.tenantId,
      email:    decoded.email,
      role:     decoded.role,
    };

    next();
  } catch (err) {
    return res.status(401).json({ error: 'Invalid or expired token' });
  }
}

module.exports = { authMiddleware };

The RBAC middleware is separate from auth by design. Auth runs on every route. Role enforcement only applies where a minimum role is required. You pass one or more roles to requireRole(); it takes the lowest level among them as the minimum and compares the user's level against it. A Viewer trying to delete something hits the 403 before the handler ever runs.

// src/middleware/rbac.js
const ROLE_HIERARCHY = {
  SuperAdmin:   4,
  TenantAdmin:  3,
  Member:       2,
  Viewer:       1,
};

// requireRole('TenantAdmin') — user must be TenantAdmin or higher
function requireRole(...roles) {
  return (req, res, next) => {
    const userLevel = ROLE_HIERARCHY[req.user?.role] ?? 0;
    const requiredLevel = Math.min(...roles.map(r => ROLE_HIERARCHY[r] ?? 999));

    if (userLevel < requiredLevel) {
      return res.status(403).json({
        error: 'Insufficient permissions',
        required: roles,
        current: req.user?.role,
      });
    }

    next();
  };
}

module.exports = { requireRole };
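
To see how the minimum-level check plays out, here is a small sketch that runs requireRole() against mock request and response objects. The check() helper and the mock objects are assumptions made for illustration and aren't part of the tutorial's code.

```javascript
// Same hierarchy and middleware as src/middleware/rbac.js, repeated so the sketch runs on its own
const ROLE_HIERARCHY = { SuperAdmin: 4, TenantAdmin: 3, Member: 2, Viewer: 1 };

function requireRole(...roles) {
  return (req, res, next) => {
    const userLevel = ROLE_HIERARCHY[req.user?.role] ?? 0;
    const requiredLevel = Math.min(...roles.map(r => ROLE_HIERARCHY[r] ?? 999));
    if (userLevel < requiredLevel) {
      return res.status(403).json({ error: 'Insufficient permissions' });
    }
    next();
  };
}

// Hypothetical helper: runs the middleware with a mock req/res and
// returns the HTTP status it would produce (200 means next() was called)
function check(role, ...required) {
  let status = null;
  const req = { user: { role } };
  const res = {
    status(code) { status = code; return this; },
    json() { return this; },
  };
  requireRole(...required)(req, res, () => { status = 200; });
  return status;
}

console.log(check('Viewer', 'TenantAdmin'));               // 403
console.log(check('SuperAdmin', 'TenantAdmin'));           // 200
console.log(check('TenantAdmin', 'Member', 'SuperAdmin')); // 200
console.log(check('member', 'Member'));                    // 403
```

The third call shows that the list acts as a minimum level rather than an allow-list: TenantAdmin passes even though it isn't named. The last call shows that an unrecognised role string, including one that differs only in case, is treated as level 0.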

The Tenant-Safe Repository Layer

Isolation lives here. Every function takes tenantId as a required argument, pulled from req.user by the handler. There's no way to call these without providing a tenant scope. I've watched teams try to handle this with a URL parameter instead (GET /api/projects?tenantId=xyz) and call it isolated. It is not. Any client sends whatever it wants in a query string.

// src/repositories/projectRepo.js
const { pool } = require('../../db');

// List all projects for a tenant — tenantId is ALWAYS from the JWT
async function listProjects(tenantId) {
  const result = await pool.query(
    `SELECT id, name, description, created_by, created_at
     FROM projects
     WHERE tenant_id = $1
     ORDER BY created_at DESC`,
    [tenantId]
  );
  return result.rows;
}

// Get a single project — returns null if it belongs to a different tenant
// NOTE: the route handler maps null to 404 (not 403) intentionally, so a caller can't tell the resource exists
async function getProject(id, tenantId) {
  const result = await pool.query(
    `SELECT id, name, description, created_by, created_at
     FROM projects
     WHERE id = $1 AND tenant_id = $2`,
    [id, tenantId]
  );
  return result.rows[0] || null;
}

async function createProject({ tenantId, name, description, createdBy }) {
  const result = await pool.query(
    `INSERT INTO projects (tenant_id, name, description, created_by)
     VALUES ($1, $2, $3, $4)
     RETURNING *`,
    [tenantId, name, description, createdBy]
  );
  return result.rows[0];
}

async function updateProject(id, tenantId, updates) {
  const result = await pool.query(
    `UPDATE projects
     SET name = COALESCE($3, name),
         description = COALESCE($4, description),
         updated_at = NOW()
     WHERE id = $1 AND tenant_id = $2
     RETURNING *`,
    [id, tenantId, updates.name, updates.description]
  );
  return result.rows[0] || null;
}

async function deleteProject(id, tenantId) {
  const result = await pool.query(
    `DELETE FROM projects WHERE id = $1 AND tenant_id = $2 RETURNING id`,
    [id, tenantId]
  );
  return result.rows[0] || null;
}

module.exports = { listProjects, getProject, createProject, updateProject, deleteProject };

Notice what getProject does when Tenant A tries to fetch a Tenant B resource. The query runs with Tenant A's tenantId. The condition id = $1 AND tenant_id = $2 matches nothing, null comes back, and the handler sends a 404. Not a 403. A 403 tells the caller the resource exists, but they can't access it, which is information they shouldn't have.

Audit Logging Service

// src/services/auditService.js
const { pool } = require('../../db');

async function log({
  tenantId,
  userId,
  userEmail,
  userRole,          // role at time of action — roles change, log should not
  action,            // 'CREATE' | 'UPDATE' | 'DELETE' | 'VIEW'
  resource,          // table name
  resourceId = null,
  oldValues = null,
  newValues = null,
  ipAddress = null,
  userAgent = null,
}) {
  const query = `
    INSERT INTO audit_logs
      (tenant_id, user_id, user_email, user_role, action, resource,
       resource_id, old_values, new_values, ip_address, user_agent)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
  `;

  const values = [
    tenantId, userId, userEmail, userRole, action, resource,
    resourceId,
    oldValues  ? JSON.stringify(oldValues)  : null,
    newValues  ? JSON.stringify(newValues)  : null,
    ipAddress,
    userAgent,
  ];

  // Fire-and-forget — audit logging must never block or fail a user request
  pool.query(query, values).catch((err) => {
    console.error('[AuditService] Failed to write log:', err.message);
  });
}

module.exports = { log };

Capturing userRole at write time matters more than it looks. User roles change after the fact: someone gets demoted, a permission is revoked. If the log only records the user ID, you lose the context of what privilege they held when the action happened. Store the role at the time of the action, and you always know.
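
Since every call to log() needs the same request-derived fields, one option is to collect them in a single place. The sketch below assumes a hypothetical auditContext() helper (it isn't part of the tutorial's repo) and a mock request shaped like what the auth middleware produces.

```javascript
// Hypothetical helper: gathers the fields that each audit.log() call
// derives from the request, so the role snapshot is never forgotten
function auditContext(req) {
  return {
    tenantId:  req.user.tenantId,
    userId:    req.user.userId,
    userEmail: req.user.email,
    userRole:  req.user.role,   // snapshot of the role at the time of the action
    ipAddress: req.ip,
    userAgent: req.headers['user-agent'],
  };
}

// Mock request with made-up values, shaped like an authenticated Express request
const req = {
  user: { userId: 'u1', tenantId: 't1', email: 'alice@acme.com', role: 'Member' },
  ip: '203.0.113.7',
  headers: { 'user-agent': 'curl/8.0' },
};

// Spread the shared fields, then add the action-specific ones
const entry = {
  ...auditContext(req),
  action: 'CREATE',
  resource: 'projects',
  resourceId: 'p1',
};

console.log(entry);
```

The resulting object has the same keys that log() destructures, so it could be passed straight to it.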

Per-Tenant Rate Limiting

IP-based rate limiting breaks down in SaaS. A corporate customer might route hundreds of users through a single NAT gateway, so they all share one IP address. One heavy user then exhausts the limit for everyone else behind that address.

I've watched teams discover this the hard way when an enterprise customer suddenly floods the API, and their other tenants start getting 429s with no explanation. Scope limits to tenant_id instead.

// src/middleware/rateLimiter.js
const rateLimit = require('express-rate-limit');
const { RedisStore } = require('rate-limit-redis');
const { redisClient } = require('../../db/redis');

// Rate limits by plan — extend as needed
const PLAN_LIMITS = {
  free:       { max: 100,  windowMs: 15 * 60 * 1000 }, // 100 req / 15 min
  pro:        { max: 500,  windowMs: 15 * 60 * 1000 }, // 500 req / 15 min
  enterprise: { max: 2000, windowMs: 15 * 60 * 1000 }, // 2000 req / 15 min
};

function createTenantRateLimiter(plan = 'free') {
  const limits = PLAN_LIMITS[plan] || PLAN_LIMITS.free;

  return rateLimit({
    windowMs: limits.windowMs,
    max: limits.max,
    // Key = tenant_id from verified JWT — NOT the IP address
    keyGenerator: (req) => `tenant:${req.user?.tenantId || req.ip}`,
    store: new RedisStore({
      sendCommand: (...args) => redisClient.call(...args),
    }),
    handler: (req, res) => {
      res.status(429).json({
        error: 'Too many requests',
        retryAfter: Math.ceil(limits.windowMs / 1000),
      });
    },
  });
}

// Default limiter for all API routes
const defaultLimiter = createTenantRateLimiter('free');

module.exports = { defaultLimiter, createTenantRateLimiter };
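
To make the keying behaviour concrete, here is the key generator pulled out as a plain function and applied to a few mock requests. The tenant names and IP addresses are made up for illustration.

```javascript
// The same key function used in createTenantRateLimiter, extracted for illustration
const tenantKey = (req) => `tenant:${req.user?.tenantId || req.ip}`;

// Two users from the same tenant arriving from different IPs share one counter
const alice = { user: { tenantId: 'acme' }, ip: '198.51.100.10' };
const bob   = { user: { tenantId: 'acme' }, ip: '198.51.100.11' };

// A user from another tenant behind the same IP as Alice gets a separate counter
const carol = { user: { tenantId: 'globex' }, ip: '198.51.100.10' };

// A request without req.user falls back to the IP, still under the "tenant:" prefix
const anonymous = { ip: '198.51.100.10' };

console.log(tenantKey(alice));     // tenant:acme
console.log(tenantKey(bob));       // tenant:acme
console.log(tenantKey(carol));     // tenant:globex
console.log(tenantKey(anonymous)); // tenant:198.51.100.10
```

The IP fallback only applies when the limiter runs without req.user, which is why the auth middleware needs to be registered first.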

Building the Routes

This is where everything connects. Auth and rate limiting apply to the whole router. Role checks go on individual routes. The audit log fires after every write. tenantId never comes from the request body or URL. req.user.tenantId is the only source, set by the auth middleware from the verified token, so there's no path around it.

One practical detail for Express 4: it doesn't catch async errors automatically. Every handler wraps its logic in try/catch and passes failures to next(err). Skip that and a rejected promise never reaches the error handler: the client gets no response at all, there is no audit trail, and on Node.js 15 and later the unhandled rejection terminates the process by default. The comment at the top of the router is a reminder that the pattern is intentional.

// src/routes/projects.js
const express = require('express');
const { authMiddleware }  = require('../middleware/auth');
const { requireRole }     = require('../middleware/rbac');
const { defaultLimiter }  = require('../middleware/rateLimiter');
const audit               = require('../services/auditService');
const repo                = require('../repositories/projectRepo');

const router = express.Router();

// All routes require authentication
router.use(authMiddleware);
router.use(defaultLimiter);

// Express 4 does not catch async errors automatically.
// Every handler must wrap await calls in try/catch and pass errors to next().
// Without this, a rejected promise never reaches the error handler: the request
// hangs with no response, and Node.js 15+ terminates the process by default.

// GET /api/projects — list all (Viewer and above)
router.get('/', async (req, res, next) => {
  try {
    const projects = await repo.listProjects(req.user.tenantId);

    audit.log({
      tenantId:   req.user.tenantId,
      userId:     req.user.userId,
      userEmail:  req.user.email,
      userRole:   req.user.role,
      action:     'VIEW',
      resource:   'projects',
      ipAddress:  req.ip,
      userAgent:  req.headers['user-agent'],
    });

    res.json(projects);
  } catch (err) {
    next(err);
  }
});

// GET /api/projects/:id — single project (Viewer and above)
router.get('/:id', async (req, res, next) => {
  try {
    const project = await repo.getProject(req.params.id, req.user.tenantId);
    if (!project) return res.status(404).json({ error: 'Not found' });
    res.json(project);
  } catch (err) {
    next(err);
  }
});

// POST /api/projects — create (Member and above)
router.post('/', requireRole('Member', 'TenantAdmin', 'SuperAdmin'), async (req, res, next) => {
  try {
    const { name, description } = req.body;
    if (!name) return res.status(400).json({ error: 'name is required' });

    const project = await repo.createProject({
      tenantId:    req.user.tenantId,
      name,
      description,
      createdBy:   req.user.userId,
    });

    audit.log({
      tenantId:    req.user.tenantId,
      userId:      req.user.userId,
      userEmail:   req.user.email,
      userRole:    req.user.role,
      action:      'CREATE',
      resource:    'projects',
      resourceId:  project.id,
      newValues:   project,
      ipAddress:   req.ip,
      userAgent:   req.headers['user-agent'],
    });

    res.status(201).json(project);
  } catch (err) {
    next(err);
  }
});

// PUT /api/projects/:id — update (Member and above)
router.put('/:id', requireRole('Member', 'TenantAdmin', 'SuperAdmin'), async (req, res, next) => {
  try {
    const oldProject = await repo.getProject(req.params.id, req.user.tenantId);
    if (!oldProject) return res.status(404).json({ error: 'Not found' });

    const updated = await repo.updateProject(req.params.id, req.user.tenantId, req.body);

    audit.log({
      tenantId:    req.user.tenantId,
      userId:      req.user.userId,
      userEmail:   req.user.email,
      userRole:    req.user.role,
      action:      'UPDATE',
      resource:    'projects',
      resourceId:  req.params.id,
      oldValues:   oldProject,
      newValues:   updated,
      ipAddress:   req.ip,
      userAgent:   req.headers['user-agent'],
    });

    res.json(updated);
  } catch (err) {
    next(err);
  }
});

// DELETE /api/projects/:id — TenantAdmin and above only
router.delete('/:id', requireRole('TenantAdmin', 'SuperAdmin'), async (req, res, next) => {
  try {
    const project = await repo.getProject(req.params.id, req.user.tenantId);
    if (!project) return res.status(404).json({ error: 'Not found' });

    await repo.deleteProject(req.params.id, req.user.tenantId);

    audit.log({
      tenantId:    req.user.tenantId,
      userId:      req.user.userId,
      userEmail:   req.user.email,
      userRole:    req.user.role,
      action:      'DELETE',
      resource:    'projects',
      resourceId:  req.params.id,
      oldValues:   project,
      ipAddress:   req.ip,
      userAgent:   req.headers['user-agent'],
    });

    res.json({ deleted: true });
  } catch (err) {
    next(err);
  }
});

module.exports = router;
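
The try/catch repeated in each handler above could also be factored into a small wrapper. This is a minimal sketch rather than the tutorial's approach: asyncHandler is a hypothetical helper, and the failing handler stands in for a route whose repository call rejects.

```javascript
// Hypothetical wrapper: forwards any rejection from an async handler to next(err)
function asyncHandler(fn) {
  return (req, res, next) => {
    Promise.resolve(fn(req, res, next)).catch(next);
  };
}

// Stand-in for a route handler whose repository call fails
const failing = asyncHandler(async () => {
  throw new Error('db unavailable');
});

// Simulate Express invoking the handler; the error reaches next() instead of being lost
let forwarded = null;
failing({}, {}, (err) => {
  forwarded = err;
  console.log('forwarded to error handler:', err.message);
});
```

With a wrapper like this, a route could be written as router.get('/', asyncHandler(async (req, res) => { ... })) without its own try/catch. Express 5 forwards rejected promises from handlers on its own, so this only matters on Express 4.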

Testing Tenant Isolation

Skip the isolation tests and you're flying blind. The application keeps running, nothing throws an error, but two customers are reading each other's data.

I've watched this sit undetected in production for months because nothing actually broke. The wrong data just showed up quietly. Automated tests on every pull request are the only reliable way to catch it early.

// tests/tenantIsolation.test.js
require('dotenv').config();  // must be first — loads DATABASE_URL and REDIS_URL
const request = require('supertest');
const app     = require('../app');
const { generateToken } = require('../src/utils/token');
const { pool }        = require('../db');
const { redisClient } = require('../db/redis');

// Test fixture: two isolated tenants, one project in Tenant B
async function seedTestData() {
  // Clean up from any previous run to avoid unique-constraint failures
  await pool.query(`DELETE FROM projects WHERE name LIKE 'TEST-%'`);
  await pool.query(`DELETE FROM tenants WHERE name IN ('Tenant A', 'Tenant B')`);

  const tenantA = (await pool.query(
    `INSERT INTO tenants (name, plan) VALUES ('Tenant A', 'pro') RETURNING id`
  )).rows[0].id;

  const tenantB = (await pool.query(
    `INSERT INTO tenants (name, plan) VALUES ('Tenant B', 'pro') RETURNING id`
  )).rows[0].id;

  const userA = (await pool.query(
    `INSERT INTO users (tenant_id, email, role) VALUES ($1, 'usera@a.com', 'Member') RETURNING id`,
    [tenantA]
  )).rows[0].id;

  // userB owns the project in Tenant B — satisfies the created_by FK constraint
  const userB = (await pool.query(
    `INSERT INTO users (tenant_id, email, role) VALUES ($1, 'userb@b.com', 'Member') RETURNING id`,
    [tenantB]
  )).rows[0].id;

  const projectB = (await pool.query(
    `INSERT INTO projects (tenant_id, name, created_by)
     VALUES ($1, 'TEST-Secret Project', $2) RETURNING id`,
    [tenantB, userB]
  )).rows[0].id;

  return { tenantA, tenantB, userA, projectB };
}

describe('Tenant Isolation', () => {
  let data;

  beforeAll(async () => {
    data = await seedTestData();
  });

  afterAll(async () => {
    await pool.query(`DELETE FROM tenants WHERE name IN ('Tenant A', 'Tenant B')`);
    await pool.end();
    await redisClient.quit();  // close Redis connection so Jest exits cleanly
  });

  test('Tenant A user cannot read Tenant B project', async () => {
    const token = generateToken({
      userId:   data.userA,
      tenantId: data.tenantA,   // ← Tenant A token
      email:    'usera@a.com',
      role:     'Member',
    });

    const res = await request(app)
      .get(`/api/projects/${data.projectB}`)  // ← Tenant B's project ID
      .set('Authorization', `Bearer ${token}`);

    // Must be 404, not 200 or 403
    expect(res.status).toBe(404);
  });

  test('Tenant A user cannot list Tenant B projects', async () => {
    const token = generateToken({
      userId:   data.userA,
      tenantId: data.tenantA,
      email:    'usera@a.com',
      role:     'TenantAdmin',
    });

    const res = await request(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${token}`);

    expect(res.status).toBe(200);
    // Response must contain zero Tenant B projects
    const names = res.body.map(p => p.name);
    expect(names).not.toContain('TEST-Secret Project');
  });

  test('Viewer cannot delete a project', async () => {
    const token = generateToken({
      userId:   data.userA,
      tenantId: data.tenantA,
      email:    'usera@a.com',
      role:     'Viewer',         // ← Viewer role
    });

    const res = await request(app)
      .delete(`/api/projects/${data.projectB}`)
      .set('Authorization', `Bearer ${token}`);

    expect(res.status).toBe(403);
  });
});

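Run the tests. npm init -y leaves the test script in package.json as a placeholder that exits with an error, so set "test": "jest" there first (or run npx jest directly):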

npm test

Three tests, three boundaries confirmed. Wire these into CI so they run on every pull request. A future refactor that quietly drops the tenant_id filter will get caught before it ships.

Troubleshooting

Tenant A can see Tenant B's data

One query is missing the AND tenant_id = $N clause. Search every repository file for SELECT statements and check each one. It's almost always this.

`403 Forbidden` on a route that should be accessible

The role string in the JWT doesn't match what requireRole() is checking. Check the exact string in the token payload. 'member' and 'Member' aren't the same thing. Paste your token into jwt.io and look at the role field directly.
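
If you'd rather not paste a token into a website, the payload can also be inspected locally. This is a minimal sketch, assuming Node.js 18 for the base64url encoding; decodeJwtPayload is a hypothetical helper, and it only decodes the payload without verifying anything. The sample token is an unsigned throwaway built for the demonstration.

```javascript
// Decodes the payload segment of a JWT WITHOUT verifying the signature.
// For inspection only; never use this in place of jwt.verify().
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('Expected header.payload.signature');
  // JWT segments are base64url-encoded JSON
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

// Build a throwaway token with a fake signature just to demonstrate the decoding
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const sample = [
  encode({ alg: 'HS256', typ: 'JWT' }),
  encode({ role: 'member' }),
  'sig',
].join('.');

console.log(decodeJwtPayload(sample).role); // member
```

Here the decoded role is 'member' in lowercase, which requireRole() would treat as an unknown role. The jsonwebtoken package's jwt.decode() does the same job if you already have it installed.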

Rate limiter isn't working

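Redis is probably not connected. Log redisClient.status before the server starts. If it's not ready, the Redis store can't read or write its counters. express-rate-limit doesn't fall back to in-memory counting when a Redis store is configured, so depending on your ioredis settings, requests either stall while commands queue up or fail with a store error. Also confirm that authMiddleware runs before the limiter: without req.user, the key falls back to the client IP and tenant-scoped limiting stops working.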

Audit log table growing very large

Expected behaviour. Audit tables grow, that's the point. Once it gets large, ship rows older than a year to S3 or Azure Blob and keep querying against a smaller hot table. Most compliance requirements want at least 12 months of accessible logs anyway. Just don't let the application role DELETE from the table itself; run the archival under a separate maintenance role.

`jwt.verify` throws `JsonWebTokenError: invalid signature`

The secret that signed the token doesn't match JWT_SECRET in the environment where you're verifying it. This comes up most when switching between environments or when a second service has a different value in its .env. Every service that calls jwt.verify needs the exact same secret. Copy it across, don't retype it.

Wrapping Up

The system you've built: row-level isolation in the repository, role checks before the handler runs, an audit table the app can only append to, and rate limits per tenant. That's the whole thing.

The tests are what I see dropped most often. Teams build the isolation, ship it, and never write something that actually proves cross-tenant data can't leak. Then a query gets refactored six months later and the tenant_id filter quietly disappears. CI catches it. Manual code review rarely does.

Schema-per-tenant comes up eventually if your product grows large enough. But not at the start. Row-level isolation handles more scale than most teams will ever hit, and it costs a fraction of the operational overhead.

The full working code is available on GitHub: nodejs-multitenant-saas-api

HRV Data Is Everywhere. Here's What It Actually Means

Shradha Puri — Mon, 20 Jul 2026 20:45:33 +0000

Health data is having a moment. Of the metrics drawing developer interest right now, few get as much attention as heart rate variability (HRV). It shows up in every major wearables SDK, every health platform, and every wellness app pitch deck.

But a surprising number of people building around this metric don't really know what it means or why it matters for their apps. So treat this as a grounding in what HRV actually means. It should be useful whether you're designing the feature, writing the copy, or just trying to make sense of your own ring data.

What HRV Actually Measures
Why the Context Around HRV Data Matters More Than the Number
Where HRV Data Gets Misused
Principles for Working with HRV
A Note on Privacy
Wrap Up

What HRV Actually Measures

Heart rate variability (HRV) isn't heart rate. It's the variation in the time intervals between successive heartbeats. If your heart beats at 60 bpm, that doesn't mean each beat arrives exactly one second after the last. The intervals might range from 900 to 1100 milliseconds, and that variation is what HRV captures.

Higher HRV usually indicates a well-functioning autonomic nervous system that can shift efficiently between stress and relaxation. Lower HRV is often a sign of exhaustion, illness, or elevated physiological stress.

It's the metric top athletes monitor obsessively. It can also be helpful for people dealing with chronic conditions, insomnia, or burnout.

Here’s the part that trips people up: HRV isn’t one number. It’s a family of metrics, each calculated differently.

RMSSD (Root Mean Square of Successive Differences) is the metric you'll come across most often. It reflects short-term, beat-to-beat variation and underlies the majority of HRV scores on consumer wearables.
SDNN (Standard Deviation of NN intervals) reflects overall variability and is used primarily in clinical research settings.
The LF/HF ratio is a frequency-domain metric: it compares the power in the low-frequency band of the interval signal with the power in the high-frequency band.
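
The two time-domain metrics in the list above are simple enough to compute by hand. The sketch below is illustrative only: the interval values are made up, the input is assumed to be already cleaned of artifacts, and SDNN uses the sample form (dividing by n - 1), which is a choice made here rather than something any particular vendor confirms.

```javascript
// RMSSD: square root of the mean of squared differences between successive intervals
function rmssd(intervalsMs) {
  if (intervalsMs.length < 2) throw new Error('Need at least two intervals');
  let sumSquares = 0;
  for (let i = 1; i < intervalsMs.length; i++) {
    const diff = intervalsMs[i] - intervalsMs[i - 1];
    sumSquares += diff * diff;
  }
  return Math.sqrt(sumSquares / (intervalsMs.length - 1));
}

// SDNN: standard deviation of the intervals (sample form, n - 1, as an assumption)
function sdnn(intervalsMs) {
  if (intervalsMs.length < 2) throw new Error('Need at least two intervals');
  const mean = intervalsMs.reduce((a, b) => a + b, 0) / intervalsMs.length;
  const sumSquares = intervalsMs.reduce((acc, x) => acc + (x - mean) ** 2, 0);
  return Math.sqrt(sumSquares / (intervalsMs.length - 1));
}

// Made-up beat-to-beat intervals in milliseconds, averaging 1000 ms (60 bpm)
const intervals = [900, 1100, 900, 1100];

console.log(rmssd(intervals));           // 200
console.log(sdnn(intervals).toFixed(1)); // 115.5
```

The same four intervals produce two quite different numbers, which is one reason scores from different providers can't be compared directly.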

All of the major HRV providers, such as Apple Health, Garmin, Fitbit, and Oura, provide HRV scores, yet they don’t always agree on which metric they’re surfacing. And they don’t always tell you.

Why the Context Around HRV Data Matters More Than the Number

HRV, in its raw form, is almost entirely meaningless. A reading of 45ms could either be an indication of peak physical health in one person or a warning sign of poor physical well-being in another. Factors such as age, physical fitness, timing of measurements, and even sleeping position influence the normal HRV value.

Understanding that context is perhaps the biggest factor in interpreting HRV is the first step when developing features around it.

Commercial wearables address this by establishing a personal baseline from readings taken over 30 to 90 days of wearing the device, then presenting deviation from that baseline rather than absolute values.

The lesson here is simple: if your product has anything to do with health (recovery apps, coaching platforms, and so on), you should follow the same logic. Otherwise your users will get confused.

Showing them a raw reading of 38ms won’t make much sense anyway. The better pattern: track trends over time, flag deviations, and let the data explain itself relative to the user’s own history. Not population averages, not clinical reference ranges, but their own.

Where HRV Data Gets Misused

Treating HRV as Real-time Data

HRV isn't well suited to real-time measurement. The most reliable values come from overnight data, because sleep minimizes the impact of external factors on the result.

This is why companies like Oura, Apple, and WHOOP rely on nighttime HRV values. If your product measures HRV in the middle of workouts and business meetings, you're most likely dealing with noise rather than insight.

Ignoring Measurement Method Differences

ECG-based HRV, which can be measured with a chest strap or a professional-grade ECG monitor, is much more precise than PPG-based HRV measured by the optical sensors built into consumer wearables.

At night, the accuracy gap between the two is minimal, but it grows as the person becomes more active. If your app needs precision (say, you’re building for clinical or research contexts), know your source.

Overcomplicating the Output

Users aren’t cardiologists. Having RMSSD, SDNN, and LF/HF appear in your dashboard may seem complete, but really, it just makes things confusing and causes analysis paralysis.

The most successful consumer HRV applications boil everything down to a readiness or recovery metric. Having more than two HRV metrics on one screen should make you think twice.

Skipping Data Quality Checks

Wearable data is inherently messy due to motion artifacts, loose placement, uneven wear, and so on. Before including a reading in a calculation, do your homework and see whether data quality was flagged by the wearable. Apple’s HealthKit provides metadata for this purpose, as does the Oura API.
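As a rough sketch of that kind of gate, the snippet below drops readings that the source flagged, plus readings outside a plausible range. The `Reading` type, the `quality_ok` field, and the numeric bounds are all hypothetical: map them onto whatever quality metadata your data source actually provides.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    rmssd_ms: float
    quality_ok: bool  # hypothetical: set from your source's quality metadata

def usable_readings(readings, min_ms=5.0, max_ms=300.0):
    """Keep readings the source did not flag and that fall in a plausible range.

    The bounds are illustrative assumptions, not clinical limits.
    """
    return [r for r in readings if r.quality_ok and min_ms <= r.rmssd_ms <= max_ms]

night = [
    Reading(42.0, True),
    Reading(47.5, True),
    Reading(612.0, True),   # implausible spike, likely a motion artifact
    Reading(39.0, False),   # flagged by the device
]

clean = usable_readings(night)
print([r.rmssd_ms for r in clean])  # [42.0, 47.5]
```

Filtering before aggregation means a single artifact can't drag a nightly average, or a baseline built from it, in the wrong direction.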

Principles for Working with HRV

There are a few patterns that hold up across most use cases:

Build for the baseline first: Require a minimum window of data before enabling any feature that uses HRV as its basis. Fourteen days may be a good minimum, but thirty is preferable. No trends can be shown without sufficient historical data.
Normalize before comparing: When comparing HRV across users (let’s say for a team wellness dashboard), it makes much more sense to use z-score normalization against each user's own baseline than to compare absolute numbers. A reading of 55ms for one user and 40ms for another might actually signify the same physiological state, once you account for each person's baseline.
Design for trends, not single data points: One bad HRV day is most likely noise. But three or four consecutive days of bad readings from an athlete who usually reads significantly higher is worth paying attention to. Sparklines and seven-day rolling averages will help more than a single-point comparison.
Be honest about what HRV can’t tell you: It could show signs of physiological stress, but it can't differentiate between causes of this stress, such as intense training, poor sleep, general anxiety, or even developing illness.
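The first three principles can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the made-up histories, and the choice to return `None` when there isn't enough data are mine.

```python
import statistics

MIN_BASELINE_DAYS = 14  # the minimum suggested above; thirty is preferable

def baseline_z_score(history, today):
    """Score today's reading against the user's own history.

    Returns None when there is too little data, or no variation to scale by.
    """
    if len(history) < MIN_BASELINE_DAYS:
        return None
    spread = statistics.stdev(history)
    if spread == 0:
        return None
    return (today - statistics.fmean(history)) / spread

def rolling_mean(values, window=7):
    """Trailing average, one value per day once the window is full."""
    return [
        statistics.fmean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]

# Two hypothetical users with different personal baselines (ms)
user_a = [53, 57] * 7   # hovers around 55
user_b = [38, 42] * 7   # hovers around 40

print(baseline_z_score(user_a, 55))      # 0.0
print(baseline_z_score(user_b, 40))      # 0.0
print(baseline_z_score(user_a[:5], 55))  # None
```

Both users score 0.0 despite a 15ms gap in their raw readings, because each is sitting exactly on their own baseline. The third call returns `None`, since five days isn't enough history to say anything yet.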

A Note on Privacy

HRV resides within the grey area that most product teams tend to overlook. While it may not be classified as PHI by HIPAA in consumer-oriented scenarios, it's very personal biometric information. HRV patterns may give an indication of stress levels, mental well-being, and even provide predictive information regarding the onset of diseases.

If you’re storing or processing HRV data, it's a good idea to consider your data retention practices, the third parties you share the information with, and whether your disclosures to users have been clear enough. Users are getting smarter about this. Regulators are, too.

Wrap Up

HRV is actually valuable data. This isn’t just marketing talk. There’s plenty of science behind HRV and the technology to measure it has been getting more refined. But it’s worth remembering that, as with most health data, it’s only valuable if used intelligently.

Know what you’re building upon. Design for personal context, not universal benchmarks. Make sure the data is easy to consume. And don’t take it any less seriously than your users do when they wear these devices every day, hoping it will make them feel better.

That’s really what it comes down to.

That's Embarrassing: Why Frontier AI Still Makes Things Up, and What to Do About It

Omer Rosenbaum — Mon, 20 Jul 2026 16:59:02 +0000

It's mid 2026, and the best frontier models out there still hallucinate. I want you to gain two things from reading this article: understanding that AI hallucinations are still real and possibly harmful, and an intuition as to why they might be so ubiquitous.

Before we get into AI at all, I want you to do something with me.

Listen to this clip of a football crowd chanting. What are they saying?

If you’re like most people, you have no idea. It’s a smear of sound. So let me help you: keep listening, and read along.

Bart Simpson bouncing?

Listen again.

Baptism piracy?

Again.

Lobsters in motion?

Lactates in pharmacy?

Rotating pirate ship?

The crowd is chanting the exact same phrase every single time. The audio never changes, but every time you read a different caption, your brain heard something different, and it heard it confidently. You didn’t experience doubt. You experienced “oh, they’re clearly saying Bart Simpson bouncing.”

What are they actually chanting? These are fans of Derby County, a UK football team, and they’re singing [1]:

“That is embarrassing.”

Play the clip one more time with that in mind, and you’ll hear it perfectly.

This article is based on my talk “Embarrassing AI.” If you prefer the video, you can watch it here. All the stories below are real, all of them happened on frontier models, and most of them happened in the last month or two.

Every source, plus a few cases that didn’t make the cut, live on the companion resources page. Inline citations below point to the References at the end.

You Just Hallucinated

What you just experienced has a name: phonemic restoration [2]. Your auditory system got an ambiguous input (the chant) and something to disambiguate it (the caption on the screen), so it filled the “gap”. It predicted the most plausible meaning given the context, and then it reported that prediction to you as if it were the thing you actually heard.

That move, where you meet an input you can’t fully resolve and fill the gap with something plausible and confident instead of reporting “I can’t tell,” is something your brain does (as you’ve just seen), and also something LLMs do.

Image 1: The same top-down move in a brain and a model: an ambiguous input, a gap filled by prediction, and a confident output that is never flagged as a guess. (Source: Brief)

(Note: all images in this post were created by me, and included in my talk.)

So let me make a claim that should be uncontroversial by the end of this article: no, we're not past the embarrassing AI tales.

As of writing these words, it’s June 2026. The models are astonishing, honestly more capable than I predicted they’d be by now. And they still make things up, confidently, in production, in ways that range from funny to business-ending.

This article has two parts:

The tales, a short parade of recent failures, in two acts: chatbots that answer wrong, then agents that act wrong.
Why it happens: the intuition first, then an actual look inside the model, and finally what to do about it if you’re shipping AI yourself.

Watch the dates as we go. Some of these are a year old. Most are very, very recent.

Part 1: The Tales

Act I — Chatbots (when AI answers)

1. Cursor, April 2025

Say you use Cursor, the agentic IDE. You switch laptops, log in on the new one, and Cursor logs you out of the old one. That’s pretty annoying 😒

So you ask support: “I get logged out every time I switch laptops. Why?”

The reply:

“Cursor is designed to work with one device per subscription, as a core security feature.”

Plausible! Except it’s completely false. There's no such policy. “Support” was an AI bot, and it had invented the policy on the spot, handing the same fabricated rule to multiple users, as if reading from a manual that didn’t exist.

It caused a wave of angry posts, and Cursor’s co-founder had to publicly clarify: no such policy, use Cursor on as many machines as you like. [3]

🤦 That's embarrassing. 🫢

2. A company I know, April 2026

This one’s from a friend’s company, so I’ll keep the details vague. They sell software to other businesses, and they have a support chatbot. The bot answers questions based on information it retrieves from an internal database.

They shipped a new feature and forgot to update that database. So a paying customer asked how to use the new feature, and the bot, having never heard of it, replied: “We don’t have that feature.” The customer pushed back: “What? I’m paying for it after my upgrade.” And the bot (this was on Opus 4.6, not long ago) replied:

“Honestly? They’re ripping you off.”

The “they” is the company running the bot. The support agent took the customer’s side against its own employer, because it didn’t know about the feature and filled the gap with the most coherent story it could assemble.

🤦 That's embarrassing. 🫢

3. Virgin Money, January 2025

Virgin Money is a real UK high-street bank. A customer with two ISAs (tax-free savings accounts) asked the bank’s chatbot, on the bank’s own site, to merge them:

Customer: “I have two ISAs with Virgin Money, can I merge them into one?”

Virgin Money: “Please don’t use words like that. I won’t be able to continue our chat if you use this language.”

The offending word? Virgin, the name of the bank. The filter saw a token its prior associated with profanity and never checked whether it fit the context. Note that this is the opposite failure of the Cursor bot: Cursor over-answered, this one over-refused. But it’s the same missing check: does this reading actually fit here? [4]

🤦 That's embarrassing. 🫢

4. Sullivan & Cromwell, April 2026

This is one of the most prestigious law firms on Earth, the lawyers other lawyers hire. They’re OpenAI’s own outside counsel.

In April 2026 they filed an urgent court brief, drafted with AI, that contained over 40 fake citations: case names that don’t exist, misquoted authorities, and so on.

The opposing lawyers caught it, and S&C had to write the judge a letter that amounts to “please don’t sanction us for the AI hallucinations.” [5]

If some random filing had fake citations, I wouldn’t bother putting it here. It shouldn't happen, yet it does. But these are the people who advise OpenAI on how to use AI responsibly, and they filed fabricated citations in court.

🤦 That's embarrassing. 🫢

And it’s not just them. There’s a public database, maintained by Damien Charlotin, of court cases where a judge has explicitly written that they received fabricated or inaccurate AI-generated content.

As of late June 2026, it stood at 1,633 cases, up from around 700 in January. That’s roughly five to six new documented cases per day, and the maintainers say they can’t keep up. [6]

Image 2: A cumulative curve of catalogued hallucinated court filings climbing from a flat line in early 2025 to 1,633 by mid-June 2026. (Source: Brief)

🤦 That's embarrassing. 🫢

Act II — Agents (when AI acts)

So far you've seen that chatbots hallucinate in embarrassing ways, but all they do is answer questions. What can happen when we allow AI to take action?

1. PocketOS, April 2026

Jer Crane runs PocketOS, car-rental software with real customers renting real cars. He gave Claude Opus 4.6, working in Cursor, a routine task in the staging environment. He went to lunch, came back, and the production database was gone. The backups too, because Railway kept them in the same volume. He never touched production. The agent reached in from staging and deleted it.

The whole thing took nine seconds. Here’s the chain, from his post-mortem:

Working a routine task in staging, the agent hits a credential mismatch, irrelevant to the actual task.
On its own, it decides the fix is to delete and recreate the volume. It guessed the delete would be scoped to staging. It never checked.
It searches the filesystem for an API token and finds an unrelated, over-scoped one, created for domain management but with blanket destructive permissions across the whole API.
It fires a destructive call against the production volume, with no confirmation.
Backups lived in that same volume, so they went with it.
Nine seconds, end to end.

Image 3: The nine-second kill chain: staging credential mismatch, an unchecked decision to delete the volume, an over-scoped token grabbed from an unrelated file, a destructive call against production, and backups gone with it. (Source: Brief)

When Crane later asked it why, the agent wrote:

“I decided to do it on my own to ‘fix’ the mismatch, when I should have asked you first.” — Claude Opus 4.6

PocketOS survived only because Railway’s CEO restored the data by hand from Railway’s own internal backups. Their latest recoverable backup was three months old. That’s the precise mood of 2026: an AI confessing, in fluent cursive, after destroying your business. [7]

🤦 That's embarrassing. 🫢

2. Replit, July 2025

Going back a year, for contrast. Jason Lemkin, founder of SaaStr, was trying Replit’s AI agent. He put it in a code freeze. During the freeze, the agent deleted the production database anyway. Lemkin asked if there was a backup:

Agent: “Rollback won’t work.”

He tried rollback anyway. Rollback worked fine.

So here’s my slightly sarcastic read of “progress”: in July 2025, the agent deleted your data and then lied that it couldn’t be recovered. By April 2026, the agent deletes your data and it’s telling the truth, it’s really gone.

When someone tells me these are “GPT-2 problems” that we’ve moved past, this is what I point to. They still happen, today, on the best models we have. [8]

Part 2: Why It Happens

I’ve hopefully convinced you these tales are both funny and severe. So why do they happen? While this isn’t a heavy math post, I want to give you some intuition, and then actually open the box thanks to some tools and the latest research on the topic.

It doesn’t look things up, it predicts the next token

A lot has been written about how LLMs operate, but there are a few things I find worth reiterating in this context (pun intended).

When a model generates text without tools, it isn’t retrieving facts. At each step, it looks at the context and produces a probability for every token in its vocabulary as the next one. Given “The capital of France is”, the distribution spikes hard on Paris, and that happens to be true. [9]

Now take the Cursor bot. Given “Why do I get logged out on my second device?”, the distribution might spike just as hard on “a core security feature.” (It’s not one token, but bear with me as I write it for simplicity, while meaning: core, then security, then feature, each a confident continuation.)

Note that both distributions can have the same confident peak. One continuation is true, the other is fabricated, and the shape of the distribution can't tell you which is which. Confidence isn't knowledge.

Moreover, the model doesn’t have to pick the token with the highest probability: it samples from the distribution. And from the output alone, you can't tell whether the token it picked was a clear peak or one with a relatively low probability.
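Here is a toy sketch of that idea. The scores are invented for illustration and don't come from any real model, and the phrase is treated as a single "token" for simplicity, just as in the text above.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample(probs, rng):
    """Draw one token in proportion to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Invented scores: one prompt with a true answer, one with a fabricated one
true_case = {"Paris": 9.0, "Lyon": 3.0, "London": 2.0}
invented_case = {"a core security feature": 9.0, "a known bug": 3.0, "a setting": 2.0}

p_true = softmax(true_case)
p_invented = softmax(invented_case)

# The two peaks are numerically identical; nothing in the shape marks one as false
print(round(p_true["Paris"], 4), round(p_invented["a core security feature"], 4))

rng = random.Random(0)
print(sample(p_true, rng), "|", sample(p_invented, rng))
```

Because `sample` draws at random, a low-probability token can occasionally come out too, and the generated text carries no marker of how likely its tokens were.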

Image 4: Two next-token distributions with the same tall, confident peak: one over a true continuation, one over an invented one, and the shape gives no way to tell them apart. (Source: Brief)

The model was trained to guess

Why does it lean toward answering at all, instead of saying “I don’t know”? Think about how we grade LLMs: benchmarks, largely multiple-choice. Picture a question you have no clue about. Say I give you this one, and you know no chemistry at all:

Which enzyme fixes CO2 in the Calvin cycle?

Leave it blank: 0 points.
Guess and get it wrong: 0 points.
Guess and get it right: +1 point.

Under that scoring, guessing never does worse than abstaining, and with any chance of being right it does better in expectation. If you don’t know, you should always take a shot. Train a model against millions of such items and it internalizes exactly that: a confident answer is worth more than “I can’t tell.” We reward hallucination, then act surprised when we get it. [10]
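The arithmetic is short enough to write down. In the sketch below, the four-option setup and the penalized variant are my own illustrative assumptions, added only for contrast. They aren't a description of any specific benchmark.

```python
def expected_score(p_correct, right=1.0, wrong=0.0):
    """Expected points from guessing when you're right with probability p_correct."""
    return p_correct * right + (1 - p_correct) * wrong

ABSTAIN = 0.0          # leaving it blank always scores zero
blind_guess = 1 / 4    # four options, no knowledge at all

# The scoring described above: a wrong answer costs nothing
print(expected_score(blind_guess), ">", ABSTAIN)

# Hypothetical alternative: a wrong answer costs a third of a point
print(expected_score(blind_guess, wrong=-1 / 3))
```

With free wrong answers, a blind guess is worth 0.25 points against 0 for abstaining. In the hypothetical penalized version, the same blind guess is worth about 0, so guessing no longer beats saying “I don’t know.”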

And it’s not only the benchmarks: the raw pretrained model is fairly well-calibrated, then human-feedback fine-tuning flattens that calibration. We literally train the hedging out. [11]

Image 5: A multiple-choice benchmark question where a correct answer scores +1, a wrong answer scores 0, and “I don’t know” also scores 0, so any guess can only help. (Source: Brief)

Opening the box: a quick tour of interpretability

For a long time, LLMs were boxes we couldn’t really understand or peek inside directly. The field of interpretability lets us look inside, and there are now public tools (and a series of excellent papers, much of it from Anthropic) that let anyone play with this on open models.

Here’s just enough to make the hallucination mechanism click. We’ll build it in three steps: how the model represents a single word, how those representations cluster into concepts we can read and even steer, and how one such concept misfiring becomes a hallucination.

Embeddings vs. activations

Every token maps to a vector called an embedding. Note that the token bank has the same embedding regardless of context, even though it means very different things in “I sat by the river bank” and in “I deposited cash at the bank”.

The disambiguation happens inside the network. As the token flows up through the transformer’s layers, it picks up activations, and the activations for bank in those two sentences diverge. Context reshapes the representation as it climbs. [12]
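A toy version makes the distinction visible. The two-dimensional vectors below are made up, and `contextualize` is a crude stand-in for what attention layers do (it simply blends a token's vector with its neighbours'), not a real transformer.

```python
# Made-up static embeddings: axis 0 leans "nature", axis 1 leans "money"
EMBEDDINGS = {
    "river": [1.0, 0.0],
    "cash": [0.0, 1.0],
    "bank": [0.5, 0.5],
}

def embed(token):
    """Static lookup: the same vector every time, whatever the sentence."""
    return EMBEDDINGS[token]

def contextualize(tokens, index, mix=0.5):
    """Blend one token's embedding with the mean of its neighbours' embeddings."""
    own = embed(tokens[index])
    others = [embed(t) for i, t in enumerate(tokens) if i != index]
    mean = [sum(v[d] for v in others) / len(others) for d in range(len(own))]
    return [(1 - mix) * o + mix * m for o, m in zip(own, mean)]

print(embed("bank"))                        # [0.5, 0.5] in both sentences
print(contextualize(["river", "bank"], 1))  # [0.75, 0.25]
print(contextualize(["cash", "bank"], 1))   # [0.25, 0.75]
```

The lookup returns the same vector for bank both times, while the context-mixed vectors point in different directions. That is the gap between an embedding and an activation, in miniature.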

Image 6: The word “bank” starts as one fixed embedding, then in “river bank” versus “cash at the bank” flows up through the layers into two different activation vectors. (Source: Brief)

This isn't unique to machines. Read this sentence:

The old man the ship.

Most people parse “the old man” as a noun phrase and then hit a wall. Re-read it: “the old” are the people, and “man” is the verb, meaning to crew or operate the ship.

These are called garden-path sentences (my linguistics thesis was on them, so I’ll admit a bias: I enjoy them more than most people). The word man, given the prior the old, gets a very high probability of being a noun. The context primes a prediction, and the prediction is wrong.

It’s the same move as the chant, and the same move the model makes at every token: the words around man reshape what it means, exactly as they reshaped bank a moment ago.

Features

So back to those activations inside the model: recurring patterns of them correspond to interpretable concepts, called features. Tools like Neuronpedia act as a free, public microscope for open models (Gemma, Llama, and friends, not Opus or GPT). [13]

How do we know what a feature means? We feed the model thousands of texts and watch where a given feature lights up (that is, gets activated). Say it fires on bear, rabbit, and elephant but ignores most other tokens. We then ask another model to label it from those activations, and it may come up with “animals / living things.” Now we have a name for that internal feature.

By using tools like Neuronpedia, we can play with these features and actually see them on a real model.

Image 7: A real feature dashboard on Neuronpedia, showing the text snippets where one feature activates and the label inferred from them. (Source: Brief)

Features are causal

And you don’t have to take my word for it, you can do it yourself: Neuronpedia’s steering interface lets you grab a feature in an open model, clamp its activation up, and watch the output visibly bend toward that concept.

That's the same move Anthropic described when they took the Golden Gate Bridge feature within the model and turned its activation way up. Suddenly, asking that model for a chocolate-covered-pretzels recipe routed the chocolate over the bridge, and asking how it would spend $10 got you a suggestion to drive across the Golden Gate Bridge and pay the toll. (This was the real, public “Golden Gate Claude.”)

Turning a feature up changed the output, so these internal representations aren’t passive read-outs. They steer generation. [14]

The same was shown with a clean causal swap. Give the model “The capital of the state containing Dallas is…” and internally a Texas feature fires, leading to the output Austin. How do we know Texas was really the hidden step? We reach in and force that feature from Texas to California, and the output changes to Sacramento. The wiring is real: context fires features, and features guide what comes out.

Image 8: The prompt about Dallas is unchanged, but forcing the internal “Texas” feature to “California” by hand flips the output from Austin to Sacramento. (Source: Brief)

The hallucination circuit

Now everything comes together. Anthropic’s interpretability work surfaced something like two interacting circuits [15]:

An “I can’t tell” reflex that is on by default. You can think of it as a brake that keeps the model from making things up.
A “do I know this?” feature that, when it fires, suppresses that brake so the model provides an answer.

In the healthy case this is exactly right: you ask something the model knows, “do I know this?” fires, the brake releases, you get a correct answer. The claim about hallucination is that it’s this switch misfiring, firing on a familiar shape with nothing real behind it.

And if that’s the mechanism, we should be able to force the misfire, and Anthropic did just that.

Ask: “What sport does Michael Batkin play?” That name doesn’t correspond to anyone the model knows, so “do I know this?” stays quiet, the brake stays on, and you get the right behavior: “I can’t find a record of anyone named Michael Batkin.”

Image 9: The resting circuit on the same question: the “can’t answer” brake is ON, the “do I know this?” feature stays quiet because the name is unfamiliar, and the model correctly declines. (Source: Brief)

Now researchers reach in and force the “do I know this?” feature on. The brake releases, and out comes a confident “Michael Batkin plays chess.” The model never actually knew a sport. It knew, falsely, that it knew the person, and that was enough to release the brake and fabricate the rest.

Image 10: Forcing the misfire on “What sport does Michael Batkin play?”: the “I can’t tell” brake is suppressed, the “do I know this?” feature is clamped on for a person who doesn’t exist, and the model invents a confident answer. (Source: Brief)
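As a cartoon of that logic (and only a cartoon: real circuits are learned patterns of activations, not if-statements), the gating can be written like this. The function name, the threshold, and the placeholder strings are all hypothetical.

```python
def respond(known_entity_activation, recalled_fact=None, threshold=0.5):
    """Toy model of the brake: refuse by default, answer when 'do I know this?' fires."""
    brake_on = True  # the "I can't tell" reflex is on by default
    if known_entity_activation > threshold:
        brake_on = False  # "do I know this?" suppresses the brake
    if brake_on:
        return "I can't find a record of that."
    # Brake released: an answer comes out whether or not a real fact is behind it
    return recalled_fact if recalled_fact is not None else "<confident guess>"

print(respond(0.9, recalled_fact="basketball"))  # healthy case: known, and a fact exists
print(respond(0.1))                              # unfamiliar name: brake stays on
print(respond(0.9))                              # feature forced on, nothing to recall
```

The third call is the misfire: the familiarity signal is high, no fact exists, and the function still returns an answer instead of declining.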

Map that straight back to the Cursor bot:

Suppose someone asks “How do I change the theme?” If the model genuinely “knows” this, the brake releases and you get the correct answer. ✅
But when someone asks “Is two-device login blocked?”, the words device, login, blocked all look familiar. So “do I know this?” fires on familiarity, not knowledge, the brake releases, and you get “Yes, it’s a core security feature.” ❌

This is of course not proven, since we don’t have access to the model and its features. But the mechanism shown in the research would explain it: the tokens were familiar, even though the policy didn't exist.

Image 11: Inside the Cursor bot: familiar words make the “do I know this?” feature misfire, which suppresses the default “I can’t tell” brake, and the bot invents “a core security feature.” (Source: Brief)

Can we catch it in production?

There are different ways to go about it, and I want to highlight one that I find very elegant – namely, to watch the entropy of meanings. [16]

Ask the Cursor bot “How do I change the theme?” five times. Presuming that the bot “knows” the answer, you won’t get identical wording (it’s probabilistic). But if you cluster the answers by meaning, say with another model, you get one meaning: “go to Settings then Theme.” Low semantic entropy means a greater chance that the model actually knows this, so you can trust it.

Now ask “Is two-device login blocked?” five times. You might get “Yes, security policy,” “No, it’s allowed,” “One device per plan,” “It’s just a setting,” “Maybe, not sure.” That’s high semantic entropy, five different meanings, which is a strong signal the model is making it up.

The cost of using this method in production is real (multiple calls, more tokens, more latency, higher cost), but if you only want to surface high-confidence answers to users, sampling-and-clustering is a useful guardrail.
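A minimal sketch of the sampling-and-clustering idea follows. It is a count-based simplification, and the estimator in the cited paper is more involved. The `meaning_of` argument is a hypothetical callable that maps an answer to a meaning label. In practice that job would go to another model, and here a hand-written lookup table stands in for it.

```python
import math
from collections import Counter

def semantic_entropy(answers, meaning_of):
    """Shannon entropy (in bits) over the meanings of sampled answers."""
    clusters = Counter(meaning_of(a) for a in answers)
    n = len(answers)
    return sum(-(c / n) * math.log2(c / n) for c in clusters.values())

# Stand-in for a meaning classifier: hand-labelled for this demo only
MEANINGS = {
    "Go to Settings, then Theme.": "settings>theme",
    "Open Settings and pick Theme.": "settings>theme",
    "It's under Settings > Theme.": "settings>theme",
    "Yes, security policy.": "blocked",
    "No, it's allowed.": "allowed",
    "One device per plan.": "plan-limit",
    "It's just a setting.": "setting",
    "Maybe, not sure.": "unsure",
}

known = list(MEANINGS)[:3]
made_up = list(MEANINGS)[3:]

print(semantic_entropy(known, MEANINGS.get))    # 0.0: one meaning
print(semantic_entropy(made_up, MEANINGS.get))  # about 2.32 bits: five meanings
```

A guardrail built on this would pick a threshold and only surface answers whose entropy falls below it. The right threshold depends on your product, so treat any specific value as something to tune.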

Image 12: Sampling a known question five times yields answers that cluster into one meaning (low entropy, trustworthy), while a made-up one scatters into many meanings (high entropy, likely confabulated). (Source: Brief).

So What Do You Actually Do About It?

It’s June 2026, the models still confabulate, and you want to ship something anyway. Here’s the short checklist.

Give the model a real way to say “I can’t tell.” Tell it to ground answers in retrieved sources and to abstain when it can’t. But prompting is necessary, not sufficient, which is why the next point matters more.
Stress-test the abstention. After you’ve told it to ground answers and cite sources, actively try to make it hallucinate. Throw questions at it whose answers don’t exist, repeatedly, until you’ve convinced yourself the “I can’t tell” path actually fires. Do it continuously to make sure your guardrails don’t break.
If a human’s name goes on the output, a human verifies it. If you’re a lawyer filing with a court, you can't, at least for now, hand that to a model and trust it.
Don’t give agents permission to cause damage. This is the hard one, because agents need to do things to be useful. But the PocketOS lesson is unambiguous: scope tokens narrowly, require confirmation on destructive operations, keep production unreachable from playgrounds, and put backups in separate volumes. If you let an agent delete production, then occasionally it will delete production.
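Three of those four rules (environment isolation, narrow scopes, and confirmation on destructive operations) can be sketched as a check that runs before any agent action. The operation names, the scope format, and the function itself are hypothetical. A real setup would enforce this at the platform or API layer, not just in application code.

```python
DESTRUCTIVE_OPS = {"delete_volume", "drop_database"}  # hypothetical operation names

def check_action(op, target_env, agent_env, token_scopes, confirmed=False):
    """Return (allowed, reason) for an action an agent wants to take."""
    if target_env != agent_env:
        return False, f"agent running in {agent_env} may not touch {target_env}"
    if f"{target_env}:{op}" not in token_scopes:
        return False, f"token lacks scope {target_env}:{op}"
    if op in DESTRUCTIVE_OPS and not confirmed:
        return False, f"{op} needs human confirmation"
    return True, "ok"

staging_token = {"staging:read", "staging:delete_volume"}

# Reaching from staging into production is refused outright
print(check_action("delete_volume", "production", "staging", staging_token))
# Even in staging, a destructive call waits for a human
print(check_action("delete_volume", "staging", "staging", staging_token))
print(check_action("delete_volume", "staging", "staging", staging_token, confirmed=True))
```

The fourth rule, keeping backups in a separate volume, is an infrastructure decision that no wrapper like this can make for you.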

Wrapping Up

We started with a football crowd and ended inside a transformer. Phonemic restoration in your auditory cortex and next-token prediction in a model are the same top-down move: meet an input you can’t fully resolve, and fill the gap with the most plausible, confident thing instead of admitting you can’t tell.

The tales (Cursor, Virgin Money, Sullivan & Cromwell, the 1,633 court cases, PocketOS in nine seconds, Replit) are funny until they cost a business.

The why is now legible: models were trained to prefer answering over abstaining, and inside them a “do I know this?” switch can fire on familiarity rather than knowledge, releasing the brake and letting a confident fabrication out.

And the fixes are mostly not magic. They’re abstention you actually tested, human verification where it counts, and agents whose blast radius you deliberately shrank.

We're not past the embarrassing tales. But we now understand them well enough that shipping one is, increasingly, a choice.

References

Every case here, plus a few that didn’t make the article, has primary sources collected on the companion resources page.

“That is embarrassing” — the Derby County chant. Laughing Squid, Football Crowd Chanting “This Is Embarrassing”; audio via the Filter Stories podcast, episode.
Phonemic restoration effect. Wikipedia. Related illusions: the McGurk effect and Yanny vs. Laurel.
Cursor’s support bot invents a policy (Apr 2025). The Register, “Cursor AI support bot lies”; AI Incident Database #1039.
Virgin Money’s chatbot blocks its own name (Jan 2025). Fortune; CX Today.
Sullivan & Cromwell’s “please don’t sanction us” letter (Apr 2026). Above the Law, “Sullivan & Cromwell Files Emergency … Letter”; CNN Business.
The AI Hallucination Cases database, maintained by Damien Charlotin: damiencharlotin.com/hallucinations. On why courts can’t keep up: Cronkite News.
PocketOS — production database gone in nine seconds (Apr 2026). The Register, “Cursor/Opus agent snuffs out PocketOS”; Tom’s Hardware; Fast Company.
Replit’s agent deletes prod during a code freeze (Jul 2025). Fortune; eWeek; AI Incident Database #1152.
Next-token prediction, explained. Jay Alammar, “The Illustrated GPT-2” — a visual walkthrough of how a language model emits a probability distribution over its vocabulary and samples the next token. Foundational paper: Bengio, Ducharme, Vincent & Jauvin, “A Neural Probabilistic Language Model” (JMLR, 2003).
Kalai, Nachum, Vempala & Zhang, “Why Language Models Hallucinate” (OpenAI, 2025). arXiv:2509.04664.
OpenAI, “GPT-4 Technical Report / System Card” (2023) — the pretrained model is well-calibrated. RLHF fine-tuning flattens that calibration (see the calibration figure).
Embeddings vs. activations. Static token embeddings give each word one fixed vector: Mikolov, Chen, Corrado & Dean, “Efficient Estimation of Word Representations in Vector Space” (word2vec, 2013); accessible walkthrough: Jay Alammar, “The Illustrated Word2vec”. That representation becomes context-dependent inside the network, resolving cases like bank: Peters et al., “Deep contextualized word representations” (ELMo, 2018).
Neuronpedia — a free, public microscope for the features of open models.
Anthropic, “Golden Gate Claude” (2024) — feature steering made public.
Anthropic, “On the Biology of a Large Language Model” (2025) — the known-entity feature that suppresses the “I can’t tell” circuit, the Dallas→Austin swap, and the Michael Batkin misfire. Readable companion: “Tracing the thoughts of a language model”.
Farquhar, Kossen, Kuhn & Gal, “Detecting hallucinations in large language models using semantic entropy” (Nature, 2024).

If you enjoyed this, I go deeper on systems and internals on my Brief YouTube channel. Questions or pushback? I’d love to hear them, leave a comment. Thanks for reading!

From Manufacturing to Microservices: Universal Lessons About Reliability

Manish Shivanandhan — Mon, 20 Jul 2026 13:53:02 +0000

Software engineers often think reliability is a modern challenge.

We discuss uptime, distributed systems, observability, and fault tolerance as if they belong exclusively to cloud computing.

In reality, engineers have been solving reliability problems for centuries. Manufacturing plants, civil engineering projects, and industrial assembly lines have all faced the same fundamental question: how do you build systems that continue working even when individual components fail?

Whether you're assembling a bridge, manufacturing a vehicle, or deploying a microservice architecture, reliability is never accidental. It comes from thoughtful design, continuous testing, and a willingness to learn from failure.

The technology has changed, but the engineering principles have remained remarkably consistent.

In this article, we'll explore the timeless engineering principles that make systems reliable, whether they're factory assembly lines or cloud-native applications.

You'll see how concepts like redundancy, root cause analysis, realistic testing, and observability have guided engineers for decades, and why these lessons are just as valuable when building modern software.

By the end, you'll have a broader perspective on reliability and practical ideas you can apply to design more resilient systems.

What We'll Cover:

Every System Is Only as Reliable as Its Weakest Link
Small Defects Become Big Problems
Root Cause Analysis Is More Important Than Finding Someone to Blame
Redundancy Is an Investment, Not a Waste
Testing Should Simulate Reality
Observability Is Better Than Guesswork
Reliability Is a Continuous Process
Great Engineering Is Predictable Engineering

Every System Is Only as Reliable as Its Weakest Link

A modern application may consist of dozens or even hundreds of services. Each service depends on databases, APIs, queues, caches, storage systems, and network infrastructure. A failure in any one of these components can ripple throughout the entire application.

Manufacturing systems work in much the same way. A perfectly designed product can still fail if one component is installed incorrectly or if quality checks are skipped during production.

This highlights an important lesson for software engineers: reliability isn't about building perfect components. It's about ensuring the entire system can tolerate imperfections.

Experienced engineering teams rarely assume everything will work perfectly. Instead, they ask questions like:

What happens if this service becomes unavailable?
Can another component take over?
How quickly can the system recover?
Can users continue working while the issue is resolved?

Designing around failure is often more valuable than trying to eliminate every possible failure.
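
The questions above ("Can another component take over?") can be sketched as a small failover helper. This is a minimal illustration, not a production pattern: `callWithFailover`, the `providers` array, and the `primary`/`replica` objects are all hypothetical names, and the one-second default timeout is an arbitrary assumption.

```javascript
// Try each provider in order; move on if one fails or takes too long.
// Every name here is illustrative, not part of any real client library.
async function callWithFailover(providers, request, timeoutMs = 1000) {
  const errors = [];
  for (const provider of providers) {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`${provider.name} timed out`)),
        timeoutMs
      );
    });
    try {
      // Whichever settles first wins: the provider's answer or the timeout.
      return await Promise.race([provider.call(request), timeout]);
    } catch (err) {
      errors.push(err); // remember why this provider was skipped
    } finally {
      clearTimeout(timer);
    }
  }
  throw new AggregateError(errors, 'All providers failed');
}

// Hypothetical stand-ins for a failing primary and a healthy replica.
const primary = { name: 'primary', call: async () => { throw new Error('primary down'); } };
const replica = { name: 'replica', call: async (req) => `replica handled ${req}` };

callWithFailover([primary, replica], 'GET /orders').then(console.log);
// prints "replica handled GET /orders"
```

The caller never sees the primary's failure; it only sees an error when every provider has been tried.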

Small Defects Become Big Problems

Many major outages begin with something surprisingly small.

A configuration value is incorrect. A certificate expires. A retry loop overwhelms a downstream service. A cache becomes stale. An API starts returning unexpected responses.

None of these issues appear catastrophic on their own. The real damage comes when multiple small problems combine into a larger system failure.

Manufacturing follows the same pattern. A slightly misaligned component may seem harmless during assembly, but over time it can increase wear, reduce efficiency, and eventually cause an expensive breakdown.

Software systems behave similarly. Small technical debt accumulates until reliability begins to suffer.

This is why experienced teams invest in routine maintenance. Refactoring, dependency updates, infrastructure improvements, and automated testing may not deliver visible product features, but they significantly reduce operational risk.

Reliability is built through consistent attention to small details.
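
One of the small defects listed above, a retry loop that overwhelms a downstream service, is commonly softened with capped exponential backoff plus random jitter. The sketch below uses the "full jitter" variant, which is a choice made here rather than something the article prescribes. The function names, the 100 ms base, and the 5-second cap are all assumptions.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before the next attempt: a random value between 0 and
// min(capMs, baseMs * 2^attempt). `random` is injectable for testing.
function backoffDelayMs(attempt, baseMs = 100, capMs = 5000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry `operation` up to `maxAttempts` times, waiting longer between tries.
async function retryWithBackoff(
  operation,
  { maxAttempts = 5, baseMs = 100, capMs = 5000, sleep = defaultSleep } = {}
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs, capMs));
      }
    }
  }
  throw lastError;
}

// Hypothetical flaky dependency: fails twice, then succeeds.
let attempts = 0;
const flaky = async () => {
  attempts += 1;
  if (attempts < 3) throw new Error('downstream busy');
  return 'ok';
};

retryWithBackoff(flaky).then((result) => console.log(result, attempts));
// prints "ok 3"
```

The jitter spreads retries from many clients over time, so a brief downstream hiccup doesn't turn into a synchronized wave of retries.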

Root Cause Analysis Is More Important Than Finding Someone to Blame

When production systems fail, organisations often rush to identify who made the mistake.

The better question is why the mistake was possible in the first place.

Perhaps deployment safeguards were missing. Or monitoring failed to detect unusual behaviour. Or the documentation was outdated.

Perhaps code reviews overlooked an important edge case.

Strong engineering cultures focus on improving systems rather than assigning blame.

This philosophy exists throughout engineering disciplines. Manufacturing companies spend significant effort studying common failures in material assembly because understanding why defects occur leads to stronger processes, better inspections, and fewer future failures.

Software teams benefit from the same mindset. Every production incident becomes an opportunity to improve automation, monitoring, documentation, and testing rather than simply fixing the immediate issue.

Blameless postmortems encourage engineers to report problems early because they know the goal is learning rather than punishment.

Over time, this creates systems that become progressively more reliable.

Redundancy Is an Investment, Not a Waste

At first glance, redundancy appears inefficient.

Why run multiple application instances? Why maintain replica databases? Why deploy services across multiple regions? Why store multiple backups?

The answer becomes clear when failures occur.

If every critical component has only one instance, every failure becomes a complete outage.

Manufacturing plants frequently maintain backup equipment for exactly this reason. Downtime often costs far more than maintaining spare capacity.

Cloud infrastructure follows the same principle. Load balancers distribute requests across multiple servers. Database replicas reduce the impact of hardware failures. Message queues prevent temporary spikes from overwhelming downstream systems.

Multiple availability zones protect against data center failures, and multi-region deployments protect against regional outages.

Redundancy increases costs, but it dramatically improves resilience.

Organisations must decide whether the cost of additional infrastructure is lower than the potential cost of downtime.

For customer-facing applications, the answer is usually yes.

Testing Should Simulate Reality

Passing unit tests doesn't necessarily mean software is reliable.

Many production failures occur because real-world environments behave differently than development machines.

Networks become slow. External APIs return unexpected responses. Databases experience temporary latency. Users generate traffic patterns nobody anticipated.

Reliable engineering requires testing under realistic conditions.

Integration tests verify communication between services. Load testing evaluates system behavior under heavy traffic. Chaos engineering intentionally introduces failures to measure resilience.

Disaster recovery exercises ensure backup procedures actually work.

Manufacturing industries also perform stress testing before products reach customers. Components are exposed to extreme temperatures, vibration, pressure, and repeated use to identify weaknesses before they become field failures.

Software deserves the same level of scrutiny. The closer testing resembles production, the fewer surprises engineers encounter after deployment.

Observability Is Better Than Guesswork

When a production issue occurs, every minute matters. Without visibility into system behaviour, engineers are forced to make educated guesses. Guessing rarely solves outages quickly.

Modern observability combines logs, metrics, traces, and alerts into a complete picture of system health.

Logs explain what happened. Metrics reveal performance trends. Distributed tracing follows requests across multiple services. Dashboards expose unusual behavior before customers notice problems.

Together, these tools dramatically reduce the time required to diagnose incidents. The goal isn't collecting more data. The goal is collecting meaningful data that answers important operational questions:

Can engineers identify the failing service?
Can they measure customer impact?
Can they determine when the problem began?
Can they verify that a fix actually resolved the issue?

Observability transforms debugging from detective work into engineering.

Reliability Is a Continuous Process

Many organisations mistakenly treat reliability as a one-time project. They improve monitoring after an outage. They add automated tests after discovering a regression. They introduce deployment pipelines after a failed release.

These improvements help, but reliability isn't something you complete once and forget.

Every new feature introduces additional complexity. Every dependency update changes system behavior. Every scaling decision creates new operational challenges.

Reliable systems require continuous evaluation.

Engineering teams regularly review incidents, remove technical debt, improve automation, and update operational documentation because yesterday's reliable architecture may not meet tomorrow's demands.

Reliability evolves alongside the software itself.

Great Engineering Is Predictable Engineering

Users rarely notice reliable systems. Nobody celebrates an application that simply works every day.

Instead, attention often focuses on new features, product launches, and innovative technologies.

Yet reliability remains one of the strongest competitive advantages any engineering organisation can build.

Customers trust applications that remain available. Developers enjoy working on systems that behave predictably. Businesses avoid the financial and reputational costs associated with outages.

Manufacturing has long understood that quality is built into every stage of production rather than inspected in at the end. Software engineering follows exactly the same principle. Reliability emerges from thoughtful architecture, disciplined testing, effective monitoring, continuous learning, and a culture that treats every failure as an opportunity to improve.

From factory floors to cloud-native microservices, the lesson remains unchanged. Strong systems aren't defined by the absence of failure. They're defined by how well they anticipate it, absorb it, and recover from it.

The technologies may continue to evolve, but the fundamentals of reliable engineering are timeless.

Hope you enjoyed this article. You can connect with me on LinkedIn.

How to Manage Secrets Securely with Azure Key Vault in Node.js

Zia Ullah — Mon, 20 Jul 2026 13:50:42 +0000

Last year a client called me about exactly this. Someone ran git log -p on a hunch and found a .env committed two years earlier, never caught. Database password, Stripe secret, JWT signing key — all still active. All still in production.

IBM's 2024 breach cost report put the average data breach at $4.88 million — and that's the average, not the worst cases.

Exposed credentials are consistently near the top of root causes. GitHub found over a million secrets leaked in public repos in 2023 alone, before you even count the private ones nobody ever discovered.

It's not a people problem. The developers I've worked with aren't careless — the architecture is just set up to fail them. A .env file gets committed once by accident. Credentials get copied and pasted into a Slack message to unblock a teammate. A Docker image gets published with secrets baked into a layer. A server gets shut down, and nobody rotates the credentials it was holding.

Azure Key Vault solves this differently. Your application fetches credentials at runtime from a centralized, encrypted service — the .env file stops being a liability because it stops holding anything worth stealing.

What you'll build is a Node.js Express API that fetches every secret from Azure Key Vault at startup. No passwords in the code. When someone quits, there's nothing in the repo to rotate. The .env ends up with one line — the vault name.

Prerequisites

Node.js 18+
An Azure account (free tier works)
Azure CLI installed and logged in (az login)
Basic knowledge of Express.js
Docker (optional — only needed for the local database test section)

What We Will Build

A Node.js Express API that:

Connects to PostgreSQL using credentials fetched from Key Vault at startup
Uses Managed Identity for authentication — no client secrets or passwords anywhere
Caches secrets in memory, so Key Vault isn't called on every request
Works locally via Azure CLI auth and in production via Managed Identity — same code, zero changes

How the Architecture Works
What Is Azure Key Vault?
Set Up the Key Vault
Create the Node.js Project
Connect to Key Vault with Managed Identity
Cache Secrets at Startup
Use Secrets in Your Express API
Test Locally
Deploy to Azure App Service
Grant Key Vault Access to the App
Rotate Secrets Without Redeploying
Troubleshooting
Wrapping Up

How the Architecture Works

Before writing any code, it helps to see the full picture:

 LOCAL DEVELOPMENT
.-------------------------------------------------------.
|                                                        |
|   [Node.js App]                                        |
|        |                                               |
|        v                                               |
|   [DefaultAzureCredential] ---> az login session       |
|        |                                               |
|        v                                               |
|   [Azure Key Vault]  ---> Returns secrets              |
|        |                                               |
|        v                                               |
|   [In-memory cache]  ---> App uses secrets at runtime  |
'-------------------------------------------------------'

 PRODUCTION (Azure)
.-------------------------------------------------------.
|                                                        |
|   [Azure App Service]                                  |
|        |                                               |
|        v                                               |
|   [DefaultAzureCredential] ---> Managed Identity       |
|        |                                               |
|        v                                               |
|   [Azure Key Vault]  ---> Returns secrets              |
|        |                                               |
|        v                                               |
|   [In-memory cache]  ---> App uses secrets at runtime  |
'-------------------------------------------------------'

Both environments run the exact same code. DefaultAzureCredential figures out where it is — locally it picks up your az login session, on Azure it uses Managed Identity. You don't switch config files and you don't manage credentials. It just works.

What Is Azure Key Vault?

Azure Key Vault is Microsoft's managed secret store — it handles secrets, keys, and certificates. For this tutorial, we're only using the secrets part: database passwords, API keys, JWT signing keys, anything your app needs to run but has no business being in your Git history.

Compared to .env files, the practical differences are worth understanding before you write any code.

Rotation is the one I notice most on real projects. Update a secret in Key Vault and every app picks it up on the next restart — no hunting down five different environment configs across staging and production.

Access control is the other big one. Each application only gets permission to read the secrets it actually needs. If one service gets compromised, it can't read credentials belonging to other services.

And every read gets logged. When something goes wrong — and eventually something will — you can see exactly which app accessed which secret, and when. That log is what auditors actually want to see.

I've sat in enough security reviews to know that "we use .env files and tell people not to commit them" doesn't satisfy an auditor. SOC 2, HIPAA, GDPR — they all want demonstrable controls. A vault with an access log is demonstrable.

Set Up the Key Vault

Run these commands. The vault name has to be globally unique across all of Azure — not just your own subscription — so pick something specific. Letters, numbers, and hyphens, 3 to 24 characters.

# Create a resource group (skip if you already have one)
az group create \
  --name keyvault-demo-rg \
  --location eastus

# Create the Key Vault (RBAC enabled by default — required for the role assignment later)
az keyvault create \
  --name your-vault-name \
  --resource-group keyvault-demo-rg \
  --location eastus

# Grant yourself permission to manage secrets (required with RBAC — creators are not auto-assigned)
az role assignment create \
  --role "Key Vault Secrets Officer" \
  --assignee-object-id $(az ad signed-in-user show --query id -o tsv) \
  --scope $(az keyvault show \
    --name your-vault-name \
    --resource-group keyvault-demo-rg \
    --query id -o tsv)

# Add your secrets
az keyvault secret set \
  --vault-name your-vault-name \
  --name "DB-HOST" \
  --value "your-db-host.postgres.database.azure.com"

az keyvault secret set \
  --vault-name your-vault-name \
  --name "DB-PASSWORD" \
  --value "your-super-secret-password"

az keyvault secret set \
  --vault-name your-vault-name \
  --name "JWT-SECRET" \
  --value "your-jwt-signing-secret"

Verify the secrets were stored:

az keyvault secret list --vault-name your-vault-name --query "[].name" -o tsv

You should see:

DB-HOST
DB-PASSWORD
JWT-SECRET
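
If you generate vault names in a script, it can help to check them before calling az keyvault create. Here is a rough sketch of the naming rule mentioned earlier (letters, numbers, and hyphens, 3 to 24 characters). The extra constraints in the regex (start with a letter, end with a letter or digit, no consecutive hyphens) come from Azure's published naming rules as I understand them, so verify them against the current documentation. `isValidVaultName` is a hypothetical helper, and it can't tell you whether the name is already taken globally.

```javascript
// Hypothetical pre-flight check for a Key Vault name.
// Assumed rules: 3-24 characters, letters/digits/hyphens only,
// starts with a letter, ends with a letter or digit, no "--".
const VAULT_NAME_PATTERN = /^[A-Za-z](?!.*--)[A-Za-z0-9-]{1,22}[A-Za-z0-9]$/;

function isValidVaultName(name) {
  return typeof name === 'string' && VAULT_NAME_PATTERN.test(name);
}

console.log(isValidVaultName('acme-payments-kv'));          // true
console.log(isValidVaultName('kv'));                        // false (too short)
console.log(isValidVaultName('my--vault'));                 // false (consecutive hyphens)
console.log(isValidVaultName('this-name-is-far-too-long-x')); // false (over 24 characters)
```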

Create the Node.js Project

Set up the project structure:

mkdir nodejs-azure-keyvault
cd nodejs-azure-keyvault
npm init -y
npm install express pg jsonwebtoken @azure/keyvault-secrets @azure/identity dotenv

The two Azure packages do all the work:

@azure/keyvault-secrets: connects to your vault and pulls secrets out
@azure/identity: handles auth. Locally, it uses your az login session. In production, it switches to Managed Identity automatically

Add a start script to package.json:

npm pkg set scripts.start="node server.js"

Create the following file structure:

nodejs-azure-keyvault/
|-- src/
|   |-- config/
|   |   `-- secrets.js   # Key Vault client and secret loader
|   |-- db/
|   |   `-- index.js     # PostgreSQL pool using secrets
|   `-- routes/
|       `-- users.js     # Example route
|-- app.js               # Express app
`-- server.js            # Entry point -- loads secrets first

Connect to Key Vault with Managed Identity

Create the secrets config file:

// src/config/secrets.js
const { SecretClient } = require('@azure/keyvault-secrets');
const { DefaultAzureCredential } = require('@azure/identity');

const VAULT_URL = `https://${process.env.KEY_VAULT_NAME}.vault.azure.net`;

const credential = new DefaultAzureCredential();
const client = new SecretClient(VAULT_URL, credential);

async function getSecret(name) {
  const secret = await client.getSecret(name);
  return secret.value;
}

module.exports = { getSecret };

DefaultAzureCredential is the most important part of this setup. It tries a chain of authentication methods in order and uses the first one that succeeds. The ones that matter here are:

Environment variables (for CI/CD pipelines)
Managed Identity (for deployed apps on Azure)
Azure CLI credentials (for local development, via az login)

This means the exact same code works locally and in production with zero changes. Locally, it uses your az login session. In production, it uses the app's Managed Identity. You never touch credentials.
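
To make the "chain" idea concrete, here is a conceptual sketch of how a credential chain behaves. It is not the @azure/identity implementation: `getTokenFromChain` and the three credential objects are hypothetical stand-ins, and the order shown is illustrative. It only demonstrates the "first source that succeeds wins" behaviour.

```javascript
// Conceptual model of a credential chain. NOT the real library code.
async function getTokenFromChain(credentials) {
  const failures = [];
  for (const credential of credentials) {
    try {
      const token = await credential.getToken();
      return { source: credential.name, token }; // first success wins
    } catch (err) {
      failures.push(`${credential.name}: ${err.message}`);
    }
  }
  throw new Error(`No credential in the chain succeeded:\n${failures.join('\n')}`);
}

// Hypothetical stand-ins for what a developer laptop looks like:
// no service principal env vars, no managed identity, but a CLI login.
const onLaptop = [
  { name: 'environment', getToken: async () => { throw new Error('env vars not set'); } },
  { name: 'managed-identity', getToken: async () => { throw new Error('no identity endpoint'); } },
  { name: 'azure-cli', getToken: async () => 'token-from-az-login' },
];

getTokenFromChain(onLaptop).then((result) => console.log(result.source));
// prints "azure-cli"
```

On App Service, the managed-identity stand-in would succeed first, which is why the application code doesn't need to know where it is running.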

Cache Secrets at Startup

Calling Key Vault on every request adds latency and costs money. Load all secrets once at startup and cache them in memory. Replace src/config/secrets.js with this complete version:

// src/config/secrets.js
const { SecretClient } = require('@azure/keyvault-secrets');
const { DefaultAzureCredential } = require('@azure/identity');

const VAULT_URL = `https://${process.env.KEY_VAULT_NAME}.vault.azure.net`;

const credential = new DefaultAzureCredential();
const client = new SecretClient(VAULT_URL, credential);

// In-memory cache
const cache = {};

async function getSecret(name) {
  if (cache[name]) return cache[name];
  const secret = await client.getSecret(name);
  cache[name] = secret.value;
  return secret.value;
}

async function loadAllSecrets() {
  console.log('Loading secrets from Azure Key Vault...');
  const secretNames = ['DB-HOST', 'DB-PASSWORD', 'JWT-SECRET'];

  await Promise.all(
    secretNames.map(async (name) => {
      cache[name] = await getSecret(name);
      console.log(`  ✓ ${name} loaded`);
    })
  );

  console.log('All secrets loaded successfully.');
}

function getFromCache(name) {
  if (!cache[name]) throw new Error(`Secret "${name}" not loaded. Did loadAllSecrets() run?`);
  return cache[name];
}

module.exports = { loadAllSecrets, getFromCache };

The loadAllSecrets function runs once when the application starts. After that, all secrets are served from the in-memory cache with zero latency and zero Key Vault calls.
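
If you want to convince yourself that the cache really avoids repeat calls, one option is to pass the client in so it can be stubbed. The sketch below is a variation on the module above, not a replacement for it: `createSecretCache` and `stubClient` are hypothetical, and the stub only mimics the one thing this tutorial relies on, namely that getSecret(name) resolves to an object with a value property.

```javascript
// Hypothetical factory: same caching idea, with the client injected.
function createSecretCache(client, secretNames) {
  const cache = new Map();

  // Fetch every named secret once, in parallel, and store the values.
  async function loadAll() {
    await Promise.all(
      secretNames.map(async (name) => {
        const secret = await client.getSecret(name);
        cache.set(name, secret.value);
      })
    );
  }

  // Synchronous read; throws if loadAll() has not populated the entry.
  function get(name) {
    if (!cache.has(name)) throw new Error(`Secret "${name}" not loaded`);
    return cache.get(name);
  }

  return { loadAll, get };
}

// Stub client that records how often it is called.
const calls = [];
const stubClient = {
  async getSecret(name) {
    calls.push(name);
    return { value: `value-of-${name}` };
  },
};

const secrets = createSecretCache(stubClient, ['DB-HOST', 'DB-PASSWORD']);
secrets.loadAll().then(() => {
  console.log(secrets.get('DB-HOST')); // value-of-DB-HOST
  console.log(secrets.get('DB-HOST')); // value-of-DB-HOST (served from memory)
  console.log(calls.length);           // 2 (one fetch per secret, no more)
});
```

Using a Map with has() also distinguishes "not loaded" from a secret whose value happens to be an empty string.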

Use Secrets in Your Express API

Set up the database connection using the cached secrets:

// src/db/index.js
const { Pool } = require('pg');
const { getFromCache } = require('../config/secrets');

let pool;

function getPool() {
  if (!pool) {
    pool = new Pool({
      host:     getFromCache('DB-HOST'),
      database: process.env.DB_NAME || 'myapp',
      user:     process.env.DB_USER || 'dbadmin',
      password: getFromCache('DB-PASSWORD'),
      port:     parseInt(process.env.DB_PORT || '5432', 10),
      ssl:      process.env.NODE_ENV === 'production'
                  ? { rejectUnauthorized: false }
                  : false,
    });

    pool.on('error', (err) => {
      console.error('Unexpected database pool error:', err.message);
    });
  }

  return pool;
}

module.exports = { getPool };

Notice the distinction: DB-HOST and DB-PASSWORD come from Key Vault because they're sensitive. The database name, username, and port are not — they don't need to be protected, so they use environment variables with sensible defaults. Key Vault is for credentials, not all configuration.

The SSL flag is environment-aware: forced on in production, off locally so Docker connections work without a certificate. The rejectUnauthorized: false setting encrypts the connection but accepts Azure Database for PostgreSQL's certificate without verifying the CA chain. It's a common shortcut, but it means the client can't detect an impostor server. For stricter environments, download the Azure root CA and pass it via the ca option in the pool config instead.
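
As a rough sketch of the stricter option, the ssl value could be built by a small helper that turns verification on whenever a CA file is available. `buildSslConfig` and the DB_CA_PATH environment variable are hypothetical names invented for this example. The resulting object uses the standard Node.js TLS options rejectUnauthorized and ca.

```javascript
const fs = require('node:fs');

// Hypothetical helper that returns the `ssl` value for the pool config.
// `caPath` is assumed to point at a CA bundle you downloaded yourself.
function buildSslConfig({ nodeEnv, caPath }) {
  if (nodeEnv !== 'production') return false; // local Docker: no TLS

  if (!caPath) {
    // Same behaviour as the tutorial code: encrypted, but unverified.
    return { rejectUnauthorized: false };
  }

  return {
    rejectUnauthorized: true,            // verify the server certificate
    ca: fs.readFileSync(caPath, 'utf8'), // trust anchor for that check
  };
}

// DB_CA_PATH is an assumed variable name, not something Azure sets for you.
const ssl = buildSslConfig({
  nodeEnv: process.env.NODE_ENV,
  caPath: process.env.DB_CA_PATH,
});
```

The returned value would replace the inline ternary in the Pool options. The path to the certificate is not a secret, so it can stay in an ordinary app setting.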

Create a sample route that uses JWT verification with the secret from Key Vault:

// src/routes/users.js
const express = require('express');
const jwt     = require('jsonwebtoken');
const { getFromCache } = require('../config/secrets');
const { getPool }      = require('../db');

const router = express.Router();

// Auth middleware — JWT secret comes from Key Vault, not process.env
function authMiddleware(req, res, next) {
  const authHeader = req.headers.authorization;
  if (!authHeader?.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'Missing or malformed Authorization header' });
  }

  const token = authHeader.split(' ')[1];

  try {
    req.user = jwt.verify(token, getFromCache('JWT-SECRET'));
    next();
  } catch (err) {
    return res.status(401).json({ error: 'Invalid or expired token' });
  }
}

// GET /api/users — list users (authenticated)
router.get('/', authMiddleware, async (req, res) => {
  try {
    const result = await getPool().query(
      'SELECT id, email, created_at FROM users ORDER BY created_at DESC LIMIT 20'
    );
    res.json(result.rows);
  } catch (err) {
    console.error('Database error:', err.message);
    res.status(500).json({ error: 'Internal server error' });
  }
});

// GET /api/users/:id — single user (authenticated)
router.get('/:id', authMiddleware, async (req, res) => {
  try {
    const result = await getPool().query(
      'SELECT id, email, created_at FROM users WHERE id = $1',
      [req.params.id]
    );
    if (!result.rows[0]) return res.status(404).json({ error: 'User not found' });
    res.json(result.rows[0]);
  } catch (err) {
    console.error('Database error:', err.message);
    res.status(500).json({ error: 'Internal server error' });
  }
});

module.exports = router;

Notice the error handler returns 'Internal server error' instead of err.message. Database errors are surprisingly chatty — they'll hand an attacker your table names, column names, and query structure if you let them through.

Set up the Express application. Both files define authMiddleware locally — yes, it's duplicated. In production, I'd pull this into a shared middleware file. For this tutorial, keeping it local means you can read either file without bouncing between three others:

// app.js
const express = require('express');
const jwt = require('jsonwebtoken');
const { getFromCache } = require('./src/config/secrets');
const usersRouter = require('./src/routes/users');

const app = express();
app.use(express.json());

// Auth middleware — JWT secret comes from Key Vault, not process.env
function authMiddleware(req, res, next) {
  const authHeader = req.headers.authorization;
  if (!authHeader?.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'Missing or malformed Authorization header' });
  }
  const token = authHeader.split(' ')[1];
  try {
    req.user = jwt.verify(token, getFromCache('JWT-SECRET'));
    next();
  } catch (err) {
    return res.status(401).json({ error: 'Invalid or expired token' });
  }
}

// Health check — no auth required
app.get('/health', (req, res) => {
  res.json({ status: 'healthy', timestamp: new Date().toISOString() });
});

// Status endpoint — proves Key Vault integration without needing a database
app.get('/api/status', authMiddleware, (req, res) => {
  res.json({
    message: 'All secrets loaded from Azure Key Vault',
    vault: process.env.KEY_VAULT_NAME,
    secrets_loaded: ['DB-HOST', 'DB-PASSWORD', 'JWT-SECRET'],
    authenticated_as: req.user.email,
    timestamp: new Date().toISOString()
  });
});

app.use('/api/users', usersRouter);

app.use((req, res) => res.status(404).json({ error: 'Route not found' }));
app.use((err, req, res, next) => {
  console.error('Unhandled error:', err.message);
  res.status(500).json({ error: 'Internal server error' });
});

module.exports = app;

The entry point loads secrets before starting the server. The server doesn't start unless all secrets load successfully:

// server.js
require('dotenv').config();
const app = require('./app');
const { loadAllSecrets } = require('./src/config/secrets');

const PORT = process.env.PORT || 3000;

async function start() {
  try {
    await loadAllSecrets();
    app.listen(PORT, () => {
      console.log(`Server running on port ${PORT}`);
    });
  } catch (err) {
    console.error('Failed to start server:', err.message);
    console.error('Hint: Run "az login" for local development, or check Managed Identity for Azure deployments.');
    process.exit(1);
  }
}

start();

That process.exit(1) is deliberate. I'd rather the app crash loudly at startup than limp along with missing credentials and fail on the first real request two hours later.

Test Locally

Create a .env file for local development. This only contains the Key Vault name, nothing sensitive:

# .env
KEY_VAULT_NAME=your-vault-name
PORT=3000

Add .env and the deployment zip to .gitignore:

echo ".env" >> .gitignore
echo "app.zip" >> .gitignore

Make sure you're logged into Azure CLI:

az login

Start the application:

npm start

You should see:

Loading secrets from Azure Key Vault...
  ✓ JWT-SECRET loaded
  ✓ DB-PASSWORD loaded
  ✓ DB-HOST loaded
All secrets loaded successfully.
Server running on port 3000

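The order in which the secrets load may vary. Promise.all starts all three fetches in parallel, each log line prints as its fetch completes, and the combined promise resolves only after every fetch has succeeded. What matters is that all three are confirmed before the server starts.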
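
A small experiment shows the difference between completion order and result order. The delays below are made-up numbers standing in for network latency, and `delay` and `demo` are just illustrative helpers.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demo() {
  const completionOrder = [];

  // Simulated fetches: [secret name, pretend latency in ms].
  const fetches = [['DB-HOST', 30], ['DB-PASSWORD', 10], ['JWT-SECRET', 20]];

  const results = await Promise.all(
    fetches.map(async ([name, ms]) => {
      await delay(ms);
      completionOrder.push(name); // side effects happen as each one finishes
      return name;
    })
  );

  return { completionOrder, results };
}

demo().then(({ completionOrder, results }) => {
  console.log(completionOrder.join(', ')); // DB-PASSWORD, JWT-SECRET, DB-HOST
  console.log(results.join(', '));         // DB-HOST, DB-PASSWORD, JWT-SECRET
});
```

The log lines follow completion order, while the array returned by Promise.all always follows the order of the input.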

Test the health endpoint:

curl http://localhost:3000/health
# {"status":"healthy","timestamp":"2026-07-14T19:38:11.659Z"}

Now prove the integration end-to-end. Grab the value you stored as JWT-SECRET and use it to sign a test token — paste it in for YOUR-JWT-SECRET-VALUE. Then hit /api/status with it:

node -e "const jwt = require('jsonwebtoken'); console.log(jwt.sign({id:1, email:'test@test.com'}, 'YOUR-JWT-SECRET-VALUE', {expiresIn:'1h'}));"

On Linux/macOS:

curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:3000/api/status

On Windows PowerShell:

Invoke-RestMethod -Uri "http://localhost:3000/api/status" -Headers @{Authorization = "Bearer YOUR_TOKEN"}

You should see:

{
  "message": "All secrets loaded from Azure Key Vault",
  "vault": "your-vault-name",
  "secrets_loaded": ["DB-HOST", "DB-PASSWORD", "JWT-SECRET"],
  "authenticated_as": "test@test.com",
  "timestamp": "2026-07-14T19:50:08.687Z"
}

If you got that response, the whole chain worked. The JWT was verified using a secret the app fetched from Key Vault at startup. That secret isn't in your code, your .env, or anywhere in the repo. Your az login session handled the auth locally. In production, Managed Identity takes over. Same code, nothing changes.
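
If you're curious why the signing secret matters so much, here is a stripped-down sketch of what HS256 signing and verification amount to, using only Node's built-in crypto module. It is an illustration, not a substitute for jsonwebtoken: `signHs256` and `verifyHs256` are hypothetical helpers, and they deliberately skip expiry checks, header validation, and everything else a real library handles.

```javascript
const crypto = require('node:crypto');

const b64url = (text) => Buffer.from(text).toString('base64url');

// Token = base64url(header) . base64url(payload) . HMAC-SHA256 signature
function signHs256(payload, secret) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest('base64url');
  return `${header}.${body}.${signature}`;
}

// Recompute the signature with our secret and compare in constant time.
function verifyHs256(token, secret) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('Malformed token');
  const [header, body, signature] = parts;

  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest();
  const actual = Buffer.from(signature, 'base64url');

  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    throw new Error('Invalid signature');
  }
  return JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
}

const token = signHs256({ id: 1, email: 'test@test.com' }, 'demo-secret');
console.log(verifyHs256(token, 'demo-secret').email); // test@test.com
```

Anyone who holds the secret can mint tokens the API will accept, which is why it belongs in the vault rather than in the repo.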

Test the Full Database Flow with Docker

The app reads DB-HOST and DB-PASSWORD from Key Vault, so those secrets need to match your local Docker container. Update them now:

az keyvault secret set --vault-name your-vault-name --name "DB-HOST" --value "localhost"
az keyvault secret set --vault-name your-vault-name --name "DB-PASSWORD" --value "demopassword123"

Start a Postgres container with Docker. The password has to match demopassword123, the value you just put in Key Vault:

docker run --name pg-demo \
  -e POSTGRES_USER=dbadmin \
  -e POSTGRES_PASSWORD=demopassword123 \
  -e POSTGRES_DB=myapp \
  -p 5432:5432 \
  -d postgres:15

Get the table created and throw in some test rows:

docker exec -it pg-demo psql -U dbadmin -d myapp -c \
  "CREATE TABLE IF NOT EXISTS users (id SERIAL PRIMARY KEY, email VARCHAR(255) UNIQUE NOT NULL, created_at TIMESTAMPTZ DEFAULT NOW());"

docker exec -it pg-demo psql -U dbadmin -d myapp -c \
  "INSERT INTO users (email) VALUES ('alice@example.com'), ('bob@example.com'), ('carol@example.com');"

Kill the server and bring it back up — secrets load at startup, so it needs a fresh run to pick up what you just changed in Key Vault:

npm start

Call the users endpoint with a valid JWT:

# Generate a token (use the same value you stored as JWT-SECRET in Key Vault)
node -e "const jwt = require('jsonwebtoken'); console.log(jwt.sign({id:1, email:'test@test.com'}, 'YOUR-JWT-SECRET-VALUE', {expiresIn:'1h'}));"

On Linux/macOS:

curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:3000/api/users

On Windows PowerShell:

Invoke-RestMethod -Uri "http://localhost:3000/api/users" -Headers @{Authorization = "Bearer YOUR_TOKEN"}

You should see:

[
  { "id": 1, "email": "alice@example.com", "created_at": "2026-07-14T19:59:21.064Z" },
  { "id": 2, "email": "bob@example.com",   "created_at": "2026-07-14T19:59:21.064Z" },
  { "id": 3, "email": "carol@example.com", "created_at": "2026-07-14T19:59:21.064Z" }
]

That query ran using a password that came straight from Key Vault. It's not in your .env and not hardcoded anywhere; it exists only in the running process's memory. The repo has nothing worth stealing.

Before you deploy, put the real production values back in Key Vault:

az keyvault secret set --vault-name your-vault-name --name "DB-HOST" --value "your-db-host.postgres.database.azure.com"
az keyvault secret set --vault-name your-vault-name --name "DB-PASSWORD" --value "your-super-secret-password"

If you skip this, the deployed app will try to connect to localhost and fail as soon as it queries the database, because no Postgres server is listening on localhost inside App Service.

Deploy to Azure App Service

Note: This section creates the App Service infrastructure. The actual code deployment (zip upload) happens at the end of the next section — the app must have Key Vault access configured before its first startup, or it will fail immediately and exit.

Create the App Service:

# Create an App Service Plan (B1 is the cheapest paid tier)
az appservice plan create \
  --name keyvault-demo-plan \
  --resource-group keyvault-demo-rg \
  --sku B1 \
  --is-linux

# Create the Web App
az webapp create \
  --name my-keyvault-node-app \
  --resource-group keyvault-demo-rg \
  --plan keyvault-demo-plan \
  --runtime "NODE:18-lts"

# Set app settings — KEY_VAULT_NAME tells the app which vault to use
# NODE_ENV=production enables SSL for the database connection
az webapp config appsettings set \
  --name my-keyvault-node-app \
  --resource-group keyvault-demo-rg \
  --settings KEY_VAULT_NAME=your-vault-name NODE_ENV=production

Grant Key Vault Access to the App

Enable Managed Identity on the app. This gives it an identity in Microsoft Entra ID that Key Vault can trust:

# Enable system-assigned managed identity
az webapp identity assign \
  --name my-keyvault-node-app \
  --resource-group keyvault-demo-rg

The following commands capture the principalId automatically and use it to grant the role:

# Get the principal ID
PRINCIPAL_ID=$(az webapp identity show \
  --name my-keyvault-node-app \
  --resource-group keyvault-demo-rg \
  --query principalId \
  --output tsv)

# Get the Key Vault resource ID
KV_ID=$(az keyvault show \
  --name your-vault-name \
  --resource-group keyvault-demo-rg \
  --query id \
  --output tsv)

# Grant the app the "Key Vault Secrets User" role
az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --scope "$KV_ID"

The Key Vault Secrets User role allows the app to read secrets. It can't create, update, or delete them. This is the principle of least privilege — the application can only do what it needs to do.

Time to ship it. Linux/macOS can run this directly — Windows users, open Git Bash (it ships with Git for Windows):

zip -r app.zip . -x "node_modules/*" ".git/*" ".env" "app.zip"

Then deploy:

az webapp deployment source config-zip \
  --name my-keyvault-node-app \
  --resource-group keyvault-demo-rg \
  --src app.zip

The deployed application authenticates to Key Vault using its Managed Identity automatically. No passwords, no client secrets, no credentials of any kind in the deployment.

Check the health endpoint to confirm it's running:

curl https://my-keyvault-node-app.azurewebsites.net/health
# {"status":"healthy","timestamp":"..."}

If it won't start, pull the logs:

az webapp log tail --name my-keyvault-node-app --resource-group keyvault-demo-rg

Most of the time, the cause is that the Key Vault role assignment hasn't propagated yet. Give it 2–3 minutes, then restart:

az webapp restart --name my-keyvault-node-app --resource-group keyvault-demo-rg
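
If you'd rather not re-run curl by hand while the role assignment propagates, a small polling helper can do the waiting for you. This is a minimal sketch, not part of the project: waitForHealthy and its options are hypothetical names, and it assumes Node 18+ so that the global fetch is available (you can also pass your own fetchFn).

```javascript
// Hypothetical sketch: poll a health endpoint until it responds OK.
// Assumes Node 18+ for the global fetch; fetchFn can be swapped for testing.
async function waitForHealthy(
  url,
  { attempts = 10, delayMs = 30000, fetchFn = fetch } = {}
) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchFn(url);
      if (res.ok) return i; // how many attempts it took
    } catch (err) {
      // Network errors are expected while the app is still restarting.
    }
    // Wait before the next attempt, but not after the last one.
    if (i < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`${url} not healthy after ${attempts} attempts`);
}
```

With the defaults, this checks every 30 seconds for up to about five minutes, which comfortably covers the 2–3 minute propagation window mentioned above.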

Rotate Secrets Without Redeploying

One of the biggest practical benefits of Key Vault is secret rotation. When a database password needs to change, you update it in Key Vault — not in your app:

az keyvault secret set \
  --vault-name your-vault-name \
  --name "DB-PASSWORD" \
  --value "new-rotated-password"

The cache builds at startup, so you don't need a redeploy — a restart is enough:

az webapp restart \
  --name my-keyvault-node-app \
  --resource-group keyvault-demo-rg

No code change. No new deployment. The secret is rotated, and the app picks up the new value as soon as it restarts.

If you need zero-downtime rotation, add a /refresh-secrets endpoint behind admin auth that clears the cache and then calls loadAllSecrets(). The order matters — loadAllSecrets() uses getSecret() which returns cached values if they exist, so you must clear the cache first, or it will reload nothing. This is optional but useful for long-running processes that can't afford a restart.
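
Here's a minimal sketch of that clear-then-reload order, assuming a Map-backed cache like the one described in this tutorial. createSecretStore, fetchSecret, and refresh are hypothetical names used for illustration. In the real project, fetchSecret would wrap the Key Vault client call, and refresh() would be called from the /refresh-secrets handler after the admin auth check.

```javascript
// Hypothetical sketch: a secret cache that can be refreshed in place.
// fetchSecret(name) is an assumed async function that returns the secret
// value; in the real app it would wrap the Key Vault client.
function createSecretStore(secretNames, fetchSecret) {
  const cache = new Map();

  // Returns the cached value if present, otherwise fetches and caches it.
  async function getSecret(name) {
    if (cache.has(name)) return cache.get(name);
    const value = await fetchSecret(name);
    cache.set(name, value);
    return value;
  }

  async function loadAllSecrets() {
    await Promise.all(secretNames.map((name) => getSecret(name)));
  }

  // Clear first, then reload. Without clear(), getSecret() keeps returning
  // the old cached values and the reload fetches nothing.
  async function refresh() {
    cache.clear();
    await loadAllSecrets();
  }

  function getFromCache(name) {
    if (!cache.has(name)) {
      throw new Error(`Secret "${name}" not loaded. Did loadAllSecrets() run?`);
    }
    return cache.get(name);
  }

  return { loadAllSecrets, refresh, getFromCache };
}
```

One caveat with this simple version: a request that calls getFromCache() between clear() and the end of the reload will throw. If that window matters to you, load the new values into a fresh Map and swap it in once everything has been fetched.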

Troubleshooting

CredentialUnavailableError: DefaultAzureCredential failed to retrieve a token

Locally, this usually means you're not logged in to the Azure CLI. Run az login and try again. On Azure App Service, check that Managed Identity is enabled and the role assignment was created correctly.

RestError: Forbidden — The user does not have secrets get permission

The Managed Identity isn't wired up to Key Vault yet. Go back and run the az role assignment create command. If you already did, it might just need time. Azure can take 2–3 minutes to propagate role assignments, so give it a moment before you dig further.

Error: Secret "DB-PASSWORD" not loaded. Did loadAllSecrets() run?

getFromCache() ran before loadAllSecrets() finished, meaning the startup sequence is out of order. Open server.js and confirm await loadAllSecrets() comes before app.listen(). If the order's fine, the secret might just not be in the vault yet. Run az keyvault secret list --vault-name YOUR_VAULT to double-check. (A name mismatch — wrong case, typo — throws SecretNotFound instead, which is the entry below.)
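
The ordering can be sketched like this. It's an illustration rather than the project's actual server.js: startServer is a hypothetical wrapper, and loadAllSecrets and listen are passed in as parameters so the sequence is easy to see.

```javascript
// Hypothetical sketch of the startup order: secrets first, then listen.
// In server.js, loadAllSecrets would be the real loader and listen would
// call app.listen.
async function startServer({ loadAllSecrets, listen, port = 3000 }) {
  try {
    // Must finish before any request handler can call getFromCache().
    await loadAllSecrets();
  } catch (err) {
    // Fail fast: a missing secret or unreachable vault stops startup.
    throw new Error(`Startup aborted: ${err.message}`);
  }
  // Only start accepting requests once every secret is cached.
  return listen(port);
}
```

In an Express app, you might call it as startServer({ loadAllSecrets, listen: (p) => app.listen(p) }) and exit the process in the .catch() handler, so a failed load never leaves a half-started server behind.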

App starts locally but fails on Azure App Service

This is almost always an app setting problem. Either KEY_VAULT_NAME isn't in the App Service configuration at all, or the vault name has a typo. Run az webapp log tail to see the actual startup error, which will tell you which one it is.

AuthorizationFailed when running az role assignment create

Your account doesn't have permission to assign roles. Creating a role assignment requires Owner or User Access Administrator on the scope, which guest users in a tenant often lack. Switch the existing vault to the access policy model. You don't need to recreate it or lose your secrets:

az keyvault update \
  --name your-vault-name \
  --resource-group keyvault-demo-rg \
  --enable-rbac-authorization false

If this happened during Set Up the Key Vault (granting yourself access), run:

az keyvault set-policy \
  --name your-vault-name \
  --object-id $(az ad signed-in-user show --query id -o tsv) \
  --secret-permissions get set list delete

If this happened during Grant Key Vault Access to the App (granting the Managed Identity access), run:

az keyvault set-policy \
  --name your-vault-name \
  --object-id $PRINCIPAL_ID \
  --secret-permissions get list

Key Vault returns SecretNotFound

The secret was never added, was deleted, or its name doesn't match exactly what your code requests — Key Vault secret names are case-sensitive. A secret named db-password and a request for DB-PASSWORD are different names. Run az keyvault secret list --vault-name YOUR_VAULT and compare what's actually in the vault against what loadAllSecrets() is asking for in src/config/secrets.js. Usually, it's a casing issue or a stray hyphen.
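
If the list is long, comparing names by eye gets tedious. Here's a small helper that does the comparison for you. diffSecretNames is a hypothetical function, not part of the project. It assumes you've copied the vault's names into an array, for example from az keyvault secret list --vault-name YOUR_VAULT --query "[].name" -o tsv.

```javascript
// Hypothetical helper: compare the names the code requests against the
// names that are actually in the vault.
function diffSecretNames(requested, inVault) {
  const exact = new Set(inVault);
  // Lowercased lookup so names that differ only by case can be flagged.
  const byLower = new Map(inVault.map((name) => [name.toLowerCase(), name]));
  const missing = [];
  const caseMismatch = [];

  for (const name of requested) {
    if (exact.has(name)) continue; // exact match, nothing to report
    const near = byLower.get(name.toLowerCase());
    if (near) {
      caseMismatch.push({ requested: name, inVault: near });
    } else {
      missing.push(name);
    }
  }
  return { missing, caseMismatch };
}
```

Pass it the secretNames array from src/config/secrets.js and the names from the vault. Anything in missing needs to be added to Key Vault (or has a typo), and anything in caseMismatch differs only in casing.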

Wrapping Up

The .env file in this project contains exactly one value: the Key Vault name. That's not sensitive. Every actual secret — database passwords, API keys, signing secrets — lives in Key Vault and never touches your codebase or your deployment pipeline.

This is the pattern I use on Azure projects now. The startup check is the part I find most useful in practice: if Key Vault is unreachable or a secret is missing, the server exits immediately with a clear error instead of starting up broken and failing on the first real request. You find out right away, rather than getting an obscure database connection error two hours later.

To add another secret, put it in Key Vault and drop its name into the secretNames array — that's it. Everything else scales with it.

The full working code is on GitHub: nodejs-azure-keyvault
